III. An Information-Theoretic Account of Same-Different Discrimination
The stronger transfer to novel displays that we obtained with
stimuli even more complex than those used in most prior
research raises the intriguing possibility that the larger number
of items actually helped our pigeons discriminate Same
from Different displays and acquire an abstract same-different
concept. Young, Wasserman, and Garner (1997) recently showed that this possibility
is a reality. In tests of arrays comprising fewer than 16 items
-- both after initial training with 16-item arrays (Experiment
1) and during acquisition itself (Experiment 2) -- the pigeon's
discrimination of Same from Different arrays deteriorated as
the number of items was reduced. Why? An understanding of variability
will provide the framework for an answer.
Variability in a continuous variable is typically quantified
by variance. A 16-icon display, however, represents the
frequency distribution of a categorical variable -- icon
type. For each display, there are 16 possible icon types, each
of which may have a frequency ranging from 0 (not present) to
16 (the only type present). When a pigeon responds to an array
with an intermediate degree of variability, it must determine
whether the frequency histogram represented by the array is more
similar to those represented by Same arrays or to those represented
by Different arrays. Intuitively, we recognize that the Same
arrays exhibit the lowest possible variability and that the Different
arrays exhibit the highest possible variability. But intuition
is an inadequate way to quantify the many variability levels
that lie between these two extremes. Information theory (Shannon
& Weaver, 1949) offers a simple metric -- entropy
-- that nicely accomplishes that end.
Entropy measures the amount of variety or diversity in a categorical
variable by a weighted average of the number of bits of information
that are required to predict each of the categories of the variable.
Rare or low-frequency categories convey a great deal of information
(i.e., they are very important), whereas common categories convey
very little information (i.e., they are less important). Predicting
the category of a variable that is observed to have only one
value is easy and requires no information: entropy is zero. When
all of the values of a categorical variable are equally likely,
entropy is maximal for the given number of observed categories.
Our Same and Different arrays represent these two endpoints
of the entropy dimension: the Same arrays have minimal
entropy and the Different arrays have maximal entropy
for the observed categories.
To quantify entropy, we used the following equation (Shannon
& Weaver, 1949):

H(A) = -Σa [pa × log2(pa)]

where H(A) is the entropy of categorical variable A,
a is a category of A, and pa is the
proportion of observed values within that category. When a display
has 16 identical icons, there is only one category with a probability
of occurrence of 1.0. Because log2 (1.0) = 0.0, the
entropy of the Same displays is 0.0. The Different displays consist
of one occurrence of each of 16 icons or categories, yielding
an entropy of -.0625 × log2(.0625) × 16, or 4.0.
During testing, the pigeons could thus have responded to a display
based on whether its entropy is closer to 0.0 or to 4.0.
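The computation itself is simple enough to sketch in code. The Java fragment below is a minimal illustration of the equation above; the class and method names (EntropySketch, entropy) are our own for this example and are not part of the Entropy Calculator applet described next. It converts the icon-type frequencies of a display into proportions and sums -p × log2(p) over the icon types that are present.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class EntropySketch {

    /** Shannon entropy (in bits) of a display, given the frequency of each icon type. */
    static double entropy(int[] frequencies) {
        int total = IntStream.of(frequencies).sum();
        double h = 0.0;
        for (int f : frequencies) {
            if (f > 0) {                                  // absent categories contribute nothing
                double p = (double) f / total;            // proportion of items of this type
                h -= p * (Math.log(p) / Math.log(2.0));   // -p * log2(p)
            }
        }
        return h;
    }

    public static void main(String[] args) {
        int[] same = {16};                 // one icon type occurring 16 times
        int[] different = new int[16];     // 16 icon types...
        Arrays.fill(different, 1);         // ...each occurring once
        System.out.println(entropy(same));       // prints 0.0
        System.out.println(entropy(different));  // prints 4.0
    }
}
```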
We have provided an Entropy Calculator (in the form of a Java applet)
that you can use to calculate the entropy of arrays involving
up to 16 categories. Simply enter the frequency of occurrence
for each category and press the "Calculate" button
to find the entropy. For example, to determine the entropy for
a 4D/12S array, enter "1" next to the first four categories
and "12" for the fifth category.
When we fit our pigeons' properly scaled
responding to the entropy measure, the accord was remarkably
good (Figure 9); entropy accounted for 85% of the variance
due to display type across three separate experiments (each involving
four subjects) in which entropy was changed in three entirely
different ways. When entropy was compared with four other predictors
-- the number of icon types in a display, the frequency of the
most common icon type, the frequency of the least common icon
type, and the mean icon frequency across all icon types -- it
eclipsed the explanatory power of each of these perfectly
plausible rivals (see the following table).
Predictor                              R²
-----------------------------------------
Entropy                               .85
Number of icon types                  .70
Frequency of most prevalent icon      .60
Frequency of least prevalent icon     .05
Mean frequency of icon types          .63
Note: The data fits did not include
the trained Same and Different arrays. All rival fits were significantly
inferior to the Entropy fit, p < .001.
Interpreting same-different classification as entropy detection
also explained the pigeons' superior discrimination of 16-item
Same from Different arrays compared to their discrimination of
arrays comprising fewer items. When pigeons are trained to discriminate
16-icon Same from Different arrays, they should learn to make
one response to displays with an entropy of 0.0 and a second
response to displays with an entropy of 4.0. During testing,
a bird would then be expected to distribute its responses to
novel arrays as a function of their entropy; arrays with entropies
closer to 0.0 should be more likely to be classified as "same,"
whereas arrays with entropies closer to 4.0 should be more likely
to be classified as "different." The entropy of Same
displays is 0.0 regardless of the number of items within the
display (there is only one category in the display no matter
how many items are present). Thus, 2-item Same displays should
be classified as "same" after initial training with
16-item Same and Different displays. The entropy of Different
displays, however, is reduced as the number of display items
is decreased (the display comprises fewer and fewer categories).
Thus, 2-item Different displays (which have an entropy of 1.0)
should be judged to be more similar to 16-item Same displays
(which have an entropy of 0.0) than to 16-item Different displays
(which have an entropy of 4.0). These counterintuitive predictions
were confirmed in Experiment 1 of Young, Wasserman, and Garner (1997). After
initial training with 16-item Same and Different arrays, the
pigeon's classification of Same arrays was unchanged across a
range of smaller array sizes, whereas Different arrays were increasingly
likely to be classified as "same" as the number of
items was reduced; indeed, 2-item Different arrays were consistently
and otherwise inexplicably classified as "same."
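The item-number effect can be made concrete with the same hypothetical entropy() helper sketched earlier: an all-different display with n items has n icon types occurring once each, so its entropy is log2(n) bits.

```java
// Entropy of all-different displays of 2, 4, 8, and 16 items.
for (int n : new int[] {2, 4, 8, 16}) {
    int[] freqs = new int[n];
    java.util.Arrays.fill(freqs, 1);              // each icon type occurs once
    System.out.printf("%2d-item Different display: %.1f bits%n",
                      n, EntropySketch.entropy(freqs));
}
// Prints 1.0, 2.0, 3.0, and 4.0 bits: a 2-item Different display (1.0)
// lies closer to any Same display (0.0) than to a 16-item Different display (4.0).
```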
The smaller entropy difference between 2-item Same and Different
displays (0.0 vs. 1.0) and the larger entropy difference between
16-item Same and Different displays (0.0 vs. 4.0) also led us
to expect that acquisition of the same-different discrimination
would actually be more difficult with fewer items. As noted earlier,
in Experiment 2 of Young, Wasserman, and Garner
(1997), we found just that effect of item number. These results
are thus an important and possibly counterintuitive confirmation
of the pigeon's use of entropy in the classification of complex
visual displays.