III. An Information-Theoretic Account of Same-Different Discrimination
The stronger transfer to novel displays that we obtained with
stimuli even more complex than those used in most prior
research raises the intriguing possibility that the larger number
of items actually helped our pigeons discriminate Same
from Different displays and acquire an abstract same-different
concept. Young, Wasserman, and Garner (1997) recently showed that this possibility
is a reality. In tests of arrays comprising fewer than 16 items
-- both after initial training with 16-item arrays (Experiment
1) and during acquisition itself (Experiment 2) -- the pigeon's
discrimination of Same from Different arrays deteriorated as
the number of items was reduced. Why? An understanding of variability
will provide the framework for an answer.
Variability in a continuous variable is typically quantified
by variance. A 16-icon display, however, represents the
frequency distribution of a categorical variable -- icon
type. For each display, there are 16 possible icon types, each
of which may have a frequency ranging from 0 (not present) to
16 (the only type present). When a pigeon responds to an array
with an intermediate degree of variability, it must determine
whether the frequency histogram represented by the array is more
similar to those represented by Same arrays or to those represented
by Different arrays. Intuitively, we recognize that the Same
arrays exhibit the lowest possible variability and that the Different
arrays exhibit the highest possible variability. But intuition
is an inadequate way to quantify the many variability levels
that lie between these two extremes. Information theory (Shannon
& Weaver, 1949) offers a simple metric -- entropy
-- that nicely accomplishes that end.
Entropy measures the amount of variety or diversity in a categorical
variable by a weighted average of the number of bits of information
that are required to predict each of the categories of the variable.
Rare or low-frequency categories convey a great deal of information
(i.e., they are very important), whereas common categories convey
very little information (i.e., they are less important). Predicting
the category of a variable that is observed to have only one
value is easy and requires no information: entropy is zero. When
all of the values of a categorical variable are equally likely,
entropy is maximal for the given number of observed categories.
Our Same and Different arrays represent these two endpoints
of the entropy dimension: the Same arrays have minimal
entropy and the Different arrays have maximal entropy
for the observed categories.
To quantify entropy, we used the following equation (Shannon
& Weaver, 1949):

H(A) = -Σa [pa × log2(pa)]

where H(A) is the entropy of categorical variable A,
a is a category of A, and pa is the
proportion of observed values within that category. When a display
has 16 identical icons, there is only one category with a probability
of occurrence of 1.0. Because log2 (1.0) = 0.0, the
entropy of the Same displays is 0.0. The Different displays consist
of one occurrence of each of 16 icons or categories, yielding
an entropy of -.0625 × log2(.0625) × 16, or 4.0.
During testing, the pigeons could thus have responded to a display
based on whether its entropy is closer to 0.0 or to 4.0.
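The computation itself is simple enough to sketch in code. The Java fragment below is a minimal illustration of the equation above; the class and method names (EntropySketch, entropy) are our own for this example and are not part of the Entropy Calculator applet described next. It converts the icon-type frequencies of a display into proportions and sums -p × log2(p) over the icon types that are present.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class EntropySketch {

    /** Shannon entropy (in bits) of a display, given the frequency of each icon type. */
    static double entropy(int[] frequencies) {
        int total = IntStream.of(frequencies).sum();
        double h = 0.0;
        for (int f : frequencies) {
            if (f > 0) {                                  // absent categories contribute nothing
                double p = (double) f / total;            // proportion of items of this type
                h -= p * (Math.log(p) / Math.log(2.0));   // -p * log2(p)
            }
        }
        return h;
    }

    public static void main(String[] args) {
        int[] same = {16};                 // one icon type occurring 16 times
        int[] different = new int[16];     // 16 icon types...
        Arrays.fill(different, 1);         // ...each occurring once
        System.out.println(entropy(same));       // prints 0.0
        System.out.println(entropy(different));  // prints 4.0
    }
}
```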
We have provided an Entropy Calculator (in the form of a Java applet)
that you can use to calculate the entropy of arrays involving
up to 16 categories. Simply enter the frequency of occurrence
for each category and press the "Calculate" button
to find the entropy. For example, to determine the entropy for
a 4D/12S array, enter "1" next to the first four categories
and "12" for the fifth category.
When we fit our pigeons' properly scaled
responding to the entropy measure, the accord was remarkably
good (Figure 9); entropy accounted for 85% of the variance
due to display type across three separate experiments (each involving
four subjects) in which entropy was changed in three entirely
different ways. When entropy was compared with four other predictors
-- the number of icon types in a display, the frequency of the
most common icon type, the frequency of the least common icon
type, and the mean icon frequency across all icon types -- it
eclipsed the explanatory power of each of these perfectly
plausible rivals (see the following table).
Predictor                              R²
-----------------------------------------
Entropy                               .85
Number of icon types                  .70
Frequency of most prevalent icon      .60
Frequency of least prevalent icon     .05
Mean frequency of icon types          .63
Note: The data fits did not include
the trained Same and Different arrays. All rival fits were significantly
inferior to the Entropy fit, p < .001.
Interpreting same-different classification as entropy detection
also explained the pigeons' superior discrimination of 16-item
Same from Different arrays compared to their discrimination of
arrays comprising fewer items. When pigeons are trained to discriminate
16-icon Same from Different arrays, they should learn to make
one response to displays with an entropy of 0.0 and a second
response to displays with an entropy of 4.0. During testing,
a bird would then be expected to distribute its responses to
novel arrays as a function of their entropy; arrays with entropies
closer to 0.0 should be more likely to be classified as "same,"
whereas arrays with entropies closer to 4.0 should be more likely
to be classified as "different." The entropy of Same
displays is 0.0 regardless of the number of items within the
display (there is only one category in the display no matter
how many items are present). Thus, 2-item Same displays should
be classified as "same" after initial training with
16-item Same and Different displays. The entropy of Different
displays, however, is reduced as the number of display items
is decreased (the display comprises fewer and fewer categories).
Thus, 2-item Different displays (which have an entropy of 1.0)
should be judged to be more similar to 16-item Same displays
(which have an entropy of 0.0) than to 16-item Different displays
(which have an entropy of 4.0). These counterintuitive predictions
were confirmed in Experiment 1 of Young, Wasserman, and Garner (1997). After
initial training with 16-item Same and Different arrays, the
pigeon's classification of Same arrays was unchanged across a
range of smaller array sizes, whereas Different arrays were increasingly
likely to be classified as "same" as the number of
items was reduced; indeed, 2-item Different arrays were consistently
and otherwise inexplicably classified as "same."
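The item-number effect can be made concrete with the same hypothetical entropy() helper sketched earlier: an all-different display with n items has n icon types occurring once each, so its entropy is log2(n) bits.

```java
// Entropy of all-different displays of 2, 4, 8, and 16 items.
for (int n : new int[] {2, 4, 8, 16}) {
    int[] freqs = new int[n];
    java.util.Arrays.fill(freqs, 1);              // each icon type occurs once
    System.out.printf("%2d-item Different display: %.1f bits%n",
                      n, EntropySketch.entropy(freqs));
}
// Prints 1.0, 2.0, 3.0, and 4.0 bits: a 2-item Different display (1.0)
// lies closer to any Same display (0.0) than to a 16-item Different display (4.0).
```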
The smaller entropy difference between 2-item Same and Different
displays (0.0 vs. 1.0) and the larger entropy difference between
16-item Same and Different displays (0.0 vs. 4.0) also led us
to expect that acquisition of the same-different discrimination
would actually be more difficult with fewer items. As noted earlier,
in Experiment 2 of Young, Wasserman, and Garner
(1997), we found just that effect of item number. These results
are thus an important and possibly counterintuitive confirmation
of the pigeon's use of entropy in the classification of complex
visual displays.