Avian Visual Cognition


Next Section: A Synthetic Approach Using Natural Stimulus Classes



IV. A Prototypical Point Of View

According to prototype theory, categorization is accomplished by acquiring a prototypical representation of a category through a process of abstraction. The prototypical representation is assumed to be a summary representation that corresponds to the 'central tendency', such as the arithmetic mean (Posner, 1969) or the mode (Neumann, 1977), of all the exemplars that have been experienced. New exemplars can then be classified on the basis of their similarity to this 'best example'. Consider, for example, what happens when one is asked to imagine a category such as 'tree': usually a typical instance of this category is immediately called to mind. The experimental investigation of this prototype effect began with Attneave (1957). It led to the prototype view becoming as firmly established in human cognitive psychology as exemplar or feature theories (see Smith & Medin, 1981; Medin & Smith, 1984; Homa, 1984, for reviews).
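A minimal sketch of the prototype idea follows, assuming that patterns are coded as tuples of numeric feature values. Several elements are illustrative assumptions, not part of any study described here: the helper names (`prototype`, `classify`), the city-block distance and the toy 'tree'/'shrub' coordinates. Each category is reduced to one summary point. This is either the per-dimension mean or the per-dimension mode, and the per-dimension treatment is itself an assumption. A new item is then assigned to the category whose summary point is nearest.

```python
from statistics import mean, mode


def prototype(exemplars, summary="mean"):
    """Reduce a category to one summary point, dimension by dimension.

    summary="mean" takes the arithmetic mean of each dimension;
    summary="mode" takes the most frequent value of each dimension.
    """
    columns = list(zip(*exemplars))  # one tuple of values per dimension
    if summary == "mean":
        return tuple(mean(col) for col in columns)
    if summary == "mode":
        return tuple(mode(col) for col in columns)
    raise ValueError(f"unknown summary: {summary!r}")


def city_block(a, b):
    """Sum of absolute differences across dimensions (the assumed distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def classify(item, prototypes):
    """Return the name of the category whose prototype is closest to item."""
    return min(prototypes, key=lambda name: city_block(item, prototypes[name]))


# Hypothetical two-feature exemplars for two categories.
trees = [(4, 5), (6, 5), (5, 4), (5, 6)]
shrubs = [(1, 1), (2, 1), (1, 2), (2, 2)]
protos = {"tree": prototype(trees), "shrub": prototype(shrubs)}

print(protos)                     # one summary point per category
print(classify((5, 5), protos))   # an unseen item lying at the tree prototype
```

The item (5, 5) never occurs among the tree exemplars, yet it lies at distance 0 from the tree prototype. A prototype model therefore predicts that it will be classified at least as readily as the trained exemplars.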

In pigeons, by contrast, the evidence in support of prototype theory is rather poor and contradictory. The first attempts to demonstrate prototype effects produced negative results (Pearce, 1989; Watanabe, 1988). More recent attempts have yielded positive findings (Aydin & Pearce, 1994; Huber & Lenz, 1996; White, Alsop & Williams, 1993). However, these findings do not warrant the conclusion that pigeons are able to abstract prototypes. There is also little agreement on the specific type of representation involved in prototypical categorization, or on the specific conditions that facilitate it. The common denominator of all research in this field is the demonstration of the prototype effect. The prototype, or exemplars that closely resemble it, are classified immediately, sometimes even more readily than exemplars of the same category that have already been experienced.

In an experiment we conducted to investigate this issue (Huber & Lenz, 1996), our primary aim was to determine which conditions facilitate prototype formation. We were less concerned with providing empirical support for the existence of some kind of abstract representation. An examination of the tasks that have failed to elicit prototype formation in pigeons suggests that category structure is crucial. It appears to determine whether pigeons store associations with instances or features, or whether they abstract prototypes. We adopted the classical "distortion rule" used in human prototype experiments (e.g., Attneave, 1957; Fried & Holyoak, 1984; Homa & Chambliss, 1975; Posner & Keele, 1968) to obtain patterns that varied around a prototype. Under rules of this type, the prototype represents a kind of average or central tendency of the distortions. In the archetypal experiment, Posner and Keele (1968) showed that humans readily identify a prototype dot figure (a triangle) underlying a series of distorted dot patterns (in Figure 17, numbers represent the level of distortion in bits per dot). Moreover, the more similar a novel pattern was to a category prototype, the easier it was to classify.
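The logic of the distortion rule can be sketched in a few lines. This is a simplified stand-in, not Posner and Keele's actual procedure, which displaced dots according to rules graded in bits. Here each dot of a hypothetical triangle is jittered with Gaussian noise of an assumed spread, and the function names are illustrative. Averaging many distortions recovers the undistorted figure, which is why the prototype can be described as the central tendency of its distortions.

```python
import random


def distort(pattern, spread, rng):
    """Return a new dot pattern with each dot displaced by Gaussian noise."""
    return [(x + rng.gauss(0.0, spread), y + rng.gauss(0.0, spread)) for x, y in pattern]


def average_pattern(patterns):
    """Dot-by-dot mean of a list of equally sized dot patterns."""
    n = len(patterns)
    n_dots = len(patterns[0])
    return [
        (sum(p[i][0] for p in patterns) / n, sum(p[i][1] for p in patterns) / n)
        for i in range(n_dots)
    ]


triangle = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]  # hypothetical prototype figure
rng = random.Random(1)                             # fixed seed for reproducibility
training_set = [distort(triangle, spread=2.0, rng=rng) for _ in range(2000)]
centre = average_pattern(training_set)

for (px, py), (cx, cy) in zip(triangle, centre):
    print(f"prototype dot ({px}, {py}) -> average of distortions ({cx:.2f}, {cy:.2f})")
```

No single distortion coincides with the triangle, yet their average lies very close to it.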

Pigeons appear unable to abstract prototypes in this manner. In Watanabe's (1988) study, pigeons first learned to discriminate triangular dot patterns of a given distortion level from arbitrary dot patterns. They then responded less, rather than more, vigorously to a novel complete triangle. Note that this behavior can easily be described as generalization from S+. Further negative evidence was provided by Pearce (1989), who required pigeons to discriminate between artificial polymorphous classes composed of 'histogram patterns'. The heights of three vertical bars served as the defining feature dimensions, and their combined height determined category membership. Pigeons trained on patterns with combined heights of 15 and 9 were then tested with two kinds of pattern: those corresponding to the class averages and those corresponding to the extremes of the short-tall continuum. In contrast to the predictions of prototype theory, the pigeons' spontaneous classification of the extreme patterns was superior to that of the prototype patterns.

The same pattern of results, namely greater responding to extreme values than to any other instance along a continuum from positive to negative classes, was also found in our faces study (Huber & Lenz, 1993), and in von Fersen and Lea's (1990) study. In principle, this pattern of results also corresponds to the 'super-releaser effect' found in many classical ethological examinations of the sign stimulus eliciting fixed-action patterns (e.g., Tinbergen, 1951). We have interpreted this 'supernormal stimulus effect' in terms of feature learning, rather than in terms of the acquisition of an abstract representation of the category. However, those experiments, in which extreme values of the underlying feature rule failed to evoke 'supernormal' classification behavior (cf. Jitsumori, 1993; Lea & Harrison, 1978), have also failed to show any signs of a prototype effect. Therefore, they have been more parsimoniously explained in terms of learning about the entire patterns or some combination of features. 

We suggested that the abstraction of prototypes is facilitated only when polymorphous stimulus classes are composed of many similar stimuli that share a "point of departure" or origin, but that cannot be separated in terms of the simple combination of feature dimensions. We sought to design a prototypical categorization task that fulfilled both of the following requirements:

  • There is no opportunity to discriminate among classes by the adoption of a linear feature rule
  • Stimulus classes are very large so as to impede exemplar memorization. 

The first requirement can be satisfied by designing stimulus classes that have features in common but occupy regions of a multidimensional stimulus space that are not linearly separable. For instance, unlike in the Huber and Lenz (1993) study, one cannot decide whether a Brunswik face is positive by summing the values of the eye, brow, nose, and chin features. The classes were designed using a distortion principle and thus occupied highly overlapping regions of the multidimensional feature space. However, the experimental design was meant to favor the abstraction of categorical information over rote learning of individual instances. Classification therefore had to be possible on the basis of the central tendency of the global stimulus distribution. In other words, correct classification decisions should be determined by reference to the class prototype, not by reference to an indeterminate number of previously experienced training stimuli (exemplar view). A popular formulation of the latter, prototype-based strategy is the "distance rule of classification" (Posner, 1969; but see also Reed, 1972). Following this rule, the criterion for class membership in our study (Huber & Lenz, 1996) was the differential similarity distance from the prototype, measured by summing the absolute differences on every feature dimension. In geometric terms I shall call this distance the "radius". Positive stimuli are all equally similar and close to the prototype; negative stimuli are all equally dissimilar and far from it. If imagined in three-dimensional space, the stimulus classes might be seen as hollow concentric spheres with different radii around a common center (Figure 18). Despite this typicality structure of the stimulus space, pattern learning is still possible. Only a kind of trial-unique stimulus presentation design (Bhatt et al., 1988; Wright et al., 1988), in which the repetition of stimuli during training is minimized, is likely to inhibit pattern learning.
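The non-separability argument can be made concrete with a short sketch. It assumes the coding introduced in the next subsection: four features (eye, brow, nose, chin), each valued from -4 to +4, with prototype 0, 0, 0, 0, positive radius 4 and negative radius 12. The two example faces and the helper names are hypothetical choices.

The sketch shows two things:

  • A positive and a negative face can have exactly the same feature sum. Adding up feature values cannot tell them apart, whereas their distance from the prototype can.
  • In this prototype-identical case, each class also contains the mirror image (sign-flipped copy) of every member. The average of a face and its mirror image is the prototype itself, so the prototype lies inside the convex hull of both classes and no single linear boundary can separate them.

```python
PROTOTYPE = (0, 0, 0, 0)  # eye, brow, nose, chin (Prototype Identical Group)


def city_block(a, b):
    """Sum of absolute feature differences ('radius' when b is the prototype)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def distance_rule(face):
    """Class membership decided only by the radius from the prototype."""
    radius = city_block(face, PROTOTYPE)
    if radius == 4:
        return "POS"
    if radius == 12:
        return "NEG"
    return None  # a radius not used in training


def feature_sum(face):
    """A naive linear feature score: add the four feature values."""
    return sum(face)


def mirror(face):
    """Flip the sign of every feature value."""
    return tuple(-v for v in face)


pos_face = (2, 2, 0, 0)    # hypothetical positive face, radius 4
neg_face = (4, 4, 0, -4)   # hypothetical negative face, radius 12

print(distance_rule(pos_face), distance_rule(neg_face))  # POS NEG
print(feature_sum(pos_face), feature_sum(neg_face))      # 4 4

# Each face and its mirror image belong to the same class and average to the prototype.
for face in (pos_face, neg_face):
    midpoint = tuple((a + b) / 2 for a, b in zip(face, mirror(face)))
    print(distance_rule(mirror(face)), midpoint)
```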
We therefore created a stimulus library that included several thousand patterns. To allow the results of this experiment to be compared with those of Huber and Lenz (1993), we used the same stimuli as in our former study but increased their degree of variability.

The distortion rule to create two 'prototypical' categories

In Huber and Lenz (1993), each of the features eye, brow, nose, and chin of the Brunswik faces could take one of three values (-1, 0, +1). Here (Huber & Lenz, 1996), each feature could take one of nine values (-4 to +4), spaced in equal steps along each dimension. This procedure generated 6561 (9⁴) different patterns. As in the classical prototype studies, the primary criterion for class assignment was the similarity distance of a pattern from the prototype (P) or, in standard terminology, the distortion level. Distortion levels were manipulated by varying the similarity distance between sets of stimuli and a chosen prototype. Similarity distance was defined as the sum of the absolute differences between two patterns on all four feature dimensions (a city-block metric). Consider a pattern containing the medium value on each dimension (i.e., 0, 0, 0, 0) and one containing extreme values (i.e., +4, -4, +4, -4). Summing the absolute differences on each dimension gives their a priori similarity distance: |0 - 4| + |0 - (-4)| + |0 - 4| + |0 - (-4)| = 16. This 'transparent' distortion procedure allowed us to specify distortion in terms of well-known feature dimensions rather than by arbitrarily moving dots. In other words, two faces could be compared by noting that one has, for example, a larger nose and more closely spaced eyes than the other, rather than by comparing two arbitrary dot patterns. I will return to this issue later when discussing our findings.
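The stimulus space and distance measure described above can be reproduced in a few lines, assuming each face is coded as an (eye, brow, nose, chin) tuple. The constant and function names are illustrative. The sketch enumerates all 9⁴ patterns and reproduces the worked example.

```python
from itertools import product

FEATURES = ("eye", "brow", "nose", "chin")
VALUES = range(-4, 5)  # nine equally spaced values: -4, -3, ..., +4


def city_block(a, b):
    """Similarity distance: sum of absolute differences over all feature dimensions."""
    return sum(abs(x - y) for x, y in zip(a, b))


library = list(product(VALUES, repeat=len(FEATURES)))
print(len(library))                                       # 6561 == 9 ** 4

medium = (0, 0, 0, 0)
extreme = (4, -4, 4, -4)
print(city_block(medium, extreme))                        # 16

# No pattern is farther than 16 from the medium face; opposite corners are 32 apart.
print(max(city_block(medium, face) for face in library))  # 16
print(city_block((-4, -4, -4, -4), (4, 4, 4, 4)))         # 32
```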

The positive category (POS), in which all stimuli were positively correlated with reinforcement, was defined by a distance of 4 from the prototype (according to our similarity measure, see sidebar). The negative category (NEG), in which all stimuli were negatively correlated with reward, was defined by a distance of 12. Because two different patterns were selected as prototypes, two slightly different categorization tasks emerged. For one group (the Prototype Identical Group, PIG), the prototype was designated as 0, 0, 0, 0. In this group we can average across all positive stimuli shown in training (n = 192) and across all negative stimuli shown in training (n = 528). The resulting arithmetic mean is identical for both classes, namely again 0, 0, 0, 0. For the other group (the Prototype Different Group, PDG), the prototype was designated as -2, +2, +2, -2. This produced an asymmetrical feature distribution in the second group: a lower frequency of negative values for the features eye and chin and a higher frequency of positive values for the features brow and nose. As a result, averaging across the feature dimensions of all negative patterns shown in training (n = 607) gave an arithmetic mean that was slightly different from that of the positive category (n = 164). This enabled us to investigate whether classes are easier to divide when they also differ in their prototypes than when they differ only in their distance from a single prototype.
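The two categorization tasks can be rebuilt from these definitions. The sketch below reuses the hypothetical coding of the previous example. It collects every pattern lying at radius 4 (POS) and radius 12 (NEG) around each group's prototype and averages each class.

The counts it yields coincide with the training-set sizes quoted above: 192 and 528 for PIG, and 164 and 607 for PDG. This suggests, although the text does not say so explicitly, that each training class comprised all library patterns at its radius. The class means come out identical for PIG and differ for PDG.

```python
from itertools import product
from statistics import mean

VALUES = range(-4, 5)  # feature values -4 .. +4 for eye, brow, nose, chin
PROTOTYPES = {"PIG": (0, 0, 0, 0), "PDG": (-2, 2, 2, -2)}
POS_RADIUS, NEG_RADIUS = 4, 12


def city_block(a, b):
    """Sum of absolute feature differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def members(prototype, radius):
    """All library patterns lying exactly `radius` steps from the prototype."""
    return [f for f in product(VALUES, repeat=4) if city_block(f, prototype) == radius]


def centroid(faces):
    """Arithmetic mean of each feature dimension across a class."""
    return tuple(mean(column) for column in zip(*faces))


classes = {}
for group, proto in PROTOTYPES.items():
    pos, neg = members(proto, POS_RADIUS), members(proto, NEG_RADIUS)
    classes[group] = (pos, neg)
    print(group, "POS n =", len(pos), "NEG n =", len(neg))
    print("  POS mean:", tuple(round(v, 2) for v in centroid(pos)))
    print("  NEG mean:", tuple(round(v, 2) for v in centroid(neg)))
```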

Training involved the presentation of arbitrarily chosen members of the two distortion levels. The program that created and presented the faces used a random generator function to select them, so the presentation order was outside the experimenter's control. The task was so difficult that training of the five pigeons lasted several months. Consequently, several hundred stimuli from each category were shown to the birds. The discrimination procedure was, in general, the same as that used by Huber and Lenz (1993).

Whereas all the birds in the former study succeeded, only three of the five pigeons achieved a satisfactory discrimination level (r = 0.8) within 50 sessions of 40 trials each. The two subjects that failed to solve the original classification task were then transferred to a single-negative condition. In this condition a single negative pattern replaced a negative class of hundreds of patterns. Both birds solved this task quickly.

The most interesting part of this experiment was the birds' performance during generalization tests, which involved patterns that varied in their distance from the prototype. Inspection of Figure 19 suggests that the task is a middle-value discrimination: a band of positive stimuli surrounded by a band of negative stimuli on a single dimension. The key question concerned the form of the generalization slope in the stimulus region inside category POS. A prototype effect (but also a peak shift effect, see below) would be indicated if the slope continued to rise toward the center of the distance dimension. Orderly generalization around S+ would be indicated if the slope fell. In Figure 20, the peck rate to individual stimuli is plotted against their distance from the prototype (radius). In contrast to the findings of Watanabe (1988), stimuli positioned close to the prototype (marked in red) elicited even more responses than the positive training stimuli at Radius 4. More generally, as the distance-from-prototype model predicts, there was a significant negative correlation between response rate and radius. The greater the distance between a transfer stimulus and the prototype, the less frequently the subjects pecked at the response key. This relationship was extremely orderly: Pearson product-moment correlations ranged from -0.906 to -0.975, so that r² lay between about 0.82 and 0.95. A large proportion of the variance in pecking rate could therefore be accounted for by its linear relationship with the distance dimension.
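Here is a minimal implementation of the Pearson product-moment correlation used in this analysis. The function name and the toy radius and peck-rate values in the usage example are hypothetical; they are not the birds' data. A perfectly linear decline of responding with radius gives r = -1. Squaring r gives the proportion of variance accounted for by the linear relationship.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy values: response rate falling linearly with radius.
radius = [0, 2, 4, 6]
pecks = [10, 8, 6, 4]
r = pearson_r(radius, pecks)
print(r, r ** 2)  # -1.0 1.0
```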

As in our former study, the extent to which individual face features controlled classification performance merits attention. This is particularly relevant for the Prototype Different Group. In this group, the classes were anchored to different prototypes, which led to an unequal distribution of feature values across classes (see the colored bars in Figure 21). Consequently, there was a strong correlation between class membership and attribute values. For example, a negative value for features 'e' and 'c' occurred approximately 1300 times in instances of the positive class. A positive value for these features occurred only about 50 times as a signal for food. Simple linear feature rules, such as "if eye distance is short then peck", would therefore make category discrimination possible. By and large, combining only two such feature rules would suffice for significant (r > 0.68) classification.
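The asymmetry can be checked by counting feature values within each PDG class. The sketch assumes the coding used above: features in the order eye, brow, nose, chin, values from -4 to +4, and prototype -2, +2, +2, -2. It counts each pattern once per class, so it cannot reproduce the approximate figures of 1300 and 50 quoted in the text. Those figures exceed the size of the positive class and presumably count occurrences over training trials. Helper names are illustrative.

```python
from collections import Counter
from itertools import product

FEATURES = ("eye", "brow", "nose", "chin")
VALUES = range(-4, 5)
PDG_PROTOTYPE = (-2, 2, 2, -2)


def city_block(a, b):
    """Sum of absolute feature differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def members(radius):
    """All patterns at the given radius from the PDG prototype."""
    return [f for f in product(VALUES, repeat=4) if city_block(f, PDG_PROTOTYPE) == radius]


def sign_counts(faces, feature):
    """How many faces carry a negative, zero, or positive value on one feature."""
    i = FEATURES.index(feature)
    return Counter("neg" if f[i] < 0 else "pos" if f[i] > 0 else "zero" for f in faces)


pos_class, neg_class = members(4), members(12)
for feature in ("eye", "chin"):
    print(feature, "POS:", dict(sign_counts(pos_class, feature)),
          "NEG:", dict(sign_counts(neg_class, feature)))
```

In the positive class, negative eye values far outnumber positive ones. In the negative class the balance tips the other way, which is exactly the regularity a simple feature rule could exploit.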

Control of the pigeons' pecking behavior by independent features was examined by analyzing response rates to stimuli as a function of their feature values (see the colored lines in Figure 21). As in the previous study (Huber & Lenz, 1993), a linear feature model predicts a linear relationship between response rate and feature value. Under that model, the two extreme values elicit the highest and the lowest numbers of pecks, respectively. In contrast, a prototype model predicts that responding will be most pronounced in the presence of the prototypical feature attributes (for Bird CG2, the values -2 and +2, respectively). Clearly, the data support a prototype model rather than a feature model.
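The contrasting predictions can be written down as two toy response functions over the nine values of one feature. Several choices in this sketch are assumptions made only for illustration: the slope and intercept of the linear model, and the triangular shape and width of the prototype gradient. The prototypical value is set to -2, the value of the eye and chin features in the PDG prototype.

```python
VALUES = range(-4, 5)


def linear_feature_prediction(value, slope=-1.0, intercept=5.0):
    """Linear feature model: response changes monotonically with the feature value."""
    return intercept + slope * value


def prototype_prediction(value, prototypical_value=-2, width=3.0):
    """Prototype model: response peaks at the prototypical value and falls off
    linearly on both sides (a triangular gradient, assumed for illustration)."""
    return max(0.0, 1.0 - abs(value - prototypical_value) / width)


linear = {v: linear_feature_prediction(v) for v in VALUES}
proto = {v: prototype_prediction(v) for v in VALUES}

print("linear model peak:", max(linear, key=linear.get), "trough:", min(linear, key=linear.get))
print("prototype model peak:", max(proto, key=proto.get))
```

The linear model places both its highest and its lowest predicted response at the extreme values. The prototype model peaks at an interior value and treats values on either side of it symmetrically.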

Interestingly, the subjects that were successfully trained with a single negative pattern (SNG) produced a different pattern of results (Figure 22). In contrast to the subjects of the Category Group, their classification is best described in terms of a linear feature rule. Their responding was most pronounced in the presence of the extreme values (-4 and +4, respectively) rather than the prototypical ones. This difference alone underlines the importance of category structure as a factor in categorization.

In our original paper (Huber & Lenz, 1996) we argued at some length that all three models of human categorization could account for some part of the present findings. Let me summarize the advantages and disadvantages of each theory.

A Feature Account

The clearest evidence for feature learning comes from the birds in the Single Negative Group. A linear feature mechanism of categorization easily accommodates their performance. The birds learned to combine, in an additive manner, the information from all four feature dimensions until their judgments corresponded to a discriminant function that accurately predicted class membership. The ease with which these previously unsuccessful birds solved this task suggests that classes separated by a linear discriminant function are easy to identify. The pattern of responding shown by the remaining birds in this study also fits the basic tenets of feature theory. Classification behavior may be correlated not with the features as dimensions, but with the reinforcement rates of their discrete values. This is enough to explain the bell-shaped generalization functions around the prototypical feature attributes (see Figure 21). On each dimension, the prototypical feature attributes are those that occurred most frequently on reinforced trials, whereas the remote values are correlated with non-reinforcement. This explanation is consistent not only with the Rescorla-Wagner (1972) theory, but also with frequency theories of feature abstraction and concept learning in the human literature (e.g., Goldman & Homa, 1977; Neumann, 1974).
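The feature-frequency explanation can be illustrated with the Rescorla-Wagner (1972) learning rule. On each trial, every cue present changes its associative strength by ΔV = αβ(λ - ΣV), where ΣV is the summed strength of all cues present on that trial. The sketch below treats each discrete feature value, for example ('eye', -2), as a separate cue. The cue coding, the parameter values and the two toy trial types are assumptions for illustration, not a model fitted to the birds' data.

```python
from collections import defaultdict


def rescorla_wagner(trials, alpha=0.1, beta=1.0, lam=1.0):
    """Apply the Rescorla-Wagner rule to a sequence of trials.

    trials: iterable of (cues, reinforced) pairs, where cues is a collection
    of hashable cue identifiers present on that trial.
    Returns a dict mapping each cue to its final associative strength V.
    """
    V = defaultdict(float)
    for cues, reinforced in trials:
        outcome = lam if reinforced else 0.0
        error = outcome - sum(V[c] for c in cues)  # lambda minus summed prediction
        for c in cues:
            V[c] += alpha * beta * error           # every present cue shares the same error
    return dict(V)


# Hypothetical trials: a face with prototypical eye/brow values is rewarded,
# a face with extreme values is not.
prototypical = [("eye", -2), ("brow", 2)]
extreme = [("eye", 4), ("brow", -4)]
trials = [(prototypical, True), (extreme, False)] * 100

V = rescorla_wagner(trials)
for cue, strength in sorted(V.items()):
    print(cue, round(strength, 3))
```

The values that appear on reinforced trials gain associative strength, while the extreme values, paired only with non-reinforcement, gain none. This is the frequency-based pattern the feature account relies on.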

An Exemplar Account

As in the case of feature theory, the amount of data that can be explained by the exemplar view depends on which version is applied. Consider the simple notion that a novel stimulus is classified on the basis of its degree of similarity to all stored training stimuli. This is hard to defend, since the multidimensional average of the positive class was either identical (PIG) or highly similar (PDG) to the average of the negative class. The comparisons would therefore balance out and the classes would become indiscriminable. A modified exemplar view based on Pearce's configural theory can also account for the present findings, in much the same way as it accounted for Aydin and Pearce's (1994) findings. It rests on the possibility that an increased response rate to novel stimuli near the positive prototype is caused by decreased inhibitory generalization from members of the nonreinforced category. The contrary effect, namely a reduced response frequency to novel stimuli farther from the prototype than the negative training patterns, may be caused by a loss of excitatory generalization. Following Spence's (1937) theory of gradient interaction, the prototype effect may be caused by a peak shift (Hanson, 1959; Purtle, 1973; Rilling, 1977).
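Spence's (1937) gradient-interaction account can be sketched numerically. In the example below, an inhibitory gradient centred on S- is subtracted from an excitatory gradient centred on S+. The positions follow the training radii described above: radius 4 for S+ and radius 12 for S-. The remaining parameters are assumptions: the Gaussian shape, the common width of 3 radius units and the equal heights of the two gradients.

```python
from math import exp

S_PLUS, S_MINUS = 4.0, 12.0  # training radii of the positive and negative classes


def gradient(x, centre, width):
    """Gaussian generalization gradient with peak height 1 at `centre`."""
    return exp(-((x - centre) ** 2) / (2 * width ** 2))


def net_strength(x, width=3.0, inhibition=1.0):
    """Excitation generalized from S+ minus inhibition generalized from S-."""
    return gradient(x, S_PLUS, width) - inhibition * gradient(x, S_MINUS, width)


radii = [i / 100 for i in range(0, 1601)]  # 0.00 .. 16.00 in steps of 0.01
peak = max(radii, key=net_strength)
print(f"net gradient peaks at radius {peak:.2f} (S+ = {S_PLUS})")
print(f"net strength at the prototype (radius 0): {net_strength(0.0):.3f}")
```

Under these particular assumptions, the peak of the net gradient moves away from S- to a radius below 4, but only by a fraction of one radius unit. The prototype itself (radius 0) receives less net strength than S+.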

The peak shift effects reported in the literature, however, clearly indicate that the generalization peak shifts only a short distance away from the S+ value. It may be difficult to estimate whether the prototype value lies within this distance, but a negative peak shift is clearly absent. For both birds, CG2 and CG6, the decreasing trend of response rates away from S- (marked green in Figure 20) continued for more than twice the distance between S- and S+ (marked blue). Finally, the generalization gradients found in the present study were quite different from those obtained by Watanabe (1988) in a similar experiment. If generalization from stored S+ and S- exemplars is a general classification strategy for pigeons, why did Watanabe's pigeons not show a steady increase in responding toward the prototype?

A Prototype Account

Prototype theory offers a more coherent account of the transfer performance of the three successful birds. To explain the strong correlation between distance from the standard pattern and response rate, we need only propose the acquisition of a linear distance rule as a predictor of class membership. Furthermore, this model is the most parsimonious in terms of memory load and the most economical in terms of determining membership. For comparison with a novel, unseen exemplar, a pigeon need keep in memory only the (single) representation of the central tendency of the previously experienced class instances. In conclusion, this dual classification mechanism, a stored central tendency combined with a distance rule, will cope easily with large, open-ended stimulus classes that have a polymorphous and typological (dense center, sparse periphery) structure.
