V. A Synthetic Approach Using Natural Stimulus Classes
Complex Visual Classes
Research on pigeon categorization
has produced ample evidence in favor of all three theories of categorization:
exemplar, feature, and prototype theory. The most pessimistic conclusion
might be that the artificial stimuli, or the artificial feature combinations,
devised by the experimenter produced experimental artifacts in the pigeons'
behavior. The birds may have simply learned "a set of nonsense stimuli
associated with reward rather than relying on the type of classification
they may use outside the test chamber" (Watanabe et al., 1993, p. 372).
In their natural visual worlds, pigeons are faced with objects rather than
with the simplistic stimuli used in psychophysical experiments (Fetterman,
1996). It is very likely that the current inability to specify the kind
of features extracted by pigeons from objects or complex scenes is the
main reason for the success of associative theories. In order to provide
evidence for either of the other two categorization theories, a precise
knowledge of the information that enters the cognitive process of feature
integration would be required. However, since the perceptual processes
by which the patterns of excitation in sensory nerves are transformed into
stable representations of objects and classes are unspecified, neither
categorization model has a chance of surviving.
Many of the above experiments have
been interpreted in terms of human language concepts. The same cognitive
revolution that earlier washed across the study of human learning also had
a (delayed) impact on the study of animal behavior. In that era, comparative
psychologists were tempted to employ a high-level semantic framework to
describe complex classes appropriately. From the bird's point of view,
however, even these semantic classes may have allowed a solution to the
classification in terms of a few global dimensions of invariance, detected
by surveying the entire scene. The basic stimulus aspects to which the
pigeon is pre-adapted when performing complex categorization tasks still
remain open to question. This lack of knowledge is not a sufficient reason
for supposing "that pigeons are doing anything more complex than associating
a large number of pictures and/or the features they contain with a reward,
and then showing transfer to new pictures to the extent that they contain
features previously associated with a reward" (Mackintosh, 2000, p.
125).
The Role of Texture and Two-Dimensional Shape in Images of Human Faces
It is a paradox that, despite the
impressive progress physiologists and psychologists have made in understanding
the pigeon's visual capacities, little progress has been made in understanding
the role played by two of the most fundamental properties of any object in
the environment: surface and space (from here on called "texture" and "shape").
Objects are described predominantly in terms of their morphology, or the
shape and the geometry of their components, while their surface properties
are often ignored. Only color and overall luminous flux have been considered
in a post-hoc analysis (Lubow, 1974). Texture has mostly been studied in relation
to the surface properties of aerial photographs, landscapes, or industrial materials.
Furthermore, it has been reported that pigeons perceive textured displays
much like humans, i.e. with the preattentive global perception of contrasting
textural regions (Cook, 1992a; Cook, 1992b; Cook, 1993; Cook, Cavoto &
Cavoto, 1995; Cook, Cavoto & Cavoto, 1996). We know of no attempt to
investigate the role of texture in natural categorization tasks, only some
vague speculations that texture might be a cue controlling some discriminations
(e.g. in Lubow, 1974; Cook et al., 1990; Jitsumori & Ohkubo, 1996).
One reason for this may be the difficulty
in finding the appropriate stimuli. On the one hand, the stimuli should
be sufficiently complex to contain both shape and texture information relevant
to and diagnostic for class membership. On the other hand, the stimuli
should be sufficiently simple to enable the experimenter to control the
amount of texture and shape information. In a series of recent experiments
(Loidolt et al., 1997; Troje et al., 1998; Troje et al., 1999; Huber et
al., submitted) we used human faces as stimuli and the concept "sex" to
define class membership.
The stimuli were selected for three reasons:
1) "Male" and "female" are two categories that reflect natural
stimulus variation, and evolved to be classified correctly. Although it
is not easy to quantify the differences between them, male and female faces
are sufficiently different to be easily discriminated by humans (96% correct;
Bruce et al., 1993) and artificial neural networks (98% correct; Troje
& Vetter, 1998).
2) Pigeons are naive with respect to the task of classifying human faces
according to sex, although they have undoubtedly experienced them. Training
is thus completely under the control of the experimenter.
3) Human faces provide complex variation in terms of both texture and
shape.
In our initial experiment, we compared the classification performance of
pigeons presented with different versions of the same set of stimuli. The
stimuli could be distinguished according to their texture and shape information,
and were derived from laser-scanned models of the faces of 100 men and
100 women (Troje & Buelthoff, 1996). The faces were free from any kind
of accessories such as glasses or earrings. The men were carefully shaven
and the hair on the head was digitally removed from the 3D-models. The
200 faces were randomly divided
into two sets (A and B), each containing 50 male and 50 female faces. Group
O was shown the original images (see
a sample in Figure 23), while Groups T and S were shown images
only
after they had been subjected to a technique described in Vetter &
Troje (1997) which involved separating the texture and shape components
of each image. Group T
was shown images generated by combining the original texture of each face
with an average shape. This yielded an image set that varied with
respect to texture but not shape (see
a sample in Figure 24). Group S
was shown images generated by combining the original shape of each face
with an average texture, which yielded an image set that varied with respect
to shape but not texture (see
a sample in Figure
25).
The 100 faces of Set A shown to Group O are simultaneously depicted in Figure
26. See also the homepage
of N. F. Troje to learn more about the correspondence-based
representations of faces.
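The actual stimulus synthesis relies on the correspondence-based face representation of Vetter & Troje (1997); the following Python sketch only illustrates the logic behind the three stimulus sets under simplified, hypothetical assumptions. Each face is assumed to be given as a greyscale texture defined in a common reference frame plus a per-pixel displacement (shape) field, and all arrays are random stand-ins rather than real face data.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def render(texture, shape_field):
    """Backward-warp a reference-frame texture with a per-pixel displacement
    field of shape (2, H, W): each output pixel is sampled from the reference
    frame at its own position plus the given (row, column) offset."""
    h, w = texture.shape
    yy, xx = np.mgrid[0:h, 0:w].astype(float)
    coords = np.stack([yy + shape_field[0], xx + shape_field[1]])
    return map_coordinates(texture, coords, order=1, mode='nearest')

# Hypothetical stand-ins: 200 greyscale textures in a common reference frame
# and 200 displacement fields describing each face's shape.
rng = np.random.default_rng(0)
H = W = 64
textures = rng.random((200, H, W))
shapes = rng.normal(0.0, 1.0, (200, 2, H, W))

mean_texture = textures.mean(axis=0)
mean_shape = shapes.mean(axis=0)

# Group O: original texture combined with original shape.
group_O = [render(t, s) for t, s in zip(textures, shapes)]
# Group T: original texture, average shape (texture varies, shape does not).
group_T = [render(t, mean_shape) for t in textures]
# Group S: average texture, original shape (shape varies, texture does not).
group_S = [render(mean_texture, s) for s in shapes]
```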
The results of this experiment indicated
that Groups O and T learned very
quickly and accurately to discriminate faces, whereas Group S failed to
do so (Figure 27).
A comparison of overall performance at the end of training (i.e. the last
16 presentations of each image) revealed that all of the Group O (min
r = 0.865; mean r = 0.951) and Group T (min r = 0.870; mean r = 0.913)
subjects distinguished between the classes very well, while only three
of the Group S subjects achieved r values greater than 0.86 (mean of
Group S: r = 0.713). These results suggest that pigeons are extraordinarily
sensitive to texture differences, but that they find it very difficult
to discriminate shapes.
The ability to generalize between
stimuli is a widely used measure of whether an open-ended categorization
capacity has been acquired. The generalization ability of the pigeons was
tested by examining their spontaneous responses to novel images of human
faces: the pigeons were presented with the 100 images of the previously
unseen set. If the pigeons were using the information that generally divides
male and female faces, then generalization to novel patterns should be
easy. However, if pigeons were not using this information then generalization
should be impaired. The results of this experiment indicated that the subjects
assigned to Groups O and T generalized to novel faces. The 100 test patterns
were divided into male and female at a level of significance beyond 0.001
for each bird (Mann-Whitney U-test), and response rates to the test
patterns were nearly identical to response rates to the training patterns
shown in these sessions (Figure
28). However, subjects assigned to Group S failed to show good transfer.
Only those three birds that eventually mastered the original task transferred
to the novel patterns, while the other five subjects failed to do so. The
results of this transfer test support the conclusion that learning was
not specific to the original training patterns; the 50 novel male and 50 novel
female faces lay within the pigeons' classification schemes. Even the three Group
S subjects acquired a rule for separating male and female faces.
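As a rough illustration of the per-bird transfer analysis, a Mann-Whitney U-test can compare the response rates to the 50 novel faces of the positive class with those to the 50 novel faces of the negative class. The pecking rates below are invented stand-ins, not data from the experiment.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Hypothetical per-image response rates of one bird to the 100 novel test
# faces (50 of the reinforced sex, 50 of the non-reinforced sex).
rng = np.random.default_rng(1)
pecks_positive = rng.poisson(40, 50)
pecks_negative = rng.poisson(15, 50)

stat, p = mannwhitneyu(pecks_positive, pecks_negative, alternative='greater')
print(f"U = {stat}, P = {p:.2g}")   # transfer counts as significant here if P < 0.001
```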
The Role of Low-Level Features in Images of Human Faces
The stimulus parameters that controlled
the performance of successful subjects remain to be determined. Male and
female faces differ both in average size and in average intensity. Female
faces are generally smaller and brighter than male faces. Therefore, we
computed the rank correlation between pecking rate to individual faces
and either the average size or the average intensity of these images. The
five parameters that describe the texture of images (energy, contrast,
entropy, homogeneity, and Hurst coefficient) as well as the three components that describe
their color (red, green, and blue) were also quantified. In order to exclude
any correlation between pecking rate and a parameter that merely reflected
their common association with sex, we computed the correlation separately
for male and female faces. The pecking rates of almost all Group O and
Group T subjects correlated significantly with intensity (R>0.533; P<0.001),
but not with any of the other texture parameters or size. Pigeons assigned
to these groups appeared to use the intensity of faces as a cue to discriminate
between male and female faces. However, in Group T there was one interesting
exception. The pecking rate of the animal with the highest r value did
not correlate with intensity. In Group S, we found no correlation between
pecking rate and average intensity, although for the three birds that showed
reasonable classification performance there was a weak correlation between
pecking rate and size (R>0.320; P<0.05).
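A minimal sketch of this within-class correlation analysis might look as follows. The parameter values and pecking rates are hypothetical stand-ins, and Spearman's rank correlation is used as one plausible choice of rank statistic.

```python
import numpy as np
from scipy.stats import spearmanr

def within_class_correlations(pecks, params, is_male):
    """Spearman rank correlation between pecking rate and each image
    parameter, computed separately for male and female faces so that a
    correlation cannot arise merely because both covary with sex."""
    out = {}
    for name, values in params.items():
        out[name] = {
            'male': spearmanr(pecks[is_male], values[is_male]),
            'female': spearmanr(pecks[~is_male], values[~is_male]),
        }
    return out

# Hypothetical measurements for the 100 training faces and one bird's pecks.
rng = np.random.default_rng(2)
is_male = np.arange(100) < 50
params = {
    'intensity': rng.random(100),   # mean pixel intensity of each face
    'size': rng.random(100),        # area covered by the face in pixels
    'contrast': rng.random(100),    # one of the five texture statistics
}
pecks = rng.poisson(20, 100)
print(within_class_correlations(pecks, params, is_male))
```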
Despite the high correlation between
pecking rate and intensity, the performance of the successful Group O and
Group T subjects cannot be explained in terms of the exclusive use of this
stimulus parameter as a cue. Ranking the stimuli according to their average
intensity reveals an r of 0.787, which is considerably smaller than the
r values for response rates of Group O and T subjects. In contrast, while
size is a much better cue for assessing the sex of a human face (r=0.924),
only three of the eight Group S subjects were able to capitalize on it.
In a second test the spontaneous
responses of pigeons to images that were normalized with respect to their
average intensity (Group T) or size (Group S) were measured. Group S subjects
were presented with male faces from the test set that were the same average
size as female faces, and female faces that were the same average size
as male faces. The male faces were now smaller than the female faces, and
if the Group S pigeons used size as a cue their pecking behavior should
be reversed. The same logic was applied to the texture group for which
intensity was normalized. Finally, as a control for the generality of our
conclusion that texture information is much more readily used by pigeons
to classify faces than shape information, we tested Group O using the two
versions of the 100 test faces that were shown to Groups T and S during
training.
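The exact normalization procedure is not spelled out here; one simple way to implement such a class exchange for the intensity cue is sketched below, with hypothetical greyscale images whose pixel values lie in [0, 1].

```python
import numpy as np

def exchange_mean_intensity(images, is_male):
    """Rescale every face so that male faces take on the female class's mean
    intensity and vice versa; if brightness alone drives pecking, responding
    to the rescaled test set should reverse."""
    mean_male = np.mean([img.mean() for img, m in zip(images, is_male) if m])
    mean_female = np.mean([img.mean() for img, m in zip(images, is_male) if not m])
    swapped = []
    for img, m in zip(images, is_male):
        target = mean_female if m else mean_male
        swapped.append(np.clip(img * (target / img.mean()), 0.0, 1.0))
    return swapped

# Hypothetical test faces: male faces darker on average than female faces.
rng = np.random.default_rng(5)
faces = [rng.random((128, 128)) * (0.4 if i < 50 else 0.6) for i in range(100)]
is_male = np.arange(100) < 50
test_faces = exchange_mean_intensity(faces, is_male)
```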
The
performance of Groups T and S in this test is shown in Figure
29. While the birds were able to discriminate the training stimuli,
transfer to the test stimuli was impaired. For Group T, the difference
between pecking rates in response to positive and negative test stimuli
decreased to a level that was no longer significant. Thus, pecking in this
group seemed to be strongly controlled by the brightness of the faces.
For Group S, pecking behavior was slightly reversed with respect to training,
indicating that size differences played an important role in the classification
strategy of the successful subjects in this group. Finally, the question
of whether texture was preferred over shape as a cue when both sources of
information were available can be
answered. Group O subjects showed a much weaker generalization decrement
in the presence of test stimuli that differed with respect to texture than
in the presence of test stimuli that differed with respect to shape (Figure
30).
The results of these experiments showed that the performance of successful birds was tightly controlled
by differences in the average intensity of faces. However, average intensity
could not have been the only cue that the birds were using because pecking
behavior was not reversed during the transfer test of Group T, in which
this parameter was exchanged between classes. In order to determine whether
the birds would succeed on this task if average intensity was removed as
a class discriminator, 15 subjects from the former Groups T and S were
subjected to further training using texture-only stimuli that were normalized
with respect to their overall intensity. Although this meant that intensity
was factored out as a discrimination cue, the subjects mastered the task
and performed at a level greater than chance by the end of training. Clearly,
another stimulus property had taken the role of class predictor. If the
pigeons had possessed memories of the faces that they experienced during
the first training and testing phase, then they would not have to relearn
the classification during this phase of training.
In order to provide quantitative
support for the feature account, we investigated the pigeons' subjective
separation of the feature space according to the experimenter-defined
class rule using principal component analysis (PCA). We found significant
correlations between pecking rate and some of the dimensions captured by
PCA. There was a high correlation between pecking rate and the stimulus
projection values of the second, and partly the first and the third, principal
axes. Since it is impossible to determine what stimulus properties the
pigeons extracted, we created synthetic faces that varied along these PC
axes. Two opposite faces (each 6 standard deviation units away from the
mean) from
the first 20 principal components are depicted in Figure
31. From inspection of these images, we were able to tentatively conclude
which aspects the pigeons were attending to. The first principal component picks up a difference
in relative luminance between the upper and the lower half of the face,
which is stronger in men than in women, presumably because of the shadow
created by the beard. The second principal component includes a subtle difference in color
between male and female faces; male faces are more red than female faces,
while female faces are more blue and green than male faces. The third principal component
is related to patterns of shading.
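A minimal numpy sketch of this kind of analysis, with random stand-ins for the images and pecking rates, might look as follows: it computes the principal axes of the image set, correlates pecking rate with the projection values, and synthesizes a pair of faces 6 standard deviation units from the mean along a chosen axis.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(3)
X = rng.random((100, 64 * 64 * 3))      # stand-in for 100 face images (RGB pixels)
pecks = rng.poisson(20, 100)            # stand-in for one bird's pecking rates

mean_face = X.mean(axis=0)
Xc = X - mean_face
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)   # rows of Vt are the principal axes
std = S / np.sqrt(len(X) - 1)                        # SD of the projections on each axis
projections = Xc @ Vt.T                              # one column per principal component

# Correlate pecking rate with the projection values on the first few axes.
for k in range(3):
    r, p = spearmanr(pecks, projections[:, k])
    print(f"PC{k + 1}: r = {r:.2f}, P = {p:.2g}")

# Synthetic faces 6 standard deviation units from the mean along a given axis.
def synthetic_pair(k, units=6.0):
    offset = units * std[k] * Vt[k]
    return mean_face + offset, mean_face - offset

face_plus, face_minus = synthetic_pair(1)            # e.g. the second principal axis
```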
In order to determine whether the
pigeons actually used these stimulus properties, or some
combination of them, birds were subjected to test sessions involving the
presentation of these synthetic faces. This revealed that their classification
behavior was controlled by these feature parameters. In the case
of the first two axes, animals perceived even small variations within +/-
1 standard deviation unit and responded in terms of a category decision
(Figure 32).
The strongest effect was found for the second axis, which represents color differences.
This is not very surprising given the fact that pigeons have an extraordinary
physiological capacity for the exploitation of color (Thompson, Palacios
& Varela, 1992; Varela, Palacios & Goldsmith, 1993).
Although the above results cannot
conclusively rule out the possibility that item-specific details or higher-order
stimulus aspects guided the pigeons' classification strategy, we were able
to show that this was quite improbable. We measured the pigeons' spontaneous
classification of interspersed
test images that were derived from the original color images by substantially
destroying the higher-order stimulus properties. For example, using a Gaussian
filter we produced blurred versions, and using a mosaic filter we produced
block-portraits of the faces. Accurate responding was maintained across
a large range of degradation in both the Gaussian (Figure
33) and the mosaic tests (Figure
34).
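The two degradations can be sketched as follows; the filter parameters and the test image are hypothetical, and the mosaic filter is implemented here simply as block averaging.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def blur(image, sigma):
    """Gaussian-filtered version of a greyscale face image."""
    return gaussian_filter(image, sigma=sigma)

def mosaic(image, block):
    """Block-portrait: average the image over block x block tiles and blow
    each tile back up to its original size (image size assumed divisible)."""
    h, w = image.shape
    tiles = image.reshape(h // block, block, w // block, block).mean(axis=(1, 3))
    return np.kron(tiles, np.ones((block, block)))

# Hypothetical test series with increasing degradation.
rng = np.random.default_rng(4)
face = rng.random((128, 128))
blurred_series = [blur(face, s) for s in (1, 2, 4, 8)]
mosaic_series = [mosaic(face, b) for b in (2, 4, 8, 16)]
```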
In summary, these experiments were
an exercise in animal visual categorization. Considered from a purely behavioral
point of view, the present outcome would fit seamlessly into a list of
experiments that provide evidence to suggest that pigeons form complex
concepts (Herrnstein, 1985; Wasserman, 1995; Watanabe et al., 1993). When
presented with the proper stimuli, pigeons learned quickly and generalized
widely. Although pigeons have strong resources for learning specific exemplars,
and display surprising cognitive capacities, neither categorization in
terms of exemplar memorization nor in terms of abstract concept formation
is plausible. Common to both of these theories is that they underestimate
the pigeon's ability to instantaneously adopt a perceptual description
of visual classes that are coextensive with natural categories. Cerella's
(1979) oak leaf experiment and two experiments with blue jays discriminating
species of moths (Pietrewicz & Kamil, 1977) and patterns of leaf damage
due to different species of caterpillars (Real et al., 1984) support this
notion. These findings raise the question of whether animals might sort the
complex objects of the natural environment, even the so-called higher-order
concepts like "persons" and "fish", by fixing on some specific, single
feature.
The surface properties of objects
represent a feature domain that provides enough possible codes to reflect
the actual distribution of reinforcement in the environment (Haralick,
1979, Pentland, 1984). Unfortunately, the surface properties of images
have never been seriously considered as providing the appropriate descriptor
of seemingly complex stimulus classes. In contrast, considerable effort
has been made to construct artificial categories out of simple forms such
as line drawings in order to control for feature content. In our own experiments
we were able to show that surface properties are not only sufficiently informative
for pigeons to easily classify a particular complex natural category, but
are perhaps, at least for this species, superior to shape attributes.
Even if a class definition based
on surface attributes remains obscure to the experimenter, pigeons may
utilize this lower-order statistic inherent in pictures by the effortless,
preattentive processes of perception (Marr, 1982). A sophisticated
texture-analyzing system might be of great value for viewpoint-independent object
recognition, for the recognition of objects without concrete boundaries,
and for the recognition of degraded or partially occluded objects (Julesz,
1981; Julesz & Kröse, 1988).
VI. Conclusions