III. Overview and Theoretical Considerations
Pigeons and People Compared
The visual systems in primate and avian species appear to have undergone widely different
evolutionary histories (see Husband & Shimizu, 2001). As a result, the anatomical layout of avian
and human visual systems is strikingly different. There do, however, appear to be analogous
pathways, operating in parallel. Moreover, pigeons and primates have evolved with many of the same
pressures: objects must be recognized from different vantage points, different distances, when
they are partially occluded by other objects, etc. So, it is possible that common selective pressures
may have yielded visual systems in which there are analogous mechanisms underlying visual functioning.
The study of avian visual cognition can provide insight into the extent to which
the structurally different avian and primate visual systems may function similarly or differently.
The set of studies in the present chapter on "Object Recognition" examined the attributes of
objects that contribute to the perception and recognition of familiar images. The particular
training objects and manipulations of object features were chosen because they are similar
to those used in the study of object recognition in humans by Biederman and colleagues. However,
the training methods employed in the human and pigeon studies are quite different, a factor
which may contribute to any observed differences in performance between the two species.
Geon Structural Descriptions. Pigeons demonstrated a
sensitivity to the spatial organization of object components. When the geons were spatially
displaced from their normal position, recognition accuracy suffered. Moreover, pigeons were
able to discriminate between different spatial arrangements of the same geons. Finally,
the pigeons demonstrated generalization to pseudo-objects that contained the same spatial
organization as a reinforced pseudo-object (see section on relative importance
of geons and interrelations). The geons themselves were also seen to control behavior. First,
there was above-chance performance to scrambled versions of objects, which must have been due to the
presence of local features other than the vertices between components. The only local features that
were available were the geons and their component pieces of contour. Second, the principle of three-geon
sufficiency was observed: the deletion of a single geon had no effect on recognition, but the deletion
of three geons had a large detrimental effect (see section on importance of components).
Third, pigeons demonstrated substantial generalization to pseudo-objects containing the same geons as a
reinforced pseudo-object (see section on relative importance
of geons and interrelations). These combined results indicate that pigeons, too, may store geon
structural descriptions of objects. The similarities in primal object recognition between pigeons and
people encourage the use of a single theoretical account that may apply to many species of animal.
Invariant Object Recognition. In the study of invariances in object recognition
in pigeons, there was significant transfer to stimuli that were rotated in depth,
moved, or altered in size (see section on invariance). However, only translational
invariance was complete. Both rotation in depth and changes in size produced systematic decrements in
accuracy: as the degree of rotation or change in size of the training object was increased, greater decrements
in performance were observed. In human subjects, these kinds of systematic decrements have been observed in studies on rotational
and size invariance, but usually the costs are in reaction time, not recognition accuracy (e.g., Bartram,
1974; Hayward & Tarr, 1997; Jolicoeur, 1987; Shepard & Metzler, 1972; Tarr et al., 1998). In
other studies, changes in rotation or size produced no effect on recognition accuracy or speed
(e.g., Biederman & Bar, 1999; Biederman & Cooper, 1992; Biederman & Gerhardstein, 1993;
Fiser & Biederman, 1995).
Thus far, the only operation that has produced complete invariance in the pigeon is training with multiple
examples along the to-be-manipulated dimension. In the work on rotational invariance, training with widely spaced
views results in a flat generalization gradient. Complete translational invariance was also only observed when multiple
training locations were used (although the number and particular set of training locations have not been systematically
studied). We did not conduct investigations with multiple training sizes, so we do not know whether complete
size invariance can be observed.
In sum, it is difficult to determine the degree of similarity between pigeons and humans in their sensitivity
to variations in the size, location, or angle of view because (1) humans sometimes act like pigeons, demonstrating
a systematic generalization decrement, but at other times demonstrate complete invariance, and (2) rotation in depth
or change in size most commonly affects reaction times in humans, but affects recognition accuracy in pigeons; this difference
may indicate different underlying mechanisms (see below). Further work may determine the conditions that
affect the degree of invariance that is observed. Important considerations are likely the features of the
stimuli in the training set (see, for example, Biederman & Gerhardstein, 1993) and the experimental
procedure that is employed in training.
Theoretical Evaluation
Many of the experiments presented in this chapter were designed to contrast Recognition-by-components (Biederman, 1987),
a model which incorporates both features and their organization, with Particulate feature theory (Cerella, 1986),
a feature-only model. Therefore, the chapter was organized around contrasting these two competing views.
From the experiments on the importance of spatial organization alone, it is
apparent that a feature-only account is insufficient to explain the mechanism of object
recognition in the pigeon. The remainder of the discussion will therefore focus on evaluating RBC and contrasting
it with template models of object recognition.
Recognition-by-components. A number of the present results are consistent
with RBC. First, the dual importance of object components and their spatial organization
is consistent with the notion that object representations are stored as
geon structural descriptions, which specify the object components and their spatial interrelations.
Second, the spatial interrelations appeared to be more salient than component shape. RBC contends that there
are a small number of primitives (about 30) that make up all objects in the environment,
much like the small number of phonemes that make up human language. These primitive components can be
combined to make up hundreds of thousands of objects through the use of variations in the
particular sets of components and their organizational rules. Therefore, two objects may contain the same
geons, but in different interrelations, thereby producing two discriminably different images
(see cup-pail example). The organizational specifications of objects
therefore can provide information above and beyond the specification of the constituent components.
Third, there was considerable generalization to objects that appeared in novel locations, with altered
viewpoints, or were changed in size. RBC predicts that object recognition should be invariant across
these kinds of manipulations, providing that the same structural description can be extracted from the
original and modified objects.
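
To make the idea of a geon structural description concrete, the following is a minimal illustrative sketch in Python. It is not an implementation of RBC. The class name, the geon labels, and the relation labels ("side_attached", "top_attached") are hypothetical names chosen for the cup-pail example. The sketch only shows that two objects can share the same set of components and still differ in their spatial interrelations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeonStructuralDescription:
    """Hypothetical container: a set of geons plus their spatial relations."""
    geons: frozenset       # component labels, e.g. "cylinder"
    relations: frozenset   # (geon_a, relation, geon_b) triples


def compare(a, b):
    """Return (same_geons, same_relations) for two descriptions."""
    return a.geons == b.geons, a.relations == b.relations


# Illustrative labels only: both objects share the same two components
# but differ in where the handle attaches to the cylinder.
cup = GeonStructuralDescription(
    geons=frozenset({"cylinder", "curved_handle"}),
    relations=frozenset({("curved_handle", "side_attached", "cylinder")}),
)
pail = GeonStructuralDescription(
    geons=frozenset({"cylinder", "curved_handle"}),
    relations=frozenset({("curved_handle", "top_attached", "cylinder")}),
)

print(compare(cup, pail))  # (True, False)
```

Because the sketch stores no size, position, or viewpoint, two images that yield the same components and relations compare as equal regardless of those metric properties, which is the sense in which RBC predicts invariance.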
However promising RBC may seem in explaining a wide range of phenomena in pigeon visual perception,
it does appear to suffer from some limitations. In particular, RBC has difficulty in dealing with the
systematic deficit in accuracy that occurred when an object was rotated in depth away from the training viewpoint.
Biederman and Gerhardstein (1993) argued that if the operation of rotation alters the geon structural
description, then complete rotational invariance will not occur. The testing viewpoints did
involve changes in the geon structural descriptions, so these changes may have produced the observed
generalization decrement. Accordingly, training with multiple views may have resulted in a more
complete structural description, thereby allowing for the complete generalization that occurred in
the second experiment.
A recent study by Peissig et al. (2000), however, disclosed similar results using single geons rotated
in depth. They reported that training with only a single viewpoint of an individual geon resulted in
a generalization gradient, with systematic fall-offs in performance on either side of the training
orientation. These results are problematic for RBC because the images were geons. By definition,
geons are supposed to be detectable relatively independently of viewpoint. The geon structural
description of Peissig et al.'s images should not have changed with rotation.
An equally problematic finding was the systematic decrement in accuracy that was observed when
object size was manipulated, relative to the training size. Here, it is difficult to see how RBC
could account for the lack of true size invariance. None of the geons or their spatial interrelations
were altered, so there should have been complete generalization across all sizes.
Because true rotational and size invariances are not always observed in human subjects, it is possible that
these limitations of RBC are not restricted to pigeons. What needs to be determined are the factors that
affect the degree of rotational and size invariance that occur, in both species.
Template models. Template models propose that metric specifications
of an object such as its viewpoint (Edelman & Bülthoff, 1992; Tarr & Bülthoff, 1995;
Tarr & Pinker, 1989; Ullman, 1989) or size (Kosslyn, 1987; Ullman, 1989) are stored in memory
along with specifications of the object's shape. (See Heinemann & Chase, 1990 and the chapters by
Don Blough (2001) and Chase & Heinemann (2001) for further descriptions.)
When the object is now encountered with new metric
specifications, the recognition process engages in transformations of the current object in order to
find a match to a stored representation. When a new viewpoint is encountered, the transformation is
mental rotation; when a new size is encountered, the transformation is mental zooming.
Template models would predict costs in time or accuracy in recognizing objects when metric
specifications have changed. Because larger changes in size or angle of view produce the need
for more extensive transformations, template models predict the systematic decrement in recognition accuracy
that was observed in the size and rotational invariance studies. Moreover, template models
that allow for the storage of multiple views of frequently encountered objects (e.g., Tarr &
Pinker, 1989) would also predict better performance to novel views when training occurred with
multiple viewpoints.
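
The template prediction described above can be illustrated with a toy calculation. The sketch below assumes, purely for illustration, that accuracy falls off linearly with the angular distance between the test view and the nearest stored view. The function names and the values of `base`, `cost_per_degree`, and `chance` are hypothetical and are not estimates from the pigeon data.

```python
def angular_distance(a, b):
    """Smallest angle in degrees between two viewpoints on a circle."""
    d = abs(a - b) % 360
    return min(d, 360 - d)


def predicted_accuracy(test_view, stored_views,
                       base=0.9, cost_per_degree=0.002, chance=0.25):
    """Toy template prediction with hypothetical parameter values.

    The transformation cost grows with the rotation needed to reach the
    nearest stored view; predicted accuracy never drops below chance.
    """
    nearest = min(angular_distance(test_view, v) for v in stored_views)
    return max(chance, base - cost_per_degree * nearest)


single_view = [0]               # training with one viewpoint
multiple_views = [0, 120, 240]  # training with widely spaced viewpoints

for test in (0, 60, 120, 180):
    print(test,
          round(predicted_accuracy(test, single_view), 2),
          round(predicted_accuracy(test, multiple_views), 2))
```

Under these assumptions, single-view training yields a gradient that declines steadily with rotation away from the training view, whereas storing several widely spaced views keeps every test view close to some stored view and so flattens the gradient.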
Although template models may provide a better description of the rotational and size invariance
studies than RBC, they have difficulty explaining the remainder of the results. These models have
not been extended to deal with the importance of components and their spatial organization to
object recognition. It is not even clear how these models would attack this problem. Template models
assume that an integrated representation of an object is stored. Therefore, any transformations
that may occur operate at the level of the entire object. A reorganized object would therefore be
treated as an entirely new object. This did not appear to be the case in our pigeon data (or in humans).
The pigeons responded to scrambled versions significantly above chance, indicating that they must
have recognized the fact that the scrambled versions contained the same components as the original, but
they also recognized the fact that the parts were not in the proper spatial concatenation. Template models
also have difficulty in explaining the fact that pigeons can generalize to a set of complementary contours
after having been trained with the opposing half of the contours (Van Hamme, Wasserman,
& Biederman, 1992; see section on importance of components for a further description).
Here, there are no shared features to mediate generalization between the original and complementary set
of contours, so there would be no basis for template matching to occur.
A Multiple Systems Approach
Biederman and colleagues (e.g., Biederman & Cooper, 1992; Biederman & Gerhardstein, 1993)
have argued that metric properties of objects such as size, location, and viewpoint are processed
separately from shape attributes. In macaques, ablation of the inferior temporal cortex results in
gross impairments in shape discriminations, whereas ablation of the posterior parietal region impairs
the use of spatial cues in targeting the location of an object (Mishkin & Appenzeller, 1987;
Ungerleider & Mishkin, 1982). Biederman and Cooper (1992) contended that the dorsal system may
encode metric attributes such as rotation in depth and size in addition to spatial position:
When we pick up a cup by its handle, our motor movements are exquisitely tuned to the cup's
position, size, and the orientation of the handle in depth. Thus, in a single skilled movement, we may
reach to the right in the direction of the cup, simultaneously bend our wrist if the handle is on the right
side (keeping our wrist straight if the handle is directly in front) and making a bridge between our
thumb and fingers just wide enough to accept the handle. None of the information critically important for this
act appears to be required for speeded object recognition; conversely, the identity of the object need not affect
how it is picked up. (p. 130)
Accordingly, Biederman and colleagues have argued that cases in which there were costs in recognizing
rotated (e.g., Bartram, 1974; Hayward & Tarr, 1997; Shepard & Metzler, 1972; Tarr et al., 1998)
or sized (e.g., Jolicoeur, 1987) stimuli may reflect the participation of the dorsal pathway in the recognition process, whereas
cases in which rotational (e.g., Biederman & Bar, 1999; Biederman & Gerhardstein, 1993)
or size (Biederman & Cooper, 1992; Fiser & Biederman, 1995) invariance is complete may reflect the sole operation of the ventral pathway.
The determination of whether the ventral system alone or both systems participate may rely on the experimental
procedure that is used. One interesting feature of the multiple systems approach is that it may allow for an understanding of the pattern
of similarities and differences in object recognition by pigeons and humans. It is possible that shape recognition
systems in humans and pigeons may operate similarly, but the system for determining metric properties may differ in
its neural mechanisms and/or in the extent to which it participates in entry-level object identification.
Conclusions
At the present time, it appears that no current theory of visual object recognition is capable of accounting
for all of the facts of pigeon (or human) visual perception. However, RBC provides a reasonably good
qualitative description of many facts of object recognition. Given that the object recognition process in the human and the pigeon
may be quite similar, it may be possible to employ a single theory of object recognition for both species.