Rotational Invariance
When an object is seen from a new viewpoint, the retinal projection may differ extensively
from the original viewpoint. According to Recognition-by-components (RBC;
Biederman, 1987),
if at least a subset of the original components can be recovered from the new viewpoint, and
their interrelations are unaltered, then recognition will occur. Object recognition in humans appears to occur despite
differences in viewpoint (e.g., Biederman & Bar, 1999; Biederman &
Gerhardstein, 1993),
but there has been considerable controversy regarding whether true rotational invariance
occurs (cf. Biederman & Gerhardstein, 1995; Tarr & Bülthoff, 1995).
A growing literature has demonstrated that there are costs in recognition accuracy or speed
that are directly related to the degree of rotation from the familiar viewing orientation
(e.g., Bartram, 1974; Hayward & Tarr, 1997; Shepard & Metzler, 1972; Tarr et al., 1998).
These controversies are primarily relevant in discriminating between rival theoretical
accounts of object recognition (for more, see Overview and Theoretical
Considerations). Regardless of the exact mechanism by which novel viewpoints are recognized,
it is clear that humans do spontaneously recognize, to varying degrees, objects when seen from novel
vantage points.
Early evidence from studies of
rotation in the depth plane offered little encouragement that pigeons might also recognize familiar objects
from a variety of vantage points (Cerella, 1977; Lumsden, 1977). However, both of these studies employed training objects
which may not have provided effective three-dimensional information for generalization along the depth dimension
(Lumsden used a brick with an attached semicircle and Cerella used line drawings of regular and distorted cubes).
Logothetis et al. (1994) determined that
monkeys could recognize stimuli that were rotated in depth, and that generalization to novel viewpoints was better if
renderings of realistic objects served as training stimuli than if wire-frame or blob-like shapes were used.
It is possible that viewpoint invariance may occur in pigeons, provided that they
are trained with richer, more realistic renderings
of objects that contain multiple cues for inferring three-dimensionality.
Training with a single orientation. Pigeons were trained with a
four-key
choice procedure to discriminate a chair, an airplane,
a lamp, and a flashlight (see below). Each group received only a single
viewpoint, but different groups of pigeons (n=4) received different
viewpoints: groups 0, 33, and 67 received training with viewpoints of 0°,
33°, and 67°, respectively.
Once the pigeons had learned the original discrimination task, they
were tested with novel orientations of
the training objects, ranging from -100° to 167°. Group 0 and
group 67 received three novel orientations on either side of the training
orientation and group 33 received four novel orientations on either side
of the training orientation. All three groups produced generalization gradients to the novel
orientations.
Choice accuracies were highest for the original training stimuli. There
was a fairly symmetrical generalization decrement on either side of the
training stimulus, but even the most extreme orientations were recognized
above the chance level of 25% in all three groups. See Blough
(2001) for a further description and analysis of stimulus generalization gradients.
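The shape of such a gradient can be sketched in a few lines of Python. All of the accuracy values below are invented for illustration (they are not the reported data); only the 25% chance level for a four-key choice task comes from the text:

```python
# Hypothetical generalization gradient for a four-choice task
# (accuracies are illustrative placeholders, NOT the published data).
CHANCE = 1 / 4  # four response keys -> 25% chance accuracy

# Rotation (degrees from the trained viewpoint) -> proportion correct
gradient = {
    -100: 0.45, -67: 0.60, -33: 0.75,
    0: 0.90,                      # trained orientation: peak accuracy
    33: 0.74, 67: 0.58, 100: 0.44,
}

# Accuracy is highest at the trained view and falls off
# fairly symmetrically on either side.
peak = max(gradient, key=gradient.get)

# Even the most extreme rotations remain above chance --
# the signature of partial (not complete) viewpoint invariance.
all_above_chance = all(p > CHANCE for p in gradient.values())

print(f"peak at {peak} deg; all above chance: {all_above_chance}")
```

The same comparison against the 25% baseline is what licenses the claim that recognition, though degraded, survived every tested rotation.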
Although there was substantial generalization to novel viewpoints, a
result that stands in contrast to Particulate feature theory, the birds
did not demonstrate true viewpoint invariance. According to Biederman and
Gerhardstein (1993) three conditions must be met for viewpoint invariance
to occur: (1) the object must be decomposable into viewpoint-invariant
parts so that a structural description of the object's geons and their
spatial interrelations can be determined; (2) the structural description
of the object must
be distinctive; and (3) multiple views of the same object
must have the same structural description. The four training objects and
their rotated viewpoints do meet conditions 1 and 2, but not condition
3 because, as the images are rotated in depth, some parts are lost while
others are gained, as in the animated example to the right.
Training with multiple orientations. A second experiment examined whether training with multiple orientations
would enhance the degree of generalization to novel viewpoints by exposing
the birds to parts of the objects that were not seen during training with
only a single viewpoint. Logothetis et al. (1994) reported that training monkeys with
multiple views of a stimulus produced better generalization to rotated versions
than training with a single view. It is possible that multiple-view training allows
for a more complete specification of the stimulus parts, thereby allowing for superior
generalization to novel viewpoints in which the parts may be slightly altered.
Pigeons were trained with multiple viewpoints
of the chair, airplane, lamp, and flashlight. All four groups of pigeons
(n=4) received training at the nominal orientation of 33°. Group
33 only received training at this orientation. The other three groups received
training with two additional orientations at different degrees of separation.
Group 0, 33, 67 received additional drawings at orientations of 0° and 67°; group
-33, 33, 100 received additional drawings at orientations
of -33° and 100°; and group -100, 33, 167 received additional drawings
at orientations of -100° and 167°.
All four groups of pigeons received
the full set of test drawings from the first
experiment, ranging from -100° to 167°. The test drawings were incorporated
into sessions which also contained the original training trials. As can be seen
in the next figure,
Group 33 produced a generalization gradient that was similar in shape to the
gradients obtained in the
first experiment. Discrimination performance was maximal at the training
orientation of 33° and then fell off on either side as the objects were rotated
farther from the original viewpoint. All of the orientations were recognized
above the chance level of 25%. The other three groups of birds demonstrated a
much flatter generalization gradient than group 33. The degree to which the
gradient was broadened was related to the spacing of the training orientations,
with the most widely spaced training orientations (group -100,33,167) resulting
in the flattest generalization gradient.
The results suggest that experience with multiple vantage points of an object
may counteract the alteration of the object's structural description that is
produced by rotation in depth. However, template models of visual perception
would also predict superior generalization performance when multiple training
views are experienced (for more, see the Overview and Theoretical Relevance
section).
The pigeons' failure to demonstrate true rotational invariance when trained
with only a single orientation may have been due to the change in the
structural description of the rotated viewpoints.
When an object is rotated in depth, it is possible for familiar parts to move out
of view and new parts to rotate into view. When pigeons were trained with three
widely spaced viewpoints they demonstrated relatively constant recognition accuracy,
suggesting that viewpoint invariance is achievable if all of the object's parts are
experienced in the training phase. The
demonstration that pigeons generalize to objects that are rotated in depth is
problematic for Particulate
feature theory because the local features change so drastically
with rotation in the depth plane. Although the structural description of an object also
changes with rotation in depth, those changes are less drastic than the changes in the
local features because most of the geons are still available, and their interrelations
are maintained. The demonstration of rotational invariance does not, however, necessitate
the use of recognition by components for theoretical explanation.
Template models (see Overview and Theoretical Relevance)
that allow for the storage of multiple views of an object, or the use
of mental rotation in the search for a match, can also account for the above results
(also see Chase and
Heinemann (2001) for an account of template matching in pigeon object recognition).
Translational Invariance
When an image is viewed, its position on the retina is likely to change from occasion to occasion. In
fact, even within a single viewing, the retinal position that the image activates is likely to shift with
movements of the observer's head and body. For the sake of processing efficiency alone, it seems plausible
that the visual system might independently code the shape and position of an object, so that a representation
of object shape would not have to be duplicated for every retinal position that is encountered. If such dual
representations were to exist (see Ungerleider & Mishkin, 1982 for indirect evidence in primates), it
would allow for effortless identification of object shape attributes, despite variations in position. As
a result, complete translational invariance would be observed. In humans, complete translational invariance
has been reported without any measurable costs in recognition accuracy or speed
(Biederman & Cooper, 1991;
Ellis et al., 1989). Cheng and
Spetch (2001) have demonstrated
that pigeons can learn to find a goal relative to one or more landmarks, even though the location
of the goal (and array of landmarks) moves about on the viewing screen from trial to trial. These
results indicate that location in the environment (here, the viewing screen) can be ignored by pigeons.
Although these results are encouraging, they do not speak directly to the issue of translational
invariance.
In order to determine the extent to which pigeons would demonstrate translational invariance in
their recognition of line drawings of objects, four pigeons were first trained to discriminate among four drawings of
a watering can, an iron, a desk lamp, and a sailboat using a four-key choice procedure. The training stimuli were presented in one of
four locations
on the
viewing screen: upper-center, lower-center, left-center, and right-center.
This was done to encourage attention to the entire viewing screen during training.
In order to receive reinforcement, the pigeon had to peck a particular choice
response key in the presence of an object, regardless of its location on
the viewing screen. Once the pigeons had achieved a high level of accuracy on the original discrimination,
they received the training objects in novel locations
on the viewing screen, randomly intermixed with occasions where the object appeared in its
original training location. There were four
novel locations: upper-left, upper-right, lower-left, and lower-right.
The next figure
presents the accuracy scores for the original and novel viewing
locations for each of the four birds. There was no noticeable effect
of moving the entire object to a new location on the viewing screen (M = 85% correct), compared to
presenting the object in its normal location (M = 89% correct).
The successful generalization to novel locations indicates that the pigeon's
recognition of line drawings of objects is translationally invariant, much like
visual perception in humans.
Although complete translational invariance was observed, the results should be interpreted with some caution.
A critical factor in observing translational invariance in the
pigeon is that the original training involve the presentation of objects in
more than one location. An earlier pilot study by Kirkpatrick-Steger and Wasserman
in which training objects were presented
solely in the center of the screen revealed only modest evidence
of generalization (approximately 50% correct) to new locations. Training with multiple locations may facilitate transfer
simply because the pigeon learns to attend to the entire screen. Alternatively, it is possible
that the original training explicitly taught the pigeons to ignore the dimension of location.
In any event, the effects of the number and placement of locations in the training set on the degree
of transfer to novel locations should receive further experimental attention.
Size Invariance
When a particular object impinges on the retina the size of the retinal image
often varies from occasion to occasion. The primary cause of retinal image size
variations is viewing distance. When an object is viewed from farther away, the
retinal image will undoubtedly be smaller than when the object is viewed from a
closer distance. Size constancy (or invariance) is the ability of the visual system
(in humans) to infer that a given object is the same size when it is viewed from
different distances. Size constancy allows for the recognition of an object, even though
the retinal surface area that the image activates is quite different. So, it is
possible to recognize both of the images to the left as the same parakeet, even though
one of the images is half the size of the other. A related phenomenon is the ability to use retinal image
size to infer distance, so that objects that are smaller are perceived as being
farther away. In a three-dimensional environment this inference operates correctly, but in a two-dimensional image,
the inference can create illusions: although the smaller parakeet is actually above and to the left of the larger
parakeet, our visual system interprets the smaller bird as being in the distance. Together, these two abilities
address the size-distance problem, allowing us, as humans, to recognize an object from different distances and
to use the size of an object to aid in judgments about the object's distance.
Size invariance appears to occur readily in humans (Biederman &
Cooper, 1992; Fiser & Biederman, 1995), although in some paradigms recognition
may occur with a time cost (e.g., Jolicoeur, 1987). The present experiment addressed the issue of size constancy to determine whether
pigeons could recognize line drawings of objects that were larger or smaller than
their ordinary size. Pigeons can engage in both categorization (e.g., Bhatt et al., 1988;
Cerella, 1979; Herrnstein, Loveland, & Cable, 1976) and oddity-matching (Lombardi
& Delius, 1990) tasks when size is an irrelevant cue. These results indicate
that the pigeon is capable of classifying stimuli, even though their sizes differ.
(See Urcuioli
(2001) for further information and photographs of the kinds of stimuli employed in
categorization studies.)
The present study attempted to directly determine whether pigeons would continue
to correctly identify objects, even though their sizes had changed. Pigeons were trained with a
four-key
choice procedure to discriminate among the watering can, the iron,
the desk lamp and the sailboat. During the initial training phase, the pigeons only
experienced a single size of each stimulus, and all four of the training stimuli
were the same size. Once the pigeons were performing at a high level on the original
discrimination task they then received test trials in which the training stimuli
were either smaller or larger than the original size. There were seven test stimuli, three smaller,
three larger, and one original size. The test stimuli ranged from 25% to 250% of the original size.
The scaling was relative so that the aspect ratio of the altered stimuli remained the
same as the original size. Specifications of the height, width, and area of each stimulus can be found in the
adjoining table.
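The relative-scaling arithmetic can be sketched as follows. The base dimensions and the intermediate scale factors are hypothetical placeholders (only the 25% to 250% range and the seven-stimulus count come from the text); the point is that multiplying both dimensions by the same factor preserves the aspect ratio, and that a logarithmic axis equates relative size differences:

```python
import math

# Relative scaling of a training stimulus: width and height are
# multiplied by the same factor, so the aspect ratio is preserved.
# Base dimensions are hypothetical placeholders.
BASE_W, BASE_H = 400, 300

# Seven test sizes: three smaller, the original, three larger
# (intermediate factors are guesses; the range 25%-250% is given).
scales = [0.25, 0.50, 0.75, 1.00, 1.50, 2.00, 2.50]

for s in scales:
    w, h = BASE_W * s, BASE_H * s
    # Aspect ratio is unchanged by relative scaling ...
    assert math.isclose(w / h, BASE_W / BASE_H)
    # ... while area grows with the square of the scale factor.
    assert math.isclose(w * h, BASE_W * BASE_H * s**2)
    # On a log axis, equal ratios map to equal distances:
    # e.g., 0.5x and 2.0x sit equally far from the training size.
    print(f"{s:4.2f}x -> {w:5.0f} x {h:5.0f} px, log2 size = {math.log2(s):+.2f}")
```

Plotting accuracy against log-scaled relative size, as in the figure below, is what makes a halving and a doubling of the stimulus directly comparable.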
The next figure
displays recognition accuracy, in percent
correct, as
a function of relative
image size. Relative image size is scaled logarithmically to
equate the sizes in relative distance units. Performance was best to the original
training size, but there was substantial generalization to both smaller and larger
sizes. All of the images but the smallest size were recognized above chance. The
generalization gradient was somewhat asymmetrical: the decrement in performance was
more modest to larger sizes than to smaller sizes. This may have been because the larger
sizes were not varied over as large a range as the smaller sizes, owing to
constraints of the size of the viewing screen.
(See
D. Blough's (2001) chapter for more on generalization gradients.) The results indicate that pigeons can recognize stimuli that have been altered
in size, but with a cost. The degree of cost is related to the degree of change in
the stimulus size. The implications of the cost in recognition accuracy are
discussed in the next section, in the context of the results from studies of
rotational and translational invariance.
Next Section: Theory