Measuring Similarity

Avian Visual Cognition

Next Section: Similarity and Categorization
__________

III. Experimental Measurement

Experimental Measurement and Computation of Similarity

Measurement is the process of assigning numbers to objects according to a set of rules. This process serves to describe and organize phenomena, and it provides a means of testing theories about the measured objects. For nonverbal subjects, the raw materials of similarity measurement consist primarily of generalization, discrimination and transfer data based on errors, response rates, and reaction times. This section describes how one goes from these raw materials to the assignment of similarity numbers, reviews a few of the advantages and disadvantages of different approaches, and suggests some implications of similarity measures for models of avian performance.

Stimulus Generalization
A few years ago a chapter on similarity in a book about animal behavior would have been titled “Stimulus Generalization,” and similarity would have been defined by relative responding across stimuli in generalization and discrimination tasks. This approach may work for problems in which the stimuli vary on one dimension and similarities need only be roughly known. However, individual generalization gradients are quite limited as a general tool for conceptualizing and measuring similarity. Some time ago, in a book on generalization, Shepard and I both pointed out some of the limitations (D. Blough, 1965; Shepard, 1965), which are briefly summarized here.
Consider a prototypical generalization gradient published by Guttman and Kalish (1956). Pigeons were rewarded on an intermittent schedule for pecking a key illuminated by monochromatic light of 550 nm. Reward then ceased, and 11 wavelengths from 490 to 610 nm were presented a number of times in random order. Figure 8 shows the mean across birds of the number of responses made to each stimulus.
What does the gradient in Figure 8 reveal about the similarities between test and training stimuli and among the various test stimuli? Assume at least an ordinal relation between response rate and similarity, that is, the more a pigeon pecks at a stimulus the more similar that stimulus is to the rewarded stimulus. Then the gradient permits the ordering of similarities between the training stimulus, 550 nm, and the various test stimuli, from 560 nm (marked "C"), which is most similar, to 490 nm, which is least similar.

However, it is hard to say much more than this. First, of course, the gradient provides comparisons only between the test stimuli and the training stimulus, 550 nm. For example, one cannot assume that stimuli giving the same number of responses on the test are similar to each other; 570 nm (at D) is probably not similar to 540 nm (at B), though the pigeons responded about equally to both.

More importantly, the metric properties of the scales used to plot the gradient are unknown. On the response scale, for example, C differs from A by about half the number of responses as does D, but we cannot say that C is twice as similar to A as is D. As for the stimulus scale, there is no reason to assume that wavelength is related to similarity in a simple way. For example, the Guttman & Kalish gradient in Figure 8 suggests that 560 nm is more similar to 550 nm than is 540 nm, though both differ from 550 nm by 10 nm. In other words, the “true shape” of the generalization gradient is indeterminate from these data alone (Blough, 1965; Shepard, 1965).

Fortunately, Guttman & Kalish (1956) collected generalization gradients based on training at several different stimuli, in addition to the one shown in Figure 8. This permitted Shepard (1965) to resolve the stimulus scaling problem in a unique and powerful way. He noted that the slopes of the gradients tended to covary where they overlapped. For example, gradients centered at 550, 570, 580 and 600 nm were all rather flat in the vicinity of 570 nm. (This can be seen near D in Figure 8, where the slope is less than that at B.) Shepard found that by rescaling the abscissa such that data points near 570 nm were pushed horizontally closer together (in addition to similar, smaller changes elsewhere), he could make all the gradients take on approximately the same shape.

The rescaling process is illustrated in Figure 9. The top panel shows the gradient from Figure 8 (around 550 nm) in blue, together with an approximation to the empirical gradient around 570 nm, in green. Note that the blue gradient falls less steeply on the right than on the left, whereas the green gradient falls less steeply to the left and is not so sharply peaked. The bottom panel shows the result of rescaling the abscissa, done mostly by reducing the distance between 560 nm and 580 nm. The gradients are now about the same shape. They are also approximately symmetrical around the training wavelength. The transformation necessary to bring this about provides an equal-interval similarity scale. (Shepard's analysis gained power because it transformed five gradients, not just two, to about the same shape.)
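
The logic of rescaling can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the knot values in `KNOTS`, the exponential gradient shape, and the `width` parameter are all invented for the example and are not Shepard's fitted scale or the Guttman and Kalish data. The sketch shows how a single monotone rescaling of wavelength can make gradients that look asymmetric in nanometers share one shape.

```python
import math

# Hypothetical piecewise-linear map from wavelength (nm) to a psychological
# coordinate. The knots are invented: the 550-570 nm stretch is compressed
# to half its physical width, everything else is left at 1 unit per nm.
KNOTS = [(490.0, 0.0), (550.0, 60.0), (570.0, 70.0), (610.0, 110.0)]

def rescale(wavelength_nm):
    """Linearly interpolate the psychological coordinate of a wavelength."""
    for (x0, u0), (x1, u1) in zip(KNOTS, KNOTS[1:]):
        if x0 <= wavelength_nm <= x1:
            return u0 + (u1 - u0) * (wavelength_nm - x0) / (x1 - x0)
    raise ValueError("wavelength outside the 490-610 nm range")

def gradient(test_nm, train_nm, width=15.0):
    """Assumed response strength: exponential decay in rescaled distance."""
    return math.exp(-abs(rescale(test_nm) - rescale(train_nm)) / width)

# Plotted against wavelength, each gradient is lopsided around its peak.
for train in (550.0, 570.0):
    left = gradient(train - 10.0, train)
    right = gradient(train + 10.0, train)
    print(f"trained at {train:.0f} nm: -10 nm -> {left:.3f}, +10 nm -> {right:.3f}")
```

In this toy case the gradient around 550 nm falls less steeply on the long-wavelength side and the gradient around 570 nm falls less steeply on the short-wavelength side, yet both are the same symmetric function of the rescaled coordinate by construction.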

The success of such rescaling is not preordained. Rescaling will produce a common shape for overlapping gradients only if the same underlying similarity scale determines all the gradients. But when rescaling is successful, the resulting scale would be expected to apply generally to pigeon experiments using this range of wavelengths. The potent idea that a number of interlocking measurements from the same set of stimuli can uniquely determine a similarity scale has been applied widely. It is at the core of non-metric multidimensional scaling, to which we next turn.

Multidimensional Scaling
Multidimensional scaling is the measurement procedure that corresponds most closely with the geometric approach to similarity described earlier. The procedure is typically applied to a matrix of values each of which represents a behavioral measure of the similarity between two objects in a set; all possible pairs of objects are usually represented in the matrix. For the behavioral measure, human subjects may rate object pairs for similarity; humans or other animals may provide error or latency data from discrimination or search tasks. Using these data as input, a computer algorithm provides a spatial map, in which interstimulus distances correspond to dissimilarities between stimuli. This map can efficiently describe patterns or structures within such data that may bear directly on models for the mental representation of similarity (e.g. Nosofsky, 1992; Shepard, 1987; Tversky, 1977).
Multidimensional scaling and related analyses can reduce a large amount of data to a relatively simple structure that is often easy to visualize and can present important relationships in an economical way. The effectiveness of this analysis rises rapidly with the number of different stimuli employed. Each pairing of objects provides a data value, and the number of binary relations (R) between pairs of a set of n objects rises approximately as the square of the number of objects; specifically, $R = (n^2 - n)/2$ if self-similarity is excluded. Thus, for example, the 26 letters of the alphabet can be paired in 325 ways, providing a rich set of highly interconnected values.
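
The pair count is easy to check directly. The short Python sketch below (the function name `pair_count` is just a label chosen for this example) compares the formula with a brute-force enumeration of unordered letter pairs.

```python
from itertools import combinations
import string

def pair_count(n):
    """Number of unordered pairs among n objects, excluding self-pairs."""
    return (n * n - n) // 2

letters = string.ascii_uppercase          # the 26 letters A-Z
pairs = list(combinations(letters, 2))    # every unordered pair, e.g. ('A', 'B')

print(pair_count(len(letters)), len(pairs))  # 325 325
```
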
A geometric map of similarity is like a geographic map; just as the latter compactly represents thousands of distances among pairs of cities or other locations, a similarity map captures many relations among perceptual objects. A point in a similarity space corresponds to each object, and distances between these points represent dissimilarities between the objects; the smaller the distance between objects, the greater is their similarity.
This multidimensional map of similarity is most compelling when the stimulus objects have an inherent dimensionality. In actual cases, these dimensions often correspond to physical attributes such as size or intensity, although they need not do so. Even in such cases, however, as mentioned above for the generalization gradient, the correspondence between the physical and psychological measures is typically non-linear, and the nature of the psychological similarity scale is generally unknown in advance.
Fortunately, non-metric multidimensional scaling algorithms assume nothing about the data measurement scale except the ordinal relation. They attempt to match as closely as possible the rank order of the data values (errors, latencies, etc.) with the rank order of the distances in multidimensional space. It might appear that much is lost when all information in the data other than rank order is discarded, but Shepard (e.g. 1980) and others have shown that with a sufficient number of objects the metric structure of the space can be accurately constructed from rank order alone. A forceful demonstration takes the direct airline distances between 15 or more cities in the United States and distorts them with a transformation that leaves their order unimpaired - for example, by taking their logs, squaring, or raising them to an exponent.   With the distorted data as input, a non-metric scaling program such as ALSCAL will place the cities accurately in a two-dimensional map of the US, with distances conforming once again to a ratio scale.   Significantly, the program will also discover and plot the transformation that was used to distort the data.
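
The first half of this demonstration, that such distortions leave rank order untouched, can be sketched in Python. The coordinates below are invented points in arbitrary units, not real city locations, and the sketch does not perform the scaling itself; it only confirms that the ordinal information a non-metric program relies on survives each distortion.

```python
import math
from itertools import combinations

# Hypothetical planar coordinates (arbitrary units), standing in for cities.
points = {"A": (0.0, 0.0), "B": (3.0, 1.0), "C": (5.0, 4.0),
          "D": (1.0, 6.0), "E": (7.0, 2.0)}

def distance(p, q):
    """Straight-line distance between two points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

pairs = list(combinations(sorted(points), 2))
true_d = [distance(points[a], points[b]) for a, b in pairs]

def rank_order(values):
    """Indices of the pairs, sorted from smallest to largest value."""
    return sorted(range(len(values)), key=lambda i: values[i])

# Three order-preserving distortions of the kind described in the text.
distortions = {
    "log": math.log,
    "square": lambda d: d ** 2,
    "power 0.3": lambda d: d ** 0.3,
}
for name, distort in distortions.items():
    distorted = [distort(d) for d in true_d]
    print(name, rank_order(distorted) == rank_order(true_d))  # True each time
```
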
Thus, using minimal assumptions, non-metric scaling can recover the "psychological space" within which the similarities of objects are distributed, and it can also recover the transformation that relates the behavioral measure to similarity. Various methods can yield appropriate input data in avian subjects, and the following are a few illustrations from studies with pigeons.

Figure 10. (redrawn from Riggs et al, 1972)

Among the earliest scaling experiments with pigeons were studies of hue similarity. In a study designed to investigate early color processing, Riggs, P. Blough, and Schafer (1972) presented a stimulus field consisting of a pattern of stripes that alternated rapidly in wavelength. The luminance of the stimuli was equated, following D. Blough (1957). The electrical response of the retina to this alternation increased non-linearly with wavelength difference, and it could be taken as a measure of the perceptual similarity of the stimuli. A matrix of the electrical responses to all pairings of 12 wavelengths from 495 to 660 nm provided input to a nonmetric multidimensional scaling program. Figure 10 shows the result, which can be seen as a partial color circle for the pigeon. Possibly the function would have formed a more complete circle if a wider range of wavelengths had been used.

Figure 11. A color circle for the pigeon; the colored spots suggest the appearance of the stimuli to a human observer. (Redrawn from Schneider, 1972, with colors added.)

At about the same time, Schneider (1972) used a behavioral method to derive a more complete color circle. His pigeons discriminated between identical and different pairs of wavelengths in a yes/no signal detection task. The two lights appeared on the two halves of the center key of a three-key chamber. If the wavelengths were the same, right key pecks were rewarded; if they differed, left key pecks were rewarded. The accuracy of performance to each possible pair of 11 wavelengths was taken as a measure of the dissimilarity of the pair, and non-metric scaling yielded the result shown in Figure 11.
These experiments suggest that similarities among colors for pigeons are organized in a manner generally like those for humans. This is quite interesting in view of the anatomical differences between the species, particularly the colored oil droplets that filter the light reaching the pigeon's cone receptors (see P. Blough (2001) and Husband & Shimizu (2001) for more information regarding the structure and evolution of the pigeon retina).
In these experiments with wavelength, multidimensional scaling was applied to a relatively well-understood continuum, and the similarity results clarify aspects of the pigeon's visual function.   This sort of scaling can also be applied to stimulus sets especially constructed to explore the processes involved in the identification and discrimination of objects. We next consider an example of this sort, which also introduces a task that is convenient for studying inter-object similarity in pigeons.
This study (D. Blough, 1988) used an "odd item" search task. An array of 32 forms appeared on the screen of a computer monitor. The forms were all identical, with one exception, which was the target item; the pigeon got food on an intermittent schedule for pecking at the target, which was randomly placed from trial to trial. Pigeons learn this task readily, perhaps because it resembles a natural foraging situation. Depending on the circumstances, search accuracy or search speed may be the primary measure. Search is swift and accurate if the odd form is quite different from all the others; search is slow and may be inaccurate if the odd form is similar to the others. This odd-item task is particularly efficient because all possible pairs of items may be presented in random order and usually appear in the same experimental session; this scheme automatically provides equal treatment of the stimuli and counterbalancing for all the target items.

One of the sets of forms that was used appears in Figure 12 (D. Blough, 1988). Each of the 16 items is a black block that varies in size together with a U that varies in width. There are 4 values on each of the two physical dimensions, though the objects might be described in other ways as well, such as the overall size of the pair of stimuli taken together. The two forms that appeared on any trial were drawn from this set of 16; one was the target, the other, repeated 31 times, was the distractor. Over the course of a session each item appeared as the target, paired with every other item as the distractor.

Two typical displays based on this set of forms appear in Figure 13. In the upper one, the target differs from the distractors on both block size and U width dimensions. In the lower display, the form that was the target for the upper display has become the distractor. The new target is harder to find; it differs from the distractors only in that its block size is somewhat smaller. The data consisted of mean search speeds collected for each of the 240 possible target-distractor pairings of these forms (16 targets, each with 15 possible distractors).
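
The trial structure described here can be enumerated with a short Python sketch. The level labels 0 to 3 are placeholders for the four block sizes and four U widths; they are assumptions made for illustration, not the physical values used in the experiment.

```python
from itertools import permutations, product

# 4 block sizes x 4 U widths = 16 forms; levels 0-3 are placeholder labels.
forms = list(product(range(4), range(4)))   # (block_level, u_level)

# Order matters: the same two forms give two trial types, depending on
# which one is the target and which is the repeated distractor.
trials = list(permutations(forms, 2))       # (target, distractor)

def dimensions_differing(target, distractor):
    """Count how many of the two physical dimensions differ."""
    return sum(t != d for t, d in zip(target, distractor))

one_dim = sum(dimensions_differing(t, d) == 1 for t, d in trials)
both_dims = sum(dimensions_differing(t, d) == 2 for t, d in trials)
print(len(forms), len(trials), one_dim, both_dims)  # 16 240 96 144
```

Of the 240 pairings in this enumeration, 96 differ on a single dimension (as in the lower display described above) and 144 differ on both.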

Non-metric scaling based on the resulting matrix produced the structure shown in Figure 14. The figure suggests that a psychological dimension corresponds to each of the physical dimensions, and that similarity among the forms can be roughly equated with their distance in this similarity space (also see Cook, Katz, & Cavoto, 1997 for another example of using scaling analyses to look at avian discrimination behavior). As to the metric relation between the behavioral measure, search reaction time, and similarity, I showed that to a good approximation there is an exponential relation between the momentary probability of detecting a target (which determines variations in search speed) and the similarity between the target and the distractor (D. Blough, 1988).

Metrics, Dimensions, and Attention
The data displayed thus far have all been scaled in a space analogous to physical space. Distances in this space follow the so-called Euclidean metric; in two dimensions, the Pythagorean theorem governs the relation between the coordinates of points and the distances between them. Thus, if two points are separated by a and b along the coordinate axes and the distance between them is c:
(1)    $c = (a^2 + b^2)^{1/2}$
or, in general, with several dimensions:
(2)    $c = (a^2 + b^2 + \cdots)^{1/2}$    Euclidean metric
There are other rules for computing distance. If one is walking between opposite corners of a block filled with buildings, the sidewalk distance is given by the simple sum of adjacent sides of the block rather than by equation (1). This distance rule is aptly named the “city-block” metric:
(3)    $c = a + b + \cdots$    City-block metric
where again c is the distance between the points and a, b, ... are the distances along the coordinate axes.
Both the Euclidean and the city-block metrics are special cases of the Minkowski metric:

(4)    $c = (a^N + b^N + \cdots)^{1/N}$
where N = 1 for the city-block metric and N = 2 for the Euclidean metric.
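
As a minimal sketch, the Minkowski rule of equation (4) can be written as one small Python function. The name `minkowski` and its argument names are choices made for this example; the separations 4 and 3 are illustrative values.

```python
def minkowski(separations, n):
    """Minkowski distance from per-axis separations with exponent n."""
    return sum(abs(s) ** n for s in separations) ** (1.0 / n)

print(minkowski([4, 3], 1))  # city-block: 7.0
print(minkowski([4, 3], 2))  # Euclidean: 5.0
```
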
A crucial property of the Euclidean metric is that it yields a constant distance no matter how the coordinates are moved or rotated with respect to objects in the space. It seems obvious that the distance between objects does not change with axis rotation; it is less obviously true that Euclidean space is unique in this respect. If the exponent N in equation (4) does not equal two, distances change when the axes are rotated. This result is exemplified in Figure 15, which shows two dots separated by a distance “c”. Two sets of coordinates are shown, one rotated with respect to the other. Euclidean distance "c" between the dots is 5 regardless of the position of the coordinates. However, by the city-block metric, the distance between the two dots, computed from their coordinates, is 7 for one set of axes and 5 for the other set of axes (see Figure 15 below).

$c = (4^2 + 3^2)^{1/2} = 25^{1/2} = 5$    Euclidean
$c = (4^1 + 3^1)^{1/1} = 7$    City-block

$c = (5^2 + 0^2)^{1/2} = 25^{1/2} = 5$    Euclidean
$c = (5^1 + 0^1)^{1/1} = 5$    City-block

Figure 15. This figure illustrates the different effects of axis rotation on distances determined by the Euclidean metric and city-block metric. Euclidean distance between the dots is constant regardless of rotation. City-block distance changes with a change in axis orientation.
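
The effect of axis rotation can also be checked numerically. The Python sketch below assumes, as in the worked numbers above, two dots separated by 4 and 3 units along the original axes; the helper names `minkowski` and `rotate_axes` are invented for this example.

```python
import math

def minkowski(separations, n):
    """Minkowski distance from per-axis separations with exponent n."""
    return sum(abs(s) ** n for s in separations) ** (1.0 / n)

def rotate_axes(separation, angle):
    """Express a 2-D separation in axes rotated counterclockwise by angle."""
    x, y = separation
    return (x * math.cos(angle) + y * math.sin(angle),
            -x * math.sin(angle) + y * math.cos(angle))

original = (4.0, 3.0)
# Rotate the axes until one of them passes through both dots.
rotated = rotate_axes(original, math.atan2(3.0, 4.0))

for label, sep in (("original axes", original), ("rotated axes", rotated)):
    city_block = minkowski(sep, 1)
    euclidean = minkowski(sep, 2)
    print(f"{label}: city-block = {city_block:.3f}, Euclidean = {euclidean:.3f}")
```

The Euclidean value stays at 5 for both sets of axes, while the city-block value drops from 7 to 5 after rotation.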

The significance of all this for similarity scaling is that the metric structure of similarity space is initially unknown and it may reflect important properties of inter-object similarity and the cognitive operations involved in computing it. One example is the degree to which a perceiver analyzes and attends to the dimensions that underlie similarity computation.  For example, when human subjects are asked to estimate the similarity between pairs of colored patches varying in hue, lightness and saturation, the matrix of data they produce is best fit by placing the colored patches in a Euclidean space.
Three such patches appear in Figure 16 (the stimuli on the left). Within limitations of the display, the spots differ somewhat in hue (vertical) and lightness (horizontal). One would expect the dissimilarity between the stimuli connected by line c to be given by equation 2, the Euclidean metric.

Figure 16. Example of "Integral" Dimensions

Figure 17. Example of "Separable" Dimensions

This outcome agrees with other evidence that these dimensions are "integral" – that is, observers do not analyze them into component dimensions when they compare them. This is one implication of the rotation-invariance of the Euclidean metric; the psychological space has no preferred axes.

In contrast to the colored patches, forms varying in distinct, separately identifiable aspects may fit best into a city-block similarity space. One example for which this has been demonstrated in humans is a set of circles that vary in size, with inscribed radii that vary in angle. This is illustrated by the three forms in Figure 17 (the stimuli to the right). Similarity judgments for these forms were better fit by a city-block than by a Euclidean metric; thus, in Figure 17 the dissimilarity represented by line c should be best approximated by equation 3.
This finding agrees with other evidence that these dimensions are "separable"; it suggests that observers analyze the stimuli into their two prominent aspects, determining a and b, and summing these to get c. The presence of specific preferred axes corresponds to the non-rotatability of the city-block representation (Shepard, 1987, 1991).

This analysis suggests another look at the pigeon similarity data for U/block forms shown in Figure 14 above. To us these forms appear to be separable; it seems easy to focus attention on the upper U or the lower block. This may be true for pigeons as well; for these forms, unlike other tested items, the city-block metric provided a slightly better fit to the data than the Euclidean fit shown in Figure 14 (D. Blough, 1988).
A final implication of this analysis is that, in the case of separable dimensions (city-block metric), differential attention may alter the observed similarities among objects by changing the relative weights given the dimensions. This is somewhat analogous to the attentional variation of feature weighting that Tversky built into his contrast model, discussed above in the theory section. Other data also suggest that the pigeon may attend to different parts of simple forms (D. Blough, 1993), and this matter awaits further exploration.  For further discussion, see D. Blough (1989, 1991, 1992, 1993).
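
One way to picture this is a city-block distance in which each dimension carries an attention weight. The Python sketch below is an illustration under assumptions: the form coordinates and the weight values are invented, and the weighted sum is a common modeling convention rather than a result reported in the studies cited here.

```python
def weighted_city_block(a, b, weights):
    """City-block distance with one attention weight per dimension."""
    return sum(w * abs(x - y) for x, y, w in zip(a, b, weights))

# Hypothetical forms described as (block size, U width) in arbitrary units.
target = (2.0, 2.0)
differs_in_block = (4.0, 2.0)   # same U width, larger block
differs_in_u = (2.0, 3.5)       # same block, wider U

attention_states = {
    "equal attention": (1.0, 1.0),
    "attention mostly on U width": (0.25, 1.75),
}
for label, weights in attention_states.items():
    d_block = weighted_city_block(target, differs_in_block, weights)
    d_u = weighted_city_block(target, differs_in_u, weights)
    print(f"{label}: block-differing = {d_block}, U-differing = {d_u}")
```

With equal weights the U-differing form is the closer one (1.5 versus 2.0); shifting weight toward U width reverses the ordering (2.625 versus 0.5), which is the kind of change in observed similarity that the text attributes to differential attention.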

Features and Cluster Analysis

Many natural objects do not lend themselves very well to description in terms of dimensions, and even fairly simple forms often lack obvious dimensionality, unless they are specifically designed with dimensions in mind, such as those in Figures 12 and 17. Additional problems with the geometric approach were discussed in the theoretical section on features. Classification is an alternative to a spatial representation, and it can also suggest how subjects perceive objects; inspection of the results may suggest the features of the objects that affect this classification.
A number of algorithms have been devised for use in classifying objects on the basis of similarity. As a relatively simple example, consider the popular program, CLUSTER, which may be applied to matrices of object-comparison data like those already considered. Using such a data set, CLUSTER puts the objects into a space with as many dimensions as there are objects, and it then computes interobject distances in this many-dimensional space. Finally, it places objects together in groups based on the squared Euclidean distance between each pair of items.
A specific avian example is provided by a cluster analysis of pigeon letter confusion data (D. Blough, 1985). To construct a cluster tree from distances computed as just described, CLUSTER first grouped U and V, the two letters that were separated by the least distance. This pair is plotted at the bottom of Figure 18. D and O were separated by the next smallest distance, and they are plotted next, and so on for NW, then BP.

However, at this point, the next larger distance was not that between two individual letters. Rather, it was determined that the average of the M-N and the M-W distance was less than any similar value, so at this point CLUSTER added M to NW to create a new group, MNW. This process continued, with the distance criterion for joining a group gradually relaxed until a final single cluster was reached. In Figure 18 the vertical axis reflects the average distance between items in a cluster. Because U and V had the smallest distance, they are the lowest in the picture, and so on.
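
The merging steps just described resemble what is usually called average-linkage agglomerative clustering. The Python sketch below implements that general technique, not the CLUSTER program itself, and the dissimilarities in `dist` are invented for illustration; they are not the pigeon letter data.

```python
from itertools import combinations

# Invented dissimilarities among five letters (illustration only).
dist = {
    ("U", "V"): 1.0, ("N", "W"): 2.0, ("M", "N"): 3.0, ("M", "W"): 3.4,
    ("M", "U"): 6.0, ("M", "V"): 6.2, ("N", "U"): 5.5, ("N", "V"): 5.0,
    ("U", "W"): 5.8, ("V", "W"): 5.2,
}

def d(a, b):
    """Look up a dissimilarity regardless of the order of the two items."""
    return dist[(a, b)] if (a, b) in dist else dist[(b, a)]

def average_distance(c1, c2):
    """Mean dissimilarity over all item pairs drawn from two clusters."""
    return sum(d(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

def average_linkage(items):
    """Repeatedly merge the two clusters with the smallest average distance."""
    clusters = [(item,) for item in items]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: average_distance(*pair))
        merges.append((c1 + c2, average_distance(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges

for members, height in average_linkage(["M", "N", "U", "V", "W"]):
    print("".join(sorted(members)), round(height, 2))
```

With these toy values the merge order is UV, then NW, then M joining NW to form MNW, and finally a single cluster; the printed height for each merge plays the role of the vertical axis in a tree diagram.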

Figure 18. (Redrawn from D. Blough, 1985).

The potential relation between this cluster analysis and the feature approach discussed in the theory section is evident if one considers the features that seem common to clusters - for example, the small upper loop common to ARBP, the open centers of DOQ, the straight verticals of ITL, and so forth. Human observers have commented that the pigeon tree diagram in Figure 18 looks very plausible to them, and in fact cluster diagrams based on human judgments are quite similar to the pigeon diagram shown here. This suggests that humans and pigeons see these forms in similar ways; the result also is consistent with the idea that an analysis into features plays a role in the perception of such forms, though this is by no means a necessary conclusion. Further discussion of the featural aspects of these data, and related matters, may be found in D. Blough (1985) and Blough & Franklin (1985).
In a different context, Dooling (e.g. 1990) faced the problem of relating the differential behavioral effects of bird calls to the birds' perceptual classification of those calls. To investigate this matter, calls of budgerigars in a large breeding colony were recorded under several conditions (e.g. for "contact calls" the birds were separated; for "alarm calls" they were disturbed). Frequency spectra of samples of the calls are shown in Figure 19. Then, in a discrimination experiment, birds were rewarded for pecking a key within 2 s following the successive presentation of two different calls, but they were not rewarded if the calls were identical. The bird's response latency on "different" trials was taken as an index of stimulus similarity. Data analysis was based on a matrix of these latencies that came from pairing each call with every other call in a set. For the data shown in Figure 20, ten calls were used, five collected in a "contact" situation and five in an "alarm" situation. A two-dimensional multidimensional scaling solution and the output of a tree cluster analysis appear in Figure 20 (figures redrawn from Dooling et al., 1990). It is evident from these results that the two types of calls are indeed distinct for the birds, and they are relatively similar within the alarm and contact categories. Contact calls are less similar to each other than are alarm calls, and this fits with other evidence that birds use differences in contact calls for recognition of individual birds.

Summary
Measures of similarity exemplified in this section are conceptual and analytic tools of considerable potential for the analysis of cognitive processes in birds and other animals. Some key aspects:
(1) A variety of behavioral data can be used as input (e.g. search speed, same-different discrimination speed and accuracy, generalization functions).
(2) Data on similarities can clarify the functioning of sensory and perceptual systems (e.g. Schneider's pigeon color circle).
(3) Similarities determined by these methods may with some confidence be compared across species boundaries (e.g. pigeon and human letter similarities).
(4) Similarity data may be used in conjunction with behavioral observations to clarify the functional classification of stimuli (e.g. the work of Dooling on bird calls).
(5) Scaling procedures can reveal the transformations that relate behavioral measures of generalization and discrimination to perceptual similarity (e.g. Shepard's Universal Law of Generalization).
(6) The spatial metrics determined by scaling can give information on the analysis into features or dimensions performed by perceptual systems, and can help to predict how attention or other variables may affect this analysis (e.g. results of letter and artificial form experiments).