Similarity may seem to be an irreducible psychological primitive, like
“red”, but various theorists have tried to show how it relates to
other fundamental considerations. Let us assume, as do most writers,
that stimulus objects are internally represented, and that similarity between
objects comes from some sort of comparison between their representations.
One then confronts several interlocking questions, for example: (1) What
information is carried in the stimulus representations? (2)
How is this information combined or structured within the representation?
(3) How are representations compared in arriving at a “similarity”?
(4) Given a set of stimuli, how are their similarities determined
and best represented?
The rest of this section summarizes five different theoretical approaches to
similarity. It is not a comprehensive review of this large and complex
subject; rather, it highlights aspects that one may encounter in the
avian literature or that, as with feature models, seem to offer promising
lines for future research with animals. Different approaches
stress one or the other of the above questions, making their appearance
in a single list somewhat problematic. Consideration of the last question
is postponed to the measurement section of this chapter.
Common Elements Approach
Figure 2. The elements of two stimuli are represented by "x"s.
The proportion of elements common to the stimuli (in red)
determines their similarity.
One of the best known attempts to suggest processes underlying similarity
or generalization follows from the representation of stimuli as collections
of elements (e.g. Estes, 1955). In applications to conditioning (e.g.
Wagner, 1981), the elements typically carry excitation or inhibition, and
they mediate transfer by appearing in more than one stimulus, as suggested
in Figure 2. If desired, similarity can be
calculated by counting up numbers of common elements relative to other
elements and/or by summing their values.
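The core calculation is simple enough to sketch in code. A minimal version, assuming stimuli are modeled as sets of hypothetical element labels, computes similarity as the proportion of shared elements:

```python
# Sketch of the common-elements idea: a stimulus is a set of elements,
# and similarity is the proportion of elements the two stimuli share.

def common_element_similarity(a, b):
    """Shared elements divided by all distinct elements (one simple choice)."""
    shared = a & b
    total = a | b
    return len(shared) / len(total) if total else 0.0

# Two hypothetical stimuli that share two of their six distinct elements:
s1 = {"e1", "e2", "e3", "e4"}
s2 = {"e3", "e4", "e5", "e6"}
sim = common_element_similarity(s1, s2)   # 2 shared of 6 distinct -> 1/3
```

One could equally well sum element values rather than count elements, as the text notes; the set-based version is just the most transparent case.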
This scheme was developed primarily to model associative processes,
and it is ill suited to handle similarity among stimuli of any complexity.
However, it can yield useful quantitative predictions of generalization
phenomena along a simple continuum by assuming a series of stimuli each
of which shares elements with its neighbors. Thus, reinforcement
strengthens elements contained in the reinforced stimulus, and responding
generalizes to other stimuli to the extent that they contain elements in
common with the reinforced stimulus.
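This mechanism can be sketched directly. In the toy version below, stimulus i activates elements i-1, i, i+1, so neighbors share two elements; reinforcing one stimulus and probing the others yields a graded generalization gradient. The learning rate, element layout, and trial count are illustrative assumptions, not values from any published model.

```python
# Generalization via shared elements: stimulus i activates elements
# i-1, i, i+1, so adjacent stimuli overlap. Parameters are illustrative.
N_ELEMENTS = 11
ALPHA = 0.3                   # learning rate
V = [0.0] * N_ELEMENTS        # associative strength of each element

def active(i):
    """Elements activated by stimulus i (shared with its neighbors)."""
    return [e for e in (i - 1, i, i + 1) if 0 <= e < N_ELEMENTS]

def train(i, reinforced, trials=100):
    """Rescorla-Wagner style updates on the elements stimulus i activates."""
    lam = 1.0 if reinforced else 0.0
    for _ in range(trials):
        total = sum(V[e] for e in active(i))
        delta = ALPHA * (lam - total)
        for e in active(i):
            V[e] += delta

def response(i):
    """Response strength: summed strength of the stimulus's elements."""
    return sum(V[e] for e in active(i))

train(5, reinforced=True)
# Responding falls off with distance from the reinforced stimulus in
# proportion to element overlap:
gradient = [round(response(i), 2) for i in range(3, 8)]
```

After training, the reinforced stimulus commands the strongest responding, its immediate neighbors (sharing two elements) about two thirds as much, the next stimuli (sharing one element) about one third, and more distant stimuli none.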
Appropriately conjoined with other theoretical elements, this scheme
applies beyond simple generalization. For example, I used a variation
of this approach, coupled with the basic associative process of the Rescorla-Wagner
model (Rescorla & Wagner, 1972) to predict the remarkable phenomenon
of dimensional contrast (D. Blough, 1975; see also D. Blough, 1983). The phenomenon is a behavioral "edge effect"; it arises, for example,
when all stimuli on a continuum are reinforced except one "negative" stimulus,
which appears frequently without reinforcement. Paradoxically, subjects
then respond more strongly to stimuli fairly similar to the negative stimulus
than they do to more distant stimuli. Data from pigeons that exemplify
this effect appear in Figure 3. The
model that generates this dimensional contrast effect is diagrammed in Figure 4.
Here, stimuli adjacent on the abscissa share common elements. The
series of graphs suggests how the presentation of successive unreinforced
and reinforced stimuli alters the associative strength of these elements,
resulting in the development of the contrast “edge” effect.
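A stripped-down simulation in this spirit can be sketched as follows. The element layout, learning rate, and trial schedule are all assumptions chosen for illustration; the toy version shows how frequent unreinforced presentations carve a dip into the gradient, though it does not reproduce every detail of the published model.

```python
# Toy element-sharing simulation of a dimensional-contrast schedule.
# All parameter values and the trial schedule are illustrative assumptions.
N = 11                       # stimulus positions 0..10 on a continuum
S_MINUS = 5                  # the frequent, unreinforced stimulus
ALPHA = 0.15                 # learning rate

V = [0.0] * N                # one element per stimulus position

def active(i):
    """Stimulus i activates its own element and its neighbors' elements."""
    return [e for e in (i - 1, i, i + 1) if 0 <= e < N]

def trial(i, lam):
    """One Rescorla-Wagner style update; lam is the reinforcement target."""
    total = sum(V[e] for e in active(i))
    delta = ALPHA * (lam - total)
    for e in active(i):
        V[e] += delta

for epoch in range(300):
    for i in range(N):
        if i != S_MINUS:
            trial(i, 1.0)    # every other stimulus is reinforced
    for _ in range(4):
        trial(S_MINUS, 0.0)  # the negative stimulus appears frequently

responses = [sum(V[e] for e in active(i)) for i in range(N)]
# responses dips sharply at S_MINUS; in the full model, the interactions
# among shared elements during training also elevate responding to the
# stimuli flanking the dip.
```

Because elements shared with the negative stimulus are repeatedly driven down, reinforced trials on neighboring stimuli must push their unique elements up further than elsewhere on the continuum, which is the seed of the contrast "edge."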
As just illustrated, stimuli defined in terms of undifferentiated elements
may help to clarify some aspects of learning and stimulus discrimination.
For most purposes, however, this simple scheme does not adequately represent
the similarities among complex stimuli.
We turn next to stimulus representations that carry more information.
Template Approach
Figure 5. Template
models use a point to point correspondence check (green
lines) between image-like representations. The
chance that the representations will be judged similar
may be improved by using a fuzzy representation (here,
filtered to remove high spatial frequencies), among
other transformations. In this figure, the sharp
representation might be from current input, the fuzzy
one a representation drawn from memory.
Template models were developed as an answer to the problem of object
recognition, and they incorporate at least implicitly the idea of similarity
comparison. The representations assumed by template models carry
much more detailed information about stimulus structure than do the element
representations just described. These models are usually applied
to spatially extended visual objects, and their representation can be thought
of as being spatially organized. Similarity is based on the degree
of correspondence in a point-for-point spatial comparison between the representations
being compared, as suggested in Figure 5. Such models have often
been dismissed because they seemed incapable of detecting similarities
among forms that are displaced, rotated, or enlarged. However, such
objections have been countered by evidence for preprocessing operations
that may transform forms to comparable orientation or size; there
is also evidence that when training is controlled, subjects may not, after
all, generalize very easily to objects that are expanded, contracted
or rotated (e.g. Tarr & Bulthoff, 1998).
I used a template scheme in a modestly successful attempt to predict
data on pigeon alphabetic letter and random-dot form similarities from
superposition of fuzzy representations (D. Blough, 1985). As the
upper form in Figure 5 suggests, a fuzzy remembered representation
of a reinforced target was compared with a representation of the stimulus
input. As in many such schemes, the forms were first brought into
correspondence around their centers of gravity. The “fuzziness” then
allowed the comparator to register degrees of correspondence even when
an exact match did not occur at a given point of comparison.
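The gist of this fuzzy-template comparison can be sketched in code. Forms are sets of points; each form is centered on its center of gravity, blurred by spreading weight to neighboring locations, and then compared point by point. The falloff function and the overlap measure below are assumptions for illustration, not the actual choices of D. Blough (1985).

```python
# Sketch of a fuzzy-template comparison between two point-set forms.
# The blur falloff and overlap measure are illustrative assumptions.

def centered(points):
    """Shift a form so its center of gravity sits at the origin (rounded)."""
    n = len(points)
    cx = round(sum(x for x, _ in points) / n)
    cy = round(sum(y for _, y in points) / n)
    return {(x - cx, y - cy) for x, y in points}

def fuzzy_map(points, spread=1):
    """Spread each point's weight over a neighborhood: a crude 'blur'."""
    m = {}
    for x, y in points:
        for dx in range(-spread, spread + 1):
            for dy in range(-spread, spread + 1):
                w = 1.0 / (1 + abs(dx) + abs(dy))    # assumed falloff
                m[(x + dx, y + dy)] = max(m.get((x + dx, y + dy), 0.0), w)
    return m

def template_similarity(a, b, spread=1):
    """Point-for-point overlap of the two fuzzy maps after centering."""
    fa, fb = fuzzy_map(centered(a), spread), fuzzy_map(centered(b), spread)
    keys = set(fa) | set(fb)
    overlap = sum(min(fa.get(k, 0.0), fb.get(k, 0.0)) for k in keys)
    total = sum(max(fa.get(k, 0.0), fb.get(k, 0.0)) for k in keys)
    return overlap / total

T = {(0, 0), (1, 0), (2, 0), (1, 1), (1, 2)}      # a small "T"-like form
shifted = {(x + 5, y + 3) for x, y in T}          # same form, displaced
L = {(0, 0), (0, 1), (0, 2), (1, 0)}              # a different form
```

Because the comparison happens after centering, displacement alone leaves the match perfect, while the fuzziness lets a genuinely different form still register a partial match rather than zero.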
A more general and sophisticated example in the avian literature is
the template matching that appears as a component of the theory of pattern
recognition proposed by Heinemann & Chase (1990), which is an extension
of a general model for learning and behavior (e.g. Heinemann, 1983; Chase
& Heinemann, 2001).
This model represents stimulus objects by pixels, each of which is characterized
by its spatial coordinates, as well as by hue, saturation and brightness.
Such representations exist both as current input and as exemplars of previous
inputs that are stored in memory. Simulations with the model
involve the calculation of the correspondence between a current input and
a sample of exemplars drawn from memory. The memory representations
are fuzzy in that pixels in the memory representation are represented by
bivariate Gaussian distributions, so that points near to, but not exactly
corresponding with, input pixels can contribute to a match between input
and memory representations. Variability in the similarity computation
also arises from Gaussian noise added to the memory items at the time of
storage, and from variation in both input and stored items due to stimulus
distance and similar factors in the images.
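The two key ingredients, storage noise and Gaussian fuzziness around stored pixels, can be sketched as follows. The noise level and Gaussian width are illustrative assumptions, and spatial coordinates stand in for the full pixel description (which also includes hue, saturation, and brightness).

```python
import math
import random

# Sketch of pixel-level matching against a fuzzy memory representation.
# Noise level and Gaussian width are illustrative assumptions.

def stored_exemplar(pixels, noise_sd=0.5, seed=0):
    """Store pixel coordinates with Gaussian noise added at storage time."""
    rng = random.Random(seed)
    return [(x + rng.gauss(0, noise_sd), y + rng.gauss(0, noise_sd))
            for x, y in pixels]

def correspondence(input_pixels, memory_pixels, sd=1.0):
    """Mean best-match score: each memory pixel acts as an isotropic
    bivariate Gaussian, so near misses still contribute to the match."""
    def g(p, q):
        d2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
        return math.exp(-d2 / (2.0 * sd * sd))
    return (sum(max(g(p, m) for m in memory_pixels) for p in input_pixels)
            / len(input_pixels))

shape = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]
memory = stored_exemplar(shape)        # a noisy stored version of the shape
score = correspondence(shape, memory)  # high, but imperfect due to noise
```

A simulation along Heinemann and Chase's lines would compare the current input against a sample of such stored exemplars rather than a single one.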
The Heinemann and Chase template model has provided good fits
to a variety of data, including generalization to 2-dimensional forms altered
in size, position, and rotation (Heinemann & Chase, 1990) and alterations
of visual context (Donis, Heinemann & Chase, 1994). It was about
as successful in predicting letter and random-dot form similarities as
my fuzzy template model (D. Blough, 1985; Heinemann & Chase, 1990).
All in all, this seems a promising approach well worth exploring in future research.
Geometric Approach
Figure 6. This set of birds is arranged in
a similarity space of two dimensions. Small birds go near the
top, big ones near the bottom. Red birds go to the left,
yellow ones to the right. Intermediate values on either
dimension find their appropriate places. In the simplest case,
the length of a straight line drawn between any two birds would
correspond to their empirically determined similarity. The
section on measurement
goes into that matter in some detail.
The geometric approach stresses the representation of similarity relationships
among the members of a set of objects. An individual stimulus object
is represented simply by its coordinates in a "similarity space."
Similarity is given by distance between objects in this space; the closer
together two objects are, the more similar they are. The approach
assumes (1) that objects can be represented by values on a few continuous
dimensions, and (2) that similarity can be represented by distance in a
coordinate space. Figure 6 shows an example of stimulus
objects placed in a space defined by size and color dimensions.
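In code, this representation is nothing more than coordinates plus a distance function; the coordinate values below are invented for illustration.

```python
import math

# Hypothetical coordinates on (size, color) dimensions; values are invented.
birds = {
    "small_red":    (1.0, 0.0),
    "small_orange": (1.0, 0.5),
    "large_red":    (3.0, 0.0),
}

def euclidean(a, b):
    """Dissimilarity as straight-line distance in the similarity space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# The closer two birds lie in the space, the more similar they are:
d_color = euclidean(birds["small_red"], birds["small_orange"])  # 0.5
d_size = euclidean(birds["small_red"], birds["large_red"])      # 2.0
```

On this toy layout, the small red and small orange birds are much closer, hence more similar, than the small red and large red birds.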
The geometric approach to similarity does not go beyond the simplest
representation of objects themselves, and in itself has little to say about
the cognitive processes through which similarity relationships are determined
(but see Shepard, 1987). However, the outcomes of scaling procedures
based on the geometric idea can suggest the qualitative nature of the dimensions
on which stimulus representations may vary and they can also suggest how
that information is combined. Further, such procedures yield information
on the relationship between similarity and the data input to scaling algorithms.
The most notable example is Shepard's Universal Law of Generalization,
which states that the probability of generalization decays exponentially
with dissimilarity, and does so in accordance with one of two metrics.
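The law is easy to state in code: pick a metric (Euclidean or city-block), then let generalization fall off exponentially with distance. The scale parameter is an assumption.

```python
import math

def minkowski(a, b, p=2):
    """Distance between two points: p=2 is Euclidean, p=1 is city-block."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def generalization(a, b, p=2, scale=1.0):
    """Shepard's law: generalization decays exponentially with distance."""
    return math.exp(-minkowski(a, b, p) / scale)

# The two metrics can disagree: for points (0, 0) and (3, 4), Euclidean
# distance is 5 while city-block distance is 7, so predicted
# generalization is lower under the city-block metric.
```

Which metric fits depends on the stimuli; integral dimensions are often associated with the Euclidean metric and separable dimensions with the city-block metric.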
A more detailed look at these matters is given below in the section on measurement.
Although the geometric approach has theoretical beauty and practical
advantages, its assumptions limit its applicability. In a classic
article, Tversky (1977) pointed out that both of the major assumptions
of the geometric approach are open to question. First, if dissimilarity
is to be represented as a metric distance, it must follow the three metric
axioms of minimality, symmetry, and the triangle inequality (see Notes
on the Metric Axioms), but data contrary to these axioms have arisen
in various experimental situations with humans. Secondly, few stimuli
differ from each other in only a few continuous dimensions such as size
and color. Most stimuli seem to be more effectively described by
the presence or absence of qualitative features. We consider these
objections in turn.
Most of Tversky’s examples of the failure of metric axioms are
based on human judgments involving abstract relations among objects.
For example, minimality implies that an object is most similar to itself,
but sometimes an object is identified more often as another object than
as itself. Also, the probability that two identical objects are judged
“same” varies with the objects judged. Symmetry implies that object A is
as similar to object B as B is to A, but this often fails; for example
North Korea is judged more similar to China than is China to North Korea.
The failure of these axioms in avian data could be helpful in tests
of theoretical accounts, but few examples seem to be available.
Minimality seems to be violated when, as sometimes happens in a generalization
test, a pigeon significantly and repeatedly responds more strongly
to a novel stimulus than to the training stimulus. Symmetry
failed in a discrimination task in which pairs of letters appeared on the
display screen, one letter as target, the other as distractor. For
some letter pairs, performance was distinctly better when one of the two
letters was the target than when the other letter was the target (D. Blough,
1985). In that case the asymmetry could arise from a preference
or bias; Tversky lists various other sources.
Feature Approach
As mentioned above, a second problem with the geometric model is that
even with its metric assumptions intact the approach seems inappropriate
for objects that seem to differ in a number of qualitative ways rather
than in a few ways that correspond to continuous dimensions.
For this reason, Tversky and others have assumed that an object is represented
by a set of features or attributes. Usually these are binary variables
(e.g., voiced or unvoiced consonant) or parts that are present or not (e.g.,
eyes; tail; horizontal bar), but they may be ordered sets of properties
like color or size.
Figure 7. Representation of two objects
that each contains its own unique features and also contains
common features. An important aspect of Tversky's model is
that similarity depends not only on the proportion of features
common to the two objects but also on their unique
features. Each letter here represents a feature.
Tversky's “Contrast Model” (1977) systematizes this feature approach.
A central assumption of the model is that the similarity of object
a to object b is a function of the features common to a and b
("A and B"), those in a but not in b (symbolized
"A-B"), and those in b but not in a ("B-A"). A diagram
exemplifying this appears in Figure 7. Note especially that
similarity is not just a function of common features, but depends also
on features that are unique to each object, and that the relative importance
of these varies with the parameters y and z.
Based on this and several other assumptions, Tversky derived the following:
(1)   S(a,b) = x·f(A and B) − y·f(A-B) − z·f(B-A)
Here, S is an interval scale of similarity, f is
an interval scale that reflects the salience of the various features,
and x, y and z are parameters that provide
for differences in focus on the different components.
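Equation (1) is straightforward to render in code. In the sketch below, the salience scale f and the x, y, z values are illustrative assumptions; with y > z, the measure is asymmetric in just the way Tversky's North Korea/China example suggests.

```python
def contrast_similarity(a, b, salience, x=1.0, y=0.5, z=0.2):
    """Tversky's contrast model, equation (1):
    S(a,b) = x*f(A and B) - y*f(A-B) - z*f(B-A).
    The salience scale f and the x, y, z values are illustrative."""
    def f(s):
        return sum(salience[feat] for feat in s)
    return x * f(a & b) - y * f(a - b) - z * f(b - a)

# With y > z, a sparsely featured object is judged more similar to a
# richly featured one than the reverse (hypothetical features, equal salience):
salience = {"f1": 1.0, "f2": 1.0, "f3": 1.0, "f4": 1.0, "f5": 1.0}
sparse = {"f1", "f2", "f3"}
rich = {"f1", "f2", "f3", "f4", "f5"}
s_sparse_to_rich = contrast_similarity(sparse, rich, salience)  # 3 - 0 - 0.4
s_rich_to_sparse = contrast_similarity(rich, sparse, salience)  # 3 - 1.0 - 0
```

The asymmetry comes entirely from the distinct weights on the two sets of unique features: the subject's unique features (weighted by y) count against similarity more than the referent's (weighted by z).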
This formulation makes principled sense of several characteristics of
similarity data that contradict the metric assumptions discussed above.
The most troubling is probably asymmetry. This often goes along with
task asymmetry; for example, "how similar is A to B"
may give a different answer than "how similar is B to A". Avian
examples of task asymmetry are the generalization test, where a visible
test stimulus is compared with a remembered training stimulus, and the
search task, where a remembered, searched-for target is compared with irrelevant
distractors. Tversky suggests that when the subject focuses on a
particular stimulus, such as the search target, the features of that stimulus
are weighted more heavily than the features of alternative comparison stimuli.
Thus, in Figure 7, if Object a is the focus
of attention, its features (shown in red and green) will tend to
be heavily weighted; those unique to it (green) are the ones that
asymmetrically affect the similarity computation, for the parameter y
in equation (1) is larger than
z. If, instead, Object
b becomes primary, its features are more heavily weighted and a different
similarity S is computed in equation (1).
Beyond attention or the role of the stimulus in the task,
the number and salience of unique features can also affect the computation,
as suggested by the size and number of features allotted to Object b
in Figure 7. Apart from the examples suggested above, little note has been
taken in the non-human literature of the considerable analytic possibilities
that Tversky’s approach may suggest for avian cognition.
Geon Approach
Like template theory, Biederman’s geon theory (e.g. Biederman, 1987)
relates primarily to object recognition and centers on the representation
of visual form. According to geon theory, stimulus objects
are represented by primitive shapes or elementary parts, like cylinders,
bricks, or cones, that stand in particular relations to one another.
According to the theory, generalization between two objects will occur
if the same parts and relations are visible in both, even if details of
the images of the various parts change considerably. For example,
if an object is rotated but none of its parts or relations is obscured, the
object is still recognizable and, presumably, the rotated and unrotated
images are similar. Wasserman and his colleagues have made a notable
attempt to apply this theory to pigeon discrimination and transfer among
visual stimuli, and an account that compares it with other approaches,
particularly template theory, may be found in Wasserman et al., 1996. Though it is clearly relevant to accounts of similarity,
geon theory does not pretend to be a general theory of similarity. (For an extensive
discussion of geon theory,
see Kirkpatrick, 2001.)