Exemplar Memory and Discrimination

Avian Visual Cognition

Sheila Chase & Eric G. Heinemann
Department of Psychology, Hunter College, CUNY

The role of memory in cognitive processes of both human and non-human beings has been the subject of much scientific study in recent years. The study of human memory has a long tradition that goes back to the remarkable work of Ebbinghaus (1885). The study of memory in species other than ours began quite recently. The reason for the long neglect of memory may be that many investigators of learning processes preferred the framework of behaviorism.

This chapter presents a specific theory of memory called “exemplar memory.” As the name implies, this memory stores exemplars, that is, specific instances of past events. It is often contrasted with memories that store rules, algorithms, or abstract representations of the elements that objects have in common. We shall show in this chapter that an exemplar memory accounts for much of how birds acquire knowledge and act upon it.

The first part of this chapter describes results from the domains of acquisition, discrimination, categorization, generalization, and pattern recognition that we think any formal model of information processing by birds should be able to explain. The second part describes our exemplar model and shows how it represents each of the phenomena and processes mentioned above. The final section gives examples that show in detail how the model is applied in specific cases.

Chapter Outline

I.   Overview
      The Importance of Discrimination Training
      The Characteristics of Exemplar Memory

II. Representative Experiments
       The Situation
       Acquisition of a Discrimination
       Categorization and Generalization
       Effects of Increasing Stimulus Dimensionality
       Pattern Recognition

III. The Natural Intelligence Model
       Representation of Events in Memory
       The Decision Rule

IV. Applications of the Model
       Acquisition of a Discrimination
       Categorization and Generalization
       Effects of Increasing Stimulus Dimensionality
       Pattern Recognition

V. Conclusions
VI. References

I. Overview

In order to survive, organisms must respond appropriately in an enormous number of stimulus situations. They must learn which stimuli provide information relevant to their goals, and use that information to attain these goals. How is this accomplished? The purpose of this chapter is to show how a simple exemplar model of memory and decision making can account for the acquisition and use of information. We express the model as a computer program that we refer to as the Natural Intelligence Model (NIM). This model has been developed to account for the behavior of pigeons seeking food under controlled conditions. However, we believe that, with modification and expansion, an exemplar model such as NIM is applicable to the analysis of behavior in more complex situations, as well as behavior motivated by incentives other than food, and that it is applicable to other organisms as well as pigeons.¹ The model is a developing entity that we make available here as an executable program. The program may be used to compare discrimination of stimuli that vary in discriminability (d') along one or two dimensions including outline drawings.

The Importance of Discrimination Training
As an exemplar model of memory, NIM assumes that individual events or instances, rather than an abstraction of the common features of these instances, are represented in memory. Whether or not a particular exemplar is stored in memory depends on the consequences of the organism’s interaction with its environment; that is, information stored in exemplar memory is selected as a result of differential reinforcement. The term reinforcement is used here in a sense similar to that proposed by Guthrie (1959), and Estes (1950), that is, reinforcement is the co-occurrence of stimulation, behavior, and the consequences of the behavior. Following this definition, we define exemplars as representations of sensations induced by environmental stimuli, the subject’s behaviors in the presence of these stimuli, and the consequences of these behaviors. The term differential reinforcement refers specifically to the differences in the consequences of the behavior that occurred in the presence of specific stimuli.
Differential reinforcement is the outcome of the training procedure referred to as discrimination training. There are two types of discrimination training procedures. In the “go, no-go” procedure the response is reinforced in the presence of one stimulus (the positive stimulus), but not in the presence of another (the negative stimulus). This procedure provides a signal (stimulus) that a behavior (response) will or will not be reinforced. For example, a pigeon may be trained to peck at a disk (pecking key) only when it is illuminated. The second type of discrimination training involves choice between two or more alternatives. For example, more than one key may be illuminated and the pigeon must decide which key to peck. If the correct key is pecked, reinforcement follows; if the incorrect key is pecked both keys are darkened and neither choice will provide reinforcement until the keys are illuminated once again. In the experiments to be described here, we shall deal primarily with situations involving choice between two alternatives.
Numerous experiments (e.g. the often cited experiment of Jenkins & Harrison, 1960) have shown that discriminations established by differential reinforcement are more precise than those obtained with non-differential procedures in which a selected response is simply reinforced repeatedly in the presence of the chosen stimulus. Differential reinforcement affects the detection of the discriminative stimuli. It determines whether or not changes in these stimuli will be noticed and remembered. It figures importantly in all aspects of learning, from discriminations based on tonal frequency, to pattern recognition and to categorization.
The Characteristics of Exemplar Memory
The term “exemplar memory” is most often used to account for the results of experiments in which it is evident that specific details of the discriminative stimuli are remembered. This is most apparent in experiments in which the stimuli are complex, such as the pictures of natural scenes generally used in experiments requiring sorting according to categories such as trees, people, fish, or even paintings by artists whose work has a characteristic style (Watanabe, 1995). In such experiments, a set of pictures illustrative of the category, and a set of pictures in which exemplars of this category are absent or are exemplars of another category, are selected. These usually are quite varied so that the defining characteristics of each set cannot be clearly specified. Following training, in which differing responses to these two sets of pictures are reinforced, additional pictures, representative of these two sets, are presented. Almost invariably excellent generalization to these novel instances is found. It appears that pigeons are able to categorize the stimuli according to the experimenter’s definition of the category in question. This has led to the suggestion that the pigeon bases its choice on an “overarching principle,” “family resemblance” or a “polymorphous concept” (see Huber, 2001, and Urcuioli, 2001, for examples of such experiments and a review of the literature). Whether or not categorization is based on abstraction of a unifying concept is as yet unresolved (see Young & Wasserman, 2001). What is of interest here is the finding that, in many cases, pictures may be categorized on the basis of specific details that have no relation to the concept in question.
For example, Greene (1983), in an attempt to determine whether pigeons can recognize that the same scene was shown a second time, discovered that pigeons are able to identify slides that differ only in very minor details. For this experiment photographs of the same scene were taken by hand in rapid succession with the intention of minimizing the availability of cues other than the intended concept, “first presentations are positive, repetitions negative.” The same slides were used for first and second presentations of the scene. After the task was learned the second slide was presented first. This manipulation should not have disrupted performance if the “repetition concept” had been acquired. This was not the case. The disruption in performance indicated that the pigeons solved the task largely by relying on subtle differences in the appearance of the slides. Greene concluded that “pigeons are extremely good at sorting out stimuli based on very small perceptual features and on the association of these features with reinforcement.” (p. 216). By arbitrarily designating the pictures “positive” (reinforced) or “negative” (not reinforced) Greene and Vaughan (1983) also found that it is not necessary to provide a unifying concept for pigeons to categorize pictures on the basis of their relation to past reinforcement. In fact, Vaughan and Greene (1984) showed that pigeons were able to discriminate between 160 pairs of such slides and perform this discrimination when tested two years later. Further evidence that pigeons remember details of the pictures comes from experiments in which the background of the pictures controlled behavior more than the intended unifying concept (Greene, 1983; Jitsumori, 1996).
In the experiments described above, the go, no-go discrimination training procedure was used, with rate of response serving as the measure of learning. Pigeons can be trained to respond differentially to pictorial stimuli in a choice situation as well. Typically, the trial is initiated by the appearance of the discriminative stimulus. A peck at this stimulus illuminates the choice keys signaling the opportunity to make a choice. In an experiment similar to that of Vaughan and Greene (1984), Heinemann, Ionescu, Stevens and Neiderbach (unpublished) trained pigeons to peck the left key in the presence of 320 slides and the right key in the presence of another set of 320 slides. As in the Vaughan and Greene experiment, individual slides were arbitrarily assigned as “correct” for each of the keys. After substantial training, during which the number of slides was gradually increased, errors fell towards a proportion of .20. The training data are shown in Figure 1.
A model of memory must take into account findings, such as those of Vaughan and Greene (1984), which show that pigeons can remember a large number of pictorial stimuli arbitrarily assigned to two categories (the limits are yet to be established). How can these data be accounted for? A possible solution to this problem is to reduce the details of the stimulus to a set of defining features or analyzers (e.g. Jitsumori, 1993; also see the chapter by Huber (2001) for a description of feature models). In NIM we do not take this approach but instead treat the remembered stimuli as isomorphic with the original stimulation. Before presenting the model we shall describe some results obtained in representative experimental situations.
II. Representative Experiments
The Situation
In our earlier experiments we used stimuli that even well practiced human observers would have difficulty telling apart. To describe the discriminability of such stimuli we used d', the measure of sensitivity that is used in signal detection theory. If the sensations that are induced by a very large number of stimulus presentations fall into a normal distribution, then d' is the difference in standard deviation units between the means of the sensation distributions induced by two different stimuli. As a unit-free (dimensionless) quantity d' provides a means of describing differences in sensitivity between and within sense modalities.
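A minimal sketch of this definition, assuming the two sensation distributions are normal and share a common standard deviation. The function name `d_prime` and the numerical values are illustrative only and are not taken from the experiments described here.

```python
def d_prime(mean_a: float, mean_b: float, sd: float) -> float:
    """Distance between two sensation means in standard-deviation units.

    Assumes both sensation distributions are normal with the same
    standard deviation `sd` (an assumption of this sketch).
    """
    if sd <= 0:
        raise ValueError("sd must be positive")
    return abs(mean_b - mean_a) / sd


# Hypothetical values: means 2 units apart, common SD of 4 units.
print(d_prime(10.0, 12.0, 4.0))  # 0.5
```

Because the result is a ratio of two quantities in the same units, it is dimensionless, which is what allows comparisons between and within sense modalities.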
In order to understand how information is organized and retrieved from memory, we chose to work with a very simple situation. Our subjects were hungry pigeons that were foraging for food in the sparse, controlled environment of the operant chamber. We used artificial stimuli to minimize the effects of previous experience with the stimuli encountered in the experimental situation. These stimuli were lights or sounds that differed from each other only in intensity, or they were outline drawings in the form of dot-matrix patterns that were displayed on a computer monitor. To simplify the situation further, during discrimination training, pecks on one of two illuminated disks (keys) were followed by a period of access to grain (usually for 2 seconds). This procedure gives us a relatively direct measure of probability of response, namely, the relative frequencies of occurrence of the alternative choices.
The birds are initially trained to peck an illuminated key regardless of its position. Trials usually start with illumination of a centrally located area (a disk or a more extended surface), the display key. A peck on the display key produces one of the stimuli to be discriminated, and illuminates the two choice keys, one to the left and one to the right of the display key. A peck on the choice key designated “correct” provides access to food. Each error is followed by repetition of the trial until the pigeon pecks the correct key. Trials are separated by a short period (usually 10 seconds) during which all keys are dark and the behavior of the pigeon has no programmed consequences. Typical daily sessions consist of 80 trials. Figure 2 is an illustration of a pigeon in the type of apparatus used in these experiments.

Acquisition of a Discrimination
Figure 3 shows acquisition curves for birds that were trained to discriminate between two sounds that differed in intensity by 5, 3 or 1 dB. These curves are typical of those of other birds that were trained under these conditions. The size of the difference between the training stimuli (discriminability) affects acquisition in three ways: (1) The smaller the difference the longer the presolution period, the initial period of chance performance seen here as the flat region early in training when the curves for correct and incorrect choices intertwine. (2) The smaller the difference the slower the discrimination develops. (3) The smaller the difference the smaller the separation between curves for correct (solid lines) and incorrect (dotted lines) responses after extensive training.

Categorization and Generalization
In categorization experiments, more than one stimulus is associated with each response. The stimuli may be as simple as intensities of lights or sounds, or as rich as the colored photographs used in concept experiments. In a categorization experiment that used unidimensional stimuli, Heinemann and Avin (1973, Experiment 1) trained pigeons to categorize sound intensities as “soft” or “loud” by presenting, on each trial, one of 10 levels of white noise which ranged from 60 to 96 dB re 0.0002 dyne/cm². Pecks on one key, R1, were reinforced in the presence of five sound intensities that were less than or equal to 83 dB and pecks on the other key, R2, were reinforced in the presence of five intensities equal to or greater than 86 dB. If categorization were perfect, the proportion of R2 responses would be zero for stimuli at or below 83 dB and 1.0 for stimuli at or above 86 dB. Figure 4 shows choice curves (proportion of R2 responses as a function of three-day blocks of training) for four pigeons that were trained to make this discrimination. At first (days 1-3), the proportion of responses to each of the choice keys was virtually independent of sound intensity. The flat curves show that birds were not processing the sound intensities, that is, they were in the presolution period. As training continued, the difference between the lower and upper asymptotes of the choice curves increased, and the curves became steeper.

It has been suggested that “abstraction of a concept” is required to account for the excellent generalization to new exemplars of categories used in training. The need for that notion is less apparent if generalization involves categorization of simple stimuli (e.g. lights or sounds that differ only in intensity) than when the stimuli are photographs. In experiments with simple stimuli typically only two values are used in training. Responding to additional stimuli is examined in the absence of differential reinforcement. Figure 5 shows the results of one such experiment. Heinemann, Avin, Sullivan and Chase (1969) trained pigeons to discriminate between two levels of white noise. The stimulus differences were 29 dB (top row), 6 dB (middle row) and 2.3 dB (bottom row). After choice accuracy ceased to improve, the birds were presented with 11 additional sound intensities. The results of this generalization test are very similar to those obtained at the completion of training in the categorization experiment of Heinemann and Avin (1973). Although the 11 new intensities of sound were encountered for the first time during the generalization test, the birds responded to the test stimuli much as they did to the stimuli in the categorization experiment, that is, they tended to divide the continuum of sound intensities into two categories, “soft” and “loud.”

Effects of Increasing Stimulus Dimensionality
Chase & Heinemann (1972) and Heinemann and Chase (1970) also trained pigeons with compounds made up of lights and sounds differing in intensity. They found that fewer errors are made when values on both dimensions are correlated with the correct key choice. Following are some examples of data from these experiments.
In the Chase and Heinemann (1972) experiment, R1 was correct in the presence of a soft sound paired with a dim light; R2 was correct in the presence of a loud sound paired with a bright light. For all birds the sound intensities used in training differed by 35 dB. The light intensities differed by 0.6 log units for three birds and by 1.4 log units for three other birds. After training, the pigeons were tested for generalization to eight sound intensities, each of which was combined with eight light intensities, a total of 64 test stimuli. Figure 6 shows the proportion of R2 responses that were made during the generalization tests. With the smaller light intensity difference (top row) the proportion of R2 responses increases with both light and sound intensity, evidence of control of response choice by both dimensions. The sound gradients are virtually flat following training with the larger light intensity difference (bottom row), although the sound intensity difference used in training was the same for all birds. This “overshadowing” of the less discriminable by the more discriminable stimulus in a compound was first described by Pavlov (1927) in classical conditioning and has often been attributed to limited attentional resources (e.g. Sutherland & Mackintosh, 1971). In Chase and Heinemann (1972), we show that it was the optimal use of both dimensions that resulted in flatter gradients for sound in the presence of a larger light intensity difference. This was not a failure of attention.
In the Heinemann and Chase (1970) experiment, the training stimuli were four compounds: a soft sound paired with a dim or bright light, and a loud sound paired with a dim or bright light. For the three pigeons whose generalization test results are shown in the top row of Figure 7, R1 was correct only in the presence of the soft sound accompanied by the dim light. For the three pigeons whose generalization test results are shown in the bottom row, R2 was correct only in the presence of the loud sound paired with the bright light. The shape of the generalization surface shows that choices were based on the combination of sensations associated with both dimensions.

Pattern Recognition
We have done a number of experiments in which pigeons were required to categorize outline figures that were shown as dot-matrix patterns on a computer monitor. In one such experiment, Donis and Heinemann (1993) trained pigeons to discriminate between two lines tilted 45 degrees from the vertical that were either shown alone or in a context provided by the addition of an identical element, namely an L-shaped form. These stimuli are shown in Figure 8. Stimuli 1 and 3 were correct for the left key; stimuli 2 and 4 were correct for the right key.

Twelve birds served in these experiments (for eight birds the stimuli were white on a black background and for four the stimuli were black on a white background). For 11 of the 12 birds, the proportion correct was consistently higher for the inclined lines alone (stimuli 1 and 2) than for these lines in context (stimuli 3 and 4). The data for the birds trained with the figures on the black background are shown in Figure 9.
These results are in sharp contrast to those found for human observers in both accuracy (Enns & Prinzmetal, 1984) and reaction time (Pomerantz, Sager & Stoever, 1977). For example, the mean reaction time of observers required to detect as quickly as possible which of the four quadrants in an array (4 items in each quadrant) was different from the others was 641 ms for the lines in context and 1,480 ms for the oblique lines alone (Pomerantz et al., Experiment 4). One possible explanation is that the lines in context could be named (Stimulus 3 an “arrow” and Stimulus 4 a “triangle”). Enns and Prinzmetal suggest that the context may provide a redundancy gain due to the creation of a dimension such as “triangle-arrow” that is correlated with the angular orientation of the single lines. Pigeons had to depend on vision alone. In the discussion of NIM we shall show why we think the addition of the uninformative context in the form of an L-shape increased the number of errors made by the pigeons.
III. The Natural Intelligence Model (NIM)

Representation of Events in Memory
NIM treats remembered events as records. Each remembered event is represented in memory by three components: the discriminative stimulus, the behavior, and the outcome of the behavior. In developing the model we have concentrated on the treatment of the discriminative stimuli with minimal attention to the stimuli produced by the responses and the results of these responses. Therefore, in the description of NIM that follows, we identify choice behaviors, e.g., pecking on a choice key located to the left or right of the display key, by the labels R1 and R2. Similarly, we refer to the outcomes of behavior simply as “positive” or “negative.”

Figure 10. Sensations induced by stimuli that differ in value on a single dimension are shown here as two overlapping distributions of sensations.

Following conventions adopted from signal detection theory, sensations are treated as varying along a normal deviate (z-score) axis. Differences in the sensations associated with the discriminative stimuli are represented by the distances between the means, d', of the distributions of sensation associated with each of the discriminative stimuli. The relation between our present analysis and signal detection theory is discussed in Chase and Heinemann (1991). In the simplest case, for example, for two stimuli that vary along a single dimension, such as light intensity, the sensations associated with these stimuli may be illustrated as shown in Figure 10.
Sensations arising from stimuli that differ in value on two dimensions may be visualized either in three dimensional space or as contours showing equal probability densities. In Figure 11 below, the bivariate distributions representing the two light-sound compounds used as training stimuli in the Chase and Heinemann (1972) experiment are shown in three dimensions. In Figure 12 below the bivariate distributions representing the four light-sound compounds used as training stimuli in the Heinemann and Chase (1970) experiment are shown by contours of equal probability density.

Figure 11. Sensations produced by compound stimuli, such as those used in the Chase and Heinemann (1972) experiment, are represented here by two joint probability (bivariate) distributions. The marginal distributions for the two light-sound compounds used in training are also shown. X3 and X6 refer to the two light intensities and Z1 and Z8 to the two sound intensities. The line that passes through this surface shows the optimal decision boundary for the birds whose data are shown in Figure 6 (top row).

Figure 12. Sensations produced by compound stimuli, such as those used in the Heinemann and Chase (1970) experiment are represented here by four joint probability distributions. The four concentric circles are isodensity contours for the four joint probability density distributions of the training stimuli. The curve that passes through this surface shows the optimal decision boundary for the birds whose data are shown in Figure 7 (top row).

The sensations associated with the points on dot-matrix figures, such as those illustrated in Figure 8, are represented in the same way as compound stimuli. Each point is treated as a bivariate distribution with its location defined by its x,y coordinates in Euclidean space. For dot-matrix figures only the centers of the bivariate distributions are shown (see Figure 13). Isodensity contours of the kind shown in Figure 12 are omitted to avoid clutter.

We assume that, when a motivationally significant event occurs, the sensory information that accompanies this event is stored as a record in exemplar memory (XM). As illustrated in Figure 14 (below), each record shows the response that was made, the sensation induced by the stimulus in the presence of which this response was made and the outcome of the event. The + in this illustration indicates that food was obtained (the motivationally significant event for a hungry pigeon). On each trial a stimulus is presented. We refer to the sensations produced by this stimulus as the current input. Given a specific current input, the subject decides which key choice is most likely to be followed by reinforcement (here access to grain). According to NIM, the decision is made as follows: On each trial a few records are retrieved from randomly determined locations in XM and placed in working memory. The number of records in working memory appears to differ somewhat among individuals (see Heinemann, 1983a) and, more significantly, among species (see Chase, 1989). Chase’s simulations have shown the number to be relatively small (between 3 and 18).²

Figure 14. Diagram representing storage of information in exemplar memory

Figure 15. Five records of remembered events in working memory. Two are records of the sensation when R1 was reinforced (green) and three are records of the sensation when R2 was reinforced (blue).

We assume that the sensations represented by records in working memory differ somewhat from the sensations originally experienced (we describe such records as “noisy”). The distortions in the remembered sensations may result from trial-to-trial differences in the experienced sensation, as well as events that occur during storage and/or retrieval from XM. It is assumed that these changes vary normally, with small deviations common and large ones rare. Figure 15 shows five remembered sensations that represent stimuli that vary along a single dimension. The two records, shown in green, represent remembered sensations when R1 was reinforced; the three records, shown in blue, represent the remembered sensations when R2 was reinforced.

The Decision Rule
The choice of response (to make R1 or R2) is based on a comparison of the sensation experienced, the current input (shown in Figure 15 by an arrow), and the information provided by the records in working memory. Given only this information, the response made is the one that is most likely to be correct. The process of response selection amounts to obtaining the sum of the heights (probability densities) of the R1 curves above the point that represents the current input, doing the same for the R2 curves, and making the response for which the sum is greater. We refer to the sum of the densities as the decision quantity (DQ) for that response. In Figure 15 the probability density at the current input for the two R1 records is .30 and .11 (sum = .41). For the three R2 records the probability densities are .20, .14 and .08 (sum = .42). R2 is made.
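The decision rule can be sketched in a few lines. This is an illustration under stated assumptions, not the NIM program itself: each record is taken to contribute a normal density curve with unit standard deviation on the z-score axis, and the record positions in the usage example are hypothetical. The last lines simply redo the arithmetic for the densities reported for Figure 15.

```python
import math


def normal_pdf(x, mean, sd=1.0):
    """Height of a normal density curve centered on `mean`, evaluated at `x`."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def decision_quantities(current_input, records, sd=1.0):
    """Sum, per response, the densities of its records at the current input.

    `records` is a list of (response_label, remembered_sensation) pairs.
    The unit `sd` of each curve is an assumption of this sketch.
    """
    dq = {}
    for response, remembered in records:
        dq[response] = dq.get(response, 0.0) + normal_pdf(current_input, remembered, sd)
    return dq


def choose(current_input, records, sd=1.0):
    """Return the response with the larger decision quantity (DQ)."""
    dq = decision_quantities(current_input, records, sd)
    return max(dq, key=dq.get)


# Hypothetical working memory: two R1 records and three R2 records.
working_memory = [("R1", -1.0), ("R1", -0.5), ("R2", 0.5), ("R2", 1.0), ("R2", 1.5)]
print(choose(0.8, working_memory))   # R2
print(choose(-1.2, working_memory))  # R1

# Densities reported in the text for Figure 15.
dq_r1 = sum([0.30, 0.11])
dq_r2 = sum([0.20, 0.14, 0.08])
print("R2" if dq_r2 > dq_r1 else "R1")  # R2
```

If only one response is represented in working memory, the other response has no entry and cannot be chosen; how the pigeon behaves in that case is not modeled in this sketch.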

Choice proportions are obtained over many trials. The records in working memory differ from trial-to-trial. The remembered sensations cluster around the sensations associated with the training stimuli. Therefore, small stimulus differences used in training are represented in working memory by closely spaced records with different response labels. In this case errors will occur frequently (for the same current input the DQ for the correct response will be higher on some trials and lower on others). Large stimulus differences are represented in memory by separate clusters of records representing each response; the separation between the clusters is dependent upon the stimulus difference. When the stimulus difference is large the DQ for the correct response tends to be consistently higher than for the incorrect response. In such situations errors are rare.
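The relation between stimulus difference and error frequency can be explored with a small Monte Carlo sketch. Several details here are assumptions made for illustration and are not taken from the NIM program: remembered sensations are drawn from unit-variance normal distributions centered on the two training stimuli, each record contributes a unit-variance density curve, records for R1 and R2 are equally likely, and working memory holds six records (a value within the range of 3 to 18 mentioned earlier).

```python
import math
import random


def normal_pdf(x, mean, sd=1.0):
    """Height of a normal density curve centered on `mean`, evaluated at `x`."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def error_rate(d_prime, n_records=6, n_trials=4000, seed=1):
    """Proportion of simulated trials on which the wrong response is chosen."""
    rng = random.Random(seed)
    means = {"R1": 0.0, "R2": d_prime}  # sensation means on the z-score axis
    errors = 0
    for _ in range(n_trials):
        # Draw a fresh working memory of noisy records for this trial.
        records = []
        for _ in range(n_records):
            response = rng.choice(["R1", "R2"])
            records.append((response, rng.gauss(means[response], 1.0)))
        # Present one of the two training stimuli.
        correct = rng.choice(["R1", "R2"])
        current_input = rng.gauss(means[correct], 1.0)
        # Decision quantity: summed density of each response's records.
        dq = {"R1": 0.0, "R2": 0.0}
        for response, remembered in records:
            dq[response] += normal_pdf(current_input, remembered)
        choice = "R2" if dq["R2"] > dq["R1"] else "R1"
        errors += choice != correct
    return errors / n_trials


for d in (0.5, 1.0, 2.0, 4.0):
    print(d, error_rate(d))
```

Under these assumptions the error proportion falls as d' grows, in line with the qualitative account above; the exact values depend on the assumed noise and on the number of records.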
For stimuli that vary along two dimensions, e.g. stimuli that vary in light intensity and in sound intensity, or outline drawings expressed as dot-matrix patterns, the DQ is computed as follows: For each point on the current input, calculate the arithmetic mean of the probability densities contributed by the points on the memory record. This will yield as many means as there are points on the current input. The DQ is the geometric mean of these mean densities. In our simulations these events occur sequentially. However, implementation of the model in real time (e.g. in the pigeon) probably occurs in parallel.
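The two-dimensional computation for a single record can be sketched as follows. The sketch assumes that each record point contributes a circular bivariate normal density with unit standard deviation and that coordinates are expressed in z-score units; the function names and the two small line patterns are hypothetical. How DQs from several records are combined is not shown here.

```python
import math


def bivariate_density(p, q, sd=1.0):
    """Density at point `p` of a circular bivariate normal centered on `q`."""
    dx = (p[0] - q[0]) / sd
    dy = (p[1] - q[1]) / sd
    return math.exp(-0.5 * (dx * dx + dy * dy)) / (2.0 * math.pi * sd * sd)


def record_dq(current_input, record, sd=1.0):
    """Decision quantity for one record.

    For each input point, take the arithmetic mean of the densities
    contributed by all record points; the DQ is the geometric mean of
    those per-point means.
    """
    means = []
    for p in current_input:
        densities = [bivariate_density(p, q, sd) for q in record]
        means.append(sum(densities) / len(densities))
    if any(m == 0.0 for m in means):
        return 0.0  # a geometric mean with a zero factor is zero
    # Geometric mean computed through logarithms for numerical stability.
    return math.exp(sum(math.log(m) for m in means) / len(means))


# Hypothetical three-point patterns: lines tilted in opposite directions.
line_left = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
line_right = [(0.0, 2.0), (1.0, 1.0), (2.0, 0.0)]

print(record_dq(line_left, line_left) > record_dq(line_left, line_right))  # True
```

Because the geometric mean is a product, a single input point that lies far from every record point pulls the DQ toward zero even when the remaining points match well.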

Figure 16. A frame from NIM showing a point (P) on a current input (purple) being compared to a point (PR) on the record (white circle).

The decision process may be observed by using the NIM simulation program included in this chapter. For illustrative purposes, stimuli similar to those used by Donis and Heinemann (1993) are embedded in the program. Figure 16 is a single frame from this program. In this frame, the current input corresponding to the stimulus to be identified, an “arrow,” is represented by the pattern of yellow squares. The remembered sensation (record) to which it is being compared, a “line tilted to the left,” is superimposed upon the current input. Blue circles represent the points on the record. Note that the positions of the circles on the record deviate from that of a straight line. This reflects the assumption that sensations are not remembered exactly (see Figure 15). Only the mean of the bivariate distribution corresponding to each record point is shown. In this frame a point on the current input (shown in purple) is being compared to a point on the record (shown in white). The shortest distance between these points in z-scores is displayed as well as the probability density at the current input contributed by the record point.
If the sample of records is uninformative, that is, none of the records is “reasonably similar” to the stimulus to be identified, a new sample is drawn. If repeated sampling fails to yield useful information, the pigeon bases its choice on reinforcement probability alone. In the current form of the program, an uninformative sample is one in which none of the points on any record in the sample is within 5 z-scores of any point on the current input. A sample is also uninformative if there is a point on the current input that is more than 5 z-scores from any point on all records in the sample.³
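The two criteria can be written as a short check. This is a sketch of one reading of the rule, not the code of the NIM program: distances are assumed to be Euclidean distances in z-score units, and the second criterion is read as "some input point lies more than 5 z-scores from every point on every record in the sample." The function and variable names are hypothetical.

```python
import math

THRESHOLD_Z = 5.0  # criterion distance stated in the text


def distance(p, q):
    """Euclidean distance between two points given in z-score units."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def is_uninformative(current_input, sample, threshold=THRESHOLD_Z):
    """Return True if the sample of records fails either criterion.

    `current_input` is a list of (x, y) points; `sample` is a list of
    records, each itself a list of (x, y) points.
    """
    record_points = [q for record in sample for q in record]
    # Criterion 1: no record point is within the threshold of any input point.
    none_close = not any(
        distance(p, q) <= threshold for p in current_input for q in record_points
    )
    # Criterion 2: some input point is beyond the threshold from every record point.
    stranded_point = any(
        all(distance(p, q) > threshold for q in record_points) for p in current_input
    )
    return none_close or stranded_point


sample = [[(0.0, 1.0)]]
print(is_uninformative([(0.0, 0.0), (1.0, 1.0)], sample))    # False
print(is_uninformative([(0.0, 0.0), (10.0, 10.0)], sample))  # True
```

Under this reading the first criterion is a special case of the second, so the second alone decides the outcome; both are kept in the sketch to mirror the wording of the rule.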
When a motivationally significant event occurs, a record of that event is placed in a randomly determined location in XM. This record replaces the record that previously occupied that location.
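The storage rule can be sketched as follows. The list representation of XM, the function names, and the four-slot memory are hypothetical and chosen only for illustration. One consequence of random overwriting, shown by the second function, is that a record survives n later writes with probability (1 - 1/N)^n, where N is the number of locations.

```python
import random

def store_record(xm, record, rng):
    """Overwrite a randomly chosen location in exemplar memory (XM) with the new record."""
    location = rng.randrange(len(xm))
    xm[location] = record
    return location

def survival_probability(capacity, later_writes):
    """Probability that a given record has not been overwritten after `later_writes` stores."""
    return (1.0 - 1.0 / capacity) ** later_writes

rng = random.Random(1)
xm = ["old-a", "old-b", "old-c", "old-d"]  # hypothetical four-slot memory
location = store_record(xm, "new", rng)
print(location, xm)
print(survival_probability(4, 4))
```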
This is the basic form of NIM. Without further assumptions the model provides a quantitative account of the choice behavior of pigeons in a wide range of situations. In comparing simulations produced by the model to the data, only two parameters are allowed to vary: the number of records in working memory and the discriminability, that is, the distance between the means of the sensation distributions. The number of records in working memory seems to depend upon the processing capacity of the organism. The distance between the means of the sensation distributions is related to the organism’s sensitivity to stimulus differences.
IV. Applications of the Model
Acquisition of a Discrimination
Early in training, especially if the discrimination is difficult, there is a period, the presolution period, during which there is no evidence that the selection of the response is controlled by the potential discriminative stimuli (see Figures 3, 4 and 17). It appears that during this period the pigeon discovers which of the many stimuli present when a given response was reinforced is likely to be a good predictor of reinforcement in the future. Though there is no evidence of learning during this period, the presolution period is an important stage in acquisition. Information processed during the presolution period makes possible selective storage of information in exemplar memory.
We assume that the response that is selected is based on the most informative records in working memory. Until the end of the presolution period, the only information available when a decision is made is the proportion of trials on which each choice was reinforced. After the presolution period, the relatively uninformative records in XM are gradually replaced with records that contain information about the discriminative stimuli. As the number of informative records in working memory increases, fewer errors are made. This is the result of better representation of the frequency with which each response was reinforced and reduction in the variability associated with remembered sensations (an effect equivalent to the decrease in the standard error of the mean as the sample size is increased).
The rate of learning and asymptotic accuracy depend primarily upon discriminability. Errors are inevitable if identical sensations are produced by stimuli that require different responses. The proportion of trials on which this occurs varies inversely with the size of the stimulus difference (see Figure 3) as well as the individual’s sensitivity to such stimulus differences.
Rate of learning is also affected by the capacity of exemplar memory. Simulations of choice behavior in situations as varied as probability learning, reversal of a discrimination, abolition of a discrimination as well as variations in stimulus discriminability showed that the capacity of exemplar memory of pigeons seeking food in the simplified environment of the operant chamber is about 1200 records.4 The number of records in working memory varied somewhat but for most simulations the data were well fit with a sample size of 10. The biggest difference among birds was in their sensitivity to stimulus differences, in this case the difference in the intensity of white noise (see Figure 17).
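The course of acquisition described in this section can be illustrated with a greatly simplified, one-dimensional sketch. It is not the NIM program, and it makes several assumptions that should be read as such: the presolution period is represented only by an exemplar memory that starts out filled with records carrying a reinforced response but no stimulus information; every trial stores the current sensation with the reinforced response; remembered sensations are compared using unit-variance normal densities; and the separation of 3.0 between the sensation means is hypothetical. The capacity of 1200 records and the sample size of 10 are the values reported above.

```python
import math
import random

def normal_density(x, mean, sd=1.0):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def choose(sensation, sample, rng):
    """Pick the response with the larger summed density; otherwise guess."""
    density = {1: 0.0, 2: 0.0}
    reinforced = {1: 0, 2: 0}
    for remembered, response in sample:
        reinforced[response] += 1
        if remembered is not None:  # informative record
            density[response] += normal_density(sensation, remembered)
    if density[1] != density[2]:
        return 1 if density[1] > density[2] else 2
    # No usable stimulus information: fall back on reinforcement proportions.
    return 1 if rng.random() < reinforced[1] / len(sample) else 2

def simulate(trials, capacity=1200, sample_size=10, separation=3.0, seed=0):
    rng = random.Random(seed)
    # Assumed starting state: records with a response label but no sensation.
    xm = [(None, rng.choice((1, 2))) for _ in range(capacity)]
    outcomes = []
    for _ in range(trials):
        correct = rng.choice((1, 2))
        mean = 0.0 if correct == 1 else separation
        sensation = rng.gauss(mean, 1.0)
        response = choose(sensation, rng.sample(xm, sample_size), rng)
        outcomes.append(response == correct)
        # Store the sensation with the reinforced response at a random location.
        xm[rng.randrange(capacity)] = (sensation, correct)
    return outcomes

outcomes = simulate(6000)
early_accuracy = sum(outcomes[:200]) / 200
late_accuracy = sum(outcomes[-1000:]) / 1000
print(early_accuracy, late_accuracy)
```

In this sketch accuracy starts near chance and rises as informative records displace uninformative ones in XM, which is the qualitative pattern described in the text. Reducing `separation` lowers the asymptote, and changing `capacity` changes how quickly the informative records accumulate.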

Figure 18. Five records of remembered sensations (the green curves are for R1 and the blue curves for R2). Hypothetical test stimuli are shown as triangles (black for new stimuli, green and blue for stimuli used in training) at different positions along the sensation axis.

Categorization and Generalization
    According to NIM the same processes are operative in categorization and generalization as during discrimination training. In discrimination training only two stimuli are presented. In categorization and tests for generalization more than two stimuli are presented (see Figure 18). Categorization tasks differ from tests for generalization only in that in the former case the remembered sensations are more varied. In categorization experiments that use unidimensional stimuli, choice curves for R2 will rise from proportions of zero to 1.0. No errors are made in categorizing stimuli near the extremes of the continuum if the sensations induced by these stimuli do not have different response labels. Choice curves obtained in generalization tests that follow training with two similar stimuli (such as those for the birds shown in the bottom row of Figure 5) tend to have asymptotes other than at zero and 1.0. According to NIM, if the test stimuli are far from the training stimuli, the sample of records in working memory may be uninformative on many trials, that is, all probability densities at the current input for the test stimulus are close to zero. Under these conditions the pigeon bases its choice on response probability alone, that is, it “guesses.”
Effects of Increasing Stimulus Dimensionality

Our work with compounds has shown that, at least for light and sound intensities, pigeons use both dimensions in response selection (see Figures 6 and 7). In Figure 19 two compounds such as those used in the Chase and Heinemann (1972) experiment are shown. The d' differences between the stimuli on the two dimensions can be thought of as the legs of a right triangle. The distance between the means of the compounds is the hypotenuse of this triangle. When d' on each dimension is equal, the discriminability of the compound exceeds that of either element by a factor of the square root of 2. Increasing the dimensionality of the stimuli thus increases d' between stimuli that require different responses. This results in fewer errors.
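The right-triangle relation can be checked with a short calculation. The function name and the example values are hypothetical; the only assumption is the one already implied by the triangle, namely that the two dimensions act as orthogonal axes.

```python
import math

def compound_dprime(d_light, d_sound):
    """Distance between the compound means: the hypotenuse of the d' triangle."""
    return math.hypot(d_light, d_sound)

equal_case = compound_dprime(1.0, 1.0)    # equal d' on both dimensions
unequal_case = compound_dprime(3.0, 4.0)  # unequal d' on the two dimensions
print(equal_case / 1.0)   # gain over either element when the d' values are equal
print(unequal_case)
```

When one d' is much smaller than the other, the compound d' is only slightly larger than the larger element, so the gain from adding a weak dimension is small.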

We have shown in our signal detection analysis of discrimination learning (Chase & Heinemann, 1972) that this increase in d' is reflected in the decision strategy of pigeons trained with stimuli varying along two dimensions. The birds use both dimensions when deciding which key to peck. The weight given to each dimension depends upon the relative discriminability, d', of the stimuli on the two dimensions. Equal weight is given to both dimensions when d' for the elements is equal; this was approximately true for the birds whose data are shown in the top row of Figure 6. The relative difference in discriminability determines the relative weight given to each dimension: the flatter gradients for sound intensity shown in the bottom row of Figure 6 reflect the greater weight given to the light intensity difference, not inattention to sound. According to signal detection theory, decisions are based upon a criterion, a decision line or boundary, separating the continuum of sensations into a region in which R1 is more probable from that in which R2 is more probable. The response made is the one for which the probability density is greatest, that is, the response that is most likely to be correct. This is not unlike the decision rule used by NIM (see Figure 15). Statistical decision theory provides a mathematical description of the data; an exemplar model such as NIM shows how these decisions may come about.5
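The signal detection rule can be made concrete for the simplest case. The sketch below assumes unit-variance, uncorrelated sensation distributions, with the mean for the R1 stimulus at the origin and the mean for the R2 stimulus at (d_light, d_sound); these choices and the function names are assumptions made for illustration, not a description of the analysis in Chase and Heinemann (1972). Under them, the response with the greater probability density is given by the sign of a weighted sum in which each dimension's weight is its d'.

```python
def log_likelihood_ratio(x_light, x_sound, d_light, d_sound):
    """Log of (density under the R2 stimulus / density under the R1 stimulus)
    for unit-variance, uncorrelated sensations (an assumption of this sketch)."""
    return d_light * x_light + d_sound * x_sound - 0.5 * (d_light ** 2 + d_sound ** 2)

def respond(x_light, x_sound, d_light, d_sound):
    """Choose the response whose stimulus gives the greater probability density."""
    return 2 if log_likelihood_ratio(x_light, x_sound, d_light, d_sound) > 0.0 else 1

# Equal d' on both dimensions: both sensations count equally.
print(respond(0.0, 0.0, 1.0, 1.0), respond(1.0, 1.0, 1.0, 1.0))

# Light far more discriminable than sound: large changes in the sound
# sensation barely move the decision, giving a flat gradient for sound.
print(respond(0.0, 5.0, 2.0, 0.1), respond(2.0, -5.0, 2.0, 0.1))
```

The decision boundary is the line on which the log likelihood ratio equals zero. Its orientation depends only on the ratio of the two d' values, which is one way of expressing the statement that relative discriminability determines relative weight.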
Pattern Recognition
Our treatment of dot-matrix patterns follows from our treatment of compounds; that is, discriminability is based on distances in Euclidean space. The decision rule we use to compare two dot-matrix patterns may be followed step by step by using the demonstration version of NIM that may be downloaded (see the download instructions following the Conclusions). (A more general version of this program is used to simulate performance under other conditions, and thus to compare NIM’s performance with available data and to test theoretical predictions.) This demonstration program uses patterns similar to those used in the experiment of Donis and Heinemann (1993; see Figures 13 and 16). It may be used to simulate performance under three conditions: when the stimuli are remembered exactly (noiseless memories), when a single record of each of the stimuli is used in the comparison process, and when records are retrieved randomly from exemplar memory. The effects of varying the number of records in the sample retrieved from exemplar memory can also be examined, as can the effects of varying the discriminability of the stimuli about which decisions are made.
Figure 20 shows the effect of varying discriminability and the number of records in the sample on acquisition curves for the patterns used by Donis and Heinemann (1993). These simulated curves may be compared to the acquisition curves of the real birds whose data are shown in Figure 9. The simulation shows that the difference between the accuracy for the lines alone and the lines in context is robust. In spite of large differences in discriminability (SP) and in the number of records in the sample (5 or 10), the proportion of correct responses in the presence of the lines alone is consistently greater than that for the lines in context. This results primarily from the decreased differences between the DQs in the presence of identical parts of the patterns (the added L-shape). According to NIM, the more similar the DQs for stimuli that require different responses, the greater the probability of an incorrect response choice.

NIM has successfully simulated the results of other pattern recognition experiments with pigeons as subjects. These include simulations of the confusion matrices for the letters of the alphabet and random dot patterns obtained by Blough (1985), simulations of the effects of distance of irrelevant contexts on accuracy obtained by Donis, Heinemann, and Chase (1994), and simulations of the effects on accuracy of distortion of patterns composed of elementary forms obtained by Van Hamme, Wasserman, and Biederman (1992). Although these results are encouraging, other interpretations of the data are possible (see, for example, Kirkpatrick’s (2001) chapter in this volume for a discussion of the Van Hamme et al. experiment). NIM was originally developed to account for discrimination and generalization of diffuse stimuli. Our success in applying these same principles to pattern recognition is encouraging. However, whether NIM provides the most parsimonious and general model of pigeon visual cognition awaits much further testing and refinement.
V. Conclusions
We have shown that a model such as NIM provides a quantitative account of how exemplar learning may work. By expressing it in the form of a computer program, we have been able to examine data in detail. In some cases this has led to unexpected findings that the model was not originally designed to deal with, e.g., Donis and Heinemann’s (1993) finding that a redundant context makes it harder for pigeons to identify the orientation of oblique lines. Apart from applying the model to gain a deeper understanding of specific problems, we have found that having NIM in quantitative form is extremely useful for refining the model itself and for specifying where it needs modification.
So far we have focused our analyses on the sensory effects produced by the discriminative stimuli. While an exemplar model such as ours provides a good account of choice behavior, we have not yet dealt with some of the most interesting demonstrations of the cognitive abilities of birds that are described in this cyberbook. Among these are, for example, emergent relationships resulting from common responses or outcomes such as those described by Urcuioli (2001). It would be interesting to see whether some of the findings Urcuioli describes could be accounted for by an exemplar model if the responses and outcomes of the behavior were treated in more detail than we do here. Can an exemplar model deal with categorization of stimuli as “same” or “different” (Cook, Katz, & Cavoto, 1997; Young & Wasserman, 2001) if the relationship between the stimuli cannot be described in terms of perceptual similarity? These are just a few of the challenging questions raised in this book.

How to Download PC Demonstration Program of NIM

Just click here to download the program to your local machine. It will be saved as "nimdemo.exe" on your PC (sorry, Mac users!). The program can then be run off-line to do your simulations.

VI. References
    Ashby, F. G., & Waldron, E. M. (1999). On the nature of implicit categorization. Psychonomic Bulletin & Review, 6, 363-378.
    Blough, D. S. (1985). Discrimination of letters and random dot patterns by pigeons and humans. Journal of Experimental Psychology: Animal Behavior Processes, 11, 261-280.
    Blough, D. S. (1996). Error factors in pigeon discrimination and delayed matching. Journal of Experimental Psychology: Animal Behavior Processes, 22, 118-131.
    Chase, S. (1983). Pigeons and the magical number seven. In M. L. Commons, R. J. Herrnstein, & A. R. Wagner (Eds.), Quantitative analyses of behavior, Vol. 4: Discrimination processes (pp. 37-57). Cambridge, MA: Ballinger.
    Chase, S., & Heinemann, E. G. (1972). Choices based on redundant information: An analysis of two-dimensional stimulus control. Journal of Experimental Psychology, 92, 161-175.
    Chase, S., & Heinemann, E. G. (1989). Effects of stimulus complexity on identification and categorization. The International Journal of Comparative Psychology, 3, 165-181.
    Chase, S., & Heinemann, E. G. (1991). Memory limitations in human and animal signal detection. In M. L. Commons, J. A. Nevin, & M. C. Davidson (Eds.), Signal detection: Mechanisms, models, and applications (pp. 121-138). Cambridge: Ballinger.
    Cook, R. G., Katz, J. S., & Cavoto, B. R. (1997). Pigeon same-different concept learning with multiple stimulus classes. Journal of Experimental Psychology: Animal Behavior Processes, 23, 417-433.
    Donis, F., & Heinemann, E. G. (1993). The object-line inferiority effect in pigeons. Perception and Psychophysics, 53, 117-122.
    Donis, R. J., Heinemann, E. G., & Chase, S. (1994). Context effects in visual pattern recognition by pigeons. Perception and Psychophysics, 55, 676-688.
    Ebbinghaus, H. (1885). Über das Gedächtnis. Leipzig: Duncker & Humblot.
    Enns, J. T., & Prinzmetal, W. (1984). The role of redundancy in the object-line effect. Perception and Psychophysics, 12, 278-286.
    Estes, W. K. (1950). Toward a statistical theory of learning. Psychological Review, 57, 94-107.
    Guthrie, E. R. (1946). Psychological facts and psychological theory. Psychological Bulletin, 43, 1-20.
    Heinemann, E. G. (1983a). A memory model for decision processes in pigeons. In M. L. Commons, R. J. Herrnstein, & A. R. Wagner (Eds.), Quantitative analyses of behavior, Vol. 4: Discrimination processes (pp. 3-21). Cambridge, MA: Ballinger.
    Heinemann, E. G. (1983b). The presolution period and detection of statistical associations. In M. L. Commons, R. J. Herrnstein, & A. R. Wagner (Eds.), Quantitative analyses of behavior, Vol. 4: Discrimination processes (pp. 21-37). Cambridge, MA: Ballinger.
    Heinemann, E. G., & Avin, E. (1973). On the development of stimulus control. Journal of the Experimental Analysis of Behavior, 20, 183-195.
    Heinemann, E. G., Avin, E., Sullivan, M. A., & Chase, S. (1969). Analysis of stimulus generalization with a psychophysical method. Journal of Experimental Psychology, 80, 215-224.
    Heinemann, E. G., & Chase, S. (1970). Conditional stimulus control. Journal of Experimental Psychology, 84, 187-197.
    Heinemann, E. G., & Chase, S. (1990). A quantitative model for pattern recognition by animals and people. In M. L. Commons, R. J. Herrnstein, S. M. Kosslyn, & D. B. Mumford (Eds.), Quantitative analyses of behavior, Vol. 9: Computational and clinical approaches to pattern recognition and concept formation (pp. 109-126). Hillsdale, NJ: Erlbaum.
    Hintzman, D. L. (1988). Judgments of frequency and recognition memory in a multiple-trace memory model. Psychological Review, 95, 528-551.
    Huber, L. (2001). Visual categorization in pigeons. In R. G. Cook (Ed.), Avian visual cognition [On-line]. Available: pigeon.psy.tufts.edu/avc/huber/
    Jitsumori, M. (1993). Category discrimination of artificial polymorphous stimuli based on feature learning. Journal of Experimental Psychology: Animal Behavior Processes, 19, 224-254.
    Jitsumori, M., & Ohkubo, O. (1996). Orientation discrimination and categorization of photographs of natural objects by pigeons. Behavioural Processes, 38, 205-226.
    Kirkpatrick, K. (2001). Object perception. In R. G. Cook (Ed.), Avian visual cognition [On-line]. Available: pigeon.psy.tufts.edu/avc/kirkpatrick/
    Nosofsky, R. M. (1986). Attention, similarity, and the identification-categorization relationship. Journal of Experimental Psychology: General, 115, 39-57.
    Nosofsky, R. M. (1991). Tests of an exemplar model for relating perceptual classification and recognition memory. Journal of Experimental Psychology: Human Perception and Performance, 17, 3-27.
    Pavlov, I. (1927). Conditioned reflexes (G. V. Anrep, Trans.). Oxford University Press.
    Pomerantz, J. R., Sager, L. C., & Stoever, R. J. (1977). Perception of wholes and their component parts: Some configural superiority effects. Journal of Experimental Psychology: Human Perception and Performance, 3, 422-435.
    Sutherland, N. S., & Mackintosh, N. J. (1971). Mechanisms of animal discrimination learning. New York, NY: Academic Press.
    Urcuioli, P. (2001). Categorization & acquired equivalence. In R. G. Cook (Ed.), Avian visual cognition [On-line]. Available: pigeon.psy.tufts.edu/avc/urcuioli/
    Van Hamme, L. J., Wasserman, E. A., & Biederman, I. (1992). Discrimination of contour-deleted images by pigeons. Journal of Experimental Psychology: Animal Behavior Processes, 18, 387-399.
    Vaughan, W., Jr., & Greene, S. L. (1983). Acquisition of absolute discriminations in pigeons. In M. L. Commons, R. J. Herrnstein, & A. R. Wagner (Eds.), Quantitative analyses of behavior, Vol. 4: Discrimination processes (pp. 231-238). Cambridge, MA: Ballinger.
    Vaughan, W., Jr., & Greene, S. L. (1984). Pigeon visual memory capacity. Journal of Experimental Psychology: Animal Behavior Processes, 10, 256-271.
    Watanabe, S., Sakamoto, J., & Wakita, M. (1995). Pigeons’ discrimination of paintings by Monet and Picasso. Journal of the Experimental Analysis of Behavior, 63, 165-174.
    Young, M. E., & Wasserman, E. A. (2001). Stimulus control in complex arrays. In R. G. Cook (Ed.), Avian visual cognition [On-line]. Available: pigeon.psy.tufts.edu/avc/young/

Acknowledgements

The research reported in this chapter was supported in part by PSC/CUNY and NIMH grants to the two authors. The web page and graphics were developed initially by Erich A. Heinemann.