IMATA Soundings: Dolphins & Cognition Training

Reference:
Uyeyama, R. K., Pack, A. A., & Herman, L. M. (2005). How to develop abstract concepts and understanding of complex relationships in dolphins. Soundings, 30, 4-7.

HOW TO DEVELOP ABSTRACT CONCEPTS AND UNDERSTANDING OF COMPLEX RELATIONSHIPS IN DOLPHINS

Feature article for Soundings, the Magazine of the International Marine Animal Trainers Association, Third Quarter 2005

Robert K. Uyeyama, Louis M. Herman, and Adam A. Pack

Kewalo Basin Marine Mammal Laboratory

University of Hawaiʻi at Mānoa

The dolphin research programs at the Kewalo Basin Marine Mammal Laboratory are directed towards examining sensory perception and cognition. (For a list of scientific publications, see http://www.dolphin-institute.org/publications.html) Such research involves specific animal training techniques that may be largely unique to this context.

The obvious difficulty in relating animal training to cognitive research is that cognitive research studies the mental processes that lie between stimuli and behavior. Modern training methods, on the other hand, stem from the operant conditioning techniques of a behaviorist tradition (Skinner, 1938) that denies the very existence of such mental processes. In human psychology, the maturation of the field of cognition has shifted the emphasis away from a theory of external events, such as behavior and stimuli, toward a theory of the internal processing of information; our dolphin research falls within the related subfield of comparative animal cognition (e.g., Roitblat, 1987). How, then, is it possible to train animals for studying mental representations and rules using techniques derived from behaviorism?

When new trainers learn animal training, they are typically cautioned against anthropomorphism and taught to avoid assigning importance to what is happening inside the animal; for the task at hand, only the observable behavior counts. Cognitive researchers (who by definition study unobservable internal processes) must also guard against anthropomorphism, but they do so through the practice of scientific parsimony, a principle that in this case essentially holds that only the simplest adequate explanation for an animal’s behavior counts. The behavioristic phenomenon of associative learning is certainly one of the simpler explanations; if it can be ruled out, however, the observed behavior may require the existence of something else, such as an internal mental operation.

The basic technique of operant conditioning, namely associating reinforcement with an animal’s behavior in the presence of a discriminative stimulus (S^D), is still valid for many purposes in training animals for such research. However, if a mental process is at work, it occurs between the S^D and the animal’s subsequent behavior. Because a mental process is not directly observable, the experiment and the training must be conducted carefully so that its existence can be inferred. In particular, we are interested in applying animal training techniques to create cases in which operant conditioning does not fully explain behavior.

Before we discuss animal training, it is important to emphasize that in any animal cognition research, the final observable result will be the behavior of the animal. A conclusion regarding the use of specific mental processes requires that we first rule out all simpler explanations, including lack of experimental controls, rote associative learning by operant conditioning, and chance behavior by the animal.

Experimental controls address the possibility that the animal may produce the correct behavior for undesirable reasons, such as observer bias or unintentional cues. Thus, the experimental observer, the trainer, and training assistants should be unaware of the experimental condition on each trial to eliminate these biases. The elimination of extraneous environmental stimuli that may be associated with the correct response should also be carefully considered. These may include unintentional timing, environmental or physical cues, or even a predictable regularity in the sequence in which trials are scheduled during a test.
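One way to avoid a predictable regularity in trial order is to generate a balanced pseudo-random schedule and reject any order that contains long runs of the same trial type. The sketch below is a minimal Python illustration, assuming a two-choice task in which each trial is labeled by the location of the correct response; the function names and the maximum run length of 3 are hypothetical choices, not values from our studies.

```python
import random


def longest_run(seq):
    """Length of the longest block of identical consecutive items."""
    best = current = 1 if seq else 0
    for prev, nxt in zip(seq, seq[1:]):
        current = current + 1 if nxt == prev else 1
        best = max(best, current)
    return best


def constrained_schedule(n_trials, max_run=3, seed=None, max_attempts=10_000):
    """Hypothetical helper: a balanced left/right trial order with no run
    longer than max_run, found by shuffling and rejecting bad orders."""
    if n_trials % 2:
        raise ValueError("n_trials must be even for an exact left/right balance")
    rng = random.Random(seed)
    trials = ["left"] * (n_trials // 2) + ["right"] * (n_trials // 2)
    for _ in range(max_attempts):
        rng.shuffle(trials)
        if longest_run(trials) <= max_run:
            return list(trials)
    raise RuntimeError("no order satisfied the run-length constraint")


schedule = constrained_schedule(20, max_run=3, seed=1)
print(schedule)
```

Generating the order in advance, and keeping it from the trainer and observer, also fits the blind conditions described above.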

Next, we must rule out the explanation that the animal is generating its correct behaviors through the rote associative learning of operant conditioning. For this purpose, we use what is known as a “transfer test,” a test of generalization. All experimental trials in animal cognition research begin with an experimental stimulus, which is followed by a behavioral response. No matter how difficult the experiment may seem, if an animal has encountered the same experimental stimulus before, the simplest explanation for its correct behavior is one-to-one associative learning rather than the use of an internal concept.

To test for concept formation, the transfer test presents the animal with new experimental stimuli it has never before encountered. Correct responses based on past associative learning (operant conditioning) are not possible with these novel stimuli. The first encounter with each new stimulus is known as the “first trial exposure,” or “probe trial,” and transfer test results are the pooled results of many such first trial exposures. A statistical test is then performed to determine how likely it is that the animal’s performance level could have occurred by chance. Only if that probability falls below 5% (p < .05) can we conclude that the animal’s behavior is explained by the generalization of a mental concept.
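The article does not name the statistical test used. For a two-choice task in which each probe is scored correct or incorrect, a common choice is a one-tailed exact binomial test against 50% chance, assuming the probe trials are independent. A minimal Python sketch (the function name and the 20-probe example are hypothetical):

```python
from math import comb


def binomial_p_value(correct, trials, chance=0.5):
    """One-tailed exact binomial test: probability of scoring `correct`
    or more out of `trials` if every response were a guess with
    success probability `chance`."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


# Hypothetical transfer test with 20 two-choice probe trials.
for correct in (14, 15):
    p = binomial_p_value(correct, 20)
    print(correct, round(p, 4), "below 5%" if p < 0.05 else "not below 5%")
```

With 20 probes, 15 correct (p of about 0.021) would pass the 5% threshold while 14 correct (p of about 0.058) would not. For a task with more than two choices, `chance` would be set to 1 divided by the number of choices.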

In practice, exposing an animal to a solid block of consecutive new stimuli (probe trials) may not provide the best measure of the animal’s abilities. Because errors are not reinforced, a run of consecutive probes may produce enough errors that the overall frequency of reinforcement falls well below that of the training phase with baseline stimuli, which could cause undesirable effects such as frustration, balking, aggression, or even incorrect re-learning during the transfer test. In contrast, if the probe trials are inserted at a low density within a larger set of familiar baseline trials, the overall reinforcement frequency remains similarly high. Even with errors on the novel probes, we can then avoid undesirable behavioral problems and obtain an accurate measure of the animal’s abilities.
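A minimal Python sketch of low-density probe insertion, assuming a session is a list of baseline trials plus a short list of probes. The function names, the minimum spacing of two baseline trials between probes, and the accuracy figures in the example (90% on baseline, 50% on probes) are hypothetical values for illustration, not data from our studies. The second function shows why interleaving keeps reinforcement high.

```python
import random


def interleave_probes(baseline, probes, min_gap=2, seed=None, max_attempts=10_000):
    """Hypothetical helper: place each probe right after a randomly chosen
    baseline trial, with at least `min_gap` baseline trials between any
    two probes. The session therefore always opens with a baseline trial."""
    rng = random.Random(seed)
    slots = range(len(baseline))  # slot i = "immediately after baseline trial i"
    for _ in range(max_attempts):
        chosen = sorted(rng.sample(slots, len(probes)))
        if all(b - a >= min_gap for a, b in zip(chosen, chosen[1:])):
            break
    else:
        raise RuntimeError("could not space the probes with this min_gap")
    after = dict(zip(chosen, probes))
    session = []
    for i, trial in enumerate(baseline):
        session.append(("baseline", trial))
        if i in after:
            session.append(("probe", after[i]))
    return session


def expected_reinforcement(n_baseline, n_probes, p_baseline, p_probe):
    """Expected share of reinforced trials in a mixed session."""
    total = n_baseline + n_probes
    return (n_baseline * p_baseline + n_probes * p_probe) / total


session = interleave_probes([f"B{i}" for i in range(16)], ["P1", "P2", "P3", "P4"], seed=2)
print(session)
print(expected_reinforcement(16, 4, 0.9, 0.5))  # mixed session
print(expected_reinforcement(0, 20, 0.9, 0.5))  # solid block of probes
```

Under these assumed figures, a 20-trial session with 4 interleaved probes would still reinforce about 82% of trials, compared with about 50% for a solid block of 20 probes.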

Another safeguard is the baseline pretest. Because transfer stimuli are novel only the first time they are presented to the animal, it is important that the animal otherwise be performing optimally. If the animal is having an off period because of a change in environmental conditions or routine, or for some other reason, it may not perform adequately even on the familiar and easier baseline trials, let alone on the novel probes. Exposing the animal to probe stimuli (which are often limited in number) in such a condition would waste them. Therefore, we can plan prospectively to begin the test with a short block of baseline trials, scored against a “green light” criterion decided in advance, so that the researcher can decide whether to proceed with the probes or postpone them until another day.
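The value of the pretest depends on fixing the criterion before the session, so that the go/no-go decision is not made after seeing the results. A small Python sketch, assuming a hypothetical criterion of 5 correct out of 6 baseline trials; the class name and numbers are illustrative only:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PretestCriterion:
    """Hypothetical 'green light' rule, fixed before the session begins."""
    block_length: int  # number of baseline trials in the pretest block
    min_correct: int   # minimum number correct needed to proceed

    def decide(self, results):
        """results: one True/False per pretest trial, in order."""
        if len(results) != self.block_length:
            raise ValueError("pretest block has the wrong number of trials")
        if sum(results) >= self.min_correct:
            return "proceed with probes"
        return "postpone probes"


criterion = PretestCriterion(block_length=6, min_correct=5)
print(criterion.decide([True, True, False, True, True, True]))   # proceed with probes
print(criterion.decide([True, False, True, False, True, True]))  # postpone probes
```

Making the dataclass frozen is a small reminder that the criterion should not be adjusted once the session has started.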

Because of the crucial importance of transfer tests, we are concerned with improving the chances of successful generalization. Typically, we do this by decontextualizing the animal’s training from early on, in other words, by providing a broad variety of different task-relevant stimuli. How, then, is generalization in a research setting any different from generalization in typical operant conditioning? The answer lies in the animal’s behavior. For example, if an animal is being trained in a desensitization task, such as tolerating a needle stick for a blood draw, the training may involve decontextualization via a wide variety of discomforts similar to a needle stick, perhaps in conjunction with a wide variety of distracting stimuli. In this case, the animal is being trained to remain motionless under all conditions. Eventually, a true needle stick can be introduced, and the animal may generalize to this new stimulus and again remain motionless. The key point is that this phenomenon is called stimulus generalization because the same single behavior results; only the stimulus has changed.

In contrast, in some cognitive research tasks, the animal’s generalization to a new stimulus may manifest as an entirely new behavior never previously offered. For example, in our artificial language research (Herman, Richards, & Wolz, 1984; Herman, Kuczaj, & Holder, 1993), we trained hand signs representing several objects and actions so that they could eventually be combined freely: BALL + OVER requested the dolphin to jump over the ball, and FRISBEE + UNDER requested the dolphin to swim under the Frisbee. Before the transfer test, one could reasonably argue that each combined pair of signs was actually learned as a single S^D; for example, the signs BALL + OVER had been operantly conditioned as a single unit with the behavior of jumping over the ball. However, some combinations of signs were never shown to the dolphin until the transfer test. If the dolphin had never before encountered the sequence BALL + UNDER, the correct response would be to swim under the ball, a behavior that had never before been observed or reinforced. A correct response on the dolphin’s first attempt could not be explained by operant conditioning alone. We could then conclude that the dolphin was processing each sign separately and combining them according to a mental rule.
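The logic of this transfer test can be made concrete with a toy Python model (an illustration only, not a model of how the dolphin actually represents the signs). A “rote” responder treats each whole sign sequence as one S^D and can only look it up in its training history, while a “compositional” responder interprets each sign separately and combines the meanings by a rule. The dictionaries and function names are hypothetical.

```python
# Hypothetical toy model of the two-sign requests described in the text.
OBJECT_SIGNS = {"BALL": "ball", "FRISBEE": "Frisbee"}
ACTION_SIGNS = {"OVER": "jump over", "UNDER": "swim under"}


def compositional_response(sequence):
    """Interpret each sign separately, then combine them by a rule."""
    obj, action = sequence
    return f"{ACTION_SIGNS[action]} the {OBJECT_SIGNS[obj]}"


def rote_response(sequence, history):
    """Look the whole sequence up as a single S^D; None if never trained."""
    return history.get(sequence)


trained = [("BALL", "OVER"), ("FRISBEE", "UNDER")]
history = {seq: compositional_response(seq) for seq in trained}

novel = ("BALL", "UNDER")  # never shown before the transfer test
print(rote_response(novel, history))  # None
print(compositional_response(novel))  # swim under the ball
```

Both responders behave identically on the trained pairs; they diverge only on the novel combination, which is exactly where the transfer test looks.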

In other research tasks, the animal’s generalizations to new stimuli result only in previously offered responses. However, instead of a single trained response (e.g., remaining motionless during needle-stick desensitization), there may be multiple familiar responses that the animal spontaneously matches to the new probe stimuli in a consistent pattern, suggesting the use of a mental rule: a pattern of behavior not fully explained by the theory of operant conditioning.

In one experiment (Mercado, Killebrew, Pack, Macha, & Herman, 2000), we tested the dolphin’s ability to determine whether two objects were the same as or different from each other. After visually inspecting the objects, the dolphin was required to choose between two response paddles, one on its left and one on its right. If the objects were the same (e.g., a ball and a ball), the dolphin was required to touch the right paddle, and if they were different (e.g., a ball and a hoop), the dolphin was to touch the left one. When presented only with balls and hoops, the dolphin could easily learn all four combinations by operant conditioning: go right if presented with (ball and ball) or (hoop and hoop); go left if presented with (ball and hoop) or (hoop and ball). However, if these objects were replaced by entirely new objects, such as surfboards, cones, and baskets, such rote rules (based on balls and hoops) could not explain the animal’s correct responses on first exposures. The use of a mental concept of sameness and differentness was indicated when the dolphin made the same left and right paddle presses as in training, but in a pattern that spontaneously matched the same or different pairings of the new objects.

Many modern training techniques do not strictly fall under the original definition of operant conditioning. Many of these are useful in training animals for cognitive research, especially timing-critical tools such as the bridge (a conditioned reinforcer), the no-reward marker (a “no” signal), and the keep-going stimulus (a “continue” signal), as well as various cues and manipulations (Pryor, 1984; Ramirez, 1999). Here we discuss fading.

Fading is generally defined as a process of teaching an animal by changing the controlling stimulus for a behavior from the initially trained (and presumably more easily trained) stimulus to another. It is a process of stimulus generalization. In cognitive research training, by contrast, generalization enables a shift from one strategy for producing behaviors to another: from basing behavior associatively on learned stimuli alone to basing it on a conceptual operation as those stimuli are faded away. The result is a new pattern of behavior that can pass muster on a transfer test.

The goal is to guide the animal towards applying the correct cognitive strategy. At first, we arrange the experiment to allow the animal to utilize an easily trained stimulus, a helper cue that is irrelevant to the experimental question, but reliably associated with the correct behavioral response, so that we may reinforce the animal frequently. The positive reinforcement of the animal for correct responses across multiple helper-cued trials will thus co-occur with experimentally relevant stimuli, providing the basis for latent learning (Ramirez, 1999) when these cues are faded and removed. The animal may then generalize from a simple stimulus cue to a conceptually based strategy for generating its responses, rather than from one stimulus to another as in ‘normal’ fading. Fading is thus used until a predominance of correct responses suggests to the researcher that the animal is ready for a formal transfer test and the conclusion of the experiment.

In the sense that we begin the animal with success and positive reinforcement, this process resembles the errorless successive approximation of operant conditioning. Unlike successive approximation, however, it is a very dynamic process that depends on the trainer’s experience and intuition in skillfully changing the experimental parameters to guide the animal toward learning. The most obvious caution is that even slight over-training on the helper cue will rapidly make the animal dependent on it, making fading very difficult and requiring significant effort to un-train. In contrast to the more gradual approach of typical successive approximation, once the animal begins to respond correctly, the helper stimulus must instead be faded rapidly. Fading typically begins well before the animal approaches 100% correct; by the time it reaches that level, it has probably become concretely over-trained to attend only to the helper stimulus. We can promote decontextualization if fading begins when the animal is performing fairly well but is still learning and making frequent errors (70-80% accuracy). The situation then remains fluid and uncertain enough to facilitate the animal’s shift from rote associations between cues and behaviors to the use of conceptual rules governing the experimental stimuli to which we want the animal to attend.
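A trainer applies the 70-80% guideline by judgment, but the bookkeeping behind it can be sketched. The Python class below is a hypothetical illustration that tracks accuracy over a rolling window of recent trials (the window of 10 trials is an assumed value) and reports whether accuracy has reached the band in which fading should begin:

```python
from collections import deque


class FadingMonitor:
    """Hypothetical helper: tracks accuracy over the last `window` trials
    and flags when fading of the helper cue should start."""

    def __init__(self, window=10, low=0.7, high=0.8):
        self.recent = deque(maxlen=window)
        self.low, self.high = low, high

    def record(self, correct):
        self.recent.append(bool(correct))

    def status(self):
        if len(self.recent) < self.recent.maxlen:
            return "collecting"
        accuracy = sum(self.recent) / len(self.recent)
        if accuracy < self.low:
            return "keep cue"
        if accuracy <= self.high:
            return "fade now"
        # Past the band: the helper cue may already be over-trained.
        return "fade now (overdue)"


monitor = FadingMonitor()
for outcome in [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]:  # 7 of the last 10 correct
    monitor.record(outcome)
print(monitor.status())  # fade now
```

The rule flags fading at the lower edge of the band rather than waiting for near-perfect performance, reflecting the caution above about over-training the helper cue.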

As an example, consider the identity matching-to-sample (MTS) task described in Herman, Hovancik, Gory, & Bradshaw (1989). The trainer first presents the dolphin with a sample object for visual inspection. The object is then removed from view, and two objects are presented, one to the dolphin’s left and the other to its right. One of these is identical to the sample and is the correct choice, whereas the other is a different object and the incorrect choice. To respond, the dolphin must swim to and station in front of the matching object. During training, the correct object can be placed conspicuously closer to the dolphin, making it very likely that the dolphin will choose it. The proximity of the object serves as the initial helper cue. This proximity cue can be faded away in distance or in frequency across trials. While being reinforced on these cued trials, the dolphin may also gain experience with patterns of co-occurring experimental stimuli, such as the identical nature of two of the objects. Finally, on trials where the cue has been faded so much that it is of little help, the animal may draw on these experiences through latent learning to develop an alternative strategy for choosing a response, and that strategy may be the correct one: the concept of matching identical objects. Many other types of cues may be used, from simple targeting to more complex ones, such as presenting the same concept in another sensory modality, like acoustic matching cues for a visual matching task (Forestell & Herman, 1988).
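A fixed schedule is a simplification of what the text describes as a dynamic, judgment-driven process, but it shows the two ways the proximity cue can be faded. In this hypothetical Python sketch, the distances, step sizes, and trials per step are illustrative assumptions, not values from the MTS studies:

```python
import random


def mts_fading_plan(n_trials, trials_per_step=8, start_offset_m=1.0,
                    offset_step_m=0.25, cue_prob_step=0.25, seed=None):
    """Hypothetical fading plan for an identity matching-to-sample task.

    Each trial lists the side of the matching object and how much closer
    (in metres) it is placed than the non-matching object. After every
    `trials_per_step` trials the offset shrinks (fading in distance) and
    the share of cued trials drops (fading in frequency). Set one of the
    two step sizes to 0 to fade along only one dimension.
    """
    rng = random.Random(seed)
    plan = []
    for i in range(n_trials):
        step = i // trials_per_step
        offset = max(0.0, start_offset_m - step * offset_step_m)
        cue_prob = max(0.0, 1.0 - step * cue_prob_step)
        cued = offset > 0 and rng.random() < cue_prob
        plan.append({
            "match_side": rng.choice(["left", "right"]),
            "offset_m": offset if cued else 0.0,
        })
    return plan


for trial in mts_fading_plan(24, seed=4)[::8]:
    print(trial)
```

In practice the trainer would advance, hold, or reverse each step according to the animal’s performance rather than a fixed trial count.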

However, how does the trainer know which cue to use, or how much training time to invest in any particular step? Moving through steps too slowly encourages rote learning and the development of concrete response contingencies (i.e., operant conditioning), whereas moving too quickly may simply overload the animal with stimuli, resulting in little learning. Training plans carefully chosen to guide the animal’s learning often turn out to be ineffective, or teach the animal what appears to be an incorrect strategy. How do we identify and correct the resulting errors?

In non-research animal training, an error by the animal is explicitly manifested in its behavior. The solution is therefore clear: identify the specific erroneous components of the behavior, and then extinguish them. In cognition research, by contrast, an erroneous response may reflect an erroneous strategy, which is less explicit because an incorrect strategy can sometimes produce the correct behavior. The trainer must identify this erroneous strategy and actively extinguish it. The animal is thus forced to adopt a different strategy, which may turn out to be the correct one desired by the investigator.

The need to extinguish the animal’s erroneous strategies is where the conflict between behaviorism-based philosophies of animal training and the goal of investigating mental processes in animals becomes most apparent. When trainers are cautioned not to anthropomorphize, the practical issue is that guessing at the internal state of an animal is a haphazard activity. We as trainers may often guess wrong, or a correct guess may make no difference, and either outcome weakens our training. Such inefficiency is generally unacceptable. During the training of cognitive tasks, we as researchers also cannot be certain what the animal’s mental processes are. Guessing at the animal’s response strategy is therefore similarly haphazard, but in the training period before the transfer test there is simply no other tool at our disposal, because what we are attempting to train is not visible.

Why do we need to guess? The main alternative, passively exposing the animal to experimental stimuli and simply waiting for it to acquire the correct strategy spontaneously by a method akin to scanning, is almost universally ineffective. This is in no small part because of reinforcement schedules. Cognitive research tasks are generally of the forced-choice variety, in which the animal chooses a behavior from a limited set of possibilities, such as touching the left or right response paddle. Therefore, if the animal is guessing randomly or using an incorrect strategy, it will still get some trials correct and be reinforced on what is in effect a variable schedule of reinforcement. In most cases, such passive methods will reinforce the animal often enough that its current response strategy does not extinguish. Furthermore, because behavior maintained on variable schedules is well known to resist extinction, even direct interventions to retrain the animal may be difficult if it has been allowed to use these incorrect strategies for any extended period. The animal does not ‘know’ that there is a correct strategy that will result in reinforcement on every trial. Therefore, guessing at the animal’s strategy, followed by rapid and dynamic direct interventions, is necessary to guide the animal away from erroneous strategies and (potentially) toward the desired correct one.
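The reinforcement problem can be illustrated with a short simulation. Assuming each response is independently correct with a fixed probability (a simplification), a dolphin guessing between two paddles is reinforced on roughly half of its trials, at irregular and unpredictable intervals, much like a variable-ratio schedule. The Python function below is a hypothetical sketch:

```python
import random


def simulate_guessing(n_trials, p_correct=0.5, seed=None):
    """Simulate a forced-choice session in which each response is correct,
    and therefore reinforced, with probability p_correct (0.5 = guessing
    between two paddles). Returns the reinforcement rate and, for each
    reinforcement, how many unreinforced trials preceded it."""
    rng = random.Random(seed)
    gaps, unreinforced, n_reinforced = [], 0, 0
    for _ in range(n_trials):
        if rng.random() < p_correct:
            gaps.append(unreinforced)
            unreinforced = 0
            n_reinforced += 1
        else:
            unreinforced += 1
    return n_reinforced / n_trials, gaps


rate, gaps = simulate_guessing(200, seed=7)
print(f"reinforced on {rate:.0%} of trials; longest dry spell: {max(gaps)} trials")
```

Setting `p_correct` to 0.75 models a partly correct rote strategy, which is reinforced even more often and so is even less likely to extinguish on its own.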

As an example, let us return to the same/different protocol, in which the dolphin is presented with two objects and must choose between two response paddles, one on its left and one on its right. If the two objects are the same (a ball and a ball), the dolphin is required to touch the right paddle, and if they are different (a ball and a hoop), the left one. Suppose the dolphin initially learns instead to default to the left paddle, no matter what the objects are. Its performance on this two-choice task will be 50% correct on a balanced schedule of trials. Suppose further that the dolphin subsequently learns that if it sees two balls, it should make an exception and choose the right paddle, which is correct on those trials. Using these two rote strategies, the dolphin will score a highly reinforcing 75% correct. If we present a transfer test at this point using unfamiliar objects, the animal will almost certainly fail (i.e., perform at chance). However, if we correctly guess that the animal is applying these erroneous strategies (all-left and two-ball=right), we can extinguish them by replacing the ball with another training object while temporarily skewing the training trials to favor the right paddle. The overall reinforcement frequency then drops drastically, forcing the dolphin to seek a different strategy, and at this point we may even apply fading to guide further learning. Of course, there is no guarantee that the dolphin will subsequently adopt the strategy of sameness and differentness. If errors again predominate, we will still likely gain new insight into the dolphin’s error patterns, allowing us to derive new guesses and new training plans for extinguishing erroneous strategies or for guiding conceptual learning. When the dolphin’s performance on baseline trials improves to a predetermined threshold at which we believe it may be operating on the correct concept, we can then expose the animal to the transfer test.
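The arithmetic in this example, and the rote-rule problem in the same/different task described earlier, can be checked with a short Python sketch. The strategy functions are hypothetical encodings of the rote rules in the text; the replacement object (“frisbee”) and the 3:1 skew toward the right paddle are assumed values for illustration.

```python
from itertools import product


def correct_paddle(a, b):
    """Task criterion (and the strategy of a dolphin with the concept):
    same -> right paddle, different -> left paddle."""
    return "right" if a == b else "left"


def all_left(a, b):
    return "left"


def all_left_except_two_balls(a, b):
    return "right" if (a, b) == ("ball", "ball") else "left"


def accuracy(strategy, trials):
    hits = sum(strategy(a, b) == correct_paddle(a, b) for a, b in trials)
    return hits / len(trials)


def balanced(x, y):
    """Two same and two different trials built from objects x and y."""
    return list(product([x, y], repeat=2))


baseline = balanced("ball", "hoop")
transfer = balanced("cone", "basket")  # novel objects
# Extinction step: ball replaced by a new training object, and same
# (right-paddle) trials outnumber different trials 3:1.
skewed = [("hoop", "hoop"), ("frisbee", "frisbee")] * 3 + [("hoop", "frisbee"), ("frisbee", "hoop")]

for name, trials in [("baseline", baseline), ("transfer", transfer), ("skewed", skewed)]:
    print(name, [accuracy(s, trials) for s in (all_left, all_left_except_two_balls, correct_paddle)])
# baseline [0.5, 0.75, 1.0]
# transfer [0.5, 0.5, 1.0]
# skewed [0.25, 0.25, 1.0]
```

The rote strategies earn 75% on familiar objects but only chance on novel ones, and the skewed schedule cuts their reinforcement to 25%, while a sameness rule would be correct on every trial of all three.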

The trainer has two options when the animal makes an error. Feedback on a response that the trainer did not want is very useful to the animal. Presenting a well-timed no-reward marker (NRM) is one method, especially if it is given at the point when the animal seems to have already made its erroneous decision and begun its incorrect response. Although NRMs provide critically useful information to the animal, overly frequent NRMs noticeably lower the overall reinforcement frequency, and the result may be frustration behaviors not conducive to further learning. An alternative is to intervene on the fly, before the animal has completed its incorrect response, and cue it to the correct response, which is then reinforced. Such a cue could be given by calling the animal in a particular direction with a point, a visual (e.g., shaking) or acoustic (e.g., splashing) cue, or by cueing a behavioral component with an S^D. Of course, such a cue must be rapidly faded, as its only purpose is to allow the animal to compare the correct response with the experimental condition of that trial. Accordingly, these interventionary cues should be relatively infrequent so that the animal does not come to depend on them, and, like NRMs, they should be given only after the animal has clearly chosen and begun a specific response. The ratio of interventions to NRMs in response to errors depends on the context and the particular animal, and the trainer must often vary this ratio dynamically. However, because both interventionary helper cues and NRMs can be used only so often, the trainer must frequently decide to follow an entirely different plan to guide the animal’s learning. This may range from adding new varieties of helper stimuli to be faded, to subtle regressions of criterion, or even basic changes in the overall approach or presentation of stimuli.

We believe the techniques of teaching mental concepts may be useful outside the laboratory, in a typical public-display context. We have seen through our own educational programs that the public is keenly interested in demonstrations of the mental capabilities of dolphins. For example, we have shown that dolphins are capable of various generalized forms of imitation, including behavioral imitation (of both dolphin and human models) and vocal imitation (Herman, 2002; Xitco, 1988; Richards, Wolz, & Herman, 1984). We have a hand sign that instructs the dolphin to imitate another dolphin’s current behavior; if the model dolphin is waving its fluke in the air, the imitating dolphin will do the same upon this signal. Our dolphins will also imitate the actions of human models, to the extent of mapping the human body plan onto the dolphin body plan. A human shaking his or her head from side to side will result in the dolphin doing the same. Lifting a foot results in the dolphin lifting its tail. Such behaviors in isolation are certainly not unique to our facility, but the difference is that ours are not a set of many individually trained S^Ds. A small set of varied actions was trained to the imitation hand signal, and our dolphins have since generalized to actions they had never before encountered. If a human performs a cartwheel past a viewing window, the dolphin will likely respond with a somersault in the same direction.

We believe it is possible to use the same methods to demonstrate the mental capabilities of dolphins in almost any facility. For example, to train a generalized imitation behavior, the training program would begin immediately with a wide variety of relevant stimuli (behaviors to be imitated), without the long periods of repetitive reinforcement contingencies typical of most training. In this way, the animals would avoid developing concrete associations, and acquisition of the mental concept of imitation would be facilitated. Such a dolphin would surely be of interest in a public display, for example by asking an audience member to perform some bodily motion, after which the dolphin would spontaneously imitate a similar motion. Other relatively easily trained, demonstration-friendly concepts exist. One such example is Braslau-Schneck’s (1994) training of two dolphins to innovate a new behavior in synchrony with each other, suggesting the use of some communicative mechanism between them. Furthermore, once trained, conceptually based behaviors could conceivably be useful in training behaviors for other purposes, from husbandry to applications where it is desirable for an animal to respond flexibly and appropriately in situations it has not specifically encountered in its training history.

In summary, one of the most important points is the contrast between rote associative learning and the learning of mental concepts by an animal. Because mental concepts cannot be observed directly, they are inferred from the animal’s behavior using the transfer test, which exposes the animal to new stimuli for which no previously conditioned associations can exist. The animal’s internal use of mental representations and rules may be expressed either through entirely new behaviors or through new patterns of existing behaviors. It is important to consider alternative explanations before any research begins, and to draft an initial plan that attempts to guide the animal toward learning concepts while pointedly avoiding one-to-one associations between stimuli and response behaviors. Maintaining a fluid set of guiding stimuli, and intervening dynamically at the first hint of associative learning, will help prevent the undesirable outcome of an animal that simply alternates stubbornly from one incorrect strategy to another.

During training for cognitive research, techniques such as bridging, no-reward markers, and keep-going stimuli are important, as are dynamic techniques such as fading and helper cues. Constant vigilance by research trainers is important to ensure the animal is not falling into the concrete use of rote associative learning to generate its responses. Toward this end, the trainer remains conscious at all stages of the importance of decontextualization: the stimuli and cues presented to the animal should be in flux for most of the training period as it builds toward the transfer test. Successes at earlier stages of learning must be rapidly followed by subsequent stages rather than risk over-consolidation and a halt in progress. As a result, the pace of training may be noticeably faster than in non-research training. Furthermore, when performance is poor, the trainer may find it effective to guess at, and attempt to extinguish, the strategies the animal is currently using to generate its behaviors. Once the animal’s performance on its baseline (i.e., learning set) trials is high, a transfer test may be performed; correct responses to the novel probes suggest that operant conditioning alone is not sufficient to explain its behavior, which serves as evidence for the animal’s application of a generalized mental concept. In this way, we have guided the animal’s learning in a process that may be called cognitive shaping.

Braslau-Schneck, S. (1994). Innovative behaviors and synchronization in bottlenosed dolphins. Unpublished master's thesis, University of Hawaii, Honolulu.

Forestell, P. H., & Herman, L. M. (1988). Delayed matching of visual materials by a bottlenosed dolphin aided by auditory symbols. Animal Learning and Behavior, 16, 137-147.

Herman, L. M. (2002). Vocal, social, and self-imitation by bottlenosed dolphins. In C. Nehaniv & K. Dautenhahn (Eds.), Imitation in animals and artifacts (pp. 64-108). Cambridge, MA: MIT Press.

Herman, L. M., Hovancik, J. R., Gory, J. D., & Bradshaw, G. L. (1989). Generalization of visual matching by a bottlenosed dolphin (Tursiops truncatus): evidence for invariance of cognitive performance with visual or auditory materials. Journal of Experimental Psychology: Animal Behavior Processes, 15, 124-136.

Herman, L. M., Kuczaj, S. A. II, & Holder, M. D. (1993). Responses to anomalous gestural sequences by a language-trained dolphin: evidence for processing of semantic relations and syntactic information. Journal of Experimental Psychology: General, 122, 184-194.

Herman, L. M., Richards, D. G., & Wolz, J. P. (1984). Comprehension of sentences by bottlenosed dolphins. Cognition, 16, 129-219.

Mercado, E. III, Killebrew, D. A., Pack, A. A., Macha, I. V. B., & Herman, L. M. (2000). Generalization of 'same-different' classification abilities in bottlenosed dolphins. Behavioural Processes, 50, 79-94.

Pryor, K. (1984). Don’t shoot the dog. Revised edition, 1999. New York: Bantam Books.

Ramirez, K. (1999). Animal training: Successful animal management through positive reinforcement. Chicago, IL: Shedd Aquarium.

Richards, D. G., Wolz, J. P., & Herman, L. M. (1984). Vocal mimicry of computer-generated sounds and vocal labeling of objects by a bottlenosed dolphin, Tursiops truncatus. Journal of Comparative Psychology, 98, 10-28.

Roitblat, H. L. (1987). Introduction to comparative cognition. New York: W. H. Freeman and Company.

Xitco, M. J., Jr. (1988). Mimicry of modeled behaviors by bottlenose dolphins. Unpublished master's thesis, University of Hawaii, Honolulu.