The Leaky Integrator Model

for a rat choosing continuously between two levers with different reward probabilities that change.

Dan and Jin-sook


Testing the Model (ppt) Predicting Behavior (ppt) Matlab source files

Abstract: From one perspective, life is a process of making decisions. To obtain better rewards, animals must weigh the values of their response options, and these values shape behavior such as switching between foraging sites. Here, using the given behavioral data, we test the matching law and apply a leaky integrator model based on reward history to estimate the values of the two alternatives the animals can choose between in the experimental paradigm.

Introduction

When a rat is able to press a lever to deliver stimulation to its ``happy center,'' it will press the lever repeatedly and rapidly, until it drops from exhaustion. Rats will actually stop eating in order to devote themselves completely to pressing. It is clear that happy-center activation has compelling and persistent effects on motivation and behavior. Happy-center stimulation is a kind of ``reward'' that does not get old and whose value supersedes all else.
      Though a life of lever-addiction is probably not the best one for a rat, we can assume that the mechanisms by which a rat endeavors to maximize this artificially provided reward are similar to those by which it achieves other goals more relevant to its continued existence, like foraging for food. So the value of studying this behavior goes beyond a sadistic desire to terrorize junkie rats. We hope to understand how nature has found a way to produce complex goal-oriented behavior and strategizing in higher animals.
      In life nothing is certain, and this uncertainty complicates situations and makes strategizing more difficult. Also, in life there are options, and if you don't make the right choice quickly, you may not get to try again: someone else will have found the food, or the prey will get away.
      To reflect these complications of uncertainty and options, we consider the situation where a rat is presented with two levers with probabilistic reward. We seek to model how the rat strategizes, or chooses its behaviors, in response to this situation. We are particularly interested in how the rat adapts when the reward probabilities for the levers silently change in the middle of a session. Though the matching law shows fairly good agreement with actual behavior, there are discrepancies, and one may ask whether other models do better.
      Psychologists and economists have long appreciated the contribution of reward history and expectation to decision-making in higher mammals (Kahneman and Tversky, 2000). Sugrue, Corrado, and Newsome (2004) considered a model using reward histories in monkey behavior. In their work, monkeys make saccades left or right in response to colored circles appearing on a screen. Similar to the rat-and-lever situation, the monkeys are rewarded probabilistically depending on the color they look to. The authors found that a so-called leaky integrator model is very effective at predicting behavior, and they have even found neurons in the monkey brain that appear to track the very variables of the model. Though we don't have access to similar cell recordings in rats, we can test whether the leaky integrator model fits the rat behavior well.
      Sugrue et al. sought to quantify the time course of the relationship between the monkey's choices and past rewards. Considering the reward history that immediately precedes a particular choice, they proposed the technique of ``choice-triggered averaging'' (CTA) to construct an optimal linear filter relating recent reward history to current choice.

Fig 1. The output of the leaky integrator provides a local estimate of the target's recent value. For instance, the peak around 3500 ms reflects a reward delivered during a single stay of a rat on the target.
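      To make the idea concrete, here is a minimal Matlab sketch of CTA on toy data. The reward and switch trains and all variable names below are our own illustrative placeholders, not the actual data or the source files linked above.

% Choice-triggered averaging: for each switch, collect the reward history
% in a fixed look-back window and average across events. The average is an
% empirical linear filter relating recent rewards to the current choice.
rewards    = double(rand(1, 5000) < 0.05);   % toy binary reward train
switch_idx = find(rand(1, 5000) < 0.01);     % toy switch times (sample indices)
win        = 200;                            % look-back window (samples)
switch_idx = switch_idx(switch_idx > win);   % keep events with a full window
cta = zeros(1, win);
for k = 1:numel(switch_idx)
    cta = cta + rewards(switch_idx(k)-win : switch_idx(k)-1);
end
cta = cta / numel(switch_idx);               % average reward history
plot(-win:-1, cta);                          % on real data, roughly exponential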
      The researchers successfully modeled the switching behavior by applying the leaky integrator model, which they used to produce the best causal linear filter relating past rewards to current choice. The success of their study warrants consideration of the leaky integrator in other contexts. Our study applies this model to the rat behavior.
      Consideration of the data itself is another motivation for our study. In the graph below, the rat switched targets more frequently after pressing a lever without getting a reward. That is, the distribution of value over a whole stay on a target should be quantitatively different from that over the final stage of the same stay.

Fig 2. Rats switched targets (levers) after pressing a lever a couple of times without reward. A blue dot represents an unrewarded lever press, and a red dot represents a rewarded one.
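      This observation can be quantified by counting the run of unrewarded presses immediately preceding each switch. A sketch, assuming illustrative per-press vectors (lever identity and outcome for each press; not the actual data format):

% Count consecutive unrewarded presses just before each switch.
lever    = [1 1 1 2 2 1 1 1 1 2];    % toy press sequence (which lever)
rewarded = [1 0 0 1 0 1 1 0 0 0];    % toy outcomes (1 = rewarded)
switches = find(diff(lever) ~= 0);   % index of the last press before a switch
run_len  = zeros(size(switches));
for k = 1:numel(switches)
    i = switches(k);
    while i >= 1 && lever(i) == lever(switches(k)) && ~rewarded(i)
        run_len(k) = run_len(k) + 1;   % extend the unrewarded run backwards
        i = i - 1;
    end
end
disp(run_len)                        % unrewarded presses preceding each switch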
      Our study addresses two questions. First, does the rat behavior follow the matching law? Second, is there a simple model based on reward history that can explain the switching behavior?

Methods

1. Experimental paradigm: The subjects were five male Sprague-Dawley rats, implanted with monopolar stimulating electrodes in the posterior part of the lateral hypothalamus. They learned to press a lever for brain stimulation reward. Variable interval (VI) schedules of reinforcement were generated using independent constant probability geometric approximations to an exponential distribution. Beginning immediately after a reward was collected from a lever, the computer flipped an electronic coin once each second to determine whether to set up the next reward on that lever. The probability, p, of this coin coming up heads determined the expected delay (1/p seconds) to the next available reward. The reward was delivered immediately if the rat was holding the lever down at the moment it was set up. (from Gallistel et al, 2001)

Fig 3. Experimental paradigm. The rat can press one of two levers with different reward schedules. ``Richer'' denotes the lever with the larger probability of reward, while ``leaner'' denotes the lever with the smaller probability of reward.
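      The coin-flipping schedule above is easy to simulate. A minimal sketch, with probabilities and variable names that are our own illustrative choices (the sketch also ignores the rule that flipping pauses while a reward is armed):

% Variable-interval (VI) schedule: once per second, a coin with
% P(heads) = p decides whether the next reward is armed on a lever,
% giving an expected delay of 1/p seconds between armed rewards.
p_rich = 1/10;                        % richer lever: armed every ~10 s on average
p_lean = 1/40;                        % leaner lever: armed every ~40 s on average
T      = 600;                         % session length (s)
armed_rich = rand(1, T) < p_rich;     % one coin flip per second
armed_lean = rand(1, T) < p_lean;
% An armed reward is delivered as soon as the rat is holding that lever down.
fprintf('Rewards armed: rich %d, lean %d\n', sum(armed_rich), sum(armed_lean));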
      2. Choice-triggered averaging (CTA) by the leaky integrator model: CTA was applied to relate recent reward history to current choice. This technique is conceptually similar to spike-triggered averaging. More practically, using the reward signal to drive a leaky integrator amounts to passing the signal through a linear filter with an exponential kernel, described below; the time constant of integration is set by the decay constant of the exponential kernel. Under this model, the number of rewarded trials on which a rat chose a given lever before switching was distributed as the average of a family of exponentials (Sugrue et al., 2004).
V_a(t) = \sum_{\tau_i \le t} a \, e^{-\lambda (t - \tau_i)}

where the sum runs over the times \tau_i of past rewards of magnitude a on the target, and \lambda is the decay constant of the exponential kernel.
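      In discrete time this value signal is a one-line filter. A minimal Matlab sketch, with a decay constant and reward times that are illustrative rather than fit to the data:

% Leaky integrator: convolve the reward train with exp(-lambda*t).
dt      = 0.1;                        % time step (s)
t       = 0:dt:60;                    % time axis
rewards = zeros(size(t));             % reward train for one lever
rewards([50 120 135 400]) = 1;        % toy reward times (sample indices)
lambda  = 0.1;                        % decay constant (1/s)
decay   = exp(-lambda*dt);            % per-step decay of the integrator
% y(n) = decay*y(n-1) + rewards(n): the discrete leaky integrator,
% equivalent to filtering the reward train with an exponential kernel.
value = filter(1, [1 -decay], rewards);
plot(t, value); xlabel('time (s)'); ylabel('estimated value');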

Results

1. Rat behavior follows the matching law
a. Distribution of stay durations
Rats stay longer on the richer (high probability of reward) side than on the leaner (low probability of reward) side, as expected.
     
Fig 4. Representative distributions of stay durations. A high probability of reward is strongly correlated with longer stays on the lever.
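      A sketch of how stay durations can be extracted from press times and lever identities (the synthetic data and names below are illustrative placeholders):

% Collect the duration of each uninterrupted stay on one lever.
press_t = sort(rand(1, 200)) * 600;                     % toy press times (s)
lever   = mod(cumsum([0, rand(1,199) < 0.05]), 2) + 1;  % toy lever sequence
sw = [0, find(diff(lever) ~= 0), numel(lever)];         % stay boundaries
stay_dur  = zeros(1, numel(sw)-1);
stay_side = zeros(1, numel(sw)-1);
for k = 1:numel(sw)-1
    first = sw(k)+1;  last = sw(k+1);                   % presses within one stay
    stay_dur(k)  = press_t(last) - press_t(first);
    stay_side(k) = lever(first);
end
hist(stay_dur(stay_side == 1), 20);                     % compare with stay_side == 2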
      b. Transitions from one time allocation to another. The reward fractions on levers A and B changed at some time in each experimental session. To test whether the rat adapted to the new reward schedule by changing its time-allocation behavior, we consider the graph of the cumulative time on lever A against the cumulative time on lever B. The slope at any point is the rat's time-allocation ratio at that time. When the rat matches, the slope of this function equals the ratio of the rates of reward. Based on this graph, we find that the rat's time-allocation behavior follows the matching law.

Fig 5. When the cumulative stay-duration and reward curves (blue lines) parallel the curves generated from the programmed ratio of reward rates (red lines), the subject's time-allocation ratio matches the ratio of the programmed rates of reward.
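      A sketch of how such a cumulative record is built from per-stay data (the per-stay variables below are illustrative placeholders):

% Cumulative time on lever A plotted against cumulative time on lever B;
% under matching, the local slope equals the ratio of the reward rates.
stay_dur  = rand(1, 100) * 30;         % toy stay durations (s)
stay_side = 1 + (rand(1, 100) < 0.5);  % toy lever identity per stay (1 or 2)
tA = cumsum(stay_dur .* (stay_side == 1));
tB = cumsum(stay_dur .* (stay_side == 2));
plot(tB, tA);
xlabel('cumulative time on lever B (s)');
ylabel('cumulative time on lever A (s)');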
      2. Model for relating choice to reward history
      a. Different value distributions during a single whole stay and during the final 600 ms of the same stay on a target. Generally, the mean value during a single whole stay on a target was 1.6 times larger than during the final 600 ms before a switch. Notably, the standard deviation of the value for a whole stay was almost 2.8 times larger than for the final 600 ms before a switch. The table below shows the relevant statistics for all given data from the three animal subjects.
     
Fig 6. Different value distributions for a whole stay and for the final 600 ms of the same stay on target A. Typically, the mean value during a whole stay is about two times larger than during the final 600 ms of the stay, while the standard deviation of the value during a whole stay is about three to four times larger than during the final 600 ms of the stay.

Fig 7. Different value distributions for a whole stay and for the final 600 ms of the same stay on target B. The statistics of the value distributions, such as the mean and standard deviation, for the two stages on target B are almost the same as those on target A.
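      A sketch of this comparison (the value trace and stay boundaries below are synthetic placeholders, not the actual data):

% Compare leaky-integrator value over each whole stay vs its final 600 ms.
dt    = 0.1;  t = 0:dt:600;                          % time axis (s)
value = filter(1, [1 -exp(-0.1*dt)], ...
               double(rand(size(t)) < 0.01));        % toy value trace
stay_on  = [20 150 340];                             % toy stay start times (s)
stay_off = [90 300 480];                             % toy stay end times (s)
for k = 1:numel(stay_on)
    whole = value(t >= stay_on(k) & t < stay_off(k));
    final = value(t >= stay_off(k) - 0.6 & t < stay_off(k));  % last 600 ms
    fprintf('stay %d: mean %.3f vs %.3f, std %.3f vs %.3f\n', ...
            k, mean(whole), mean(final), std(whole), std(final));
end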

Discussion

We tested two main questions, motivated by choice-triggered averaging based on the leaky integrator model and by direct observation of the given data. The main conclusions are the following: (1) the rat's time-allocation behavior follows the matching law within the operant conditioning paradigm, and (2) statistical characteristics of switching behavior can be identified by applying the leaky integrator model.

Cited Works

Gallistel, C. R., Mark, T. A., King, A. P., and Latham, P. E. (2001). The rat approximates an ideal detector of changes in rates of reward: implications for the law of effect. Journal of Experimental Psychology: Animal Behavior Processes 27, 354-372.
Kahneman, D., and Tversky, A., eds. (2000). Choices, Values, and Frames. Cambridge University Press.
Sugrue, L. P., Corrado, G. S., and Newsome, W. T. (2004). Matching behavior and the representation of value in the parietal cortex. Science 304, 1782-1787.

Jinsook tested the possibility of behavioral matching, applied the leaky integrator model to relate recent reward history to current choice, and produced much of the material for the final webpage. Dan examined a second way of assessing the effectiveness of the leaky integrator model, by considering how often the rat switches within particular value bins of the expectation function, and collected the final webpage content and converted it to html. Both did a great deal of initial data analysis and plenty of Matlab coding.