Rat Behavior in Response to Reward Changes:

A Matching-Maximizing Model

Mathew Tantama, Margaret Blayney, Dezhe Jin

9.290 Spring 2004 Midterm Project

 

 

Abstract:

 

Gallistel and co-workers have previously examined rat behavior in response to rewards on concurrent variable interval schedules.¹ Pure matching theory weakly predicts behavior when the reward ratio is unity; however, even in this simple case, switching behavior is observed to depend on the reward rate: the switching rate changes when both reward rates increase or decrease together while their ratio remains unity. Matching theory does not explain these modulations of the switching rate. Here, we build a matching-maximizing model in an attempt to better fit the experimental data and, specifically, to account for the rat’s switching rate by adding a rule that maximizes rewards. To validate this approach, we consider two cases. In the first case, we approximate rat behavior as governed by a constant frequency of switching. This case produces a linear model that appears to adequately predict the rat’s switching rate when the reward ratio is unity and the reward rates vary; in general, however, the model predicts the observed switching rates poorly. In the second case, we approximate the rat’s behavior as governed by a Poisson distribution of stay durations. Surprisingly, the second case converges to the same linear model with no improvement in the theoretical fit, indicating a failure of this approach. The failure of the model in these two cases may indicate that the rat does not use a matching-maximizing strategy, or it may be that the behavioral approximations made in the two cases were too crude and that more experimentally relevant distribution functions must be used to evaluate this approach adequately.

 

Introduction:

 

In the experiment conducted by Gallistel and co-workers,¹ rats choose between two levers in order to receive a reward in the form of direct stimulation of a pleasure center in the brain. The levers are set on a concurrent variable interval schedule with exponentially distributed baiting delays. A schematic for the experimental setup is presented below.

 

 

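On each lever, rewards become available according to a Poisson process with reward rate λ. It is important to note that once a lever is baited, it remains baited until the reward is harvested, so the probability that an unattended lever holds a reward grows with the time since the rat last visited it.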
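The schedule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the experimental protocol or the model developed in this paper: the simulated rat alternates between the levers with fixed stay durations and harvests a reward the instant one is available on the lever it occupies. The function name `simulate_concurrent_vi` and all of its parameters are hypothetical.

```python
import random


def simulate_concurrent_vi(rate_1, rate_2, stay_1, stay_2, total_time, seed=0):
    """Count rewards harvested on each lever of a concurrent VI schedule.

    Assumptions (illustrative only): the rat alternates between levers,
    staying a fixed duration on each, and harvests instantly whenever the
    lever it occupies is baited.
    """
    rng = random.Random(seed)
    rates = (rate_1, rate_2)
    stays = (stay_1, stay_2)
    # Absolute time at which each lever next becomes baited; the delay is
    # exponentially distributed with the lever's reward rate.
    bait_time = [rng.expovariate(rates[0]), rng.expovariate(rates[1])]
    rewards = [0, 0]
    t, side = 0.0, 0
    while t < total_time:
        leave = min(t + stays[side], total_time)
        # Harvest every reward that is already waiting, or that becomes
        # available, during this visit.
        while bait_time[side] <= leave:
            # A lever baited while the rat was away stays baited until the
            # rat arrives, so the harvest happens no earlier than t.
            harvest = max(bait_time[side], t)
            rewards[side] += 1
            # The lever is re-armed only after the reward is harvested.
            bait_time[side] = harvest + rng.expovariate(rates[side])
        t, side = leave, 1 - side
    return rewards
```

Calling `simulate_concurrent_vi(1.0, 1.0, 0.5, 0.5, 2000.0)` returns a two-element list of reward counts. Because a baited lever holds its reward, very long stays leave the other lever sitting baited and unharvested, which is one way to see that the harvest on this schedule depends on how often the rat switches.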

 

In the experiment, the rats are subjected to several phases. In the first and third phases, rats are presented with a constant reward ratio. In the middle phase, the rats experience variable reward ratios, such that at some time during a given trial a single unannounced change in the reward ratio occurs. Gallistel and co-workers observed that the rats adapted to the change in reward ratio at varying speeds. Interestingly, in some cases the rat modified its behavior immediately in response to the change. According to Gallistel, this reaction is faster than a feedback model such as reinforcement learning can predict. As an alternative to feedback models, they consider matching and show that matching allows an expected income to be calculated without reference to previous behavior. Matching may therefore model responses to a change in reward ratio better; however, it does not predict how the switching rate depends on the reward rate, as is observed in the experiment. Thus, a pure matching model provides an incomplete account of behavior.
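To make the limitation concrete, matching can be written in its standard time-allocation form. The notation here is introduced for illustration and is an assumption, not taken from the original: T_i is the expected stay duration on lever i and I_i is the income from lever i. The switching-rate expression further assumes simple alternation, with two switches per cycle of length T_1 + T_2.

```latex
\[
  \frac{T_1}{T_2} = \frac{I_1}{I_2}
\]
% Rescaling both stay durations by any c > 0 leaves the ratio unchanged,
% but changes the switching rate:
\[
  \frac{c\,T_1}{c\,T_2} = \frac{T_1}{T_2},
  \qquad
  \frac{2}{T_1 + T_2} \;\longrightarrow\; \frac{2}{c\,(T_1 + T_2)} .
\]
```

Under these assumptions, matching fixes only the ratio of stay durations; their overall scale, and hence the switching rate, is left undetermined.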

 

We would like to further develop a matching model that can explain behavior, including reward-rate-dependent modulations in the switching rate. Matching provides a rule for expected incomes, and here we include an additional rule to maximize total rewards. We construct an algorithm for building this “matching-maximizing” model and consider two simple cases in order to evaluate its utility.

 
