Transition Differential Summation Model of Human Behaviour in a 2-Choice Scenario

By Shaun Foley and Jon Wetzel

Approach

Our model is based on reward differentials, classified by the button choices spanning consecutive trials: 0->0, 0->1, 1->0, or 1->1. On each trial the reward delta, R(i-1) - R(i), is computed. This is added to the sum in the appropriate transition counter with two modifications: it is multiplied by rscale, and rbase is added to it. Two sets of transition differential sums (TDS) are kept. One is global, tracking all trials. The other is local: it is initialised to the value of the global counter scaled by oldscale, and the previous anum trials are then re-accumulated on top of it. For the first ne1 trials, the choice is random, both to gather information about what happens upon different transitions and to emulate the participants' initial exploration.
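
To make this bookkeeping concrete, a minimal Python sketch is given below; the data structure (a dict keyed by the four transitions) and the exact update order are our own assumptions, not a specification of the original implementation:

    def update_tds(global_tds, choices, rewards, rscale, rbase, oldscale, anum):
        """Add the newest transition to the global TDS, then rebuild the local
        TDS from the down-weighted global sums plus the last anum trials."""
        i = len(rewards) - 1
        trans = (choices[i - 1], choices[i])          # e.g. (0, 1) for a 0->1 press
        delta = rewards[i - 1] - rewards[i]           # reward delta R(i-1) - R(i)
        global_tds[trans] += rscale * delta + rbase   # scaled delta plus constant

        # Local TDS: start from the global sums scaled down by oldscale ...
        local_tds = {t: oldscale * s for t, s in global_tds.items()}
        # ... then re-accumulate the most recent anum transitions.
        for j in range(max(1, i - anum + 1), i + 1):
            t = (choices[j - 1], choices[j])
            local_tds[t] += rscale * (rewards[j - 1] - rewards[j]) + rbase
        return local_tds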

Now the TDS corresponding to the previous choice are examined. For instance, if the last choice was 0, the 0->0 and 0->1 TDS are the relevant variables; call these sums a and b, respectively. If |a-b| > bigdiff, the difference is deemed overwhelming, and the button corresponding to the larger reward is chosen. Otherwise, the probability of choosing a button is P = .5 - (a-b)/(2*bigdiff), with the operands ordered so that as the difference approaches bigdiff, the probability that the "better" button will be chosen approaches 1.
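
A matching sketch of the decision rule follows; treating the larger TDS as marking the "better" button is our reading of the sign convention:

    import random

    def choose_next(last_choice, tds, bigdiff, trial_index, ne1):
        """Pick the next button from the two TDS reachable from the previous
        choice, following the rule described above."""
        if trial_index < ne1:                      # initial random exploration
            return random.randint(0, 1)
        a = tds[(last_choice, 0)]                  # TDS for pressing button 0 next
        b = tds[(last_choice, 1)]                  # TDS for pressing button 1 next
        if abs(a - b) > bigdiff:                   # overwhelming difference
            return 0 if a > b else 1               # assumes larger TDS = better
        p_one = 0.5 - (a - b) / (2 * bigdiff)      # probability of choosing button 1
        return 1 if random.random() < p_one else 0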

Each part of the model is meant to correspond to predicted behavioural motivations. Transitions and differences are the core, as people notice these more readily than absolute values or choices. The global TDS represents the overall impression people have of what happens upon given choices. For a given press, however, this is scaled down by oldscale, giving more recent events more weight. The local TDS represents a subject's short-term memory and realisation that they may be in a dynamic situation. The scaled reward delta and the magnitude of bigdiff are important, since the TDS differences are converted to probabilities, which must lie between 0 and 1. The constant rbase factor gives more weight to previous choices, since a button chosen more in the past is likely to be chosen more in the future. From a computational standpoint, this also helps buffer against unrealistic changes that might occur because of randomness.

Parameters and Training

There are six parameters that need to be determined: ne1, anum, oldscale, rscale, rbase, and bigdiff. At first we measured the success of the model by giving the simulator the game history and getting its prediction for the subject's next choice. This causes problems when the distribution is around .5, however. For example, if our model chooses 1 with probability .5 on two trials, and the subject actually chooses 0 then 1, we have only a 25% chance of matching that, even though the distribution is as expected. Since most subjects did not escape the local maximum, they often oscillated about it, exacerbating this problem.

Since the model naturally provides choice-distribution predictions, we used those instead and trained on the first pilot data as follows. There is a decision point every ten trials: the simulator is given the choice and reward history of a subject and returns its expected choice probability. The actual probability is computed by averaging the choice in question and the four surrounding choices. The difference between the expected probability and the actual distribution is the error, which is averaged over all decision points. In practice, we examined 1 minus the error ("correctness"), since we had designed our other training functions to maximise rather than minimise, and subtracting here was the easier change to make.
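
A sketch of this scoring scheme is shown below; encoding choices as 0/1 and treating predicted_probs[i] as the model's predicted probability of choosing button 1 at trial i are our own framing:

    def correctness(subject_choices, predicted_probs, step=10, halfwidth=2):
        """Average 1 - |predicted - observed| over decision points placed every
        step trials; the observed probability averages the choice at the decision
        point and the halfwidth choices on either side of it."""
        errors = []
        for i in range(step, len(subject_choices) - halfwidth, step):
            window = subject_choices[i - halfwidth : i + halfwidth + 1]
            observed = sum(window) / len(window)      # empirical P(choose 1)
            errors.append(abs(predicted_probs[i] - observed))
        return 1.0 - sum(errors) / len(errors)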

The relevant magnitude of each variable wasn't known, so we first chose a wide range of values, giving 170 sets of parameters. Because our model is stochastic, we ran each set twelve times and averaged the "correctness." We then ran a hill-climbing algorithm on individual parameters of the best-performing sets. Happily, these tended to gravitate toward the same values, which we took as our parameter vectors.
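
Roughly, the search looked like the sketch below; the evaluate(params) callback, step sizes, and iteration count are hypothetical placeholders rather than the exact procedure we ran:

    import random

    def mean_correctness(params, evaluate, runs=12):
        """Average several simulations, since the model is stochastic."""
        return sum(evaluate(params) for _ in range(runs)) / runs

    def hill_climb(params, evaluate, step_sizes, iterations=100):
        """Coordinate-wise hill climbing: nudge one parameter at a time and keep
        the change only if the averaged correctness improves."""
        best = mean_correctness(params, evaluate)
        for _ in range(iterations):
            name = random.choice(list(step_sizes))    # pick one parameter to perturb
            candidate = dict(params)
            candidate[name] += random.choice((-1, 1)) * step_sizes[name]
            score = mean_correctness(candidate, evaluate)
            if score > best:                          # keep only improvements
                params, best = candidate, score
        return params, best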

Dataset   Mean Correctness   ne1   anum   oldscale   rscale   rbase   bigdiff
1         .8885              10    6      0.         .26      .07     .35
1r        .8857              10    3      0.         .92      .11     .20
2         .9266              10    4      0.         .90      .17     .36
2r        .9447              10    6      0.         .46      .17     .56
3         .8509              15    6      .01        .66      .06     .43
3r        .8577              15    4      .01        .96      .07     .35

Results

Parameters and Method Trends

From the above parameter vectors, we can begin to explain the subjects' behaviour in terms of the parameters. There are two main areas to interpret: differences between M1 and M3, and differences between the forward method and the reverse method.

M1 vs. M3

1) The number of trials required to gather background data (ne1) is a bit larger in M3.

2) The difference in the "goodness" of the two choices needed for the decision to become obvious (bigdiff) was greater in M3.

3) The relevance of choices made in the distant past (oldscale) was non-existent in M1, and slight in M3.

4) The sensitivity to changes in reward was less in M1 than in M3, but not uniformly so. (See point 8.)

The model shows that M3 was harder for subjects than M1. This fits the experimental results, as the average rewards were lower in M3 than in M1. The difficulty most likely results from the larger "trap" area in M3.

M1-M2-M3 (MF) vs. M3-M2-M1 (MR)

5) The difference in the "goodness" of the two choices needed for the decision to become obvious (bigdiff) was greater in MF.

6) The slight weight added to past rewards (rbase) is greater in MR than in MF.

7) The number of trials considered "recent" enough to have a great effect on decision making (anum) was greater in MF.

8) The sensitivity to reward changes (rscale) increases greatly in MF, but very little in MR.

9) The overall sensitivity to changes in reward (rscale) was greater in MR.

Drawing on our conclusion that M3 is more difficult than M1, the model shows with (6) that the importance placed on past choices during M3 carries over to M1 in the reverse case. However, decisions in MF required more consideration (5) and memory (7). Points (8) and (9) indicate that the subject concentrates more on the red reward meter during M3 and, if started on M3, carries this focus over to M1 in the reverse case.

Another point of interest is that "anum", the number of trials kept in short-term memory, is quite small compared to the number of trials in the experiment. This leads us to conclude that a small number of recent trials almost entirely determines a subject's actions. Also, the "rscale" parameter is greater than the "rbase" parameter in every method, indicating that changes in reward affect choice more than the overall reward level.

Parameters and Pilot 2

Using the parameters trained on the first pilot data, we tested the model on the corresponding data sets from the second pilot. The testing process again averaged the results of twelve simulations to mitigate random noise.
Dataset   Mean Correctness
1         .8720
1r        .8228
2         .9495
2r        .9580
3         .8369
3r        .8244

Collaboration

Experimental Data Collection: JW and SF

Model Development: SF

Model Training: JW and SF

Analysis of Parameter Trends: JW

Analysis of Model on Pilot 2 Data: SF