Spaun performing reinforcement learning. This is a three-armed bandit task. Spaun generates a number from 0 to 2 and then either receives a reward or does not, indicated by a 1 or a 0, respectively. A reward is given with a probability of 0.12 for 'bad' actions and 0.72 for the 'good' action. In the video shown here, the 'good' action is choosing 2. In longer runs, the 'good' action switches from time to time. This task demonstrates that Spaun can change its behavior based on probabilistic rewards from the environment.
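To make the task structure concrete, here is a minimal Python sketch of the environment described above, assuming three arms (0, 1, 2), a single 'good' arm rewarded with probability 0.72, and 'bad' arms rewarded with probability 0.12. The class name `ThreeArmedBandit` and the `switch_every` parameter are hypothetical. The text only says the good arm switches from time to time, so the fixed switching interval here is an illustrative assumption, not Spaun's actual schedule.

```python
import random


class ThreeArmedBandit:
    """Hypothetical 3-armed bandit: arms 0, 1, 2 with one 'good' arm."""

    def __init__(self, good_arm=2, p_good=0.72, p_bad=0.12,
                 switch_every=None, seed=None):
        self.good_arm = good_arm
        self.p_good = p_good          # reward probability for the good arm
        self.p_bad = p_bad            # reward probability for the other arms
        self.switch_every = switch_every  # assumed fixed interval; None = never switch
        self.rng = random.Random(seed)
        self.trials = 0

    def pull(self, arm):
        """Return 1 (reward) or 0 (no reward) for choosing `arm`."""
        if arm not in (0, 1, 2):
            raise ValueError("arm must be 0, 1, or 2")
        p = self.p_good if arm == self.good_arm else self.p_bad
        reward = 1 if self.rng.random() < p else 0
        self.trials += 1
        if self.switch_every and self.trials % self.switch_every == 0:
            # Hypothetical switching rule: move the good arm to a different arm.
            self.good_arm = self.rng.choice(
                [a for a in (0, 1, 2) if a != self.good_arm])
        return reward


if __name__ == "__main__":
    bandit = ThreeArmedBandit(seed=0)
    n = 10_000
    for arm in (0, 1, 2):
        rate = sum(bandit.pull(arm) for _ in range(n)) / n
        print(f"arm {arm}: empirical reward rate {rate:.2f}")
```

Because even the good arm pays off only about 72% of the time, a learner cannot identify it from a single trial; it has to accumulate evidence across many choices.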
Task number two is a reinforcement learning task. After each question mark, Spaun must guess the 'best' number from zero to two. The best number is the one that generates the most reward. In the simulation, a reward is indicated by a 1, and the lack of a reward is indicated by a 0. However, even the best number is rewarded only probabilistically. Such tasks are called 'bandit tasks' because they are reminiscent of the chance payouts of one-armed bandits (slot machines) at casinos.
As you can see in this simulation, Spaun begins with several guesses that do not generate much reward, until it has determined that 'two' is the best value. Spaun then guesses two several times in a row. Eventually a guess of two is not rewarded, but Spaun continues to guess that value. Soon after, two goes unrewarded a second time, and Spaun changes its guess.
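The behavior described above, persisting after one unrewarded trial and switching after a second, is consistent with a learner that averages reward over trials rather than reacting to each outcome alone. The sketch below uses a standard delta-rule (Rescorla-Wagner style) value update with greedy choice. It is a conventional reinforcement learning illustration, not Spaun's spiking basal ganglia implementation, and the starting values (0.3, 0.3, 0.8) and learning rate 0.5 are hypothetical numbers chosen so that two misses are needed before the choice flips.

```python
def delta_update(values, arm, reward, alpha):
    """Return a copy of `values` with values[arm] moved toward `reward` by a fraction alpha."""
    updated = list(values)
    updated[arm] += alpha * (reward - updated[arm])
    return updated


def greedy(values):
    """Pick the arm with the highest estimated value (lowest index on ties)."""
    return max(range(len(values)), key=lambda a: values[a])


if __name__ == "__main__":
    # Hypothetical estimates after arm 2 has paid off repeatedly.
    values = [0.3, 0.3, 0.8]
    alpha = 0.5  # hypothetical learning rate
    print("initial choice:", greedy(values))
    for miss in (1, 2):
        # Arm 2 is chosen and goes unrewarded (reward = 0).
        values = delta_update(values, 2, reward=0, alpha=alpha)
        print(f"after miss {miss}: values={values}, choice={greedy(values)}")
```

With these numbers, one miss lowers arm 2's estimate to 0.4, which still exceeds the other arms, so the learner keeps choosing 2; a second miss lowers it to 0.2 and the choice switches. A larger learning rate would make the learner switch sooner, and a smaller one would make it more persistent.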
The detailed spiking patterns of the ventral striatum in Spaun are strikingly similar to those of rats performing the same kind of bandit task.