Learning from reward and punishment
An interview with Professor Adam Kepecs, Cold Spring Harbor Laboratory, conducted by April Cashin-Garbutt, MA (Cantab)
Animals learn to survive by reinforcing their successes, but how does the brain achieve this? Adam Kepecs, Professor and Chair of the Neuroscience Program at Cold Spring Harbor Laboratory, recently gave a seminar at the Sainsbury Wellcome Centre where he outlined his research on a cortical circuit for reinforcement learning mediated by a type of inhibitory neuron. I caught up with him to find out more.
How does reinforcement learning differ from supervised learning and how important are these two types of learning in animals?
Reinforcement learning is based on the consequences of your actions. Nobody instructs you what to do in reinforcement learning. Instead, you see whether your actions lead to success or failure and then use this information to update your beliefs.
Supervised learning is how deep networks learn, for instance. It involves making a prediction and comparing it, in a very granular way, to what should have been predicted.
Reinforcement learning and supervised learning have much in common. The difference is that reinforcement learning happens to us all the time, as we are constantly learning from the consequences of our own actions, whereas we are not yet sure whether supervised learning occurs in the brain, as there is a paucity of evidence.
What did your research reveal about the role of cortical inhibitory neurons in responding to reward and in contributing to learning?
We study multiple different types of inhibitory neurons, but I am just going to focus on a rare subtype that is called vasoactive intestinal peptide (VIP) interneurons. VIP interneurons make up around 10% of all inhibitory neurons, so that’s around 1-2% of cortex, and they really became accessible with the advent of genetics. Thanks to my colleague Josh Huang at CSHL, we can now target them, study their circuit function and also how they respond during behaviour.
The first surprise came when we realised that VIP interneurons in auditory cortex respond to reward and punishment. We then began to explore why a specific but rare type of neuron would do something so non-auditory in auditory cortex.
Learning is a multi-layered construct: you can learn about the values of things, you can learn about the state of the world, you can even learn to learn. We don’t quite know what behaviour VIP neurons in auditory cortex contribute to, but what we can see is that their responses track learning.
To what extent did VIP interneurons respond to sensory cues predicting reinforcement? How does this change with experience?
VIP interneurons respond not only to reward but also to cues that predict reward or punishment, and even to visual cues in auditory cortex!
Once a sensory cue predicts the reward or punishment, the direct responses to those reinforcers are diminished, and that’s exactly what you would expect for a prediction error signal. Prediction errors have been central to reinforcement learning, and so we believe that these interneurons end up guiding learning in some manner.
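The diminishing response to a predicted reinforcer can be illustrated with a minimal sketch. It assumes a textbook Rescorla-Wagner-style update, not a model of the recorded neurons, and the function name, learning rate and trial count are hypothetical choices made for illustration.

```python
def simulate_cue_learning(n_trials=20, learning_rate=0.3, reward=1.0):
    """Track the prediction error at reward delivery across repeated cue-reward pairings.

    Assumption: a Rescorla-Wagner-style rule in which the cue's predicted value
    moves a fraction `learning_rate` of the way towards the delivered reward.
    """
    predicted_value = 0.0  # what the cue predicts before any learning
    errors = []
    for _ in range(n_trials):
        # Prediction error: delivered outcome minus what the cue predicted.
        delta = reward - predicted_value
        errors.append(delta)
        # Update the cue's prediction in proportion to the error.
        predicted_value += learning_rate * delta
    return errors, predicted_value


errors, final_value = simulate_cue_learning()
print(f"error on first trial: {errors[0]:.3f}")
print(f"error on last trial:  {errors[-1]:.3f}")
```

In this toy model the error at reward delivery is largest on the first trial and shrinks on every later trial as the cue comes to predict the reward.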
How does the VIP interneuron response to punishment differ from reward?
The reward and punishment responses are roughly the same. The prediction error is ‘unsigned’, which means that these neurons respond whenever there is a surprise, whether things are better than expected or worse than expected.
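The difference between a signed and an unsigned prediction error can be sketched in a few lines. The function names and the numeric outcome values below are hypothetical, chosen only to illustrate the distinction, and are not taken from the experiments.

```python
def signed_prediction_error(outcome, expected):
    """Positive when the outcome is better than expected, negative when worse."""
    return outcome - expected


def unsigned_prediction_error(outcome, expected):
    """Magnitude of the surprise only, regardless of its direction."""
    return abs(outcome - expected)


# Assumed coding for illustration: reward = +1, punishment = -1, nothing expected = 0.
for label, outcome in [("unexpected reward", 1.0), ("unexpected punishment", -1.0)]:
    signed = signed_prediction_error(outcome, expected=0.0)
    unsigned = unsigned_prediction_error(outcome, expected=0.0)
    print(f"{label}: signed={signed:+.1f}, unsigned={unsigned:.1f}")
```

Under these assumptions the signed error distinguishes reward from punishment, while the unsigned error is the same for both and falls to zero when the outcome is fully expected.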
We see the same signal in several cortical areas, which lends itself to the speculation that the signal informs each cortical area about some unexpected consequence and lets each area figure out what went wrong. The brain is highly distributed, so it is unclear what should be updated and what in fact led to the final action. In this sense, we believe the right thing to do is to broadcast to everyone that the outcome wasn’t expected and let each area see what it can do about it.
Each area represents a very small piece of the world and could use this surprise signal to update its internal model of what’s expected. One can imagine that, over time, this kind of signal would enable an area to learn a better model and so do better. But if that area has nothing to do with the decision, say it’s an auditory decision and the area is visual cortex, there’s nothing for it to do.
How do the neurons coordinate at a circuit level?
Even though VIP neurons are inhibitory they end up specialising in the inhibition of other inhibitory neurons, so they end up disinhibiting. The particular circuit arrangement is such that they might specifically gate particular pathways because of the identities of the neurons inhibited. This is to be expected from a circuit perspective, as once a surprise is reported, the activation of VIP neurons impacts other neurons, creating a brief window of opportunity for learning. This disinhibitory window might allow cortex to deal with unexpected outcomes.
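The disinhibitory arrangement can be sketched as a toy rate model. This is a deliberately simplified illustration under assumed weights and a rectified-linear response, not a description of the actual circuit; all names and numbers are hypothetical.

```python
def relu(x):
    """Firing rates cannot be negative."""
    return max(0.0, x)


def excitatory_rate(sensory_input, vip_rate,
                    interneuron_drive=1.0, w_vip=1.0, w_inh=1.0):
    """Toy disinhibition: VIP cells inhibit an interneuron that inhibits the excitatory cell.

    Assumptions: one intermediate inhibitory population, unit weights,
    and steady-state rates with no dynamics.
    """
    # VIP activity suppresses the intermediate inhibitory population.
    interneuron_rate = relu(interneuron_drive - w_vip * vip_rate)
    # The excitatory cell responds to its input minus the remaining inhibition.
    return relu(sensory_input - w_inh * interneuron_rate)


no_surprise = excitatory_rate(sensory_input=0.8, vip_rate=0.0)
surprise = excitatory_rate(sensory_input=0.8, vip_rate=1.0)
print(f"VIP silent: {no_surprise:.1f}, VIP active: {surprise:.1f}")
```

With VIP cells silent, the intermediate inhibition blocks the input in this sketch; when VIP cells are active, that inhibition is removed and the same input passes through, which is the sense in which the pathway is gated.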
About Professor Adam Kepecs
Adam Kepecs, Professor and Chair of the Neuroscience Program at Cold Spring Harbor Laboratory, studies the neurobiology of decision-making. After receiving his B.Sc. degree in computer science and mathematics at Eötvös Loránd University, Hungary, he switched to studying the brain, completing his Ph.D. at Brandeis University in theoretical neuroscience. For the past decade he has led a research laboratory at CSHL where he employs sophisticated behavioral paradigms and electrophysiological, optical and molecular techniques to study the neural circuitry underlying decision-making in rodents.