Probabilistic Reversal Learning Task

Xiaowen Zhuang; Alexandra Nelson; Berenice Coutant

Aug 31, 2024

DOI

dx.doi.org/10.17504/protocols.io.261ge51eog47/v1

¹UCSF

Alexandra Nelson

UCSF

Protocol Citation: Xiaowen Zhuang, Alexandra Nelson, Berenice Coutant 2024. Probabilistic Reversal Learning Task. protocols.io https://dx.doi.org/10.17504/protocols.io.261ge51eog47/v1

License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Protocol status: Working

We use this protocol and it's working

Created: August 31, 2024

Last Modified: August 31, 2024

Protocol Integer ID: 106778

Keywords: ASAPCRN

Funders Acknowledgements:

Aligning Science Across Parkinson's

Grant ID: ASAP-020529

Abstract

This is a protocol used to assess mouse reversal learning in an operant box. Food-deprived mice are initially trained in the operant box (milk as the reward) using the related "Basic Operant Behavioral Training" protocol in the same folder on protocols.io. See that protocol for materials and operant box design. Mice are then trained and tested on the protocol included here to assess reversal learning with probabilistic reward delivery. 

Overview/Setup

Setup: This protocol can be used after completing the Basic Operant Behavioral Training protocol, listed separately in the same folder on protocols.io. That related protocol describes the design and implementation of the operant box, as well as the training and shaping steps that precede the Probabilistic Reversal Learning Task detailed here.

Probabilistic Reversal Learning contains two phases. In the initial phase (Phase 5.1), the two choice ports are associated with an 80% (high) and a 0% (low) probability of reward, and the contingencies are switched every 7-23 rewarded trials. In the second phase (Phase 5.2), the two choice ports are associated with an 80% (high) and a 20% (low) probability of reward, and the contingencies are switched when a mouse chooses the high-probability port on >80% of the last 15 trials. There is a maximum of 250 trials per session. For each mouse, the left and right ports are randomly assigned to the “high” and “low” probability outcomes.
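
The reward contingencies of the two phases can be summarized in a short Python sketch. It assumes that each side nosepoke is rewarded by an independent random draw with the chosen port's probability; the names PhaseConfig, assign_ports and is_rewarded are hypothetical and are not taken from the operant-box software used by the authors.

```python
import random
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class PhaseConfig:
    """Hypothetical container for the per-phase parameters in this protocol."""
    name: str
    p_high: float          # reward probability at the "high" port
    p_low: float           # reward probability at the "low" port
    max_trials: int = 250  # maximum trials per session


PHASE_5_1 = PhaseConfig("Phase 5.1", p_high=0.8, p_low=0.0)
PHASE_5_2 = PhaseConfig("Phase 5.2", p_high=0.8, p_low=0.2)


def assign_ports(rng: random.Random) -> Dict[str, str]:
    """Randomly map the left/right ports to the "high"/"low" outcomes."""
    high = rng.choice(["left", "right"])
    low = "right" if high == "left" else "left"
    return {high: "high", low: "low"}


def is_rewarded(port: str, assignment: Dict[str, str],
                phase: PhaseConfig, rng: random.Random) -> bool:
    """Assumed independent Bernoulli draw with the chosen port's probability.

    rng.random() returns a value in [0, 1), so p = 0.0 never rewards.
    """
    p = phase.p_high if assignment[port] == "high" else phase.p_low
    return rng.random() < p
```

In use, assign_ports would be called once per mouse (or per session) and is_rewarded once per side nosepoke. Passing an explicit random.Random instance keeps simulated sessions reproducible.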

Figure 1: Phase 5.2/Probabilistic Reversal Learning. Schematic of evaluation Phase 5.2, with the left port assigned an 80% chance of reward and the right port assigned a 20% chance of reward. At the start of each trial, the center port light is illuminated, and the mouse must poke in the center port to initiate the trial. Both side cue lights then illuminate. Depending on which side port the mouse chooses, it receives a reward with that port's probability. The contingencies switch when the mouse has chosen the “high” probability port on >80% of the last 15 trials. Each session contains 250 trials.

Task Design

Mice are trained with one session per day consisting of 250 trials. The “high” and “low” probability ports are randomly assigned to the left and right ports at the beginning of the session. Every trial starts with a 10-second illumination of the central port light. A central nosepoke within these 10 seconds extinguishes the central port light and initiates the next phase of the trial; trials in which the mouse fails to nosepoke during this window are recorded as center omissions. After the center nosepoke, the cue lights on both side ports are illuminated for 10 seconds, and the mouse can nosepoke in either side port; trials in which the mouse fails to nosepoke at a side port during this 10-second window are recorded as side omissions. A side nosepoke at the “high” probability port results in reward delivery (10 µL) at the central port 80% of the time in both Phase 5.1 and Phase 5.2. A side nosepoke at the “low” probability port results in reward delivery at the central port 0% of the time in Phase 5.1 and 20% of the time in Phase 5.2. Once a side port is selected, reward is delivered at the central port according to these probabilities, and the central port light is illuminated for 10 seconds. When the central port light turns off, the intertrial interval (ITI) begins, lasting a randomly selected period of 20-30 seconds.

In Phase 5.1, the pairing of the left and right ports with the “high” (80%) and “low” (0%) reward probabilities is reversed periodically. Each reversal occurs after a randomly selected number of rewarded trials (7-23), and reversals continue throughout the session.
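
The Phase 5.1 reversal schedule can be sketched as a counter of rewarded trials that draws a new block length after every reversal. This is a hypothetical illustration (Phase51Reversals is not part of the authors' code), and it assumes that the 7-23 range is inclusive and that each block length is drawn uniformly; the protocol does not state the distribution.

```python
import random


class Phase51Reversals:
    """Hypothetical sketch of the Phase 5.1 schedule: swap the high/low ports
    after a randomly drawn number of rewarded trials (7-23)."""

    def __init__(self, rng: random.Random, lo: int = 7, hi: int = 23):
        self.rng = rng
        self.lo, self.hi = lo, hi
        self.high_port = rng.choice(["left", "right"])  # initial assignment
        self.n_switches = 0
        self._new_block()

    def _new_block(self) -> None:
        # Assumption: uniform draw; randint includes both end points.
        self.block_target = self.rng.randint(self.lo, self.hi)
        self.rewards_in_block = 0

    def record_trial(self, rewarded: bool) -> bool:
        """Update after a trial; return True if the contingencies just reversed."""
        if not rewarded:
            return False  # only rewarded trials count toward the block length
        self.rewards_in_block += 1
        if self.rewards_in_block >= self.block_target:
            self.high_port = "right" if self.high_port == "left" else "left"
            self.n_switches += 1
            self._new_block()
            return True
        return False
```

Because only rewarded trials advance the counter, a mouse that keeps choosing the 0% port never triggers a reversal. This matches a schedule defined in rewarded trials rather than total trials.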
Mice continue Phase 5.1 sessions for a minimum of 6 days and a maximum of 10 days. Mice reaching a total of >80 rewarded trials over 3 cumulative days are classified as ‘learners’ and proceed to Phase 5.2. A mouse that has not reached the >80 rewarded trials criterion after 10 days of training is labelled a ‘non-learner’ but still proceeds to Phase 5.2. Phase 5.2 trials have the same structure, but a contingency switch occurs only when the mouse chooses the high reward-probability port on >80% of the last 15 trials. All mice undergo 3 Phase 5.2 sessions.
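
The Phase 5.2 criterion can be sketched with a sliding window over the last 15 choices. The class Phase52Reversals is hypothetical. It also relies on two assumptions that the protocol does not specify: omission trials are skipped rather than entering the window, and the window is cleared after each reversal, so a new block needs 15 fresh choices before it can switch again.

```python
from collections import deque


class Phase52Reversals:
    """Hypothetical sketch of the Phase 5.2 rule: reverse when the mouse chose
    the "high" port on more than 80% of the last 15 trials."""

    def __init__(self, high_port: str, window: int = 15, threshold: float = 0.8):
        self.high_port = high_port
        self.threshold = threshold
        self.recent = deque(maxlen=window)  # True = chose the current "high" port
        self.n_switches = 0

    def record_choice(self, port: str) -> bool:
        """Call once per side choice (assumption: omissions are not passed in).
        Return True if this choice triggered a reversal."""
        self.recent.append(port == self.high_port)
        full = len(self.recent) == self.recent.maxlen
        # Strictly greater than 80%: 13 of 15 qualifies, 12 of 15 does not.
        if full and sum(self.recent) / len(self.recent) > self.threshold:
            self.high_port = "right" if self.high_port == "left" else "left"
            self.recent.clear()  # assumption: window restarts after a reversal
            self.n_switches += 1
            return True
        return False
```

Since 12/15 is exactly 80%, the strict ">80%" in the protocol means at least 13 of the last 15 choices must go to the "high" port.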

Data Collection and Analysis

During experimental sessions, the following information and events (timestamps) are
recorded for subsequent analysis:
-Initial random assignment of the left and right ports to the “high” and “low” probability outcomes.
-Trial start.
-First center nosepoke (beam break). The interval between trial start and the center nosepoke is the reaction time to initiate a trial.
-Side port (left or right) nosepoke (beam break). The interval between center nosepoke and side port nosepoke is the reaction time to choose a side. 
-Second center port nosepoke (beam break), for reward retrieval. 
-End of trial (time-out). A trial may end at reward retrieval, or when the animal does not perform the next nosepoke within the prescribed period (omission).
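
The two reaction times above are differences between recorded timestamps. The following minimal sketch assumes a hypothetical per-trial record (TrialEvents) with times in seconds and None for events that did not occur; the actual export format of the operant-box software may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrialEvents:
    """Hypothetical per-trial record of the timestamps listed above (seconds).
    None means the event did not occur on this trial."""
    trial_start: float
    center_poke: Optional[float] = None       # first center nosepoke
    side_poke: Optional[float] = None         # left or right nosepoke
    side: Optional[str] = None                # "left" or "right"
    reward_retrieval: Optional[float] = None  # second center nosepoke


def initiation_rt(t: TrialEvents) -> Optional[float]:
    """Trial start -> first center nosepoke; None for a center omission."""
    return None if t.center_poke is None else t.center_poke - t.trial_start


def choice_rt(t: TrialEvents) -> Optional[float]:
    """Center nosepoke -> side nosepoke; None if either poke is missing."""
    if t.center_poke is None or t.side_poke is None:
        return None
    return t.side_poke - t.center_poke


def outcome(t: TrialEvents) -> str:
    """Classify a trial as a center omission, a side omission, or a choice."""
    if t.center_poke is None:
        return "center_omission"
    if t.side_poke is None:
        return "side_omission"
    return "choice"
```

For example, a trial starting at 100.0 s with a center poke at 102.5 s and a side poke at 104.0 s has an initiation reaction time of 2.5 s and a choice reaction time of 1.5 s.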

Using the above information and timestamps, software/code can extract for each session: 
-Choice of port (high or low-probability of reward).
-Reaction time to initiate a trial, which may be used as a readout of attention and/or motivation.
-Reaction time to choose a side, which may reflect motivation regarding specific choices within blocks. 
-Number of contingency switches within a session.
-Center and side omissions, which can reflect motivation.
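
The session-level measures above can be aggregated from per-trial rows with a short Python sketch. The Trial row and summarize function are hypothetical. The sketch assumes that the port currently assigned "high" is known on every trial, for example reconstructed from the initial assignment and the reversal times. It counts a contingency switch whenever that port changes between consecutive trials, so a reversal after the last trial of the session is not counted.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class Trial:
    """Hypothetical per-trial row already extracted from the raw timestamps."""
    high_port: str                    # port assigned "high" on this trial
    choice: Optional[str]             # "left"/"right", or None for an omission
    center_omission: bool = False
    init_rt: Optional[float] = None   # trial start -> center nosepoke (s)
    choice_rt: Optional[float] = None # center nosepoke -> side nosepoke (s)


def summarize(trials: List[Trial]) -> dict:
    chosen = [t for t in trials if t.choice is not None]
    n_high = sum(t.choice == t.high_port for t in chosen)
    init_rts = [t.init_rt for t in trials if t.init_rt is not None]
    choice_rts = [t.choice_rt for t in chosen if t.choice_rt is not None]
    # A switch appears as a change in which port is "high" between trials.
    n_switches = sum(a.high_port != b.high_port for a, b in zip(trials, trials[1:]))
    return {
        "n_trials": len(trials),
        "p_high_choice": n_high / len(chosen) if chosen else None,
        "mean_init_rt": mean(init_rts) if init_rts else None,
        "mean_choice_rt": mean(choice_rts) if choice_rts else None,
        "n_switches": n_switches,
        "center_omissions": sum(t.center_omission for t in trials),
        # No side choice and not a center omission -> side omission.
        "side_omissions": sum(t.choice is None and not t.center_omission
                              for t in trials),
    }
```

Reporting means is only one option. Medians may be preferable for reaction times, which are often skewed.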
