Data from the probabilistic reversal learning task presented in 'Sex-dependent effects of early life stress on reinforcement learning and limbic cortico-striatal functional connectivity' (Zühlsdorff et al., 2023, Neurobiology of Stress).

Data collection: Animals were trained in one of eight identical operant chambers (Med Associates, St. Albans, VT, USA). A 15-inch LCD touchscreen was mounted on one wall, and a pellet receptacle with a head-entry detector and accompanying light was located on the opposite wall. Small 50% sucrose pellets were delivered by an electronic pellet dispenser (TestDiet, St. Louis, MO, USA) when trials were completed. K-Limbic software (Conclusive Marketing LTD., High Wych, UK) was used to control the touchscreens. Animals were trained or tested during a single session each day, which ended automatically after 40 min or when 200 trials were completed, whichever came first.

After habituation to the test environment, animals were trained to respond to a solid white square stimulus with an ‘X’ in the middle, presented on the right or left side of the touchscreen. The stimulus remained illuminated on the screen until the animal touched the screen. When the stimulus was touched, a 0.5 s tone was presented, the pellet receptacle light was switched on, and a food pellet was delivered. When the animal took the reward from the receptacle, the light was turned off and a 5 s inter-trial interval (ITI) was initiated. After receiving at least 100 rewards during two consecutive sessions, the rats progressed to the next stage, in which they again had to respond to the solid white square stimulus, but any touch outside the stimulus was punished by the house light being switched on for 5 s. The receptacle light was then switched on until the animal made a head entry, after which the receptacle light was turned off and a 5 s ITI began.
After 100 rewards on two consecutive sessions, the animals progressed to the deterministic reversal learning task, during which two white square stimuli were presented simultaneously on opposite sides of the screen. One was the ‘correct’ stimulus and the other was ‘incorrect’. Touching the ‘correct’ stimulus was rewarded; touching the ‘incorrect’ stimulus led to a 5 s punishment. When the animal selected the ‘correct’ stimulus 8 times consecutively, the contingencies were reversed, and the stimulus on the other side became the ‘correct’ choice. After animals achieved 4 reversals on two consecutive days, they progressed to the probabilistic reversal learning (PRL) task. This task was structured like the deterministic reversal learning task, except that only 80% of ‘correct’ responses were rewarded and 20% of ‘incorrect’ responses were rewarded at random. Contingencies were reversed when the animal reached the reversal criterion, defined as 8 consecutive correct responses. A summary of the different stages of the task with figures can be found in the supplementary materials.

PRL testing started between PND241 and PND277 and ended between PND257 and PND290. After adulthood stress (see below), PRL testing was repeated over 3 days (Fig. 1). Since there were only 3 sessions after adulthood stress, the results prior to the stressor were restricted to the first 3 sessions. For each session, trials up until the fourth reversal were included. Forty-nine animals were included in the final analysis; some animals had to be excluded due to data loss or because the animal did not complete the task correctly, e.g., by not engaging and not selecting either stimulus throughout the session (excluded: control females: 3; MS females: 5; MS males: 1).
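The reversal rule described above (8 consecutive correct responses trigger a contingency reversal) can be illustrated with a short sketch. This is not the code used to control the task or to produce the dataset; the function name, the "left"/"right" coding of responses, and the choice to reset the trial counter after each reversal are assumptions made for illustration only.

```python
from typing import List, Tuple

# Consecutive correct responses needed for a reversal (from the task description).
CRITERION = 8


def score_session(choices: List[str], start_correct: str = "left") -> Tuple[List[bool], List[int]]:
    """Label each response as correct/incorrect and record trials to criterion.

    `choices` is a hypothetical list of "left"/"right" responses, one per trial.
    Returns (correctness per trial, number of trials taken for each reversal).
    """
    correct_side = start_correct
    streak = 0           # current run of consecutive correct responses
    trials_in_block = 0  # trials since the last reversal (assumed counting rule)
    is_correct: List[bool] = []
    trials_to_criterion: List[int] = []

    for choice in choices:
        trials_in_block += 1
        hit = choice == correct_side
        is_correct.append(hit)
        streak = streak + 1 if hit else 0
        if streak == CRITERION:
            # Criterion reached: log the block length and swap the correct side.
            trials_to_criterion.append(trials_in_block)
            correct_side = "right" if correct_side == "left" else "left"
            streak = 0
            trials_in_block = 0
    return is_correct, trials_to_criterion
```

For example, 8 "left" responses followed by 2 more "left" and then 8 "right" responses would yield two reversals, the second taking 10 trials because the two responses to the old side count as incorrect. Probabilistic reward delivery (80% and 20%) is deliberately left out, since under the description above the criterion depends on the response being correct, not on whether it was rewarded.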
Data analysis: Win-stay and lose-shift behaviour were calculated from the raw data separately for trials following a correct and an incorrect response (e.g., lose-shift correct and lose-shift incorrect, respectively). The number of trials required to reach a reversal (trials to criterion), the proportion of correct responses, and the number of perseverative responses were also calculated. Perseverative responses were measured as the number of times the animal selected the same (previously correct) stimulus after a contingency reversal and before first pressing the newly correct stimulus (Jentsch et al., 2002). Win-stay behaviour was calculated as the percentage of trials on which the animal selected the same stimulus as on the previous trial, given that the previous trial was rewarded. Lose-shift behaviour was the percentage of trials on which the animal shifted its response to the other stimulus after not receiving a reward on the previous trial.

Statistical analysis: The data were centred, and linear mixed-effects (LME) models were fit to the conventional parameters, with MS, sex, adulthood stress and their interaction as fixed factors and a random intercept for each subject, followed by post-hoc pairwise comparisons of estimated marginal means (Lenth, 2018; Pinheiro et al., 2021). The residuals were checked for normality using the Shapiro-Wilk test; normality, an assumption required for fitting an LME model, was confirmed in our data (Schielzeth et al., 2020). All statistical analyses were run in R ('R: A Language and Environment for Statistical Computing'; R Core Team, 2020). For all analyses, the significance level was set at p = 0.05.

File information:
PRL-summary-pre-stress.csv: data prior to adulthood stress
PRL-summary-post-stress: data post adulthood stress

Data column definitions:
File name: name of the file the data were extracted from.
Date: date of data collection.
ID: rat ID.
PropCor: proportion of correct responses.
Trials-to-crit: trials to criterion.
Win-stay-corr: win-stay behaviour after a correct response.
Lose-shift-corr: lose-shift behaviour after a correct response.
Win-stay-incorr: win-stay behaviour after an incorrect response.
Lose-shift-incorr: lose-shift behaviour after an incorrect response.
Perseverative_resp: perseverative responses.
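
To make the win-stay and lose-shift definitions concrete, the following sketch computes the four percentages from trial-level data. It is an illustration under stated assumptions, not the authors' analysis code: the function name and input format are hypothetical, the denominator is assumed to be the number of trials that follow a rewarded (or unrewarded) trial of the given type, and session boundaries and reversals are not treated specially.

```python
from typing import Dict, List, Optional


def stay_shift_rates(
    choices: List[str], rewarded: List[bool], correct: List[bool]
) -> Dict[str, Optional[float]]:
    """Win-stay / lose-shift percentages, split by whether the previous
    response was correct or incorrect.

    All three inputs are hypothetical per-trial lists of equal length.
    A rate is None when there was no opportunity to observe it.
    """
    keys = ("win_stay_corr", "win_stay_incorr", "lose_shift_corr", "lose_shift_incorr")
    counts = {k: [0, 0] for k in keys}  # [occurrences, opportunities]

    for t in range(1, len(choices)):
        suffix = "corr" if correct[t - 1] else "incorr"
        stayed = choices[t] == choices[t - 1]
        if rewarded[t - 1]:
            key, hit = f"win_stay_{suffix}", stayed       # rewarded, then repeated
        else:
            key, hit = f"lose_shift_{suffix}", not stayed  # unrewarded, then switched
        counts[key][0] += int(hit)
        counts[key][1] += 1

    return {k: (100.0 * h / n if n else None) for k, (h, n) in counts.items()}
```

The output keys loosely mirror the column names above (Win-stay-corr, Lose-shift-corr, Win-stay-incorr, Lose-shift-incorr), but the values in the CSV files may have been computed with different conventions, for instance a different denominator or per-session averaging.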