Prospective and retrospective temporal difference learning.

¹Gatsby Computational Neuroscience Unit, UCL, London, WC1N 3AR, UK. dayan@gatsby.ucl.ac.uk

Network (Bristol, England)

February 21, 2009

Summary

This summary is machine-generated.

Monkeys sometimes behave maladaptively when rewards are delayed. A standard temporal difference reinforcement learning model, in which the average reward per step serves as a baseline, can reproduce this behavior without negative consequences for choice prediction.

Area of Science:

Behavioral neuroscience
Computational neuroscience
Reinforcement learning

Background:

Monkeys exhibit maladaptive behavior in tasks with delayed rewards, potentially due to Pavlovian influences.
Previous research found that performance in a given task state depends on recent history, which appears to contradict standard reinforcement learning (RL) models.
An alternative temporal difference (TD) model was proposed to account for this history-dependent behavior.

Purpose of the Study:

To demonstrate that a standard TD model can replicate history-dependent behavior observed in monkeys.
To show that this standard model avoids negative consequences for choice prediction.
To highlight the role of average reward per step as a crucial baseline in RL.
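
For orientation, the baseline role of the average reward per step can be written in the usual average-reward form of the TD error. The symbols below (\delta_t, r_t, \bar{r}, V) are standard textbook notation assumed here for illustration; this summary does not reproduce the paper's own equations.

```latex
\delta_t = r_t - \bar{r} + V(s_{t+1}) - V(s_t)
```

Here \bar{r} is a running estimate of the average reward per step, so an immediate reward r_t counts as good or bad only relative to that baseline.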

Main Methods:

Analysis of behavioral data from tasks with systematically delayed rewards.
Modeling using a standard temporal difference (TD) reinforcement learning framework.
Investigating the influence of average reward per step as a dynamic baseline.
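
As a rough illustration of the modeling approach listed above, the following is a minimal sketch of tabular average-reward TD(0). It is not the paper's implementation: the three-state cyclic task, the function name `average_reward_td`, and the step sizes `alpha` and `eta` are all assumptions chosen to keep the example small.

```python
def average_reward_td(rewards, n_sweeps=2000, alpha=0.1, eta=0.01):
    """Tabular average-reward TD(0) on a deterministic cycle of states.

    Hypothetical toy task: state i always leads to state (i + 1) % n, and
    rewards[i] is the reward received on leaving state i.
    Returns the learned state values and the learned average reward per step.
    """
    n = len(rewards)
    values = [0.0] * n   # differential (baseline-relative) state values
    avg_reward = 0.0     # running estimate of average reward per step
    for _ in range(n_sweeps):
        for s in range(n):
            s_next = (s + 1) % n
            # TD error: immediate reward is measured against the baseline.
            delta = rewards[s] - avg_reward + values[s_next] - values[s]
            values[s] += alpha * delta
            avg_reward += eta * delta
    return values, avg_reward


# Reward arrives only on the last of three steps (a delayed reward).
values, avg_reward = average_reward_td([0.0, 0.0, 1.0])
print("average reward per step:", round(avg_reward, 4))
print("values relative to first state:",
      [round(v - values[0], 4) for v in values])
```

In this toy task the baseline settles at the mean reward per step, and values rise as the delayed reward approaches. Only differences between values are meaningful, since adding a constant to every value leaves each TD error unchanged.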

Main Results:

A standard TD model successfully replicates the observed history-dependent behavior in monkeys.
In the model, the average reward per step acts as the baseline against which immediate rewards are measured.
Subtle changes to this baseline caused by recent history can markedly shift predictions, and therefore behavior.
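
To see why a small shift in the baseline can matter, here is a hedged arithmetic sketch. It assumes a deterministic path on which only the final step is rewarded, and the numbers (10 steps, baselines of 0.08 and 0.10) are hypothetical, not values from the paper. The helper `differential_value` is likewise introduced only for illustration.

```python
def differential_value(steps_to_reward, reward, avg_reward):
    """Value of a state relative to the state just after reward delivery.

    Assumes every step earns nothing except the last, which earns `reward`,
    and that each step is charged the baseline `avg_reward`.
    """
    return reward - steps_to_reward * avg_reward


# Same state, same distance to reward, slightly different baselines.
low_baseline = differential_value(steps_to_reward=10, reward=1.0, avg_reward=0.08)
high_baseline = differential_value(steps_to_reward=10, reward=1.0, avg_reward=0.10)
shift = low_baseline - high_baseline
print(low_baseline, high_baseline, shift)
```

Because the baseline is charged on every step, its effect accumulates with delay: under these assumptions a change of 0.02 in the baseline moves the value of a state ten steps from reward by about 0.2.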

Conclusions:

Standard TD models, incorporating average reward per step, can explain complex behavioral phenomena.
This approach reconciles observed monkey behavior with established RL theory.
Understanding the baseline's dynamic nature is key to accurate prediction of behavior in delayed reward tasks.