Prospective and retrospective temporal difference learning.

Peter Dayan¹

  • ¹Gatsby Computational Neuroscience Unit, UCL, London, WC1N 3AR, UK. dayan@gatsby.ucl.ac.uk

Network (Bristol, England) | February 21, 2009
Summary
This summary is machine-generated.

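Monkeys sometimes perform poorly in tasks where rewards are delayed, and their performance depends on recent history. A standard temporal difference (TD) reinforcement learning model, in which the average reward per step serves as a baseline, can reproduce this behavior without harming predictions of choice.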

Area of Science:

  • Behavioral neuroscience
  • Computational neuroscience
  • Reinforcement learning

Background:

  • Monkeys exhibit maladaptive behavior in tasks with delayed rewards, potentially due to Pavlovian influences.
  • Previous research found that performance in a given task state depends on recent history, which appeared inconsistent with standard reinforcement learning (RL) models.
  • An alternative temporal difference (TD) model was proposed to account for this history-dependent behavior.

Purpose of the Study:

  • To demonstrate that a standard TD model can replicate history-dependent behavior observed in monkeys.
  • To show that the standard model, unlike the proposed alternative, avoids negative consequences for predicting choice.
  • To highlight the role of average reward per step as a crucial baseline in RL.

Main Methods:

  • Analysis of behavioral data from tasks with systematically delayed rewards.
  • Modeling using a standard temporal difference (TD) reinforcement learning framework.
  • Investigating the influence of average reward per step as a dynamic baseline.
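
For reference, a TD prediction error with an average-reward baseline is commonly written as below. The summary itself gives no equations, so the symbols here (prediction error \delta_t, reward r_t, baseline estimate \bar{\rho}_t, state value V, learning rates \alpha and \eta) are assumed notation for a generic average-reward TD rule, not necessarily the formulation used in the paper.

```latex
\begin{align}
\delta_t &= r_t - \bar{\rho}_t + V(s_{t+1}) - V(s_t), \\
V(s_t) &\leftarrow V(s_t) + \alpha\,\delta_t, \\
\bar{\rho}_{t+1} &= \bar{\rho}_t + \eta\,(r_t - \bar{\rho}_t).
\end{align}
```

Because \bar{\rho}_t is a running average of recent rewards, it carries information about recent history into every prediction error, which is the sense in which the baseline is dynamic.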

Main Results:

  • A standard TD model successfully replicates the observed history-dependent behavior in monkeys.
  • In the model, the average reward per step acts as a baseline against which predictions are made.
  • Subtle history-dependent changes in this baseline significantly alter the predicted behavior.
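
The role of the baseline can be illustrated with a minimal sketch of tabular average-reward TD(0). This is a generic textbook-style rule under stated assumptions, not the model from the paper: the task is a hypothetical deterministic cycle of states with one reward per cycle, and the function name `average_reward_td` and the parameters `alpha`, `eta`, and `n_sweeps` are illustrative choices.

```python
def average_reward_td(rewards, n_sweeps=2000, alpha=0.1, eta=0.01):
    """Tabular average-reward TD(0) on a deterministic cycle of states.

    Assumption (hypothetical task): state i always leads to state
    (i + 1) % n, and rewards[i] is received on leaving state i.
    Returns the learned state values and the baseline estimate rho.
    """
    n = len(rewards)
    values = [0.0] * n
    rho = 0.0  # running estimate of the average reward per step
    for _ in range(n_sweeps):
        for s in range(n):
            s_next = (s + 1) % n
            # Prediction error measured relative to the baseline rho.
            delta = rewards[s] - rho + values[s_next] - values[s]
            values[s] += alpha * delta
            # The baseline tracks recent rewards, so it depends on history.
            rho += eta * (rewards[s] - rho)
    return values, rho


# A four-step schedule and a two-step schedule, each rewarded on the last step.
values_long, rho_long = average_reward_td([0.0, 0.0, 0.0, 1.0])
values_short, rho_short = average_reward_td([0.0, 1.0])
print(f"baseline, four-step schedule: {rho_long:.3f}")
print(f"baseline, two-step schedule:  {rho_short:.3f}")
```

In this sketch the baseline settles near one reward per four steps for the longer schedule and near one reward per two steps for the shorter one, so the same reward yields a different prediction error depending on which schedules were experienced recently.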

Conclusions:

  • Standard TD models that use the average reward per step as a baseline can explain the history-dependent behavior observed in delayed-reward tasks.
  • This approach reconciles observed monkey behavior with established RL theory.
  • Understanding the baseline's dynamic nature is key to accurate prediction of behavior in delayed reward tasks.