When and Why Hyperbolic Discounting Matters for Reinforcement Learning Interventions.

Ian M Moore¹, Eura Nofshin¹, Siddharth Swaroop¹

¹Department of Computer Science, Harvard University, USA.

Reinforcement Learning Journal, March 16, 2026

Summary

This summary is machine-generated.

AI agents can guide humans more effectively when they model how people discount future rewards. This study derives a fixed exponential discount factor that approximates hyperbolic discounting. The approximation is guaranteed not to miss necessary interventions and, surprisingly, outperforms true hyperbolic models in online learning.

Keywords:

Agent-based modeling of humans; Human-AI interaction; Hyperbolic discounting

Area of Science:

Artificial Intelligence
Cognitive Science
Reinforcement Learning

Background:

Human decision-making often exhibits hyperbolic discounting of future rewards.
Most current reinforcement learning (RL) models of human behavior instead assume exponential discounting, which simplifies planning.
This discrepancy poses challenges for AI agents aiming to effectively guide human behavior.
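The gap between the two discounting models can be illustrated with a small sketch. It assumes the standard textbook forms, a hyperbolic weight 1 / (1 + k·t) and an exponential weight γ^t; the parameter values, reward sizes, and function names below are hypothetical and are not taken from the study. The example shows a preference reversal: the hyperbolic chooser prefers the larger, later reward when both are far away, then switches to the smaller, sooner reward once it becomes immediate, whereas the exponential chooser never switches.

```python
def hyperbolic(delay, k=1.0):
    """Hyperbolic discount weight 1 / (1 + k * delay); k is an illustrative value."""
    return 1.0 / (1.0 + k * delay)


def exponential(delay, gamma=0.9):
    """Exponential discount weight gamma ** delay; gamma is an illustrative value."""
    return gamma ** delay


def prefers_later(discount, small, t_small, large, t_large, now):
    """True if, viewed from time `now`, the larger-later reward is valued more."""
    value_small = small * discount(t_small - now)
    value_large = large * discount(t_large - now)
    return value_large > value_small


# Hypothetical choice: 10 units at t=10 versus 15 units at t=12.
choice = dict(small=10, t_small=10, large=15, t_large=12)

hyper_far = prefers_later(hyperbolic, now=0, **choice)    # both rewards distant
hyper_near = prefers_later(hyperbolic, now=10, **choice)  # small reward is immediate
expo_far = prefers_later(exponential, now=0, **choice)
expo_near = prefers_later(exponential, now=10, **choice)

print("hyperbolic:", hyper_far, hyper_near)
print("exponential:", expo_far, expo_near)
```

Under exponential discounting the ratio of the two discounted values does not depend on `now`, which is why planning with it is simpler and also why it cannot reproduce the reversal.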

Purpose of the Study:

To investigate the trade-offs between computational cost and performance benefits of modeling humans with hyperbolic discounting.
To develop and evaluate an AI policy that modifies human discounting behavior to achieve distant goals.
To determine if approximating hyperbolic discounting with an exponential factor is computationally feasible and effective.

Main Methods:

Derived a fixed exponential discount factor to approximate hyperbolic discounting in human models.
Proved theoretical guarantees for the approximation, ensuring no necessary AI interventions are missed.
Compared the approximation with the mean hazard rate method on how well each reduces unnecessary interventions (false positives).
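The summary does not define the mean hazard rate method or give the derived discount factor, so the sketch below rests on an assumption: it uses the well-known hazard-rate interpretation of hyperbolic discounting, in which averaging exponential discounts exp(-λ·t) over an exponentially distributed hazard rate λ with mean k yields 1 / (1 + k·t), and it takes "mean hazard rate" to mean the single exponential exp(-k·t) built from that mean. Function names and parameter values are hypothetical, and this is not presented as the authors' procedure.

```python
import math


def hyperbolic(delay, k):
    """Hyperbolic discount weight 1 / (1 + k * delay)."""
    return 1.0 / (1.0 + k * delay)


def averaged_exponential(delay, k, steps=100_000, upper=60.0):
    """Average exp(-lam * delay) over an exponential prior on lam with mean k.

    Uses a midpoint rule on [0, upper * k]; the prior mass beyond that
    range is negligible (about exp(-upper)).
    """
    width = upper * k / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * width
        prior_density = math.exp(-lam / k) / k
        total += math.exp(-lam * delay) * prior_density * width
    return total


def mean_hazard_exponential(delay, k):
    """Single exponential discount that uses only the mean hazard rate k (assumed reading)."""
    return math.exp(-k * delay)


k, delay = 0.5, 10.0  # illustrative values only
print(hyperbolic(delay, k))
print(averaged_exponential(delay, k))
print(mean_hazard_exponential(delay, k))
```

Under this reading, the mean-hazard exponential always weights a delayed reward below the hyperbolic weight, because exp(x) > 1 + x for every x > 0. A model that underestimates how much a person values distant rewards could plausibly recommend interventions that are not needed, which is one way to read the false positives mentioned above.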

Main Results:

The derived exponential approximation guarantees that AI agents will not miss necessary interventions.
The approximation results in fewer false positives compared to the mean hazard rate method.
Experimental results show that exponential approximations outperform true hyperbolic models in online learning scenarios.

Conclusions:

Approximating hyperbolic discounting with a fixed exponential factor is a viable strategy for AI agents.
This approach enhances the effectiveness of AI interventions by improving human goal-directed behavior.
The surprising finding that exponential approximations excel in online learning warrants further investigation into AI-human interaction dynamics.