When and Why Hyperbolic Discounting Matters for Reinforcement Learning Interventions.

Ian M Moore¹, Eura Nofshin¹, Siddharth Swaroop¹

  • ¹Department of Computer Science, Harvard University, USA.

Reinforcement Learning Journal | March 16, 2026
Summary
This summary is machine-generated.

AI agents can better guide humans by modeling their reward discounting. This study introduces an exponential discount factor approximation for hyperbolic discounting, improving AI interventions and surprisingly outperforming hyperbolic models in online learning.

Keywords:
Agent-based modeling of humans; Human-AI interaction; Hyperbolic discounting

Area of Science:

  • Artificial Intelligence
  • Cognitive Science
  • Reinforcement Learning

Background:

  • Human decision-making often exhibits hyperbolic discounting of future rewards.
  • Most reinforcement learning (RL) models of human behavior use exponential discounting, which simplifies planning.
  • This discrepancy poses challenges for AI agents aiming to effectively guide human behavior.
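
The gap between the two discounting models can be made concrete with a small sketch. It uses the standard textbook forms, hyperbolic weight 1 / (1 + k·t) and exponential weight γ^t. The rate `K`, the factor `GAMMA`, and the reward amounts are hypothetical values chosen for illustration, not parameters from the study. The sketch shows the best-known consequence of the discrepancy: a hyperbolic discounter can reverse a preference as both options move further into the future, while an exponential discounter cannot.

```python
def hyperbolic_discount(delay: float, k: float) -> float:
    """Hyperbolic weight on a reward received after `delay` steps."""
    return 1.0 / (1.0 + k * delay)


def exponential_discount(delay: float, gamma: float) -> float:
    """Exponential weight on a reward received after `delay` steps."""
    return gamma ** delay


def prefers_larger_later(discount, small, large, gap, wait):
    """True if the larger reward (arriving `gap` steps after the smaller one)
    is preferred when both options are shifted `wait` steps into the future."""
    value_small = small * discount(wait)
    value_large = large * discount(wait + gap)
    return value_large > value_small


K = 1.0       # hypothetical hyperbolic rate (assumption)
GAMMA = 0.8   # hypothetical exponential factor (assumption)


def hyper(t):
    return hyperbolic_discount(t, K)


def expo(t):
    return exponential_discount(t, GAMMA)


# Choice: 10 units at time `wait` versus 15 units two steps later.
hyper_now = prefers_larger_later(hyper, 10, 15, gap=2, wait=0)
hyper_far = prefers_larger_later(hyper, 10, 15, gap=2, wait=10)
expo_now = prefers_larger_later(expo, 10, 15, gap=2, wait=0)
expo_far = prefers_larger_later(expo, 10, 15, gap=2, wait=10)

print("hyperbolic:  now =", hyper_now, " far =", hyper_far)
print("exponential: now =", expo_now, " far =", expo_far)
```

With these illustrative numbers the hyperbolic agent takes the smaller reward when it is available immediately but prefers the larger one when both are far away. The exponential agent makes the same choice at both horizons, because shifting both options multiplies each value by the same factor.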

Purpose of the Study:

  • To investigate the trade-offs between computational cost and performance benefits of modeling humans with hyperbolic discounting.
  • To develop and evaluate an AI policy that modifies human discounting behavior to achieve distant goals.
  • To determine if approximating hyperbolic discounting with an exponential factor is computationally feasible and effective.

Main Methods:

  • Derived a fixed exponential discount factor to approximate hyperbolic discounting in human models.
  • Proved theoretical guarantees for the approximation, ensuring no necessary AI interventions are missed.
  • Compared the approximation with the mean hazard rate method on how well each avoids unnecessary interventions (false positives).
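
As a rough illustration of the baseline named above, the sketch below uses one common reading of a "mean hazard rate" approximation. This reading is an assumption, since the summary does not define the method. Hyperbolic discounting 1 / (1 + k·t) can be written as the average of exponential discounts exp(-λ·t) over hazard rates λ drawn from an exponential distribution with mean k. Replacing that distribution with its mean gives the fixed factor γ = exp(-k). The study's own derived factor is not stated in this summary and is not reproduced here. `K` is a hypothetical value.

```python
import math


def hyperbolic_discount(t: float, k: float) -> float:
    """Hyperbolic weight 1 / (1 + k * t)."""
    return 1.0 / (1.0 + k * t)


def mean_hazard_gamma(k: float) -> float:
    """Fixed exponential factor from the mean hazard rate (assumed reading)."""
    return math.exp(-k)


def mixture_of_exponentials(t: float, k: float,
                            steps: int = 20_000, upper: float = 50.0) -> float:
    """Midpoint-rule estimate of E[exp(-lam * t)] for lam ~ Exponential(mean=k).

    After substituting u = lam / k, the integrand is exp(-u * (1 + k * t)).
    """
    h = upper / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        total += math.exp(-u * (1.0 + k * t))
    return total * h


K = 0.5  # hypothetical hyperbolic rate (assumption)
gamma = mean_hazard_gamma(K)

rows = [
    (t, hyperbolic_discount(t, K), mixture_of_exponentials(t, K), gamma ** t)
    for t in (0, 1, 5, 20)
]
for t, hyp, mix, exp_w in rows:
    print(f"t={t:2d}  hyperbolic={hyp:.4f}  mixture={mix:.4f}  mean-hazard={exp_w:.4f}")
```

The mixture column should match the hyperbolic column up to integration error. The mean-hazard column never exceeds the hyperbolic one, which follows from Jensen's inequality: under this reading, the mean hazard rate model treats the human as at least as impatient as the hyperbolic model does at every delay.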

Main Results:

  • The derived exponential approximation guarantees that AI agents will not miss crucial interventions.
  • The approximation produces fewer false positives than the mean hazard rate method.
  • Experimental results show that exponential approximations outperform true hyperbolic models in online learning scenarios.
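
The terms "missed intervention" and "false positive" can be illustrated with a toy decision rule. This is a deliberately simplified sketch, not the study's experimental setup. It assumes a human pursues a goal only if the discounted reward exceeds an immediate cost, and that the AI intervenes whenever its model of the human predicts the goal will be abandoned. The grid of delays and rewards, the cost, and the rate `K` are hypothetical. The exponential factor is the mean hazard rate value exp(-k) under an assumed reading of that method, since the study's derived factor is not given in this summary.

```python
import math
from itertools import product


def hyperbolic_discount(t: float, k: float) -> float:
    """Hyperbolic weight 1 / (1 + k * t), used here as the 'true' human."""
    return 1.0 / (1.0 + k * t)


def pursues_goal(weight: float, reward: float, cost: float) -> bool:
    """Toy rule: pursue the goal only if its discounted reward beats the cost."""
    return weight * reward > cost


def count_errors(k, gamma, delays, rewards, cost=1.0):
    """Count cases where an exponential model of the human disagrees with
    the hyperbolic 'truth' about whether an intervention is needed."""
    missed = 0           # help was needed, but the model did not ask for it
    false_positives = 0  # the model asked for help that was not needed
    for t, r in product(delays, rewards):
        needs_help = not pursues_goal(hyperbolic_discount(t, k), r, cost)
        model_says_help = not pursues_goal(gamma ** t, r, cost)
        if needs_help and not model_says_help:
            missed += 1
        elif model_says_help and not needs_help:
            false_positives += 1
    return missed, false_positives


K = 0.5                   # hypothetical hyperbolic rate (assumption)
gamma = math.exp(-K)      # mean hazard rate factor (assumed reading)
delays = range(0, 11)     # hypothetical goal delays
rewards = range(1, 11)    # hypothetical goal rewards

missed, false_positives = count_errors(K, gamma, delays, rewards)
print("missed interventions:", missed)
print("false positives:", false_positives)
```

In this toy setting the mean hazard rate factor never misses an intervention, because it weights every delayed reward no more than the hyperbolic model does. The price is unnecessary interventions, which is the kind of error the results above report the derived approximation as reducing.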

Conclusions:

  • Approximating hyperbolic discounting with a fixed exponential factor is a viable strategy for AI agents.
  • This approach enhances the effectiveness of AI interventions by improving human goal-directed behavior.
  • The surprising finding that exponential approximations excel in online learning warrants further investigation into AI-human interaction dynamics.