Efficient Offline Reinforcement Learning With Relaxed Conservatism.

Longyang Huang, Botao Dong, Weidong Zhang

IEEE Transactions on Pattern Analysis and Machine Intelligence

February 12, 2024

Summary

This summary is machine-generated.

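This study introduces ORL-RC, an offline reinforcement learning (RL) framework with relaxed conservatism, to counter the performance loss caused by overly conservative Q-functions and policies. ORL-RC learns a Q-function closer to the true Q-function, which improves policy performance, and it outperforms state-of-the-art offline RL methods on the D4RL benchmark.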

Area of Science:

Artificial Intelligence
Machine Learning
Robotics

Background:

Offline reinforcement learning (RL) aims to learn optimal policies from static datasets, without further interaction with the environment.
Existing offline RL methods tend to learn overly conservative Q-functions and policies, which can degrade performance.
The theoretical understanding of conservatism in offline RL remains incomplete.
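
To make the idea of conservatism concrete, the following is a minimal sketch, not the ORL-RC method. It assumes a tabular setting and a count-based pessimism penalty `beta / sqrt(count)`, a generic stand-in chosen only for illustration. The function names, the toy dataset, and the `beta` values are all hypothetical. It shows how a strong penalty can push the learned Q-value of a rarely sampled but better action below that of a frequently sampled but worse one, while a relaxed penalty keeps the estimate closer to the true value.

```python
from collections import defaultdict
import math


def offline_q(dataset, n_actions, gamma=0.99, beta=0.0, sweeps=50):
    """Tabular Q-iteration on a fixed dataset with a count-based penalty.

    dataset: list of (state, action, reward, next_state, done) tuples.
    beta:    strength of the pessimism penalty (an illustrative assumption,
             not the mechanism used by ORL-RC).
    """
    counts = defaultdict(int)
    for s, a, _, _, _ in dataset:
        counts[(s, a)] += 1

    q = defaultdict(float)  # unseen (state, action) pairs default to 0.0
    for _ in range(sweeps):
        sums = defaultdict(float)
        for s, a, r, s2, done in dataset:
            bootstrap = 0.0 if done else max(q[(s2, b)] for b in range(n_actions))
            sums[(s, a)] += r + gamma * bootstrap
        new_q = defaultdict(float)
        for key, total in sums.items():
            # Empirical Bellman target minus a penalty that shrinks with more data.
            new_q[key] = total / counts[key] - beta / math.sqrt(counts[key])
        q = new_q
    return q


def greedy(q, state, n_actions):
    """Return the action with the highest learned Q-value in `state`."""
    return max(range(n_actions), key=lambda a: q[(state, a)])


# Hypothetical one-step dataset: action 0 pays 1.0 but appears once,
# action 1 pays 0.8 and appears 16 times.
dataset = [(0, 0, 1.0, 0, True)] + [(0, 1, 0.8, 0, True)] * 16

strict = offline_q(dataset, n_actions=2, beta=0.4)   # heavy pessimism
relaxed = offline_q(dataset, n_actions=2, beta=0.1)  # relaxed pessimism

print("strict greedy action:", greedy(strict, 0, 2))    # picks the worse action 1
print("relaxed greedy action:", greedy(relaxed, 0, 2))  # picks the better action 0
```

In this toy case the true value of action 0 is 1.0. The strict penalty estimates it at 0.6 and the relaxed penalty at 0.9, so only the relaxed variant recovers the better action.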

Purpose of the Study:

To propose a simple and efficient offline RL framework with relaxed conservatism (ORL-RC).
To analyze the conservatism of learned Q-functions and policies in offline RL.
To theoretically establish convergence and bounds for the proposed ORL-RC framework.

Main Methods:

Developed the offline RL with relaxed conservatism (ORL-RC) framework.
Analyzed the conservatism of Q-functions and policies in offline RL.
Established theoretical convergence results and bounds for learned Q-functions, considering sampling errors.
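
For readers unfamiliar with the objects these methods refer to, the standard textbook definitions can be written as follows. The notation is an assumption made here and is not taken from the paper: $\gamma$ is the discount factor, $P$ the transition kernel, $\mathcal{D}$ the static dataset, and $N(s,a)$ the number of dataset transitions starting from $(s,a)$. In this notation, "sampling error" is the gap between the Bellman operator and its dataset-based estimate, and a conservative method is often one whose learned $\hat{Q}$ lower-bounds the true Q-function.

```latex
% Q-function of a policy \pi
Q^{\pi}(s,a) = \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t,a_t)
  \,\middle|\, s_0 = s,\ a_0 = a,\ \pi \right]

% Bellman optimality operator
(\mathcal{T}Q)(s,a) = r(s,a) + \gamma\,
  \mathbb{E}_{s' \sim P(\cdot \mid s,a)}\!\left[\max_{a'} Q(s',a')\right]

% Empirical operator estimated from the dataset \mathcal{D}
(\hat{\mathcal{T}}Q)(s,a) = r(s,a) + \frac{\gamma}{N(s,a)}
  \sum_{(s,a,s') \in \mathcal{D}} \max_{a'} Q(s',a')

% Sampling error at (s,a)
\bigl|(\hat{\mathcal{T}}Q)(s,a) - (\mathcal{T}Q)(s,a)\bigr|
```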

Main Results:

Demonstrated that conservatism in offline RL can lead to policy performance degradation.
The proposed ORL-RC framework learns a Q-function closer to the true Q-function.
Experimental results on the D4RL benchmark show ORL-RC outperforms state-of-the-art offline RL methods.
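
The summary does not state which metric was used. D4RL results are conventionally reported as normalized scores, where 0 corresponds to a random policy and 100 to an expert policy. The sketch below shows that normalization; the function name, reference returns, and evaluation returns are hypothetical values chosen for illustration, not results from the paper.

```python
def normalized_score(raw_return, random_return, expert_return):
    """D4RL-style normalization: 0 matches a random policy, 100 an expert policy."""
    if expert_return == random_return:
        raise ValueError("expert and random reference returns must differ")
    return 100.0 * (raw_return - random_return) / (expert_return - random_return)


# Hypothetical reference returns for one task (not real D4RL values).
random_ref, expert_ref = 10.0, 210.0

# Hypothetical returns from three evaluation episodes of a learned policy.
eval_returns = [110.0, 130.0, 90.0]
mean_return = sum(eval_returns) / len(eval_returns)

score = normalized_score(mean_return, random_ref, expert_ref)
print(f"normalized score: {score:.1f}")
```

Because each task has its own reference returns, normalizing per task before averaging makes scores comparable across environments with very different reward scales.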

Conclusions:

ORL-RC mitigates the excessive conservatism of learned Q-functions and policies in offline RL.
The framework offers improved Q-function approximation and policy performance.
Together with its convergence results and its D4RL performance, this positions ORL-RC as an advance over existing offline RL methods.