Sampling Efficient Deep Reinforcement Learning Through Preference-Guided Stochastic Exploration.

Wenhui Huang, Cong Zhang, Jingda Wu

IEEE Transactions on Neural Networks and Learning Systems

October 3, 2023

Summary

This summary is machine-generated.

We introduce a novel preference-guided epsilon-greedy exploration method for deep Q-networks (DQN) that makes exploration more efficient without introducing additional bias. The method encourages diverse action sampling, improving performance and convergence speed in reinforcement learning tasks.

Area of Science:

Artificial Intelligence
Machine Learning
Reinforcement Learning

Background:

Stochastic exploration is crucial for deep Q-network (DQN) success.
Existing exploration methods often introduce bias by heuristically selecting actions or coupling sampling with action values.
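
For context, the two textbook exploration rules below are a minimal sketch of what the baseline looks like; they are not taken from the paper. Standard epsilon-greedy explores uniformly at random, and Boltzmann (softmax) sampling is one familiar example of coupling sampling to action values. The function names and the `rng` parameter are illustrative assumptions.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def boltzmann(q_values, temperature, rng=random):
    """Sample an action with probability proportional to exp(Q / temperature)."""
    top = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - top) / temperature) for q in q_values]
    return rng.choices(list(range(len(q_values))), weights=weights, k=1)[0]
```

Uniform exploration ignores everything the agent has learned about which non-greedy actions look promising, while Boltzmann sampling ties the exploration distribution directly to the current Q estimates.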

Purpose of the Study:

To propose a novel preference-guided epsilon-greedy exploration algorithm for DQN.
To facilitate efficient exploration in DQN without introducing additional bias.

Main Methods:

A dual architecture with two branches: a standard DQN branch and a preference branch.
The preference branch learns the action preferences implicitly followed by the DQN.
Theoretical proof that the policy improvement theorem holds for the preference-guided epsilon-greedy policy.
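
One plausible reading of the action-selection rule described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes the preference branch outputs one logit per action, that exploration steps sample from the softmax of those logits instead of uniformly, and that exploitation steps stay greedy with respect to the DQN branch. The names `preference_logits` and `preference_guided_epsilon_greedy` are hypothetical.

```python
import math
import random


def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def preference_guided_epsilon_greedy(q_values, preference_logits, epsilon, rng=random):
    """Greedy on Q with probability 1 - epsilon; otherwise sample from the preference distribution.

    q_values: per-action outputs of the DQN branch (assumed).
    preference_logits: per-action outputs of the preference branch (assumed).
    """
    if len(q_values) != len(preference_logits):
        raise ValueError("q_values and preference_logits must have the same length")
    actions = list(range(len(q_values)))
    if rng.random() < epsilon:
        # Exploration step: guided by learned preferences, not uniform.
        return rng.choices(actions, weights=softmax(preference_logits), k=1)[0]
    # Exploitation step: standard greedy action from the DQN branch.
    return max(actions, key=lambda a: q_values[a])
```

In this sketch only the exploration distribution changes, so the greedy target used by the DQN update is left untouched. How the preference branch is trained is not covered here.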

Main Results:

Experimental validation showing the inferred action preference distribution aligns with value landscapes.
The preference-guided exploration encourages diverse action selection, sampling high-value actions more frequently while still exploring lower-value actions.
Comprehensive benchmarking against DQN variants across nine environments.
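
The sampling behavior described in these results can be illustrated with a small worked calculation. The sketch below computes the exact per-action probabilities of an epsilon-greedy policy under two exploration distributions: uniform, and a softmax over preference logits. The Q-values and epsilon are made-up numbers, and setting the preference logits equal to the Q-values is an assumption used only as a stand-in for a preference distribution that aligns with the value landscape.

```python
import math


def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def action_probabilities(q_values, explore_probs, epsilon):
    """Overall action distribution: greedy mass (1 - epsilon) plus epsilon times explore_probs."""
    greedy = max(range(len(q_values)), key=lambda a: q_values[a])
    return [
        epsilon * p + (1.0 - epsilon if a == greedy else 0.0)
        for a, p in enumerate(explore_probs)
    ]


q_values = [1.0, 2.0, 0.0, -1.0]  # illustrative values, not from the paper
epsilon = 0.2

uniform = action_probabilities(q_values, [1.0 / len(q_values)] * len(q_values), epsilon)
guided = action_probabilities(q_values, softmax(q_values), epsilon)

for a, (u, g) in enumerate(zip(uniform, guided)):
    print(f"action {a}: uniform={u:.4f} guided={g:.4f}")
```

Under these assumptions every action keeps a nonzero probability, but the guided policy ranks the non-greedy actions by value, whereas uniform exploration treats them identically.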

Conclusions:

The proposed preference-guided epsilon-greedy exploration method significantly enhances DQN performance.
The approach improves convergence speed compared to existing DQN variants.
This method offers an effective exploration strategy for deep reinforcement learning that introduces no additional bias.