


Updated: Jul 15, 2025


Sampling Efficient Deep Reinforcement Learning Through Preference-Guided Stochastic Exploration.

Wenhui Huang, Cong Zhang, Jingda Wu

    IEEE Transactions on Neural Networks and Learning Systems | October 3, 2023
    Summary
    This summary is machine-generated.

    We introduce a preference-guided epsilon-greedy exploration method for deep Q-networks (DQN) that improves exploration without introducing additional bias. The method encourages diverse action sampling and improves both performance and convergence speed on reinforcement learning tasks.


    Area of Science:

    • Artificial Intelligence
    • Machine Learning
    • Reinforcement Learning

    Background:

    • Stochastic exploration is crucial for deep Q-network (DQN) success.
    • Existing exploration methods often introduce bias by heuristically selecting actions or coupling sampling with action values.
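
    For orientation, the two kinds of baseline exploration alluded to above can be sketched as follows. This is a minimal illustration, not code from the paper: standard epsilon-greedy explores uniformly at random, while Boltzmann (softmax) exploration is one common example of coupling sampling probabilities directly to the action values. The function names and the `temperature` parameter are assumptions chosen for this sketch.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Standard epsilon-greedy: with probability epsilon pick a uniformly
    random action, otherwise pick the action with the highest Q-value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def boltzmann_probs(q_values, temperature=1.0):
    """Boltzmann exploration: sampling probabilities are a softmax of the
    Q-values, so they depend directly on the current value estimates."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    q = [0.1, 0.9, 0.3]  # hypothetical Q-values for three actions
    print(epsilon_greedy(q, epsilon=0.1))
    print(boltzmann_probs(q))
```

    Uniform exploration ignores what the agent has already learned about action quality, whereas the softmax variant ties exploration to possibly inaccurate value estimates.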

    Purpose of the Study:

    • To propose a novel preference-guided epsilon-greedy exploration algorithm for DQN.
    • To facilitate efficient exploration in DQN without introducing additional bias.

    Main Methods:

    • A dual architecture with two branches: a standard DQN branch and a preference branch.
    • The preference branch learns the action preference distribution that the DQN implicitly follows.
    • Theoretical proof that the policy improvement theorem holds for the preference-guided epsilon-greedy policy.
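
    One way to picture how the two branches might interact at action-selection time is sketched below. This summary does not spell out the exact rule, so the sketch assumes that the greedy step uses the DQN branch's Q-values and that the exploration step samples from a softmax over the preference branch's outputs instead of sampling uniformly. The names `preference_guided_epsilon_greedy` and `preference_logits` are hypothetical, and the network itself is omitted.

```python
import math
import random


def softmax(logits):
    """Turn raw preference scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def preference_guided_epsilon_greedy(q_values, preference_logits, epsilon, rng=random):
    """Assumed selection rule (a sketch, not the authors' implementation).

    With probability 1 - epsilon, act greedily on the DQN branch's Q-values.
    With probability epsilon, sample an action from the preference
    distribution rather than from a uniform distribution.
    """
    if len(q_values) != len(preference_logits):
        raise ValueError("both branches must score the same set of actions")
    actions = range(len(q_values))
    if rng.random() < epsilon:
        probs = softmax(preference_logits)
        return rng.choices(actions, weights=probs, k=1)[0]
    return max(actions, key=lambda a: q_values[a])


if __name__ == "__main__":
    q = [0.2, 1.5, 0.7]        # hypothetical DQN-branch outputs
    prefs = [0.1, 2.0, 1.0]    # hypothetical preference-branch outputs
    print(preference_guided_epsilon_greedy(q, prefs, epsilon=0.2))
```

    Under this assumption every action keeps a nonzero chance of being explored, but actions the preference branch favors are tried more often than under uniform exploration.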

    Main Results:

    • Experimental validation showing the inferred action preference distribution aligns with value landscapes.
    • The preference-guided exploration encourages diverse action selection, sampling high-value actions more frequently while still exploring lower-value actions.
    • Comprehensive benchmarking against DQN variants across nine environments.

    Conclusions:

    • The proposed preference-guided epsilon-greedy exploration method significantly enhances DQN performance.
    • The approach improves convergence speed compared to existing DQN variants.
    • This method offers an effective exploration strategy for deep reinforcement learning that introduces no additional bias.