Weak Human Preference Supervision for Deep Reinforcement Learning.

Zehong Cao, KaiChiu Wong, Chin-Teng Lin

    IEEE Transactions on Neural Networks and Learning Systems | June 8, 2021

    Summary
    This summary is machine-generated.

    This study introduces a weak human preference supervision framework for reinforcement learning (RL). Using a novel preference scaling model and a human-demonstration estimator, it reduces the cost of human input by up to 30% compared with existing approaches while improving RL task performance.

    Area of Science:

    • Artificial Intelligence
    • Robotics
    • Machine Learning

    Background:

    • Reinforcement learning (RL) often requires a predefined reward function, which is difficult to specify for complex tasks.
    • Current methods using human preferences for reward learning demand extensive, iterative human input, limiting scalability.
    • Dynamic and efficient human feedback mechanisms are crucial for advancing RL applications.
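
To make the background concrete, here is a minimal sketch of how reward learning from human preferences is often set up. It uses a Bradley-Terry style logistic model over summed segment rewards, which is common in the preference-based RL literature. This is an assumption for illustration, not necessarily the formulation used in this study. The function names and the sample reward values are hypothetical.

```python
import numpy as np

def preference_probability(rewards_a, rewards_b):
    """Probability that segment A is preferred over segment B."""
    # Sum the predicted per-step rewards of each segment, then pass the
    # difference through a logistic function (Bradley-Terry style model).
    gap = np.sum(rewards_a) - np.sum(rewards_b)
    return 1.0 / (1.0 + np.exp(-gap))

def preference_loss(rewards_a, rewards_b, label):
    """Cross-entropy against a human label in [0, 1]; 1.0 means A is preferred."""
    p = np.clip(preference_probability(rewards_a, rewards_b), 1e-12, 1 - 1e-12)
    return -(label * np.log(p) + (1.0 - label) * np.log(1.0 - p))

# Hypothetical per-step reward predictions for two short trajectory segments.
seg_a = np.array([0.5, 0.7, 0.9])
seg_b = np.array([0.2, 0.1, 0.3])

p_a = preference_probability(seg_a, seg_b)
loss_if_a_preferred = preference_loss(seg_a, seg_b, label=1.0)
loss_if_b_preferred = preference_loss(seg_a, seg_b, label=0.0)
```

Minimizing this loss over many labeled pairs pushes the reward model to score preferred segments higher. Each pair needs a human label, which is the scalability cost the bullets above describe.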

    Purpose of the Study:

    • To develop a weak human preference supervision framework to reduce the cost and increase the efficiency of reward learning in RL.
    • To introduce a human preference scaling model that captures nuanced human judgments of trajectory preferences.
    • To establish a human-demonstration estimator for predicting preferences and minimizing direct human intervention.

    Main Methods:

    • Developed a human preference scaling model to quantify the degree of weak preferences between trajectory segments.
    • Implemented a supervised learning approach for a human-demonstration estimator to predict preferences.
    • Evaluated the framework on simulated robot locomotion tasks in MuJoCo, comparing performance against fixed-preference methods.
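
The first two methods above can be sketched together in a toy form: a supervised estimator trained on graded (rather than binary) preference labels. The sketch below assumes a plain logistic regression on hypothetical feature differences between two segments, with labels quantized to five levels in [0, 1]. The model class, the features, the five-level scale, and all names are illustrative assumptions. The study's actual scaling model and estimator may differ.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_preference_estimator(x, y, lr=0.5, steps=2000):
    """Fit logistic regression to soft preference labels y in [0, 1]."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = sigmoid(x @ w + b)
        # For cross-entropy with soft targets, the gradient w.r.t. the
        # logit is simply (prediction - target).
        grad = p - y
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def predict_preference(w, b, x):
    """Predicted degree of preference for segment A over segment B."""
    return sigmoid(x @ w + b)

rng = np.random.default_rng(0)
# Hypothetical data: each row is (features of segment A) - (features of segment B).
x_train = rng.normal(size=(200, 3))
# Hidden weighting that stands in for the human judge (toy setup only).
true_w = np.array([1.5, -1.0, 0.5])
# Graded labels: quantize the degree of preference to 0, 0.25, 0.5, 0.75, 1.
y_train = np.round(sigmoid(x_train @ true_w) * 4) / 4

w, b = fit_preference_estimator(x_train, y_train)

# Two unseen pairs: the first should favor A, the second should favor B.
x_new = np.array([[2.0, -1.0, 0.0], [-2.0, 1.0, 0.0]])
pred = predict_preference(w, b, x_new)
```

Once trained, such an estimator can label new segment pairs without asking a person, which is the role the human-demonstration estimator plays in reducing direct human intervention.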

    Main Results:

    • The weak human preference supervision framework effectively solves complex RL tasks, achieving higher cumulative rewards than fixed preference methods.
    • The human-demonstration estimator required human feedback for less than 0.01% of agent interactions.
    • Human input costs were reduced by up to 30% compared to existing approaches.
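
To give a sense of scale for the two percentages above, the following sketch works through the arithmetic. The run length and the baseline label count are hypothetical numbers chosen only for illustration. They are not figures from the study.

```python
def max_feedback_queries(total_interactions: int) -> int:
    """Upper bound on human queries if feedback covers under 0.01% of interactions."""
    # 0.01% is 1 in 10,000; integer division avoids float rounding.
    return total_interactions // 10_000

def min_remaining_cost(baseline_cost: int, max_reduction_percent: int = 30) -> int:
    """Smallest remaining cost under an 'up to N%' reduction."""
    return baseline_cost * (100 - max_reduction_percent) // 100

# Hypothetical run length and baseline labeling cost (illustration only).
total_interactions = 10_000_000
baseline_labels = 1_000

query_bound = max_feedback_queries(total_interactions)  # 1000
remaining = min_remaining_cost(baseline_labels)         # 700
```

Under these assumed numbers, a run of ten million environment steps would involve fewer than 1,000 human queries, and a baseline of 1,000 labels would drop to no fewer than 700 at the maximum reported reduction.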

    Conclusions:

    • The proposed framework offers a more natural and efficient approach to reward learning through weakly supervised learning.
    • This method significantly reduces the need for extensive human input, making RL more practical for complex real-world problems.
    • The approach shows promise for advanced RL systems, including human-autonomy teaming.