Human-in-the-Loop Reinforcement Learning in Continuous-Action Space.

Biao Luo, Zhengke Wu, Fei Zhou

    IEEE Transactions on Neural Networks and Learning Systems | July 7, 2023

    Summary
    This summary is machine-generated.

    This study introduces a new human-in-the-loop reinforcement learning (HRL) algorithm for continuous action spaces. The Q value-dependent policy (QDP)-HRL method enhances learning speed and performance by selectively using expert advice.

    Area of Science:

    • Artificial Intelligence
    • Machine Learning
    • Robotics

    Background:

    • Reinforcement learning (RL) often suffers from sample inefficiency.
    • Existing human-in-the-loop RL (HRL) methods primarily address discrete action spaces.
    • Continuous action spaces present unique challenges for efficient learning and human guidance.

    Purpose of the Study:

    • To propose a novel Q value-dependent policy (QDP)-based HRL algorithm (QDP-HRL) for continuous action spaces.
    • To address sample inefficiency in RL by incorporating selective human expert advice.
    • To enhance the learning speed and performance of agents in continuous control tasks.

    Main Methods:

    • Developed a QDP-HRL algorithm adapted to the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm.
    • Implemented selective human advice based on the difference between twin Q-networks' outputs.
    • Introduced an advantage loss function using expert experience and agent policy to guide critic network updates.
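The two mechanisms listed above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the summary does not give the exact rule or loss, so the direction of the gate (ask for advice when the twin critics disagree strongly), the `threshold` and `margin` parameters, the hinge form of the loss, and the function names `should_ask_expert` and `advantage_loss` are all assumptions made for illustration.

```python
from typing import Sequence


def should_ask_expert(q1: float, q2: float, threshold: float) -> bool:
    """Hypothetical gate: request human advice only when the two critics
    disagree by more than `threshold` (an assumed hyperparameter)."""
    return abs(q1 - q2) > threshold


def advantage_loss(
    q_expert: Sequence[float],
    q_agent: Sequence[float],
    margin: float = 0.0,
) -> float:
    """Assumed hinge-style loss: penalize the critic whenever it rates the
    agent's action at least `margin` above the expert's action.

    q_expert[i] is the critic's value for the expert action in state i,
    q_agent[i] is the critic's value for the agent's own action there.
    """
    if len(q_expert) != len(q_agent):
        raise ValueError("q_expert and q_agent must have the same length")
    if not q_expert:
        return 0.0
    penalties = [
        max(0.0, qa - qe + margin) for qe, qa in zip(q_expert, q_agent)
    ]
    return sum(penalties) / len(penalties)


if __name__ == "__main__":
    # Critics nearly agree: no advice requested under this assumed rule.
    print(should_ask_expert(1.00, 1.05, threshold=0.1))
    # Critics disagree strongly: advice requested.
    print(should_ask_expert(1.00, 2.00, threshold=0.1))
    # Critic prefers the agent's action in the first state only.
    print(advantage_loss([1.0, 2.0], [2.0, 1.0]))
```

Under these assumptions, the gate keeps the expert out of the loop whenever the critics already agree, which is one plausible way to limit how often a human is queried. In a TD3-style learner, a term like this would be added to the usual critic loss, with the values computed by the networks rather than passed in as plain lists.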

    Main Results:

    • QDP-HRL demonstrated significant improvements in learning speed across various continuous action space tasks.
    • The algorithm achieved enhanced overall performance compared to baseline methods.
    • Selective human intervention effectively reduced cognitive load while maximizing learning benefits.

    Conclusions:

    • QDP-HRL is an effective approach for improving sample efficiency in continuous action space RL.
    • The proposed method offers a practical way to integrate human expertise into complex learning environments.
    • This work advances HRL by extending its applicability to continuous control problems.