Human-in-the-Loop Reinforcement Learning in Continuous-Action Space.

Biao Luo, Zhengke Wu, Fei Zhou

IEEE Transactions on Neural Networks and Learning Systems

July 7, 2023

Summary

This summary is machine-generated.

This study introduces a new human-in-the-loop reinforcement learning (HRL) algorithm for continuous action spaces. The Q value-dependent policy (QDP)-HRL method enhances learning speed and performance by selectively using expert advice.

Area of Science:

Artificial Intelligence
Machine Learning
Robotics

Background:

Reinforcement learning (RL) often suffers from sample inefficiency.
Existing human-in-the-loop RL (HRL) methods primarily address discrete action spaces.
Continuous action spaces present unique challenges for efficient learning and human guidance.

Purpose of the Study:

To propose a novel Q value-dependent policy (QDP)-based HRL algorithm (QDP-HRL) for continuous action spaces.
To address sample inefficiency in RL by incorporating selective human expert advice.
To enhance the learning speed and performance of agents in continuous control tasks.

Main Methods:

Developed a QDP-HRL algorithm adapted to the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm.
Implemented selective human advice based on the difference between twin Q-networks' outputs.
Introduced an advantage loss function using expert experience and agent policy to guide critic network updates.
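
The selective-advice step and the advantage loss listed above can be illustrated with a minimal Python sketch. The summary does not state the exact rules, so everything here is an assumption made for illustration: the absolute-difference test against a fixed `threshold`, the hinge form of `advantage_loss`, and all function and parameter names are hypothetical and are not taken from the paper.

```python
from typing import Callable, Sequence, Tuple

State = Sequence[float]
Action = Sequence[float]
QFunction = Callable[[State, Action], float]
Policy = Callable[[State], Action]


def q_disagreement(q1: QFunction, q2: QFunction,
                   state: State, action: Action) -> float:
    """Absolute gap between the twin critics' estimates for (state, action)."""
    return abs(q1(state, action) - q2(state, action))


def select_action(state: State, policy: Policy, q1: QFunction, q2: QFunction,
                  ask_expert: Policy, threshold: float) -> Tuple[Action, bool]:
    """Return (action, came_from_expert).

    Assumed rule: query the human expert only when the twin critics
    disagree by more than `threshold` on the agent's proposed action.
    """
    proposed = policy(state)
    if q_disagreement(q1, q2, state, proposed) > threshold:
        return ask_expert(state), True
    return proposed, False


def advantage_loss(q: QFunction, state: State, expert_action: Action,
                   agent_action: Action, margin: float = 0.0) -> float:
    """Assumed hinge-style penalty: positive only when the critic scores the
    agent's action at least as high as the expert's (up to `margin`)."""
    return max(0.0, q(state, agent_action) - q(state, expert_action) + margin)


if __name__ == "__main__":
    # Toy critics and policies, purely for demonstration.
    q1 = lambda s, a: sum(a)
    q2 = lambda s, a: 2.0 * sum(a)
    agent = lambda s: [1.0]
    expert = lambda s: [0.5]

    print(select_action([0.0], agent, q1, q2, expert, threshold=0.5))
    print(advantage_loss(q1, [0.0], expert([0.0]), agent([0.0])))
```

In this sketch the expert is consulted only on states where the critics are uncertain, which is one plausible way to limit how often a human has to intervene. A real implementation would use neural-network critics and fold the loss into the TD3 critic update.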

Main Results:

QDP-HRL demonstrated significant improvements in learning speed across various continuous action space tasks.
The algorithm achieved enhanced overall performance compared to baseline methods.
Selective human intervention effectively reduced cognitive load while maximizing learning benefits.

Conclusions:

QDP-HRL is an effective approach for improving sample efficiency in continuous action space RL.
The proposed method offers a practical way to integrate human expertise into complex learning environments.
This work advances HRL by extending its applicability to continuous control problems.