A Hybrid PAC Reinforcement Learning Algorithm for Human-Robot Interaction.

Ashkan Zehfroosh¹, Herbert G Tanner¹

¹Cooperative Robotics Lab, Department of Mechanical Engineering, University of Delaware, Newark, DE, United States.

Frontiers in Robotics and AI

April 8, 2022

Summary

This summary is machine-generated.

Dyna-Delayed Q-learning (DDQ) is a new hybrid reinforcement learning (RL) algorithm for Markov decision processes (MDPs) that combines model-based and model-free approaches. In numerical comparisons DDQ is more sample-efficient than the algorithms it builds on, and it has been implemented in a pediatric motor rehabilitation setting.

Keywords:

human-robot interaction; Markov decision process; probably approximately correct; reinforcement learning; sample complexity

Area of Science:

Artificial Intelligence
Machine Learning
Robotics

Background:

Reinforcement learning (RL) algorithms for Markov decision processes (MDPs) are typically either model-based or model-free.
Each family trades off sample efficiency against performance in a different way.
Hybrid approaches that combine the strengths of both methodologies are therefore needed.
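
To make the model-based versus model-free distinction concrete, here is a minimal sketch on a hypothetical two-state MDP. Nothing in it comes from the paper: the MDP, the function names, and the parameters are assumptions chosen for illustration, and the model-based learner is assumed to have sampling access to every state-action pair, which sidesteps exploration.

```python
import random

# Hypothetical 2-state, 2-action MDP used only for illustration.
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.9, 1, 0.0), (0.1, 0, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
}
GAMMA = 0.9
STATES, ACTIONS = (0, 1), (0, 1)


def step(s, a, rng):
    """Sample one transition (next_state, reward) from the true MDP."""
    u, acc = rng.random(), 0.0
    for p, s2, r in P[s][a]:
        acc += p
        if u <= acc:
            return s2, r
    return P[s][a][-1][1], P[s][a][-1][2]


def q_learning(n_steps, alpha=0.1, eps=0.2, seed=0):
    """Model-free: update Q directly from each sampled transition."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    s = 0
    for _ in range(n_steps):
        if rng.random() < eps:  # epsilon-greedy exploration
            a = rng.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda b: Q[(s, b)])
        s2, r = step(s, a, rng)
        target = r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2
    return Q


def model_based(n_samples_per_pair, sweeps=200, seed=0):
    """Model-based: estimate transitions and rewards, then plan on them."""
    rng = random.Random(seed)
    n = n_samples_per_pair
    counts, rewards = {}, {}
    for s in STATES:
        for a in ACTIONS:
            for _ in range(n):
                s2, r = step(s, a, rng)
                counts[(s, a, s2)] = counts.get((s, a, s2), 0) + 1
                rewards[(s, a)] = rewards.get((s, a), 0.0) + r
    Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(sweeps):  # value iteration on the estimated model
        for s in STATES:
            for a in ACTIONS:
                exp_next = sum(
                    counts.get((s, a, s2), 0) / n
                    * max(Q[(s2, b)] for b in ACTIONS)
                    for s2 in STATES
                )
                Q[(s, a)] = rewards[(s, a)] / n + GAMMA * exp_next
    return Q
```

In this toy setting, `model_based(50)` plans from 200 sampled transitions in total, whereas `q_learning` typically needs many more steps for its estimates to settle. That gap is the kind of sample-efficiency difference the background refers to, bought at the price of storing and planning over a model.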

Purpose of the Study:

To introduce a novel hybrid probably approximately correct (PAC) reinforcement learning algorithm.
To combine the advantages of model-free Delayed Q-learning and model-based R-max algorithms.
To analyze the theoretical properties and practical performance of the proposed algorithm.
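
The two parent algorithms named above can be sketched as follows. This is a simplified illustration of their published core ideas, not the DDQ algorithm and not the authors' code: the class and function names, the parameter values, the assumption that rewards lie in [0, 1], and the omission of Delayed Q-learning's bookkeeping flags are all choices made here for brevity.

```python
from collections import defaultdict


class DelayedQ:
    """Simplified Delayed Q-learning update (rewards assumed in [0, 1])."""

    def __init__(self, actions, gamma=0.9, m=5, eps1=0.05):
        self.actions, self.gamma, self.m, self.eps1 = actions, gamma, m, eps1
        # Optimistic initialisation at the largest possible return.
        self.Q = defaultdict(lambda: 1.0 / (1.0 - gamma))
        self.acc = defaultdict(float)  # running sum of update targets
        self.n = defaultdict(int)      # samples gathered since last attempt

    def observe(self, s, a, r, s2):
        """Record one transition; attempt an update after m samples.

        Returns True only if Q[(s, a)] was changed.
        """
        v_next = max(self.Q[(s2, b)] for b in self.actions)
        self.acc[(s, a)] += r + self.gamma * v_next
        self.n[(s, a)] += 1
        if self.n[(s, a)] < self.m:
            return False
        avg = self.acc[(s, a)] / self.m
        # Accept the update only if it lowers Q by a clear margin.
        updated = self.Q[(s, a)] - avg >= 2 * self.eps1
        if updated:
            self.Q[(s, a)] = avg + self.eps1
        self.acc[(s, a)], self.n[(s, a)] = 0.0, 0
        return updated


def rmax_q_value(count, model_q, m, gamma=0.9, r_max=1.0):
    """R-max style optimism: a pair tried fewer than m times is 'unknown'
    and is valued as if it earned r_max forever."""
    return model_q if count >= m else r_max / (1.0 - gamma)
```

Both ideas rely on optimism to drive exploration: Delayed Q-learning starts from an optimistic table and lowers entries only after a batch of `m` samples, while R-max treats under-sampled pairs as maximally rewarding until its model of them is trusted.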

Main Methods:

Developing the Dyna-Delayed Q-learning (DDQ) algorithm, a hybrid PAC RL approach.
Conducting a PAC analysis to derive the sample complexity of DDQ.
Running numerical simulations to compare DDQ against established PAC model-free and model-based algorithms.
Implementing DDQ in a real-world pediatric motor rehabilitation setting involving infant-robot interaction.
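
For readers unfamiliar with the PAC analysis mentioned above, the notion of sample complexity commonly used in the PAC-MDP literature can be written as follows. This is the standard textbook definition, not the bound derived in the paper (which this summary does not state); the symbols $\zeta$, $\mathcal{A}_t$, $\epsilon$, $\delta$, and $\gamma$ are notation assumed here.

```latex
% Sample complexity of exploration: the number of time steps t at which
% the policy A_t currently followed by the algorithm is more than
% epsilon worse than optimal from the current state s_t.
\[
  \zeta(\epsilon) \;=\;
  \Bigl|\bigl\{\, t \;:\; V^{\mathcal{A}_t}(s_t) < V^{*}(s_t) - \epsilon \,\bigr\}\Bigr|
\]
% An algorithm is PAC-MDP if, for every epsilon > 0 and 0 < delta < 1,
% this count is polynomially bounded with probability at least 1 - delta.
\[
  \Pr\!\Bigl[\, \zeta(\epsilon) \le
  \mathrm{poly}\bigl(|S|,\,|A|,\,\tfrac{1}{\epsilon},\,\tfrac{1}{\delta},\,\tfrac{1}{1-\gamma}\bigr)
  \Bigr] \;\ge\; 1 - \delta
\]
```

Here $|S|$ and $|A|$ are the numbers of states and actions and $\gamma$ is the discount factor. A smaller polynomial bound means fewer clearly suboptimal steps, which is the sense in which one PAC algorithm is more sample-efficient than another.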

Main Results:

The DDQ algorithm integrates model-free and model-based RL techniques effectively.
DDQ outperforms both of its constituent algorithms, Delayed Q-learning and R-max, in most cases.
In the numerical comparisons, DDQ is more sample-efficient than the established PAC model-free and model-based algorithms it was tested against.
Successful experimental validation in a pediatric motor rehabilitation context demonstrates practical utility.

Conclusions:

The DDQ algorithm advances PAC reinforcement learning for MDPs by combining model-free and model-based updates in a single method.
Hybrid approaches can effectively bridge the gap between model-based and model-free RL.
DDQ shows promise for improving sample efficiency and performance in complex real-world applications, such as assistive robotics.