



A Hybrid PAC Reinforcement Learning Algorithm for Human-Robot Interaction.

Ashkan Zehfroosh¹, Herbert G. Tanner¹

  • ¹ Cooperative Robotics Lab, Department of Mechanical Engineering, University of Delaware, Newark, DE, United States.

Frontiers in Robotics and AI | April 8, 2022
Summary
This summary is machine-generated.

A new hybrid reinforcement learning (RL) algorithm, Dyna-Delayed Q-learning (DDQ), combines model-based and model-free approaches for Markov decision processes (MDPs). DDQ demonstrates superior sample efficiency and performance in applications, including pediatric motor rehabilitation.

Keywords:
human-robot interaction; Markov decision process; probably approximately correct; reinforcement learning; sample complexity


Area of Science:

  • Artificial Intelligence
  • Machine Learning
  • Robotics

Background:

  • Reinforcement learning (RL) algorithms for Markov decision processes (MDPs) are typically either model-based or model-free.
  • Existing methods present trade-offs between sample efficiency and performance.
  • A need exists for hybrid approaches that leverage the strengths of both methodologies.

Purpose of the Study:

  • To introduce a novel hybrid probably approximately correct (PAC) reinforcement learning algorithm.
  • To combine the advantages of model-free Delayed Q-learning and model-based R-max algorithms.
  • To analyze the theoretical properties and practical performance of the proposed algorithm.
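
For readers unfamiliar with the PAC criterion, a common way to state it for MDPs is sketched below. This is the usual formulation from the PAC-MDP literature, not a quotation from the paper, and the symbols are assumptions introduced here: \mathcal{A}_t is the (non-stationary) policy the algorithm follows at time t, s_t is the state visited at time t, V^* is the optimal value function, \epsilon is the accuracy, \delta is the failure probability, and \gamma is the discount factor.

```latex
\Pr\!\left[\;
  \Bigl|\bigl\{\, t \;:\; V^{\mathcal{A}_t}(s_t) < V^{*}(s_t) - \epsilon \,\bigr\}\Bigr|
  \;\le\;
  \mathrm{poly}\!\left(|S|,\; |A|,\; \tfrac{1}{\epsilon},\; \tfrac{1}{\delta},\; \tfrac{1}{1-\gamma}\right)
\;\right] \;\ge\; 1 - \delta
```

The left-hand count is the number of time steps on which the algorithm acts more than \epsilon worse than optimal. This count is the "sample complexity" that a PAC analysis bounds.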

Main Methods:

  • Developing the Dyna-Delayed Q-learning (DDQ) algorithm, a hybrid PAC-RL approach.
  • Conducting a PAC analysis to derive the sample complexity of DDQ.
  • Performing numerical simulations to compare DDQ against established PAC model-free and model-based algorithms.
  • Implementing DDQ in a real-world pediatric motor rehabilitation setting using infant-robot interaction.
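
The general idea of mixing model-free and model-based updates can be illustrated with a minimal sketch of plain tabular Dyna-Q. This is not the DDQ algorithm from the paper, which builds on Delayed Q-learning and R-max and is not specified in this summary. The environment `chain_step`, the function `dyna_q`, and all parameter values are hypothetical and chosen only for illustration.

```python
import random


def dyna_q(step, n_states, n_actions, start_state, episodes=50, horizon=50,
           alpha=0.5, gamma=0.9, epsilon=0.1, planning_steps=10, seed=0):
    """Generic tabular Dyna-Q (an illustration, NOT the paper's DDQ).

    step(s, a) -> (next_state, reward, done) is a hypothetical
    environment callback supplied by the caller.
    """
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    model = {}  # (s, a) -> last observed (next_state, reward, done)

    def greedy(s):
        # Break ties at random so early exploration is not biased.
        best = max(Q[s])
        return rng.choice([a for a in range(n_actions) if Q[s][a] == best])

    def update(s, a, r, s2, done):
        # One-step Q-learning backup.
        target = r if done else r + gamma * max(Q[s2])
        Q[s][a] += alpha * (target - Q[s][a])

    for _ in range(episodes):
        s = start_state
        for _ in range(horizon):
            explore = rng.random() < epsilon
            a = rng.randrange(n_actions) if explore else greedy(s)
            s2, r, done = step(s, a)
            update(s, a, r, s2, done)        # model-free: real sample
            model[(s, a)] = (s2, r, done)    # model-based: store outcome
            for _ in range(planning_steps):  # planning: replay stored samples
                ps, pa = rng.choice(list(model))
                ps2, pr, pdone = model[(ps, pa)]
                update(ps, pa, pr, ps2, pdone)
            if done:
                break
            s = s2
    return Q


def chain_step(s, a):
    """Hypothetical 5-state chain: action 1 moves right, action 0 moves left.

    Reaching the right end (state 4) gives reward 1 and ends the episode.
    """
    s2 = min(s + 1, 4) if a == 1 else max(s - 1, 0)
    done = s2 == 4
    return s2, (1.0 if done else 0.0), done


Q = dyna_q(chain_step, n_states=5, n_actions=2, start_state=0)
policy = [max(range(2), key=lambda a: Q[s][a]) for s in range(4)]
```

In this sketch every real transition is used twice: once for a direct value update and once to refresh a stored model that drives additional simulated updates. Reusing experience this way is the usual motivation for hybrid methods when real samples are expensive, as in human-robot interaction.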

Main Results:

  • The DDQ algorithm integrates model-free and model-based RL techniques effectively.
  • DDQ outperforms its constituent algorithms (Delayed Q-learning and R-max) in most scenarios.
  • The algorithm exhibits superior sample efficiency compared to existing state-of-the-art PAC RL methods.
  • Successful experimental validation in a pediatric motor rehabilitation context demonstrates practical utility.

Conclusions:

  • The DDQ algorithm represents a significant advancement in PAC reinforcement learning for MDPs.
  • Hybrid approaches can effectively bridge the gap between model-based and model-free RL.
  • DDQ shows promise for improving sample efficiency and performance in complex real-world applications, such as assistive robotics.