Exploration in Deep Reinforcement Learning: From Single-Agent to Multiagent Domain.

Jianye Hao, Tianpei Yang, Hongyao Tang

IEEE Transactions on Neural Networks and Learning Systems

April 6, 2023

Summary

This summary is machine-generated.

Deep reinforcement learning (DRL) and multiagent reinforcement learning (MARL) face sample inefficiency due to the exploration problem. This survey categorizes and compares exploration methods to improve DRL and MARL efficiency.

Area of Science:

Artificial Intelligence
Machine Learning
Robotics

Background:

Deep reinforcement learning (DRL) and deep multiagent reinforcement learning (MARL) show promise in AI, autonomous vehicles, and robotics.
A key limitation is sample inefficiency: agents typically need millions of interactions with the environment, which hinders real-world deployment.
The exploration problem, that is, how to efficiently gather informative experiences, is a major bottleneck, especially in complex environments.

Purpose of the Study:

To provide a comprehensive survey of exploration methods in single-agent and multiagent reinforcement learning.
To systematically classify existing exploration approaches.
To empirically compare different exploration methods for DRL and identify future research directions.

Main Methods:

Categorization of exploration methods into uncertainty-oriented and intrinsic motivation-oriented approaches.
Inclusion of other notable exploration techniques.
Algorithmic analysis and a unified empirical comparison of DRL exploration methods on standard benchmarks.
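
To make the two categories above concrete, the following is a minimal illustrative sketch in Python, not code taken from the survey. It assumes a UCB-style optimistic action choice as a stand-in for the uncertainty-oriented family and a count-based novelty bonus as a stand-in for the intrinsic motivation-oriented family. The names ucb_action and CountBonus and the parameters c and beta are hypothetical and were chosen only for this example.

```python
import math
from collections import defaultdict


def ucb_action(q_values, counts, t, c=2.0):
    """Uncertainty-oriented sketch (assumed UCB-style rule, not the survey's code).

    Picks the action whose value estimate plus an uncertainty bonus is largest.
    The bonus shrinks as an action is tried more often.
    """
    best_action, best_score = None, -math.inf
    for action, q in enumerate(q_values):
        if counts[action] == 0:
            # An untried action is treated as maximally uncertain.
            return action
        score = q + c * math.sqrt(math.log(t) / counts[action])
        if score > best_score:
            best_action, best_score = action, score
    return best_action


class CountBonus:
    """Intrinsic-motivation sketch (assumed count-based novelty bonus).

    Each call records a visit to `state` and returns beta / sqrt(N(state)),
    so rarely visited states yield a larger intrinsic reward.
    """

    def __init__(self, beta=1.0):
        self.beta = beta
        self.visits = defaultdict(int)

    def __call__(self, state):
        self.visits[state] += 1
        return self.beta / math.sqrt(self.visits[state])


if __name__ == "__main__":
    # Action 1 has a lower estimate but has barely been tried,
    # so the optimistic rule prefers it.
    print(ucb_action([0.5, 0.4], [100, 1], t=101))

    bonus = CountBonus(beta=1.0)
    print(bonus("room_a"), bonus("room_a"), bonus("room_b"))
```

In a full agent, the intrinsic bonus would typically be added to the environment reward before the value update, while the optimistic rule replaces plain greedy action selection. Both are tabular simplifications; deep RL variants replace the exact counts and estimates with learned approximations.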

Main Results:

Identification of key challenges in efficient exploration for DRL and MARL.
Systematic review and classification of existing exploration strategies.
Empirical evaluation highlighting the performance of various exploration techniques.

Conclusions:

Exploration remains a critical challenge in DRL and MARL.
The survey provides a foundation for understanding and advancing exploration strategies.
Future research should focus on addressing open problems and developing more efficient exploration techniques.