Integrated Double Estimator Architecture for Reinforcement Learning.

Pingli Lv, Xuesong Wang, Yuhu Cheng

IEEE Transactions on Cybernetics

October 7, 2020

Summary

This summary is machine-generated.

We introduce an Integrated Double Estimator (IDE) to balance overestimation and underestimation in reinforcement learning (RL). This approach, implemented in the Integrated DQ (IDQ) and Integrated DQN (IDDQN) algorithms, reduces estimation bias, giving more stable learning and improved RL performance.

Area of Science:

Artificial Intelligence
Machine Learning
Reinforcement Learning

Background:

Reinforcement learning (RL) algorithms like Q-learning and Deep Q-Networks (DQN) often exhibit estimation bias, specifically overestimation, due to the maximum operation in value estimation.
Double Q-learning (DQ) and Double DQN mitigate overestimation by employing a double estimator (DE), but the DE can in turn introduce underestimation.
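
The two biases described above can be checked numerically. The following is a minimal sketch, not taken from the paper: it assumes a hypothetical bandit-style setting with known true action values and Gaussian noise, and compares the single (max) estimator against the double estimator. The function name `estimator_bias` and all parameter values are illustrative assumptions.

```python
import numpy as np

def estimator_bias(n_actions=10, n_samples=10, n_trials=5000, seed=0):
    """Monte Carlo bias of the single (max) and double estimators.

    Assumed toy setting: true action values are evenly spaced in [0, 1],
    so the true maximum is 1. Each returned number is the average
    estimate minus that true maximum.
    """
    rng = np.random.default_rng(seed)
    true_values = np.linspace(0.0, 1.0, n_actions)
    true_max = true_values.max()

    # Two independent sample sets (A and B) per action, unit-variance noise.
    shape = (n_trials, n_actions, n_samples)
    mean_a = (true_values[None, :, None] + rng.normal(0.0, 1.0, shape)).mean(axis=2)
    mean_b = (true_values[None, :, None] + rng.normal(0.0, 1.0, shape)).mean(axis=2)

    # Single estimator: the max over noisy sample means tends to overestimate.
    single = mean_a.max(axis=1)

    # Double estimator: select the action with set A, evaluate it with set B.
    # Whenever A picks a suboptimal action, the estimate falls below the max.
    best = mean_a.argmax(axis=1)
    double = mean_b[np.arange(n_trials), best]

    return single.mean() - true_max, double.mean() - true_max

if __name__ == "__main__":
    single_bias, double_bias = estimator_bias()
    print(f"single (max) estimator bias: {single_bias:+.3f}")
    print(f"double estimator bias:       {double_bias:+.3f}")
```

In this assumed setting the first bias comes out positive and the second negative, which is the imbalance the IDE is meant to address.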

Purpose of the Study:

To propose a novel Integrated Double Estimator (IDE) architecture that balances overestimation and underestimation in RL.
To introduce two new RL algorithms, Integrated DQ (IDQ) and Integrated DQN (IDDQN), based on the IDE architecture.
To theoretically analyze estimation bias and prove the unbiasedness of IDE and convergence of IDQ.

Main Methods:

Proposed an Integrated Double Estimator (IDE) by combining maximum and DE operations for estimating maximum expected action values.
Developed two RL algorithms: Integrated DQ (IDQ) and its deep network version, Integrated DQN (IDDQN).
Employed stochastic action selection using one estimator and a convex combination of the two estimators for action evaluation, with the aim of eliminating estimation bias.
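
The selection and evaluation steps listed above can be sketched for the tabular case as follows. This is an illustrative reading of the summary, not the authors' published update rule: the mixing weight `beta`, the function names, and the 50/50 choice of which table selects the action are all assumptions. With `beta = 1` the target reduces to the Q-learning max, and with `beta = 0` it reduces to the double Q-learning target.

```python
import numpy as np

def integrated_target(q_sel, q_other, reward, next_state,
                      gamma=0.99, beta=0.5, done=False):
    """Bootstrap target that blends max-style and double-estimator evaluation.

    q_sel selects the greedy next action; the selected action is then
    evaluated by a convex combination of both tables. beta in [0, 1] is
    a hypothetical mixing weight.
    """
    if done:
        return reward
    a_star = int(np.argmax(q_sel[next_state]))  # action selection
    blended = (beta * q_sel[next_state, a_star]
               + (1.0 - beta) * q_other[next_state, a_star])  # evaluation
    return reward + gamma * blended

def idq_style_update(q_a, q_b, s, a, r, s_next, done, rng,
                     alpha=0.1, gamma=0.99, beta=0.5):
    """Update one randomly chosen table in place (assumed 50/50 choice)."""
    if rng.random() < 0.5:
        target = integrated_target(q_a, q_b, r, s_next, gamma, beta, done)
        q_a[s, a] += alpha * (target - q_a[s, a])
    else:
        target = integrated_target(q_b, q_a, r, s_next, gamma, beta, done)
        q_b[s, a] += alpha * (target - q_b[s, a])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q_a = np.zeros((2, 2))
    q_b = np.zeros((2, 2))
    idq_style_update(q_a, q_b, s=0, a=1, r=1.0, s_next=1, done=False, rng=rng)
    print(q_a, q_b, sep="\n")
```

How `beta` should be chosen so that the combined estimator is unbiased is the subject of the paper's analysis and is not reproduced here.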

Main Results:

Theoretically analyzed the causes of estimation bias in RL and underestimation in DQ.
Proved the unbiasedness of the proposed IDE and the convergence of the IDQ algorithm.
Experimental results on grid world and Atari 2600 games demonstrated that IDQ and IDDQN effectively reduce or eliminate estimation bias.

Conclusions:

IDQ and IDDQN improve learning stability by balancing overestimation and underestimation, which mitigates estimation bias.
The proposed IDE architecture offers a promising direction for developing more robust and effective RL algorithms.
Experiments on grid world and Atari 2600 games support the practical efficacy of IDQ and IDDQN in enhancing RL performance.