Related Experiment Video

Updated: Dec 6, 2025

A Step-by-Step Implementation of DeepBehavior, Deep Learning Toolbox for Automated Behavior Analysis
Published on: February 6, 2020

Integrated Double Estimator Architecture for Reinforcement Learning.

Pingli Lv, Xuesong Wang, Yuhu Cheng

    IEEE Transactions on Cybernetics | October 7, 2020
    Summary
    This summary is machine-generated.

    We introduce an Integrated Double Estimator (IDE) to balance overestimation and underestimation in reinforcement learning (RL). This novel approach, implemented in Integrated Double DQN (IDDQN), reduces estimation bias, making learning more stable and improving RL performance.

    Area of Science:

    • Artificial Intelligence
    • Machine Learning
    • Reinforcement Learning

    Background:

    • Reinforcement learning (RL) algorithms such as Q-learning and Deep Q-Networks (DQN) often overestimate action values, because they apply the maximum operation to noisy estimates when estimating the maximum expected action value.
    • Double Q-learning (DQ) and Double DQN avoid this overestimation by using a double estimator (DE), but can underestimate instead.
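
The two biases described above can be reproduced with a small Monte Carlo experiment. The sketch below is illustrative only and is not taken from the paper: the true action values, the unit-variance Gaussian noise, the sample sizes, and the function names are all assumptions chosen for the demonstration. The single estimator takes the maximum over sample means, while the double estimator selects the best action with one half of the samples and evaluates it with the other half.

```python
import random


def sample_mean(mu, n, rng):
    """Mean of n noisy observations of a quantity whose true value is mu."""
    return sum(rng.gauss(mu, 1.0) for _ in range(n)) / n


def estimate_biases(true_means, n_per_half=10, trials=5000, seed=0):
    """Return (single_bias, double_bias) relative to max(true_means)."""
    rng = random.Random(seed)
    best_true = max(true_means)
    single_total = 0.0
    double_total = 0.0
    for _ in range(trials):
        # Two independent sets of sample means, one per estimator.
        means_a = [sample_mean(mu, n_per_half, rng) for mu in true_means]
        means_b = [sample_mean(mu, n_per_half, rng) for mu in true_means]

        # Single estimator: pool both halves, then take the maximum.
        pooled = [(a + b) / 2 for a, b in zip(means_a, means_b)]
        single_total += max(pooled)

        # Double estimator: select with set A, evaluate with set B.
        pick = max(range(len(true_means)), key=lambda i: means_a[i])
        double_total += means_b[pick]

    return single_total / trials - best_true, double_total / trials - best_true


single_bias, double_bias = estimate_biases([1.0, 0.8, 0.6, 0.4])
print(f"single (max) estimator bias: {single_bias:+.3f}")
print(f"double estimator bias:       {double_bias:+.3f}")
```

With these assumed settings the first bias comes out positive and the second negative, which is the imbalance the IDE architecture is meant to address.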

    Purpose of the Study:

    • To propose a novel Integrated Double Estimator (IDE) architecture that balances overestimation and underestimation in RL.
    • To introduce two new RL algorithms, Integrated DQ (IDQ) and Integrated Double DQN (IDDQN), based on the IDE architecture.
    • To theoretically analyze estimation bias and prove the unbiasedness of IDE and convergence of IDQ.

    Main Methods:

    • Proposed an Integrated Double Estimator (IDE) by combining maximum and DE operations for estimating maximum expected action values.
    • Developed two RL algorithms: Integrated DQ (IDQ) and its deep network version, Integrated Double DQN (IDDQN).
    • In both algorithms, one of the two estimators is chosen stochastically to select the action with the maximum operation, and a convex combination of the two estimators evaluates that action, with the aim of eliminating estimation bias.
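
The select-then-evaluate idea in the methods above can be sketched for the tabular case as follows. This is a minimal sketch under stated assumptions, not the paper's algorithm: the mixing weight `beta`, the 50/50 choice of selecting estimator, the decision to update only the selecting table, and the names `ide_target` and `ide_update` are all hypothetical, and terminal states are not handled.

```python
import random


def ide_target(q_select, q_other, reward, gamma, beta):
    """Bootstrapped target for one transition.

    q_select and q_other are lists of next-state action values from the
    two estimators. The action is chosen greedily with q_select (maximum
    operation) and evaluated by a convex combination of both estimators.
    beta is an assumed mixing weight in [0, 1].
    """
    best = max(range(len(q_select)), key=lambda a: q_select[a])
    blended = beta * q_select[best] + (1.0 - beta) * q_other[best]
    return reward + gamma * blended


def ide_update(q_a, q_b, state, action, reward, next_state,
               alpha=0.1, gamma=0.99, beta=0.5, rng=random):
    """Update one of two tabular estimators in place (assumed update rule).

    q_a and q_b map each state to a list of action values.
    """
    # Stochastically choose which estimator performs action selection.
    q_sel, q_oth = (q_a, q_b) if rng.random() < 0.5 else (q_b, q_a)
    target = ide_target(q_sel[next_state], q_oth[next_state],
                        reward, gamma, beta)
    q_sel[state][action] += alpha * (target - q_sel[state][action])


# Usage: the two extremes of beta recover the familiar targets.
q_sel_next = [1.0, 3.0, 2.0]
q_oth_next = [0.5, 1.0, 4.0]
print(ide_target(q_sel_next, q_oth_next, reward=1.0, gamma=0.9, beta=1.0))
print(ide_target(q_sel_next, q_oth_next, reward=1.0, gamma=0.9, beta=0.0))
```

In this sketch `beta = 1.0` reduces to a Q-learning-style maximum target and `beta = 0.0` to a double Q-learning target; intermediate values blend the two.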

    Main Results:

    • Theoretically analyzed the causes of estimation bias in RL and underestimation in DQ.
    • Proved the unbiasedness of the proposed IDE and the convergence of the IDQ algorithm.
    • Experimental results on grid world and Atari 2600 games demonstrated that IDQ and IDDQN effectively reduce or eliminate estimation bias.
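
A standard argument shows why a convex combination of the two estimators can, in principle, cancel the bias. The derivation below is an illustration under assumptions, not the paper's proof: the symbols $\mu_i$, $\hat\mu_i^A$, $\hat\mu_i^B$, $\beta$, $b_+$ and $b_-$ are introduced here, and the two estimators are assumed to be independent and unbiased for each action.

```latex
% Assumptions: \hat\mu_i^A and \hat\mu_i^B are independent, unbiased
% estimators of \mu_i, and a^* = \arg\max_i \hat\mu_i^A.
\begin{align}
  \mathbb{E}\big[\hat\mu^A_{a^*}\big]
    &= \mathbb{E}\big[\max_i \hat\mu^A_i\big] \;\ge\; \max_i \mu_i
    && \text{(maximum operation: bias } b_+ \ge 0\text{)} \\
  \mathbb{E}\big[\hat\mu^B_{a^*}\big]
    &= \sum_i \Pr(a^* = i)\,\mu_i \;\le\; \max_i \mu_i
    && \text{(double estimator: bias } b_- \le 0\text{)} \\
  \mathbb{E}\big[\beta\,\hat\mu^A_{a^*} + (1-\beta)\,\hat\mu^B_{a^*}\big] - \max_i \mu_i
    &= \beta\, b_+ + (1-\beta)\, b_-,
    && \beta \in [0, 1]
\end{align}
% The combined bias is linear in \beta and changes sign on [0, 1], so it
% vanishes at \beta^* = -b_- / (b_+ - b_-) whenever b_+ > b_-.
```

How the paper chooses or realizes such a weight is not stated in this summary.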

    Conclusions:

    • By reducing estimation bias, IDQ and IDDQN make learning more stable and balanced.
    • The proposed IDE architecture offers a promising direction for developing more robust and effective RL algorithms.
    • Experiments on grid world and Atari 2600 games confirm the practical efficacy of IDQ and IDDQN in improving RL performance.