Enhancing Stability of Probabilistic Model-Based Reinforcement Learning by Adaptive Noise Filtering.

Wenjun Huang, Xinrui Yue, Yidong Chen

IEEE Transactions on Neural Networks and Learning Systems

March 17, 2026

Summary

This summary is machine-generated.

Stabilized Model-Based Policy Optimization (SMBPO) stabilizes model-based reinforcement learning by filtering noise in the learned model's predictions and clipping predicted states and value estimates. In the reported experiments on complex control tasks, it cut training time by 90% and raised cumulative reward by 50% relative to the compared baselines.

Area of Science:

Artificial Intelligence
Machine Learning
Robotics

Background:

Current probabilistic model-based reinforcement learning (MBRL) methods face challenges with stability and efficiency due to imperfect models.
Model bias and prediction noise can negatively impact policy learning and overall performance.

Purpose of the Study:

To introduce Stabilized Model-Based Policy Optimization (SMBPO) for improved stability and efficiency in MBRL.
To address noise and bias issues inherent in probabilistic model-based approaches.

Main Methods:

SMBPO adaptively refines dimensions with abnormal prediction distributions to stabilize probabilistic model training.
It clips predicted states and estimated value functions to mitigate model bias effects on policy learning.
Batch Normalization (BN) is integrated to enhance learning efficiency.
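As a rough illustration of the clipping step only, the sketch below bounds a model-predicted state to the per-dimension range seen in real data, and bounds a value estimate to the range a discounted return can take. This summary does not give SMBPO's actual clipping rule, so the choice of bounds, the helper names (`state_bounds`, `clip_state`, `value_bounds`, `clip_value`), and the bounded-reward assumption are all hypothetical, not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def state_bounds(states: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-dimension min and max over states observed in real rollouts.

    Assumption: the observed range is a reasonable clipping range.
    """
    columns = list(zip(*states))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return lows, highs


def clip_state(pred: Sequence[float], lows: Sequence[float],
               highs: Sequence[float]) -> List[float]:
    """Clamp each dimension of a predicted state into [low, high]."""
    return [min(max(x, lo), hi) for x, lo, hi in zip(pred, lows, highs)]


def value_bounds(r_min: float, r_max: float, gamma: float) -> Tuple[float, float]:
    """Range of an infinite-horizon discounted return.

    Assumption: every reward lies in [r_min, r_max] and 0 <= gamma < 1.
    """
    return r_min / (1.0 - gamma), r_max / (1.0 - gamma)


def clip_value(v: float, r_min: float, r_max: float, gamma: float) -> float:
    """Clamp a value estimate into the attainable return range."""
    lo, hi = value_bounds(r_min, r_max, gamma)
    return min(max(v, lo), hi)


if __name__ == "__main__":
    real_states = [[0.0, 1.0], [2.0, -1.0]]
    lows, highs = state_bounds(real_states)
    print(clip_state([3.0, -5.0], lows, highs))
    print(clip_value(10.0, 0.0, 1.0, 0.5))
```

Clamping to data-derived bounds keeps a biased model from pushing the policy toward states or returns that the real environment cannot produce, which is the general motivation the summary gives for clipping.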

Main Results:

Evaluations on MuJoCo control benchmarks and a dexterous hand task demonstrated SMBPO's effectiveness.
SMBPO achieved a 90% reduction in training time compared to baselines.
The method achieved 50% higher cumulative reward than state-of-the-art model-free and MBRL approaches.

Conclusions:

SMBPO offers a stable and efficient solution for model-based reinforcement learning.
The technique significantly enhances learning speed and cumulative rewards.
SMBPO extends the practical applicability of MBRL in complex robotic control scenarios.