Related Concept Videos

Behavior Modification

Behavioral approaches have often been criticized for ignoring mental processes and focusing solely on observable behavior. However, these approaches provide an optimistic perspective for individuals seeking to change their behaviors. Rather than concentrating on intrinsic personality traits, behavioral approaches suggest that even longstanding habits can be modified by changing the reward contingencies that maintain them.
A real-world application of operant conditioning principles is applied...

Regression Toward the Mean

Regression toward the mean (“RTM”) is a phenomenon in which extremely high or low values (for example, an individual’s blood pressure at a particular moment) appear closer to a group’s average upon remeasuring. Although this statistical peculiarity is the result of random error and chance, it has been problematic across various medical, scientific, financial and psychological applications. In particular, RTM, if not taken into account, can interfere when...
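A small simulation can make the mechanism concrete. This is a minimal sketch, assuming each reading is a stable true value plus independent Gaussian measurement noise; the blood-pressure-like numbers (`true_mean`, `true_sd`, `noise_sd`) and the top-5% selection rule are illustrative assumptions, not values from the video. People selected for extreme first readings score, on average, closer to the population mean when remeasured, even though nothing about them changed.

```python
import random
import statistics

def simulate_rtm(n=10_000, true_mean=120.0, true_sd=10.0, noise_sd=8.0,
                 top_frac=0.05, seed=1):
    """Return (population mean, top group's 1st-reading mean, top group's 2nd-reading mean)."""
    rng = random.Random(seed)
    # Hypothetical stable "true" values for each individual.
    true_vals = [rng.gauss(true_mean, true_sd) for _ in range(n)]
    # Two independent noisy measurements of the same true value.
    first = [t + rng.gauss(0.0, noise_sd) for t in true_vals]
    second = [t + rng.gauss(0.0, noise_sd) for t in true_vals]
    # Select the individuals with the most extreme (highest) first readings.
    k = max(1, int(n * top_frac))
    top = sorted(range(n), key=lambda i: first[i], reverse=True)[:k]
    return (statistics.fmean(first),
            statistics.fmean(first[i] for i in top),
            statistics.fmean(second[i] for i in top))

if __name__ == "__main__":
    pop, top_first, top_second = simulate_rtm()
    print(f"population mean: {pop:.1f}")
    print(f"top group, first reading: {top_first:.1f}")
    print(f"top group, remeasured: {top_second:.1f}")
```

The remeasured mean of the selected group falls between the population mean and its first-reading mean; how far it moves back depends on the ratio of noise variance to true-value variance.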

Naturalistic Observations

If you want to understand how behavior occurs, one of the best ways to gain information is to simply observe the behavior in its natural context. However, people might change their behavior in unexpected ways if they know they are being observed. How do researchers obtain accurate information when people tend to hide their natural behavior? As an example, imagine that your professor asks everyone in your class to raise their hand if they always wash their hands after using the restroom. Chances...

Propagation of Action Potentials

The propagation of an action potential refers to the process by which a nerve impulse, or "action potential," travels along a neuron.
Neurons (nerve cells) have a resting membrane potential, with a slightly negative charge inside compared to outside. This is maintained by ion channels, such as sodium (Na+) and potassium (K+) channels, which control the flow of ions. When a stimulus, like a touch or a signal from another neuron, triggers the neuron, sodium channels open, allowing sodium ions to...

Mechanistic Models: Compartment Models in Algorithms for Numerical Problem Solving

Mechanistic models play a crucial role in algorithms for numerical problem-solving, particularly in nonlinear mixed effects modeling (NMEM). These models aim to minimize specific objective functions by evaluating various parameter estimates, leading to the development of systematic algorithms. In some cases, linearization techniques approximate the model using linear equations.
In individual population analyses, different algorithms are employed, such as Cauchy's method, which uses a...

Decision Making: P-value Method

The process of hypothesis testing based on the P-value method includes calculating the P-value from the sample data and interpreting it.
First, a specific claim about the population parameter is proposed. The claim is based on the research question and is stated in a simple form. Further, an opposing statement to the claim is also stated. These statements can act as null and alternative hypotheses: a null hypothesis would be a neutral statement while the alternative hypothesis can...
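As a worked illustration of computing and interpreting a P-value, the sketch below assumes a one-sample z-test (population standard deviation `sigma` known, sampling distribution of the mean approximately normal); the function names and the default significance level `alpha = 0.05` are illustrative choices, and the procedure in the video may use a different test statistic.

```python
import math

def z_test_p_value(sample_mean, mu0, sigma, n, alternative="two-sided"):
    """Return (z, p) for a one-sample z-test of H0: mean == mu0 with known sigma."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    if alternative == "greater":      # H1: mean > mu0, p = P(Z >= z)
        p = 0.5 * math.erfc(z / math.sqrt(2))
    elif alternative == "less":       # H1: mean < mu0, p = P(Z <= z)
        p = 0.5 * math.erfc(-z / math.sqrt(2))
    elif alternative == "two-sided":  # H1: mean != mu0, p = 2 * P(Z >= |z|)
        p = math.erfc(abs(z) / math.sqrt(2))
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return z, p

def decide(p, alpha=0.05):
    """Reject H0 when the P-value is at or below the significance level alpha."""
    return "reject H0" if p <= alpha else "fail to reject H0"

if __name__ == "__main__":
    # Hypothetical data: sample mean 103 from n = 50, testing H0: mean = 100, sigma = 10.
    z, p = z_test_p_value(103.0, 100.0, 10.0, 50)
    print(f"z = {z:.3f}, p = {p:.4f}: {decide(p)}")
```

The P-value here is the probability, under the null hypothesis, of a test statistic at least as extreme as the one observed; the decision rule simply compares it with the chosen significance level.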

You might also read

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

Growth strategy of Juniperus tibetica ancient clusters under high-altitude and cold conditions in western Xizang, China.

Ying yong sheng tai xue bao = The journal of applied ecology·2026

Same author, Same journal

FP3O: Enabling Proximal Policy Optimization in Multiagent Cooperation With Parameter-Sharing Versatility.

IEEE transactions on neural networks and learning systems·2026

Same author

Dendritic nonlinearities mitigate communication costs.

Patterns (New York, N.Y.)·2026

Same author

Recent Advances in Neoadjuvant Treatment of Anaplastic Thyroid Carcinoma: A Narrative Review.

Current treatment options in oncology·2026

Same author

Extruded biodegradable Zn-5Cu alloys with integrated osteoimmunomodulatory, antibacterial, and anti-osteolytic properties for patellar fracture suture repair.

Acta biomaterialia·2026

Same author

Task-Dependent Cortico-Spinal Coupling in the Delta Band During Movement Execution and Inhibitory Control.

IEEE transactions on bio-medical engineering·2026

Same journal

Hidden Data Recovery and Forecasting via Next-Generation Reservoir Computing With Multiscale Delay Selection.

IEEE transactions on neural networks and learning systems·2026

Same journal

CAFF-CIL: Causality-Aware Freedom Forgetting Approach for Class-Incremental Learning.

IEEE transactions on neural networks and learning systems·2026

Same journal

Harmonic Autoencoding Framework for Multiple Tasks in Magnetic Particle Imaging Reconstruction.

IEEE transactions on neural networks and learning systems·2026

Same journal

A Survey on Human-Centric Voice-Face Multimodal Learning.

IEEE transactions on neural networks and learning systems·2026

Same journal

Vision-Assisted Foundation Model for Solving Multitask Vehicle Routing Problems.

IEEE transactions on neural networks and learning systems·2026

Related Experiment Video

Updated: Sep 14, 2025

Real-Time Proxy-Control of Re-Parameterized Peripheral Signals using a Close-Loop Interface

Published on: May 8, 2021

Off-OAB: Off-Policy Policy Gradient Method With Optimal Action-Dependent Baseline.

Wenjia Meng, Qian Zheng, Long Yang

IEEE Transactions on Neural Networks and Learning Systems

July 24, 2025

Summary

This summary is machine-generated.

This study introduces an off-policy policy gradient method with an optimal action-dependent baseline (Off-OAB) to reduce variance in reinforcement learning training. Off-OAB improves sample efficiency and outperforms existing methods on benchmark tasks.

More Related Videos

WheelCon: A Wheel Control-Based Gaming Platform for Studying Human Sensorimotor Control

Published on: August 15, 2020

Measuring the Subjective Value of Risky and Ambiguous Options using Experimental Economics and Functional MRI Methods

Published on: September 19, 2012

Area of Science:

Artificial Intelligence
Machine Learning
Robotics

Background:

Policy-based methods have achieved notable success in reinforcement learning (RL).
Off-policy policy gradient (OPPG) methods can reuse data collected by a different (behavior) policy, but their gradient estimates suffer from high variance.
High variance leads to poor sample efficiency during training.
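To see why a baseline helps, here is a toy single-state (bandit) sketch of an importance-weighted OPPG estimator. Everything in it is an illustrative assumption rather than the Off-OAB method: a softmax target policy over discrete actions, a fixed behavior policy `mu`, known action values `q`, and a constant baseline `b`. Subtracting `b` leaves the estimator unbiased (because the expected importance-weighted score is zero), and the classic variance-minimizing constant b* = E_mu[rho^2 |grad log pi|^2 Q] / E_mu[rho^2 |grad log pi|^2] lowers its variance. Off-OAB instead uses an action-dependent baseline, which this sketch does not reproduce.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax: the target policy pi_theta over discrete actions."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def grad_log_pi(pi, a):
    """Gradient of log softmax(theta)[a] with respect to the logits: onehot(a) - pi."""
    return [(1.0 if i == a else 0.0) - p for i, p in enumerate(pi)]

def true_gradient(pi, q):
    """Exact policy gradient: sum_a pi(a) * Q(a) * grad log pi(a)."""
    g = [0.0] * len(pi)
    for a, (p, qa) in enumerate(zip(pi, q)):
        for i, gi in enumerate(grad_log_pi(pi, a)):
            g[i] += p * qa * gi
    return g

def estimate(pi, mu, q, b, rng):
    """One off-policy gradient sample with importance ratio rho and constant baseline b."""
    a = rng.choices(range(len(mu)), weights=mu)[0]  # action drawn from behavior policy mu
    rho = pi[a] / mu[a]                              # importance ratio pi(a) / mu(a)
    return [rho * gi * (q[a] - b) for gi in grad_log_pi(pi, a)]

def exact_moments(pi, mu, q, b):
    """Mean vector and total variance (trace of covariance) of `estimate`, by enumeration."""
    d = len(pi)
    mean = [0.0] * d
    second = 0.0
    for a in range(d):
        rho = pi[a] / mu[a]
        g = [rho * gi * (q[a] - b) for gi in grad_log_pi(pi, a)]
        for i in range(d):
            mean[i] += mu[a] * g[i]
        second += mu[a] * sum(x * x for x in g)
    return mean, second - sum(m * m for m in mean)

def optimal_constant_baseline(pi, mu, q):
    """Constant b minimizing the estimator's second moment (and hence its variance)."""
    num = den = 0.0
    for a in range(len(pi)):
        rho = pi[a] / mu[a]
        w = mu[a] * rho * rho * sum(x * x for x in grad_log_pi(pi, a))
        num += w * q[a]
        den += w
    return num / den

if __name__ == "__main__":
    pi = softmax([0.5, 0.0, -0.5])  # hypothetical target policy
    mu = [1 / 3, 1 / 3, 1 / 3]      # hypothetical uniform behavior policy
    q = [1.0, 2.0, 3.0]             # hypothetical action values
    b_star = optimal_constant_baseline(pi, mu, q)
    for b in (0.0, b_star):
        mean, var = exact_moments(pi, mu, q, b)
        print(f"b = {b:.3f}: mean = {[round(m, 4) for m in mean]}, total variance = {var:.4f}")
    print("true gradient:", [round(g, 4) for g in true_gradient(pi, q)])
```

Both baselines give the same mean (the true gradient); only the spread of the samples changes, which is what drives sample efficiency.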

Purpose of the Study:

To propose a novel off-policy policy gradient method to mitigate variance.
To introduce an optimal action-dependent baseline, the core of the Off-OAB method, for unbiased and low-variance OPPG estimation.
To enhance computational efficiency through an approximated optimal baseline.

Main Methods:

Developed an optimal action-dependent baseline for OPPG, forming the Off-OAB method.
Derived the baseline theoretically to minimize the variance of the OPPG estimator while keeping it unbiased.
Designed an approximated version of the optimal baseline for practical efficiency.
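The reason an action-dependent baseline can stay unbiased is a standard control-variate argument, sketched below for a fixed state s and discrete actions; this is a general derivation under those assumptions, not the paper's own, and the exact Off-OAB baseline and its approximation are not reproduced here. With actions drawn from a behavior policy μ, importance ratio ρ, and an unbiased action-value estimate \hat Q, the baseline term is subtracted inside the sample and its expectation under π_θ is added back:

```latex
\begin{align}
\hat g(s,a) &= \rho(s,a)\,\nabla_\theta \log \pi_\theta(a \mid s)\,\bigl(\hat Q(s,a) - b(s,a)\bigr)
  + \mathbb{E}_{a' \sim \pi_\theta(\cdot \mid s)}\bigl[\nabla_\theta \log \pi_\theta(a' \mid s)\, b(s,a')\bigr],
  \qquad a \sim \mu(\cdot \mid s),\quad \rho(s,a) = \frac{\pi_\theta(a \mid s)}{\mu(a \mid s)}, \\
\mathbb{E}_{a \sim \mu}\bigl[\rho(s,a)\,\nabla_\theta \log \pi_\theta(a \mid s)\, b(s,a)\bigr]
  &= \sum_a \pi_\theta(a \mid s)\,\nabla_\theta \log \pi_\theta(a \mid s)\, b(s,a)
  = \mathbb{E}_{a \sim \pi_\theta}\bigl[\nabla_\theta \log \pi_\theta(a \mid s)\, b(s,a)\bigr].
\end{align}
```

The second line shows that the subtracted and added baseline terms cancel in expectation, so the mean of \hat g equals the mean of the baseline-free estimator for any choice of b, and b can be chosen purely to reduce variance. When b depends only on s, the correction term vanishes because the expected score is zero, and minimizing the second moment over a scalar b(s) gives b^*(s) = \mathbb{E}_{\mu}[\rho^2 \lVert\nabla_\theta \log \pi_\theta\rVert^2 \hat Q] / \mathbb{E}_{\mu}[\rho^2 \lVert\nabla_\theta \log \pi_\theta\rVert^2].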

Main Results:

The Off-OAB method demonstrably reduces OPPG estimator variance.
Evaluated on six OpenAI Gym and MuJoCo tasks.
Outperformed state-of-the-art methods on most tasks.

Conclusions:

The proposed Off-OAB method effectively reduces variance in off-policy policy gradient estimation.
Off-OAB enhances sample efficiency and performance in challenging RL tasks.
The approximated baseline ensures practical computational efficiency.