Off-OAB: Off-Policy Policy Gradient Method With Optimal Action-Dependent Baseline.

Wenjia Meng, Qian Zheng, Long Yang

    IEEE Transactions on Neural Networks and Learning Systems | July 24, 2025
    Summary
    This summary is machine-generated.

    This study introduces an off-policy policy gradient method with an optimal action-dependent baseline (Off-OAB) to reduce variance in reinforcement learning training. Off-OAB improves sample efficiency and outperforms existing methods on benchmark tasks.

    Area of Science:

    • Artificial Intelligence
    • Machine Learning
    • Robotics

    Background:

    • Policy-based methods are successful in reinforcement learning (RL).
    • Off-policy policy gradient (OPPG) methods leverage off-policy data but suffer from high variance.
    • High variance leads to poor sample efficiency in training.

    Purpose of the Study:

    • To propose a novel off-policy policy gradient method to mitigate variance.
    • To introduce an optimal action-dependent baseline (Off-OAB) for unbiased and low-variance OPPG estimation.
    • To enhance computational efficiency through an approximated optimal baseline.

    Main Methods:

    • Derived an optimal action-dependent baseline for the OPPG estimator (the basis of Off-OAB).
    • Showed theoretically that this baseline minimizes estimator variance while keeping the estimator unbiased.
    • Designed an approximated version of the optimal baseline to keep computation practical.
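
The general idea behind baselines in off-policy policy gradients can be illustrated with a small sketch. This is not the Off-OAB baseline itself, which is action-dependent and is not specified in this summary. The sketch assumes a single-state problem with three actions, a softmax target policy, a uniform behavior policy, and made-up rewards, and it uses the classic variance-minimizing constant baseline as a stand-in. All function names and numbers are hypothetical. Means and variances are computed exactly by enumerating actions, so no sampling is involved.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def score(pi, a):
    """Gradient of log pi(a) w.r.t. the softmax logits: one_hot(a) - pi."""
    return [(1.0 if i == a else 0.0) - p for i, p in enumerate(pi)]


def estimator_stats(pi, mu, rewards, baseline):
    """Exact mean and total variance (trace of the covariance) of
    g(a) = rho(a) * score(a) * (r(a) - baseline), with a drawn from mu
    and rho(a) = pi(a) / mu(a) the importance weight."""
    n = len(pi)
    mean = [0.0] * n
    second_moment = 0.0
    for a in range(n):
        rho = pi[a] / mu[a]
        g = [rho * s * (rewards[a] - baseline) for s in score(pi, a)]
        for i in range(n):
            mean[i] += mu[a] * g[i]
        second_moment += mu[a] * sum(x * x for x in g)
    variance = second_moment - sum(m * m for m in mean)
    return mean, variance


def optimal_constant_baseline(pi, mu, rewards):
    """Constant baseline minimizing the total variance above:
    b* = E_mu[rho^2 |score|^2 r] / E_mu[rho^2 |score|^2]."""
    num = den = 0.0
    for a in range(len(pi)):
        rho = pi[a] / mu[a]
        weight = mu[a] * rho ** 2 * sum(x * x for x in score(pi, a))
        num += weight * rewards[a]
        den += weight
    return num / den


# Hypothetical toy setup (assumed values, not taken from the paper).
pi = softmax([0.5, 0.0, -0.5])   # target policy
mu = [1 / 3, 1 / 3, 1 / 3]       # behavior policy that generated the data
rewards = [1.0, 0.0, 2.0]

mean_no_baseline, var_no_baseline = estimator_stats(pi, mu, rewards, 0.0)
b_star = optimal_constant_baseline(pi, mu, rewards)
mean_with_baseline, var_with_baseline = estimator_stats(pi, mu, rewards, b_star)

print("gradient without baseline:", mean_no_baseline)
print("gradient with baseline:   ", mean_with_baseline)
print("total variance without / with baseline:", var_no_baseline, var_with_baseline)
```

In this sketch the expected gradient is the same with and without the baseline, because the importance-weighted score function has zero mean under the behavior policy, while the total variance is lower with the baseline. An action-dependent baseline, as used in Off-OAB, generally needs an additional correction term to stay unbiased; that term is not shown here.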

    Main Results:

    • The Off-OAB method demonstrably reduces OPPG estimator variance.
    • Evaluated on six OpenAI Gym and MuJoCo tasks.
    • Outperformed state-of-the-art methods on most tasks.

    Conclusions:

    • The proposed Off-OAB method effectively reduces variance in off-policy policy gradient estimation.
    • Off-OAB enhances sample efficiency and performance in challenging RL tasks.
    • The approximated baseline ensures practical computational efficiency.