Anti-Martingale Proximal Policy Optimization.

Yang Gu, Yuhu Cheng, Kun Yu

IEEE Transactions on Cybernetics

May 13, 2022

Summary

This summary is machine-generated.

This study introduces an anti-martingale (AM) reinforcement learning framework to improve sample efficiency in on-policy deep reinforcement learning (DRL). The AM proximal policy optimization (AMPPO) method enhances data selection for faster, more effective policy updates.

Area of Science:

Artificial Intelligence
Machine Learning
Reinforcement Learning

Background:

In on-policy deep reinforcement learning (DRL), the data collected in one round of exploration can be used only once to update the network parameters, so high sample efficiency is essential.
Accelerating DRL training necessitates improved methods for utilizing exploration data.

Purpose of the Study:

To develop a novel framework for efficient sample selection in on-policy DRL.
To enhance the training speed and performance of DRL algorithms.

Main Methods:

Proposed a submartingale criterion based on the optimal policy-martingale equivalence.
Developed an advanced value iteration (AVI) method that performs value iteration with high accuracy.
Introduced an anti-martingale (AM) reinforcement learning framework for effective sample selection.
Integrated the AM framework with proximal policy optimization (PPO) into the AM proximal policy optimization (AMPPO) method.
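
As a rough illustration of how a sample-selection step can sit in front of a PPO update, the sketch below computes the standard PPO clipped surrogate objective over only the transitions that pass a filter. The filter shown is an assumption made for illustration, not the paper's submartingale criterion: it keeps a transition when its one-step bootstrapped estimate, reward plus discounted next-state value, is at least the current state value. The function names `ppo_clip_objective` and `td_nondecreasing_mask`, and all parameter values, are hypothetical.

```python
from typing import List, Optional, Sequence


def ppo_clip_objective(
    ratios: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
    mask: Optional[Sequence[bool]] = None,
) -> float:
    """Standard PPO clipped surrogate (to be maximized), averaged over kept samples.

    ratios[i] is pi_new(a|s) / pi_old(a|s) for sample i.
    mask[i] selects which samples contribute; None keeps all of them.
    """
    if mask is None:
        mask = [True] * len(ratios)
    terms: List[float] = []
    for ratio, adv, keep in zip(ratios, advantages, mask):
        if not keep:
            continue
        # Clip the probability ratio to [1 - eps, 1 + eps].
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        terms.append(min(ratio * adv, clipped_ratio * adv))
    if not terms:
        raise ValueError("no samples selected")
    return sum(terms) / len(terms)


def td_nondecreasing_mask(
    values: Sequence[float],
    next_values: Sequence[float],
    rewards: Sequence[float],
    gamma: float = 0.99,
) -> List[bool]:
    """HYPOTHETICAL stand-in filter, not the paper's criterion.

    Keeps transitions where r + gamma * V(s') >= V(s).
    """
    return [
        r + gamma * v_next >= v
        for v, v_next, r in zip(values, next_values, rewards)
    ]


if __name__ == "__main__":
    ratios = [0.5, 1.0, 1.5]
    advantages = [1.0, -1.0, 2.0]
    mask = td_nondecreasing_mask(
        values=[1.0, 2.0, 0.5],
        next_values=[1.5, 1.0, 0.5],
        rewards=[0.0, 0.5, 0.1],
        gamma=0.9,
    )
    print("selected:", mask)
    print("objective, all samples:", ppo_clip_objective(ratios, advantages))
    print("objective, selected only:", ppo_clip_objective(ratios, advantages, mask=mask))
```

Passing the mask as a separate argument keeps the PPO objective itself unchanged, so a different selection rule can be swapped in without touching the update.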

Main Results:

AMPPO accelerates the state value updating process while adhering to the submartingale criterion.
Experimental results on the MuJoCo platform show that AMPPO outperforms several state-of-the-art DRL methods.

Conclusions:

The proposed AMPPO method significantly improves sample efficiency in on-policy DRL.
AMPPO offers a promising approach for accelerating DRL training and achieving better performance.