Authentic Boundary Proximal Policy Optimization.

Yuhu Cheng, Longyang Huang, Xuesong Wang

IEEE Transactions on Cybernetics

March 11, 2021

Summary

This summary is machine-generated.

This work extends Proximal Policy Optimization (PPO) with Authentic Boundary PPO (ABPPO) and two variants, RMABPPO and P3DABPPO, to improve learning stability and speed. It also analyzes how PPO's clipping relates to Trust Region Policy Optimization (TRPO) and reports improved performance on continuous robotic control tasks.


Area of Science:

Reinforcement Learning
Robotics
Machine Learning

Background:

Proximal Policy Optimization (PPO) is a widely used reinforcement learning algorithm for complex tasks.
Theoretical understanding of PPO's clipping mechanism and its relation to Trust Region Policy Optimization (TRPO) is limited.
Existing PPO methods lack robust theoretical grounding for their performance improvements.
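
For orientation, the objectives named above can be written in their standard textbook forms. This is a sketch taken from the general PPO and TRPO literature, not from this paper's own derivation. The symbols (probability ratio r_t, advantage estimate \hat{A}_t, clipping range \epsilon, KL bound \delta) are the conventional ones and are assumed here.

```latex
\begin{align}
r_t(\theta) &= \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)} \\
L^{\text{CPI}}(\theta) &= \hat{\mathbb{E}}_t\left[ r_t(\theta)\,\hat{A}_t \right] \\
L^{\text{CLIP}}(\theta) &= \hat{\mathbb{E}}_t\left[ \min\left( r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t \right) \right] \\
\text{TRPO:}\quad & \max_\theta\ L^{\text{CPI}}(\theta) \quad \text{s.t.} \quad \hat{\mathbb{E}}_t\left[ D_{\mathrm{KL}}\left( \pi_{\theta_{\text{old}}}(\cdot \mid s_t)\,\|\,\pi_\theta(\cdot \mid s_t) \right) \right] \le \delta
\end{align}
```

PPO restricts the policy update by clipping the ratio inside the objective, whereas TRPO imposes an explicit KL-divergence constraint. The relationship between the two is what the study sets out to analyze.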

Purpose of the Study:

To theoretically analyze PPO's clipping operation and its connection to TRPO.
To propose novel PPO variants with improved stability and learning speed.
To validate the effectiveness of new algorithms on continuous robotic control tasks.

Main Methods:

Analysis of how PPO's clipping operation affects the conservative policy iteration (CPI) objective function.
Theoretical derivation of the relationship between PPO and TRPO.
Development of Authentic Boundary PPO (ABPPO) using an authentic boundary setting rule.
Introduction of two ABPPO variants, RMABPPO and P3DABPPO, which incorporate rollback clipping and a penalized policy difference, respectively.
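
The clipping operation analyzed above can be illustrated with a minimal NumPy sketch. It shows the standard PPO clipped surrogate next to a generic rollback-style surrogate. The function names, the slope parameter `alpha`, and the rollback form itself are assumptions for illustration; the authentic boundary setting rule and the exact RMABPPO and P3DABPPO objectives are not given in this summary and are not reproduced here.

```python
import numpy as np


def clipped_surrogate(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate, averaged over samples.

    ratio:     pi_new(a|s) / pi_old(a|s) per sample
    advantage: advantage estimate per sample
    eps:       clipping range (0.2 is a common default, assumed here)
    """
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # The element-wise minimum gives a pessimistic bound on the unclipped objective.
    return float(np.mean(np.minimum(unclipped, clipped)))


def rollback_surrogate(ratio, advantage, eps=0.2, alpha=0.3):
    """Hypothetical rollback-style surrogate (illustrative assumption).

    Outside [1 - eps, 1 + eps], the flat clipped segment is replaced by a
    line of slope -alpha that is continuous at the boundary, so the
    objective actively pushes the ratio back toward the clipping range.
    """
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)
    upper = -alpha * ratio + (1.0 + alpha) * (1.0 + eps)
    lower = -alpha * ratio + (1.0 + alpha) * (1.0 - eps)
    rolled = np.where(
        ratio > 1.0 + eps, upper,
        np.where(ratio < 1.0 - eps, lower, ratio),
    )
    return float(np.mean(np.minimum(ratio * advantage, rolled * advantage)))


if __name__ == "__main__":
    ratios = [0.5, 1.0, 1.5]
    advantages = [1.0, 1.0, 1.0]
    print("clipped :", clipped_surrogate(ratios, advantages))
    print("rollback:", rollback_surrogate(ratios, advantages))
```

With a positive advantage and a ratio above 1 + eps, the clipped surrogate is flat, so it gives no incentive to increase the ratio further but also none to pull it back. The rollback form decreases in that region, which is one way to keep the new policy closer to the old one.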

Main Results:

Established a strict theoretical link between PPO and TRPO.
Demonstrated that ABPPO, RMABPPO, and P3DABPPO improve learning stability.
Showcased accelerated learning speeds in continuous robotic control tasks compared to standard PPO.
Validated the effectiveness of novel policy optimization techniques.

Conclusions:

The proposed ABPPO, RMABPPO, and P3DABPPO algorithms offer significant improvements over standard PPO.
Theoretical analysis provides a foundation for understanding PPO's clipping mechanism.
These advancements contribute to more stable and efficient reinforcement learning in robotics.