Mirror Descent Safe Policy Optimization for Reinforcement Learning Agents.

Renzhi Lu, Ning Wu, Qingqing Xiong

IEEE Transactions on Pattern Analysis and Machine Intelligence

|March 17, 2026

Summary

This summary is machine-generated.

Mirror Descent Safe Policy Optimization (MDSPO) is a new algorithm designed to let reinforcement learning (RL) agents explore safely. It improves returns while satisfying cost constraints, which is crucial for deploying RL agents in complex environments.

Area of Science:

Artificial Intelligence
Reinforcement Learning
Robotics

Background:

Embodied agents require mechanisms for complex problem-solving.
Reinforcement learning (RL) is a key AI technology for enhancing agent learning capabilities.
Safe exploration is critical because some actions an agent could take are unacceptable.

Purpose of the Study:

To propose a novel algorithm, Mirror Descent Safe Policy Optimization (MDSPO), for safe reinforcement learning.
To ensure RL agents maximize returns while adhering to safety constraints during exploration.
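The goal of maximizing return under safety constraints is commonly written as a constrained Markov decision process. The form below is the standard textbook statement, not necessarily the paper's own objective; the symbols J_R, J_C, r, c, d, and γ (return, cumulative cost, reward, cost, cost limit, discount factor) are assumptions introduced here for illustration.

```latex
\max_{\pi} \; J_R(\pi) = \mathbb{E}_{\tau \sim \pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right]
\quad \text{subject to} \quad
J_C(\pi) = \mathbb{E}_{\tau \sim \pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\right] \le d
```

Under this reading, "safe exploration" means that the policies visited during training, not only the final one, should keep J_C at or below d.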

Main Methods:

Developed a novel optimization objective for safe policy optimization.
Employed a three-stage optimization strategy: unconstrained gradient descent, nonparametric policy projection with cost constraints, and parametric policy projection.
Utilized mirror descent optimization to balance return maximization and safety.
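The three stages listed above can be illustrated on a toy problem. The following is a minimal sketch, assuming a single-state problem with three actions, known per-action rewards and costs, and a KL-divergence mirror map; it is not the authors' implementation, and every function name, constant, and hyperparameter here is hypothetical. The unconstrained step is written as ascent on reward, which is equivalent to gradient descent on the negated reward.

```python
import math


def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    return normalize([math.exp(z - peak) for z in logits])


def expected(policy, values):
    return sum(p * v for p, v in zip(policy, values))


def mirror_step(policy, rewards, step_size):
    """Stage 1 (assumed form): unconstrained mirror step with a KL mirror map,
    which reduces to a multiplicative (exponentiated-gradient) update."""
    return normalize([p * math.exp(step_size * r) for p, r in zip(policy, rewards)])


def project_onto_cost_set(policy, costs, cost_limit, iters=60):
    """Stage 2 (assumed form): KL projection onto {q : E_q[cost] <= cost_limit}.
    The solution is q proportional to p * exp(-lam * cost); lam >= 0 is found by
    bisection. Assumes every action has nonzero probability under `policy`."""
    floor = min(costs)
    if floor > cost_limit:
        raise ValueError("no distribution over these actions meets the cost limit")
    if expected(policy, costs) <= cost_limit:
        return list(policy)

    def tilt(lam):
        # Shifting costs by `floor` leaves the distribution unchanged but avoids underflow.
        return normalize([p * math.exp(-lam * (c - floor)) for p, c in zip(policy, costs)])

    low, high = 0.0, 1.0
    while expected(tilt(high), costs) > cost_limit:
        high *= 2.0  # expected cost is non-increasing in lam, so keep doubling
    for _ in range(iters):
        mid = 0.5 * (low + high)
        if expected(tilt(mid), costs) > cost_limit:
            low = mid
        else:
            high = mid
    return tilt(high)  # `high` always satisfies the constraint


def fit_softmax_logits(target, logits, steps=200, learning_rate=1.0):
    """Stage 3 (assumed form): parametric projection. Fit softmax logits to the
    target distribution by gradient descent on KL(target || softmax(logits)),
    whose gradient with respect to the logits is softmax(logits) - target."""
    for _ in range(steps):
        probs = softmax(logits)
        logits = [z - learning_rate * (p - t) for z, p, t in zip(logits, probs, target)]
    return logits


def train(rewards, costs, cost_limit, iterations=100, step_size=0.5):
    logits = [0.0] * len(rewards)  # start from the uniform policy
    for _ in range(iterations):
        policy = softmax(logits)
        improved = mirror_step(policy, rewards, step_size)
        safe = project_onto_cost_set(improved, costs, cost_limit)
        logits = fit_softmax_logits(safe, logits)  # warm start from current logits
    return softmax(logits)


# Hypothetical toy data: the highest-reward action is also the most costly.
REWARDS = [1.0, 0.5, 0.0]
COSTS = [1.0, 0.2, 0.0]
COST_LIMIT = 0.3

if __name__ == "__main__":
    final = train(REWARDS, COSTS, COST_LIMIT)
    print("policy:", [round(p, 4) for p in final])
    print("expected reward:", round(expected(final, REWARDS), 4))
    print("expected cost:", round(expected(final, COSTS), 4))
```

In this toy setting the softmax policy is fully expressive, so the third stage mostly recovers the projected distribution; with a neural-network policy, the parametric projection would only approximate it, which is presumably why it is listed as a separate stage.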

Main Results:

MDSPO improves average return by approximately 12% in locomotion experiments.
Demonstrated superior satisfaction of cost constraints compared to state-of-the-art methods.
Successfully found optimal paths and guaranteed agent safety in a real-world unmanned surface vessel obstacle avoidance task.

Conclusions:

MDSPO is a simple, first-order, and easily implementable safe RL algorithm.
Theoretical analysis provides bounds on return improvement and constraint violation.
MDSPO effectively enhances agent performance and safety in constrained environments.