Mirror Descent Safe Policy Optimization for Reinforcement Learning Agents.

Renzhi Lu, Ning Wu, Qingqing Xiong

    IEEE Transactions on Pattern Analysis and Machine Intelligence
    |March 17, 2026
    Summary
    This summary is machine-generated.

    A new Mirror Descent Safe Policy Optimization (MDSPO) algorithm ensures reinforcement learning (RL) agents explore safely. This method improves returns and satisfies constraints, crucial for safe AI in complex environments.

    Area of Science:

    • Artificial Intelligence
    • Reinforcement Learning
    • Robotics

    Background:

    • Embodied agents require mechanisms for complex problem-solving.
    • Reinforcement learning (RL) is a key AI technology for enhancing agent learning capabilities.
    • Safe exploration is critical as not all actions are acceptable.

    Purpose of the Study:

    • To propose a novel algorithm, Mirror Descent Safe Policy Optimization (MDSPO), for safe reinforcement learning.
    • To ensure RL agents maximize returns while adhering to safety constraints during exploration.
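
For orientation, "maximizing returns while adhering to safety constraints" is commonly formalized as a constrained Markov decision process. The formulation below is that standard textbook form, not necessarily the exact objective used in the paper. The symbols (reward $r$, cost $c$, discount $\gamma$, cost limit $d$) are assumptions introduced here for illustration.

```latex
\begin{aligned}
\max_{\pi}\quad & J_R(\pi) = \mathbb{E}_{\tau \sim \pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right] \\
\text{subject to}\quad & J_C(\pi) = \mathbb{E}_{\tau \sim \pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\right] \le d
\end{aligned}
```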

    Main Methods:

    • Developed a novel optimization objective for safe policy optimization.
    • Employed a three-stage optimization strategy: unconstrained gradient descent, nonparametric policy projection with cost constraints, and parametric policy projection.
    • Utilized mirror descent optimization to balance return maximization and safety.
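
The three stages listed above can be illustrated with a toy example. The sketch below is not the MDSPO algorithm itself: it assumes a single-state problem with three discrete actions, a softmax policy, and made-up rewards, costs, and cost limit. All function names are hypothetical. It shows only the general pattern: a KL-based mirror descent step on the return, a KL projection onto the cost-feasible set, and a fit of the parametric policy to the projected target.

```python
import math

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

def softmax(logits):
    m = max(logits)
    return normalize([math.exp(x - m) for x in logits])

def expected(policy, values):
    return sum(p * v for p, v in zip(policy, values))

def mirror_step(policy, advantages, step_size):
    # Stage 1 (illustrative): unconstrained mirror descent step with a KL
    # regularizer, which reduces to an exponentiated-gradient update.
    return normalize([p * math.exp(step_size * a)
                      for p, a in zip(policy, advantages)])

def cost_projection(policy, costs, cost_limit, iters=60):
    # Stage 2 (illustrative): KL projection onto {q : E_q[cost] <= limit}.
    # The solution has the form q ∝ policy * exp(-lam * cost), lam >= 0;
    # lam is found by bisection.
    if expected(policy, costs) <= cost_limit:
        return list(policy)
    c_min = min(costs)  # shift costs to avoid underflow for large lam

    def tilt(lam):
        return normalize([p * math.exp(-lam * (c - c_min))
                          for p, c in zip(policy, costs)])

    lo, hi = 0.0, 1.0
    while expected(tilt(hi), costs) > cost_limit:
        hi *= 2.0
        if hi > 1e6:
            raise ValueError("cost limit appears infeasible")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if expected(tilt(mid), costs) > cost_limit:
            lo = mid
        else:
            hi = mid
    return tilt(hi)  # hi always satisfies the constraint

def fit_softmax(target, logits, lr=0.5, steps=500):
    # Stage 3 (illustrative): parametric projection. Minimize
    # KL(target || softmax(logits)); the gradient w.r.t. logits is
    # softmax(logits) - target.
    logits = list(logits)
    for _ in range(steps):
        probs = softmax(logits)
        logits = [t - lr * (p - q) for t, p, q in zip(logits, probs, target)]
    return logits

# Hypothetical toy problem: higher reward comes with higher cost.
rewards = [1.0, 2.0, 3.0]
costs = [0.0, 0.5, 1.0]
cost_limit = 0.4
logits = [0.0, 0.0, 0.0]

for _ in range(20):
    pi = softmax(logits)
    baseline = expected(pi, rewards)
    advantages = [r - baseline for r in rewards]
    improved = mirror_step(pi, advantages, step_size=0.5)
    safe = cost_projection(improved, costs, cost_limit)
    logits = fit_softmax(safe, logits)

final_policy = softmax(logits)
print("policy:", [round(p, 3) for p in final_policy])
print("expected cost:", round(expected(final_policy, costs), 3))
```

In this toy setting the projection has a closed form up to one scalar multiplier, which is why a simple bisection suffices. A full safe RL method would estimate advantages and costs from sampled trajectories and use a neural network policy in place of the three logits.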

    Main Results:

    • MDSPO improves average return by approximately 12% in locomotion experiments.
    • Demonstrated superior satisfaction of cost constraints compared to state-of-the-art methods.
    • Successfully found optimal paths and guaranteed agent safety in a real-world unmanned surface vessel obstacle avoidance task.

    Conclusions:

    • MDSPO is a simple, first-order, and easily implementable safe RL algorithm.
    • Theoretical analysis provides bounds on return improvement and constraint violation.
    • MDSPO effectively enhances agent performance and safety in constrained environments.