Enhancing Value Decomposition With Target Transformation in Cooperative Multi-Agent Reinforcement Learning.

Zeyang Liu, Lipeng Wan, Shiguang Sun

IEEE Transactions on Pattern Analysis and Machine Intelligence

April 13, 2026

Summary

This summary is machine-generated.

This study introduces Uncertainty-aware Target Transformation (UT2) to improve cooperative multi-agent reinforcement learning (MARL). UT2 enhances performance and stability in MARL by addressing challenges with non-monotonic and stochastic learning targets.

Area of Science:

Artificial Intelligence
Machine Learning
Robotics

Background:

Cooperative multi-agent reinforcement learning (MARL) is crucial for coordinating intelligent machines.
Dominant MARL methods rely on monotonic value decomposition, which scales well but restricts the class of joint action-value functions they can represent.
Existing solutions for non-monotonic targets can be unstable with stochastic returns, favoring suboptimal trajectories.
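The representational limit described above can be seen in a small toy case. The sketch below assumes a two-agent, three-action payoff table with illustrative values (not taken from the paper) and uses a purely additive decomposition, which is one simple member of the monotonic family. The names `PAYOFF`, `additive_fit`, and `true_optimum` are hypothetical helpers introduced only for this example.

```python
# Toy cooperative payoff table for 2 agents with 3 actions each.
# Values are illustrative assumptions, not results from the paper.
PAYOFF = [
    [8.0, -12.0, -12.0],
    [-12.0, 0.0, 0.0],
    [-12.0, 0.0, 0.0],
]


def additive_fit(table):
    """Least-squares fit of table[i][j] ~ q1[i] + q2[j], all cells weighted equally.

    For a full table the solution is row mean + column mean - grand mean;
    the grand mean is split evenly between the two agents.
    """
    n_rows, n_cols = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n_rows * n_cols)
    q1 = [sum(row) / n_cols - grand / 2 for row in table]
    q2 = [
        sum(table[i][j] for i in range(n_rows)) / n_rows - grand / 2
        for j in range(n_cols)
    ]
    return q1, q2


def argmax(values):
    """Index of the first maximum in a list."""
    return max(range(len(values)), key=lambda k: values[k])


def true_optimum(table):
    """Joint action with the highest payoff, found by exhaustive search."""
    cells = [(i, j) for i in range(len(table)) for j in range(len(table[0]))]
    return max(cells, key=lambda ij: table[ij[0]][ij[1]])


q1, q2 = additive_fit(PAYOFF)
greedy = (argmax(q1), argmax(q2))  # each agent maximizes its own utility
best = true_optimum(PAYOFF)

print("decentralized greedy action:", greedy, "payoff:", PAYOFF[greedy[0]][greedy[1]])
print("true optimal action:", best, "payoff:", PAYOFF[best[0]][best[1]])
```

In this toy table the high penalties around the best cell pull down the fitted utility of action 0 for both agents, so the decentralized greedy choice lands on a lower-payoff joint action than the exhaustive search does.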

Purpose of the Study:

To develop a novel approach for cooperative MARL that overcomes limitations of monotonic value decomposition.
To address the fragility of learning targets in stochastic environments.
To improve the performance and stability of MARL agents.

Main Methods:

Proposed Target Transformation to map non-monotonic and stochastic learning targets to a monotonic-representable surrogate.
Developed Uncertainty-aware Target Transformation (UT2) with value-based and policy-based instantiations.
Integrated an uncertainty estimator with a best-individual coordination envelope.
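The idea of mapping a target to a monotonic-representable surrogate can be illustrated with a deliberately simplified sketch. This is not the UT2 transformation, whose form the summary does not give: the `transform` function below is a hypothetical stand-in that assumes the optimal joint action is already known and ignores stochasticity and uncertainty estimation entirely. The payoff values are the same kind of illustrative toy numbers, not data from the paper.

```python
# Toy non-monotonic target table for 2 agents (illustrative values only).
TARGET = [
    [8.0, -12.0, -12.0],
    [-12.0, 0.0, 0.0],
    [-12.0, 0.0, 0.0],
]


def optimal_joint_action(table):
    """Joint action with the highest target value (exhaustive search)."""
    cells = [(i, j) for i in range(len(table)) for j in range(len(table[0]))]
    return max(cells, key=lambda ij: table[ij[0]][ij[1]])


def transform(table, penalty=1.0):
    """Hypothetical surrogate, NOT the paper's method.

    Keeps the best joint value and subtracts a fixed penalty for each agent
    that deviates from the optimal joint action. The result is additive in
    the per-agent choices, so a monotonic decomposition can represent it.
    """
    a1, a2 = optimal_joint_action(table)
    best = table[a1][a2]
    return [
        [best - penalty * ((i != a1) + (j != a2)) for j in range(len(table[0]))]
        for i in range(len(table))
    ]


def additive_fit(table):
    """Least-squares fit of table[i][j] ~ q1[i] + q2[j] with equal weights."""
    n_rows, n_cols = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n_rows * n_cols)
    q1 = [sum(row) / n_cols - grand / 2 for row in table]
    q2 = [
        sum(table[i][j] for i in range(n_rows)) / n_rows - grand / 2
        for j in range(n_cols)
    ]
    return q1, q2


def argmax(values):
    """Index of the first maximum in a list."""
    return max(range(len(values)), key=lambda k: values[k])


surrogate = transform(TARGET)
q1, q2 = additive_fit(surrogate)
greedy = (argmax(q1), argmax(q2))

# Largest gap between the surrogate and its additive reconstruction.
residual = max(
    abs(surrogate[i][j] - (q1[i] + q2[j]))
    for i in range(len(surrogate))
    for j in range(len(surrogate[0]))
)

print("surrogate table:", surrogate)
print("greedy action on surrogate:", greedy)
print("max fit residual:", residual)
```

Because the surrogate is additive by construction, the decomposition fits it up to floating-point error and the per-agent greedy choice coincides with the optimal joint action of the original table. A practical method cannot assume the optimum is known in advance, which is the gap the paper's uncertainty estimator and coordination envelope are described as addressing.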

Main Results:

UT2 demonstrated improved performance and stability across diverse cooperative MARL benchmarks.
The method showed larger gains as non-monotonicity and stochasticity in returns increased.
UT2 successfully preserves the optimal joint action while handling complex learning targets.

Conclusions:

UT2 offers a robust solution for cooperative MARL, particularly in environments with non-monotonic and stochastic reward structures.
The approach enhances the reliability and effectiveness of multi-agent coordination.
This work advances the field of MARL by providing a more stable and performant learning framework.