Updated: Jun 12, 2025

TVDO: Tchebycheff Value-Decomposition Optimization for Multiagent Reinforcement Learning.

Xiaoliang Hu, Pengcheng Guo, Yadong Li

IEEE Transactions on Neural Networks and Learning Systems

September 20, 2024

Summary

This summary is machine-generated.

This study introduces a novel factorized Tchebycheff value-decomposition optimization (TVDO) method to address policy inconsistency in cooperative multiagent reinforcement learning (MARL). TVDO ensures consistency between global and individual optimal action-value functions, outperforming state-of-the-art baselines.

Area of Science:

Artificial Intelligence
Machine Learning
Reinforcement Learning

Background:

Cooperative multiagent reinforcement learning (MARL) often uses centralized training with decentralized execution (CTDE).
A key challenge in CTDE is the inconsistency between jointly trained policies and individually executed actions.

Purpose of the Study:

To propose a novel method, factorized Tchebycheff value-decomposition optimization (TVDO), to resolve policy inconsistency in MARL.
To ensure consistency between global and individual optimal action-value functions in CTDE.
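The consistency between global and individual optimal action-value functions is usually formalized in the value-decomposition literature as the individual-global-max (IGM) condition. A sketch of the standard statement follows; the symbols (joint action-value $Q_{\mathrm{tot}}$, per-agent values $Q_i$, joint history $\boldsymbol{\tau}$, joint action $\mathbf{u}$, and $n$ agents) are assumed notation and are not taken from this summary.

```latex
\[
\arg\max_{\mathbf{u}} Q_{\mathrm{tot}}(\boldsymbol{\tau}, \mathbf{u})
  = \Bigl( \arg\max_{u_1} Q_1(\tau_1, u_1),\; \dots,\; \arg\max_{u_n} Q_n(\tau_n, u_n) \Bigr)
\]
```

When this equality holds, each agent acting greedily on its own $Q_i$ at execution time reproduces the joint action that is greedy with respect to $Q_{\mathrm{tot}}$.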

Main Methods:

Formulation of a nonlinear Tchebycheff aggregation function inspired by multiobjective optimization (MOO).
Theoretical proof that factorized value decomposition with Tchebycheff aggregation is both sufficient and necessary for the individual-global-max (IGM) condition.
Empirical verification in the climb and penalty game and evaluation on the StarCraft multiagent challenge (SMAC) benchmark.
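For orientation, the classical Tchebycheff scalarization from multiobjective optimization, which the aggregation above is described as being inspired by, can be written as below. This is the textbook MOO form only; the exact aggregation function used by TVDO is not given in this summary. The symbols (objectives $f_i$, weights $\lambda_i$, reference point $z^{*}$, and $m$ objectives) are assumed notation.

```latex
\[
\min_{x} \; g^{\mathrm{te}}(x \mid \lambda, z^{*})
  = \max_{1 \le i \le m} \; \lambda_i \, \bigl| f_i(x) - z_i^{*} \bigr|,
\qquad \lambda_i \ge 0
\]
```

Because the aggregate is a weighted maximum rather than a weighted sum, minimizing it bounds the largest deviation of any single objective from its reference value, which is why it is nonlinear in the objectives.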

Main Results:

TVDO precisely expresses global-to-individual value decomposition with guaranteed policy consistency.
TVDO demonstrates significant performance superiority over state-of-the-art (SOTA) MARL baselines in empirical evaluations.
The method constrains the upper bound of the individual action-value bias so that the global optimum is reached.
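To make the policy-consistency claim concrete, the sketch below checks the IGM condition on a two-agent matrix game. It is an illustration under stated assumptions, not the TVDO method: the payoff values are those commonly cited for the climb game (they are not listed in this summary), and `additive_fit` is a hypothetical helper that builds a naive least-squares additive decomposition purely as a contrast case.

```python
# Payoff matrix commonly cited for the two-agent climb game (assumed values).
# Rows are agent 1's actions, columns are agent 2's actions.
CLIMB = [
    [11, -30, 0],
    [-30, 7, 6],
    [0, 0, 5],
]


def argmax(values):
    """Index of the largest entry in a flat list."""
    return max(range(len(values)), key=lambda i: values[i])


def joint_greedy(q_tot):
    """Joint action (a, b) that maximizes the joint action-value table."""
    cells = [(a, b) for a in range(len(q_tot)) for b in range(len(q_tot[0]))]
    return max(cells, key=lambda ab: q_tot[ab[0]][ab[1]])


def additive_fit(q_tot):
    """Least-squares fit q_tot[a][b] ~ q1[a] + q2[b].

    Hypothetical contrast case (a plain additive decomposition), not TVDO.
    """
    n_rows, n_cols = len(q_tot), len(q_tot[0])
    grand_mean = sum(map(sum, q_tot)) / (n_rows * n_cols)
    q1 = [sum(row) / n_cols - grand_mean / 2 for row in q_tot]
    q2 = [
        sum(q_tot[a][b] for a in range(n_rows)) / n_rows - grand_mean / 2
        for b in range(n_cols)
    ]
    return q1, q2


def satisfies_igm(q_tot, q1, q2):
    """True when the individual greedy actions form the joint greedy action."""
    return joint_greedy(q_tot) == (argmax(q1), argmax(q2))


q1, q2 = additive_fit(CLIMB)
print(joint_greedy(CLIMB))           # joint optimum of the table
print((argmax(q1), argmax(q2)))      # what decentralized greedy agents pick
print(satisfies_igm(CLIMB, q1, q2))  # whether the two agree
```

With these assumed payoffs, the additive fit steers both agents toward the safe action with payoff 5 rather than the joint optimum with payoff 11, so the check fails. That mismatch is the kind of inconsistency between centralized training and decentralized execution that the summary says TVDO is designed to rule out.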

Conclusions:

TVDO effectively overcomes the inconsistency challenge in CTDE for MARL.
The proposed method guarantees policy consistency and achieves superior performance in complex MARL environments.
TVDO offers a promising approach for advancing cooperative MARL research.