PAC Reinforcement Learning Algorithm for General-Sum Markov Games.

Ashkan Zehfroosh¹, Herbert G Tanner¹

¹Department of Mechanical Engineering, University of Delaware, Newark, DE 19716 USA.

IEEE Transactions on Automatic Control

November 2, 2023

Summary

This summary is machine-generated.

This study introduces a theoretical framework for probably approximately correct (PAC) multi-agent reinforcement learning (MARL) in Markov games. It presents a novel PAC MARL algorithm for general-sum games, built by extending Nash Q-learning with ideas from delayed Q-learning, and provides a way to verify whether a given MARL algorithm is PAC.

Keywords:

Markov Game, Multi-agent system, Nash Equilibrium, Probably approximately correct, Reinforcement Learning

Area of Science:

Artificial Intelligence
Machine Learning
Game Theory

Background:

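Multi-agent reinforcement learning (MARL) addresses sequential decision-making problems in which several agents learn and act at the same time.
Markov games are the standard model for these strategic interactions: state transitions and each agent's rewards depend on the joint action of all agents.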
Existing MARL algorithms often lack theoretical performance guarantees.
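
For orientation, the textbook definitions behind the terms above can be written out as follows. This is a sketch in standard notation, not taken from the article; the symbols are assumptions introduced here, and the paper's own notation may differ. A discounted Markov game with n agents is a tuple, and a joint policy is a Nash equilibrium when no single agent can improve its own value by deviating unilaterally:

\[
\mathcal{G} = \bigl(S,\ A_1,\dots,A_n,\ P,\ r_1,\dots,r_n,\ \gamma\bigr),
\qquad
P(s' \mid s, a_1,\dots,a_n),
\qquad
r_i(s, a_1,\dots,a_n),
\]
\[
V_i^{(\pi_i^*,\,\pi_{-i}^*)}(s) \;\ge\; V_i^{(\pi_i,\,\pi_{-i}^*)}(s)
\quad \text{for all agents } i,\ \text{all states } s \in S,\ \text{and all policies } \pi_i .
\]

Here \(\pi_{-i}^*\) denotes the equilibrium policies of all agents other than \(i\). "General-sum" means that no relationship among the reward functions \(r_1,\dots,r_n\) is assumed, in contrast to zero-sum or fully cooperative games.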

Purpose of the Study:

To develop a theoretical framework for probably approximately correct (PAC) MARL algorithms.
To introduce a novel PAC MARL algorithm for general-sum Markov games.
To provide a method for verifying the PAC property of MARL algorithms.
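
As background for what "PAC" usually means in reinforcement learning, the single-agent criterion (often called PAC-MDP, based on the sample complexity of exploration) can be stated as below. This is the standard single-agent definition and only an assumption about the flavor of guarantee involved; the multi-agent definition used in the article may differ, for example by measuring performance against an equilibrium value rather than the optimal value \(V^*\). An algorithm \(\mathcal{A}\) with policy \(\mathcal{A}_t\) at time \(t\) is PAC if, for every \(\epsilon > 0\) and \(\delta \in (0,1)\), with probability at least \(1-\delta\):

\[
\Bigl|\bigl\{\, t \;:\; V^{\mathcal{A}_t}(s_t) < V^*(s_t) - \epsilon \,\bigr\}\Bigr|
\;\le\;
\mathrm{poly}\!\left(|S|,\ |A|,\ \tfrac{1}{\epsilon},\ \tfrac{1}{\delta},\ \tfrac{1}{1-\gamma}\right).
\]

In words: the number of time steps on which the algorithm acts more than \(\epsilon\) worse than optimal is bounded by a polynomial in the problem size and accuracy parameters.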

Main Methods:

Extension of Nash Q-learning using delayed Q-learning principles.
Development of a theoretical PAC framework for MARL.
Comparative numerical simulations to evaluate algorithm performance.
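
The first method above combines two ingredients that can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the article's algorithm: `pure_nash_equilibria` and `DelayedUpdate` are hypothetical names, the equilibrium search covers only pure strategies in a two-player stage game (Nash Q-learning in general requires mixed equilibria), and the delayed update keeps only the batch-averaging step of delayed Q-learning, omitting its bookkeeping for when update attempts are allowed.

```python
from itertools import product


def pure_nash_equilibria(payoff_a, payoff_b):
    """Return all pure-strategy Nash equilibria (i, j) of a two-player bimatrix game.

    payoff_a[i][j] and payoff_b[i][j] are the payoffs to the row and column
    player when the row player picks i and the column player picks j.
    Simplification: mixed equilibria are ignored, so the list may be empty.
    """
    n_rows, n_cols = len(payoff_a), len(payoff_a[0])
    equilibria = []
    for i, j in product(range(n_rows), range(n_cols)):
        # Row player cannot gain by switching rows while the column stays at j.
        row_best = all(payoff_a[i][j] >= payoff_a[k][j] for k in range(n_rows))
        # Column player cannot gain by switching columns while the row stays at i.
        col_best = all(payoff_b[i][j] >= payoff_b[i][l] for l in range(n_cols))
        if row_best and col_best:
            equilibria.append((i, j))
    return equilibria


class DelayedUpdate:
    """Batch ("delayed") update of one Q-value for one (state, joint action) pair.

    Hypothetical helper: samples are accumulated and the Q-value changes only
    after m of them have been collected, and only if the averaged target is
    lower than the current value by at least 2 * eps1.
    """

    def __init__(self, gamma, m, eps1, r_max=1.0):
        self.gamma, self.m, self.eps1 = gamma, m, eps1
        self.q = r_max / (1.0 - gamma)  # optimistic initial value
        self.acc = 0.0                  # running sum of sampled targets
        self.count = 0                  # samples collected in the current batch

    def observe(self, reward, next_value):
        """Add one sample; return True if the Q-value was changed."""
        self.acc += reward + self.gamma * next_value
        self.count += 1
        if self.count < self.m:
            return False
        target = self.acc / self.m
        self.acc, self.count = 0.0, 0   # start a fresh batch either way
        if self.q - target >= 2 * self.eps1:
            self.q = target + self.eps1
            return True
        return False
```

In a Nash-Q-style learner, `next_value` would be the agent's payoff at a selected equilibrium of the stage game formed by all agents' Q-values at the next state. In the prisoner's dilemma the only pure equilibrium is mutual defection, while a game such as matching pennies has no pure equilibrium at all and makes the function return an empty list, which is why a full implementation needs a mixed-equilibrium solver.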

Main Results:

A new PAC MARL algorithm for general-sum Markov games is proposed.
The theoretical framework allows for PAC verification of MARL algorithms.
Numerical results validate the algorithm's performance and robustness.

Conclusions:

The proposed framework advances PAC MARL theory.
The novel algorithm offers provable PAC guarantees.
The framework facilitates the design and analysis of reliable MARL systems.