Difference rewards policy gradients.

Jacopo Castellini¹, Sam Devlin², Frans A Oliehoek³

¹Department of Computer Science, University of Liverpool, Liverpool, UK.

Neural Computing & Applications | June 30, 2025

Summary

This summary is machine-generated.

Dr.Reinforce is a multi-agent reinforcement learning algorithm that combines difference rewards with policy gradients. It addresses multi-agent credit assignment when learning decentralized policies, including when the reward function is unknown.
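To make the combination concrete, a difference reward for agent i compares the team reward with a counterfactual in which only agent i's action is changed. The formulas below are a sketch in our own notation, not the paper's: they assume the counterfactual term averages the reward over agent i's own policy π_i (a COMA-style baseline applied to the reward instead of a Q-function) and a REINFORCE-style estimator in which the return is replaced by discounted difference rewards. The symbols D_i, τ_i, a_{-i} and γ are our assumptions; the paper's exact definitions may differ.

$$
D_i(s, \mathbf{a}) = r(s, \mathbf{a}) - \sum_{b \in A_i} \pi_i(b \mid \tau_i)\, r\big(s, \langle b, \mathbf{a}_{-i} \rangle\big)
$$

$$
\nabla_{\theta_i} J \approx \mathbb{E}\left[\sum_{t=0}^{T} \nabla_{\theta_i} \log \pi_i(a_{i,t} \mid \tau_{i,t}) \sum_{k=t}^{T} \gamma^{\,k-t} D_i(s_k, \mathbf{a}_k)\right]
$$

Because the subtracted term at step t does not depend on agent i's own action a_i, it plays the role of an action-independent baseline, while the remaining signal reflects how much agent i's choice changed the shared reward.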

Keywords:

Difference rewards; Multi-agent credit assignment; Multi-agent reinforcement learning; Policy gradients; Reward learning

Area of Science:

Artificial Intelligence
Machine Learning
Robotics

Background:

Policy gradient methods are widely used in multi-agent reinforcement learning.
A key challenge is multi-agent credit assignment: determining how much each agent's action contributed to a shared team reward, which is crucial for effective policy learning.
Existing methods often struggle to assess individual agents' contributions accurately.

Purpose of the Study:

To propose a novel algorithm, Dr.Reinforce, for improved multi-agent credit assignment.
To enable learning of decentralized policies in multi-agent reinforcement learning settings.
To provide a solution that works whether the reward function is known or unknown.

Main Methods:

Dr.Reinforce combines difference rewards directly with policy gradients.
Unlike methods such as Counterfactual Multi-Agent Policy Gradients (COMA), it does not require learning a Q-function.
For unknown reward functions, an auxiliary reward network is trained to estimate difference rewards.
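The methods above can be illustrated on a toy problem. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: it uses a hypothetical one-step, two-agent cooperative matrix game (reward matrix `R`), tabular softmax policies updated with REINFORCE, and a difference reward whose counterfactual averages over the agent's own policy. For the unknown-reward case, the paper's auxiliary reward network is replaced by a hypothetical `TabularRewardModel` (a running mean of observed rewards per joint action). All names here (`team_reward`, `difference_reward`, `TabularRewardModel`, `train`) are illustrative.

```python
import math
import random

# Hypothetical one-step cooperative matrix game (not from the paper):
# both agents receive the shared team reward R[a0][a1].
R = [
    [8.0, 0.0, 0.0],
    [0.0, 5.0, 0.0],
    [0.0, 0.0, 3.0],
]
N_AGENTS = 2
N_ACTIONS = 3


def team_reward(joint):
    """True shared reward r(a) for a joint action tuple (a0, a1)."""
    return R[joint[0]][joint[1]]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def difference_reward(reward_fn, joint, i, policy_i):
    """Assumed form: D_i = r(a) - sum_b pi_i(b) * r(<b, a_-i>)."""
    baseline = 0.0
    for b, p in enumerate(policy_i):
        counterfactual = list(joint)
        counterfactual[i] = b  # change only agent i's action
        baseline += p * reward_fn(tuple(counterfactual))
    return reward_fn(joint) - baseline


class TabularRewardModel:
    """Hypothetical stand-in for a learned reward network: the running
    mean of observed rewards per joint action (0.0 if never observed)."""

    def __init__(self):
        self.sums = {}
        self.counts = {}

    def update(self, joint, r):
        self.sums[joint] = self.sums.get(joint, 0.0) + r
        self.counts[joint] = self.counts.get(joint, 0) + 1

    def __call__(self, joint):
        n = self.counts.get(joint, 0)
        return self.sums[joint] / n if n else 0.0


def train(env_reward, reward_model=None, episodes=3000, lr=0.05, seed=0):
    """REINFORCE for independent softmax policies in a one-step game,
    with each agent's return replaced by its difference reward."""
    rng = random.Random(seed)
    logits = [[0.0] * N_ACTIONS for _ in range(N_AGENTS)]
    for _ in range(episodes):
        policies = [softmax(l) for l in logits]
        joint = tuple(rng.choices(range(N_ACTIONS), weights=p)[0] for p in policies)
        if reward_model is None:
            reward_fn = env_reward  # known reward: query counterfactuals directly
        else:
            # Unknown reward: only the sampled joint action's reward is observed;
            # counterfactuals are answered by the learned estimate.
            reward_model.update(joint, env_reward(joint))
            reward_fn = reward_model
        for i in range(N_AGENTS):
            d = difference_reward(reward_fn, joint, i, policies[i])
            for k in range(N_ACTIONS):
                # d/d logit_k of log softmax(logits)[a_i] = 1[k == a_i] - pi_i(k)
                grad_log = (1.0 if k == joint[i] else 0.0) - policies[i][k]
                logits[i][k] += lr * d * grad_log
    return [softmax(l) for l in logits]


if __name__ == "__main__":
    print("known reward:  ", train(team_reward))
    print("learned reward:", train(team_reward, reward_model=TabularRewardModel()))
```

Calling `train(team_reward)` corresponds to the known-reward setting, while `train(team_reward, reward_model=TabularRewardModel())` sees only the reward of each sampled joint action and estimates counterfactual rewards from its table. Neither variant learns a Q-function; the critic-free design is the point of contrast with COMA noted above.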

Main Results:

Dr.Reinforce effectively addresses the multi-agent credit assignment problem.
The algorithm facilitates the learning of decentralized policies.
A variant of Dr.Reinforce demonstrates effectiveness even when the reward function is not explicitly known.

Conclusions:

Dr.Reinforce presents a significant advancement in multi-agent reinforcement learning.
The method offers a more direct approach to credit assignment compared to existing techniques.
Dr.Reinforce provides a flexible and effective solution for various multi-agent learning scenarios.