Difference Rewards Policy Gradients

Jacopo Castellini¹, Sam Devlin², Frans A Oliehoek³

¹Department of Computer Science, University of Liverpool, Liverpool, UK.

Neural computing & applications

June 30, 2025

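Summary

This summary is machine-generated.

By combining difference rewards with policy gradients, Dr.Reinforce offers a new approach to multi-agent reinforcement learning. The method addresses the multi-agent credit assignment problem for decentralized policies, even when the reward function is unknown.

Keywords:

Difference rewards; multi-agent credit assignment; multi-agent reinforcement learning; policy gradients; reward learning.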

Scientific fields:

Artificial intelligence
Machine learning
Robotics

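Background:

Policy gradient methods are widely used in multi-agent reinforcement learning.
A major challenge is multi-agent credit assignment, which is essential for effective policy learning.
Existing methods often struggle to assess the contribution of each individual agent accurately.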

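Research objectives:

Propose a new algorithm, Dr.Reinforce, to improve multi-agent credit assignment.
Enable the learning of decentralized policies in multi-agent reinforcement learning settings.
Provide a solution that works whether the reward function is known or unknown.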

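Main methods:

Dr.Reinforce combines difference rewards directly with policy gradients.
Unlike methods such as Counterfactual Multi-Agent Policy Gradients (COMA), it avoids the need to learn a Q-function.
When the reward function is unknown, an auxiliary reward network is trained to estimate the difference rewards.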
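
To make the idea of feeding difference rewards into a policy gradient more concrete, here is a minimal sketch, not the authors' implementation. It assumes a hypothetical two-agent, two-action, single-step game whose known team reward is the number of agents choosing action 1, a fixed default action of 0 for the counterfactual, and one tabular softmax policy per agent updated with a plain REINFORCE rule. The names `team_reward`, `DEFAULT_ACTION`, `difference_reward`, and `train` are illustrative only.

```python
import math
import random

DEFAULT_ACTION = 0  # assumed "default" action used to build the counterfactual


def team_reward(joint_action):
    """Hypothetical known team reward: how many agents chose action 1."""
    return float(sum(joint_action))


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def difference_reward(agent, joint_action, reward_fn=team_reward):
    """Team reward minus the reward had `agent` taken the default action."""
    counterfactual = list(joint_action)
    counterfactual[agent] = DEFAULT_ACTION
    return reward_fn(joint_action) - reward_fn(tuple(counterfactual))


def train(steps=2000, lr=0.5, seed=0):
    """REINFORCE with one softmax policy per agent, no Q-function involved."""
    rng = random.Random(seed)
    logits = [[0.0, 0.0], [0.0, 0.0]]  # decentralized: one logit vector per agent
    for _ in range(steps):
        probs = [softmax(l) for l in logits]
        joint = tuple(rng.choices([0, 1], weights=p)[0] for p in probs)
        for i in range(2):
            signal = difference_reward(i, joint)  # per-agent learning signal
            for a in range(2):
                # Gradient of log softmax w.r.t. logit a: one_hot(action) - probs
                grad_log = (1.0 if a == joint[i] else 0.0) - probs[i][a]
                logits[i][a] += lr * signal * grad_log
    return [softmax(l) for l in logits]
```

In this toy game the joint action `(0, 1)` earns a team reward of 1, but `difference_reward(0, (0, 1))` is 0 and `difference_reward(1, (0, 1))` is 1, so only the agent that actually contributed receives a learning signal. Calling `train()` returns the two learned action distributions.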

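Main results:

Dr.Reinforce effectively addresses the multi-agent credit assignment problem.
The algorithm facilitates the learning of decentralized policies.
A variant of Dr.Reinforce remains effective even when the reward function is not explicitly known.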
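
The unknown-reward case can be illustrated in the same hedged spirit. The summary describes an auxiliary reward network; the sketch below deliberately swaps that network for a simple running-mean table, which is enough to show the principle of estimating difference rewards from a learned reward model. The environment (`hidden_reward`, with Gaussian noise), the `RewardTable` class, and the default action of 0 are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def hidden_reward(joint_action, rng):
    """Hypothetical environment reward; the learner only sees noisy samples."""
    return float(sum(joint_action)) + rng.gauss(0.0, 0.1)


class RewardTable:
    """Running-mean reward estimate per joint action (stand-in for a network)."""

    def __init__(self):
        self.mean = defaultdict(float)
        self.count = defaultdict(int)

    def update(self, joint_action, reward):
        self.count[joint_action] += 1
        n = self.count[joint_action]
        self.mean[joint_action] += (reward - self.mean[joint_action]) / n

    def predict(self, joint_action):
        return self.mean[joint_action]


def estimated_difference_reward(model, agent, joint_action, default_action=0):
    """Difference reward computed from the learned model, not the true reward."""
    counterfactual = list(joint_action)
    counterfactual[agent] = default_action
    return model.predict(joint_action) - model.predict(tuple(counterfactual))


def fit_reward_table(samples=4000, seed=0):
    """Fit the table from uniformly sampled joint actions and observed rewards."""
    rng = random.Random(seed)
    model = RewardTable()
    for _ in range(samples):
        joint = (rng.randint(0, 1), rng.randint(0, 1))
        model.update(joint, hidden_reward(joint, rng))
    return model
```

After `model = fit_reward_table()`, `estimated_difference_reward(model, 0, (1, 1))` should land close to 1, the value the true reward function would give, and it could replace the exact difference reward as the per-agent signal in a policy gradient update.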

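Conclusions:

Dr.Reinforce represents a significant advance in multi-agent reinforcement learning.
Compared with existing techniques, it offers a more direct approach to credit assignment.
Dr.Reinforce provides a flexible and effective solution for a variety of multi-agent learning scenarios.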