PAC Reinforcement Learning Algorithm for General-Sum Markov Games.

Ashkan Zehfroosh¹, Herbert G Tanner¹

¹Department of Mechanical Engineering, University of Delaware, Newark, DE 19716 USA.

IEEE transactions on automatic control

November 2, 2023

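Summary

This summary is machine-generated.

This study introduces a framework for probably approximately correct (PAC) multi-agent reinforcement learning (MARL) in Markov games. It presents a new PAC MARL algorithm for general-sum games that improves on existing methods and makes PAC verification possible.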

Keywords:

Markov games, multi-agent systems, Nash equilibrium, probably approximately correct (PAC), reinforcement learning

Scientific fields:

Artificial intelligence
Machine learning
Game theory

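Background:

Multi-agent reinforcement learning (MARL) is essential for complex decision-making.
Markov games are the standard model of strategic interaction.
Existing MARL algorithms often lack theoretical performance guarantees.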
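
To make the notion of strategic interaction concrete, the sketch below checks which pure joint actions are Nash equilibria in a single two-player general-sum stage game. It is an illustration only and is not taken from the paper: the function name `pure_nash_equilibria` and the payoff matrices are hypothetical, and only pure strategies are considered (mixed equilibria, which always exist in finite games, are out of scope here).

```python
from itertools import product


def pure_nash_equilibria(payoff_a, payoff_b):
    """Return all pure-strategy Nash equilibria (i, j) of a two-player game.

    payoff_a[i][j] and payoff_b[i][j] are the payoffs to players A and B
    when A plays row i and B plays column j. (Hypothetical helper.)
    """
    n_rows, n_cols = len(payoff_a), len(payoff_a[0])
    equilibria = []
    for i, j in product(range(n_rows), range(n_cols)):
        # A cannot gain by switching rows while B stays at column j.
        a_best = all(payoff_a[i][j] >= payoff_a[k][j] for k in range(n_rows))
        # B cannot gain by switching columns while A stays at row i.
        b_best = all(payoff_b[i][j] >= payoff_b[i][k] for k in range(n_cols))
        if a_best and b_best:
            equilibria.append((i, j))
    return equilibria


# Illustrative payoffs: a prisoner's dilemma (0 = cooperate, 1 = defect).
PD_A = [[-1, -3], [0, -2]]
PD_B = [[-1, 0], [-3, -2]]

# Illustrative payoffs: matching pennies, which has no pure equilibrium.
MP_A = [[1, -1], [-1, 1]]
MP_B = [[-1, 1], [1, -1]]
```

In a Markov game, a stage game of this kind arises at every state, with payoffs that depend on the expected future returns of each agent.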

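Study aims:

Develop a theoretical framework for probably approximately correct (PAC) MARL algorithms.
Introduce a new PAC MARL algorithm for general-sum Markov games.
Provide a method for verifying the PAC property of MARL algorithms.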
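
For orientation, the standard single-agent PAC criterion (often called PAC-MDP) can be written as below. This summary does not reproduce the paper's multi-agent definition, so the formula is background only; a multi-agent version would presumably compare against an equilibrium value rather than the optimal value $V^{*}$, but that is an assumption. Here $\mathcal{A}_t$ denotes the policy the algorithm follows at time $t$, $s_t$ the state visited at time $t$, and $\gamma$ the discount factor.

```latex
\Pr\!\left(
  \Big|\big\{\, t : V^{\mathcal{A}_t}(s_t) < V^{*}(s_t) - \epsilon \,\big\}\Big|
  \;\le\;
  \mathrm{poly}\!\left(|S|,\, |A|,\, \tfrac{1}{\epsilon},\, \tfrac{1}{\delta},\, \tfrac{1}{1-\gamma}\right)
\right) \;\ge\; 1 - \delta
```

In words: with probability at least $1-\delta$, the number of time steps on which the algorithm acts more than $\epsilon$ worse than optimal is bounded by a polynomial in the problem size and accuracy parameters.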

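Main methods:

Extend Nash Q-learning using the principles of delayed Q-learning.
Develop a theoretical PAC framework for MARL.
Run comparative numerical simulations to evaluate algorithm performance.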
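
The first item can be pictured with a rough sketch of a delayed update applied to one agent's Q-values over joint actions. This is not the paper's algorithm: it assumes the batch update rule of classical single-agent delayed Q-learning (collect `m` targets, then overwrite the estimate only if it would drop by at least `2 * eps1`) and leaves out that algorithm's bookkeeping flags and timestamps. The class name `DelayedQTable`, its parameters, and the `next_value` argument are hypothetical; `next_value` stands for the agent's equilibrium value of the stage game at the next state, whose computation is outside this sketch.

```python
from collections import defaultdict


class DelayedQTable:
    """One agent's Q-values over (state, joint_action) with delayed updates.

    Hypothetical sketch: m, eps1 and q_init are illustrative parameters.
    """

    def __init__(self, gamma, m, eps1, q_init):
        self.gamma, self.m, self.eps1 = gamma, m, eps1
        self.q = defaultdict(lambda: q_init)   # optimistic initial values
        self.target_sum = defaultdict(float)   # running sum of targets
        self.count = defaultdict(int)          # samples in current batch

    def observe(self, state, joint_action, reward, next_value):
        """Record one sample; return True if the Q-value was updated."""
        key = (state, joint_action)
        self.target_sum[key] += reward + self.gamma * next_value
        self.count[key] += 1
        if self.count[key] < self.m:
            return False                       # batch not yet full
        avg = self.target_sum[key] / self.m
        self.target_sum[key] = 0.0             # start a new batch
        self.count[key] = 0
        # Update only when the estimate would decrease noticeably.
        if self.q[key] - avg >= 2 * self.eps1:
            self.q[key] = avg + self.eps1
            return True
        return False
```

Delaying the update until a full batch has been seen is what makes the number of successful updates countable, which is the usual route to a PAC-style bound in the single-agent setting.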

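Main results:

A new PAC MARL algorithm is proposed for general-sum Markov games.
The theoretical framework allows PAC verification of MARL algorithms.
Numerical results validate the algorithm's performance and stability.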

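Conclusions:

The proposed framework advances PAC MARL theory.
The new algorithm offers provable PAC guarantees.
The framework supports the design and analysis of reliable MARL systems.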