


Updated: Jun 28, 2025


Adaptive Individual Q-Learning: A Multiagent Reinforcement Learning Method for Coordination Optimization

Zhen Zhang, Dongqing Wang

IEEE Transactions on Neural Networks and Learning Systems | April 16, 2024

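Summary

This summary is machine-generated.

We introduce adaptive individual Q-learning (A-IQL), a cooperative multi-agent reinforcement learning (MARL) algorithm. A-IQL adapts effectively to changing environments, optimizing coordination in dynamic settings such as traffic flow.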


Scientific fields:

Artificial intelligence
Machine learning
Robotics

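Background:

Owing to its scalability and its ability to distribute tasks, multi-agent reinforcement learning (MARL) is used for coordination optimization.
Existing convergence results for MARL are largely limited to repeated games and overlook adaptation to dynamic environments.
Few MARL algorithms handle environmental changes, such as traffic-flow fluctuations or unexpected obstacles for autonomous vehicles.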

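Purpose of the study:

Propose a new cooperative MARL algorithm, adaptive individual Q-learning (A-IQL), designed to adapt to switching environments.
Analyze the convergence properties of A-IQL in stochastic games with deterministic state transitions in temporal order.
Investigate the effect of the update period (T) on the convergence of A-IQL.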
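
For orientation only: this summary does not state the A-IQL update rule. The name suggests that each agent keeps its own individual Q-function, so the standard tabular Q-learning update for an independent learner $i$ acting with its own action $a_i$ is a reasonable reference point. The symbols $\alpha$ (learning rate), $\gamma$ (discount factor), $r$ (shared reward), and $s'$ (next state) are the usual textbook ones and are assumptions here, not notation taken from the paper.

```latex
Q_i(s, a_i) \leftarrow Q_i(s, a_i)
  + \alpha \Bigl[ r + \gamma \max_{a_i'} Q_i(s', a_i') - Q_i(s, a_i) \Bigr]
```

How A-IQL modifies this rule, and how the update period T enters it, would need to be checked against the paper itself.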

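Main methods:

Proposed the adaptive individual Q-learning (A-IQL) algorithm, in which each agent updates its Q-function with a period T.
Carried out a convergence analysis for stochastic games with deterministic state transitions in temporal order.
Used a fabricated stochastic game to study the effect of the period T on convergence.
Validated the algorithm's effectiveness through simulations in two different switching environments: a distributed sensor network (DSN) and a target transportation task.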
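
The idea of independent learners that update periodically in a switching environment can be illustrated with a toy sketch. This is not the authors' A-IQL algorithm: it is a minimal stateless example under stated assumptions, in which two agents share a reward from a payoff table that changes halfway through the run, and each agent buffers its experience and applies plain Q-value updates every `period` steps. The payoff tables, the class `PeriodicQAgent`, the function `run`, and all hyperparameters are hypothetical.

```python
import random

# Hypothetical switching environment: a two-agent cooperative matrix game
# whose shared payoff table changes halfway through the run.
PAYOFF_BEFORE = [[10.0, 0.0], [0.0, 2.0]]  # best joint action: (0, 0)
PAYOFF_AFTER = [[0.0, 0.0], [0.0, 10.0]]   # best joint action: (1, 1)


class PeriodicQAgent:
    """Independent Q-learner that applies buffered updates every `period` steps.

    Illustrative only; the periodic-buffer scheme is an assumption, not the
    update rule from the paper.
    """

    def __init__(self, n_actions, period, alpha=0.1, epsilon=0.2, rng=None):
        self.q = [0.0] * n_actions
        self.period = period
        self.alpha = alpha
        self.epsilon = epsilon
        self.rng = rng if rng is not None else random.Random()
        self.buffer = []

    def greedy_action(self):
        # Ties resolve to the lowest action index.
        return max(range(len(self.q)), key=lambda a: self.q[a])

    def act(self):
        # Epsilon-greedy choice over the agent's own actions only.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q))
        return self.greedy_action()

    def observe(self, action, reward):
        # Store experience; Q-values stay frozen until the period elapses.
        self.buffer.append((action, reward))
        if len(self.buffer) >= self.period:
            for a, r in self.buffer:
                self.q[a] += self.alpha * (r - self.q[a])
            self.buffer.clear()


def run(period=5, steps_per_phase=3000, seed=0):
    """Train two agents through both phases; return greedy joint actions."""
    rng = random.Random(seed)
    agents = [PeriodicQAgent(2, period, rng=rng) for _ in range(2)]
    greedy_after_phase = []
    for payoff in (PAYOFF_BEFORE, PAYOFF_AFTER):
        for _ in range(steps_per_phase):
            a0, a1 = agents[0].act(), agents[1].act()
            reward = payoff[a0][a1]  # both agents receive the same reward
            agents[0].observe(a0, reward)
            agents[1].observe(a1, reward)
        greedy_after_phase.append(tuple(ag.greedy_action() for ag in agents))
    return greedy_after_phase


if __name__ == "__main__":
    print(run())
```

With these hypothetical payoff tables, the learners are expected to settle on joint action (0, 0) before the switch and to move to (1, 1) afterwards, because the old action stops paying anything once the table changes. Games in which the old joint action remains a decent equilibrium are harder for plain independent learners, which is the kind of coordination problem the paper's convergence analysis is concerned with.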

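Main results:

A-IQL was shown to learn the optimal joint policy in stochastic games with specific transition properties.
The study analyzed the relationship between the update period T and the algorithm's convergence behavior.
Empirical validation confirmed the effectiveness of A-IQL in dynamic scenarios, including the DSN and target transportation tasks.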

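Conclusions:

The proposed A-IQL algorithm offers a viable solution for coordination optimization in multi-agent systems facing dynamic, switching environments.
A-IQL gives agents a framework for adjusting their policies effectively, improving overall system performance.
These findings highlight the importance of adaptation mechanisms in MARL for real-world applications.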