Kernel-Based Decentralized Policy Evaluation for Reinforcement Learning

Jiamin Liu, Heng Lian

IEEE Transactions on Neural Networks and Learning Systems

September 17, 2024

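Summary

This summary is machine-generated.

This study introduces a decentralized, nonparametric method for policy evaluation in reinforcement learning (RL). It establishes statistical error bounds for value function estimation in cooperative multi-agent systems.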


Scientific fields:

Artificial intelligence
Machine learning
Optimization theory

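Background:

Decentralized learning is essential for multi-agent reinforcement learning (RL).
Nonparametric methods offer flexibility but pose computational challenges.
Policy evaluation requires accurate estimation of the state-value function.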
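
For reference, the state-value function that policy evaluation targets can be written in its standard textbook form. The symbols below (policy $\pi$, discount factor $\gamma$, reward $r_t$, state $s_t$) are generic notation assumed here for illustration, not notation taken from the paper.

```latex
V^{\pi}(s)
  = \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t} \,\middle|\, s_{0} = s \right]
  = \mathbb{E}_{\pi}\!\left[ r_{0} + \gamma V^{\pi}(s_{1}) \,\middle|\, s_{0} = s \right],
\qquad 0 \le \gamma < 1 .
```

The second equality is the Bellman equation. It is what allows the value function to be estimated by regressing one-step targets $r_0 + \gamma V^{\pi}(s_1)$ on the current state.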

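Research objectives:

Develop a decentralized nonparametric method for policy evaluation in RL.
Analyze the statistical convergence properties of the proposed method.
Address computational and communication feasibility in multi-agent settings.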

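Main methods:

Use a regression-based multi-stage iteration technique.
Apply infinite-dimensional gradient descent (GD) in a reproducing kernel Hilbert space (RKHS).
Apply the Nyström approximation to project onto a finite-dimensional space and improve feasibility.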
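
The three ingredients listed above can be illustrated with a minimal single-agent sketch. This is not the authors' algorithm: the decentralized communication between agents is omitted, and the toy environment, the Gaussian kernel, the grid of landmarks, and every function name and hyperparameter below are assumptions chosen only for illustration. Each stage freezes a one-step regression target and fits it by gradient descent on finite-dimensional Nyström features.

```python
import numpy as np


def rbf_kernel(X, Y, bandwidth):
    """Gaussian kernel matrix between the rows of X and the rows of Y."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def nystrom_map(landmarks, bandwidth, eps=1e-6):
    """Return a function mapping states to Nystrom features K(x, Z) K(Z, Z)^(-1/2)."""
    K_mm = rbf_kernel(landmarks, landmarks, bandwidth)
    vals, vecs = np.linalg.eigh(K_mm)
    vals = np.maximum(vals, eps)  # clip tiny eigenvalues for numerical stability
    inv_sqrt = (vecs / np.sqrt(vals)) @ vecs.T
    return lambda X: rbf_kernel(X, landmarks, bandwidth) @ inv_sqrt


def evaluate_policy(S, R, S_next, features, gamma,
                    n_stages=30, gd_steps=200, lr=1.0):
    """Multi-stage regression: each stage fits r + gamma * V_prev(s') by GD."""
    Phi, Phi_next = features(S), features(S_next)
    w = np.zeros(Phi.shape[1])
    for _ in range(n_stages):
        # Regression target, held fixed for the whole stage.
        target = R + gamma * (Phi_next @ w)
        for _ in range(gd_steps):
            # Gradient of the mean squared error, up to a factor of 2.
            grad = Phi.T @ (Phi @ w - target) / len(R)
            w = w - lr * grad
    return lambda X: features(X) @ w


def simulate_and_evaluate(seed=0, n=2000, gamma=0.5):
    """Hypothetical toy chain: s ~ U[0, 1], reward r = s, and the next state
    s' ~ U[0, 1] drawn independently of s. The true value function is then
    V(s) = s + gamma * 0.5 / (1 - gamma)."""
    rng = np.random.default_rng(seed)
    S = rng.uniform(size=(n, 1))
    S_next = rng.uniform(size=(n, 1))
    R = S[:, 0]
    # Landmarks on a fixed grid for reproducibility (an assumption; sampling
    # them from the observed states is the more common choice).
    landmarks = np.linspace(0.0, 1.0, 15)[:, None]
    features = nystrom_map(landmarks, bandwidth=0.2)
    return evaluate_policy(S, R, S_next, features, gamma)
```

Calling `v_hat = simulate_and_evaluate()` returns an estimated value function, and `v_hat(np.array([[0.5]]))` can be compared with the true value of 1.0 at s = 0.5 in this toy setup. Because the Nyström features have squared norm at most 1 for a Gaussian kernel, a step size of 1.0 keeps the gradient descent stable here.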

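Main results:

Establishes the first statistical error bounds for value function estimation in a fully decentralized nonparametric framework.
Proves convergence of the proposed method.
Compares the regression-based method with the kernel temporal difference (TD) method through numerical studies.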

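Conclusions:

The proposed method offers a statistically sound and computationally feasible solution for decentralized nonparametric policy evaluation.
The established error bounds provide theoretical guarantees for the convergence of the estimated value function.
This work advances the understanding and application of RL in complex multi-agent systems.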