Updated: Mar 17, 2026

When and Why Hyperbolic Discounting Matters for Reinforcement Learning Interventions

Ian M Moore¹, Eura Nofshin¹, Siddharth Swaroop¹

¹Department of Computer Science, Harvard University, USA.

Reinforcement learning journal

March 16, 2026

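Summary

This summary is machine-generated.

AI agents can guide humans more effectively when they model how humans discount future rewards. This study introduces an exponential discount factor that approximates hyperbolic discounting. The approximation improves AI interventions and, surprisingly, outperforms the hyperbolic discounting model in online learning.

Keywords:

Agent-based human modeling; human-AI interaction; hyperbolic discounting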

Area of science:

Artificial intelligence
Cognitive science
Reinforcement learning

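Background:

Human decision-making often exhibits hyperbolic discounting of future rewards.
Current reinforcement learning (RL) models of human behavior mostly use exponential discounting, which simplifies planning.
This mismatch poses a challenge for AI agents that aim to guide human behavior effectively.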
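
The mismatch between the two discounting forms can be illustrated with a small sketch. It assumes the standard textbook forms, hyperbolic d(t) = 1 / (1 + k t) and exponential d(t) = gamma^t. The function names, the parameter values `k` and `gamma`, and the reward amounts are hypothetical and not taken from the study. The sketch shows the preference reversal that hyperbolic discounting produces and that exponential discounting cannot.

```python
def hyperbolic(t, k=1.0):
    """Hyperbolic discount weight for a reward t steps away (k is a hypothetical rate)."""
    return 1.0 / (1.0 + k * t)


def exponential(t, gamma=0.9):
    """Exponential discount weight for a reward t steps away (gamma is hypothetical)."""
    return gamma ** t


def prefers_larger_later(discount, front_delay, small=50.0, large=100.0, gap=5):
    """Return True if the larger reward, arriving `gap` steps after the
    smaller one, has the higher discounted value.

    `front_delay` is how far away the smaller reward is.
    """
    v_small = small * discount(front_delay)
    v_large = large * discount(front_delay + gap)
    return v_large > v_small


if __name__ == "__main__":
    for name, d in [("hyperbolic", hyperbolic), ("exponential", exponential)]:
        near = prefers_larger_later(d, front_delay=0)
        far = prefers_larger_later(d, front_delay=20)
        print(f"{name}: prefers larger-later when near={near}, when far={far}")
```

With these illustrative numbers, the hyperbolic discounter picks the larger reward when both options are far away but switches to the smaller one once it is immediately available. The exponential discounter makes the same choice at both horizons, because the ratio of the two discounted values does not depend on the front delay.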

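Purpose of the study:

To examine the trade-off between the computational cost and the performance benefit of modeling humans with hyperbolic discounting.
To develop and evaluate an AI policy that modifies the human's discounting behavior so that distant goals are achieved.
To determine whether approximating hyperbolic discounting with an exponential factor is computationally feasible and effective.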

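Main methods:

Derived a fixed exponential discount factor that approximates hyperbolic discounting in the human model.
Proved a theoretical guarantee for the approximation, ensuring that no necessary AI intervention is missed.
Compared the approximation with the mean hazard rate approach with respect to reducing unnecessary interventions (false positives).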
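
One way to read the mean hazard rate baseline: the hyperbolic curve 1 / (1 + k t) is known to arise as an average of exponential curves exp(-lam t) when the hazard rate lam is exponentially distributed with mean k. Replacing that distribution by its mean gives the single exponential exp(-k t). The sketch below assumes this is the baseline meant here, which the summary does not spell out, and uses hypothetical function names and a hypothetical `k`. It does not reproduce the fixed discount factor derived in the study.

```python
import math


def hyperbolic(t, k):
    """Hyperbolic discount weight 1 / (1 + k t)."""
    return 1.0 / (1.0 + k * t)


def mean_hazard_exponential(t, k):
    """Exponential discounting at the mean hazard rate k (assumed baseline)."""
    return math.exp(-k * t)


def hyperbolic_from_mixture(t, k, n=100_000, upper=50.0):
    """Midpoint-rule average of exp(-lam t) over lam ~ Exponential(mean k).

    The density of lam is (1/k) exp(-lam/k); the integral is truncated at
    lam = upper * k, where the density is negligible.
    """
    h = upper * k / n
    total = 0.0
    for i in range(n):
        lam = (i + 0.5) * h
        total += math.exp(-lam * t) * math.exp(-lam / k) / k * h
    return total


if __name__ == "__main__":
    k = 0.5  # hypothetical discount rate
    for t in (1, 5, 20):
        print(t, hyperbolic(t, k), mean_hazard_exponential(t, k))
```

Because exp(x) >= 1 + x, the mean hazard rate curve never lies above the hyperbolic one, so a model built on it treats the human as more short-sighted than the hyperbolic model does. That is one plausible source of unnecessary interventions, offered here as an interpretation rather than as the study's own argument.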

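Main results:

The derived exponential approximation guarantees that the AI agent does not miss critical interventions.
Compared with the mean hazard rate approach, the approximation produces fewer false positives.
Experiments show that the exponential approximation outperforms the true hyperbolic model in online learning scenarios.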

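Conclusions:

Approximating hyperbolic discounting with a fixed exponential factor is a viable strategy for AI agents.
The approach makes AI interventions more effective by improving goal-directed human behavior.
The surprising finding that the exponential approximation performs well in online learning calls for further investigation of the dynamics of human-AI interaction.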