When and Why Hyperbolic Discounting Matters for Reinforcement Learning Interventions

Ian M Moore¹, Eura Nofshin¹, Siddharth Swaroop¹

  • ¹ Department of Computer Science, Harvard University, USA.

Reinforcement Learning Journal | March 16, 2026
PubMed
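Summary
This summary is machine-generated.

AI agents can guide humans more effectively by modeling how humans discount rewards. This study introduces an exponential discount factor that approximates hyperbolic discounting. The approximation improves AI interventions and, surprisingly, outperforms the hyperbolic model in online learning.

Keywords:
Agent-based human modeling; human-AI interaction; hyperbolic discounting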

Scientific fields:

  • Artificial intelligence
  • Cognitive science
  • Reinforcement learning

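Background:

  • Human decision-making often exhibits hyperbolic discounting of future rewards.
  • Current reinforcement learning (RL) models mostly describe human behavior with exponential discounting, which simplifies planning.
  • This mismatch poses a challenge for AI agents that aim to guide human behavior effectively.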
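To make the contrast concrete, here is a minimal sketch using the standard textbook forms of the two discount functions: hyperbolic `1 / (1 + k*t)` and exponential `gamma ** t`. The parameter values (`k`, `gamma`), the reward amounts, and the helper `prefers_larger_later` are illustrative assumptions, not taken from the paper. The sketch shows the preference reversal that hyperbolic discounting allows and that exponential discounting rules out.

```python
def hyperbolic(t, k=1.0):
    """Textbook hyperbolic discount 1 / (1 + k*t); k is an assumed value."""
    return 1.0 / (1.0 + k * t)


def exponential(t, gamma=0.9):
    """Textbook exponential discount gamma ** t; gamma is an assumed value."""
    return gamma ** t


def prefers_larger_later(discount, delay_to_small, small=10.0, large=15.0, gap=3):
    """True if the larger reward, arriving `gap` steps after the smaller one,
    has the higher discounted value. All amounts are hypothetical."""
    v_small = small * discount(delay_to_small)
    v_large = large * discount(delay_to_small + gap)
    return v_large > v_small


# Hyperbolic: the choice flips as the same pair of rewards moves closer in time.
hyper_far = prefers_larger_later(hyperbolic, delay_to_small=10)
hyper_near = prefers_larger_later(hyperbolic, delay_to_small=0)

# Exponential: the value ratio is always 1.5 * gamma**gap, so the choice
# does not depend on how far away the pair of rewards is.
exp_far = prefers_larger_later(exponential, delay_to_small=10)
exp_near = prefers_larger_later(exponential, delay_to_small=0)

print(hyper_far, hyper_near, exp_far, exp_near)
```

With these assumed values, the hyperbolic agent waits for the larger reward when both rewards are far away but takes the smaller one when it is immediately available, while the exponential agent makes the same choice at both delays.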

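Purpose of the study:

  • To investigate the trade-off between computational cost and performance benefit when modeling humans with hyperbolic discounting.
  • To develop and evaluate an AI policy that modifies the human's discounting behavior so that distant goals can be reached.
  • To determine whether approximating hyperbolic discounting with an exponential factor is computationally feasible and effective.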

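Main methods:

  • Derived a fixed exponential discount factor that approximates hyperbolic discounting in the human model.
  • Proved theoretical guarantees for the approximation, ensuring that no necessary AI intervention is missed.
  • Compared the approximation with the mean hazard rate method on how well each reduces unnecessary interventions (false positives).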
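This summary gives neither the derived factor nor a definition of the mean hazard rate baseline. As a rough illustration only, the sketch below assumes the textbook hyperbolic form `d(t) = 1 / (1 + k*t)`, whose hazard rate is `k / (1 + k*t)`, and builds a fixed exponential factor from that hazard averaged over a horizon `T`. The function names and the values of `k` and `T` are hypothetical, and this is not the paper's derivation.

```python
import math


def hyperbolic(t, k):
    """Textbook hyperbolic discount 1 / (1 + k*t)."""
    return 1.0 / (1.0 + k * t)


def mean_hazard_gamma(k, T):
    """Fixed exponential factor from the hyperbolic hazard averaged over [0, T].

    The hazard of 1/(1+k*t) is k/(1+k*t); its average on [0, T] is
    ln(1 + k*T) / T, so gamma = exp(-ln(1 + k*T) / T).
    This construction is an assumption made for illustration.
    """
    mean_h = math.log(1.0 + k * T) / T
    return math.exp(-mean_h)


k, T = 0.5, 10  # assumed values, not from the paper
gamma = mean_hazard_gamma(k, T)

# Gap between the exponential and hyperbolic curves at each step up to T.
gaps = [gamma ** t - hyperbolic(t, k) for t in range(T + 1)]

for t, g in enumerate(gaps):
    print(t, round(g, 4))
```

Under these assumptions the two curves agree at `t = 0` and `t = T`, and the exponential curve lies above the hyperbolic one in between. This is one way to see that a single fixed factor cannot match hyperbolic discounting at every delay, so the choice of factor determines where the approximation errs.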

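Main results:

  • The derived exponential approximation guarantees that the AI agent does not miss critical interventions.
  • Compared with the mean hazard rate method, the approximation produces fewer false positives.
  • Experiments show that the exponential approximation outperforms the true hyperbolic model in online learning scenarios.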

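Conclusions:

  • Approximating hyperbolic discounting with a fixed exponential factor is a feasible strategy for AI agents.
  • The approach makes AI interventions more effective by improving goal-directed human behavior.
  • The surprising finding that the exponential approximation performs well in online learning calls for further investigation into the dynamics of human-AI interaction.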