False Correlation Reduction for Offline Reinforcement Learning

Zhihong Deng, Zuyue Fu, Lingxiao Wang

IEEE Transactions on Pattern Analysis and Machine Intelligence

October 30, 2023

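Summary

This summary is machine-generated.

This study introduces falSe COrrelation REduction (SCORE) for offline reinforcement learning (RL) to address false correlations between uncertainty and decision-making. By using an annealing behavior cloning regularizer, SCORE improves performance and accelerates convergence.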

Scientific field:

Artificial intelligence
Machine learning
Reinforcement learning

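Background:

Offline reinforcement learning (RL) uses large datasets to solve sequential decision-making problems.
Existing methods focus mainly on out-of-distribution (OOD) actions and overlook the suboptimality driven by uncertainty.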

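Purpose of the study:

Address the key problem of false correlations between epistemic uncertainty and decision-making in offline RL.
Propose a new algorithm, falSe COrrelation REduction (SCORE), to improve the performance and reliability of offline RL.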

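Main methods:

SCORE uses an annealing behavior cloning regularizer to improve uncertainty estimation.
This regularization is key to mitigating the suboptimality caused by false correlations.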
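
The idea of an annealed behavior cloning term can be pictured with a minimal sketch. It is not the authors' implementation: the geometric decay schedule, the ensemble-standard-deviation penalty used as an uncertainty proxy, and every name below (`bc_weight`, `pessimistic_q`, `actor_loss`, `decay`, `penalty`) are assumptions made for illustration. The sketch combines an uncertainty-penalized value estimate with a behavior cloning penalty whose weight shrinks as training proceeds.

```python
from statistics import fmean, pstdev


def bc_weight(step, initial_weight=1.0, decay=0.99):
    """Hypothetical geometric annealing schedule for the behavior cloning weight."""
    return initial_weight * decay ** step


def pessimistic_q(q_ensemble, penalty=1.0):
    """Per-sample ensemble mean minus a multiple of the ensemble spread.

    q_ensemble[k][i] is the value estimate of ensemble member k for sample i.
    The standard deviation across members is an assumed uncertainty proxy.
    """
    return [fmean(qs) - penalty * pstdev(qs) for qs in zip(*q_ensemble)]


def actor_loss(q_ensemble, policy_actions, dataset_actions, step):
    """Negative pessimistic value plus an annealed behavior cloning penalty."""
    q_term = -fmean(pessimistic_q(q_ensemble))
    # Mean squared distance between the policy's actions and the logged actions.
    bc_term = fmean(
        sum((p - d) ** 2 for p, d in zip(pa, da))
        for pa, da in zip(policy_actions, dataset_actions)
    )
    return q_term + bc_weight(step) * bc_term


# Two ensemble members, two samples, one-dimensional actions.
q_ensemble = [[1.0, 2.0], [3.0, 2.0]]
policy_actions = [[0.0], [0.0]]
dataset_actions = [[1.0], [1.0]]

early = actor_loss(q_ensemble, policy_actions, dataset_actions, step=0)
late = actor_loss(q_ensemble, policy_actions, dataset_actions, step=1000)
```

In this sketch the behavior cloning term dominates early, keeping the policy close to the logged actions, and fades later so that the uncertainty-penalized value term drives the update.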

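Main results:

On a standard offline RL benchmark (D4RL), SCORE achieves state-of-the-art (SoTA) performance.
Empirical results show a 3.1x speedup on the benchmark tasks.
Theoretical analysis shows that the algorithm converges to an optimal policy.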

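Conclusion:

SCORE effectively reduces false correlations in offline RL, which improves decision-making.
The algorithm offers both practical effectiveness and a theoretical guarantee of convergence.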