False Correlation Reduction for Offline Reinforcement Learning.

Zhihong Deng, Zuyue Fu, Lingxiao Wang

    IEEE transactions on pattern analysis and machine intelligence
    |October 30, 2023
    PubMed
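    Summary
    This summary is machine-generated.

    This study introduces falSe COrrelation REduction (SCORE) for offline reinforcement learning (RL) to address the false correlation between uncertainty and decision-making. By using a behavior cloning regularizer, SCORE improves performance and accelerates convergence.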

    Scientific fields:

    • Artificial intelligence
    • Machine learning
    • Reinforcement learning

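    Background:

    • Offline reinforcement learning (RL) uses large datasets for sequential decision-making.
    • Existing methods focus mainly on out-of-distribution (OOD) actions and overlook suboptimality driven by uncertainty.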

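    Purpose of the study:

    • Address a key problem: the false correlation between epistemic uncertainty and decision-making in offline RL.
    • Propose a new algorithm, falSe COrrelation REduction (SCORE), to improve the performance and reliability of offline RL.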
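To make the link between epistemic uncertainty and decision-making concrete, the sketch below penalizes each action's value estimate by its uncertainty before picking an action. It is an illustration only: using the spread of an ensemble of value estimates as the uncertainty measure, and the names `pessimistic_values`, `greedy_action`, and `beta`, are assumptions of this example and are not taken from the summary or confirmed as the SCORE algorithm.

```python
from statistics import mean, pstdev


def pessimistic_values(q_ensemble, beta=1.0):
    """Return mean value minus beta times the ensemble spread for each action.

    q_ensemble maps an action name to a list of value estimates, one per
    ensemble member (hypothetical data layout). The population standard
    deviation stands in for epistemic uncertainty in this sketch.
    """
    return {
        action: mean(estimates) - beta * pstdev(estimates)
        for action, estimates in q_ensemble.items()
    }


def greedy_action(values):
    """Pick the action with the highest (penalized) value."""
    return max(values, key=values.get)


# Toy estimates: "in_data" is well covered by the dataset, so the ensemble
# agrees; "rare" is poorly covered, so the ensemble members disagree widely.
q_ensemble = {
    "in_data": [1.0, 1.1, 0.9, 1.0],
    "rare": [3.0, 0.0, 4.0, -1.0],
}

no_penalty = greedy_action(pessimistic_values(q_ensemble, beta=0.0))
with_penalty = greedy_action(pessimistic_values(q_ensemble, beta=1.0))
print(no_penalty, with_penalty)
```

Without the penalty, the poorly covered action wins on its higher mean estimate; with the penalty, the well-covered action is chosen. The quality of the decision therefore depends directly on how well the uncertainty estimate reflects real gaps in the data.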

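    Main methods:

    • SCORE uses a behavior cloning regularizer to improve uncertainty estimation.
    • This regularization is key to mitigating the suboptimality caused by false correlations.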
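A behavior cloning regularizer of the kind described above can be sketched as an extra penalty that pulls the policy's action toward the action recorded in the dataset. The squared-error form of the penalty, the step-wise decaying weight, and the names `bc_weight` and `regularized_policy_loss` are all assumptions made for illustration; the summary does not specify the exact loss or schedule.

```python
def bc_weight(step, initial=1.0, decay=0.5, every=100):
    """Hypothetical schedule: multiply the weight by `decay` once per `every` steps."""
    return initial * decay ** (step // every)


def regularized_policy_loss(q_value, policy_action, dataset_action, weight):
    """Policy loss = negative value estimate + weighted behavior cloning penalty.

    policy_action and dataset_action are equal-length sequences of floats.
    The penalty is the squared distance between them (an assumed form).
    """
    bc_penalty = sum((p - d) ** 2 for p, d in zip(policy_action, dataset_action))
    return -q_value + weight * bc_penalty


# Early in training the penalty dominates, keeping the policy near the data;
# later the weight shrinks and the value term drives the update.
early = regularized_policy_loss(2.0, [0.5, 0.0], [0.0, 0.0], bc_weight(0))
late = regularized_policy_loss(2.0, [0.5, 0.0], [0.0, 0.0], bc_weight(250))
print(early, late)
```

Keeping the policy close to the dataset early on means value and uncertainty estimates are trained mostly on state-action pairs the data actually covers, which is one plausible reading of why such a regularizer helps uncertainty estimation.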

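    Main results:

    • On the standard offline RL benchmark (D4RL), SCORE achieves state-of-the-art (SoTA) performance.
    • Empirical results show a 3.1x speedup in task completion.
    • Theoretical analysis shows that the algorithm converges to an optimal policy.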

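    Conclusions:

    • SCORE effectively reduces false correlations in offline RL, which improves decision-making.
    • The algorithm offers both practical effectiveness and a theoretical convergence guarantee.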