
Updated: Feb 24, 2026

Fast Value Tracking for Deep Reinforcement Learning

Frank Shih¹, Faming Liang¹

¹Department of Statistics, Purdue University, West Lafayette, IN 47907, USA.

... International Conference on Learning Representations

February 23, 2026

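Summary

This summary is machine-generated.

This study introduces Langevinized Kalman Temporal-Difference (LKTD), a new reinforcement learning (RL) algorithm. LKTD draws on Kalman filtering and stochastic gradient Markov chain Monte Carlo (SGMCMC) to quantify uncertainty in deep reinforcement learning.

Keywords:

Reinforcement learning, uncertainty quantification, Kalman filtering, deep learning, stochastic gradient descent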

Scientific fields:

Artificial intelligence
Machine learning
Control theory

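Background:

Reinforcement learning (RL) agents interact with an environment to make sequential decisions.
Current RL algorithms often overlook the stochasticity of the environment and the quantification of uncertainty.
Static models focus on point estimates and ignore the dynamic interaction between agent and environment.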
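To make the phrase "point estimate" concrete, here is a minimal sketch of tabular TD(0) value learning. The 5-state random walk, the function name `td0_random_walk`, and all hyperparameters are illustrative assumptions, not taken from the paper. The learner returns a single number per state and carries no information about how uncertain that number is.

```python
import numpy as np

def td0_random_walk(n_states=5, episodes=3000, alpha=0.02, gamma=1.0, seed=0):
    """Tabular TD(0) on a hypothetical random walk (illustrative only).

    The agent starts in the middle state and moves left or right with equal
    probability. Leaving on the right pays reward 1, on the left reward 0.
    """
    rng = np.random.default_rng(seed)
    V = np.full(n_states, 0.5)  # one point estimate per state
    for _ in range(episodes):
        s = n_states // 2
        while True:
            s_next = s + (1 if rng.random() < 0.5 else -1)
            if s_next < 0:                      # terminated on the left
                reward, v_next, done = 0.0, 0.0, True
            elif s_next >= n_states:            # terminated on the right
                reward, v_next, done = 1.0, 0.0, True
            else:
                reward, v_next, done = 0.0, V[s_next], False
            # TD(0) update: move V[s] toward the bootstrapped target
            V[s] += alpha * (reward + gamma * v_next - V[s])
            if done:
                break
            s = s_next
    return V

V = td0_random_walk()
true_V = np.arange(1, 6) / 6.0  # exact values for this random walk
print("TD(0) estimates:", np.round(V, 2))
print("true values:    ", np.round(true_V, 2))
```

The output is one vector of values. Nothing in it says whether an estimate is well supported by data or still noisy, which is the gap the summary describes.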

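Research objectives:

Introduce a new, scalable sampling algorithm for deep reinforcement learning.
Address the limitations of existing RL methods with respect to uncertainty quantification.
Develop a way to quantify and monitor uncertainty during RL training.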

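Main methods:

Build on the Kalman filtering paradigm.
Introduce the Langevinized Kalman Temporal-Difference (LKTD) algorithm.
Use stochastic gradient Markov chain Monte Carlo (SGMCMC) to sample from the posterior of the neural network parameters.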
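The summary does not say which SGMCMC variant LKTD uses or how it is combined with the Kalman filter, so the following is not the LKTD algorithm. It is a minimal sketch of stochastic gradient Langevin dynamics (SGLD), one common SGMCMC method, applied to a toy Bayesian linear regression. The model, the function name `sgld_sample`, and every hyperparameter are assumptions chosen for illustration.

```python
import numpy as np

def sgld_sample(X, y, n_iter=3000, burn_in=1000, batch_size=32,
                step=2e-4, noise_var=0.25, prior_var=10.0, seed=0):
    """Approximate posterior samples of linear weights via SGLD (toy example).

    Assumed model: y = X @ theta + Gaussian noise with variance noise_var,
    with an independent zero-mean Gaussian prior of variance prior_var.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    samples = []
    for t in range(n_iter):
        idx = rng.choice(n, size=batch_size, replace=False)
        resid = y[idx] - X[idx] @ theta
        # Minibatch estimate of the log-likelihood gradient, rescaled to full data
        grad_lik = (n / batch_size) * (X[idx].T @ resid) / noise_var
        grad_prior = -theta / prior_var
        # Langevin step: half-step along the gradient plus Gaussian noise
        noise = rng.standard_normal(d)
        theta = theta + 0.5 * step * (grad_lik + grad_prior) + np.sqrt(step) * noise
        if t >= burn_in:
            samples.append(theta.copy())
    return np.array(samples)

# Synthetic data with known weights (hypothetical, for demonstration)
rng = np.random.default_rng(1)
true_w = np.array([1.0, -2.0])
X = rng.standard_normal((200, 2))
y = X @ true_w + 0.5 * rng.standard_normal(200)

samples = sgld_sample(X, y)
print("posterior mean:", samples.mean(axis=0))
print("posterior std: ", samples.std(axis=0))
```

The point of the sketch is the update rule: each iteration uses only a minibatch gradient, and the injected noise turns an optimizer into a sampler, so the retained iterates approximate draws from the posterior instead of collapsing to a single point estimate.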

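Key results:

Prove that, under mild conditions, the LKTD posterior samples converge to a stationary distribution.
Enable uncertainty quantification for the value function and the model parameters.
Enable monitoring of uncertainty during policy updates in deep reinforcement learning.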
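Once posterior samples of the parameters are available, uncertainty in the value function can be read off directly. The sketch below assumes a linear value function and uses synthetic parameter samples as a stand-in for sampler output. The helper `value_credible_interval`, the feature matrix, and the numbers are hypothetical and are not results from the paper.

```python
import numpy as np

def value_credible_interval(param_samples, features, level=0.95):
    """Posterior mean and equal-tailed credible interval of state values.

    param_samples: (n_samples, d) array of parameter draws.
    features:      (n_states, d) array, one feature vector per state.
    Assumes a linear value function V(s) = features[s] @ theta.
    """
    values = param_samples @ features.T           # (n_samples, n_states)
    tail = 100.0 * (1.0 - level) / 2.0
    lower = np.percentile(values, tail, axis=0)
    upper = np.percentile(values, 100.0 - tail, axis=0)
    return values.mean(axis=0), lower, upper

# Synthetic stand-in for posterior samples (illustrative only)
rng = np.random.default_rng(0)
param_samples = np.array([1.0, -0.5]) + 0.1 * rng.standard_normal((1000, 2))
features = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])

mean, lower, upper = value_credible_interval(param_samples, features)
for s in range(len(features)):
    print(f"state {s}: mean={mean[s]:.3f}, 95% interval=[{lower[s]:.3f}, {upper[s]:.3f}]")
```

Recomputing such intervals after each policy update is one simple way to monitor how uncertainty evolves during training. A widening interval for a state suggests that the current samples disagree about its value.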

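Conclusions:

The LKTD algorithm provides a robust approach to uncertainty quantification in RL.
LKTD promotes more adaptive and reliable reinforcement learning systems.
The method improves the understanding and management of uncertainty in agent-environment interactions.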