
Related Concept Videos

Reinforcement Schedules

Positive reinforcement is a powerful method for teaching new behaviors to both animals and humans. B.F. Skinner demonstrated this with his experiments using rats in a Skinner box. When a rat pressed a lever, it received a food pellet. This immediate reward encouraged the rat to repeat the behavior. This method, where a reward follows every instance of the behavior, is known as continuous reinforcement. It is highly effective for establishing new behaviors quickly.
Once a behavior is learned,...

Sampling Continuous Time Signal

In signal processing, a continuous-time signal can be sampled using an impulse-train sampling technique, followed by the zero-order hold method. Impulse-train sampling involves the use of a periodic impulse train, which consists of a series of delta functions spaced at regular intervals determined by the sampling period. When a continuous-time signal is multiplied by this impulse train, it generates impulses with amplitudes corresponding to the signal's values at the sampling points.
In the...
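The two steps can be sketched numerically in Python, assuming a hypothetical test signal x(t) = t² and a sampling period T = 0.5. An ideal delta function cannot be stored, so each impulse is represented by its time and its area x(nT). The zero-order hold then keeps each sample value constant for one sampling period. The function names are illustrative, not from any particular library.

```python
import math

def impulse_train_sample(x, T, n_samples):
    """Multiply x(t) by p(t) = sum_n delta(t - nT).

    The product is a train of impulses at t = nT with areas x(nT); each impulse
    is represented here as a (time, area) pair.
    """
    return [(n * T, x(n * T)) for n in range(n_samples)]

def zero_order_hold(samples, T, t):
    """Zero-order-hold output: x0(t) = x(nT) for nT <= t < (n + 1)T, else 0."""
    n = math.floor(t / T)
    if n < 0 or n >= len(samples):
        return 0.0  # outside the span covered by the recorded samples
    return samples[n][1]

samples = impulse_train_sample(lambda t: t * t, 0.5, 4)
print(samples)                             # [(0.0, 0.0), (0.5, 0.25), (1.0, 1.0), (1.5, 2.25)]
print(zero_order_hold(samples, 0.5, 1.2))  # 1.0, the value held from the sample at t = 1.0
```

Between sampling instants, the held output is a staircase that approximates x(t). A smaller T gives finer steps.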

Entropy Change in Reversible Processes

In a Carnot engine, which achieves the maximum possible efficiency between two reservoirs at fixed temperatures, the total entropy change over one cycle is zero. This observation generalizes to any reversible cyclic process, because such a process can be approximated by many small Carnot cycles. Thus, the total entropy change over any ideal reversible cycle is zero.
The statement can be generalized further to show that entropy is a state function. Consider a cyclic process passing through any two points on a p-V diagram.
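The argument can be written out as a short derivation. Here Q_h is the heat absorbed at T_h, and Q_c is taken as the magnitude of the heat rejected at T_c. Paths 1 and 2 are any two reversible paths joining states A and B.

```latex
% Carnot cycle: for reversible operation Q_c / Q_h = T_c / T_h, so
\Delta S_{\text{cycle}} = \frac{Q_h}{T_h} - \frac{Q_c}{T_c} = 0

% Any reversible cycle, approximated by many small Carnot cycles:
\oint \frac{dQ_{\text{rev}}}{T} = 0

% Cycle A -> B along path 1, then back B -> A along path 2:
\int_{A}^{B} \left(\frac{dQ_{\text{rev}}}{T}\right)_{1}
  + \int_{B}^{A} \left(\frac{dQ_{\text{rev}}}{T}\right)_{2} = 0
\;\Longrightarrow\;
\int_{A}^{B} \left(\frac{dQ_{\text{rev}}}{T}\right)_{1}
  = \int_{A}^{B} \left(\frac{dQ_{\text{rev}}}{T}\right)_{2}
  = S_B - S_A
```

The integral from A to B is the same along either path. It therefore depends only on the end states, which is what it means for entropy to be a state function.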

Random Sampling Method

Sampling is a technique for selecting a portion (or subset) of a larger population and studying that portion (the sample) to gain information about the population. Data are the result of sampling from a population. A sound sampling method aims to draw samples without bias so that they accurately represent the population. Because measuring the entire population is usually impractical, researchers use samples to represent the population of interest. Among the various sampling methods used by...
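The text is cut off before it names specific methods. Simple random sampling, in which every subset of size n is equally likely to be chosen, is one standard method. It is sketched here with Python's `random` module. The function name, the hypothetical population, and the seed handling are illustrative assumptions.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement so every size-n subset is equally likely."""
    if not 0 <= n <= len(population):
        raise ValueError("sample size must be between 0 and the population size")
    rng = random.Random(seed)  # a seeded generator makes the draw reproducible
    return rng.sample(population, n)

# Hypothetical population: 100 numbered individuals; study a sample of 10.
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=42)
print(sample)
```

Recording the seed lets another researcher reproduce exactly which members were selected.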

Randomized Experiments

Randomization assigns study participants to experimental or control groups by chance, so that each participant has an equal probability of being assigned to either group. Randomization is meant to eliminate selection bias and to balance known and unknown confounding factors, so that the control group is as similar as possible to the treatment group. A computer program with a random number generator can be used to assign participants to groups in a way that minimizes bias.
Simple randomization
Simple...
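Simple randomization can be sketched as an independent fair draw for each participant. The function name, group labels, and participant IDs below are hypothetical.

```python
import random

def simple_randomization(participants, groups=("treatment", "control"), seed=None):
    """Assign each participant independently, with equal probability, to one group."""
    rng = random.Random(seed)  # seed with a recorded value so the allocation can be audited
    return {p: rng.choice(groups) for p in participants}

# Hypothetical trial with 20 participants.
allocation = simple_randomization([f"P{i:02d}" for i in range(1, 21)], seed=7)
for participant, group in allocation.items():
    print(participant, group)
```

Each assignment is an independent coin flip, so group sizes can differ by chance, especially in small trials. Restricted schemes such as block randomization are a common remedy.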

Random Variables

A random variable assigns a single numerical value to each outcome of a procedure. The concept of random variables is fundamental to probability theory and was introduced by the Russian mathematician Pafnuty Chebyshev in the mid-nineteenth century.
Uppercase letters such as X or Y denote a random variable, and lowercase letters such as x or y denote its values. If X is a random variable, then X is described in words, and x is given as a number.
For example, let X = the...
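Formally, a random variable is a function from outcomes to numbers. As a hypothetical example (not the one truncated in the text), let X be the number of heads in three tosses of a fair coin. The uppercase X is the function, described in words, and each lowercase x is a number it can take.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def number_of_heads(outcome):
    """The random variable X: maps an outcome such as 'HTH' to a number."""
    return outcome.count("H")

# Sample space for three tosses of a fair coin: 8 equally likely outcomes.
outcomes = ["".join(toss) for toss in product("HT", repeat=3)]
counts = Counter(number_of_heads(o) for o in outcomes)
pmf = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}
print(pmf)  # {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
```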

You May Also Read

Related Articles

Articles related to this one through co-authors, journal, and citation graph.

Same author

Flood regime and dam operation jointly determine cyanobacterial dynamics in deep oligotrophic-mesotrophic reservoirs.

Water research·2026

Same author

Brassica Yellows Virus and Turnip Mosaic Virus: Asymmetric Interaction in the Mixed Infections in Nicotiana benthamiana and Brassica napus L.

Phytopathology·2026

Same author

Enhancing Stability of Probabilistic Model-Based Reinforcement Learning by Adaptive Noise Filtering.

IEEE transactions on neural networks and learning systems·2026

Same author

Author Correction: Role of the real first interface in regulating ionic signal of nanochannels.

Nature communications·2026

Same author

Lipoprotein(a): structural basis, bidirectional risk, and therapeutic frontiers.

Journal of clinical biochemistry and nutrition·2026

Same author

Nitrate sources and transformations in a river-reservoir system: Response to extreme flooding and various land use.

Journal of hydrology·2025

Same journal

Granular Ball-Based Noise-Resistant Fuzzy Multineighborhood Feature Selection via Label Enhancement and Feature Graph.

IEEE transactions on neural networks and learning systems·2026

Same journal

Fighting Evolving Spam With ARTMAP Models: A Noise-Resilient Online Detection Framework.

IEEE transactions on neural networks and learning systems·2026

Same journal

HyperSAT: Unsupervised Hypergraph Neural Networks for Weighted MaxSAT Problems.

IEEE transactions on neural networks and learning systems·2026

Same journal

Negation of Basic Belief Assignment in Multisource Information Fusion on Dempster-Shafer Theory With Applications in Pattern Classification.

IEEE transactions on neural networks and learning systems·2026

Same journal

Intervention Feasible Region and Driver Risk Capacity Aware Human-Machine Collaborative Safe Trajectory Planning.

IEEE transactions on neural networks and learning systems·2026

Same journal

A Unified Differential Denoising Learning Framework With a Pre-Trained Model and Fuzzy Graph Networks for Drug-Drug Interaction Prediction.

IEEE transactions on neural networks and learning systems·2026


Related Experiment Videos

Updated: Jul 11, 2025

Quantifying Learning in Young Infants: Tracking Leg Actions During a Discovery-learning Task

Published on: June 1, 2015

Relative Entropy Regularized Sample-Efficient Reinforcement Learning With Continuous Actions.

Zhiwei Shang, Renxing Li, Chunhua Zheng

IEEE transactions on neural networks and learning systems

November 9, 2023

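Summary

This summary is machine-generated.

A new reinforcement learning (RL) method, Continuous Dynamic Policy Programming (CDPP), improves learning stability and sample efficiency with continuous actions. It uses relative entropy regularization to improve exploration and policy updates in complex tasks.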
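CDPP is described as applying relative entropy (KL) regularization to policy updates. The sketch below illustrates only that general idea. It assumes a finite set of candidate actions and a hypothetical inverse-temperature parameter `eta`, neither of which comes from the paper. For this setting the KL-regularized update has a well-known closed form: the new policy reweights the previous policy by exp(eta * Q). This is not the authors' implementation.

```python
import math

def kl_regularized_update(prior, q_values, eta):
    """Closed-form maximizer of E_pi[Q] - (1/eta) * KL(pi || prior) over a finite action set.

    Returns pi_new(a) proportional to prior(a) * exp(eta * Q(a)). A small eta keeps
    the new policy close to the prior; a large eta approaches the greedy policy.
    `prior` must contain strictly positive probabilities.
    """
    logits = [math.log(p) + eta * q for p, q in zip(prior, q_values)]
    m = max(logits)  # shift by the max logit so exp() cannot overflow
    weights = [math.exp(l - m) for l in logits]
    z = sum(weights)
    return [w / z for w in weights]

# Hypothetical example: uniform prior; exp(Q) of the second action is three times that of the first.
new_policy = kl_regularized_update([0.5, 0.5], [0.0, math.log(3.0)], eta=1.0)
print(new_policy)  # approximately [0.25, 0.75]
```

The KL term limits how far each update can move the policy, which is the usual motivation for this kind of regularization improving stability.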

More Related Videos

WheelCon: A Wheel Control-Based Gaming Platform for Studying Human Sensorimotor Control

Published on: August 15, 2020

An Open-Source Virtual Reality System for the Measurement of Spatial Learning in Head-Restrained Mice

Published on: March 3, 2023


Field of Science:

Artificial Intelligence
Machine Learning
Robotics

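Background:

Current reinforcement learning (RL) methods struggle to achieve learning stability and sample efficiency in continuous action spaces.
Existing actor-critic (AC) frameworks, such as Deep Deterministic Policy Gradient (DDPG), face challenges in continuous control tasks.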

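Research Objective:

Introduce a new RL method, Continuous Dynamic Policy Programming (CDPP), to address stability and efficiency problems in continuous-action RL.
Enhance the actor-critic framework by integrating relative entropy regularization to improve performance.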

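Main Methods:

Extended relative entropy regularization from value-based methods to the actor-critic (AC) framework, specifically DDPG.
Used Monte Carlo estimation to approximate the softmax operation, which is intractable over continuous actions.
Employed the Mellowmax operator and introduced a Boltzmann sampling policy to guide the actor's exploration.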
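The Monte Carlo, Mellowmax, and Boltzmann-sampling steps listed above can be sketched together in plain Python. The sketch assumes a one-dimensional action in [-1, 1], a uniform proposal for drawing candidate actions, and a hypothetical quadratic critic `toy_q` standing in for a learned Q-network. Averaging exp(omega * Q) over uniformly drawn actions gives a Monte Carlo estimate of the integral that makes the softmax intractable for continuous actions. Mellowmax turns that average into a soft value, and Boltzmann sampling over the same candidates yields an exploratory action. The parameters `omega` and `beta`, the proposal, and all helper names are illustrative assumptions, not details from the paper.

```python
import math
import random

def mellowmax(values, omega):
    """Mellowmax: log(mean(exp(omega * x))) / omega, for omega > 0, computed stably."""
    m = max(values)
    mean_exp = sum(math.exp(omega * (v - m)) for v in values) / len(values)
    return m + math.log(mean_exp) / omega

def sample_actions(rng, low, high, n):
    """Draw n candidate actions uniformly from a 1-D box [low, high] (assumed proposal)."""
    return [rng.uniform(low, high) for _ in range(n)]

def boltzmann_sample(rng, actions, q_values, beta):
    """Pick one candidate with probability proportional to exp(beta * Q)."""
    m = max(q_values)
    weights = [math.exp(beta * (q - m)) for q in q_values]
    return rng.choices(actions, weights=weights, k=1)[0]

def toy_q(action):
    """Hypothetical critic: a 1-D quadratic peaked at action = 0.3."""
    return -(action - 0.3) ** 2

rng = random.Random(0)
candidates = sample_actions(rng, -1.0, 1.0, 256)
q_values = [toy_q(a) for a in candidates]
# Monte Carlo estimate of the mellowmax (soft) value over the continuous action range.
soft_value = mellowmax(q_values, omega=10.0)
exploratory_action = boltzmann_sample(rng, candidates, q_values, beta=50.0)
print(soft_value, exploratory_action)
```

A larger `beta` concentrates the sampled action near the critic's maximum, while a smaller one spreads exploration over more candidates. The mellowmax value always lies between the mean and the maximum of the sampled Q-values.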

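Main Results:

Demonstrated the positive effect of relative entropy regularization on exploration behavior and policy updates in continuous-action RL.
Achieved better learning capability, exploration efficiency, and stability than baseline methods.
Validated the method on benchmark tasks and real-robot simulation tasks.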

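Conclusions:

CDPP significantly improves sample efficiency and learning stability in RL with continuous actions.
Integrating relative entropy regularization with Boltzmann sampling offers a robust solution for complex control problems.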