Robust Multiobjective Reinforcement Learning Considering Environmental Uncertainties

Xiangkun He, Jianye Hao, Xu Chen

IEEE transactions on neural networks and learning systems

May 23, 2024

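Summary

This summary is machine-generated.

This study introduces robust multiobjective reinforcement learning (RMORL) to address environmental uncertainty in decision making. RMORL trains a single model to approximate robust Pareto-optimal policies, improving performance in complex scenarios.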

Scientific fields:

Artificial intelligence
Machine learning
Optimization

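Background:

Real-world problems often involve multiple conflicting objectives, so trade-offs must be made according to preferences.
Environmental uncertainties, such as variations or noise, can lead to suboptimal policies even when the goal is Pareto optimality.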
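
To make "Pareto optimality" concrete, here is a minimal sketch of a non-dominated filter over candidate outcomes. It is an illustration only, not code from the study: the function names, the two-objective return vectors, and the convention that every objective is maximized are all assumptions.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b, assuming every objective is maximized."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical two-objective returns of five candidate policies.
returns = [(1.0, 5.0), (2.0, 4.0), (3.0, 1.0), (1.5, 3.0), (2.0, 2.0)]
front = pareto_front(returns)
print(front)  # [(1.0, 5.0), (2.0, 4.0), (3.0, 1.0)]
```

The two dropped points are each worse than (2.0, 4.0) in at least one objective and better in none. The three surviving points are the trade-offs among which a preference has to choose.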

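Aims of the study:

Propose a new robust multiobjective reinforcement learning (RMORL) paradigm.
Train a single model that can approximate robust Pareto-optimal policies across the preference space.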
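
One common way to express a preference over objectives is a weight vector that scalarizes the return vector. The sketch below assumes linear scalarization, which this summary does not confirm the study uses; the candidate returns and weights are hypothetical. It shows how different preferences select different Pareto-optimal outcomes, which is the range of behavior a single preference-conditioned model would need to cover.

```python
from typing import List, Sequence, Tuple


def scalarize(returns: Sequence[float], preference: Sequence[float]) -> float:
    """Weighted sum of objective returns (linear scalarization, an assumption)."""
    return sum(w * r for w, r in zip(preference, returns))


def best_for_preference(
    candidates: List[Tuple[float, ...]], preference: Sequence[float]
) -> Tuple[float, ...]:
    """Pick the candidate with the highest scalarized return."""
    return max(candidates, key=lambda r: scalarize(r, preference))


# Hypothetical Pareto-optimal return vectors for two objectives.
candidates = [(1.0, 5.0), (2.0, 4.0), (3.0, 1.0)]

for preference in [(0.9, 0.1), (0.6, 0.4), (0.1, 0.9)]:
    print(preference, "->", best_for_preference(candidates, preference))
```

With these numbers, weighting the first objective heavily selects (3.0, 1.0), a balanced weighting selects (2.0, 4.0), and weighting the second objective heavily selects (1.0, 5.0).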

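Main methods:

Environmental disturbances are modeled as an adversarial agent in a zero-sum game, which is integrated into a multiobjective Markov decision process (MOMDP).
An adversarial defense technique is developed against observation perturbations; it bounds the resulting policy variation under a given preference.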
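
The zero-sum framing can be illustrated with a toy matrix game in which the agent chooses an action and an adversary chooses the disturbance that hurts it most. This is a sketch of the maximin principle only, not the study's algorithm: the payoff values are hypothetical and stand for already-scalarized returns under one fixed preference.

```python
from typing import List, Tuple


def maximin_action(payoff: List[List[float]]) -> Tuple[int, float]:
    """Return the action whose worst-case payoff over disturbances is largest."""
    worst_case = [min(row) for row in payoff]
    best = max(range(len(payoff)), key=lambda i: worst_case[i])
    return best, worst_case[best]


def nominal_action(payoff: List[List[float]]) -> int:
    """Return the best action if only the undisturbed case (column 0) is considered."""
    return max(range(len(payoff)), key=lambda i: payoff[i][0])


# Rows: agent actions. Columns: adversary choices (0 = no disturbance, 1 = disturbance).
payoff = [
    [4.0, 0.5],  # strong when undisturbed, fragile under disturbance
    [2.5, 2.0],  # moderate in both cases
    [3.0, 1.0],
]

robust, value = maximin_action(payoff)
print("nominal choice:", nominal_action(payoff))          # 0
print("robust choice:", robust, "worst-case value:", value)  # 1 2.0
```

The nominal choice looks best only if nothing goes wrong. The maximin choice gives up some undisturbed performance in exchange for a better guaranteed outcome, which is the kind of robustness the adversarial formulation targets.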

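Main results:

The proposed RMORL technique was evaluated in five multiobjective environments with continuous action spaces.
Its effectiveness was demonstrated through comparison with classical and state-of-the-art baseline methods.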

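Conclusions:

RMORL effectively improves policy stability under environmental uncertainties and observation disturbances.
The approach enables a single model to achieve robust Pareto optimality across the entire preference space.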