MOSAIC for multiple-reward environments.

Norikazu Sugimoto (1), Masahiko Haruno, Kenji Doya

  • (1) Center for Information and Neural Networks, National Institute of Information and Communications Technology, Kyoto 619-0288, Japan. xsugi@nict.go.jp

Neural Computation | December 16, 2011
Summary
This summary is machine-generated.

This study introduces MOSAIC-MR, a novel reinforcement learning (RL) architecture designed for robots. MOSAIC-MR effectively handles complex, switching dynamics and reward functions, outperforming existing methods in simulations.

Area of Science:

  • Robotics
  • Artificial Intelligence
  • Machine Learning

Background:

  • Reinforcement learning (RL) enables autonomous robots to learn control policies for maximizing cumulative rewards in complex environments.
  • High-performance RL requires controllers to manage intricate external dynamics and task-specific reward functions.
  • Robots in dynamic tasks like games face challenges from environmental factors and implicit switching of multiple dynamics and reward functions, especially in multi-agent scenarios with unobservable goals.

Purpose of the Study:

  • To address the double complexity of dynamics and reward functions in reinforcement learning agents.
  • To extend the modular selection and identification for control (MOSAIC) framework to handle nonstationary dynamics and reward functions in RL.
  • To propose and evaluate the MOSAIC-MR architecture for improved RL agent design.

Main Methods:

  • The study extends the MOSAIC framework, which uses a forward model's prediction error to select controllers for nonstationary dynamics.
  • The proposed MOSAIC-MR architecture selects and trains RL controllers according to each controller's temporal difference (TD) error, while also using the prediction errors of the dynamics predictors (forward models) and the reward predictors.
  • Unlike prior MOSAIC variants, MOSAIC-MR allows flexible associations between RL controllers and predictors rather than pre-assigned, fixed pairings.
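
The error-based selection described above can be pictured with a small sketch. It is an illustration only, not the paper's formulation: the soft-max weighting over squared errors, the scale parameter `sigma`, the discount factor `gamma`, the helper names, and all numbers are assumptions made for this example. It turns a set of prediction errors into normalized weights, so that the module with the smallest error receives the largest weight, and it computes a one-step TD error for each hypothetical controller.

```python
import math


def responsibilities(errors, sigma=1.0):
    """Turn prediction errors into normalized weights (soft-max).

    A smaller absolute error gives a larger weight. `sigma` is an assumed
    scale parameter, not a value taken from the paper.
    """
    scores = [-(e * e) / (2.0 * sigma * sigma) for e in errors]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def td_error(reward, value, next_value, gamma=0.9):
    """One-step temporal difference error: r + gamma * V(s') - V(s)."""
    return reward + gamma * next_value - value


def most_responsible(weights):
    """Index of the module with the largest weight."""
    return max(range(len(weights)), key=lambda i: weights[i])


# Hypothetical errors. The three module sets do not need the same size,
# which is one way to picture associations that are not fixed pairings.
dynamics_errors = [0.1, 1.5]       # one per forward model
reward_errors = [2.0, 0.2, 1.0]    # one per reward predictor
controller_td = [
    td_error(reward=1.0, value=0.5, next_value=1.0),
    td_error(reward=1.0, value=1.8, next_value=1.0),
]

dyn_w = responsibilities(dynamics_errors)
rew_w = responsibilities(reward_errors)
ctl_w = responsibilities(controller_td)

print(most_responsible(dyn_w), most_responsible(rew_w), most_responsible(ctl_w))
```

Weighting each module set separately is a design choice of this sketch: it lets a controller be favored independently of which forward model or reward predictor currently fits best. How MOSAIC-MR actually combines these signals is not specified in this summary.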

Main Results:

  • Simulation results indicate that the MOSAIC-MR architecture significantly outperforms other comparable methods.
  • The enhanced performance is attributed to MOSAIC-MR's ability to flexibly associate RL controllers with dynamics and reward predictors.
  • The flexible association capability allows for more robust adaptation to complex and switching environmental conditions.

Conclusions:

  • MOSAIC-MR provides an effective solution for reinforcement learning agents operating in environments with complex, switching dynamics and reward functions.
  • The architecture's flexible association mechanism is key to its superior performance compared to existing approaches.
  • This work advances the design of autonomous robotic systems capable of sophisticated decision-making in challenging, dynamic settings.