MOSAIC for multiple-reward environments | JoVE Visualize

Area of Science:

Robotics
Artificial Intelligence
Machine Learning

Background:

Reinforcement learning (RL) enables autonomous robots to learn control policies for maximizing cumulative rewards in complex environments.
High-performance RL requires controllers to manage intricate external dynamics and task-specific reward functions.
Robots performing dynamic tasks such as games must cope with environmental factors and with implicit switching among multiple dynamics and reward functions, especially in multi-agent scenarios where goals are unobservable.

Purpose of the Study:

To address the twofold complexity that RL agents face: switching dynamics and switching reward functions.
To extend the modular selection and identification for control (MOSAIC) framework to handle nonstationary dynamics and reward functions in RL.
To propose and evaluate the MOSAIC-MR architecture for improved RL agent design.

Main Methods:

The study extends the MOSAIC framework, which uses a forward model's prediction error to select controllers for nonstationary dynamics.
The proposed MOSAIC-MR architecture selects and trains RL controllers based on the RL controller's temporal difference (TD) error, using the prediction errors of both the dynamics predictors (forward models) and the reward predictors.
Unlike prior MOSAIC variants, MOSAIC-MR allows flexible associations between RL controllers and predictors rather than relying on pre-assigned, fixed pairings.
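
The two error signals named above can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes the Gaussian-softmax responsibility form commonly used in MOSAIC-style module selection and the standard one-step TD error. The function names, the error values, `sigma`, and `gamma` are all hypothetical, and how MOSAIC-MR actually combines these quantities is not specified in this summary.

```python
import math

def responsibilities(errors, sigma=1.0):
    """Normalized weights from prediction errors (assumed Gaussian-softmax form).

    A smaller prediction error yields a larger responsibility.
    """
    logits = [-(e ** 2) / (2.0 * sigma ** 2) for e in errors]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]

def td_error(reward, value_s, value_next, gamma=0.9):
    """Standard one-step temporal difference error: r + gamma * V(s') - V(s)."""
    return reward + gamma * value_next - value_s

# Hypothetical prediction errors for three dynamics predictors
# and three reward predictors (illustrative numbers only).
dynamics_errors = [0.1, 1.5, 0.8]
reward_errors = [0.9, 0.2, 1.1]

lam_dyn = responsibilities(dynamics_errors, sigma=0.5)
lam_rew = responsibilities(reward_errors, sigma=0.5)

# Index of the most responsible predictor of each kind.
best_dyn = max(range(len(lam_dyn)), key=lam_dyn.__getitem__)
best_rew = max(range(len(lam_rew)), key=lam_rew.__getitem__)

# TD error for one hypothetical transition.
delta = td_error(reward=1.0, value_s=0.5, value_next=1.0)
```

In this toy data the best dynamics predictor and the best reward predictor have different indices. That is the kind of situation in which a fixed one-to-one pairing of controllers and predictors would be restrictive, and in which a flexible association could help.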

Main Results:

Simulation results indicate that the MOSAIC-MR architecture significantly outperforms other comparable methods.
The enhanced performance is attributed to MOSAIC-MR's ability to flexibly associate RL controllers with dynamics and reward predictors.
The flexible association capability allows for more robust adaptation to complex and switching environmental conditions.

Conclusions:

MOSAIC-MR provides an effective solution for reinforcement learning agents operating in environments with complex, switching dynamics and reward functions.
The architecture's flexible association mechanism is key to its superior performance compared to existing approaches.
This work advances the design of autonomous robotic systems capable of sophisticated decision-making in challenging, dynamic settings.