Influence-aware memory architectures for deep reinforcement learning in POMDPs | JoVE Visualize

Area of Science:

Artificial Intelligence
Machine Learning
Robotics

Background:

Agents often face perceptual limitations that leave them with too little information about the environment to make optimal decisions.
Recurrent Neural Networks (RNNs) are commonly used in deep reinforcement learning to memorize past observations, but they are difficult to train and converge slowly on high-dimensional observations.
Partial observability in environments necessitates effective methods for agents to infer hidden state information from action-observation histories.

Purpose of the Study:

To propose a novel memory architecture, influence-aware memory, to address the training difficulties and performance limitations of standard RNNs in deep reinforcement learning.
To enhance the agent's ability to uncover hidden state information despite perceptual limitations.
To improve training speed, policy performance, and runtime efficiency compared to existing methods.

Main Methods:

Developed an influence-aware memory architecture that restricts recurrent layer inputs to variables influencing hidden state information.
Integrated a feedforward neural network to process non-influential observation variables.
Allowed information to reach the output without necessarily being stored in the RNN's internal memory, unlike standard RNNs, where every piece of information used for the prediction is fed back into the network.
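
The three method points above can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `InfluenceAwareMemorySketch`, the parameter `influence_idx`, the layer sizes, the single-layer Elman-style recurrence, and the random untrained weights are all assumptions made for illustration. It shows only the forward pass: the recurrent state is updated from the influence variables alone, a feedforward branch handles the remaining observation variables, and both branches are concatenated to produce action values.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    """Small random weight matrix; stands in for trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class InfluenceAwareMemorySketch:
    """Hypothetical two-branch network: RNN on influence variables, FNN on the rest."""

    def __init__(self, obs_dim, influence_idx, fnn_dim, rnn_dim, n_actions, seed=0):
        rng = random.Random(seed)
        # Assumption: the indices of the influence variables are given by hand.
        self.influence_idx = list(influence_idx)
        self.other_idx = [i for i in range(obs_dim) if i not in self.influence_idx]
        self.rnn_dim = rnn_dim
        self.W_f = rand_matrix(fnn_dim, len(self.other_idx), rng)      # feedforward branch
        self.W_x = rand_matrix(rnn_dim, len(self.influence_idx), rng)  # recurrent input weights
        self.W_h = rand_matrix(rnn_dim, rnn_dim, rng)                  # recurrent state weights
        self.W_q = rand_matrix(n_actions, fnn_dim + rnn_dim, rng)      # output head

    def initial_state(self):
        return [0.0] * self.rnn_dim

    def step(self, obs, h):
        """Return (action values, next recurrent state) for one observation."""
        x_inf = [obs[i] for i in self.influence_idx]
        x_other = [obs[i] for i in self.other_idx]
        # Feedforward branch (ReLU): its output is used now but never stored.
        f = [max(0.0, v) for v in matvec(self.W_f, x_other)]
        # Recurrent branch (tanh): only influence variables enter the memory.
        h_new = [math.tanh(a + b)
                 for a, b in zip(matvec(self.W_x, x_inf), matvec(self.W_h, h))]
        # Both branches feed the output head.
        q = matvec(self.W_q, f + h_new)
        return q, h_new


if __name__ == "__main__":
    net = InfluenceAwareMemorySketch(obs_dim=4, influence_idx=[0, 1],
                                     fnn_dim=3, rnn_dim=2, n_actions=2)
    h = net.initial_state()
    for obs in ([1.0, 0.0, 0.3, 0.7], [0.0, 1.0, 0.9, 0.1]):
        q, h = net.step(obs, h)
        print("action values:", q, "memory:", h)
```

In this sketch, two observations that differ only in the non-influence variables produce the same recurrent state, which is the property the restriction on recurrent inputs is meant to provide. How the influence variables are chosen is left open here.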

Main Results:

The influence-aware memory architecture outperformed standard recurrent architectures in both training speed and policy performance.
The proposed method demonstrated reduced runtime compared to conventional approaches.
Achieved better performance scores than methods that stack multiple observations to mitigate partial observability.

Conclusions:

Influence-aware memory provides a theoretically inspired and effective solution for handling partial observability in deep reinforcement learning.
By enabling recurrent layers to focus on critical variables, the approach enhances learning efficiency and agent performance.
This architecture offers a promising direction for developing more capable and efficient intelligent agents in complex environments.