Input-to-State Safety for Reinforcement Learning | JoVE Visualize

Area of Science:

Control Theory
Machine Learning
Robotics

Background:

Reinforcement learning (RL) often struggles with safety guarantees in real-world systems, especially under input constraints.
Ensuring safety during exploration and learning is critical for deploying RL in safety-critical applications like robotics and autonomous systems.
Input saturation in dynamical systems poses significant challenges to traditional control and learning methods.

Purpose of the Study:

To develop a novel off-policy safe RL approach for nonlinear dynamical systems operating under input saturation.
To guarantee safe initialization, exploration, and learning of optimal control laws within system constraints.
To rigorously establish the safety, optimality, and stability properties of the proposed RL framework.

Main Methods:

Formulating safe exploration as a robust control problem using input-to-state safe control barrier functions (ISSf-CBFs) to define an enlarged safe set.
Proposing a novel $\epsilon$-tuning law for adaptive safety constraint enforcement, encouraging exploration near boundaries while maintaining set invariance.
Incorporating a safety-aware cost function and developing a novel off-policy equation under input saturation for learning optimal control laws using neural networks.
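
As a rough illustration of the first two methods above, the following minimal sketch filters a nominal input through a barrier-function constraint of the form L_f h + L_g h u >= -alpha(h) + (L_g h)^2 / epsilon(h) and then clips the result to the input limits. It is not the authors' controller. The scalar system x' = u + d, the barrier h(x) = 1 - x^2, the exponential rule epsilon(h) = EPS0 * exp(LAM * h), and all numeric constants are assumptions chosen for illustration, and the plain clipping step is a naive stand-in for the source's treatment of input saturation that carries no guarantee by itself.

```python
import math
import random

ALPHA = 1.0   # alpha(h) = ALPHA * h, a linear class-K function (assumed)
EPS0 = 1.0    # base value of the tunable epsilon (assumed)
LAM = 1.0     # growth rate of epsilon with h (assumed)
U_MAX = 3.0   # input saturation limit (assumed)
D_MAX = 0.5   # bound on the exploration noise (assumed)


def h(x):
    """Barrier function: the safe set is {x : h(x) >= 0}, i.e. |x| <= 1."""
    return 1.0 - x * x


def lg_h(x):
    """Lie derivative of h along g for x' = u + d (f = 0, g = 1)."""
    return -2.0 * x


def epsilon(hx):
    """Hypothetical tuning rule: larger epsilon (looser) deep inside the set."""
    return EPS0 * math.exp(LAM * hx)


def safe_input(x, u_nom):
    """Minimally modify u_nom so that a*u + b >= 0, then clip to the limits."""
    a = lg_h(x)
    hx = h(x)
    b = ALPHA * hx - a * a / epsilon(hx)  # L_f h = 0 for this system
    residual = a * u_nom + b
    u = u_nom
    if residual < 0.0 and a != 0.0:
        u = u_nom - residual / a  # projection onto the constraint boundary
    return max(-U_MAX, min(U_MAX, u))


def simulate(seed=0, steps=2000, dt=0.01):
    """Euler rollout with a nominal input that pushes toward the boundary."""
    rng = random.Random(seed)
    x, min_h, max_abs_u = 0.0, h(0.0), 0.0
    for _ in range(steps):
        u = safe_input(x, u_nom=1.0)
        d = rng.uniform(-D_MAX, D_MAX)  # exploration noise on the input channel
        x += dt * (u + d)
        min_h = min(min_h, h(x))
        max_abs_u = max(max_abs_u, abs(u))
    return min_h, max_abs_u


if __name__ == "__main__":
    min_h, max_abs_u = simulate()
    print(f"min h = {min_h:.3f}, max |u| = {max_abs_u:.3f}")
```

In this toy setting the nominal input keeps pushing the state toward x = 1, and the extra (L_g h)^2 / epsilon(h) term makes the filter intervene earlier than a plain barrier constraint would, which leaves a margin for the bounded noise d. Making epsilon shrink as h approaches zero tightens that margin only near the boundary, which is one way to read the idea of encouraging exploration while maintaining invariance.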

Main Results:

The proposed $\epsilon$-tuning law effectively manages exploration noise, enabling efficient state-space exploration without compromising system safety.
The framework guarantees safe learning of optimal control laws even under input saturation limits.
The safety, optimality, and stability properties of the off-policy safe RL approach are established through formal analysis.
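
For readers unfamiliar with optimal control under input saturation, one nonquadratic input penalty that is widely used in the bounded-input optimal control literature is sketched below, together with the saturated control law it induces. Whether the source adopts this exact form is not stated in this summary, so it should be read as an assumption. The symbols are illustrative: $\lambda$ is the saturation bound, $R$ a positive-definite diagonal input weight, $Q(x)$ a state cost (which in the source would also carry the safety-aware term), and $V^{*}$ the optimal value function for dynamics $\dot{x} = f(x) + g(x)u$.

```latex
% Assumed cost: J = \int_{0}^{\infty} \big( Q(x) + W(u) \big)\, dt, with |u_i| \le \lambda.
\begin{align}
  W(u) &= 2 \int_{0}^{u} \lambda \left( \tanh^{-1}(v/\lambda) \right)^{\top} R \, dv, \\
  0 &= \frac{\partial}{\partial u}\Big[ Q(x) + W(u) + \nabla V^{*}(x)^{\top} \big( f(x) + g(x) u \big) \Big]
     = 2 \lambda R \tanh^{-1}(u/\lambda) + g(x)^{\top} \nabla V^{*}(x), \\
  u^{*}(x) &= -\lambda \tanh\!\left( \frac{1}{2\lambda} R^{-1} g(x)^{\top} \nabla V^{*}(x) \right).
\end{align}
```

Because $\tanh$ is bounded in $(-1, 1)$, the resulting $u^{*}$ respects $|u_i| < \lambda$ by construction, so the input limit does not have to be imposed as a separate constraint.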

Conclusions:

The developed off-policy safe reinforcement learning framework effectively addresses safety challenges in nonlinear dynamical systems with input saturation.
The approach enables safe initialization, exploration, and learning of control policies, and simulations demonstrate its efficacy.
This work provides a robust method for applying RL to safety-critical systems where input constraints are present.