HPRS: hierarchical potential-based reward shaping from task specifications | JoVE Visualize

Area of Science:

Robotics
Artificial Intelligence
Control Theory

Background:

Reinforcement learning (RL) for robotics policy synthesis heavily depends on reward signals.
Current methods struggle to produce reward signals that lead to policies satisfying diverse, high-level requirements.
Automatically deriving rewards from formal requirements is an active research area, and existing approaches still have limitations.

Purpose of the Study:

To introduce an automated methodology for generating reward signals that accurately reflect hierarchical task requirements.
To develop a novel approach, hierarchical potential-based reward shaping (HPRS), for constructing effective multi-objective reward functions.
To demonstrate HPRS's capability in producing policies that satisfy complex safety, target, and comfort requirements.

Main Methods:

Defining tasks as partially ordered sets of safety, target, and comfort requirements.
Automatically translating these requirements into a hierarchical reward structure in which the reward terms depend on one another.
Employing potential-based reward shaping to convert sparse rewards into dense rewards while preserving policy optimality.
Conducting experiments on eight robotics benchmarks and two sim-to-real F1TENTH vehicle applications.
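
The shaping step in the methods above can be illustrated with a minimal Python sketch. Only the shaping formula itself, r + gamma * Phi(s') - Phi(s), is the standard potential-based form; everything else is an assumption made for illustration and is not taken from the study: the function names, the choice of satisfaction scores in [0, 1], the multiplicative gating of lower-priority terms by higher-priority ones, and the toy one-dimensional robot with state (position, velocity).

```python
from typing import Callable, Sequence

State = Sequence[float]
Score = Callable[[State], float]  # hypothetical: maps a state to a score in [0, 1]


def hierarchical_potential(state: State,
                           safety: Sequence[Score],
                           target: Sequence[Score],
                           comfort: Sequence[Score]) -> float:
    """Illustrative potential (an assumption, not the paper's exact definition).

    Target terms are scaled by the product of all safety scores, and comfort
    terms by the product of all safety and target scores, so lower-priority
    requirements only contribute when higher-priority ones are satisfied.
    """
    safety_scores = [f(state) for f in safety]
    target_scores = [f(state) for f in target]
    comfort_scores = [f(state) for f in comfort]

    safety_gate = 1.0
    for s in safety_scores:
        safety_gate *= s
    target_gate = safety_gate
    for t in target_scores:
        target_gate *= t

    return (sum(safety_scores)
            + safety_gate * sum(target_scores)
            + target_gate * sum(comfort_scores))


def shaped_reward(base_reward: float, phi_s: float, phi_next: float,
                  gamma: float = 0.99) -> float:
    """Potential-based shaping: r + gamma * Phi(s') - Phi(s)."""
    return base_reward + gamma * phi_next - phi_s


# Hypothetical toy requirements for a 1-D robot with state (position, velocity).
def no_collision(s: State) -> float:      # safety: stay left of a wall at x = 10
    return 1.0 if s[0] < 10.0 else 0.0


def near_goal(s: State) -> float:         # target: approach a goal at x = 8
    return max(0.0, 1.0 - abs(s[0] - 8.0) / 8.0)


def gentle_speed(s: State) -> float:      # comfort: keep |velocity| below 2
    return max(0.0, 1.0 - abs(s[1]) / 2.0)


def potential(s: State) -> float:
    return hierarchical_potential(s, [no_collision], [near_goal], [gentle_speed])
```

With these toy definitions, a state that violates the safety requirement has potential zero regardless of its target and comfort scores, which mirrors the idea that lower-priority requirements matter only once higher-priority ones hold. With gamma = 1, the shaping terms along a trajectory telescope to Phi(s_T) - Phi(s_0), so the added signal depends only on the start and end states; this is the property behind the optimality-preservation claim for potential-based shaping.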

Main Results:

HPRS successfully generates policies that satisfy complex hierarchical requirements across various robotics benchmarks.
Compared to state-of-the-art methods, HPRS converges faster and performs better as measured by a rank-preserving policy-assessment metric.
Ablation studies reveal that HPRS makes effective use of comfort requirements when they align with safety and target goals, and disregards them when they conflict with those goals.
Sim-to-real experiments show that HPRS facilitates domain transfer without requiring manual parameter tuning or adaptation.

Conclusions:

HPRS offers an effective automated approach for synthesizing robotics policies that meet complex, hierarchical requirements.
The method enhances reward signal quality, leading to improved training efficiency and policy performance.
HPRS simplifies the design process by automatically balancing competing objectives and shows practical viability in real-world robotics.
The hierarchical design of task specifications supports robust sim-to-real transfer in robotics applications.