Adaptive Individual Q-Learning: A Multiagent Reinforcement Learning Method for Coordination Optimization | JoVE Visualize

Area of Science:

Artificial Intelligence
Machine Learning
Robotics

Background:

Multiagent reinforcement learning (MARL) is used for coordination optimization because it scales well and distributes tasks across agents.
Existing MARL convergence results are largely limited to repeated games and do not cover adaptation to dynamic environments.
Few MARL algorithms address environmental shifts, such as fluctuating traffic or unexpected obstacles for automated guided vehicles.

Purpose of the Study:

To propose a novel cooperative MARL algorithm, adaptive individual Q-learning (A-IQL), designed for adaptation to switched environments.
To analyze the convergence properties of A-IQL in stochastic games with chronologically ordered deterministic state transitions.
To investigate the impact of the update period (T) on A-IQL's convergence.

Main Methods:

The adaptive individual Q-learning (A-IQL) algorithm is proposed, where each agent updates its Q-function with a period T.
Convergence analysis is performed for stochastic games with deterministic state transitions in chronological order.
A fictitious stochastic game is used to study the influence of period T on convergence.
The algorithm's efficacy is validated through simulations in two distinct switched environments: distributed sensor network (DSN) and target transportation tasks.
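
The periodic update described above can be illustrated with a minimal sketch. This is not the authors' A-IQL rule, which this summary does not give. It assumes a standard tabular Q-learning update, applied by each agent to its own Q-table, with experience buffered and applied only every `period` steps (standing in for T). The class name, the toy two-agent game, and all parameter values are hypothetical.

```python
import random
from collections import defaultdict


class PeriodicIndividualQLearner:
    """One agent's Q-table over (state, own action) pairs.

    Assumption: transitions are buffered and the Q-table changes only once
    every `period` steps. The source states that agents update "with a
    period T" but not how, so this is an illustrative guess, not A-IQL.
    """

    def __init__(self, n_actions, period, alpha=0.1, gamma=0.9,
                 epsilon=0.1, seed=0):
        self.n_actions = n_actions
        self.period = period
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.buffer = []
        self.rng = random.Random(seed)

    def act(self, state):
        # Epsilon-greedy choice over this agent's own actions only.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        values = self.q[state]
        return values.index(max(values))

    def observe(self, state, action, reward, next_state):
        # Store the transition; apply updates only when the period elapses.
        self.buffer.append((state, action, reward, next_state))
        if len(self.buffer) >= self.period:
            self._flush()

    def _flush(self):
        # Standard tabular Q-learning update for each buffered transition.
        for s, a, r, s2 in self.buffer:
            target = r + self.gamma * max(self.q[s2])
            self.q[s][a] += self.alpha * (target - self.q[s][a])
        self.buffer.clear()


def run_switched_game(steps=2000, switch_at=1000, period=5):
    """Hypothetical single-state, two-agent game with a shared reward.

    The rewarded joint action switches at `switch_at`, a toy stand-in for a
    switched environment.
    """
    agents = [PeriodicIndividualQLearner(2, period, seed=i) for i in range(2)]
    state, rewards = 0, []
    for t in range(steps):
        good = 0 if t < switch_at else 1
        actions = [agent.act(state) for agent in agents]
        reward = 1.0 if all(a == good for a in actions) else 0.0
        for agent, a in zip(agents, actions):
            agent.observe(state, a, reward, state)
        rewards.append(reward)
    return agents, rewards
```

Calling `run_switched_game()` returns the two agents and the per-step shared rewards; varying `period` is one way to explore informally how the update period affects adaptation after the switch.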

Main Results:

A-IQL is shown to learn optimal joint strategies in stochastic games whose state transitions are deterministic and occur in chronological order.
The study analyzes the relationship between the update period T and the algorithm's convergence behavior.
Simulations confirm A-IQL's effectiveness in two switched environments: DSN and target transportation tasks.

Conclusions:

The proposed A-IQL algorithm offers a viable solution for coordination optimization in multiagent systems facing dynamic and switched environments.
A-IQL provides a framework for agents to adapt their strategies effectively, enhancing overall system performance.
The findings highlight the importance of adaptive mechanisms in MARL for real-world applications.