Kernel-based least squares policy iteration for reinforcement learning.

Xin Xu¹, Dewen Hu, Xicheng Lu

¹Institute of Automation, College of Mechatronics and Automation, National University of Defense Technology, Changsha 410073, P. R. China. xuxin_mail@263.net

IEEE Transactions on Neural Networks

August 3, 2007

Summary

This summary is machine-generated.

The paper introduces kernel-based least squares policy iteration (KLSPI), a reinforcement learning (RL) algorithm for problems with large or continuous state spaces. KLSPI selects features automatically through kernel sparsification and, in the reported experiments, outperforms conventional least squares policy iteration (LSPI) in learning efficiency and policy quality.

Area of Science:

Machine Learning
Control Systems
Artificial Intelligence

Background:

Reinforcement learning (RL) faces challenges in large or continuous state spaces.
Adaptive feedback control of uncertain dynamic systems requires efficient policy optimization.
Existing approximate RL methods struggle with convergence and feature selection.

Purpose of the Study:

Introduce a kernel-based least squares policy iteration (KLSPI) algorithm for RL.
Enable adaptive feedback control for uncertain dynamic systems with minimal a priori knowledge.
Improve convergence, optimality guarantees, and generalization ability in RL algorithms.

Main Methods:

Developed a kernel-based least squares policy iteration (KLSPI) algorithm.
Proposed a kernel-based least squares temporal-difference algorithm (KLSTD-Q) for policy evaluation.
Implemented kernel sparsification using approximate linear dependency (ALD) for feature selection and generalization.
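
The policy-evaluation step listed above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a Gaussian kernel, a dictionary of state-action points that has already been selected, and the usual least squares temporal-difference system A·α = b built from kernel features. The function names (`gaussian_kernel`, `kernel_features`, `klstd_q`, `q_value`), the small ridge term, and the three-state toy chain are hypothetical choices made for illustration only.

```python
import numpy as np

def gaussian_kernel(x, y, width=0.1):
    """Gaussian (RBF) kernel between two state-action vectors."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * width ** 2)))

def kernel_features(x, dictionary, kernel):
    """Feature vector of x: its kernel value against each dictionary point."""
    return np.array([kernel(x, d) for d in dictionary])

def klstd_q(samples, dictionary, kernel, gamma=0.9, ridge=1e-8):
    """Estimate kernel weights alpha for Q under a fixed policy.

    samples: list of (x, reward, x_next), where x and x_next are
    state-action vectors and x_next uses the action the policy picks.
    """
    n = len(dictionary)
    A = ridge * np.eye(n)  # tiny ridge term (an assumption) keeps A invertible
    b = np.zeros(n)
    for x, reward, x_next in samples:
        phi = kernel_features(x, dictionary, kernel)
        phi_next = kernel_features(x_next, dictionary, kernel)
        A += np.outer(phi, phi - gamma * phi_next)
        b += phi * reward
    return np.linalg.solve(A, b)

def q_value(x, alpha, dictionary, kernel):
    """Q(x) as a weighted sum of kernel values against the dictionary."""
    return float(kernel_features(x, dictionary, kernel) @ alpha)

# Hypothetical toy chain: 0 -> 1 -> 2, state 2 absorbing, reward 1 on 1 -> 2.
dictionary = [[0.0], [1.0], [2.0]]
samples = [([0.0], 0.0, [1.0]), ([1.0], 1.0, [2.0]), ([2.0], 0.0, [2.0])]
alpha = klstd_q(samples, dictionary, gaussian_kernel)
q_values = [q_value(x, alpha, dictionary, gaussian_kernel) for x in dictionary]
print(q_values)  # close to [0.9, 1.0, 0.0]
```

With the narrow kernel width used here the features are almost one-hot, so the estimate reduces to the tabular values of the toy chain. In a full policy-iteration loop this evaluation would alternate with a greedy policy improvement step over the estimated Q values.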

Main Results:

KLSPI achieves near-optimal control policies with high precision and convergence guarantees.
ALD-based kernel sparsification enables automatic feature selection.
Experiments show KLSPI outperforms traditional LSPI in learning efficiency and policy quality on stochastic chain problems.
KLSPI is also effective on nonlinear control tasks such as ship heading control and acrobot swing-up control.
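
How approximate linear dependency (ALD) leads to automatic feature selection can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's code: it assumes a Gaussian kernel and the standard ALD test, in which a sample joins the dictionary only when the squared residual of projecting its feature-space image onto the current dictionary exceeds a threshold. The names `gaussian_kernel` and `ald_dictionary`, the threshold value, and the sample points are hypothetical. For brevity the kernel matrix is rebuilt at every step instead of being updated recursively.

```python
import numpy as np

def gaussian_kernel(x, y, width=1.0):
    """Gaussian (RBF) kernel between two sample vectors."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * width ** 2)))

def ald_dictionary(samples, kernel, threshold=0.01):
    """Keep only samples that are not approximately linearly dependent,
    in kernel feature space, on the samples kept so far."""
    dictionary = [samples[0]]
    for x in samples[1:]:
        # Kernel matrix of the current dictionary and kernel vector of x.
        K = np.array([[kernel(a, b) for b in dictionary] for a in dictionary])
        k_vec = np.array([kernel(d, x) for d in dictionary])
        # Best coefficients for approximating x's image from the dictionary.
        coeffs = np.linalg.solve(K, k_vec)
        # Squared approximation error of that projection.
        delta = kernel(x, x) - k_vec @ coeffs
        if delta > threshold:
            dictionary.append(x)
    return dictionary

# Hypothetical samples: duplicates and near-duplicates should be dropped.
samples = [[0.0], [0.0], [0.05], [3.0], [3.0], [6.0]]
dictionary = ald_dictionary(samples, gaussian_kernel)
print(dictionary)  # keeps [[0.0], [3.0], [6.0]]
```

A larger threshold yields a smaller dictionary and coarser generalization, while a smaller threshold keeps more points at a higher computational cost.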

Conclusions:

KLSPI offers a general RL method with enhanced generalization and convergence for large-scale Markov decision problems.
The algorithm optimizes controller performance with limited information on uncertain systems.
KLSPI is applicable to online learning control scenarios.