Kernel-based least squares policy iteration for reinforcement learning.

Xin Xu¹, Dewen Hu, Xicheng Lu

¹Institute of Automation, College of Mechatronics and Automation, National University of Defense Technology, Changsha 410073, P. R. China. xuxin_mail@263.net

IEEE Transactions on Neural Networks | August 3, 2007
Summary
This summary is machine-generated.

A new kernel-based least squares policy iteration (KLSPI) algorithm extends reinforcement learning (RL) to problems with large or continuous state spaces. KLSPI produces improved control policies and selects features automatically, and in the reported experiments it outperforms conventional LSPI in learning efficiency and policy quality.
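To make the terms in the summary concrete, the standard LSTD-Q formulation with kernel features can be written as below. This is a sketch under assumptions, and the notation is ours rather than necessarily the paper's: $x=(s,a)$ is a state-action pair, $\{\tilde{x}_j\}_{j=1}^{d}$ is a dictionary of samples kept by the approximate linear dependency (ALD) test, $\{(s_t,a_t,r_t,s_{t+1})\}_{t=1}^{n}$ are sampled transitions, $\gamma$ is the discount factor, $\pi$ is the policy being evaluated, $K_{ij}=k(\tilde{x}_i,\tilde{x}_j)$, $\mathbf{k}_t=[k(\tilde{x}_1,x_t),\dots,k(\tilde{x}_d,x_t)]^{\top}$, and $\mu$ is a hypothetical sparsification threshold.

```latex
\begin{aligned}
\boldsymbol{\phi}(x) &= \big[k(x,\tilde{x}_1),\ \dots,\ k(x,\tilde{x}_d)\big]^{\top},
  \qquad x = (s,a) \\
\hat{Q}^{\pi}(s,a) &= \sum_{j=1}^{d} \alpha_j\, k\big((s,a),\tilde{x}_j\big)
  = \boldsymbol{\phi}(s,a)^{\top}\boldsymbol{\alpha} \\
A &= \sum_{t=1}^{n} \boldsymbol{\phi}(s_t,a_t)
  \big[\boldsymbol{\phi}(s_t,a_t) - \gamma\,\boldsymbol{\phi}\big(s_{t+1},\pi(s_{t+1})\big)\big]^{\top},
  \qquad b = \sum_{t=1}^{n} r_t\,\boldsymbol{\phi}(s_t,a_t),
  \qquad \boldsymbol{\alpha} = A^{-1} b \\
\pi'(s) &= \arg\max_{a}\, \hat{Q}^{\pi}(s,a) \\
\delta_t &= k(x_t,x_t) - \mathbf{k}_t^{\top} K^{-1} \mathbf{k}_t,
  \qquad x_t \text{ joins the dictionary if } \delta_t > \mu
\end{aligned}
```

Policy iteration alternates the evaluation step (solving for $\boldsymbol{\alpha}$) with the greedy improvement step $\pi'$ until the policy stops changing.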

Area of Science:

  • Machine Learning
  • Control Systems
  • Artificial Intelligence

Background:

  • Reinforcement learning (RL) faces challenges in large or continuous state spaces.
  • Adaptive feedback control of uncertain dynamic systems requires efficient policy optimization.
  • Existing approximate RL methods can have limited convergence guarantees and rely on manually selected features.

Purpose of the Study:

  • Introduce a kernel-based least squares policy iteration (KLSPI) algorithm for RL.
  • Enable adaptive feedback control for uncertain dynamic systems with minimal a priori knowledge.
  • Improve convergence, optimality guarantees, and generalization ability in RL algorithms.

Main Methods:

  • Developed a kernel-based least squares policy iteration (KLSPI) algorithm.
  • Proposed a kernel-based least squares temporal-difference algorithm for action-value (Q) functions, KLSTD-Q, to perform policy evaluation.

  • Implemented kernel sparsification using approximate linear dependency (ALD) for feature selection and generalization.
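The methods listed above can be combined into a compact sketch of kernel-based LSPI. This is a minimal illustration under stated assumptions, not the authors' implementation: the state-action kernel (a Gaussian kernel on states multiplied by an action-match indicator), the kernel width, the ALD threshold, the small ridge term, and all function names (`make_kernel`, `ald_dictionary`, `klstd_q`, `greedy_policy`, `klspi`) are hypothetical choices made here. The dictionary is built once from the sampled state-action pairs; each iteration then evaluates the current policy with LSTD-Q on kernel features and improves it greedily.

```python
import numpy as np


def make_kernel(width=0.5):
    """Hypothetical state-action kernel: Gaussian on states times an
    indicator that the two actions match (an illustrative assumption)."""
    def kernel(x, y):
        (s1, a1), (s2, a2) = x, y
        if a1 != a2:
            return 0.0
        diff = np.atleast_1d(np.asarray(s1, dtype=float) - np.asarray(s2, dtype=float))
        return float(np.exp(-(diff @ diff) / (2.0 * width ** 2)))
    return kernel


def ald_dictionary(points, kernel, threshold=0.01):
    """Approximate linear dependency (ALD): keep a point only if its
    feature-space image is poorly approximated by the current dictionary."""
    dictionary = [points[0]]
    for x in points[1:]:
        K = np.array([[kernel(u, v) for v in dictionary] for u in dictionary])
        k_vec = np.array([kernel(u, x) for u in dictionary])
        # delta = k(x, x) - k^T K^{-1} k; a tiny jitter keeps the solve stable
        c = np.linalg.solve(K + 1e-8 * np.eye(len(dictionary)), k_vec)
        delta = kernel(x, x) - k_vec @ c
        if delta > threshold:
            dictionary.append(x)
    return dictionary


def klstd_q(samples, features, policy, gamma, reg=1e-6):
    """LSTD-Q on kernel features: solve A alpha = b so that
    Q(s, a) ~= sum_j alpha_j * k((s, a), dictionary_j)."""
    d = features((samples[0][0], samples[0][1])).size
    A = reg * np.eye(d)  # small ridge term (assumption) keeps A invertible
    b = np.zeros(d)
    for s, a, r, s_next in samples:
        phi = features((s, a))
        phi_next = features((s_next, policy(s_next)))
        A += np.outer(phi, phi - gamma * phi_next)
        b += r * phi
    return np.linalg.solve(A, b)


def greedy_policy(alpha, features, actions):
    """Policy improvement: choose the action with the largest approximate Q."""
    return lambda s: max(actions, key=lambda a: features((s, a)) @ alpha)


def klspi(samples, actions, kernel, gamma=0.9, threshold=0.01, max_iter=50):
    """Alternate KLSTD-Q evaluation and greedy improvement until stable."""
    dictionary = ald_dictionary([(s, a) for s, a, _, _ in samples], kernel, threshold)

    def features(x):
        return np.array([kernel(x, u) for u in dictionary])

    states = [s for s, _, _, _ in samples] + [s2 for _, _, _, s2 in samples]
    policy = lambda s: actions[0]  # arbitrary initial policy
    alpha = np.zeros(len(dictionary))
    for _ in range(max_iter):
        alpha = klstd_q(samples, features, policy, gamma)
        improved = greedy_policy(alpha, features, actions)
        stable = all(improved(s) == policy(s) for s in states)
        policy = improved
        if stable:
            break
    return policy, alpha, dictionary


# Hypothetical toy chain: states 0..4, action 1 moves right, action 0 moves left;
# reward 1 whenever the next state is the right end (state 4).
def step(s, a):
    s_next = min(s + 1, 4) if a == 1 else max(s - 1, 0)
    return s_next, float(s_next == 4)


samples = []
for s in range(5):
    for a in (0, 1):
        s_next, r = step(s, a)
        samples.append((s, a, r, s_next))

policy, alpha, dictionary = klspi(samples, actions=[0, 1], kernel=make_kernel(0.5))
print(len(dictionary), [policy(s) for s in range(5)])  # 10 [1, 1, 1, 1, 1]
```

The five-state chain is a hypothetical deterministic example, not the stochastic chain benchmark from the paper. Because every state-action pair is sampled and kept in the dictionary, policy evaluation is exact on the samples and the loop settles on moving right everywhere. For longer sample streams, rebuilding the kernel matrix for every ALD test is wasteful; a recursive update of its inverse would be the natural refinement.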
Main Results:

  • KLSPI achieves near-optimal control policies with high precision and convergence guarantees.
  • ALD-based kernel sparsification enables automatic feature selection.
  • In experiments on stochastic chain problems, KLSPI outperforms conventional LSPI in learning efficiency and policy quality.
  • KLSPI is also effective in nonlinear control tasks such as ship heading control and acrobot swing-up control.

Conclusions:

  • KLSPI offers a general RL method with enhanced generalization and convergence for large-scale Markov decision problems.
  • The algorithm optimizes controller performance with limited prior information about uncertain systems.
  • KLSPI is applicable to online learning control scenarios.