Reinforcement learning in continuous time and space.

K Doya1

  • 1ATR Human Information Processing Research Laboratories, Soraku, Kyoto 619-0288, Japan.

Neural Computation
|January 15, 2000
PubMed
Summary
This summary is machine-generated.

This study presents a reinforcement learning framework for continuous-time dynamical systems, covering both value function estimation and policy improvement. In pendulum swing-up simulations, the continuous actor-critic method learned faster than its discrete counterpart.

Area of Science:

  • Reinforcement Learning
  • Control Theory
  • Dynamical Systems

Background:

  • Continuous-time dynamical systems pose challenges for traditional discrete-time reinforcement learning.
  • The Hamilton-Jacobi-Bellman (HJB) equation provides a foundation for infinite-horizon optimal control problems.
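For orientation, the setting behind these two points can be written out as follows. This is a sketch in notation introduced here, not quoted from the summary: f is the system dynamics, r the reward, u the control, and τ an assumed discount time constant for a discounted-reward formulation.

```latex
% Continuous-time dynamics and discounted value of a policy \mu
\dot{x}(t) = f\bigl(x(t), u(t)\bigr), \qquad
V^{\mu}\bigl(x(t)\bigr) = \int_{t}^{\infty} e^{-\frac{s-t}{\tau}}\, r\bigl(x(s), u(s)\bigr)\, ds

% HJB optimality condition for the optimal value function V^{*}
\frac{1}{\tau}\, V^{*}(x) = \max_{u} \left[ r(x, u) + \frac{\partial V^{*}(x)}{\partial x}\, f(x, u) \right]
```

The second line states that, at the optimum, the discounted value is balanced by the immediate reward plus the rate of change of value along the best available direction of motion.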

Purpose of the Study:

  • To develop a reinforcement learning framework for continuous-time systems without a priori discretization of time, state, or action.
  • To derive algorithms for value function estimation and policy improvement based on the HJB equation.

Main Methods:

  • Formulating value function estimation as minimizing a continuous-time temporal difference (TD) error.
  • Developing continuous actor-critic and value-gradient-based policy improvement methods.
  • Deriving value function update rules based on backward Euler approximation and on exponential eligibility traces.
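The first and third methods above can be illustrated with a minimal sketch. It is not the paper's experiment: the toy system dx/dt = -x, the reward r(x) = x², the single-feature value estimate V(x; w) = w·x², and all parameter values are assumptions chosen so that the exact answer, V(x) = x² / (1/τ + 2), is known. The TD error is taken as r - V/τ + dV/dt, with dV/dt replaced by a backward difference, and the weight is updated through an exponentially decaying eligibility trace.

```python
import random


def learn_value_weight(tau=1.0, kappa=0.1, eta=2.0, dt=0.01,
                       episodes=200, horizon=3.0, seed=0):
    """Continuous-time TD sketch on the toy system dx/dt = -x, r(x) = x**2.

    The value estimate is V(x; w) = w * x**2 (a hypothetical one-feature model).
    """
    rng = random.Random(seed)
    w = 0.0
    steps = int(horizon / dt)
    for _ in range(episodes):
        x_prev = rng.uniform(0.5, 1.0)   # random start state for each trial
        e = 0.0                          # eligibility trace, reset per trial
        for _ in range(steps):
            x = x_prev + dt * (-x_prev)  # Euler step of the toy dynamics
            v_prev, v = w * x_prev ** 2, w * x ** 2
            # TD error r - V/tau + dV/dt, with dV/dt as a backward difference
            delta = x ** 2 - v / tau + (v - v_prev) / dt
            # Exponential trace: kappa * de/dt = -e + dV/dw, where dV/dw = x**2
            e += dt * (-e + x ** 2) / kappa
            # Weight update driven by TD error times trace
            w += eta * delta * e * dt
            x_prev = x
    return w


w = learn_value_weight()
analytic = 1.0 / (1.0 / 1.0 + 2.0)  # exact weight for tau = 1
```

With these settings the learned weight should land close to the analytic value of 1/3, differing only by a small bias of order dt that comes from the finite-difference approximation.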

Main Results:

  • In a pendulum swing-up task with limited torque, the continuous actor-critic method needed several times fewer trials than the conventional discrete actor-critic method.
  • Among the continuous policy update methods, the value-gradient-based policy, with a known or learned dynamic model, performed several times better than the actor-critic method.
  • Value function updates using exponential eligibility traces were more efficient and stable than updates based on Euler approximation.
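To make the value-gradient-based policy in these results more concrete, here is a hedged sketch of one way such a policy can produce a bounded action: project the value gradient onto the direction in which the action moves the state, then squash the result so the torque limit is respected. The function name, the arctangent squashing function, the cost scale `c`, and the pendulum numbers are all illustrative assumptions, not values taken from the summary.

```python
import math


def value_gradient_policy(dV_dx, df_du, u_max=5.0, c=0.1):
    """Bounded greedy action from a value gradient and an input-gain model.

    dV_dx : gradient of the value estimate with respect to the state
    df_du : derivative of the state dynamics with respect to the scalar action
    """
    # How fast the value rises per unit of action (inner product of the two).
    gain = sum(g * v for g, v in zip(df_du, dV_dx))
    # Saturating squashing function keeps |u| below u_max (assumed form).
    return u_max * (2.0 / math.pi) * math.atan((math.pi / 2.0) * gain / c)


# Hypothetical pendulum: state (theta, omega); torque enters only d(omega)/dt.
m, l = 1.0, 1.0
df_du = (0.0, 1.0 / (m * l ** 2))
u_pos = value_gradient_policy(dV_dx=(0.3, 2.0), df_du=df_du)
u_neg = value_gradient_policy(dV_dx=(0.3, -2.0), df_du=df_du)
u_zero = value_gradient_policy(dV_dx=(0.3, 0.0), df_du=df_du)
```

The action pushes in the direction that increases the estimated value and saturates near the torque limit when the gradient is steep. That is why such a policy needs a model of the input gain, whether known or learned.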

Conclusions:

  • The proposed framework handles reinforcement learning in continuous time directly, without first discretizing the problem.
  • Continuous methods, particularly value-gradient-based policies, showed clear advantages over discrete actor-critic learning in the simulated control tasks.
  • The algorithms were tested on a pendulum swing-up task and on a higher-dimensional cart-pole swing-up task.