Reinforcement learning state estimator.

Jun Morimoto¹, Kenji Doya

  • ¹ JST, ICORP, Computational Brain Project, 4-1-8 Honcho, Kawaguchi, Saitama, 332-0012 Japan. xmorimo@atr.jp

Neural Computation | February 15, 2007
PubMed
Summary
This summary is machine-generated.

This study uses reinforcement learning to estimate hidden states in nonlinear dynamical systems by treating errors in the observable variables as delayed penalties. The method estimates the hidden variables of a pendulum, acquires a state estimator together with a swing-up controller, and can learn the pendulum's dynamics at the same time.

Area of Science:

  • Robotics
  • Control Theory
  • Machine Learning

Background:

  • Estimating hidden variables in nonlinear dynamical systems is challenging because the estimation errors of those variables cannot be observed directly.
  • Traditional methods struggle with unobserved states, limiting applications in complex systems.

Purpose of the Study:

  • To propose a novel reinforcement learning (RL) framework for hidden state and parameter estimation in nonlinear dynamical systems.
  • To address the challenge of unobservable estimation errors by reformulating them as delayed penalties.
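
The reformulation in the second point can be written out as a short derivation. This is only a sketch in assumed notation: the symbols $f$, $h$, $a_t$, $\pi_w$ and $\gamma$ are illustrative choices and are not taken from the paper.

```latex
\begin{align*}
x_{t+1} &= f(x_t, u_t), \qquad y_t = h(x_t)
  && \text{true system; only } y_t \text{ is observed} \\
\hat{x}_{t+1} &= f(\hat{x}_t, u_t) + a_t, \qquad
  a_t \sim \pi_w\bigl(\cdot \mid \hat{x}_t,\, y_t - h(\hat{x}_t)\bigr)
  && \text{estimator driven by a feedback input } a_t \\
r_{t+1} &= -\bigl\lVert y_{t+1} - h(\hat{x}_{t+1}) \bigr\rVert^{2}
  && \text{penalty on the observable error} \\
J(w) &= \mathbb{E}\Bigl[\sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1}\Bigr], \qquad 0 \le \gamma < 1
  && \text{objective maximized by RL}
\end{align*}
```

The hidden-state error $x_t - \hat{x}_t$ never appears in the penalty. An error in a hidden component changes the observed output only after it has propagated through $f$, so the cost of a poor feedback input arrives some steps later, which is the delayed-reward setting that RL handles.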

Main Methods:

  • Constructed nonlinear state estimators with RL, using a policy gradient method to find the feedback input gain of the estimator.
  • Treated errors in the observable variables as delayed penalties, which makes RL applicable to state estimation.
  • Simultaneously trained a swing-up controller and a state estimator for pendulum dynamics.
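
The first two points above can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it assumes a known damped pendulum model, observes only the angle, and learns two feedback gains (`k1`, `k2`) by perturbing them with Gaussian noise and comparing a mirrored pair of rollouts, a simple parameter-exploring form of policy gradient chosen for brevity. All constants, function names and the estimator structure are hypothetical, and the swing-up controller and dynamics learning are left out.

```python
import math
import random

DT = 0.01      # integration step in seconds (assumed)
STEPS = 300    # episode length: 3 s
W0_SQ = 9.81   # g / l for a 1 m pendulum (assumed)
DAMP = 0.1     # viscous damping coefficient (assumed)
K_MAX = 50.0   # safeguard: mean gains are kept inside [0, K_MAX]


def episode_penalty(k1, k2):
    """Run one episode; return the summed squared error of the observed angle."""
    theta, omega = 1.0, 0.0          # true state (omega is hidden)
    theta_hat, omega_hat = 0.0, 0.0  # estimator starts from a wrong guess
    penalty = 0.0
    for _ in range(STEPS):
        err = theta - theta_hat      # only the angle error is observable
        # Estimator: pendulum model plus feedback input (k1 * err, k2 * err).
        d_theta_hat = omega_hat + k1 * err
        d_omega_hat = -W0_SQ * math.sin(theta_hat) - DAMP * omega_hat + k2 * err
        theta_hat += DT * d_theta_hat
        omega_hat += DT * d_omega_hat
        # True pendulum, semi-implicit Euler step.
        omega += DT * (-W0_SQ * math.sin(theta) - DAMP * omega)
        theta += DT * omega
        # The penalty is paid on the next observation, after the feedback acted.
        penalty += DT * (theta - theta_hat) ** 2
    return penalty


def learn_gains(iterations=150, sigma=0.5, lr=2.0, seed=0):
    """Learn mean gains [k1, k2] from mirrored Gaussian perturbations."""
    rng = random.Random(seed)
    mean = [0.0, 0.0]
    for _ in range(iterations):
        eps = [rng.gauss(0.0, sigma) for _ in mean]
        j_plus = episode_penalty(mean[0] + eps[0], mean[1] + eps[1])
        j_minus = episode_penalty(mean[0] - eps[0], mean[1] - eps[1])
        # Normalized difference of returns (return = -penalty), in [-1, 1].
        delta = (j_minus - j_plus) / (j_plus + j_minus + 1e-12)
        # Move toward the perturbation that earned the higher return, then clip.
        mean = [min(K_MAX, max(0.0, m + lr * delta * e))
                for m, e in zip(mean, eps)]
    return mean


if __name__ == "__main__":
    gains = learn_gains()
    print("learned gains:", gains)
    print("penalty with zero gains:", episode_penalty(0.0, 0.0))
    print("penalty with learned gains:", episode_penalty(*gains))
```

The return of an episode is built only from the observable angle error, so the hidden angular velocity is never compared with its estimate. Normalizing the return difference keeps each update bounded, which matters here because negative gains make the estimator diverge and produce very large penalties.
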

Main Results:

  • Successfully estimated hidden variables (joint angle and angular velocity) for a single pendulum system by observing only one variable.
  • Demonstrated simultaneous acquisition of a state estimator and a swing-up controller for pendulum dynamics using RL.
  • Showcased the ability to estimate the pendulum's dynamics concurrently with hidden variable estimation during the swing-up task.
  • Extended the application to a two-linked biped model, validating the method's versatility.

Conclusions:

  • Reinforcement learning offers a viable and effective approach for nonlinear state estimation, even with unobservable errors.
  • The proposed method allows for simultaneous learning of state estimation and control policies, enhancing system performance.
  • The technique shows promise for complex robotic systems, including bipedal locomotion, by enabling dynamic system identification and state estimation.