

Kernel dynamic policy programming: Applicable reinforcement learning to robot systems with high dimensional states.

Yunduan Cui¹, Takamitsu Matsubara¹, Kenji Sugimoto¹

¹Graduate School of Information Science, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara, Japan.

Neural Networks : the Official Journal of the International Neural Network Society

July 22, 2017

Summary

This summary is machine-generated.

Kernel Dynamic Policy Programming (KDPP) offers a novel reinforcement learning approach for high-dimensional systems. This method efficiently learns policies in complex environments, overcoming computational challenges in robotics.

Keywords:

Kernel methods, Reinforcement learning, Robot learning


Area of Science:

Robotics
Artificial Intelligence
Machine Learning

Background:

Model-free reinforcement learning in high-dimensional Markov decision processes faces challenges with brittleness and computational complexity.
Existing value function approaches struggle to scale effectively to systems with numerous state dimensions.

Purpose of the Study:

To introduce a new value function approach for reinforcement learning applicable to high-dimensional state spaces.
To enhance the applicability of value function methods in complex robotic systems.

Main Methods:

Proposed Kernel Dynamic Policy Programming (KDPP), a novel algorithm for model-free reinforcement learning.
Used the Kullback-Leibler divergence to keep value function updates smooth, which stabilizes learning.
Employed the kernel trick for efficient value function approximation in high-dimensional state spaces.
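
The two ingredients named above can be illustrated with a minimal Python sketch. It is not the KDPP algorithm itself, which this summary does not spell out. It assumes the standard closed form of a KL-penalized policy update (the new policy is proportional to the old policy times exp(eta * Q)) and a Gaussian (RBF) kernel expansion over stored sample states. The function names and the parameters `eta` and `bandwidth` are hypothetical, chosen only for illustration.

```python
import math

def kl_regularized_update(prior, q_values, eta):
    """Return the policy maximizing E[Q] - (1/eta) * KL(new || prior).

    The closed-form solution is new(a) proportional to prior(a) * exp(eta * Q(a)).
    A small eta keeps the new policy close to the prior (a smooth update).
    `eta` is an assumed inverse-temperature parameter, not taken from the paper.
    """
    q_max = max(q_values)  # subtract the max for numerical stability
    weights = [p * math.exp(eta * (q - q_max)) for p, q in zip(prior, q_values)]
    total = sum(weights)
    return [w / total for w in weights]

def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel between two state vectors of equal length."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def kernel_value(x, centers, alphas, bandwidth=1.0):
    """Approximate a value function as a weighted sum of kernels.

    The cost grows with the number of stored samples (`centers`),
    not with an explicit high-dimensional feature expansion.
    """
    return sum(a * rbf_kernel(x, c, bandwidth) for a, c in zip(alphas, centers))

if __name__ == "__main__":
    old_policy = [0.5, 0.5]
    print(kl_regularized_update(old_policy, [1.0, 0.0], eta=0.1))
    state = [0.0] * 32  # e.g. a 32-dimensional state vector
    print(kernel_value(state, centers=[state], alphas=[2.0]))
```

With `eta = 0` the update returns the old policy unchanged, and larger values move it further toward the greedy action. This is one way to read "smooth" updates under a KL penalty.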

Main Results:

KDPP demonstrated superior performance in a simulated n-DOF manipulator reaching task, learning a viable policy at n=40.
Successfully applied KDPP to a real-world robotic hand (32-dimensional state space) for unscrewing a bottle cap.
Achieved efficient learning with limited samples and ordinary computing resources on the robotic system.

Conclusions:

KDPP effectively addresses computational complexity and brittleness in high-dimensional reinforcement learning.
The algorithm enables practical application of value function-based reinforcement learning to complex robotic systems.
KDPP represents a significant advancement for reinforcement learning in real-world, high-dimensional applications.