Kernel dynamic policy programming: Applicable reinforcement learning to robot systems with high dimensional states.

Yunduan Cui¹, Takamitsu Matsubara¹, Kenji Sugimoto¹

  • ¹ Graduate School of Information Science, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara, Japan.

Neural Networks: the Official Journal of the International Neural Network Society, July 22, 2017
Summary
This summary is machine-generated.

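Kernel Dynamic Policy Programming (KDPP) is a model-free, value function-based reinforcement learning algorithm for systems with high-dimensional state spaces. It smooths value function updates with Kullback-Leibler divergence and approximates the value function with kernels, and it learned policies for a simulated manipulator with 40 degrees of freedom and for a real robotic hand with a 32-dimensional state space.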

Keywords: Kernel methods, Reinforcement learning, Robot learning

Area of Science:

  • Robotics
  • Artificial Intelligence
  • Machine Learning

Background:

  • Model-free reinforcement learning in high-dimensional Markov decision processes faces challenges with brittleness and computational complexity.
  • Existing value function approaches struggle to scale effectively to systems with numerous state dimensions.
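To make the scaling problem concrete, a rough count helps: a tabular value function over a uniformly discretized state space needs one entry per grid cell per action, so its size grows exponentially with the number of state dimensions. The sketch below is illustrative only. The function name, the 10 bins per dimension, and the single action are assumptions chosen for the example, not figures from the paper.

```python
def table_entries(state_dims: int, bins_per_dim: int = 10, num_actions: int = 1) -> int:
    """Entries in a tabular value function over a uniform grid.

    Hypothetical helper: bins_per_dim and num_actions are illustrative
    assumptions, not values reported in the paper.
    """
    return (bins_per_dim ** state_dims) * num_actions


if __name__ == "__main__":
    # Compare a low-dimensional toy problem with a 32-dimensional state space.
    for dims in (2, 4, 32):
        print(f"{dims} dimensions -> {table_entries(dims)} table entries")
```

Even with this coarse grid, a 32-dimensional state space would need 10^32 entries, which is why value function methods in this setting rely on function approximation rather than tables.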

Purpose of the Study:

  • To introduce a new value function approach for reinforcement learning applicable to high-dimensional state spaces.
  • To enhance the applicability of value function methods in complex robotic systems.

Main Methods:

  • Proposes Kernel Dynamic Policy Programming (KDPP), a novel algorithm for model-free reinforcement learning.
  • Uses Kullback-Leibler divergence to smooth value function updates, which stabilizes learning.
  • Employs the kernel trick for efficient value function approximation in high-dimensional state spaces.
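The two ingredients named above can be illustrated separately. The following is a minimal sketch of the general ideas, not the KDPP algorithm itself: a generic KL-regularized policy update for a single state, and a value estimate written as a kernel expansion over sampled states. All function names, the Gaussian kernel, and the length scale are assumptions made for this example, and the step that fits the kernel coefficients is omitted.

```python
import math


def kl_regularized_update(prev_policy, q_values, eta):
    """Maximize E_pi[Q] - (1/eta) * KL(pi || prev_policy) for one state.

    The closed-form solution is pi(a) proportional to
    prev_policy(a) * exp(eta * Q(a)). prev_policy must be strictly positive.
    Small eta keeps the new policy close to the previous one.
    """
    logits = [math.log(p) + eta * q for p, q in zip(prev_policy, q_values)]
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(value - peak) for value in logits]
    total = sum(weights)
    return [w / total for w in weights]


def rbf_kernel(x, y, length_scale=1.0):
    """Gaussian (RBF) kernel between two state vectors (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))


def kernel_value(query, centers, coeffs, length_scale=1.0):
    """Value estimate V(query) = sum_i coeffs[i] * k(centers[i], query).

    The cost scales with the number of stored sample states, not with the
    size of a grid over the state space.
    """
    return sum(c * rbf_kernel(s, query, length_scale) for s, c in zip(centers, coeffs))


if __name__ == "__main__":
    uniform = [1.0 / 3.0] * 3
    print(kl_regularized_update(uniform, [1.0, 2.0, 3.0], eta=0.1))  # stays near uniform
    print(kl_regularized_update(uniform, [1.0, 2.0, 3.0], eta=5.0))  # concentrates on the best action

    # Two hypothetical 32-dimensional sample states with made-up coefficients.
    centers = [[0.0] * 32, [1.0] * 32]
    coeffs = [2.0, -1.0]
    print(kernel_value([0.0] * 32, centers, coeffs, length_scale=8.0))
```

In the update, eta trades off improvement against closeness to the previous policy: at eta = 0 the policy does not change, which is the sense in which a KL penalty smooths successive updates.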

Main Results:

  • In a simulated n-DOF manipulator reaching task, KDPP demonstrated superior performance and learned a viable policy at n=40.
  • KDPP was successfully applied to a real-world robotic hand (32-dimensional state space) to unscrew a bottle cap.
  • Learning on the robotic system was efficient, requiring only limited samples and ordinary computing resources.

Conclusions:

  • KDPP effectively addresses computational complexity and brittleness in high-dimensional reinforcement learning.
  • The algorithm enables practical application of value function-based reinforcement learning to complex robotic systems.
  • KDPP represents a significant advancement for reinforcement learning in real-world, high-dimensional applications.