Reinforcement learning for partially observable dynamic processes: adaptive dynamic programming using measured output

F L Lewis¹, Kyriakos G Vamvoudakis

  • ¹Automation and Robotics Research Institute, The University of Texas at Arlington, Fort Worth, TX 76118, USA. lewis@arri.uta.edu

IEEE Transactions on Systems, Man, and Cybernetics. Part B, Cybernetics : a Publication of the IEEE Systems, Man, and Cybernetics Society
|March 31, 2010
Summary
This summary is machine-generated.

This work shows how to implement approximate dynamic programming (ADP) using only measured system input/output data, eliminating the need for internal state information. The resulting output feedback (OPFB) approach simplifies control system implementation for linear dynamical systems.

Area of Science:

  • Control Systems Engineering
  • Machine Learning
  • Reinforcement Learning

Background:

  • Approximate dynamic programming (ADP) is crucial for dynamical systems control but typically requires full internal state information.
  • Practical applications often lack complete system state knowledge, limiting ADP's use.
  • Output feedback (OPFB) control is an alternative but has its own challenges.

Purpose of the Study:

  • To develop ADP methods that utilize only measurable input/output data for controlling linear dynamical systems.
  • To adapt ADP for scenarios where internal system states are unobservable.
  • To create OPFB controllers with performance comparable to state-variable feedback.

Main Methods:

  • Implementation of ADP using only system input/output data, termed output feedback (OPFB).
  • Development of policy iteration and value iteration algorithms for OPFB.
  • Analysis restricted to linear, deterministic dynamical systems, whose stochastic counterparts are partially observable Markov decision processes (POMDPs).
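
To make the value iteration idea concrete, here is a minimal sketch for a scalar linear-quadratic problem. It is a model-based, state-feedback baseline only, not the data-driven OPFB algorithm summarized above: the plant parameters `a` and `b`, the cost weights `q` and `r`, and the function name are all hypothetical choices made for illustration. The recursion repeatedly applies the scalar Riccati update until the value function coefficient `p` stops changing, then reads off the greedy feedback gain.

```python
def value_iteration_scalar_lqr(a, b, q, r, iters=500, tol=1e-12):
    """Value iteration for x_{k+1} = a*x_k + b*u_k with stage cost q*x^2 + r*u^2.

    The value function is V(x) = p*x^2; each sweep applies the scalar
    Riccati update p <- q + a^2*p - (a*b*p)^2 / (r + b^2*p).
    """
    p = 0.0  # value iteration may start from a zero value function
    for _ in range(iters):
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        converged = abs(p_next - p) < tol
        p = p_next
        if converged:
            break
    k = a * b * p / (r + b * b * p)  # greedy policy is u = -k*x
    return p, k


# Hypothetical open-loop unstable plant (|a| > 1) with unit cost weights.
p, k = value_iteration_scalar_lqr(a=1.2, b=1.0, q=1.0, r=1.0)
closed_loop = 1.2 - 1.0 * k  # closed-loop pole under u = -k*x
print(p, k, closed_loop)
```

The converged `p` satisfies the scalar discrete-time Riccati equation, and the resulting closed-loop pole lies inside the unit circle. The methods summarized here aim at the same kind of fixed point without access to `a`, `b`, or the state `x`.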

Main Results:

  • Both algorithms converge to an optimal controller that requires only OPFB, and neither the learning algorithms nor the resulting controller needs knowledge of the system dynamics.
  • Only the system order and an upper bound on its observability index must be known.
  • The learned OPFB controller is a polynomial autoregressive moving-average (ARMA) controller.
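
The following sketch illustrates why an output feedback controller of this kind takes an ARMA form. For an observable linear system, the current state can be written as a linear function of a short window of past inputs and outputs, so any state feedback law u = -Kx becomes a linear combination of those past samples. Everything in the example is an assumption made for illustration: the second-order model `A`, `B`, `C`, the gain `K` (not claimed to be optimal), and the window length of two samples. The model is used explicitly here only to show the structure; the summarized methods learn the controller without it.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mat_vec(m, v):
    return [dot(row, v) for row in m]

def vec_add(u, v):
    return [a + b for a, b in zip(u, v)]

def scale(v, s):
    return [s * a for a in v]

def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical observable second-order system with scalar input and output.
A = [[0.9, 0.2], [0.0, 0.7]]
B = [0.0, 1.0]
C = [1.0, 0.0]
K = [0.5, 0.3]  # hypothetical state feedback gain, u = -K x

# Observability matrix [C; C*A] and its inverse.
CA = [dot(C, [A[0][j], A[1][j]]) for j in range(2)]
O_inv = inv2([C, CA])

def reconstruct_state(y1, y2, u1, u2):
    """Return x_k from y_{k-1}, y_{k-2}, u_{k-1}, u_{k-2}."""
    # y_{k-2} = C x_{k-2} and y_{k-1} = C A x_{k-2} + C B u_{k-2}
    x_km2 = mat_vec(O_inv, [y2, y1 - dot(C, B) * u2])
    x_km1 = vec_add(mat_vec(A, x_km2), scale(B, u2))
    return vec_add(mat_vec(A, x_km1), scale(B, u1))

def arma_coefficients():
    """Coefficients of u_k on (y_{k-1}, y_{k-2}, u_{k-1}, u_{k-2})."""
    basis = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]
    # The reconstruction is linear, so unit inputs give one coefficient each.
    return [-dot(K, reconstruct_state(*e)) for e in basis]

# Simulate two steps from a state the controller never observes directly.
x = [1.0, -0.5]
us, ys = [], []
for u in (0.3, -0.2):
    ys.append(dot(C, x))
    us.append(u)
    x = vec_add(mat_vec(A, x), scale(B, u))

x_hat = reconstruct_state(ys[1], ys[0], us[1], us[0])
coeffs = arma_coefficients()
u_arma = dot(coeffs, [ys[1], ys[0], us[1], us[0]])  # uses only I/O data
u_state = -dot(K, x)                                # uses the true state
print(u_arma, u_state)
```

Both control values agree, which shows that the input/output window carries the same information as the state for this system. The window length plays the role of the bound on the observability index mentioned above.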

Conclusions:

  • ADP can be effectively implemented using only input/output data, overcoming the need for internal state information.
  • The developed OPFB methods offer a practical approach to reinforcement learning control for linear systems.
  • The polynomial ARMA controller derived through this method matches the performance of optimal state variable feedback controllers.