



Reinforcement learning for partially observable dynamic processes: adaptive dynamic programming using measured output

F L Lewis¹, Kyriakos G Vamvoudakis

¹Automation and Robotics Research Institute, The University of Texas at Arlington, Fort Worth, TX 76118, USA. lewis@arri.uta.edu

IEEE Transactions on Systems, Man, and Cybernetics. Part B, Cybernetics : a Publication of the IEEE Systems, Man, and Cybernetics Society

March 31, 2010

Summary

This summary is machine-generated.

This work shows how to implement approximate dynamic programming (ADP) for control using only measured system input/output data, eliminating the need for internal state information. This output feedback (OPFB) approach is developed for linear, deterministic dynamical systems and simplifies control system implementation.


Area of Science:

Control Systems Engineering
Machine Learning
Reinforcement Learning

Background:

Approximate dynamic programming (ADP) is a class of reinforcement learning methods used for feedback control of dynamical systems, but it generally requires full information about the system's internal states.
Full state information is usually unavailable in practice, which limits where ADP can be applied.
Control that relies only on measured inputs and outputs is known in control theory as output feedback (OPFB).

Purpose of the Study:

To develop ADP methods that utilize only measurable input/output data for controlling linear dynamical systems.
To adapt ADP for scenarios where internal system states are unobservable.
To create OPFB controllers with performance comparable to state-variable feedback controllers.

Main Methods:

Implementation of ADP using only system input/output data, termed output feedback (OPFB).
Development of policy iteration and value iteration algorithms for OPFB.
Analysis restricted to linear, deterministic dynamical systems; the stochastic equivalent of these systems is a class of partially observable Markov decision processes.
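
Input/output data can stand in for the internal state because, for an observable linear deterministic system, the current state is a linear function of a finite window of past inputs and outputs. The sketch below checks that identity numerically. It is an illustration under stated assumptions, not the paper's algorithm: the function name `io_state_maps`, the matrices `A`, `B`, `C`, and the window length `N` are hypothetical, and the model is used here only to verify the identity (the OPFB learning methods themselves are reported not to need it).

```python
import numpy as np

def io_state_maps(A, B, C, N):
    """Return (M_u, M_y) such that x_k = M_u @ u_bar + M_y @ y_bar, where
    u_bar = [u_{k-N}, ..., u_{k-1}] and y_bar = [y_{k-N}, ..., y_{k-1}]
    (oldest first), for x_{k+1} = A x_k + B u_k, y_k = C x_k.
    Assumes (A, C) is observable and N is at least the observability index."""
    m, p = B.shape[1], C.shape[0]
    powers = [np.linalg.matrix_power(A, i) for i in range(N + 1)]
    # Observability matrix: maps x_{k-N} to the unforced outputs in the window.
    obs = np.vstack([C @ powers[i] for i in range(N)])
    # Block lower-triangular Toeplitz matrix: effect of window inputs on window outputs.
    toep = np.zeros((N * p, N * m))
    for i in range(N):
        for j in range(i):
            toep[i * p:(i + 1) * p, j * m:(j + 1) * m] = C @ powers[i - 1 - j] @ B
    # Effect of the window inputs on x_k.
    ctrl = np.hstack([powers[N - 1 - j] @ B for j in range(N)])
    M_y = powers[N] @ np.linalg.pinv(obs)
    M_u = ctrl - M_y @ toep
    return M_u, M_y

# Hypothetical second-order example, simulated for a few steps with random inputs.
A = np.array([[1.1, -0.3], [1.0, 0.0]])
B = np.array([[1.0], [0.0]])
C = np.array([[1.0, -0.8]])
N = 2
M_u, M_y = io_state_maps(A, B, C, N)

rng = np.random.default_rng(0)
x = rng.standard_normal(2)
inputs, outputs = [], []
for _ in range(6):
    u = rng.standard_normal(1)
    inputs.append(u)
    outputs.append(C @ x)
    x = A @ x + B @ u

# Rebuild the final state from the last N inputs and outputs only.
x_from_io = M_u @ np.concatenate(inputs[-N:]) + M_y @ np.concatenate(outputs[-N:])
print("reconstruction error:", np.max(np.abs(x - x_from_io)))
```

Any function of the state, such as a quadratic value function, can therefore be rewritten as a function of the same input/output window, which is what makes learning from measured data plausible in this setting.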

Main Results:

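Both the policy iteration and value iteration algorithms converge to an optimal controller that requires only OPFB; as with Q-learning, neither the learning algorithms nor the resulting controller needs knowledge of the system dynamics.
Only the order of the system and an upper bound on its observability index must be known.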
The learned OPFB controller is a polynomial autoregressive moving-average (ARMA) controller.
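
An ARMA-type controller computes the next input from a sliding window of past inputs and past outputs rather than from the state. The sketch below shows one generic way to structure such a controller. It is only an assumption-laden illustration: the class name `ArmaOutputFeedback`, the gain matrices `K_u` and `K_y`, and the exact window indexing are hypothetical and may differ from the form derived in the paper, and the gains here are made-up numbers rather than learned values.

```python
from collections import deque
import numpy as np

class ArmaOutputFeedback:
    """Generic ARMA-style law: u_k = -(K_u @ u_bar + K_y @ y_bar), where
    u_bar and y_bar stack the last N inputs and outputs, oldest first."""

    def __init__(self, K_u, K_y, N):
        self.K_u = np.atleast_2d(np.asarray(K_u, dtype=float))
        self.K_y = np.atleast_2d(np.asarray(K_y, dtype=float))
        m = self.K_u.shape[0]          # number of control inputs
        p = self.K_y.shape[1] // N     # number of measured outputs
        # Histories start at zero; deque(maxlen=N) drops the oldest sample.
        self.u_hist = deque([np.zeros(m)] * N, maxlen=N)
        self.y_hist = deque([np.zeros(p)] * N, maxlen=N)

    def control(self):
        """Compute the next input from the stored input/output window."""
        u_bar = np.concatenate(list(self.u_hist))
        y_bar = np.concatenate(list(self.y_hist))
        return -(self.K_u @ u_bar + self.K_y @ y_bar)

    def record(self, u, y):
        """Store the input that was applied and the output that was measured."""
        self.u_hist.append(np.atleast_1d(u).astype(float))
        self.y_hist.append(np.atleast_1d(y).astype(float))

# Usage with made-up gains for a single-input, single-output system and N = 2.
controller = ArmaOutputFeedback(K_u=[[0.5, -0.2]], K_y=[[1.0, 2.0]], N=2)
controller.record(u=1.0, y=2.0)
controller.record(u=3.0, y=4.0)
print("next input:", controller.control())
```

No state estimate appears anywhere in the loop: the controller only stores what was applied and what was measured.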

Conclusions:

ADP can be effectively implemented using only input/output data, overcoming the need for internal state information.
The developed OPFB methods offer a practical approach to reinforcement learning control for linear systems.
The learned polynomial ARMA controller has performance equivalent to that of the optimal state-variable feedback gain.