Model-Free λ-Policy Iteration for Discrete-Time Linear Quadratic Regulation.

Yongliang Yang, Bahare Kiumarsi, Hamidreza Modares

IEEE Transactions on Neural Networks and Learning Systems

August 11, 2021

Summary

This summary is machine-generated.

A new model-free lambda-policy iteration (λ-PI) algorithm solves the discrete-time linear quadratic regulation (LQR) problem. The approach converges faster than value iteration and does not require an admissible initial policy.

Area of Science:

Control Theory
Reinforcement Learning
Optimization

Background:

The linear quadratic regulation (LQR) problem is a core problem in optimal control.
Classical LQR solution methods typically require a system model, and policy iteration additionally requires an admissible initial policy.
Iterative methods like policy iteration (PI) and value iteration (VI) are common.
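
For orientation, the two iterations named above can be written out for the standard discrete-time LQR setting. This is the textbook formulation, not notation taken from the paper: the symbols A, B, Q, R, P_j and K_j are assumptions introduced here for illustration.

```latex
% Assumed setting: linear dynamics, quadratic cost, linear state feedback
x_{k+1} = A x_k + B u_k, \qquad
J = \sum_{k=0}^{\infty} \left( x_k^\top Q x_k + u_k^\top R u_k \right), \qquad
u_k = -K x_k

% Value iteration (VI): Riccati recursion, no admissible initial policy needed
P_{j+1} = Q + A^\top P_j A
        - A^\top P_j B \left( R + B^\top P_j B \right)^{-1} B^\top P_j A

% Policy iteration (PI): requires a stabilizing K_j
% Policy evaluation (a Lyapunov equation in P_j)
P_j = Q + K_j^\top R K_j + (A - B K_j)^\top P_j (A - B K_j)
% Policy improvement
K_{j+1} = \left( R + B^\top P_j B \right)^{-1} B^\top P_j A
```

In this form, PI solves a full linear equation at every step but needs a stabilizing starting gain, while VI needs no such gain but only applies one backup per step.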

Purpose of the Study:

Introduce a model-free lambda-policy iteration (λ-PI) algorithm for discrete-time LQR.
Develop an iterative solution that bypasses the need for a system model.
Enhance convergence properties and robustness compared to existing methods.

Main Methods:

Define novel weighted Bellman and composite Bellman operators.
Formulate λ-PI as a fixed-point iteration using the composite Bellman operator.
Employ off-policy reinforcement learning for model-free extension.
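
The fixed-point idea can be illustrated with a minimal model-based sketch. It assumes the classical λ-PI form in which the next value matrix is the fixed point of a λ-weighted mix of policy-evaluation backups; the paper's own weighted and composite Bellman operators may be defined differently, and the model-free, off-policy extension is not shown. All function names and the example system (a discretized double integrator) are hypothetical.

```python
import numpy as np


def greedy_gain(P, A, B, R):
    """Gain K of the policy u = -K x that is greedy w.r.t. V(x) = x' P x."""
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)


def lambda_pi_step(P, A, B, Q, R, lam):
    """One assumed lambda-PI update.

    Solves P_next = Q + K'RK + Ac' ((1 - lam) P + lam P_next) Ac,
    where K is greedy w.r.t. P and Ac = A - B K.
    lam = 0 reduces to one value-iteration step; lam = 1 is a full policy
    evaluation and therefore needs a stabilizing K (not guaranteed here).
    """
    n = A.shape[0]
    K = greedy_gain(P, A, B, R)
    Ac = A - B @ K
    C = Q + K.T @ R @ K + (1.0 - lam) * Ac.T @ P @ Ac
    # Vectorize the linear matrix equation P_next - lam * Ac' P_next Ac = C.
    M = np.eye(n * n) - lam * np.kron(Ac.T, Ac.T)
    P_next = np.linalg.solve(M, C.reshape(-1)).reshape(n, n)
    return 0.5 * (P_next + P_next.T)  # remove round-off asymmetry


def riccati_residual(P, A, B, Q, R):
    """Frobenius norm of the discrete-time algebraic Riccati equation residual."""
    K = greedy_gain(P, A, B, R)
    return np.linalg.norm(Q + A.T @ P @ A - A.T @ P @ B @ K - P)


def run_lambda_pi(A, B, Q, R, lam, tol=1e-9, max_iter=5000):
    """Iterate from P = 0 (no initial policy) until the residual drops below tol."""
    P = np.zeros_like(Q)
    for j in range(max_iter):
        if riccati_residual(P, A, B, Q, R) < tol:
            return P, j
        P = lambda_pi_step(P, A, B, Q, R, lam)
    return P, max_iter


# Hypothetical example system: double integrator sampled at 0.1 s.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])

for lam in (0.0, 0.5, 0.9):
    P, iters = run_lambda_pi(A, B, Q, R, lam)
    print(f"lambda={lam}: iterations={iters}, "
          f"residual={riccati_residual(P, A, B, Q, R):.2e}")
```

The sketch starts from P = 0, so no admissible initial policy is supplied. The linear solve is well posed only while the spectral radius of sqrt(lam) * (A - B K) stays below one, which holds for this example but is an assumption in general.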

Main Results:

Convergence of the λ-PI algorithm is guaranteed by the contraction and monotonicity properties of the composite Bellman operator.
λ-PI converges faster than value iteration (VI).
Off-policy λ-PI variants exhibit robustness against probing noise.

Conclusions:

The proposed model-free λ-PI is an effective method for solving discrete-time LQR problems.
The algorithm eliminates the need for an admissible initial policy, unlike traditional PI.
Simulation results validate the efficacy and robustness of the λ-PI algorithm.