Actor-Critic Learning Control With Regularization and Feature Selection in Policy Gradient Estimation.

Luntong Li, Dazi Li, Tianheng Song

IEEE Transactions on Neural Networks and Learning Systems

April 24, 2020

Summary

This summary is machine-generated.

This study introduces Regularized Dual-Averaging policy gradient (RDA-PG), an actor-critic method for reinforcement learning. RDA-PG uses L1-regularization for feature selection, improving learning efficiency and performance on complex tasks.

Area of Science:

Artificial Intelligence
Machine Learning
Control Theory

Background:

Actor-critic (AC) architectures are crucial for reinforcement learning (RL) with continuous states and actions.
Improving learning efficiency and convergence in AC methods remains a key challenge, often addressed through regularization and feature learning in policy evaluation.

Purpose of the Study:

To propose a novel actor-critic learning control method incorporating regularization and feature selection for enhanced policy gradient estimation.
To introduce the Regularized Dual-Averaging policy gradient (RDA-PG) algorithm for efficient and effective RL.

Main Methods:

Utilizing L1-regularization within the actor network to perform automatic feature selection.
Employing the regularized dual-averaging (RDA) technique for policy parameter updates, which combines the running average of past policy gradients with an L1 penalty.
Establishing convergence guarantees using two-timescale stochastic approximation theory.
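
The methods above can be illustrated with a minimal sketch of a generic L1-regularized dual-averaging step. It is not the authors' RDA-PG implementation: it assumes the standard L1-RDA form from online convex optimization, in which each step minimizes the inner product with the averaged gradient plus an L1 penalty and a quadratic term weighted by gamma / (2 * sqrt(t)). The function names `l1_rda_step` and `run_rda`, the parameters `lam` and `gamma`, and the toy gradients are all hypothetical. Because policy gradient methods ascend the return, the sketch expects the negated policy gradient estimate as its "loss gradient".

```python
import numpy as np


def l1_rda_step(grad_avg, t, lam, gamma):
    """One closed-form L1-RDA update (minimization form, assumed here).

    Solves, coordinate by coordinate,
        argmin_w  <grad_avg, w> + lam * ||w||_1 + gamma / (2 * sqrt(t)) * ||w||_2^2
    by soft-thresholding the averaged gradient at lam.
    """
    # Coordinates whose averaged gradient magnitude is at most lam shrink to zero.
    shrunk = np.sign(grad_avg) * np.maximum(np.abs(grad_avg) - lam, 0.0)
    return -(np.sqrt(t) / gamma) * shrunk


def run_rda(grads, lam=0.1, gamma=1.0):
    """Apply L1-RDA to a sequence of loss gradients (hypothetical helper)."""
    grad_avg = np.zeros_like(grads[0], dtype=float)
    theta = np.zeros_like(grad_avg)
    for t, g in enumerate(grads, start=1):
        # Incremental running average of all gradients seen so far.
        grad_avg += (g - grad_avg) / t
        theta = l1_rda_step(grad_avg, t, lam, gamma)
    return theta


if __name__ == "__main__":
    # Toy data: feature 0 has a consistent gradient, feature 1 is zero-mean noise.
    grads = [np.array([-1.0, 0.05]), np.array([-1.0, -0.05])] * 2
    print(run_rda(grads, lam=0.1, gamma=1.0))
```

In this toy run, the weight on the consistently informative feature grows while the weight on the noisy feature is set exactly to zero, which is the sparsity effect that makes this style of update usable for feature selection.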

Main Results:

RDA-PG successfully performs feature selection in the actor network, learning sparse representations for both stochastic and deterministic policies.
The algorithm efficiently solves the minimization problem inherent in the RDA technique.
RDA-PG outperformed existing AC algorithms on RL benchmarks containing irrelevant or redundant features.

Conclusions:

The proposed RDA-PG algorithm offers an effective approach to enhance actor-critic learning through integrated feature selection and regularization.
RDA-PG provides a robust method for learning near-optimal stochastic and deterministic policies, particularly in scenarios with high-dimensional or noisy feature spaces.