



Actor-Critic Learning Control With Regularization and Feature Selection in Policy Gradient Estimation.

Luntong Li, Dazi Li, Tianheng Song

    IEEE Transactions on Neural Networks and Learning Systems
    |April 24, 2020
    PubMed
    Summary
    This summary is machine-generated.

    This study introduces the Regularized Dual-Averaging policy gradient (RDA-PG) algorithm, an actor-critic method for reinforcement learning. RDA-PG applies L1-regularization in the actor network to select features automatically, which improves learning efficiency and performance on benchmarks that contain irrelevant or redundant features.


    Area of Science:

    • Artificial Intelligence
    • Machine Learning
    • Control Theory

    Background:

    • Actor-critic (AC) architectures are crucial for reinforcement learning (RL) with continuous states and actions.
    • Improving learning efficiency and convergence in AC methods remains a key challenge, often addressed through regularization and feature learning in policy evaluation.

    Purpose of the Study:

    • To propose a novel actor-critic learning control method incorporating regularization and feature selection for enhanced policy gradient estimation.
    • To introduce the Regularized Dual-Averaging policy gradient (RDA-PG) algorithm for efficient and effective RL.

    Main Methods:

    • Utilizing L1-regularization within the actor network to perform automatic feature selection.
    • Employing the regularized dual-averaging (RDA) technique for policy parameter updates, balancing past policy gradients with L1-regularization.
    • Establishing convergence guarantees using two-timescale stochastic approximation theory.
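The sparsity mechanism named in the methods above can be illustrated with a minimal sketch. It assumes the generic L1-RDA update from the dual-averaging literature (a soft-threshold applied to the running average of past gradients, with a quadratic auxiliary term weighted by `gamma / sqrt(t)`); the paper's exact actor update is not given in this summary and may differ. The names `soft_threshold`, `l1_rda_step`, `run_l1_rda`, and `toy_grad`, and the parameters `lam` and `gamma`, are hypothetical. The sketch is written for minimization, so in a policy-gradient setting the gradient passed in would be the negative of the estimated policy gradient.

```python
import math


def soft_threshold(g, lam):
    """Shrink g toward zero by lam; any |g| <= lam becomes exactly 0."""
    if g > lam:
        return g - lam
    if g < -lam:
        return g + lam
    return 0.0


def l1_rda_step(grad_avg, t, lam, gamma):
    """Closed-form minimizer (per coordinate) of
    <grad_avg, w> + lam * ||w||_1 + (gamma / (2 * sqrt(t))) * ||w||_2^2.
    This is the generic L1-RDA form, assumed here for illustration.
    """
    scale = math.sqrt(t) / gamma
    return [-scale * soft_threshold(g, lam) for g in grad_avg]


def run_l1_rda(grad_fn, dim, steps, lam=0.1, gamma=1.0):
    """Run L1-RDA from w = 0, averaging all past gradients."""
    w = [0.0] * dim
    grad_avg = [0.0] * dim
    for t in range(1, steps + 1):
        g = grad_fn(w, t)
        # Incremental mean of g_1 .. g_t.
        grad_avg = [a + (gi - a) / t for a, gi in zip(grad_avg, g)]
        w = l1_rda_step(grad_avg, t, lam, gamma)
    return w


def toy_grad(w, t):
    """Hypothetical toy problem: feature 0 is relevant (quadratic loss
    pulling w[0] toward 1); feature 1 only receives alternating noise."""
    return [w[0] - 1.0, 0.5 * (-1) ** t]
```

Calling `run_l1_rda(toy_grad, dim=2, steps=200)` leaves the weight of the noisy feature at exactly zero, because its averaged gradient falls below `lam`, while the weight of the relevant feature stays nonzero and drifts toward the L1-regularized optimum `1 - lam`. Averaging the full gradient history, rather than thresholding a single noisy gradient, is what lets the update zero out weights reliably.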

    Main Results:

    • RDA-PG successfully performs feature selection in the actor network, learning sparse representations for both stochastic and deterministic policies.
    • The algorithm efficiently solves the minimization problem inherent in the RDA technique.
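As a hedged illustration of why such a minimization can be cheap, consider the standard L1-RDA subproblem from the dual-averaging literature. The symbols below are assumptions, not taken from the paper: $\bar{g}_t$ is the average of the first $t$ gradient estimates, $\lambda$ is the L1 weight, and $\gamma$ scales a quadratic auxiliary term. The paper's exact objective may differ.

```latex
% Assumed generic L1-RDA subproblem (not confirmed as the paper's exact form)
w_{t+1} = \arg\min_{w} \left\{ \langle \bar{g}_t, w \rangle
          + \lambda \lVert w \rVert_1
          + \frac{\gamma}{2\sqrt{t}} \lVert w \rVert_2^2 \right\}

% The objective is separable across coordinates, so each entry has a closed form:
w_{t+1}^{(i)} =
\begin{cases}
  0, & \lvert \bar{g}_t^{(i)} \rvert \le \lambda, \\[4pt]
  -\dfrac{\sqrt{t}}{\gamma}
   \left( \bar{g}_t^{(i)} - \lambda \, \operatorname{sign}\!\left(\bar{g}_t^{(i)}\right) \right),
  & \text{otherwise.}
\end{cases}
```

Under these assumptions each update costs one pass over the parameter vector, and any coordinate whose averaged gradient stays within $\lambda$ of zero is set exactly to zero, which is how sparse representations arise.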
    • RDA-PG outperformed existing AC algorithms on RL benchmarks with irrelevant or redundant features.

    Conclusions:

    • The proposed RDA-PG algorithm offers an effective approach to enhance actor-critic learning through integrated feature selection and regularization.
    • RDA-PG provides a robust method for learning near-optimal stochastic and deterministic policies, particularly in scenarios with high-dimensional or noisy feature spaces.