Actor-Critic Learning Control Based on ℓ2-Regularized Temporal-Difference Prediction With Gradient Correction.

Luntong Li, Dazi Li, Tianheng Song

IEEE Transactions on Neural Networks and Learning Systems | July 12, 2018

Summary

This summary is machine-generated.

This study introduces Critic-Iteration Policy Gradient (CIPG), a novel actor-critic framework. CIPG improves data efficiency and convergence for learning control problems by using a regularized RLS-TD critic for on-policy evaluation.

Area of Science:

Reinforcement Learning
Machine Learning
Control Theory

Background:

Actor-critic (AC) methods based on policy gradient (PG-based AC) are prevalent for learning control problems.
Enhancing data efficiency in the critic component of PG-based AC has led to research in recursive least-squares temporal difference (RLS-TD) algorithms for policy evaluation.
Existing RLS-TD critic implementations evaluate a mixture of policies produced by a changing actor rather than one fixed policy, which prevents a proof of convergence to the optimal fixed point.
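
To make the critic concrete, here is a minimal sketch of textbook RLS-TD(0) policy evaluation with a linear value function. It is not the paper's implementation: the function name `rls_td0`, the parameter `eta`, and the two-state toy chain are assumptions chosen for illustration. Note that the recursion presumes all transitions come from one fixed policy, which is the condition the background items say existing actor-critic schemes violate.

```python
import numpy as np

def rls_td0(transitions, n_features, gamma=0.9, eta=1e-3):
    """Recursive least-squares TD(0) for V(s) = phi(s) @ theta.

    transitions: iterable of (phi, reward, phi_next) gathered under ONE policy.
    eta: l2-regularization weight (hypothetical name); P starts at I / eta.
    """
    theta = np.zeros(n_features)
    P = np.eye(n_features) / eta              # inverse of the regularized matrix eta * I
    for phi, r, phi_next in transitions:
        d = phi - gamma * phi_next            # temporal-difference feature vector
        k = P @ phi / (1.0 + d @ P @ phi)     # gain from the Sherman-Morrison update
        theta = theta + k * (r - d @ theta)   # correct theta by the TD residual
        P = P - np.outer(k, d @ P)            # rank-one update of the inverse
    return theta

# Toy chain (assumed for illustration): state 0 -> state 1 with reward 1,
# state 1 -> state 0 with reward 0, one-hot features.
e0, e1 = np.eye(2)
data = [(e0, 1.0, e1), (e1, 0.0, e0)] * 100
theta = rls_td0(data, n_features=2)
print(theta)  # close to the exact values [1 / 0.19, 0.9 / 0.19]
```

In this plain recursion the effect of `eta` shrinks as more transitions accumulate, because the data terms grow while the regularizer stays constant.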

Purpose of the Study:

To propose a new actor-critic framework, Critic-Iteration Policy Gradient (CIPG), that addresses the convergence limitations of existing RLS-TD critic methods.
To enable on-policy learning of the state-value function for the current policy.
To achieve gradient ascent towards maximizing the discounted total reward.
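
The last aim can be written out in the standard discounted-return form. The symbols below (policy parameters \theta, step size \alpha_k, discount factor \gamma, reward r_t) are notation assumed here for illustration, not taken from the paper.

```latex
\begin{align}
J(\theta) &= \mathbb{E}_{\pi_\theta}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t}\right],
  \qquad 0 \le \gamma < 1, \\
\theta_{k+1} &= \theta_{k} + \alpha_{k}\, \nabla_{\theta} J(\theta_{k}).
\end{align}
```

Gradient ascent on J(\theta) needs an estimate of the value function of the current policy, which is why the second aim asks for on-policy learning of that function.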

Main Methods:

CIPG maintains fixed policy parameters within each iteration.
It employs an RLS-TD critic with ℓ2-regularization to evaluate the fixed policy.
The existing convergence analysis of PG with function approximation is extended to cover the RLS-TD critic.
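
The following is a rough sketch of the loop structure these items describe: freeze the policy, evaluate it with a regularized RLS-TD critic, then take one policy-gradient step. It rests on several assumptions that are not from the paper. These are a tabular softmax actor, one-hot state features, the TD error as the advantage signal, and restarting the critic at P = I / eta in every iteration as one possible way to keep the regularizer from fading. All names (`cipg_sketch`, `env_step`, `eta`, `alpha`) are hypothetical.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def rls_td_step(w, P, phi, phi_next, r, gamma):
    """One RLS-TD(0) update of critic weights w and inverse matrix P."""
    d = phi - gamma * phi_next
    k = P @ phi / (1.0 + d @ P @ phi)
    return w + k * (r - d @ w), P - np.outer(k, d @ P)

def cipg_sketch(env_step, n_states, n_actions, iters=60, batch=50,
                gamma=0.9, eta=1.0, alpha=0.5, seed=0):
    rng = np.random.default_rng(seed)
    prefs = np.zeros((n_states, n_actions))   # actor parameters (assumed tabular)
    eye = np.eye(n_states)                    # one-hot state features (assumed)
    s = 0
    for _ in range(iters):
        # Critic phase: the policy stays fixed for the whole batch.
        # Assumption: restarting from P = I / eta re-applies the regularizer.
        w, P = np.zeros(n_states), eye / eta
        traj = []
        for _ in range(batch):
            a = rng.choice(n_actions, p=softmax(prefs[s]))
            s_next, r = env_step(s, a)
            w, P = rls_td_step(w, P, eye[s], eye[s_next], r, gamma)
            traj.append((s, a, r, s_next))
            s = s_next
        # Actor phase: one gradient-ascent step using the TD error of the
        # freshly evaluated policy as the advantage estimate.
        grad = np.zeros_like(prefs)
        for s_t, a_t, r_t, s_n in traj:
            delta = r_t + gamma * w[s_n] - w[s_t]
            score = -softmax(prefs[s_t])      # gradient of log softmax ...
            score[a_t] += 1.0                 # ... equals onehot(a) - pi
            grad[s_t] += delta * score
        prefs += alpha * grad / batch
    return prefs

# Toy one-state task (assumed): action 1 pays reward 1, action 0 pays 0.
prefs = cipg_sketch(lambda s, a: (0, float(a)), n_states=1, n_actions=2)
print(softmax(prefs[0]))  # probability mass shifts toward action 1
```

The point of the two-phase structure is that every transition fed to the critic in one iteration comes from the same policy, so the least-squares solution refers to a single well-defined value function.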

Main Results:

The ℓ2-regularization term in the CIPG critic remains active throughout the learning process.
CIPG demonstrates superior learning efficiency compared to conventional AC methods.
Simulation results indicate a faster convergence rate for CIPG.

Conclusions:

CIPG provides a theoretically sound and practically effective framework for policy gradient actor-critic methods.
The proposed method overcomes convergence issues associated with RLS-TD critics evaluating non-fixed policies.
CIPG offers improved performance in learning control tasks.