ACERAC: Efficient Reinforcement Learning in Fine Time Discretization.

Jakub Lyskawa, Pawel Wawrzynski

IEEE Transactions on Neural Networks and Learning Systems

July 20, 2022

Summary

This summary is machine-generated.

This study introduces a novel reinforcement learning (RL) framework for physical machine control, enabling stochastically dependent actions for smoother, more effective learning and improved exploration in fine time discretizations.

Area of Science:

Robotics
Machine Learning
Control Systems

Background:

Reinforcement learning (RL) aims to enable physical machines to learn optimal behaviors.
Current RL methods struggle with fine time discretization: actions drawn independently at each time step produce jerky movements and explore insufficiently.
These limitations hinder RL's application in modern control systems.
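
The jerkiness attributed above to independent random actions can be illustrated with a toy comparison. This is a minimal sketch, not the authors' implementation: it assumes a first-order autoregressive (AR(1)) process as one simple way to make consecutive action perturbations stochastically dependent. The function names, the coefficient `alpha`, and the smoothness measure `mean_abs_step` are illustrative assumptions that do not come from this summary.

```python
import math
import random


def independent_noise(steps, sigma=1.0, seed=0):
    """Draw one independent Gaussian perturbation per time step."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(steps)]


def autocorrelated_noise(steps, alpha=0.9, sigma=1.0, seed=0):
    """Assumed AR(1) process: xi_t = alpha * xi_{t-1} + sqrt(1 - alpha^2) * eps_t.

    The scaling keeps the stationary standard deviation at sigma,
    so only the dependence between consecutive steps changes.
    """
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - alpha ** 2)
    xi = rng.gauss(0.0, sigma)
    samples = [xi]
    for _ in range(steps - 1):
        xi = alpha * xi + scale * rng.gauss(0.0, sigma)
        samples.append(xi)
    return samples


def mean_abs_step(samples):
    """Average absolute change between consecutive samples (a crude jerkiness measure)."""
    return sum(abs(b - a) for a, b in zip(samples, samples[1:])) / (len(samples) - 1)


if __name__ == "__main__":
    print("independent   :", mean_abs_step(independent_noise(5000)))
    print("autocorrelated:", mean_abs_step(autocorrelated_noise(5000)))
```

With `alpha` close to 1, consecutive perturbations change slowly, so the second figure printed should be noticeably smaller than the first. Setting `alpha=0.0` recovers the independent case.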

Purpose of the Study:

To introduce a reinforcement learning framework for stochastically dependent actions in sequential time instances.
To develop an algorithm that optimizes policies for producing these dependent actions.
To address the limitations of existing RL methods in fine time discretization for control.

Main Methods:

Developed an RL framework with analytical tools for stochastically dependent actions.
Introduced an RL algorithm utilizing experience replay (ER) to adjust action sequence likelihood.
Optimized policies based on expected n-step returns for improved control.
Compared the algorithm with CDAU, PPO, SAC, and ACER in simulated control tasks.
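
The n-step return named in the methods above can be sketched as follows. This is the standard textbook quantity (n discounted rewards plus a discounted bootstrap value), shown under assumptions: the summary does not give the paper's exact estimator or any correction it applies to replayed experience, and the names `n_step_return`, `bootstrap_value`, and `gamma` are illustrative.

```python
def n_step_return(rewards, bootstrap_value, gamma=0.99):
    """Return sum_{i<n} gamma^i * rewards[i] + gamma^n * bootstrap_value.

    rewards: the n rewards observed after the current state.
    bootstrap_value: a value estimate for the state reached after n steps
                     (assumed to come from a learned critic).
    """
    g = bootstrap_value
    # Fold backwards so each reward is discounted once per later step.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


if __name__ == "__main__":
    # 1 + 0.5*1 + 0.25*1 + 0.125*10 = 3.0
    print(n_step_return([1.0, 1.0, 1.0], bootstrap_value=10.0, gamma=0.5))
```

A larger n leans more on observed rewards and less on the value estimate, which matters when individual time steps are very short.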

Main Results:

The proposed RL algorithm demonstrated superior performance compared to existing methods in most simulated control problems.
The framework effectively handles stochastically dependent actions, mitigating jerky system behavior.
Experience replay and n-step returns improved policy optimization.

Conclusions:

The novel RL framework and algorithm are effective for control systems requiring fine time discretization.
Stochastically dependent actions enhance exploration and policy improvement in RL.
This approach overcomes key obstacles preventing wider RL adoption in physical machine control.