Discretizing Continuous Action Space With Unimodal Probability Distributions for On-Policy Reinforcement Learning.

Yuanyang Zhu, Zhi Wang, Yuanheng Zhu

IEEE Transactions on Neural Networks and Learning Systems

August 27, 2024

Summary

This summary is machine-generated.

Discretizing continuous action spaces in on-policy reinforcement learning (RL) can increase the variance of the policy gradient estimator. This study introduces a unimodal discrete policy built on Poisson distributions to improve learning stability and performance in complex control tasks.

Area of Science:

Artificial Intelligence
Machine Learning
Robotics

Background:

Discretizing continuous action spaces in on-policy reinforcement learning (RL) simplifies optimization but can lead to high variance because the ordering of the actions is ignored.
A rapidly growing number of discrete actions, treated without regard to their inherent order, can degrade the performance of the policy gradient (PG) estimator.

Purpose of the Study:

To introduce a novel architecture for on-policy RL that constrains discrete policies to be unimodal.
To leverage the continuity of underlying continuous action spaces through explicit unimodal probability distributions.
To reduce variance in the policy gradient estimator and enhance learning stability.
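
A short ratio argument shows why a Poisson distribution, the family named in the summary on this page, qualifies as unimodal over ordered action bins. The sketch below uses the standard Poisson probability mass function; the symbols $\lambda$ (rate) and $k$ (bin index) are notation assumed here for illustration, not taken from the paper.

```latex
P(k;\lambda) = \frac{\lambda^{k} e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \dots

\frac{P(k+1;\lambda)}{P(k;\lambda)} = \frac{\lambda}{k+1}
\;\begin{cases}
> 1 & \text{if } k + 1 < \lambda,\\
= 1 & \text{if } k + 1 = \lambda,\\
< 1 & \text{if } k + 1 > \lambda.
\end{cases}
```

Because the ratio decreases monotonically in $k$, the probabilities rise up to $k = \lfloor \lambda \rfloor$ and fall afterwards, so the mass concentrates around a single peak (with a tie between $\lambda - 1$ and $\lambda$ when $\lambda$ is an integer).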

Main Methods:

Implementing a unimodal discrete policy architecture using Poisson probability distributions.
Constraining the policy to be unimodal to better utilize the continuous nature of the action space.
Conducting extensive experiments on challenging control tasks, including the Humanoid task.
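
The following is a minimal sketch of how a Poisson-shaped discrete policy over ordered action bins might look, not the authors' architecture. It assumes a scalar rate (in practice this would come from a policy network), renormalizes the Poisson log-PMF over a finite number of bins with a softmax, and maps bins to evenly spaced action values. The function names `poisson_unimodal_probs`, `bin_to_action`, and `is_unimodal`, and the `temperature` parameter, are hypothetical and introduced only for illustration.

```python
import math


def poisson_unimodal_probs(rate, num_bins, temperature=1.0):
    """Return probabilities over num_bins ordered bins shaped like a Poisson PMF.

    Assumption: logits are the Poisson log-PMF values for k = 0..num_bins-1,
    divided by a temperature and renormalized with a softmax so the
    truncated distribution sums to 1.
    """
    if rate <= 0 or temperature <= 0 or num_bins < 1:
        raise ValueError("rate and temperature must be positive, num_bins >= 1")
    # log P(k) = k*log(rate) - rate - log(k!), with log(k!) = lgamma(k + 1)
    logits = [
        (k * math.log(rate) - rate - math.lgamma(k + 1)) / temperature
        for k in range(num_bins)
    ]
    # Numerically stable softmax: subtract the largest logit before exponentiating.
    largest = max(logits)
    weights = [math.exp(v - largest) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def bin_to_action(k, num_bins, low, high):
    """Map bin index k to an evenly spaced value in [low, high] (assumed grid)."""
    if num_bins == 1:
        return (low + high) / 2.0
    return low + (high - low) * k / (num_bins - 1)


def is_unimodal(probs, tol=1e-12):
    """Check that probs is non-decreasing up to its peak and non-increasing after."""
    peak = probs.index(max(probs))
    rising = all(probs[i] <= probs[i + 1] + tol for i in range(peak))
    falling = all(probs[i] + tol >= probs[i + 1] for i in range(peak, len(probs) - 1))
    return rising and falling


if __name__ == "__main__":
    probs = poisson_unimodal_probs(rate=3.5, num_bins=11)
    actions = [bin_to_action(k, 11, -1.0, 1.0) for k in range(11)]
    for a, p in zip(actions, probs):
        print(f"action {a:+.1f}: probability {p:.4f}")
```

In this sketch a single rate parameter controls where the probability mass peaks, so neighboring bins always receive similar probabilities; a standard categorical policy, by contrast, places no such ordering constraint on its logits.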

Main Results:

The unimodal discrete policy converged significantly faster than standard methods.
It also achieved higher performance in complex on-policy RL tasks, particularly in highly challenging ones such as Humanoid.
Theoretical analysis confirmed lower variance for the PG estimator with the proposed unimodal policy.

Conclusions:

The proposed unimodal discrete policy architecture effectively addresses the variance issues associated with action space discretization in on-policy RL.
This approach enhances learning stability and performance, especially in complex robotic control tasks.
The explicit use of unimodal probability distributions better exploits the continuity of the action space for improved RL outcomes.