



Discretizing Continuous Action Space With Unimodal Probability Distributions for On-Policy Reinforcement Learning.

Yuanyang Zhu, Zhi Wang, Yuanheng Zhu

    IEEE Transactions on Neural Networks and Learning Systems
    |August 27, 2024
    Summary
    This summary is machine-generated.

    Discretizing continuous action spaces in reinforcement learning (RL) can increase variance. This study introduces a unimodal policy using Poisson distributions to improve stability and performance in complex control tasks.


    Area of Science:

    • Artificial Intelligence
    • Machine Learning
    • Robotics

    Background:

    • Discretizing continuous action spaces in on-policy reinforcement learning (RL) simplifies optimization, but it can make the policy gradient (PG) estimator high-variance because the natural ordering among the discrete actions is ignored.
    • As the number of discrete actions grows, treating them as unordered categories can further degrade the performance of the PG estimator.
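To make the discretization step concrete, here is a minimal sketch, assuming uniform binning of a bounded one-dimensional action range. The function names, the 11-bin choice, and the 6-dimensional example are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def make_action_bins(low: float, high: float, num_bins: int) -> np.ndarray:
    """Return num_bins evenly spaced action values covering [low, high]."""
    return np.linspace(low, high, num_bins)

def to_continuous(bin_index: int, bins: np.ndarray) -> float:
    """Map a discrete action index back to its continuous action value."""
    return float(bins[bin_index])

def joint_action_count(num_bins: int, action_dim: int) -> int:
    """Number of joint discrete actions if every dimension is binned jointly."""
    return num_bins ** action_dim

# Hypothetical example: 11 bins per dimension on [-1, 1].
bins = make_action_bins(-1.0, 1.0, 11)
middle_action = to_continuous(5, bins)      # the centre bin, close to 0.0
joint_count = joint_action_count(11, 6)     # grows exponentially with dimension
```

Neighbouring indices map to neighbouring action values, which is the ordering information that a plain categorical policy does not use.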

    Purpose of the Study:

    • To introduce a novel architecture for on-policy RL that constrains discrete policies to be unimodal.
    • To leverage the continuity of underlying continuous action spaces through explicit unimodal probability distributions.
    • To reduce variance in the policy gradient estimator and enhance learning stability.
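For reference, the quantity whose variance is at issue is the standard score-function policy gradient estimator. The notation below (policy $\pi_\theta$, advantage estimate $\hat{A}$, batch size $N$) is assumed for illustration and is not quoted from the paper.

$$
\nabla_\theta J(\theta)
= \mathbb{E}_{s,\,a \sim \pi_\theta}\!\left[\nabla_\theta \log \pi_\theta(a \mid s)\,\hat{A}(s,a)\right]
\approx \frac{1}{N}\sum_{i=1}^{N} \nabla_\theta \log \pi_\theta(a_i \mid s_i)\,\hat{A}(s_i,a_i)
$$

A lower-variance estimator means the sample average on the right fluctuates less around the expectation on the left for the same batch size $N$.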

    Main Methods:

    • Implementing a unimodal discrete policy architecture using Poisson probability distributions.
    • Constraining the policy to be unimodal to better utilize the continuous nature of the action space.
    • Conducting extensive experiments on challenging control tasks, including the Humanoid task.
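One plausible way to build a unimodal discrete policy from a Poisson distribution is sketched below. It assumes the network outputs a single positive rate per action dimension and that the Poisson probability mass function, $\lambda^k e^{-\lambda}/k!$, is truncated to the available bins and renormalized. The summary does not give the paper's exact parameterization, so the function names, the `temperature` parameter, and the truncation scheme are assumptions.

```python
import math
import numpy as np

def poisson_unimodal_probs(rate: float, num_bins: int,
                           temperature: float = 1.0) -> np.ndarray:
    """Truncated, renormalized Poisson probabilities over bins 0..num_bins-1.

    Assumes rate > 0 and temperature > 0. The log-pmf is
    k*log(rate) - rate - log(k!), passed through a softmax.
    """
    k = np.arange(num_bins)
    log_factorial = np.array([math.lgamma(i + 1.0) for i in k])
    logits = (k * math.log(rate) - rate - log_factorial) / temperature
    logits -= logits.max()                  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

def is_unimodal(probs: np.ndarray, tol: float = 1e-12) -> bool:
    """True if probs rises (weakly) to a single peak and then falls."""
    falling = False
    for d in np.diff(probs):
        if d < -tol:
            falling = True
        elif d > tol and falling:
            return False
    return True

# Hypothetical rate, standing in for a policy network's output.
probs = poisson_unimodal_probs(rate=4.2, num_bins=11)
rng = np.random.default_rng(0)
action_index = int(rng.choice(len(probs), p=probs))
```

Because the ratio of consecutive Poisson probabilities, $\lambda/(k+1)$, decreases in $k$, the distribution has a single peak, and probability mass concentrates on neighbouring bins rather than being spread arbitrarily across unordered categories.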

    Main Results:

    • The unimodal discrete policy achieved significantly faster convergence compared to standard methods.
    • It also reached higher performance in complex on-policy RL tasks, particularly in highly challenging ones such as Humanoid.
    • Theoretical analysis confirmed lower variance for the PG estimator with the proposed unimodal policy.

    Conclusions:

    • The proposed unimodal discrete policy architecture effectively addresses the variance issues associated with action space discretization in on-policy RL.
    • This approach enhances learning stability and performance, especially in complex robotic control tasks.
    • The explicit use of unimodal probability distributions better exploits the continuity of the action space for improved RL outcomes.