Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author:

  • DROP: Distributional and Regular Optimism and Pessimism for Reinforcement Learning. Neural computation, 2026.
  • Improvements to dark experience replay and reservoir sampling for better balance between consolidation and plasticity. Frontiers in artificial intelligence, 2026.
  • Proportion of middle ear surgeries feasible via transcanal endoscopic ear surgery: A multicenter study in Japan. Auris, nasus, larynx, 2026.
  • Weber-Fechner law in temporal difference learning derived from control as inference. Frontiers in robotics and AI, 2025.
  • Expression of AQP-10, -11 and -12 in the rat stria vascularis. Acta oto-laryngologica, 2024.
  • Integration of motion information in illusory motion perceived in stationary patterns. Scientific reports, 2023.

Same journal (Neural networks : the official journal of the International Neural Network Society):

  • TraNce: Type-aware hypergraph neural network with biological mediators for drug repositioning. 2026.
  • Decentralized ADMM for factorization-based Low-rank matrix estimation. 2026.
  • Memristive neuromorphic circuit design inspired by the neural mechanisms of conditioned fear. 2026.
  • Q-learning based asynchronous Boolean control networks stabilization with data loss. 2026.
  • New results on prescribed-time synchronization of complex networks via intermittent control. 2026.
  • Variance-constrained multi-view ensemble broad network for imbalanced data. 2026.

Optimistic reinforcement learning by forward Kullback-Leibler divergence optimization.

Taisuke Kobayashi

  • Nara Institute of Science and Technology, Nara, Japan.

Neural Networks: the Official Journal of the International Neural Network Society | May 9, 2022 | PubMed
Summary
This summary is machine-generated.

This study reinterprets reinforcement learning (RL) optimization in terms of the Kullback-Leibler (KL) divergence and introduces a novel optimization method based on the forward KL divergence. The new approach improves learning speed and performance in robotic simulations.

Keywords:
Control as probabilistic inference; Kullback–Leibler divergence; Optimistic learning; Reinforcement learning



Area of Science:

  • Artificial Intelligence
  • Machine Learning
  • Robotics

Background:

  • Traditional reinforcement learning (RL) optimizes policies indirectly to maximize returns.
  • Recent work reinterprets RL optimization by explicitly treating optimality as a stochastic variable.
  • Existing methods often use reverse Kullback-Leibler (KL) divergence for optimization.
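For reference, the standard definitions of the two KL directions for discrete distributions are given below, written for a learned policy $\pi$ and a target (optimal) policy $\pi^{*}$ over actions $a$ in state $s$. This notation is an assumption for illustration; the distributions the paper actually compares may differ. The two directions generally give different values: reverse KL tends to be mode-seeking, while forward KL tends to be mass-covering.

```latex
\begin{aligned}
\text{Reverse KL:}\quad D_{\mathrm{KL}}\!\left(\pi \,\|\, \pi^{*}\right)
  &= \sum_{a} \pi(a \mid s)\,\log\frac{\pi(a \mid s)}{\pi^{*}(a \mid s)}, \\
\text{Forward KL:}\quad D_{\mathrm{KL}}\!\left(\pi^{*} \,\|\, \pi\right)
  &= \sum_{a} \pi^{*}(a \mid s)\,\log\frac{\pi^{*}(a \mid s)}{\pi(a \mid s)}.
\end{aligned}
```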

Purpose of the Study:

  • To propose a new interpretation of RL optimization using KL divergence.
  • To derive a novel optimization method based on forward KL divergence.
  • To investigate how optimism in RL affects learning.

Main Methods:

  • Formulated traditional RL learning laws as optimization problems with reverse KL divergence.
  • Exploiting the asymmetry of the KL divergence, derived new optimization problems based on forward KL divergence.
  • Introduced an optimistic RL approach whose degree of optimism is set by a hyperparameter converted from an uncertainty parameter.
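The methods rely on the fact that KL divergence is asymmetric. A minimal sketch, assuming two discrete distributions given as Python lists, shows that swapping the arguments changes the value. The function name `kl_divergence` and the example distributions are hypothetical and are not taken from the paper.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as equal-length sequences.

    Terms with p_i == 0 contribute 0 by convention; if q_i == 0 where p_i > 0,
    the divergence is infinite.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0.0:
            if q_i <= 0.0:
                return math.inf
            total += p_i * math.log(p_i / q_i)
    return total


# Hypothetical example: a bimodal "target" and a unimodal "model" distribution.
target = [0.45, 0.05, 0.45, 0.05]
model = [0.70, 0.10, 0.10, 0.10]

forward = kl_divergence(target, model)  # forward KL: KL(target || model)
reverse = kl_divergence(model, target)  # reverse KL: KL(model || target)
print(f"forward KL = {forward:.4f}, reverse KL = {reverse:.4f}")
```

Here the forward KL (about 0.41) is larger than the reverse KL (about 0.30), because the forward direction penalizes the model for placing little mass on the target's second mode.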

Main Results:

  • The derived forward KL divergence optimization problems are interpretable as optimistic RL.
  • Optimism, controlled by a hyperparameter, was found to accelerate learning and increase rewards.
  • Integration with prioritized experience replay and eligibility traces further enhanced learning speed.
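To make "optimism controlled by a hyperparameter" concrete, the sketch below uses a different, simpler technique from the broader RL literature: a tabular TD(0) update with asymmetric step sizes, where positive TD errors are weighted more than negative ones. This is not the forward-KL method derived in the paper. The names `optimistic_td_update`, `run_single_state`, and the `optimism` parameter are hypothetical, and the toy task does not reproduce the paper's speed-up results.

```python
def optimistic_td_update(value, reward, next_value, gamma=0.99, lr=0.1, optimism=0.0):
    """One tabular TD(0) update with asymmetric step sizes (illustrative only).

    optimism in (-1, 1): values > 0 enlarge steps for positive TD errors and
    shrink steps for negative ones; optimism == 0 recovers standard TD(0).
    """
    if not -1.0 < optimism < 1.0:
        raise ValueError("optimism must lie in (-1, 1)")
    td_error = reward + gamma * next_value - value
    scale = (1.0 + optimism) if td_error > 0.0 else (1.0 - optimism)
    return value + lr * scale * td_error


def run_single_state(optimism, steps=20000, lr=0.01):
    """Hypothetical single-state task: rewards alternate +1, -1 (true mean 0), no bootstrapping."""
    value = 0.0
    for t in range(steps):
        reward = 1.0 if t % 2 == 0 else -1.0
        value = optimistic_td_update(value, reward, next_value=0.0, gamma=0.0,
                                     lr=lr, optimism=optimism)
    return value


if __name__ == "__main__":
    for k in (0.0, 0.25, 0.5):
        print(f"optimism={k:.2f} -> value estimate {run_single_state(k):.3f}")
```

With `optimism = 0` the estimate settles near the true mean of 0; positive values bias it upward, close to the value of `optimism` itself in this toy setup. This upward bias toward better-than-average outcomes is the sense in which such a learner is "optimistic".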

Conclusions:

  • A novel optimistic RL method based on forward KL divergence was successfully derived.
  • Moderate optimism accelerated learning and increased rewards in simulations.
  • The proposed method outperformed state-of-the-art RL methods in realistic robotic simulations.