Optimistic reinforcement learning by forward Kullback-Leibler divergence optimization.

Taisuke Kobayashi¹

¹Nara Institute of Science and Technology, Nara, Japan.

Neural Networks : the Official Journal of the International Neural Network Society

May 9, 2022

Summary

This summary is machine-generated.

This study reinterprets reinforcement learning (RL) optimization in terms of Kullback-Leibler (KL) divergence and derives a new optimization method based on forward KL divergence. The resulting approach speeds up learning and improves rewards in robotic simulations.

Keywords:

Control as probabilistic inference
Kullback–Leibler divergence
Optimistic learning
Reinforcement learning

Area of Science:

Artificial Intelligence
Machine Learning
Robotics

Background:

Traditional reinforcement learning (RL) optimizes policies indirectly to maximize returns.
Recent work reinterprets RL optimization by explicitly treating optimality as a stochastic variable.
Existing methods often use reverse Kullback-Leibler (KL) divergence for optimization.
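For reference, the two directions of the KL divergence are commonly written as below. This is only a sketch of the standard definitions for discrete distributions; the symbols p (a target distribution) and q (a model distribution) are generic placeholders assumed here, not the paper's notation, and which direction is called "forward" follows the usual convention rather than anything stated in this summary.

```latex
\begin{align}
\text{forward:}\quad \mathrm{KL}(p \,\|\, q) &= \sum_{x} p(x) \log \frac{p(x)}{q(x)}, \\
\text{reverse:}\quad \mathrm{KL}(q \,\|\, p) &= \sum_{x} q(x) \log \frac{q(x)}{p(x)}.
\end{align}
```

The two expressions generally take different values, so minimizing one is not equivalent to minimizing the other.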

Purpose of the Study:

To propose a new interpretation of RL optimization using KL divergence.
To derive a novel optimization method based on forward KL divergence.
To investigate the impact of optimism in RL for improved learning.

Main Methods:

Formulated traditional RL learning laws as optimization problems with reverse KL divergence.
Derived new optimization problems with forward KL divergence by exploiting the asymmetry of KL divergence.
Introduced an optimistic RL approach whose degree of optimism is set by a hyperparameter converted from an uncertainty parameter.
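The asymmetry that the methods rely on can be checked numerically. The following is a minimal sketch, not the paper's algorithm: the helper `kl_divergence` and the two example distributions are hypothetical and chosen only for illustration.

```python
import math


def kl_divergence(p, q):
    """Return KL(p || q) for two discrete distributions given as probability lists.

    Terms with p_i == 0 contribute zero by convention; q_i must be positive
    wherever p_i is positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical example distributions (assumed for illustration, not from the paper).
target = [0.7, 0.2, 0.1]
model = [0.4, 0.4, 0.2]

forward_kl = kl_divergence(target, model)  # KL(target || model)
reverse_kl = kl_divergence(model, target)  # KL(model || target)

print(f"forward KL: {forward_kl:.4f}")
print(f"reverse KL: {reverse_kl:.4f}")
```

Both values are positive but they are not equal, which is why swapping the direction of the divergence yields a different optimization problem.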

Main Results:

The derived forward KL divergence optimization problems are interpretable as optimistic RL.
Optimism, controlled by a hyperparameter, was found to accelerate learning and increase rewards.
Integration with prioritized experience replay and eligibility traces further enhanced learning speed.

Conclusions:

A novel optimistic RL method based on forward KL divergence was successfully derived.
In simulations, moderate optimism accelerated learning and yielded higher rewards.
In a realistic robotic simulation, the proposed method with moderate optimism outperformed a state-of-the-art RL method.