Adaptive importance sampling for value function approximation in off-policy reinforcement learning.

Hirotaka Hachiya¹, Takayuki Akiyama, Masashi Sugiyama

¹Department of Computer Science, Tokyo Institute of Technology, 2-12-1 O-okayama, Meguro-ku, Tokyo 152-8552, Japan. hachiya@sg.cs.titech.ac.jp

Neural Networks: the Official Journal of the International Neural Network Society

February 14, 2009

Summary

This summary is machine-generated.

This study introduces adaptive importance sampling for off-policy reinforcement learning, improving stability by managing bias-variance trade-offs. Simulations show this method enhances performance in complex learning environments.

Area of Science:

Machine Learning
Artificial Intelligence
Computational Statistics

Background:

Off-policy reinforcement learning (RL) evaluates or improves a target policy using data collected under a different policy, which allows existing samples to be reused.
Importance sampling corrects bias in value function estimation but can increase variance.
Existing methods often struggle with estimator variance, leading to unstable performance.
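
The bias and variance effects described above can be illustrated with a minimal sketch. It assumes a hypothetical two-action problem that is not taken from the paper: the reward equals the chosen action, and the `BEHAVIOR` and `TARGET` probabilities are invented for illustration. The plain sample mean estimates the value of the data-collecting policy, so it is biased for the target policy; reweighting each sample by the ratio of target to behavior probability removes that bias but spreads the per-sample terms much more widely.

```python
import random
import statistics

# Hypothetical two-action problem (an assumption, not from the paper):
# the reward is simply the action taken.
BEHAVIOR = {0: 0.9, 1: 0.1}  # policy that collected the data
TARGET = {0: 0.1, 1: 0.9}    # policy whose value we want to estimate
TRUE_VALUE = sum(TARGET[a] * a for a in TARGET)  # expected reward under TARGET


def collect(n, seed=0):
    """Draw n actions from the behavior policy."""
    rng = random.Random(seed)
    return [1 if rng.random() < BEHAVIOR[1] else 0 for _ in range(n)]


def estimates(actions):
    """Compare the plain mean with the importance-weighted mean."""
    rewards = [float(a) for a in actions]
    # Importance weight = target probability / behavior probability.
    weighted = [TARGET[a] / BEHAVIOR[a] * r for a, r in zip(actions, rewards)]
    return {
        "naive_mean": statistics.fmean(rewards),
        "is_mean": statistics.fmean(weighted),
        "naive_var": statistics.pvariance(rewards),
        "is_var": statistics.pvariance(weighted),
    }


result = estimates(collect(20000))
print("true value:", TRUE_VALUE)
print(result)
```

In this toy setting the weighted mean should land near the target policy's value while the unweighted mean stays near the behavior policy's value, and the variance of the weighted terms is far larger. Full value function approximation involves states and returns rather than a single scalar, so this is only a scaled-down picture of the same effect.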

Purpose of the Study:

To develop a more stable and efficient off-policy reinforcement learning method.
To actively control the bias-variance trade-off in value function estimation.
To introduce an adaptive importance sampling technique for improved RL performance.

Main Methods:

Proposed an adaptive importance sampling (AIS) technique to manage the bias-variance trade-off.
Developed a cross-validation-based method for optimal parameter selection in AIS.
Evaluated the approach through simulation studies.
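
This summary does not state the form of the adaptive weights or the cross-validation criterion, so the following is only a sketch under stated assumptions. It assumes the trade-off is controlled by raising each importance weight to a power `nu` between 0 and 1 (0 ignores the weights, 1 uses them in full), and that `nu` is chosen by K-fold cross-validation with a fully importance-weighted held-out error. The toy data, the function names `flattened_estimate`, `cv_score`, and `select_nu`, and the candidate grid are all hypothetical.

```python
import random

# Hypothetical two-action problem (an assumption, not from the paper).
BEHAVIOR = {0: 0.9, 1: 0.1}  # policy that collected the data
TARGET = {0: 0.1, 1: 0.9}    # policy being evaluated


def make_data(n, seed=0):
    """Sample noisy rewards under BEHAVIOR and compute importance weights."""
    rng = random.Random(seed)
    actions = [1 if rng.random() < BEHAVIOR[1] else 0 for _ in range(n)]
    rewards = [a + rng.gauss(0.0, 0.1) for a in actions]
    weights = [TARGET[a] / BEHAVIOR[a] for a in actions]
    return rewards, weights


def flattened_estimate(rewards, weights, nu):
    """Self-normalized estimate using weights raised to the power nu."""
    flat = [w ** nu for w in weights]
    return sum(f * r for f, r in zip(flat, rewards)) / sum(flat)


def cv_score(rewards, weights, nu, k=5):
    """Fit on k-1 folds with w**nu, then score on the held-out fold
    with the full importance weights."""
    n = len(rewards)
    total = 0.0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        held = [i for i in range(n) if i % k == fold]
        v = flattened_estimate(
            [rewards[i] for i in train], [weights[i] for i in train], nu
        )
        total += sum(weights[i] * (rewards[i] - v) ** 2 for i in held) / len(held)
    return total / k


def select_nu(rewards, weights, candidates=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Pick the candidate nu with the lowest cross-validated error."""
    return min(candidates, key=lambda nu: cv_score(rewards, weights, nu))


rewards, weights = make_data(2000)
best_nu = select_nu(rewards, weights)
print("selected nu:", best_nu)
print("estimate:", flattened_estimate(rewards, weights, best_nu))
```

Scoring the held-out fold with full weights keeps the validation error aimed at the target policy even when the fit itself uses flattened weights. Which `nu` wins depends on the sample size and on how different the two policies are.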

Main Results:

The proposed adaptive importance sampling method demonstrated effective control over bias-variance trade-offs.
Optimal parameter determination using cross-validation led to improved estimator stability.
Simulations confirmed the enhanced performance and stability of the new approach.
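
The paper's simulations are not reproduced in this summary. As a rough illustration of what a bias-variance trade-off looks like in repeated trials, the sketch below measures the bias and variance of a simple estimator over many small data sets. It assumes a hypothetical two-action problem in which the reward equals the action, and it assumes the trade-off is controlled by an exponent `nu` applied to the importance weights; both are illustrative choices, not details confirmed by the source.

```python
import random
import statistics

# Hypothetical two-action problem (an assumption, not from the paper):
# the reward is simply the action taken.
BEHAVIOR = {0: 0.9, 1: 0.1}  # policy that collected the data
TARGET = {0: 0.1, 1: 0.9}    # policy being evaluated
TRUE_VALUE = 0.9             # expected reward under TARGET


def one_estimate(rng, n, nu):
    """Mean of rewards weighted by (importance weight) ** nu, unnormalized."""
    total = 0.0
    for _ in range(n):
        a = 1 if rng.random() < BEHAVIOR[1] else 0
        total += (TARGET[a] / BEHAVIOR[a]) ** nu * a
    return total / n


def bias_and_variance(nu, n=30, trials=2000, seed=0):
    """Empirical bias and variance of the estimator over repeated data sets."""
    rng = random.Random(seed)
    values = [one_estimate(rng, n, nu) for _ in range(trials)]
    return statistics.fmean(values) - TRUE_VALUE, statistics.pvariance(values)


for nu in (0.0, 0.5, 1.0):
    bias, var = bias_and_variance(nu)
    print(f"nu={nu}: bias={bias:+.3f} variance={var:.4f} mse={bias**2 + var:.4f}")
```

In this toy setting, `nu = 0` gives a low-variance but strongly biased estimate, and `nu = 1` gives a nearly unbiased but much noisier one. Intermediate values trade one against the other, which is the quantity an adaptive scheme would tune.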

Conclusions:

Adaptive importance sampling offers a robust solution for stabilizing off-policy reinforcement learning.
The bias-variance control mechanism is crucial for reliable performance in RL.
This work provides a practical method for improving data efficiency and stability in RL algorithms.