Adaptive importance sampling for value function approximation in off-policy reinforcement learning.

Hirotaka Hachiya1, Takayuki Akiyama, Masashi Sugiyama

  • 1Department of Computer Science, Tokyo Institute of Technology, 2-12-1 O-okayama, Meguro-ku, Tokyo 152-8552, Japan. hachiya@sg.cs.titech.ac.jp

Neural Networks: the Official Journal of the International Neural Network Society | February 14, 2009
Summary
This summary is machine-generated.

This study introduces adaptive importance sampling for off-policy reinforcement learning, which improves stability by controlling the bias-variance trade-off in value function estimation. Simulations show that the method improves performance in complex learning environments.

Area of Science:

  • Machine Learning
  • Artificial Intelligence
  • Computational Statistics

Background:

  • Off-policy reinforcement learning (RL) reuses data collected under a policy different from the one being evaluated, which makes learning more data-efficient.
  • Importance sampling corrects bias in value function estimation but can increase variance.
  • Existing methods often struggle with estimator variance, leading to unstable performance.
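
The bias correction described in this list can be sketched for a one-step problem. This is a minimal illustration rather than the authors' implementation: the function name `importance_weighted_value`, the two-action setup, and all probabilities and rewards are assumptions chosen for clarity.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[int, float]  # (action index, observed reward); hypothetical layout


def importance_weighted_value(
    samples: List[Sample],
    target_probs: Sequence[float],
    behavior_probs: Sequence[float],
) -> float:
    """Estimate the target policy's expected reward from behavior-policy data."""
    total = 0.0
    for action, reward in samples:
        # Importance weight: how much more (or less) likely the target
        # policy is to take this action than the behavior policy was.
        weight = target_probs[action] / behavior_probs[action]
        total += weight * reward
    return total / len(samples)


# Illustrative data: the behavior policy picks both actions equally often,
# while the target policy strongly prefers action 0.
samples = [(0, 1.0), (1, 0.0), (0, 1.0), (1, 0.0)]
behavior = [0.5, 0.5]
target = [0.9, 0.1]

estimate = importance_weighted_value(samples, target, behavior)
naive = sum(r for _, r in samples) / len(samples)  # ignores the policy mismatch
print(f"naive: {naive:.2f}, importance-weighted: {estimate:.2f}")
```

When the behavior policy rarely takes an action that the target policy favors, the ratio `target_probs[action] / behavior_probs[action]` becomes large and a few samples dominate the sum. That is the source of the variance problem noted in this list.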

Purpose of the Study:

  • To develop a more stable and efficient off-policy reinforcement learning method.
  • To actively control the bias-variance trade-off in value function estimation.
  • To introduce an adaptive importance sampling technique for improved RL performance.

Main Methods:

  • Proposed an adaptive importance sampling (AIS) technique to manage the bias-variance trade-off.
  • Developed a cross-validation-based method for optimal parameter selection in AIS.
  • Evaluated the approach through simulation studies.
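
The two ideas in this list can be sketched together. The summary does not say how AIS trades bias for variance, so the sketch assumes one common scheme: raise each importance weight to a power `nu` between 0 (no correction, low variance) and 1 (full correction, higher variance). It then picks `nu` by cross-validation, scoring held-out data with full importance weights. All names (`flattened_weight`, `fit_constant`, `cv_error`, `select_nu`), the constant value model, and the data are hypothetical.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[int, float]  # (action index, observed reward); hypothetical layout


def flattened_weight(action: int, target_probs: Sequence[float],
                     behavior_probs: Sequence[float], nu: float) -> float:
    """Importance weight raised to the power nu (assumed flattening scheme)."""
    return (target_probs[action] / behavior_probs[action]) ** nu


def fit_constant(samples: List[Sample], target_probs: Sequence[float],
                 behavior_probs: Sequence[float], nu: float) -> float:
    """Weighted least-squares fit of a single constant value estimate."""
    weights = [flattened_weight(a, target_probs, behavior_probs, nu)
               for a, _ in samples]
    weighted_sum = sum(w * r for w, (_, r) in zip(weights, samples))
    return weighted_sum / sum(weights)


def cv_error(samples: List[Sample], target_probs: Sequence[float],
             behavior_probs: Sequence[float], nu: float, k: int) -> float:
    """k-fold error; held-out squared errors use full weights (nu = 1).

    Assumes k <= len(samples) so that every fold is non-empty.
    """
    error = 0.0
    for fold in range(k):
        train = [s for i, s in enumerate(samples) if i % k != fold]
        held = [s for i, s in enumerate(samples) if i % k == fold]
        value = fit_constant(train, target_probs, behavior_probs, nu)
        error += sum(
            flattened_weight(a, target_probs, behavior_probs, 1.0)
            * (r - value) ** 2
            for a, r in held
        ) / len(held)
    return error / k


def select_nu(samples: List[Sample], target_probs: Sequence[float],
              behavior_probs: Sequence[float],
              candidates: Sequence[float], k: int = 2) -> float:
    """Return the candidate nu with the lowest cross-validated error."""
    return min(candidates, key=lambda nu: cv_error(
        samples, target_probs, behavior_probs, nu, k))


# Illustrative, noise-free data.
samples = [(0, 1.0), (0, 1.0), (1, 0.0), (1, 0.0)]
behavior = [0.5, 0.5]
target = [0.9, 0.1]

best_nu = select_nu(samples, target, behavior, [0.0, 0.5, 1.0], k=2)
print("selected nu:", best_nu)
```

On this tiny noise-free data set, full weighting (`nu = 1.0`) gives the lowest held-out error. With small, noisy samples an intermediate value may score better, which is the situation a data-driven choice of the parameter is meant to handle.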

Main Results:

  • The proposed adaptive importance sampling method demonstrated effective control over bias-variance trade-offs.
  • Optimal parameter determination using cross-validation led to improved estimator stability.
  • Simulations confirmed the enhanced performance and stability of the new approach.

Conclusions:

  • Adaptive importance sampling offers a robust solution for stabilizing off-policy reinforcement learning.
  • The bias-variance control mechanism is crucial for reliable performance in RL.
  • This work provides a practical method for improving data efficiency and stability in RL algorithms.