



Model-based policy gradients with parameter-based exploration by least-squares conditional density estimation.

Voot Tangkaratt¹, Syogo Mori¹, Tingting Zhao¹

  • ¹ Tokyo Institute of Technology, Japan.

Neural Networks: the Official Journal of the International Neural Network Society | July 5, 2014
Summary
This summary is machine-generated.

Model-based reinforcement learning (RL) offers a data-efficient alternative to model-free RL. This study introduces a model-based method that combines policy gradients with parameter-based exploration (PGPE) and least-squares conditional density estimation of the transition model, so that control policies can be learned from fewer samples.

Keywords:
Conditional density estimation, Reinforcement learning, Transition model estimation



Area of Science:

  • Artificial Intelligence
  • Machine Learning
  • Robotics

Background:

  • Reinforcement learning (RL) aims to optimize agent control policies for maximum future rewards.
  • Model-free RL learns policies directly from data, often requiring extensive samples.
  • Model-based RL estimates environment dynamics, potentially improving data efficiency.
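The distinction in the list above can be made concrete with the usual RL objective. The notation here is an assumption chosen for illustration and is not taken from the summary: θ denotes the policy parameters, γ the discount factor, and h a trajectory of length T.

```latex
J(\theta) = \mathbb{E}_{p(h \mid \theta)}\left[ \sum_{t=1}^{T} \gamma^{t-1}\, r(s_t, a_t, s_{t+1}) \right],
\qquad
p(h \mid \theta) = p(s_1) \prod_{t=1}^{T} p(s_{t+1} \mid s_t, a_t)\, \pi(a_t \mid s_t, \theta)
```

A model-free method estimates J(θ) from trajectories sampled in the real environment. A model-based method first fits an estimate of the transition density p(s_{t+1} | s_t, a_t) and then evaluates J(θ) with trajectories generated from that estimate, which is where the potential saving in real samples comes from.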

Purpose of the Study:

  • To develop a novel model-based reinforcement learning method.
  • To enhance policy learning efficiency using limited data.
  • To demonstrate the practical utility of the proposed approach.

Main Methods:

  • Builds on policy gradients with parameter-based exploration (PGPE), a model-free policy search method.
  • Utilizes least-squares conditional density estimation for accurate transition model learning.
  • Integrates model estimation and policy learning within a unified framework.
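As a rough illustration of the policy-search component, the following is a minimal sketch of policy gradients with parameter-based exploration in its commonly described form: policy parameters are drawn from a Gaussian hyper-distribution with mean `eta` and standard deviation `tau`, and both are updated by Monte Carlo gradient estimates. The function names, the learning rates, the mean-return baseline, and the `toy_return` function are all assumptions made for this example. In the model-based setting described above, the returns would instead come from rollouts generated with the estimated transition model.

```python
import numpy as np

def pgpe_gradients(thetas, returns, eta, tau):
    """Monte Carlo PGPE gradient estimates for a Gaussian hyper-distribution.

    thetas:  (N, d) policy parameters sampled from N(eta, diag(tau**2))
    returns: (N,)   return obtained with each sampled parameter vector
    """
    baseline = returns.mean()            # simple variance-reducing baseline (assumption)
    adv = (returns - baseline)[:, None]  # (N, 1)
    diff = thetas - eta                  # (N, d)
    grad_eta = (adv * diff / tau**2).mean(axis=0)
    grad_tau = (adv * (diff**2 - tau**2) / tau**3).mean(axis=0)
    return grad_eta, grad_tau

def toy_return(theta, target):
    """Hypothetical stand-in for a rollout: larger when theta is near target."""
    return -np.sum((theta - target) ** 2)

def run_pgpe(target, n_iters=200, n_samples=50, seed=0):
    """Gradient ascent on the hyper-parameters (eta, tau) for the toy problem."""
    rng = np.random.default_rng(seed)
    eta = np.zeros_like(target)
    tau = np.ones_like(target)
    for _ in range(n_iters):
        # Exploration happens in parameter space: sample whole policies.
        thetas = eta + tau * rng.standard_normal((n_samples, target.size))
        returns = np.array([toy_return(th, target) for th in thetas])
        g_eta, g_tau = pgpe_gradients(thetas, returns, eta, tau)
        eta = eta + 0.05 * g_eta
        # Keep the exploration width in a safe range (illustrative bounds).
        tau = np.clip(tau + 0.01 * g_tau, 0.05, 2.0)
    return eta, tau

target = np.array([1.0, -2.0])
eta, tau = run_pgpe(target)
```

Because the randomness sits in the parameter draw rather than in each action, a deterministic policy can be run for every sampled `theta`, which tends to lower the variance of the gradient estimate on long trajectories.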

Main Results:

  • The proposed model-based RL method shows practical usefulness in experiments.
  • Achieves effective policy learning with reduced data requirements compared to model-free approaches.
  • Demonstrates the synergy between advanced transition model estimation and policy search.
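The transition-model half of this pairing can be sketched in the same spirit. Below is a minimal one-dimensional version of least-squares conditional density estimation as it is usually formulated: the conditional density is modelled as a non-negative combination of Gaussian kernels, and the weights have a closed-form regularized least-squares solution. The names `lscde_fit` and `lscde_density`, the kernel width `sigma`, the regularizer `lam`, and the synthetic data are assumptions for illustration. In an RL setting `x` would presumably hold the state-action pair and `y` the next state, and `sigma` and `lam` would normally be tuned, for example by cross-validation.

```python
import numpy as np

def lscde_fit(x, y, centers_x, centers_y, sigma=0.5, lam=0.1):
    """Fit non-negative kernel weights for p(y | x) in one dimension."""
    n = len(x)
    # Gaussian kernel values of every sample against every centre: (n, b)
    kx = np.exp(-(x[:, None] - centers_x[None, :]) ** 2 / (2 * sigma**2))
    ky = np.exp(-(y[:, None] - centers_y[None, :]) ** 2 / (2 * sigma**2))
    # h_l = mean over samples of phi_l(x_i, y_i)
    h = (kx * ky).mean(axis=0)
    # Closed-form integral over y of the product of two 1-D Gaussian kernels
    dv = centers_y[:, None] - centers_y[None, :]
    y_int = np.sqrt(np.pi) * sigma * np.exp(-dv**2 / (4 * sigma**2))
    # H_{l,l'} = mean over samples of kx_l(x_i) kx_l'(x_i), times the y-integral
    H = (kx.T @ kx) / n * y_int
    alpha = np.linalg.solve(H + lam * np.eye(len(h)), h)
    return np.maximum(alpha, 0.0)  # clip so the estimate stays non-negative

def lscde_density(alpha, x0, y_grid, centers_x, centers_y, sigma=0.5):
    """Evaluate the normalized estimate of p(y | x0) on a grid of y values."""
    kx = np.exp(-(x0 - centers_x) ** 2 / (2 * sigma**2))                          # (b,)
    ky = np.exp(-(y_grid[:, None] - centers_y[None, :]) ** 2 / (2 * sigma**2))   # (m, b)
    unnorm = ky @ (alpha * kx)
    # Each 1-D Gaussian kernel integrates to sqrt(2*pi)*sigma over y.
    norm = np.sqrt(2 * np.pi) * sigma * np.sum(alpha * kx)
    return unnorm / norm

# Synthetic "transition" data (illustrative only): y is x plus small noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
y = x + 0.1 * rng.standard_normal(200)
cx, cy = x[:50], y[:50]  # use a subset of the samples as kernel centres
alpha = lscde_fit(x, y, cx, cy)
y_grid = np.linspace(-3.0, 3.0, 601)
p = lscde_density(alpha, 0.5, y_grid, cx, cy)
```

Since the weights come from a single linear solve, refitting the model as new transitions arrive is cheap, which suits a loop that alternates between collecting data, updating the model, and improving the policy.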

Conclusions:

  • The novel model-based RL approach provides a data-efficient alternative for learning optimal control policies.
  • Combining policy gradients with accurate transition model estimation is a promising direction for RL research.
  • The method is practically useful and offers advantages in scenarios with expensive data collection.