Model-based policy gradients with parameter-based exploration by least-squares conditional density estimation.

Voot Tangkaratt¹, Syogo Mori¹, Tingting Zhao¹

¹Tokyo Institute of Technology, Japan.

Neural Networks : the Official Journal of the International Neural Network Society

July 5, 2014

Summary

This summary is machine-generated.

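Model-based reinforcement learning (RL) can be a data-efficient alternative to model-free RL when the transition model is learned accurately from a small amount of data. This study introduces a model-based method that combines policy gradients with parameter-based exploration and least-squares conditional density estimation of the transition model to learn control policies.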

Keywords:

Conditional density estimation; Reinforcement learning; Transition model estimation

Area of Science:

Artificial Intelligence
Machine Learning
Robotics

Background:

Reinforcement learning (RL) aims to learn a control policy that maximizes an agent's expected future rewards.
Model-free RL learns the policy directly from data samples and often requires many of them.
Model-based RL first estimates the environment's transition model and then learns the policy from that model, which can improve data efficiency when the model is accurate.
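
The objective described above can be written down under a standard discounted, finite-horizon formulation. The notation here (trajectory h, policy parameters θ, discount factor γ, horizon T) is an assumption for illustration and is not taken from this page:

```latex
% Expected return of a policy with parameters \theta over trajectories h
J(\theta) = \mathbb{E}_{p(h \mid \theta)}\bigl[ R(h) \bigr],
\qquad
R(h) = \sum_{t=1}^{T} \gamma^{t-1}\, r(s_t, a_t, s_{t+1}),

% Trajectory density: initial state, policy, and transition model
p(h \mid \theta) = p(s_1) \prod_{t=1}^{T} \pi(a_t \mid s_t, \theta)\, p(s_{t+1} \mid s_t, a_t).
```

In this notation, a model-free method estimates J(θ) or its gradient from trajectories sampled in the real environment, whereas a model-based method replaces the unknown transition density p(s_{t+1} | s_t, a_t) with an estimate learned from data.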

Purpose of the Study:

To develop a novel model-based reinforcement learning method.
To enhance policy learning efficiency using limited data.
To demonstrate the practical utility of the proposed approach.

Main Methods:

Builds on policy gradients with parameter-based exploration (PGPE), a model-free policy search method.
Uses least-squares conditional density estimation (LSCDE) to learn the transition model.
Combines the two so that the policy is learned from the estimated transition model.
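
The two ingredients listed above can be sketched in a few lines of NumPy. This is a minimal illustration based on the textbook forms of policy gradients with parameter-based exploration and least-squares conditional density estimation, not the authors' implementation. The function names (`pgpe_gradient`, `lscde_fit`, `lscde_density`, `_sqdist`) are hypothetical, and the sketch assumes an independent Gaussian prior over policy parameters, a mean-return baseline, and Gaussian kernels with a single shared bandwidth `sigma` and ridge parameter `lam`.

```python
import numpy as np


def pgpe_gradient(thetas, returns, eta, tau):
    """Monte Carlo gradient of expected return w.r.t. the prior mean and std.

    thetas:  (N, d) policy parameters sampled from N(eta, diag(tau**2)).
    returns: (N,) return obtained by the deterministic policy for each sample.
    """
    advantage = (returns - returns.mean())[:, None]  # mean baseline (assumption)
    diff = thetas - eta
    grad_eta = np.mean(advantage * diff / tau**2, axis=0)
    grad_tau = np.mean(advantage * (diff**2 - tau**2) / tau**3, axis=0)
    return grad_eta, grad_tau


def _sqdist(a, b):
    """Pairwise squared Euclidean distances between rows of a (n, d) and b (m, d)."""
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)


def lscde_fit(x, y, cx, cy, sigma, lam):
    """Fit nonnegative kernel weights alpha so that p(y|x) ~ alpha @ phi(x, y).

    x, y:   (n, dx) inputs and (n, dy) outputs, e.g. state-action and next state.
    cx, cy: (b, dx) and (b, dy) Gaussian kernel centers.
    """
    dy = y.shape[1]
    kx = np.exp(-_sqdist(x, cx) / (2 * sigma**2))  # (n, b) kernel values in x
    ky = np.exp(-_sqdist(y, cy) / (2 * sigma**2))  # (n, b) kernel values in y
    h = (kx * ky).mean(axis=0)
    # Closed-form integral over y of the product of two Gaussian kernels.
    cross = (np.sqrt(np.pi) * sigma) ** dy * np.exp(-_sqdist(cy, cy) / (4 * sigma**2))
    H = cross * (kx.T @ kx) / len(x)
    alpha = np.linalg.solve(H + lam * np.eye(len(cx)), h)  # regularized least squares
    return np.maximum(alpha, 0.0)  # clip negative weights to keep the density nonnegative


def lscde_density(alpha, x, y, cx, cy, sigma):
    """Evaluate the normalized estimate of p(y|x) at paired rows of x and y."""
    dy = y.shape[1]
    kx = np.exp(-_sqdist(x, cx) / (2 * sigma**2))
    ky = np.exp(-_sqdist(y, cy) / (2 * sigma**2))
    # Integral of each kernel over y, used to normalize the estimate.
    norm = (np.sqrt(2 * np.pi) * sigma) ** dy * (kx @ alpha)
    return ((kx * ky) @ alpha) / norm
```

In a model-based loop of the kind summarized here, one would fit the conditional density on observed transitions (with `x` holding state-action pairs and `y` the next states), generate rollouts from the fitted model, and pass the resulting returns to `pgpe_gradient` to update `eta` and `tau` by gradient ascent. The bandwidth and regularization values are left to the caller; in practice they would usually be selected by cross-validation.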

Main Results:

The proposed model-based RL method shows practical usefulness in experiments.
Achieves effective policy learning with reduced data requirements compared to model-free approaches.
Demonstrates the synergy between advanced transition model estimation and policy search.

Conclusions:

The novel model-based RL approach provides a data-efficient alternative for learning optimal control policies.
Combining policy gradients with accurate transition model estimation is a promising direction for RL research.
The method is practically useful and offers advantages in scenarios with expensive data collection.