Linear Fitted-Q Iteration with Multiple Reward Functions.

Daniel J Lizotte¹, Michael Bowling, Susan A Murphy

¹David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, ON N2L 3G1, Canada, DLIZOTTE@UWATERLOO.CA.

Journal of Machine Learning Research : JMLR

June 7, 2013

Summary

This summary is machine-generated.

We developed a new algorithm for reinforcement learning with multiple rewards, applicable to complex medical decisions. This approach aids in creating decision support tools for personalized patient care.

Keywords:

decision making dynamic programming linear regression preference elicitation reinforcement learning

Area of Science:

Machine Learning
Computational Geometry
Clinical Decision Support

Background:

Fitted-Q iteration is a key reinforcement learning algorithm.
Handling multiple reward signals and complex state features presents challenges.
Linear value function approximation is widely used in reinforcement learning.
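
As a rough illustration of the background above, finite-horizon fitted-Q iteration with linear value function approximation can be sketched as a backward pass of least-squares fits, one per stage and action. This is a minimal single-reward sketch, not the authors' implementation; the function name `fitted_q_iteration`, the data layout of `stages`, and the use of one weight vector per action are assumptions made for the example.

```python
import numpy as np

def fitted_q_iteration(stages, n_actions):
    """Finite-horizon fitted-Q iteration with one linear model per (stage, action).

    stages: list over time steps t = 0..T-1 of tuples (X, A, R, X_next), where
      X is an (n, d) array of state features, A an (n,) int array of actions,
      R an (n,) array of rewards, and X_next an (n, d) array of next-state
      features (ignored at the final stage, so it may be None there).
    Assumes every action appears in the data at every stage (hypothetical layout).
    Returns weights[t][a], a length-d vector with Q_t(s, a) ~ features(s) @ weights[t][a].
    """
    T = len(stages)
    weights = [None] * T
    for t in reversed(range(T)):
        X, A, R, X_next = stages[t]
        if t == T - 1:
            # Final stage: the regression target is the immediate reward.
            targets = R
        else:
            # Earlier stages: reward plus the best estimated next-stage Q-value.
            q_next = np.column_stack([X_next @ w for w in weights[t + 1]])
            targets = R + q_next.max(axis=1)
        weights[t] = []
        for a in range(n_actions):
            mask = A == a
            # Ordinary least squares on the samples where action a was taken.
            w, *_ = np.linalg.lstsq(X[mask], targets[mask], rcond=None)
            weights[t].append(w)
    return weights
```

The backward pass is what makes the procedure finite-horizon: each stage's regression targets depend only on the models already fitted for the following stage.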

Purpose of the Study:

To develop a general algorithm for finite-horizon fitted-Q iteration with multiple reward signals.
To incorporate linear value function approximation with arbitrary state features.
To demonstrate the algorithm's application in clinical decision support.

Main Methods:

Developed a generalized algorithm for fitted-Q iteration.
Utilized triangulation primitives from computational geometry for the 3-reward case.
Implemented a method for identifying globally dominated actions.
Applied the algorithm to sequential treatments for schizophrenia.
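
To make the idea of dominated actions concrete, the sketch below handles the two-reward case at a single decision stage. It assumes the two rewards are combined with a preference weight `delta` in [0, 1], so that Q(s, a; delta) = (1 - delta) * Q0(s, a) + delta * Q1(s, a), and it treats an action as globally dominated when it is not optimal at any state for any weight. Both the scalarization and that definition are assumptions for illustration. The sketch also swaps in a simple grid over `delta`, so it can miss an action that is optimal only between grid points; it does not reproduce the geometric machinery mentioned above. The name `never_optimal_actions` is hypothetical.

```python
import numpy as np

def never_optimal_actions(q0, q1, n_grid=101):
    """Return actions that are not optimal at any state for any grid weight.

    q0, q1: (n_states, n_actions) arrays of Q-values under reward 0 and reward 1.
    n_grid: number of evenly spaced preference weights delta in [0, 1].
    """
    deltas = np.linspace(0.0, 1.0, n_grid)
    # q[k, s, a] = (1 - delta_k) * q0[s, a] + delta_k * q1[s, a]
    q = (1.0 - deltas)[:, None, None] * q0[None] + deltas[:, None, None] * q1[None]
    # Best achievable value for each (weight, state) pair.
    best = q.max(axis=2, keepdims=True)
    # An action survives if it ties the best value for at least one (weight, state).
    ever_optimal = np.isclose(q, best).any(axis=(0, 1))
    return [a for a in range(q0.shape[1]) if not ever_optimal[a]]
```

For example, with `q0 = np.array([[1.0, 0.0, 0.4]])` and `q1 = np.array([[0.0, 1.0, 0.4]])`, the third action is worth 0.4 at every weight while the better of the first two is always worth at least 0.5, so it is flagged. The linear blend only holds at one stage; at earlier stages the maximization over next-stage actions makes the value piecewise linear in `delta`.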

Main Results:

Successfully developed a general algorithm for multi-reward fitted-Q iteration.
Demonstrated a practical application in a clinical decision aid for schizophrenia treatment.
Showcased the identification of dominated actions and handling of multiple objectives.

Conclusions:

The developed algorithm provides a robust framework for multi-objective reinforcement learning.
This work advances evidence-based clinical decision support systems.
Future research can further enhance the impact on healthcare.