Invariant Policy Learning: A Causal Perspective.

Sorawit Saengkyongam, Nikolaj Thams, Jonas Peters

IEEE Transactions on Pattern Analysis and Machine Intelligence

April 5, 2023

Summary

This summary is machine-generated.

This study introduces multi-environment contextual bandits to address environmental shifts in offline settings. An optimal invariant policy is shown to generalize across environments, enhancing the applicability of contextual bandits in high-stakes domains like healthcare.

Area of Science:

Machine Learning
Causal Inference
Reinforcement Learning

Background:

Contextual bandit and reinforcement learning algorithms are effective in online advertising and recommender systems.
Adoption in high-stakes domains like healthcare remains limited because these algorithms assume that the underlying mechanisms are static.
Shifts between environments can invalidate that static-environment assumption.

Purpose of the Study:

To address the challenge of environmental shifts within the framework of offline contextual bandits.
To propose a novel approach for handling dynamic changes in underlying mechanisms across environments.
To enhance the generalizability of contextual bandit policies in real-world, non-static systems.

Main Methods:

Framing the environmental shift problem through the lens of causality.
Introducing multi-environment contextual bandits to accommodate changes in underlying mechanisms.
Adopting the concept of invariance from causality literature and defining policy invariance.
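
As a rough illustration of the setting these methods address, the sketch below simulates logged data from two toy environments and scores two fixed policies offline with inverse propensity weighting. It is not the authors' algorithm or their definition of policy invariance. The environment model, the `shift` parameter, the uniform logging policy, and all function names are assumptions made only for this example, and the toy contains no unobserved variables.

```python
import numpy as np


def simulate_environment(n, shift, rng):
    """Log n rounds from one toy environment under a uniform-random logging policy."""
    x_stable = rng.normal(size=n)   # effect on reward is identical in every environment
    x_shifty = rng.normal(size=n)   # effect on reward is scaled by the environment's shift
    actions = rng.integers(0, 2, size=n)        # logging policy: fair coin flip over {0, 1}
    propensities = np.full(n, 0.5)              # P(logged action | context)
    gain = x_stable + shift * x_shifty          # benefit of action 1 over action 0
    rewards = actions * gain + rng.normal(scale=0.1, size=n)
    contexts = np.column_stack([x_stable, x_shifty])
    return contexts, actions, propensities, rewards


def ipw_value(policy, contexts, actions, propensities, rewards):
    """Inverse-propensity-weighted estimate of a deterministic policy's mean reward."""
    chosen = policy(contexts)
    weights = (chosen == actions) / propensities
    return float(np.mean(weights * rewards))


def stable_policy(contexts):
    """Acts only on the covariate whose effect does not change (hypothetical)."""
    return (contexts[:, 0] > 0).astype(int)


def shifty_policy(contexts, assumed_shift=2.0):
    """Also acts on the shifting covariate, tuned to the training environment (hypothetical)."""
    return (contexts[:, 0] + assumed_shift * contexts[:, 1] > 0).astype(int)


rng = np.random.default_rng(0)
logs = {name: simulate_environment(20_000, shift, rng)
        for name, shift in [("train", 2.0), ("test", -2.0)]}

values = {
    (policy.__name__, env): ipw_value(policy, *data)
    for policy in (stable_policy, shifty_policy)
    for env, data in logs.items()
}
for key, value in values.items():
    print(key, round(value, 3))
```

In this toy, the policy that relies only on the stable covariate should earn about the same estimated reward in both environments, while the policy tuned to the training environment's mechanism should do better in training and lose reward once the mechanism flips.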

Main Results:

Policy invariance is particularly relevant when unobserved variables are present.
An optimal invariant policy is demonstrated to generalize across environments under specific assumptions.
The proposed framework allows for changes in underlying mechanisms, overcoming limitations of static models.

Conclusions:

The developed multi-environment contextual bandit approach offers a robust solution for systems with environmental shifts.
Under the stated assumptions, policy invariance supports a theoretical guarantee of generalization in the presence of unobserved confounders.
This work paves the way for wider adoption of contextual bandits in high-stakes applications like healthcare by addressing mechanism shifts.