Updated: Oct 7, 2025

Operant Protocols for Assessing the Cost-benefit Analysis During Reinforced Decision Making by Rodents
Published on: September 10, 2018
Kelly W Zhang¹, Lucas Janson², Susan A Murphy³
¹Department of Computer Science, Harvard University.
This study addresses the challenge of drawing accurate statistical conclusions from data collected through adaptive bandit algorithms, which are widely used in science and industry. The authors demonstrate that standard statistical tools, like ordinary least squares, fail to provide reliable results when data is collected adaptively without a unique optimal choice. To solve this, they introduce a new estimation method that ensures valid statistical confidence and error control.
Area of Science:
Background:
Adaptive decision-making systems often generate data that violate traditional statistical assumptions. Standard methods assume independent and identically distributed samples, but bandit algorithms inherently break this independence during data collection. Researchers nonetheless frequently rely on simple linear models for decision analysis. How standard estimation techniques behave on sequentially collected observations had not been resolved, and that uncertainty raised concerns about the validity of the resulting error estimates. This gap motivated a closer examination of how adaptive sampling affects estimator distributions. The literature lacked robust frameworks for handling batch-collected data in these settings.
Purpose Of The Study:
The aim of this work is to develop reliable inference methods for data collected using bandit algorithms. This study addresses the increasing need for statistical validity in industrial and scientific applications. Researchers investigate the limitations of standard estimation techniques when applied to adaptively collected information. The authors seek to resolve the problem of asymptotic non-normality in linear estimators. This motivation stems from the observation that naive assumptions lead to inflated error rates. The team intends to provide a robust framework that functions across multi-arm and contextual bandit settings. They focus on creating a method that remains effective despite non-stationarity in baseline rewards. This effort provides a foundation for more accurate decision analysis in sequential environments.
Main Methods:
The approach is a formal mathematical analysis of estimator behavior under adaptive sampling protocols. The authors evaluate the asymptotic properties of the ordinary least squares (OLS) estimator within sequential decision frameworks. They then construct a new estimation procedure designed specifically for batch-collected data, with the goal of ensuring asymptotic normality across multi-arm and contextual bandit configurations. The proposed method is compared against traditional linear regression benchmarks. Theoretical proofs establish the convergence characteristics of the new estimator, and the authors show that it remains robust when baseline rewards are non-stationary.
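To make the idea of a batch-wise estimator concrete, here is a minimal Python sketch for a two-arm bandit. This summary does not spell out the estimator's formula, so the form used here is an assumption: within each batch, take the OLS estimate of the treatment effect (the difference in arm means), studentize it with that batch's pooled residual variance, and combine the per-batch statistics with equal weights. The names `batch_z` and `batched_ols_statistic` are hypothetical and are not taken from the authors' code.

```python
import math
from statistics import fmean


def batch_z(actions, rewards, delta0=0.0):
    """Studentized within-batch difference in arm means (assumed form)."""
    r1 = [r for a, r in zip(actions, rewards) if a == 1]
    r0 = [r for a, r in zip(actions, rewards) if a == 0]
    n1, n0 = len(r1), len(r0)
    if n1 < 2 or n0 < 2:
        raise ValueError("each arm needs at least two pulls per batch")
    m1, m0 = fmean(r1), fmean(r0)
    # Pooled residual variance from the within-batch fit (intercept + arm).
    rss = sum((r - m1) ** 2 for r in r1) + sum((r - m0) ** 2 for r in r0)
    if rss == 0.0:
        raise ValueError("zero residual variance in batch")
    sigma_hat = math.sqrt(rss / (n1 + n0 - 2))
    # Var(m1 - m0) = sigma^2 * (n0 + n1) / (n0 * n1), hence this scaling.
    scale = math.sqrt(n0 * n1 / (n0 + n1))
    return scale * (m1 - m0 - delta0) / sigma_hat


def batched_ols_statistic(batches, delta0=0.0):
    """Combine per-batch statistics; `batches` is a list of (actions, rewards)."""
    zs = [batch_z(a, r, delta0) for a, r in batches]
    return sum(zs) / math.sqrt(len(zs)), zs


# Hypothetical toy data: two batches, actions coded 0/1.
batches = [
    ([0, 0, 1, 1], [0.0, 2.0, 3.0, 5.0]),
    ([0, 1, 1, 0, 1], [1.0, 2.5, 3.5, 0.0, 3.0]),
]
stat, per_batch = batched_ols_statistic(batches)
```

Under the null hypothesis `delta0`, the combined statistic would be compared with standard normal quantiles. The equal weighting across batches is a design choice of this sketch; other weightings are possible.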
Main Results:
The key finding is that the ordinary least squares estimator is not asymptotically normal when no unique optimal arm exists. This failure leads to Type-1 error inflation and unreliable confidence intervals. The authors demonstrate that the Batched OLS estimator is asymptotically normal for both multi-arm and contextual bandit data, and that this result holds even when the baseline reward is non-stationary. The analysis indicates that the proposed method provides better coverage probabilities than standard approaches. These findings quantify the risks of naive statistical assumptions in sequential decision-making and establish a formal framework for reliable inference in bandit applications.
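The contrast between the two estimators can be explored with a small Monte Carlo experiment. The sketch below is an illustrative toy, not the authors' simulation: the clipped greedy policy, batch sizes, noise model, and all function names are assumptions. Both arms have the same mean, so there is no unique optimal arm, and the code records how often a nominal 5% two-sided test rejects when using (a) a single studentized difference in means pooled over all data and (b) an equally weighted sum of per-batch studentized statistics.

```python
import math
import random
from statistics import fmean


def z_stat(actions, rewards):
    """Studentized difference in arm means, testing a zero treatment effect."""
    r1 = [r for a, r in zip(actions, rewards) if a == 1]
    r0 = [r for a, r in zip(actions, rewards) if a == 0]
    m1, m0 = fmean(r1), fmean(r0)
    rss = sum((r - m1) ** 2 for r in r1) + sum((r - m0) ** 2 for r in r0)
    sigma_hat = math.sqrt(rss / (len(r1) + len(r0) - 2))
    scale = math.sqrt(len(r0) * len(r1) / (len(r0) + len(r1)))
    return scale * (m1 - m0) / sigma_hat


def one_run(rng, n_batches=4, batch_size=60, clip=0.2):
    """One clipped-greedy run with equal arm means; returns (naive z, batched z)."""
    all_a, all_r, batch_zs = [], [], []
    p1 = 0.5  # probability of pulling arm 1 in the first batch
    for _ in range(n_batches):
        # Two forced pulls per arm keep every within-batch fit well defined.
        a = [0, 0, 1, 1] + [int(rng.random() < p1) for _ in range(batch_size - 4)]
        r = [rng.gauss(0.0, 1.0) for _ in a]  # both arms have mean 0
        batch_zs.append(z_stat(a, r))
        all_a += a
        all_r += r
        # Adapt: favor the arm with the larger running mean, clipped away from 0 and 1.
        m1 = fmean([x for act, x in zip(all_a, all_r) if act == 1])
        m0 = fmean([x for act, x in zip(all_a, all_r) if act == 0])
        p1 = 1.0 - clip if m1 > m0 else clip
    naive = z_stat(all_a, all_r)
    batched = sum(batch_zs) / math.sqrt(n_batches)
    return naive, batched


def rejection_rates(n_sims=1000, seed=0, crit=1.96):
    """Fraction of runs in which each statistic exceeds the critical value."""
    rng = random.Random(seed)
    runs = [one_run(rng) for _ in range(n_sims)]
    naive = sum(abs(n) > crit for n, _ in runs) / n_sims
    batched = sum(abs(b) > crit for _, b in runs) / n_sims
    return {"naive_ols": naive, "batched_ols": batched}


rates = rejection_rates()
```

A run of this size is noisy, so any gap between the two rates should be read as suggestive only. Increasing `n_sims` or varying `clip` gives a clearer picture at the cost of runtime.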
Conclusions:
The authors demonstrate that standard linear estimators can fail to be asymptotically normal under adaptive sampling. This implies that naive statistical approaches can produce misleading confidence intervals in bandit settings. The researchers propose the Batched OLS estimator as a reliable alternative for multi-arm and contextual bandit settings. This new method maintains asymptotic normality even when baseline rewards change over time. The findings suggest that practitioners should adopt such adjusted estimators to avoid inflated error rates. The work provides a formal basis for ensuring statistical validity in adaptive decision systems and clarifies the limitations of applying classical regression techniques to sequentially collected data. The study shows that robust inference is achievable through specifically designed batch-based estimation procedures.
The researchers propose the Batched OLS estimator, which maintains asymptotic normality. In contrast, the standard ordinary least squares estimator exhibits asymptotic non-normality when no unique optimal arm exists, leading to inaccurate confidence intervals.
The authors utilize the Batched OLS estimator to handle data collected from multi-arm and contextual bandits. This tool specifically addresses the non-normality issues encountered when using traditional linear regression on adaptively sampled information.
The authors prove that the standard ordinary least squares estimator is not asymptotically normal when there is no unique optimal arm. This failure arises because adaptive bandit algorithms violate the independence assumptions on which classical normality results rely.
The researchers use batched data to ensure statistical validity. Batching stabilizes the estimation process, allowing the Batched OLS estimator to remain robust even when baseline rewards exhibit non-stationarity.
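A short derivation illustrates why within-batch estimation can be insensitive to a drifting baseline. It assumes a simple two-arm reward model that this summary does not state explicitly: a batch-specific baseline $\beta_t$, a constant treatment effect $\Delta$, and mean-zero noise. The symbols $\beta_t$, $\Delta$, $A_{t,i}$, $N_{t,a}$, and $N_a$ are introduced here for illustration only.

```latex
% Assumed model for unit i in batch t, with action A_{t,i} \in \{0, 1\}:
R_{t,i} = \beta_t + \Delta A_{t,i} + \varepsilon_{t,i}

% Within-batch difference in means, with N_{t,a} pulls of arm a in batch t:
\hat{\Delta}_t
  = \frac{1}{N_{t,1}} \sum_{i : A_{t,i} = 1} R_{t,i}
  - \frac{1}{N_{t,0}} \sum_{i : A_{t,i} = 0} R_{t,i}
  = (\beta_t + \Delta + \bar{\varepsilon}_{t,1}) - (\beta_t + \bar{\varepsilon}_{t,0})
  = \Delta + \bar{\varepsilon}_{t,1} - \bar{\varepsilon}_{t,0}

% Pooled difference in means over all batches, with N_a = \sum_t N_{t,a}:
\hat{\Delta}_{\text{pooled}}
  = \Delta
  + \sum_t \left( \frac{N_{t,1}}{N_1} - \frac{N_{t,0}}{N_0} \right) \beta_t
  + \text{noise}
```

Under this assumed model the baseline $\beta_t$ cancels exactly within each batch, whereas the pooled estimate retains a baseline term unless the arm allocation proportions happen to be the same in every batch.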
The authors assess whether each estimator is asymptotically normal. They find that while the standard approach suffers from Type-1 error inflation, the Batched OLS method provides better coverage probabilities in multi-arm and contextual bandit scenarios.
The researchers claim that their new estimator is robust to non-stationarity in baseline rewards. They imply that this property makes the method suitable for real-world applications where reward distributions may shift over time.