Efficient and Multiply Robust Risk Estimation Under General Forms of Dataset Shift

Hongxiang Qiu¹, Eric Tchetgen Tchetgen², Edgar Dobriban²

¹Department of Epidemiology and Biostatistics, Michigan State University.

Annals of Statistics | April 22, 2026

Summary

This summary is machine-generated.

This study develops efficient estimators of the risk of a machine learning model in a target population, using auxiliary data from related source populations under dataset shift. The estimators apply to domain adaptation and transfer learning settings in which data from the target population are limited.

Keywords:

62G20; 68Q32; dataset shift; domain adaptation; efficiency; multiple robustness; semiparametric model; transfer learning

Area of Science:

Statistical Machine Learning
Data Science
Causal Inference

Background:

Statistical machine learning methods often have only limited data from the target population.
Auxiliary data from related source populations can mitigate this data scarcity.
Few existing domain adaptation and transfer learning methods address how to use auxiliary populations efficiently when evaluating risk in the target population.

Purpose of the Study:

To develop efficient estimators for target population risk under various dataset shift conditions.
To address the challenge of limited data in statistical machine learning.
To improve the accuracy of risk evaluation in target domains using auxiliary data.

Main Methods:

Leveraging semiparametric efficiency theory for risk estimation.
Developing efficient and multiply robust estimators.
Considering a general class of dataset shift conditions, including covariate, label, and concept shift.
Allowing for partially nonoverlapping support between source and target populations.
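
For reference, the quantity being estimated and the three named shift conditions are commonly formalized as below. This is a sketch in notation assumed here rather than taken from the paper: P and Q denote the target and source distributions of the covariate-label pair (X, Y), f is a fixed predictor, and ℓ is a loss function. The concept shift line follows one common textbook convention, and the paper's general class of conditions may be stated differently.

```latex
\begin{align*}
R(f) &= \mathbb{E}_{P}\bigl[\ell\bigl(f(X), Y\bigr)\bigr]
  && \text{(target population risk)} \\
P(Y \mid X) &= Q(Y \mid X), \quad P(X) \text{ and } Q(X) \text{ may differ}
  && \text{(covariate shift)} \\
P(X \mid Y) &= Q(X \mid Y), \quad P(Y) \text{ and } Q(Y) \text{ may differ}
  && \text{(label shift)} \\
P(X) &= Q(X), \quad P(Y \mid X) \text{ and } Q(Y \mid X) \text{ may differ}
  && \text{(concept shift)}
\end{align*}
```

Each condition states which part of the joint distribution is shared across populations. That shared part is what allows source data to inform the target risk.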

Main Results:

Efficient estimators for target population risk were developed.
A straightforward specification test for dataset shift conditions was created.
Efficiency bounds were derived for posterior drift and location-scale shift.
Simulation studies confirmed efficiency gains from utilizing dataset shift conditions.
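
The idea of an efficiency gain can be illustrated with a toy Monte Carlo experiment. The sketch below is not the paper's estimator or simulation design. It assumes covariate shift, a known density ratio between target and source covariates, a fixed predictor f(x) = 0.5x with squared-error loss, and a textbook augmented (doubly robust style) estimator that uses unlabeled target covariates and labeled source data. All function names and the data-generating model are hypothetical.

```python
import numpy as np

# Under the assumed model, X ~ N(1, 1) in the target and Y = X + N(0, 1) noise,
# so the target risk is E[(Y - 0.5 X)^2] = 0.25 * E[X^2] + 1 = 1.5.
TRUE_RISK = 1.5


def squared_loss(x, y, slope=0.5):
    """Squared-error loss of the fixed predictor f(x) = slope * x."""
    return (y - slope * x) ** 2


def density_ratio(x):
    """Known target/source covariate density ratio N(1, 1) / N(0, 1)."""
    return np.exp(x - 0.5)


def draw(n, mean_x, rng):
    """Draw (X, Y); only the mean of X differs, so Y | X is shared."""
    x = rng.normal(mean_x, 1.0, size=n)
    y = x + rng.normal(0.0, 1.0, size=n)
    return x, y


def target_only_risk(x_tgt, y_tgt):
    """Baseline: average loss over the small labeled target sample."""
    return float(np.mean(squared_loss(x_tgt, y_tgt)))


def augmented_risk(x_tgt, x_src, y_src):
    """Plug-in regression of the loss on X plus a weighted source residual."""
    loss_src = squared_loss(x_src, y_src)
    # E[loss | X] is quadratic in X under the assumed model.
    coef = np.polyfit(x_src, loss_src, deg=2)
    plug_in = np.mean(np.polyval(coef, x_tgt))
    residual = loss_src - np.polyval(coef, x_src)
    correction = np.mean(density_ratio(x_src) * residual)
    return float(plug_in + correction)


def simulate(n_rep=300, n_tgt=50, n_src=5000, seed=0):
    """Repeat both estimators on fresh samples and collect the estimates."""
    rng = np.random.default_rng(seed)
    naive, augmented = [], []
    for _ in range(n_rep):
        x_t, y_t = draw(n_tgt, 1.0, rng)  # target sample
        x_s, y_s = draw(n_src, 0.0, rng)  # source sample
        naive.append(target_only_risk(x_t, y_t))
        augmented.append(augmented_risk(x_t, x_s, y_s))
    return np.array(naive), np.array(augmented)


if __name__ == "__main__":
    naive, augmented = simulate()
    print("target-only: mean %.3f, variance %.4f" % (naive.mean(), naive.var()))
    print("augmented:   mean %.3f, variance %.4f" % (augmented.mean(), augmented.var()))
```

Both estimators target the same risk of 1.5 in this toy setup. The augmented one should show a smaller variance across replications, because it borrows the large source sample through the shared conditional distribution of Y given X.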

Conclusions:

The proposed methods can yield efficiency gains for risk estimation when a plausible dataset shift condition holds.
The developed techniques enhance the utility of auxiliary data in machine learning.
This work provides a robust framework for addressing data scarcity and domain adaptation challenges.