EFFICIENT AND MULTIPLY ROBUST RISK ESTIMATION UNDER GENERAL FORMS OF DATASET SHIFT.

Hongxiang Qiu1, Eric Tchetgen Tchetgen2, Edgar Dobriban2

  • 1Department of Epidemiology and Biostatistics, Michigan State University.

Annals of Statistics | April 22, 2026

Summary
This summary is machine-generated.

This study develops efficient methods for estimating target population risk using auxiliary data, even with dataset shift. These techniques improve machine learning accuracy by leveraging domain adaptation and transfer learning strategies.

Keywords:
62G20, 68Q32, dataset shift, domain adaptation, efficiency, multiple robustness, semiparametric model, transfer learning

Area of Science:

  • Statistical Machine Learning
  • Data Science
  • Causal Inference

Background:

  • Machine learning models often suffer from limited target population data.
  • Auxiliary data from related populations can mitigate this data scarcity.
  • Existing domain adaptation and transfer learning methods have limitations in efficient risk evaluation.

Purpose of the Study:

  • To develop efficient estimators for target population risk under various dataset shift conditions.
  • To address the challenge of limited data in statistical machine learning.
  • To improve the accuracy of risk evaluation in target domains using auxiliary data.
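
For orientation, the estimand and two of the standard dataset shift conditions can be written out as below. The notation is an assumption introduced here for illustration and is not taken from the paper: A marks the population (A = 1 for target, A = 0 for source), X is the covariate, Y the label, and ℓ a loss evaluated for a fixed prediction rule. Concept shift is left out because its formalization varies across the literature.

```latex
% Target population risk (the quantity to be estimated)
R_{\mathrm{target}} \;=\; \mathbb{E}\bigl[\,\ell(X, Y) \mid A = 1\,\bigr]

% Covariate shift: the label mechanism is shared, the covariate law may differ
Y \perp A \mid X

% Label shift: the covariate mechanism given the label is shared, the label law may differ
X \perp A \mid Y
```

Each condition says that one piece of the joint distribution is common to both populations, which is what allows source data to inform an estimate of the target risk.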

Main Methods:

  • Leveraging semiparametric efficiency theory for risk estimation.
  • Developing efficient and multiply robust estimators.
  • Considering a general class of dataset shift conditions, including covariate, label, and concept shift.
  • Allowing for partially nonoverlapping support between source and target populations.
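
To make "multiply robust" concrete, here is a minimal sketch of a textbook doubly robust risk estimator under covariate shift. It is not the estimator from the paper: it assumes a simpler setting in which losses are observed only in the source sample, covariates are discrete, and the supports fully overlap. All names (`dr_target_risk`, `run_demo`, `m`, `w`) are hypothetical. The estimator combines an outcome model `m(x)`, the expected loss given the covariate, with a weight `w(x)`, the odds of belonging to the target rather than the source given the covariate, and stays consistent if either one is correct.

```python
import random
from collections import Counter, defaultdict


def dr_target_risk(source, target_x, m, w):
    """Doubly robust estimate of the target risk under covariate shift.

    source   : list of (x, loss) pairs from the source population
    target_x : list of covariates from the target population
    m        : function x -> estimated expected loss given x
    w        : function x -> estimated odds P(target | x) / P(source | x)
    """
    n_target = len(target_x)
    plug_in = sum(m(x) for x in target_x)                  # outcome-model term
    correction = sum(w(x) * (loss - m(x)) for x, loss in source)  # weighted residuals
    return (plug_in + correction) / n_target


def run_demo(n=50_000, seed=0):
    """Simulated check (illustrative data, not from the paper)."""
    rng = random.Random(seed)
    mu = [0.0, 1.0, 2.0]  # E[Y | X = x], shared by both populations
    source_x = rng.choices([0, 1, 2], weights=[0.6, 0.3, 0.1], k=n)
    target_x = rng.choices([0, 1, 2], weights=[0.1, 0.3, 0.6], k=n)
    # Loss of the constant predictor f(x) = 0 under squared error: Y ** 2.
    source = [(x, rng.gauss(mu[x], 1.0) ** 2) for x in source_x]

    n_s, n_t = Counter(source_x), Counter(target_x)
    loss_sum = defaultdict(float)
    for x, loss in source:
        loss_sum[x] += loss

    def m_hat(x):  # correct outcome model: source cell mean of the loss
        return loss_sum[x] / n_s[x]

    def w_hat(x):  # correct weight: empirical odds of target vs. source
        return n_t[x] / n_s[x]

    def m_bad(x):  # misspecified outcome model
        return 0.0

    def w_bad(x):  # misspecified weight: ignores the shift entirely
        return len(target_x) / len(source)

    return {
        "both_ok": dr_target_risk(source, target_x, m_hat, w_hat),
        "m_wrong": dr_target_risk(source, target_x, m_bad, w_hat),
        "w_wrong": dr_target_risk(source, target_x, m_hat, w_bad),
        "both_wrong": dr_target_risk(source, target_x, m_bad, w_bad),
    }


if __name__ == "__main__":
    # By construction the true target risk is 3.7 and the source risk is 1.7.
    for name, value in run_demo().items():
        print(f"{name:>10}: {value:.3f}")
```

In this simulation the first three variants should land near the true target risk, while the variant with both nuisance functions misspecified collapses to the source risk. Because the covariate is discrete and the nuisance estimates are saturated, the three consistent variants coincide numerically; with continuous covariates and fitted models they would differ in finite samples.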

Main Results:

  • Efficient estimators for target population risk were developed.
  • A straightforward specification test for dataset shift conditions was created.
  • Efficiency bounds were derived for posterior drift and location-scale shift.
  • Simulation studies confirmed efficiency gains from utilizing dataset shift conditions.

Conclusions:

  • The proposed methods offer significant efficiency gains for risk estimation under dataset shift.
  • The developed techniques enhance the utility of auxiliary data in machine learning.
  • This work provides a robust framework for addressing data scarcity and domain adaptation challenges.