Related Concept Videos

Probability in Statistics

Probability is the likelihood of an event occurring. An event is a collection of outcomes (results) of a procedure. An event is a simple event when it consists of a single outcome that cannot be broken down into simpler parts.
A coin toss illustrates simple events: the result is either a head or a tail, so head and tail are two simple events, and together they make up the sample space. The probability of any event lies in the range from 0 to 1, inclusive. The probability of an...
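The classical definition of probability can be sketched in a few lines of Python. This minimal sketch assumes equally likely outcomes and computes P(event) as the number of outcomes in the event divided by the size of the sample space. The function name classical_probability and the die example are illustrative additions, not part of the original text.

```python
from fractions import Fraction


def classical_probability(event, sample_space):
    """P(event) = |event| / |sample space|, assuming equally likely outcomes."""
    event, sample_space = set(event), set(sample_space)
    if not event <= sample_space:
        raise ValueError("event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))


coin = {"head", "tail"}                               # sample space of one coin toss
p_head = classical_probability({"head"}, coin)        # simple event: 1/2

die = {1, 2, 3, 4, 5, 6}                              # hypothetical second example
p_even = classical_probability({2, 4, 6}, die)        # compound event: 1/2
p_impossible = classical_probability(set(), die)      # lower bound of the range: 0
p_certain = classical_probability(die, die)           # upper bound of the range: 1
```

Using Fraction keeps the results exact, which makes the 0 to 1 bounds easy to check.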

Probability Histograms

A probability histogram is a visual representation of a probability distribution. Like a typical histogram, it consists of contiguous (adjoining) bars and has both a horizontal axis and a vertical axis. The horizontal axis is labeled with what the data represent, and the vertical axis is labeled with probability. Because each rectangular bar is 1 unit wide, the area of each bar equals its height, the probability P(x), for x = 1, 2, 3, and so on.
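The link between bar area and probability can be checked numerically. The sketch below builds the bars of a probability histogram for a hypothetical distribution over x = 1, 2, 3 (the probabilities are made up for illustration), centers each 1-unit-wide bar on its value, and confirms that adjacent bars touch and that the areas add up to 1. The helper name histogram_bars is hypothetical.

```python
from fractions import Fraction


def histogram_bars(pmf):
    """Return (left_edge, right_edge, height, area) for each value of a discrete pmf.

    Each bar is centered on its value x and is 1 unit wide, so area == height == P(x).
    """
    bars = []
    for x, p in sorted(pmf.items()):
        width = 1
        bars.append((x - Fraction(1, 2), x + Fraction(1, 2), p, p * width))
    return bars


# Hypothetical distribution for x = 1, 2, 3 (values chosen for illustration).
pmf = {1: Fraction(1, 5), 2: Fraction(1, 2), 3: Fraction(3, 10)}
bars = histogram_bars(pmf)
total_area = sum(area for _, _, _, area in bars)  # equals 1 for a valid distribution

for left, right, height, area in bars:
    print(f"[{float(left)}, {float(right)}): height={float(height)}, area={float(area)}")
```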

Statistical Inference Techniques in Hypothesis Testing: Parametric Versus Nonparametric Data

Statistical inference techniques used in hypothesis testing fall into two broad categories: parametric and nonparametric statistics.
Parametric statistics, as the name suggests, assume that the data follow a specific distribution, often a normal distribution. When this assumption holds, it allows efficient hypothesis testing and estimation. Parametric methods, such as Student's t-test, are frequently used in biostatistics because they are statistically powerful when their assumptions are met. For instance,...
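To make the parametric versus nonparametric distinction concrete, the sketch below computes a parametric statistic (Student's pooled two-sample t) and, as an assumed nonparametric counterpart, the Mann-Whitney U statistic on the same made-up data. The t statistic is built from means and variances and relies on normality and equal variances; U only compares the ordering of observations. P-values are omitted because they require the t distribution or the null distribution of U.

```python
import math
import statistics


def pooled_t_statistic(a, b):
    """Student's two-sample t statistic with pooled variance.

    Assumes both samples come from normal populations with equal variances.
    Returns (t, degrees_of_freedom).
    """
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se, na + nb - 2


def mann_whitney_u(a, b):
    """Mann-Whitney U for sample a: counts pairs with a_i > b_j, ties count 1/2.

    Uses only the ordering of the data, so no distributional form is assumed.
    """
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


a = [1, 2, 3, 4, 5]   # made-up group 1
b = [3, 4, 5, 6, 7]   # made-up group 2
t, df = pooled_t_statistic(a, b)   # t = -2.0 with 8 degrees of freedom
u = mann_whitney_u(a, b)           # U = 4.5
```

A useful check on U: the statistics for the two groups always sum to len(a) * len(b).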

Probability Distributions

The probability of a value x of a random variable is the likelihood that the variable takes that value. A probability distribution gives the probabilities of a random variable's values using a formula, graph, or table. There are two types of probability distributions: discrete probability distributions and continuous probability distributions.
A discrete probability distribution is the probability distribution of a discrete random variable. Common discrete distributions include the binomial probability distribution and the Poisson...
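The formula and table forms of a discrete distribution can be sketched for the two examples named above. The functions below implement the standard binomial and Poisson probability mass functions and tabulate them. The parameter values (3 fair coin tosses, a Poisson mean of 2) are illustrative assumptions.

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam): e^(-lam) * lam^k / k!."""
    return math.exp(-lam) * lam**k / math.factorial(k)


# Table form: number of heads in 3 fair coin tosses.
binom_table = {k: binomial_pmf(k, 3, 0.5) for k in range(4)}
# {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}

# Table form: first few values of a Poisson distribution with mean 2.
poisson_table = {k: poisson_pmf(k, 2.0) for k in range(5)}
```

In both cases the probabilities over all possible values sum to 1; for the Poisson distribution that sum runs over infinitely many values, so any finite table is only a truncation.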

Cluster Sampling Method

Appropriate sampling methods ensure that samples are drawn without bias and accurately represent the population. Because measuring the entire population in a study is not practical, researchers use samples to represent the population of interest.
To choose a cluster sample, divide the population into clusters (groups) and then randomly select some of the clusters. All the members from these clusters are in the cluster sample. For example, if you randomly sample four departments from your...
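A minimal sketch of the cluster sampling procedure described above, assuming a hypothetical population of employees grouped into departments: randomly choose some clusters, then keep every member of each chosen cluster. The department names, sizes, and the choice of 4 clusters are made up for illustration; a seed is passed only so the draw is reproducible.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Select n_clusters clusters at random and return every member of each one."""
    rng = random.Random(seed)  # seeded only to make the example reproducible
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [person for name in chosen for person in clusters[name]]
    return chosen, members


# Hypothetical population: employees grouped into departments (the clusters).
departments = {
    "accounting": ["acc1", "acc2", "acc3"],
    "engineering": ["eng1", "eng2", "eng3", "eng4"],
    "legal": ["leg1", "leg2"],
    "marketing": ["mkt1", "mkt2", "mkt3"],
    "sales": ["sal1", "sal2", "sal3", "sal4", "sal5"],
    "support": ["sup1", "sup2"],
}

chosen, sample = cluster_sample(departments, n_clusters=4, seed=7)
print(chosen)   # the 4 randomly selected departments
print(sample)   # all employees from those departments
```

Randomness enters only at the cluster level; within a chosen cluster nobody is left out, and nobody from an unchosen cluster is included.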

Stratified Sampling Method

Sampling is a technique for selecting a portion (or subset) of a larger population and studying that portion (the sample) to gain information about the population. An appropriate sampling method ensures that samples are drawn without bias and accurately represent the population. Because measuring the entire population in a study is not practical, researchers use samples to represent the population of interest.
To choose a stratified sample, divide the population into groups called strata and then take a...
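The stratified procedure can be sketched the same way. Because the description above is cut off, this sketch assumes the common approach of drawing a simple random sample within every stratum, with proportional allocation (the same sampling fraction in each stratum). The class-year strata, their sizes, and the helper name stratified_sample are hypothetical.

```python
import random


def stratified_sample(strata, fraction, seed=None):
    """Draw a simple random sample of about fraction * size members from every stratum.

    Proportional allocation is assumed; each stratum contributes at least one member.
    """
    rng = random.Random(seed)  # seeded only to make the example reproducible
    sample = {}
    for name in sorted(strata):
        members = strata[name]
        k = max(1, round(fraction * len(members)))
        sample[name] = rng.sample(members, k)  # simple random sample within the stratum
    return sample


# Hypothetical strata: students grouped by class year.
sizes = {"freshman": 40, "sophomore": 30, "junior": 20, "senior": 10}
students = {year: [f"{year}_{i}" for i in range(n)] for year, n in sizes.items()}

sample = stratified_sample(students, fraction=0.1, seed=3)
print({year: len(chosen) for year, chosen in sample.items()})
# {'freshman': 4, 'junior': 2, 'senior': 1, 'sophomore': 3}
```

Unlike cluster sampling, every stratum contributes members to the sample, but only some members of each stratum are selected.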

You might also read

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

Prime Editing of Phytoene Synthase 1 in Rice for Seed Carotenoid Biofortification.

Plant biotechnology journal·2026

Same author

Real-world evaluation and management of osteoporosis in postmenopausal women following distal radius fracture.

Journal of clinical densitometry : the official journal of the International Society for Clinical Densitometry·2026

Same author

Generalized entropy calibration for analyzing voluntary survey data.

Biometrics·2026

Same author

Comprehensive metabolomics and phytochemical analyses identified important metabolites involved in the antioxidant activity of four Swiss chard cultivars (Beta vulgaris L. var. cicla) with different leaf colours.

Food chemistry: X·2026

Same author

Mutation of STAY-GREEN 1 in tomato increases volatile organic compounds during fruit ripening.

Plant & cell physiology·2026

Same author

Prognostic value of clinical and electrodiagnostic factors after corticosteroid injection in carpal tunnel syndrome.

Plastic and reconstructive surgery·2026

Same journal

Simplifying debiased inference via automatic differentiation and probabilistic programming.

Journal of the Royal Statistical Society. Series B, Statistical methodology·2026

Same journal

Principal stratification with U-statistics under principal ignorability.

Journal of the Royal Statistical Society. Series B, Statistical methodology·2026

Same journal

Causal K-Means Clustering.

Journal of the Royal Statistical Society. Series B, Statistical methodology·2026

Same journal

Inference of dependency knowledge graph for Electronic Health Records.

Journal of the Royal Statistical Society. Series B, Statistical methodology·2026

Same journal

Correction to: Inference of dependency knowledge graph for Electronic Health Records.

Journal of the Royal Statistical Society. Series B, Statistical methodology·2026

Same journal

Harmonized Estimation of Subgroup-Specific Treatment Effects in Randomized Trials: The Use of External Control Data.

Journal of the Royal Statistical Society. Series B, Statistical methodology·2026

Related Experiment Video

Updated: Dec 1, 2025

A Psychophysics Paradigm for the Collection and Analysis of Similarity Judgments

Published on: March 1, 2022

Doubly robust inference when combining probability and non-probability samples with high dimensional data.

Shu Yang¹, Jae Kwang Kim², Rui Song¹

¹North Carolina State University, Raleigh, USA.

Journal of the Royal Statistical Society. Series B, Statistical Methodology

November 9, 2020

Summary

This summary is machine-generated.

This study introduces a two-step method for combining a non-probability sample with a probability sample that provides high-dimensional, representative covariate data, improving variable selection and finite population inference.

Keywords:

Data integration; Double robustness; Generalizability; Penalized estimating equation; Variable selection

More Related Videos

Basics of Multivariate Analysis in Neuroimaging Data

Published on: July 24, 2010

Selecting Multiple Biomarker Subsets with Similarly Effective Binary Classification Performances

Published on: October 11, 2018

Area of Science:

Statistics
Survey Methodology
Data Science

Background:

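Integrating non-probability and probability samples is challenging for statistical inference, because non-probability samples may suffer from selection biases that limit the generalizability of results to the target population.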
High-dimensional covariate information from probability samples is valuable for target population analysis.
Existing methods may struggle with variable selection and ensuring robustness.

Purpose of the Study:

To develop a robust two-step approach for variable selection and finite population inference.
To effectively combine non-probability samples with probability samples containing rich covariate data.
To enhance the accuracy and reliability of statistical estimates from complex survey data.

Main Methods:

A two-step procedure involving penalized estimating equations with folded concave penalties for variable selection.
Utilizing a doubly robust estimator for finite population mean estimation.
Re-estimating nuisance model parameters by minimizing the asymptotic squared bias of the doubly robust estimator.
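Folded concave penalties shrink small coefficients toward zero while leaving large coefficients nearly unpenalized. The sketch below uses SCAD (smoothly clipped absolute deviation) with the conventional a = 3.7 as one example of a folded concave penalty. This choice, the helper names, and the form of the penalized estimating function (an unpenalized estimating function u_j minus the penalty derivative times the sign of the coefficient) are assumptions for illustration, not details taken from this summary. Scaling by sample size is omitted.

```python
def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty p_lambda(|theta|), one folded concave penalty (requires a > 2)."""
    t = abs(theta)
    if t <= lam:
        return lam * t  # lasso-like linear part near zero
    if t <= a * lam:
        return (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))  # quadratic transition
    return lam**2 * (a + 1) / 2  # constant: large coefficients are not penalized further


def scad_derivative(theta, lam, a=3.7):
    """q_lambda(|theta|): derivative of the SCAD penalty with respect to |theta|."""
    t = abs(theta)
    if t <= lam:
        return lam
    return max(a * lam - t, 0.0) / (a - 1)


def penalized_estimating_function(u_j, theta_j, lam, a=3.7):
    """Hypothetical penalized estimating function for coefficient j:
    u_j - q_lambda(|theta_j|) * sign(theta_j)."""
    sign = (theta_j > 0) - (theta_j < 0)
    return u_j - scad_derivative(theta_j, lam, a) * sign


# With lam = 1: small coefficients get the full lasso-type pull, large ones none.
print(scad_derivative(0.5, 1.0))    # 1.0
print(scad_derivative(2.0, 1.0))    # (3.7 - 2) / 2.7, about 0.63
print(scad_derivative(10.0, 1.0))   # 0.0
```

Because the derivative falls to zero for large coefficients, strong signals are estimated without the shrinkage bias that a lasso penalty would add, which is the usual motivation for folded concave penalties in variable selection.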

Main Results:

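Showed that the first-step penalized estimating equation approach has the selection consistency property for important variables under general probability samples.
Developed a doubly robust estimator that is root-n consistent if either the sampling probability model or the outcome model is correctly specified.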
The proposed strategy mitigates potential first-step selection errors, enhancing overall estimator performance.

Conclusions:

The proposed two-step method offers a robust framework for integrating diverse data sources.
Variable selection and finite population inference are improved by combining penalized estimating equations and doubly robust estimation.
This approach provides reliable estimates even when either the sampling probability or outcome model is misspecified.
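The doubly robust estimator can be sketched under a common form of the estimator for this data-integration setting; the exact formula is an assumption here, since this page does not state it. The mean is estimated as (1/N) times the sum over the non-probability sample B of (y_i - m(x_i)) / pi(x_i), plus (1/N) times the sum over the probability sample A of d_i * m(x_i), where m is the outcome model, pi the sampling probability model, and d_i the design weights. The fitted values are taken as given rather than estimated, N is assumed known, and all numbers are made up.

```python
def dr_mean(y_B, m_B, pi_B, m_A, d_A, N):
    """Doubly robust estimate of a finite population mean (sketch).

    y_B, m_B, pi_B: observed outcomes, fitted outcome-model predictions m(x_i),
        and fitted sampling probabilities pi(x_i) for units in the non-probability sample B.
    m_A, d_A: fitted outcome-model predictions and design weights for units in
        the probability sample A, where the outcome is not observed.
    N: population size, assumed known in this sketch.
    """
    if not (len(y_B) == len(m_B) == len(pi_B)):
        raise ValueError("y_B, m_B and pi_B must have the same length")
    if len(m_A) != len(d_A):
        raise ValueError("m_A and d_A must have the same length")
    if any(p <= 0 for p in pi_B):
        raise ValueError("sampling probabilities must be positive")
    # Inverse-probability-weighted residuals from the non-probability sample B.
    residual_term = sum((y - m) / p for y, m, p in zip(y_B, m_B, pi_B))
    # Design-weighted outcome-model predictions from the probability sample A.
    prediction_term = sum(d * m for d, m in zip(d_A, m_A))
    return (residual_term + prediction_term) / N


# Made-up fitted values for a population of N = 10 units.
estimate = dr_mean(
    y_B=[4.0, 6.0], m_B=[3.0, 5.0], pi_B=[0.5, 0.25],
    m_A=[3.0, 5.0, 4.0], d_A=[2.0, 4.0, 4.0], N=10,
)
# residual_term = 1/0.5 + 1/0.25 = 6, prediction_term = 6 + 20 + 16 = 42, so estimate = 4.8
```

If the outcome model fits B exactly, the residual term is zero whatever the sampling probabilities are; if instead the sampling probabilities are correct, the weighted residuals correct a biased outcome model on average. This is the intuition behind the double robustness stated above, not a proof, and it leaves out the variable selection and re-estimation steps.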