Bayesian Data Selection.

Eli N Weinstein¹, Jeffrey W Miller²

¹Data Science Institute, Columbia University, New York, NY 10027, USA.

Journal of Machine Learning Research : JMLR

May 19, 2023

Summary

This summary is machine-generated.

We introduce a new method, the Stein Volume Criterion (SVC), for selecting relevant features in complex, high-dimensional data. This approach efficiently identifies data subsets that align with specific models without needing computationally intensive nonparametric modeling.

Keywords:

Bayesian nonparametrics; Bayesian theory; Stein discrepancy; consistency; misspecification

Area of Science:

Statistics
Machine Learning
Computational Biology

Background:

Analyzing complex, high-dimensional data requires identifying relevant features that match or deviate from a model.
The "data selection" problem aims to find a lower-dimensional statistic, like a variable subset, that fits a parametric model.
Traditional Bayesian methods for data selection struggle with high-dimensional data because they rely on fitting a nonparametric model, which is computationally and statistically inefficient.

Purpose of the Study:

To propose a novel and efficient score for data selection in high-dimensional settings.
To address the computational and statistical inefficiencies of existing Bayesian approaches.
To enable robust feature discovery that aligns with parametric models of interest.

Main Methods:

Introduction of the "Stein volume criterion" (SVC) for data selection.
SVC utilizes a generalized marginal likelihood incorporating a kernelized Stein discrepancy.
Theoretical proofs establish the consistency of SVC for data selection and the properties of the generalized posterior.
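
To make the kernelized Stein discrepancy (KSD) mentioned above more concrete, here is a minimal sketch of a U-statistic estimate of the squared KSD between a sample and a candidate parametric model. It is an illustration only, not the authors' implementation: the RBF base kernel, the bandwidth of 1.0, the Gaussian toy models, and all function names are assumptions introduced here. The generalized marginal likelihood that defines the SVC is not shown.

```python
import math
import random


def rbf_stein_kernel(x, y, sx, sy, bandwidth=1.0):
    """Stein kernel u(x, y) built from an RBF base kernel (assumed choice).

    x, y are points (lists of floats); sx, sy are the model's score
    vectors, i.e. the gradient of the model log-density at x and y.
    """
    h2 = bandwidth ** 2
    diff = [a - b for a, b in zip(x, y)]
    sqdist = sum(v * v for v in diff)
    k = math.exp(-sqdist / (2.0 * h2))
    dot_ss = sum(a * b for a, b in zip(sx, sy))          # s(x)^T s(y)
    dot_sx_diff = sum(a * v for a, v in zip(sx, diff))   # s(x)^T (x - y)
    dot_sy_diff = sum(b * v for b, v in zip(sy, diff))   # s(y)^T (x - y)
    trace_term = len(x) / h2 - sqdist / (h2 * h2)        # trace of grad_x grad_y k, divided by k
    return k * (dot_ss + (dot_sx_diff - dot_sy_diff) / h2 + trace_term)


def ksd_u_statistic(samples, score, bandwidth=1.0):
    """U-statistic estimate of the squared KSD (average over pairs i != j)."""
    n = len(samples)
    scores = [score(x) for x in samples]
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                total += rbf_stein_kernel(
                    samples[i], samples[j], scores[i], scores[j], bandwidth
                )
    return total / (n * (n - 1))


# Toy data (hypothetical): 200 draws from a 2-D standard normal.
rng = random.Random(0)
data = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]


def score_standard_normal(x):
    # Score of N(0, I): grad log p(x) = -x
    return [-v for v in x]


def score_shifted_normal(x):
    # Score of N((3, 3), I): grad log p(x) = -(x - 3)
    return [-(v - 3.0) for v in x]


ksd_matched = ksd_u_statistic(data, score_standard_normal)
ksd_shifted = ksd_u_statistic(data, score_shifted_normal)
print("matched model:", ksd_matched)
print("shifted model:", ksd_shifted)
```

Note that the estimate only needs the model's score function, so the model's normalizing constant never has to be computed. In this toy setup the discrepancy should be close to zero for the model that matches the data and clearly larger for the shifted model.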

Main Results:

The SVC provides a consistent method for data selection without fitting nonparametric models.
The generalized posterior on parameters is shown to be consistent and asymptotically normal.
Successful application of SVC to single-cell RNA sequencing data analysis.

Conclusions:

The Stein Volume Criterion (SVC) offers an efficient and statistically sound approach to data selection in high-dimensional data.
SVC overcomes the limitations of traditional methods by avoiding nonparametric model fitting.
The method shows promise for applications in complex biological data analysis, such as single-cell RNA sequencing.