



Bayesian Data Selection.

Eli N Weinstein1, Jeffrey W Miller2

  • 1Data Science Institute, Columbia University, New York, NY 10027, USA.

Journal of Machine Learning Research (JMLR) | May 19, 2023
Summary
This summary is machine-generated.

We introduce a new method, the Stein Volume Criterion (SVC), for selecting relevant features in complex, high-dimensional data. This approach efficiently identifies data subsets that align with specific models without needing computationally intensive nonparametric modeling.

Keywords:
Bayesian nonparametrics; Bayesian theory; Stein discrepancy; consistency; misspecification


Area of Science:

  • Statistics
  • Machine Learning
  • Computational Biology

Background:

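  • Insight into complex, high-dimensional data often comes from discovering which features of the data match, or fail to match, a model of interest.
  • The "data selection" problem is to identify a lower-dimensional statistic, such as a subset of variables, that is well fit by a given parametric model of interest.
  • A fully Bayesian approach would model the selected statistic parametrically and the remaining "background" components of the data nonparametrically, but fitting a nonparametric model to high-dimensional data tends to be statistically and computationally inefficient.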

Purpose of the Study:

  • To propose a novel and efficient score for data selection in high-dimensional settings.
  • To address the computational and statistical inefficiencies of existing Bayesian approaches.
  • To enable robust feature discovery that aligns with parametric models of interest.

Main Methods:

  • Introduction of the "Stein volume criterion" (SVC) for data selection.
  • The SVC takes the form of a generalized marginal likelihood in which a kernelized Stein discrepancy takes the place of the Kullback-Leibler divergence.
  • Theoretical proofs establish the consistency of SVC for data selection and the properties of the generalized posterior.
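
To make the kernelized Stein discrepancy (KSD) more concrete, here is a minimal sketch of a KSD estimate, used to check which coordinate of a toy data set is compatible with a standard normal model. Everything in it is an assumption made for illustration: the function name `ksd_vstat`, the RBF kernel, the bandwidth, the V-statistic estimator, and the toy data are not taken from the paper, and the sketch computes a plain KSD rather than the SVC itself.

```python
import numpy as np

def ksd_vstat(x, score, bandwidth=1.0):
    """V-statistic estimate of the squared KSD between samples and a model.

    x         : array of shape (N, d), the observed samples
    score     : function returning grad log q(x) for the model q, shape (N, d)
    bandwidth : RBF kernel bandwidth h (an illustrative choice)
    """
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    s = score(x)                                  # model score at each sample
    diff = x[:, None, :] - x[None, :, :]          # diff[i, j] = x_i - x_j
    sq = np.sum(diff ** 2, axis=-1)               # squared distances
    h2 = bandwidth ** 2
    k = np.exp(-sq / (2.0 * h2))                  # RBF kernel matrix

    # The four terms of the Stein kernel for an RBF base kernel.
    term1 = (s @ s.T) * k                                   # s(x_i) . s(x_j) k
    term2 = np.einsum("id,ijd->ij", s, diff) / h2 * k       # s(x_i) . grad_y k
    term3 = -np.einsum("jd,ijd->ij", s, diff) / h2 * k      # s(x_j) . grad_x k
    term4 = (d / h2 - sq / h2 ** 2) * k                     # trace of grad_x grad_y k
    return float(np.mean(term1 + term2 + term3 + term4))

# Toy data (hypothetical): one Gaussian coordinate, one bimodal coordinate.
rng = np.random.default_rng(0)
n = 300
gauss_col = rng.standard_normal((n, 1))
bimodal_col = (rng.choice([-3.0, 3.0], size=n) + 0.5 * rng.standard_normal(n)).reshape(n, 1)

# Score function of a standard normal model: grad log q(x) = -x.
std_normal_score = lambda x: -x

ksd_gauss = ksd_vstat(gauss_col, std_normal_score)
ksd_bimodal = ksd_vstat(bimodal_col, std_normal_score)
print("KSD^2, Gaussian coordinate:", ksd_gauss)
print("KSD^2, bimodal coordinate: ", ksd_bimodal)
```

The estimate needs only the model's score function, so the model's normalizing constant never has to be computed. In this toy setting the coordinate with the smaller discrepancy is the better candidate for selection under the standard normal model; the SVC additionally integrates over model parameters, which this sketch does not do.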

Main Results:

  • The SVC provides a consistent method for data selection without fitting nonparametric models.
  • The corresponding generalized posterior on parameters is shown to be consistent and asymptotically normal.
  • The SVC is applied to single-cell RNA sequencing data sets, using probabilistic principal components analysis and a spin glass model of gene regulation.
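
For orientation, a generalized posterior and its generalized marginal likelihood are commonly written in the schematic form below. The symbols are assumptions introduced here for illustration (a prior $\pi$, a temperature $T > 0$, and an estimate $\widehat{D}_N(\theta)$ of a discrepancy between the data and the model, computed from $N$ observations); the paper's exact notation, normalization, and any correction factors are not reproduced.

```latex
% Schematic only: notation is illustrative, not the paper's.
\pi_N(\theta) \;\propto\; \pi(\theta)\,
  \exp\!\Big(-\tfrac{N}{T}\,\widehat{D}_N(\theta)\Big),
\qquad
\mathcal{K}_N \;=\; \int \pi(\theta)\,
  \exp\!\Big(-\tfrac{N}{T}\,\widehat{D}_N(\theta)\Big)\,d\theta .
```

In this reading, consistency means that $\pi_N$ concentrates around the parameter value minimizing the population discrepancy as $N$ grows, and asymptotic normality means that $\pi_N$ is approximately Gaussian around that value, with spread shrinking at the usual $1/\sqrt{N}$ rate.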

Conclusions:

  • The Stein volume criterion (SVC) offers an efficient and statistically sound approach to data selection for high-dimensional data.
  • SVC overcomes the limitations of traditional methods by avoiding nonparametric model fitting.
  • The method shows promise for applications in complex biological data analysis, such as single-cell RNA sequencing.