Related Concept Videos

Randomized Experiments

The randomization process assigns study participants to experimental or control groups at random, so that each participant has an equal probability of being assigned to either group. Randomization is meant to eliminate selection bias and to balance known and unknown confounding factors, so that the control group is as similar as possible to the treatment group. A computer program with a random number generator can be used to assign participants to groups in a way that minimizes bias.
Simple randomization
Simple...
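The randomization procedure described above can be sketched with a seeded random number generator. This minimal sketch assumes simple randomization with 1:1 allocation, in which each participant is independently assigned to treatment or control with probability 1/2. The participant IDs and the `simple_randomize` helper are hypothetical, introduced only for illustration; the generator is Python's standard `random.Random`.

```python
import random


def simple_randomize(participants, seed=None):
    """Hypothetical helper: assign each participant independently to
    'treatment' or 'control' with probability 1/2 (simple randomization)."""
    rng = random.Random(seed)  # seeded generator so the allocation is reproducible
    return {p: rng.choice(["treatment", "control"]) for p in participants}


participants = [f"P{i:02d}" for i in range(1, 11)]  # hypothetical IDs P01..P10
allocation = simple_randomize(participants, seed=42)
for pid, group in allocation.items():
    print(pid, group)
```

Because every assignment is an independent coin flip, group sizes can come out unequal in small studies; fixing the seed lets the allocation be reproduced and audited.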

Random Sampling Method

Sampling is a technique for selecting a portion (or subset) of a larger population and studying that portion (the sample) to gain information about the population. Data are the result of sampling from a population. An appropriate sampling method ensures that samples are drawn without bias and accurately represent the population. Because measuring the entire population in a study is not practical, researchers use samples to represent the population of interest. Among the various sampling methods used by...
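As an illustration of drawing a sample without bias, the sketch below takes a simple random sample, in which every subset of size n is equally likely to be chosen. The population of member IDs, the sample size of 50, and the `simple_random_sample` helper are hypothetical; `random.Random.sample` from Python's standard library draws without replacement.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Hypothetical helper: draw n members without replacement so that every
    subset of size n is equally likely to be selected."""
    rng = random.Random(seed)  # seeded for reproducibility
    return rng.sample(population, n)


population = list(range(1, 1001))  # hypothetical population of 1,000 member IDs
sample = simple_random_sample(population, 50, seed=7)

# The sample mean estimates the population mean (500.5 for IDs 1..1000).
print(sum(sample) / len(sample))
```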

Random Variables

A random variable assigns a single numerical value to each outcome of a random procedure. The concept of random variables is fundamental to probability theory and was introduced by the Russian mathematician Pafnuty Chebyshev in the mid-nineteenth century.
Uppercase letters such as X or Y denote a random variable, and lowercase letters such as x or y denote a value of that random variable. If X is a random variable, then X is described in words, and x is given as a number.
For example, let X = the...
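The following is a separate illustration, not a completion of the truncated example above. It assumes three tosses of a fair coin and defines X as the number of heads, showing how a random variable maps each outcome of a procedure to a number and how the probabilities P(X = x) follow from the equally likely outcomes. The function name `X` and the coin-toss setup are assumptions made for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: the 8 equally likely outcomes of tossing a fair coin three times.
outcomes = list(product("HT", repeat=3))


# The random variable X maps each outcome to a number: the count of heads.
def X(outcome):
    return outcome.count("H")


# pmf[x] is P(X = x): the share of outcomes for which X takes the value x.
counts = Counter(X(o) for o in outcomes)
pmf = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}
print(pmf)  # → {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
```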

Cluster Sampling Method

Appropriate sampling methods ensure that samples are drawn without bias and accurately represent the population. Because measuring the entire population in a study is not practical, researchers use samples to represent the population of interest.
To choose a cluster sample, divide the population into clusters (groups) and then randomly select some of the clusters. All the members from these clusters are in the cluster sample. For example, if you randomly sample four departments from your...
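Following the department example, here is a minimal sketch of cluster sampling: the population is grouped into clusters, a few clusters are chosen at random, and every member of each chosen cluster enters the sample. The ten departments, the five employees per department, and the `cluster_sample` helper are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Hypothetical helper: randomly choose n_clusters clusters and
    include every member of each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sorted() gives a stable list of names
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members


# Hypothetical population: 10 departments (clusters) of 5 employees each.
departments = {
    f"Dept{d}": [f"Dept{d}-emp{e}" for e in range(1, 6)] for d in range(1, 11)
}

chosen, sample = cluster_sample(departments, 4, seed=1)
print(chosen, len(sample))  # four department names and 20 employees
```

Note the contrast with simple random sampling: randomness enters only at the cluster level, so the sample is all-or-nothing within each department.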

The Anchoring-and-Adjustment Heuristic

To make good decisions, we use our knowledge and our reasoning. Often, this knowledge and reasoning are sound and solid. However, we are sometimes swayed by biases or by others manipulating a situation. For example, say you and three friends want to rent a house and have a combined target budget of $1,600. The realtor shows you only very run-down houses for $1,600 and then shows you a very nice house for $2,000. Might you ask each person to pay more in rent to get the...

Friedman Two-way Analysis of Variance by Ranks

Friedman's Two-Way Analysis of Variance by Ranks is a nonparametric test designed to identify differences across multiple test attempts when the traditional assumptions of normality and equal variances do not hold. Unlike conventional ANOVA, which assumes normally distributed data with equal variances, Friedman's test is suited to ordinal or non-normally distributed data, making it particularly useful for analyzing dependent samples, such as matched subjects over time or repeated measures...
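To show how the ranking works, here is a minimal pure-Python sketch of the Friedman statistic, assuming the textbook form Q = 12 / (n k (k + 1)) × Σ R_j² − 3 n (k + 1), where n is the number of subjects (blocks), k is the number of conditions, and R_j is the sum of the within-subject ranks for condition j. Tied values receive average ranks, but no tie correction is applied. The function names and the data are hypothetical.

```python
def average_ranks(values):
    """Rank the values within one block (1 = smallest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` across a run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def friedman_statistic(data):
    """data[i][j] is the measurement for subject (block) i under condition j.
    Returns Q = 12 / (n k (k + 1)) * sum_j R_j**2 - 3 n (k + 1), with no tie correction."""
    n, k = len(data), len(data[0])
    rank_sums = [0.0] * k
    for row in data:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12 / (n * k * (k + 1)) * sum(R * R for R in rank_sums) - 3 * n * (k + 1)


# Hypothetical data: 4 subjects, each measured under 3 conditions.
scores = [
    [1, 2, 3],
    [2, 3, 4],
    [3, 4, 5],
    [1, 5, 9],
]
print(friedman_statistic(scores))  # → 8.0
```

Here every subject orders the conditions the same way, so Q reaches its maximum n(k − 1) = 8. For large n, Q is approximately chi-square distributed with k − 1 degrees of freedom under the null hypothesis of no difference between conditions; in practice, `scipy.stats.friedmanchisquare` computes the statistic and p-value and also applies a tie correction.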

You might also read

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

Shorebird loss increases soil CO₂ emissions in coastal wetlands under restoration.

Fundamental research·2026

Same author

Evidence for an Electronically Driven Charge Density Wave in a 1D Metallic MOF.

ACS central science·2026

Same author

Redox-Active Doubly Boron-Doped Indenofluorenes: Isomer-Directed Access to Open- and Closed-Shell Electronic States Induces Strong Near-Infrared Activity.

Journal of the American Chemical Society·2026

Same author

Mammalian-like steroidogenesis in plants gives rise to endocrine-mimetic cardenolides.

Science advances·2026

Same author

Sterol trafficking in yeast studied by one- and two-photon live-cell imaging of an intrinsically fluorescent ergosterol analog.

Methods and applications in fluorescence·2026

Same author

The risk of psychosis associated with cannabis use by people with attention-deficit hyperactivity disorder: A systematic review.

Australasian psychiatry : bulletin of Royal Australian and New Zealand College of Psychiatrists·2026

Same journal

Probabilistic Joint and Individual Variation Explained (ProJIVE) for Data Integration.

Journal of computational and graphical statistics : a joint publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America·2026

Same journal

fastkqr: A Fast Algorithm for Kernel Quantile Regression.

Journal of computational and graphical statistics : a joint publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America·2026

Same journal

Empirical Bayes Covariance Decomposition, and a Solution to the Multiple Tuning Problem in Sparse PCA.

Journal of computational and graphical statistics : a joint publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America·2026

Same journal

Joint Registration and Conformal Prediction for Partially Observed Functional Data.

Journal of computational and graphical statistics : a joint publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America·2026

Same journal

Efficient Decision Trees for Tensor Regressions.

Journal of computational and graphical statistics : a joint publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America·2026

Same journal

Distributed Nonparametric Regression with Heterogeneity Through Prediction-Based Aggregation.

Journal of computational and graphical statistics : a joint publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America·2026

Related Experiment Video

Updated: Nov 21, 2025

A Psychophysics Paradigm for the Collection and Analysis of Similarity Judgments

Published on: March 1, 2022

Consensus Monte Carlo for Random Subsets using Shared Anchors.

Yang Ni¹, Yuan Ji², Peter Müller³

¹Department of Statistics, Texas A&M University.

Journal of Computational and Graphical Statistics : a Joint Publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America

January 18, 2021

Summary

This summary is machine-generated.

We developed a scalable Monte Carlo algorithm for big data clustering and feature allocation using Bayesian nonparametric models. This method works with various sampling models and priors, demonstrated on image, mutation, and health records datasets.
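The Bayesian nonparametric clustering models mentioned here place a prior on random partitions; for the Dirichlet process, that prior is the Chinese restaurant process (CRP), in which item i + 1 joins an existing cluster c with probability n_c / (i + α) and opens a new cluster with probability α / (i + α). The sketch below draws one partition from the CRP prior only. It is a generic illustration, not the authors' sampler, and the `sample_crp` helper and its arguments are hypothetical.

```python
import random


def sample_crp(n, alpha, seed=None):
    """Hypothetical helper: draw one random partition of n items from the
    Chinese restaurant process with concentration parameter alpha > 0."""
    rng = random.Random(seed)
    labels = []  # labels[i] is the cluster index assigned to item i
    sizes = []   # sizes[c] is the current number of items in cluster c
    for _ in range(n):
        # Existing cluster c has weight sizes[c]; a new cluster has weight alpha.
        # The weights sum to (items seated so far) + alpha.
        weights = sizes + [alpha]
        c = rng.choices(range(len(weights)), weights=weights)[0]
        if c == len(sizes):
            sizes.append(1)  # open a new cluster
        else:
            sizes[c] += 1
        labels.append(c)
    return labels, sizes


labels, sizes = sample_crp(20, alpha=1.0, seed=3)
print(sizes)  # cluster sizes of one prior draw; they sum to 20
```

Larger values of alpha favor more, smaller clusters; the number of clusters is not fixed in advance, which is what makes the model nonparametric.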

Keywords:

Big data, electronic health records, image cluster, parallel computing, tumor heterogeneity

More Related Videos

An R-Based Landscape Validation of a Competing Risk Model

Published on: September 16, 2022

Primer-Free Aptamer Selection Using A Random DNA Library

Published on: July 26, 2010

Area of Science:

Computational Statistics
Machine Learning
Data Science

Background:

Existing Bayesian nonparametric models for clustering and feature allocation struggle to scale to big data challenges.
Efficient algorithms are needed to handle large datasets in statistical modeling and machine learning.

Purpose of the Study:

To introduce a consensus Monte Carlo algorithm that enables Bayesian nonparametric models to scale for big data applications.
To validate the algorithm's applicability across diverse priors and sampling models for clustering and feature allocation.

Main Methods:

Developed a consensus Monte Carlo algorithm designed for scalability with big data.
The algorithm supports various priors on random subsets (e.g., partitions, latent feature allocation) and sampling models.
Focused on Dirichlet process mixture models, Indian buffet process priors (binomial sampling), and categorical sampling models.
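The summary does not spell out how the shard-level results are combined, so the sketch below is a deliberately simplified illustration of the shared-anchor idea, not the authors' algorithm. It assumes, as the title suggests, that a common set of anchor points is copied into every shard; each shard is clustered separately, and local clusters from different shards are merged whenever they contain the same anchor. The helpers `make_shards` and `merge_by_anchors`, the union-find merge rule, and the threshold "clustering" that stands in for each shard's Monte Carlo draw are all hypothetical.

```python
import random


def make_shards(n_points, n_anchors, n_shards, seed=None):
    """Hypothetical helper: pick anchor points uniformly at random and copy them
    into every shard; the remaining points are split round-robin across shards."""
    rng = random.Random(seed)
    idx = list(range(n_points))
    rng.shuffle(idx)
    anchors, rest = idx[:n_anchors], idx[n_anchors:]
    shards = [anchors + rest[s::n_shards] for s in range(n_shards)]
    return anchors, shards


def merge_by_anchors(shard_partitions, anchors):
    """shard_partitions[s] maps point index -> local cluster label in shard s.
    Simplified rule (an assumption): local clusters from different shards are
    merged, via union-find, whenever they contain the same anchor point."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        parent[find(a)] = find(b)

    for s, part in enumerate(shard_partitions):
        for label in set(part.values()):
            find((s, label))  # register every local cluster as a node
    for a in anchors:
        nodes = [(s, part[a]) for s, part in enumerate(shard_partitions)]
        for node in nodes[1:]:
            union(nodes[0], node)

    # Give every point the index of its merged (global) cluster.
    global_ids, merged = {}, {}
    for s, part in enumerate(shard_partitions):
        for point, label in part.items():
            root = find((s, label))
            merged[point] = global_ids.setdefault(root, len(global_ids))
    return merged


# Toy demo: 1-D data from two well-separated groups. A threshold rule stands in
# for the per-shard Monte Carlo draw that a real method would use.
values = [0.1 * i for i in range(10)] + [10 + 0.1 * i for i in range(10)]
anchors, shards = make_shards(len(values), n_anchors=4, n_shards=3, seed=0)
shard_partitions = [{p: int(values[p] > 5) for p in shard} for shard in shards]
merged = merge_by_anchors(shard_partitions, anchors)
print(sorted(set(merged.values())))
```

Under this simplified rule, a local cluster that contains no anchor stays shard-specific, and the merge is transitive, so shards that disagree about the anchors can collapse clusters together; a full method needs a more careful combination step than this sketch.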

Main Results:

The proposed algorithm effectively scales Bayesian nonparametric models for big data.
Demonstrated successful inference on challenging datasets: MNIST images, pancreatic cancer mutations, and electronic health records (EHR).
Simulation studies confirmed the algorithm's validity and performance.

Conclusions:

The consensus Monte Carlo algorithm provides a scalable solution for Bayesian nonparametric big data analysis.
The method is versatile, applicable to a wide range of clustering and feature allocation problems.
Successful application to real-world datasets highlights its practical utility in diverse scientific domains.