



Consensus Monte Carlo for Random Subsets using Shared Anchors.

Yang Ni¹, Yuan Ji², Peter Müller³

  • ¹Department of Statistics, Texas A&M University.

Journal of Computational and Graphical Statistics: a Joint Publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America | January 18, 2021
Summary
This summary is machine-generated.

We developed a scalable consensus Monte Carlo algorithm for clustering and feature allocation on big data under Bayesian nonparametric models. The method accommodates a range of priors and sampling models, and it is demonstrated on image, mutation, and electronic health record datasets.

Keywords:
Big data; electronic health records; image cluster; parallel computing; tumor heterogeneity


Area of Science:

  • Computational Statistics
  • Machine Learning
  • Data Science

Background:

  • Existing Bayesian nonparametric models for clustering and feature allocation scale poorly to big data.
  • Efficient algorithms are needed to handle large datasets in statistical modeling and machine learning.

Purpose of the Study:

  • To introduce a consensus Monte Carlo algorithm that enables Bayesian nonparametric models to scale for big data applications.
  • To validate the algorithm's applicability across diverse priors and sampling models for clustering and feature allocation.

Main Methods:

  • Developed a consensus Monte Carlo algorithm designed to scale to big data.
  • Designed the algorithm to support various priors on random subsets (e.g., partitions, latent feature allocation) and various sampling models.
  • Focused on Dirichlet process mixture models, Indian buffet process priors (binomial sampling), and categorical sampling models.
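
The summary above does not spell out how the algorithm works, so the following is only a rough sketch of one way a shard-and-merge scheme with shared anchors could look. It assumes, based on the paper's title, that "anchors" are a small set of observations copied into every shard, and that local clusters from different shards are merged when they contain a common anchor. All function names are hypothetical. The per-shard step `cluster_1d` is a simple gap-based stand-in, not posterior sampling under a Dirichlet process mixture.

```python
from collections import defaultdict


def make_shards(items, anchors, n_shards):
    """Split items into shards; every shard also gets all anchors (assumption)."""
    anchor_set = set(anchors)
    rest = [i for i in items if i not in anchor_set]
    return [list(anchors) + rest[k::n_shards] for k in range(n_shards)]


def cluster_1d(values, items, gap=1.0):
    """Stand-in for a per-shard sampler: sort by value, cut where the gap is large."""
    if not items:
        return []
    ordered = sorted(items, key=lambda i: values[i])
    clusters = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if values[cur] - values[prev] > gap:
            clusters.append([])
        clusters[-1].append(cur)
    return clusters


def merge_by_anchors(shard_clusters, anchors):
    """Merge local clusters across shards whenever they share an anchor."""
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    anchor_set = set(anchors)
    owner = {}  # anchor -> first local cluster id seen that contains it
    for s, clusters in enumerate(shard_clusters):
        for c, members in enumerate(clusters):
            cid = (s, c)
            parent[cid] = cid
            for m in members:
                if m not in anchor_set:
                    continue
                if m in owner:
                    ra, rb = find(owner[m]), find(cid)
                    if ra != rb:
                        parent[rb] = ra
                else:
                    owner[m] = cid

    merged = defaultdict(set)
    for s, clusters in enumerate(shard_clusters):
        for c, members in enumerate(clusters):
            merged[find((s, c))].update(members)
    return sorted(sorted(group) for group in merged.values())


# Toy data: two well-separated groups, with one anchor taken from each.
values = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 5.3]
anchors = [0, 4]
shards = make_shards(list(range(len(values))), anchors, n_shards=2)
local = [cluster_1d(values, shard) for shard in shards]  # could run in parallel
global_clusters = merge_by_anchors(local, anchors)
```

In this sketch, a local cluster that contains no anchor is never merged with clusters from other shards, so the choice and number of anchors matters. How the published method selects anchors and handles such cases is not stated in this summary.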

Main Results:

  • The proposed algorithm effectively scales Bayesian nonparametric models for big data.
  • Demonstrated successful inference on challenging datasets: MNIST images, pancreatic cancer mutations, and electronic health records (EHR).
  • Simulation studies confirmed the algorithm's validity and performance.

Conclusions:

  • The consensus Monte Carlo algorithm provides a scalable solution for Bayesian nonparametric big data analysis.
  • The method is versatile and applies to a wide range of clustering and feature allocation problems.
  • Successful application to real-world datasets highlights its practical utility in diverse scientific domains.