Related Concept Videos

Binomial Probability Distribution

A binomial distribution is a probability distribution for a procedure with a fixed number of trials, where each trial can have only two outcomes.
The outcomes of a binomial experiment fit a binomial probability distribution. A statistical experiment can be classified as a binomial experiment if the following conditions are met:
There is a fixed number of trials. Think of trials as repetitions of an experiment. The letter n denotes the number of trials.
There are only two possible outcomes,...
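As a rough illustration, the sketch below evaluates the standard binomial formula P(x) = C(n, x) p^x (1 - p)^(n - x), the probability of exactly x successes in n independent trials that each succeed with probability p. The function name binomial_pmf and the fair-coin example are our own choices for illustration, not part of the original text.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials,
    each succeeding with probability p (hypothetical helper)."""
    if not 0 <= k <= n:
        return 0.0  # impossible number of successes
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Example: number of heads in n = 4 tosses of a fair coin (p = 0.5).
n, p = 4, 0.5
dist = [binomial_pmf(k, n, p) for k in range(n + 1)]
print(dist)       # [0.0625, 0.25, 0.375, 0.25, 0.0625]
print(sum(dist))  # 1.0
```

Each trial contributes a factor p or 1 - p, and comb(n, k) counts the orderings in which the k successes can occur.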

Probability in Statistics

Probability is the likelihood of an event occurring. The term event refers to a collection of results (outcomes) of a procedure. A simple event is an outcome that cannot be broken down into simpler parts.
For example, when a coin is tossed, the result is either a head or a tail. Here, head and tail are two simple events, and together they make up the sample space. The probability of any event lies in the range 0 to 1. The probability of an...
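A minimal sketch of the classical definition, assuming all outcomes in the sample space are equally likely: the probability of an event is the number of outcomes in the event divided by the total number of outcomes. The helper probability and the two-toss example are hypothetical illustrations; a single toss would have the sample space {H, T}.

```python
from fractions import Fraction
from itertools import product

# Sample space for tossing a coin twice: every ordered pair of H/T.
sample_space = list(product("HT", repeat=2))  # 4 equally likely outcomes


def probability(event, space):
    """Favorable outcomes / total outcomes, assuming equally likely outcomes."""
    favorable = [outcome for outcome in space if event(outcome)]
    return Fraction(len(favorable), len(space))


def at_least_one_head(outcome):
    return "H" in outcome


print(probability(at_least_one_head, sample_space))  # 3/4
```

Using Fraction keeps the result exact, which makes it easy to see that it always falls between 0 and 1.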

Random Variables

A random variable is a variable that takes a single numerical value, determined by chance, for each outcome of a procedure. The concept of random variables is fundamental to probability theory and was introduced by the Russian mathematician Pafnuty Chebyshev in the mid-nineteenth century.
Uppercase letters such as X or Y denote a random variable, and lowercase letters such as x or y denote its values. If X is a random variable, then X is described in words, and x is given as a number.
For example, let X = the...
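To make the notation concrete, the sketch below uses a hypothetical example of our own (not necessarily the one the original text goes on to give): X is described in words as the number of heads in three coin tosses, and each outcome maps to a numerical value x. The function name number_of_heads is an illustrative assumption.

```python
from collections import Counter
from itertools import product

# Procedure: toss a coin three times; 8 possible outcomes such as "HTH".
outcomes = ["".join(t) for t in product("HT", repeat=3)]


def number_of_heads(outcome: str) -> int:
    """Random variable X = the number of heads; returns its value x."""
    return outcome.count("H")


values = {o: number_of_heads(o) for o in outcomes}
print(values["HTH"])  # 2
counts = Counter(values.values())
print(sorted(counts.items()))  # [(0, 1), (1, 3), (2, 3), (3, 1)]
```

The mapping from outcomes to numbers is what makes X a random variable; the counts show how many outcomes produce each value x.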

Statistical Analysis: Overview

When we take repeated measurements on the same or replicated samples, we observe variation in the measured values. These deviations are called errors. To categorize and characterize the results and their errors, researchers can use statistical analysis to assess the quality of the measurements and the suitability of the methods.
One of the most commonly used statistical quantifiers is the mean, which is the ratio between the sum of the numerical values of all results and the...
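A minimal sketch of these quantifiers using Python's statistics module, with made-up replicate values standing in for repeated measurements of one sample. Besides the mean, it reports the sample standard deviation, a common measure of the spread (random error) of the results; the data are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical replicate measurements of the same sample (e.g., mass in mg).
measurements = [10.2, 9.8, 10.1, 10.0, 9.9]

x_bar = mean(measurements)  # sum of all results / number of results
s = stdev(measurements)     # sample standard deviation (n - 1 denominator)
print(round(x_bar, 2), round(s, 3))
```

stdev divides by n - 1, the usual choice when the replicates are a sample rather than the whole population.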

Probability Histograms

A probability histogram is a visual representation of a probability distribution. Like a typical histogram, a probability histogram consists of contiguous (adjoining) boxes and has both a horizontal axis and a vertical axis. The horizontal axis is labeled with what the data represent, and the vertical axis is labeled with probability. Each rectangular bar is 1 unit wide, so the area of each bar equals its height, the probability P(x), where x is 1, 2, 3, and so on.
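The following sketch, assuming a small hypothetical distribution over x = 1, 2, 3, 4, shows why unit-width bars make area equal probability, and prints a crude text version of the histogram. The names distribution, bar_areas and ascii_histogram are illustrative only.

```python
# Hypothetical discrete distribution: P(x) for x = 1, 2, 3, 4.
distribution = {1: 0.1, 2: 0.4, 3: 0.3, 4: 0.2}

BAR_WIDTH = 1  # each bar spans one unit on the horizontal axis


def bar_areas(dist):
    """Area of each bar = width * height = 1 * P(x) = P(x)."""
    return {x: BAR_WIDTH * p for x, p in dist.items()}


def ascii_histogram(dist, scale=20):
    """Text rendering: x on the left, bar length proportional to P(x)."""
    lines = []
    for x in sorted(dist):
        lines.append(f"{x} | {'#' * round(dist[x] * scale)} {dist[x]:.2f}")
    return "\n".join(lines)


print(ascii_histogram(distribution))
print(sum(bar_areas(distribution).values()))  # total area, approximately 1
```

Because the bars' areas are the probabilities, the total area of the histogram equals the sum of all P(x), which is 1.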

Probability Distributions

The probability of a value x of a random variable is the likelihood that this value occurs. A probability distribution describes the probabilities of a random variable's values using a formula, graph, or table. There are two types of probability distributions: discrete probability distributions and continuous probability distributions.
A discrete probability distribution is the probability distribution of a discrete random variable. Common examples include the binomial probability distribution and the Poisson...
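For a discrete distribution given as a table, two standard requirements can be checked directly: each P(x) lies between 0 and 1, and the probabilities sum to 1. The helper below is a hypothetical sketch; isclose is used because floating-point sums are rarely exactly 1.

```python
from math import isclose


def is_valid_discrete_distribution(table):
    """A table {x: P(x)} is a valid discrete probability distribution when
    every P(x) lies in [0, 1] and the probabilities sum to 1."""
    probs = list(table.values())
    return all(0 <= p <= 1 for p in probs) and isclose(sum(probs), 1.0)


print(is_valid_discrete_distribution({0: 0.25, 1: 0.5, 2: 0.25}))  # True
print(is_valid_discrete_distribution({0: 0.6, 1: 0.6}))            # False
```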

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

Metagenome-scale Modeling to Assess Microbiome Metabolic Complementarity for Precision Microbiota Transplantation Therapies.

bioRxiv : the preprint server for biology·2026

Same author

Emergent eukaryotic directional sensing via receptor degradation and diffusion.

Proceedings of the National Academy of Sciences of the United States of America·2025

Same author

Decay in transcriptional information flow is a hallmark of cellular aging.

bioRxiv : the preprint server for biology·2025

Same author

Non-equilibrium strategies enabling ligand specificity by signaling receptors.

eLife·2025

Same author

Designing host-associated microbiomes using the consumer/resource model.

mSystems·2024

Same author

Directional Sensing by Eukaryotic Receptors.

bioRxiv : the preprint server for biology·2024

Same journal

Another 10 years of PLOS Computational Biology: A data-driven reflection on trends in genomics research.

PLoS computational biology·2026

Same journal

Mobility data resolution needed to inform predictive models of spatial epidemic spread from mobile phone data.

PLoS computational biology·2026

Same journal

DeepMethylation: A deep learning framework for tissue-specific DNA methylation prediction and functional variant annotation.

PLoS computational biology·2026

Same journal

Redefining and estimating the early-phase reproduction ratio for epidemic outbreaks in spatially structured populations.

PLoS computational biology·2026

Same journal

Optimized phenotype definitions boost GWAS power.

PLoS computational biology·2026

Same journal

Detection, communication, and individual identification with deep audio embeddings: A case study with North Atlantic right whales.

PLoS computational biology·2026

Related Experiment Video

Updated: Oct 25, 2025

Selecting Multiple Biomarker Subsets with Similarly Effective Binary Classification Performances

Published on: October 11, 2018

SiGMoiD: A super-statistical generative model for binary data.

Xiaochuan Zhao¹, Germán Plata², Purushottam D Dixit¹,³

¹Department of Physics, University of Florida, Gainesville, Florida, United States of America.

PLOS Computational Biology

August 6, 2021

Summary

This summary is machine-generated.

The Super-statistical Generative Model for binary Data (SiGMoiD) infers constraints directly from data, enabling efficient probabilistic modeling of large collections of binary variables. The approach can model biological data with more than 1000 variables while providing a reduced-dimensional description that is used to identify clusters.

More Related Videos

Establishing a Competing Risk Regression Nomogram Model for Survival Data

Published on: October 23, 2020

Author Spotlight: Impact of Intergenic Interactions on Disease-Identifying Dark Biomarkers

Published on: March 1, 2024

Area of Science:

Computational Biology
Statistical Modeling
Machine Learning

Background:

Probabilistic models for co-varying binary variables are crucial in computational biology.
Existing generative models are computationally expensive and require constraints to be identified manually, which becomes impractical for large datasets (N~100 variables).

Purpose of the Study:

To introduce the Super-statistical Generative Model for binary Data (SiGMoiD), a novel framework that addresses the limitations of existing approaches in modeling large binary datasets.
To develop a computationally efficient method for inferring constraints directly from data, enabling scalable probabilistic modeling.

Main Methods:

SiGMoiD uses a maximum entropy-based framework and treats the data as arising from a super-statistical system.
The algorithm infers constraints directly from the data, bypassing the need for manual specification.
The model handles a large number of binary variables (N>1000) and provides a reduced dimensional data description.
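To illustrate what a super-statistical system with linear maximum-entropy constraints can look like, here is a minimal sketch under stated assumptions; it is not the authors' SiGMoiD code, and their inference procedure may differ. We assume each data point n has its own K Lagrange multipliers lambda^n, while the constraint coefficients (one K-vector theta_i per variable i) are shared across data points. The maximum-entropy distribution p(sigma | lambda^n) proportional to exp(-sum_k lambda_k sum_i theta_ki sigma_i) then factorizes over variables, giving P(sigma_i = 1 | lambda^n) = 1 / (1 + exp(sum_k lambda_k theta_ki)). The K-dimensional lambda^n and theta_i act as a reduced description of data points and variables. Plain gradient ascent on the log-likelihood is our own choice of fitting method, and the names sigmoid_prob, log_likelihood and fit, as well as the toy data, are hypothetical.

```python
import math
import random


def sigmoid_prob(lam_n, theta_i):
    """Assumed model: P(sigma_i = 1 | lambda^n) = 1 / (1 + exp(lambda^n . theta_i))."""
    z = sum(l * t for l, t in zip(lam_n, theta_i))
    return 1.0 / (1.0 + math.exp(z))


def log_likelihood(data, lam, theta, eps=1e-12):
    """Bernoulli log-likelihood of the binary data under the assumed model."""
    ll = 0.0
    for n, row in enumerate(data):
        for i, s in enumerate(row):
            p = min(max(sigmoid_prob(lam[n], theta[i]), eps), 1.0 - eps)
            ll += math.log(p) if s else math.log(1.0 - p)
    return ll


def fit(data, K, steps=500, lr=0.05, seed=0):
    """Hypothetical fitter: plain gradient ascent on lambda (one K-vector per
    data point) and theta (one K-vector per variable)."""
    rng = random.Random(seed)
    M, N = len(data), len(data[0])
    lam = [[rng.gauss(0.0, 0.1) for _ in range(K)] for _ in range(M)]
    theta = [[rng.gauss(0.0, 0.1) for _ in range(K)] for _ in range(N)]
    for _ in range(steps):
        # d(log-likelihood)/dz_ni = p_ni - sigma_ni, where z_ni = lambda^n . theta_i
        resid = [[sigmoid_prob(lam[n], theta[i]) - data[n][i] for i in range(N)]
                 for n in range(M)]
        grad_lam = [[sum(resid[n][i] * theta[i][k] for i in range(N)) for k in range(K)]
                    for n in range(M)]
        grad_theta = [[sum(resid[n][i] * lam[n][k] for n in range(M)) for k in range(K)]
                      for i in range(N)]
        for n in range(M):
            for k in range(K):
                lam[n][k] += lr * grad_lam[n][k]
        for i in range(N):
            for k in range(K):
                theta[i][k] += lr * grad_theta[i][k]
    return lam, theta


# Toy data (made up for illustration): two groups of data points with
# opposite on/off patterns over six binary variables.
data = [[1, 1, 1, 0, 0, 0]] * 3 + [[0, 0, 0, 1, 1, 1]] * 3
lam0, theta0 = fit(data, K=2, steps=0)  # initial parameters only
lam, theta = fit(data, K=2)
print(log_likelihood(data, lam0, theta0), "->", log_likelihood(data, lam, theta))
```

On this toy data, the fitted lambda vectors of the two groups should move apart while those within a group stay close, which hints at how a low-dimensional description could be used to find clusters of data points.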

Main Results:

SiGMoiD successfully models collections of very large numbers of binary variables (N>1000).
The framework infers optimal constraints directly from data, enhancing efficiency and scalability.
The reduced-dimensional description enables identification of clusters among data points and among variables.

Conclusions:

SiGMoiD offers a versatile and efficient solution for building probabilistic generative models for large-scale binary data.
The method's ability to infer constraints and reduce dimensionality makes it suitable for biological datasets of many kinds and sizes.
SiGMoiD advances computational biology by enabling more scalable and insightful analysis of complex binary datasets.