Updated: Sep 11, 2025

Model-Based Causal Discovery for Zero-Inflated Count Data.

Junsouk Choi¹, Yang Ni²

¹Department of Statistics, Texas A&M University, College Station, TX 98195-4322, USA.

Journal of Machine Learning Research : JMLR

August 12, 2025

Summary

This summary is machine-generated.

We introduce a new zero-inflated generalized hypergeometric directed acyclic graph (ZiG-DAG) model to uncover causal relationships from observational count data with excess zeros. This flexible model accurately captures complex data features and outperforms existing methods in causal structure learning.

Keywords:

Bayesian network; Causal identifiability; Directed acyclic graph; Observational zero-inflated count data; Single-cell RNA-sequencing

Area of Science:

Statistics
Bioinformatics
Genomics

Background:

Zero-inflated count data are prevalent across scientific disciplines, including social science, biology, and genomics.
Existing causal discovery methods struggle to accommodate the excess zeros and overdispersion common in multivariate count data.
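To make "excess zeros" and "overdispersion" concrete, the following minimal sketch compares a plain Poisson distribution with a zero-inflated Poisson. The Poisson base, the parameter values, and all function names are illustrative assumptions and are not taken from the paper, which works with a broader distribution family.

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    """Probability of count x under a Poisson distribution with rate lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def zip_pmf(x: int, lam: float, pi: float) -> float:
    """Zero-inflated Poisson: with probability pi the count is a structural
    zero, otherwise it is drawn from Poisson(lam)."""
    base = (1.0 - pi) * poisson_pmf(x, lam)
    return pi + base if x == 0 else base


def mean_and_variance(pmf, support):
    """Mean and variance of a count distribution over a truncated support."""
    mean = sum(x * pmf(x) for x in support)
    var = sum((x - mean) ** 2 * pmf(x) for x in support)
    return mean, var


# Hypothetical parameters chosen only for illustration.
lam, pi = 3.0, 0.4
support = range(0, 100)  # tail mass beyond 99 is negligible for lam = 3

zip_mean, zip_var = mean_and_variance(lambda x: zip_pmf(x, lam, pi), support)
pois_mean, pois_var = mean_and_variance(lambda x: poisson_pmf(x, lam), support)

print(f"Poisson: P(0)={poisson_pmf(0, lam):.3f} mean={pois_mean:.2f} var={pois_var:.2f}")
print(f"ZIP:     P(0)={zip_pmf(0, lam, pi):.3f} mean={zip_mean:.2f} var={zip_var:.2f}")
```

For a Poisson variable the variance equals the mean, whereas the zero-inflated version has both a larger probability of zero and a variance that exceeds its mean. A model that assumes plain Poisson counts would therefore misfit such data.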

Purpose of the Study:

Propose a novel zero-inflated generalized hypergeometric directed acyclic graph (ZiG-DAG) model for causal inference from observational zero-inflated count data.
Develop a flexible framework capable of modeling diverse zero-inflated count data types and accommodating both linear and nonlinear causal relationships.

Main Methods:

The ZiG-DAG model leverages a generalized hypergeometric probability distribution family for flexible data modeling.
Causal structure identifiability is proven using a general technique applicable to count data.
Score-based algorithms are employed for efficient causal structure learning.
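As a rough orientation for the methods listed above, the standard textbook forms of a zero-inflated count distribution, a DAG factorization, and a decomposable score can be written as follows. The symbols (the mixing weight \pi, base distribution f with parameters \theta, parent set pa(j), and local score s) are generic notation assumed here; the paper's exact parameterization of the generalized hypergeometric family and its scoring criterion are not reproduced.

```latex
% Zero-inflated mixture: a point mass at zero mixed with a base count
% distribution f (assumed notation; f would be a member of the
% generalized hypergeometric family in the ZiG-DAG setting).
\[
P(X = x) = \pi \,\mathbb{1}\{x = 0\} + (1 - \pi)\, f(x;\theta),
\qquad x = 0, 1, 2, \dots, \quad 0 \le \pi \le 1 .
\]

% DAG factorization: each variable depends only on its parents in the graph.
\[
P(X_1, \dots, X_p) = \prod_{j=1}^{p} P\bigl(X_j \mid X_{\mathrm{pa}(j)}\bigr).
\]

% Decomposable score: a score-based search compares candidate graphs G
% by summing a local score over nodes.
\[
\mathrm{Score}(G) = \sum_{j=1}^{p} s\bigl(X_j, X_{\mathrm{pa}_G(j)}\bigr).
\]
```

Under this generic reading, a score-based algorithm searches over graphs G for one that optimizes the total score, and identifiability means that the true graph is uniquely determined by the joint distribution.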

Main Results:

The proposed ZiG-DAG model demonstrates superior performance in discovering causal structures from observational zero-inflated count data compared to state-of-the-art methods.
Extensive synthetic experiments and a real-world dataset with known ground truth validate the model's effectiveness.
The method successfully reverse-engineered a gene regulatory network from single-cell RNA-sequencing data, showcasing practical utility.

Conclusions:

The ZiG-DAG model offers a robust and flexible approach for causal discovery from complex zero-inflated count data.
The identifiability proof and developed algorithms provide a strong foundation for future causal inference research in this domain.
The model's application in bioinformatics highlights its potential for unraveling biological networks and driving scientific discovery.