Related Concept Videos

One-Way ANOVA: Unequal Sample Sizes

One-way ANOVA can be performed on three or more samples of unequal sizes. However, the calculations become more involved when the sample sizes differ, so the following equation is used for ANOVA with unequal sample sizes:
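A minimal Python sketch of the textbook computation for unequal sample sizes follows. It assumes the usual definitions (a between-sample sum of squares in which each sample mean is weighted by its own size n_i, a within-sample sum of squares about each sample's own mean, and k − 1 and N − k degrees of freedom), which may be written in different notation in the video's equation. The function name and the data are hypothetical.

```python
from statistics import fmean

def one_way_anova_f(groups):
    """One-way ANOVA F statistic for samples that may have different sizes.

    Returns (F, df_between, df_within).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean(x for g in groups for x in g)
    # Between-sample sum of squares: each sample mean is weighted by its own size n_i.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-sample sum of squares: deviations from each sample's own mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical samples of sizes 3, 4 and 2.
f_stat, df_b, df_w = one_way_anova_f([[4.0, 5.0, 6.0], [6.0, 7.0, 8.0, 9.0], [9.0, 10.0]])
print(round(f_stat, 2), df_b, df_w)  # 10.16 2 6
```

Weighting by n_i is what distinguishes this from the equal-size shortcut: a large sample whose mean sits far from the grand mean contributes more to the variance between samples than a small one.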

One-Way ANOVA: Equal Sample Sizes

One-way ANOVA can be performed on three or more samples with equal or unequal sample sizes. When one-way ANOVA is performed on two datasets with samples of equal sizes, it is easy to see that the computed F statistic is highly sensitive to the sample means.
Different sample means can result in different values for one of the variance estimates, the variance between samples. This is because the variance between samples is calculated as the product of the sample size and the variance between the...
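The relationship described above (variance between samples = n × variance of the sample means) can be sketched for the equal-size case. This assumes the common textbook shortcut in which the variance within samples is the average of the individual sample variances; the function name and data are hypothetical.

```python
from statistics import fmean, variance

def anova_f_equal_n(samples):
    """F statistic for one-way ANOVA when every sample has the same size n."""
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("this shortcut requires equal sample sizes")
    means = [fmean(s) for s in samples]
    # Variance between samples: n times the variance of the sample means.
    var_between = n * variance(means)
    # Variance within samples: the average of the individual sample variances.
    var_within = fmean(variance(s) for s in samples)
    return var_between / var_within

print(anova_f_equal_n([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))     # 27.0
print(anova_f_equal_n([[1, 2, 3], [4, 5, 6], [10, 11, 12]]))  # 63.0
```

Only the third sample's mean changes between the two calls (its spread is identical), yet F rises from 27 to 63, which illustrates the sensitivity of F to the sample means.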

Statistical Inference Techniques in Hypothesis Testing: Parametric Versus Nonparametric Data

Statistical inference techniques, central to hypothesis testing, fall into two broad categories: parametric and nonparametric statistics.
Parametric statistics, as the name suggests, assume that the data follow a specific distribution, often the normal distribution. When this assumption holds, it enables efficient hypothesis testing and estimation. Parametric methods such as Student's t-test are frequently employed in biostatistics for this reason. For instance, comparing...
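To make the contrast concrete, the sketch below compares a parametric statistic (Student's pooled two-sample t, which assumes normal data with equal variances) with a nonparametric one (the Mann-Whitney U count, which uses only the ordering of the observations). The data are hypothetical; the point is that a single extreme value changes t substantially while leaving U unchanged.

```python
from math import sqrt
from statistics import fmean, variance

def pooled_t(x, y):
    """Student's two-sample t statistic; assumes normal data with equal variances."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (fmean(x) - fmean(y)) / sqrt(sp2 * (1 / nx + 1 / ny))

def mann_whitney_u(x, y):
    """Mann-Whitney U for x: pairs with x > y count 1, ties count 1/2.

    Uses only the ordering of the data, not a distributional assumption.
    """
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)

x = [1, 2, 3, 4, 5]
y = [6, 7, 8, 9, 10]
y_outlier = [6, 7, 8, 9, 100]  # same ordering relative to x, one extreme value
print(pooled_t(x, y), pooled_t(x, y_outlier))
print(mann_whitney_u(x, y), mann_whitney_u(x, y_outlier))  # 0.0 0.0
```

The outlier inflates the pooled variance, so the magnitude of t shrinks even though every value in y still exceeds every value in x.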

Friedman Two-way Analysis of Variance by Ranks

Friedman's Two-Way Analysis of Variance by Ranks is a nonparametric test designed to identify differences across multiple test attempts when the traditional assumptions of normality and equal variances do not hold. Unlike conventional ANOVA, which requires normally distributed data with equal variances, Friedman's test is well suited to ordinal or non-normally distributed data, making it particularly useful for analyzing dependent samples, such as matched subjects over time or repeated measures from...
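A minimal sketch of the Friedman statistic, assuming its standard form Q = 12 / (n k (k + 1)) × Σ R_j² − 3 n (k + 1), where n is the number of blocks (subjects), k is the number of conditions, and R_j is the rank sum of condition j after ranking within each block. The tie correction is omitted, and the function names and scores are hypothetical. For sufficiently large n, Q is usually compared with a chi-square distribution with k − 1 degrees of freedom.

```python
def average_ranks(values):
    """Rank values within one block (1 = smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values that starts at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for p in range(i, j + 1):
            ranks[order[p]] = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        i = j + 1
    return ranks

def friedman_statistic(blocks):
    """Friedman Q (no tie correction); blocks[s][c] = measurement of subject s under condition c."""
    n, k = len(blocks), len(blocks[0])
    rank_rows = [average_ranks(row) for row in blocks]
    rank_sums = [sum(row[c] for row in rank_rows) for c in range(k)]
    return 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)

# Four hypothetical subjects measured under three conditions, ranked identically each time.
scores = [[3.1, 4.0, 5.2], [2.8, 3.9, 6.0], [3.3, 3.5, 4.1], [2.5, 4.4, 4.9]]
print(friedman_statistic(scores))  # 8.0, the maximum n * (k - 1) for n = 4, k = 3
```

Because only within-subject ranks enter Q, any monotone rescaling of a subject's scores leaves the statistic unchanged, which is why the test suits ordinal and non-normal repeated measures.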

Factorial Design

Factorial Analysis is an experimental design that applies Analysis of Variance (ANOVA) statistical procedures to examine a change in a dependent variable due to more than one independent variable; the independent variables are also known as factors. For example, one might reason that worker productivity is influenced by salary and by other conditions, such as skill level. One way to test this hypothesis is to categorize salary into three levels (low, moderate, and high) and skill sets into two levels (entry level...
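For a balanced version of the salary-by-skill example, a two-way ANOVA partitions the total sum of squares into a salary effect, a skill effect, their interaction, and error. The sketch below assumes an equal number of workers in every cell; the productivity scores are invented for illustration, and the two skill levels are labeled only generically.

```python
from statistics import fmean

def two_way_ss(data):
    """Sums of squares for a balanced two-way (factorial) ANOVA.

    data[i][j] holds the replicate observations for level i of factor A
    (salary) and level j of factor B (skill); every cell needs the same count.
    """
    a, b, r = len(data), len(data[0]), len(data[0][0])
    cell = [[fmean(data[i][j]) for j in range(b)] for i in range(a)]
    grand = fmean(x for row in data for c in row for x in c)
    mean_a = [fmean(cell[i]) for i in range(a)]                       # salary means
    mean_b = [fmean(cell[i][j] for i in range(a)) for j in range(b)]  # skill means
    ss_a = b * r * sum((m - grand) ** 2 for m in mean_a)
    ss_b = a * r * sum((m - grand) ** 2 for m in mean_b)
    ss_ab = r * sum((cell[i][j] - mean_a[i] - mean_b[j] + grand) ** 2
                    for i in range(a) for j in range(b))
    ss_err = sum((x - cell[i][j]) ** 2
                 for i in range(a) for j in range(b) for x in data[i][j])
    return {"A": ss_a, "B": ss_b, "AB": ss_ab, "error": ss_err}

# Hypothetical productivity scores: rows = salary (low, moderate, high),
# columns = skill level 1, skill level 2, two workers per cell.
data = [
    [[10, 12], [14, 16]],
    [[13, 15], [18, 20]],
    [[16, 18], [23, 25]],
]
a, b, r = 3, 2, 2
ss = two_way_ss(data)
df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "error": a * b * (r - 1)}
ms_error = ss["error"] / df["error"]
for effect in ("A", "B", "AB"):
    print(effect, "F =", (ss[effect] / df[effect]) / ms_error)
```

Each effect's mean square is divided by the error mean square, giving separate F tests for salary, skill, and the salary-by-skill interaction.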

Introduction to Nonparametric Statistics

Nonparametric statistics offer a powerful alternative to traditional parametric methods, useful when assumptions about the population distribution cannot be made. Unlike parametric tests, which require data to follow a specific distribution with well-defined parameters (such as the mean and standard deviation), nonparametric tests do not require such constraints. This makes them particularly valuable when dealing with small sample sizes, skewed data, or ordinal and categorical variables.
One of...
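As a small example of a test that makes no assumption about the shape of the population distribution, the exact sign test uses only whether each paired difference is positive or negative, so it also applies to skewed or ordinal data. This is a minimal sketch; the function name and the differences are hypothetical.

```python
from math import comb

def sign_test_p(diffs):
    """Exact two-sided sign test on paired differences; zero differences are dropped."""
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    positives = sum(1 for d in nonzero if d > 0)
    tail = min(positives, n - positives)
    # Under H0 (median difference 0), the count of positive signs is Binomial(n, 1/2).
    p = 2 * sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, p)

# Hypothetical paired differences (after minus before); all ten are positive.
diffs = [0.4, 1.2, 0.3, 2.5, 0.8, 0.1, 1.9, 0.6, 3.1, 0.2]
print(sign_test_p(diffs))  # 0.001953125, i.e. 2 / 2**10
```

The magnitudes of the differences never enter the calculation, which is what frees the test from distributional assumptions, at the cost of some power when those assumptions would in fact hold.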

You might also read

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

Insights into intraspecific variation and genotyping of Ganoderma lingzhi through pan-mitogenome analysis.

IMA fungus·2026

Same author

Dynamics of Singlet Fission in the TIPS-Pn Cluster: Endothermic or Exothermic?

The journal of physical chemistry letters·2026

Same author

Comprehensive analysis of the chloroplast genome structure and phylogeny of Glochidion puberum (L.) Hutch.

Mitochondrial DNA. Part B, Resources·2026

Same author

Microwave digestion-ICP-MS coupled with molecular docking: unraveling elemental distribution and its correlation with glucose and fructose accumulation in 25 strawberry cultivars.

Food chemistry·2026

Same author

The complete chloroplast genome and phylogenetic analysis of Cephalanthus tetrandrus (Roxb.) Ridsdale & Bakh.f.

Mitochondrial DNA. Part B, Resources·2026

Same author

A near-complete, haplotype-resolved telomere-to-telomere genome assembly of Cannabis sativa reveals complex higher-order repetitive structures.

Plant communications·2026

Same journal

Heterogeneity-aware Clustered Distributed Learning for Multi-source Data Analysis.

Journal of machine learning research : JMLR·2026

Same journal

Unsupervised Tree Boosting for Learning Probability Distributions.

Journal of machine learning research : JMLR·2026

Same journal

A Two-Stage Penalized Least Squares Method for Constructing Large Systems of Structural Equations.

Journal of machine learning research : JMLR·2026

Same journal

Bayesian Multinomial Logistic Normal Models through Marginally Latent Matrix-T Processes.

Journal of machine learning research : JMLR·2026

Same journal

Multi-source Learning via Completion of Block-wise Overlapping Noisy Matrices.

Journal of machine learning research : JMLR·2026

Same journal

Deep Generative Models: Complexity, Dimensionality, and Approximation.

Journal of machine learning research : JMLR·2026

Related Experiment Video

Updated: Jun 23, 2026

A Psychophysics Paradigm for the Collection and Analysis of Similarity Judgments

Published on: March 1, 2022

Sparse Semiparametric Discriminant Analysis for High-dimensional Zero-inflated Data.

Hee Cheol Chung¹, Yang Ni², Irina Gaynanova³

¹Department of Mathematics and Statistics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA.

Journal of Machine Learning Research : JMLR

June 22, 2026

Summary

This summary is machine-generated.

We developed a new statistical method for analyzing complex biological data from sequencing technologies. Our approach accurately classifies samples, even with skewed and zero-inflated data, outperforming existing methods.

Keywords:

Latent Gaussian copula, probit regression, robust classification, sequencing data, skewed data, variable selection

More Related Videos

Assisted Selection of Biomarkers by Linear Discriminant Analysis Effect Size (LEfSe) in Microbiome Data

Published on: May 16, 2022

Area of Science:

Bioinformatics
Statistical Genetics
Computational Biology

Background:

Sequencing technologies generate high-dimensional biological data with inherent skewness and zero-inflation.
Linear classification methods face challenges due to violated distribution assumptions in such data.
The choice among existing data transformation methods is ambiguous and can affect model performance.

Purpose of the Study:

To propose a novel semiparametric framework for discriminant analysis that is robust to these data characteristics.
To address skewness and zero inflation in high-dimensional biological data.
To improve classification accuracy and interpretability for sequencing-based data.

Main Methods:

Developed a semiparametric framework using a truncated latent Gaussian copula model.
Incorporated L1 sparsity regularization for enhanced model interpretability.
Established theoretical consistency of classification directions in high-dimensional settings.
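The methods are only summarized here. A general property of latent Gaussian copula models is that they can be estimated from ranks (for example, through Kendall's tau), and ranks do not change under any strictly increasing transformation of the observed data; this is one way to see why such a model can be robust to the choice of data transformation. The sketch below is a hypothetical illustration under those assumptions, not the authors' estimator. It simulates zero-inflated data by truncating a monotonically transformed latent Gaussian pair (the exp transform, the thresholds, and the correlation are made up) and checks that Kendall's tau is the same for raw, log1p-transformed, and square-root-transformed values. The bridge function that maps tau to the latent correlation and the L1-penalized discriminant step are not shown.

```python
import math
import random

def simulate_truncated(n, rho, thresholds, seed=0):
    """Hypothetical two-variable example of a truncated latent Gaussian copula.

    Latent (Z1, Z2) is bivariate standard normal with correlation rho.
    Each observed X_j equals exp(Z_j) if Z_j exceeds its threshold, else 0.
    exp() stands in for an unknown monotone transformation f_j.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        rows.append(tuple(math.exp(z) if z > c else 0.0
                          for z, c in zip((z1, z2), thresholds)))
    return rows

def sign(v):
    return (v > 0) - (v < 0)

def kendall_tau_a(rows):
    """Kendall's tau-a: (concordant - discordant pairs) / all pairs; ties count 0."""
    n = len(rows)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            s += sign(rows[i][0] - rows[j][0]) * sign(rows[i][1] - rows[j][1])
    return s / (n * (n - 1) / 2)

data = simulate_truncated(200, rho=0.6, thresholds=(0.0, 0.5))
tau_raw = kendall_tau_a(data)
tau_log = kendall_tau_a([(math.log1p(x), math.log1p(y)) for x, y in data])
tau_sqrt = kendall_tau_a([(math.sqrt(x), math.sqrt(y)) for x, y in data])
print(tau_raw, tau_log, tau_sqrt)  # rank-based, so all three should agree
```

Both log1p and the square root map 0 to 0 and preserve order, so the zero-inflated ties and every pairwise ordering survive the transformation, and any rank-based estimate built on tau is unaffected by which of them is applied.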

Main Results:

The proposed model effectively handles skewed and zero-inflated data.
Demonstrated robustness against various data transformation methods.
Achieved superior classification accuracy compared to existing approaches.

Conclusions:

The novel framework offers a robust and interpretable solution for discriminant analysis of high-dimensional biological data.
The method shows promise for applications in microbiome, cancer genomics, and single-cell RNA sequencing.
This approach overcomes limitations of traditional methods when dealing with complex biological datasets.