Related Concept Videos

One-Way ANOVA: Unequal Sample Sizes

One-way ANOVA can be performed on three or more samples of unequal sizes. However, the calculations become more complicated when the sample sizes differ, so ANOVA with unequal sample sizes uses the following equation:
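For reference, the standard one-way ANOVA F statistic for k samples with sizes n_1, ..., n_k weights each sample's squared deviation from the grand mean by that sample's own size: F = [Σ n_i (x̄_i − x̄)² / (k − 1)] / [Σ_i Σ_j (x_ij − x̄_i)² / (N − k)], where N is the total number of observations. The plain-Python sketch below computes this statistic. The function name and the sample values are illustrative assumptions.

```python
from statistics import fmean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for k samples of possibly unequal sizes."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean([x for g in groups for x in g])
    means = [fmean(g) for g in groups]
    # Between-samples sum of squares: each squared deviation is weighted by its own n_i.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-samples sum of squares: deviations from each sample's own mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Illustrative (made-up) samples of sizes 3, 4 and 2.
f_stat, df_b, df_w = one_way_anova([[5, 7, 6], [8, 9, 10, 9], [4, 5]])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")  # F(2, 6) = 21.00
```

When all n_i are equal, the weight becomes a common factor n. That is why the unequal-size case needs the per-sample weights.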
One-Way ANOVA: Equal Sample Sizes

One-Way ANOVA can be performed on three or more samples with equal or unequal sample sizes. When one-way ANOVA is performed on two datasets whose samples have equal sizes, it is easy to see that the computed F statistic is highly sensitive to the sample means.
Different sample means can produce different values for one of the variance estimates, the variance between samples. This is because the variance between samples is calculated as the product of the sample size and the variance between the...
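This sensitivity can be shown numerically. With a common sample size n, the variance between samples is n times the variance of the sample means, and the variance within samples is the average of the sample variances. In the made-up datasets below, every sample has the same spread, and only the third mean differs between the two datasets. The function name is a hypothetical choice for this sketch.

```python
from statistics import fmean, variance


def f_equal_sizes(groups):
    """F statistic for k samples that all share the same size n."""
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this shortcut assumes equal sample sizes")
    means = [fmean(g) for g in groups]
    # Variance between samples: n times the variance of the k sample means.
    var_between = n * variance(means)
    # Variance within samples: the mean of the k sample variances.
    var_within = fmean([variance(g) for g in groups])
    return var_between / var_within


same_spread = [[5, 6, 7], [6, 7, 8], [7, 8, 9]]
shifted = [[5, 6, 7], [6, 7, 8], [10, 11, 12]]  # only the third mean moves (+3)
print(f_equal_sizes(same_spread), f_equal_sizes(shifted))  # 3.0 21.0
```

The variance within samples is 1 in both cases. Shifting a single mean therefore raises F from 3 to 21.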
Statistical Inference Techniques in Hypothesis Testing: Parametric Versus Nonparametric Data

Statistical inference techniques, which are central to hypothesis testing, fall into two broad categories: parametric and nonparametric statistics.
Parametric statistics, as the name suggests, assume that the data follow a specific distribution, often a normal distribution. This assumption enables efficient hypothesis testing and estimation. Parametric methods, such as Student's t-test or the goodness-of-fit test, are frequently employed in biostatistics because they are powerful when their assumptions hold. For instance, comparing...
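To make the parametric idea concrete, consider the pooled two-sample Student's t statistic. It reduces each sample to estimated parameters (a mean and a variance), and it relies on the normality and equal-variance assumptions for its reference t distribution. The plain-Python sketch below uses made-up data, and the function name is illustrative.

```python
from math import sqrt
from statistics import fmean, variance


def student_t(x, y):
    """Two-sample Student's t statistic with pooled variance.

    Parametric: each sample is summarized by its mean and variance, and the
    populations are assumed normal with a common variance.
    """
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    t = (fmean(x) - fmean(y)) / sqrt(sp2 * (1 / nx + 1 / ny))
    return t, df


t, df = student_t([4, 5, 6], [7, 8, 9])
print(f"t = {t:.3f} with {df} degrees of freedom")  # t = -3.674 with 4 degrees of freedom
```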
Friedman Two-way Analysis of Variance by Ranks

Friedman's Two-Way Analysis of Variance by Ranks is a nonparametric test designed to identify differences across multiple test attempts when traditional assumptions of normality and equal variances do not apply. Unlike conventional ANOVA, which requires normally distributed data with equal variances, Friedman's test is ideal for ordinal or non-normally distributed data, making it particularly useful for analyzing dependent samples, such as matched subjects over time or repeated measures from...
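Friedman's statistic ranks the k measurements within each subject (block) and sums the ranks for each treatment, giving rank sums R_1, ..., R_k. It then computes Q = 12 / (n k (k + 1)) Σ R_j² − 3 n (k + 1). For large n, Q is compared with a chi-square distribution on k − 1 degrees of freedom. The minimal plain-Python sketch below does not apply a tie correction, and the scores are made up.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend over a run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def friedman_statistic(blocks):
    """Friedman chi-square statistic without tie correction.

    blocks: n rows (subjects), each holding k measurements (attempts/treatments).
    """
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        # Rank within each subject only; subjects are never compared directly.
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12 / (n * k * (k + 1)) * sum(R * R for R in rank_sums) - 3 * n * (k + 1)


# Hypothetical scores: 4 subjects (rows) measured on 3 attempts (columns).
scores = [[10, 12, 15], [8, 11, 14], [9, 13, 16], [7, 9, 12]]
print(friedman_statistic(scores))  # 8.0, compared with chi-square on k - 1 = 2 df
```

Ranking within rows is what lets the test handle repeated measures from the same subject. No assumption about the scale or distribution of the raw scores is needed.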
Factorial Design

Factorial analysis is an experimental design that applies Analysis of Variance (ANOVA) procedures to examine changes in a dependent variable caused by more than one independent variable, also known as factors. For example, changes in worker productivity may be influenced by salary and by other conditions, such as skill level. One way to test this hypothesis is to categorize salary into three levels (low, moderate, and high) and skill sets into two levels (entry level...
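A balanced factorial design can be analyzed with a two-way ANOVA. This analysis splits the total sum of squares into a main effect for each factor, their interaction, and error. The sketch below assumes the same number r of replicates in every cell. Its 3 × 2 layout mirrors the salary-by-skill example above, but the productivity numbers are made up, and so is the label of the second skill level.

```python
from statistics import fmean


def two_way_anova(cells):
    """Balanced two-factor ANOVA.

    cells[i][j] holds the r replicates at level i of factor A and level j of
    factor B. Returns {effect: (SS, df, F)}, with F = None for the error row.
    """
    a, b, r = len(cells), len(cells[0]), len(cells[0][0])
    grand = fmean([x for row in cells for cell in row for x in cell])
    cm = [[fmean(cell) for cell in row] for row in cells]         # cell means
    am = [fmean(row) for row in cm]                               # factor A level means
    bm = [fmean([cm[i][j] for i in range(a)]) for j in range(b)]  # factor B level means
    ss = {
        "A": b * r * sum((m - grand) ** 2 for m in am),
        "B": a * r * sum((m - grand) ** 2 for m in bm),
        # Interaction: what remains of each cell mean after both main effects.
        "AxB": r * sum((cm[i][j] - am[i] - bm[j] + grand) ** 2
                       for i in range(a) for j in range(b)),
        "error": sum((x - cm[i][j]) ** 2
                     for i in range(a) for j in range(b) for x in cells[i][j]),
    }
    df = {"A": a - 1, "B": b - 1, "AxB": (a - 1) * (b - 1), "error": a * b * (r - 1)}
    ms_error = ss["error"] / df["error"]
    return {k: (ss[k], df[k], None if k == "error" else ss[k] / df[k] / ms_error)
            for k in ss}


# Hypothetical productivity scores, r = 2 workers per cell.
# Factor A = salary (low, moderate, high); factor B = skill (entry level, level 2).
cells = [
    [[10, 12], [14, 16]],  # low salary
    [[13, 15], [18, 20]],  # moderate salary
    [[15, 17], [22, 24]],  # high salary
]
for effect, (ss_val, df_val, f_val) in two_way_anova(cells).items():
    print(effect, round(ss_val, 3), df_val, f_val)
```

Each F is compared with an F distribution on (df_effect, df_error) degrees of freedom. Because the design is balanced, the four sums of squares add up exactly to the total sum of squares.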
Introduction to Nonparametric Statistics

Nonparametric statistics offer a powerful alternative to traditional parametric methods, useful when assumptions about the population distribution cannot be made. Unlike parametric tests, which require data to follow a specific distribution with well-defined parameters (such as the mean and standard deviation), nonparametric tests do not require such constraints. This makes them particularly valuable when dealing with small sample sizes, skewed data, or ordinal and categorical variables.
One of...
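Many nonparametric tests replace observations with their ranks. A minimal plain-Python sketch shows why this avoids distributional assumptions. Ranks depend only on the order of the values, so any strictly increasing transformation, such as a logarithm applied to skewed data, leaves them unchanged. The extreme value 47.0 also carries no more weight than "largest". The data and function name are illustrative.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


skewed = [0.5, 1.2, 3.0, 47.0, 2.2]     # made-up, heavily right-skewed sample
logged = [math.log(v) for v in skewed]  # a monotone transformation of the same data
print(average_ranks(skewed))                           # [1.0, 2.0, 4.0, 5.0, 3.0]
print(average_ranks(skewed) == average_ranks(logged))  # True
```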

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author:
  • Insights into intraspecific variation and genotyping of Ganoderma lingzhi through pan-mitogenome analysis. IMA fungus, 2026.
  • Dynamics of Singlet Fission in the TIPS-Pn Cluster: Endothermic or Exothermic? The journal of physical chemistry letters, 2026.
  • Comprehensive analysis of the chloroplast genome structure and phylogeny of Glochidion puberum (L.) Hutch. Mitochondrial DNA. Part B, Resources, 2026.
  • Microwave digestion-ICP-MS coupled with molecular docking: unraveling elemental distribution and its correlation with glucose and fructose accumulation in 25 strawberry cultivars. Food chemistry, 2026.
  • The complete chloroplast genome and phylogenetic analysis of Cephalanthus tetrandrus (Roxb.) Ridsdale & Bakh.f. Mitochondrial DNA. Part B, Resources, 2026.
  • A near-complete, haplotype-resolved telomere-to-telomere genome assembly of Cannabis sativa reveals complex higher-order repetitive structures. Plant communications, 2026.

Same journal:
  • Heterogeneity-aware Clustered Distributed Learning for Multi-source Data Analysis. Journal of Machine Learning Research: JMLR, 2026.
  • Unsupervised Tree Boosting for Learning Probability Distributions. Journal of Machine Learning Research: JMLR, 2026.
  • A Two-Stage Penalized Least Squares Method for Constructing Large Systems of Structural Equations. Journal of Machine Learning Research: JMLR, 2026.
  • Bayesian Multinomial Logistic Normal Models through Marginally Latent Matrix-T Processes. Journal of Machine Learning Research: JMLR, 2026.
  • Multi-source Learning via Completion of Block-wise Overlapping Noisy Matrices. Journal of Machine Learning Research: JMLR, 2026.
  • Deep Generative Models: Complexity, Dimensionality, and Approximation. Journal of Machine Learning Research: JMLR, 2026.

Related Experiment Video

Updated: Jun 23, 2026

A Psychophysics Paradigm for the Collection and Analysis of Similarity Judgments (08:12)

Published on: March 1, 2022

Sparse Semiparametric Discriminant Analysis for High-dimensional Zero-inflated Data.

Hee Cheol Chung¹, Yang Ni², Irina Gaynanova³

  • ¹Department of Mathematics and Statistics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA.

Journal of Machine Learning Research: JMLR | June 22, 2026 | PubMed
Summary
This summary is machine-generated.

We developed a new statistical method for analyzing complex biological data from sequencing technologies. Our approach accurately classifies samples, even with skewed and zero-inflated data, outperforming existing methods.

Keywords:
Latent Gaussian copula; probit regression; robust classification; sequencing data; skewed data; variable selection

More Related Videos

Assisted Selection of Biomarkers by Linear Discriminant Analysis Effect Size (LEfSe) in Microbiome Data (04:57)

Published on: May 16, 2022

Area of Science:

  • Bioinformatics
  • Statistical Genetics
  • Computational Biology

Background:

  • Sequencing technologies generate high-dimensional biological data with inherent skewness and zero-inflation.
  • Linear classification methods face challenges due to violated distribution assumptions in such data.
  • Existing approaches rely on data transformations, but the choice of transformation is ambiguous and affects model performance.
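The combination of skewness and zero inflation described above can be mimicked with a truncated latent Gaussian model of the kind named under Main Methods. The simplified single-variable sketch below assumes that the observed value is f(Z) when a latent standard normal Z exceeds a threshold Δ, and exactly 0 otherwise. Under that assumption, the fraction of zeros estimates Φ(Δ). The choices f = exp and Δ = 0.5 are illustrative and are not taken from the paper.

```python
import math
import random
from statistics import NormalDist


def simulate_truncated(n, delta, seed=0):
    """Hypothetical truncated latent Gaussian marginal.

    Latent z ~ N(0, 1). The observed value is exp(z), a skewed monotone
    transform, when z > delta, and exactly 0 otherwise (zero inflation).
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        draws.append(math.exp(z) if z > delta else 0.0)
    return draws


delta = 0.5
x = simulate_truncated(20_000, delta)
zero_frac = sum(v == 0.0 for v in x) / len(x)
# Under this model P(X = 0) = P(Z <= delta) = Phi(delta), so the threshold can be
# recovered from the observed fraction of zeros.
delta_hat = NormalDist().inv_cdf(zero_frac)
print(f"zero fraction {zero_frac:.3f}, Phi(delta) {NormalDist().cdf(delta):.3f}")
print(f"delta = {delta}, delta_hat = {delta_hat:.3f}")
```

The nonzero values are right-skewed, and roughly 69% of the draws are zero. A linear method that assumes Gaussian features would fit this pattern poorly.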

Purpose of the Study:

  • To propose a novel semiparametric framework for discriminant analysis that is robust to the distributional characteristics of sequencing data.
  • To address skewness and zero inflation in high-dimensional biological data.
  • To improve classification accuracy and interpretability for sequencing-based data.

Main Methods:

  • Developed a semiparametric framework using a truncated latent Gaussian copula model.
  • Incorporated L1 sparsity regularization for enhanced model interpretability.
  • Established theoretical consistency of classification directions in high-dimensional settings.
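The L1 sparsity regularization can be illustrated with one common convex formulation of sparse linear discriminant analysis: minimize ½ βᵀΣβ − δᵀβ + λ‖β‖₁, where δ = μ1 − μ0. The sketch below solves this by coordinate descent with soft-thresholding. This objective is an assumption, because the summary does not state the paper's exact objective. In the paper's setting, Σ and δ would come from the truncated latent copula model rather than being supplied by hand, and how the authors estimate them is not stated here. The function names and inputs are hypothetical.

```python
def soft_threshold(v, lam):
    """S(v, lam) = sign(v) * max(|v| - lam, 0), the proximal step for an L1 penalty."""
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0


def sparse_lda_direction(sigma, mean_diff, lam, n_iter=1000, tol=1e-10):
    """Coordinate descent for min_b 0.5 * b' Sigma b - mean_diff' b + lam * ||b||_1.

    sigma: p x p positive definite matrix (list of lists); mean_diff: mu1 - mu0.
    """
    p = len(mean_diff)
    beta = [0.0] * p
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Partial residual: mean_diff[j] minus the pull of the other coordinates.
            r = mean_diff[j] - sum(sigma[j][k] * beta[k] for k in range(p) if k != j)
            updated = soft_threshold(r, lam) / sigma[j][j]
            max_change = max(max_change, abs(updated - beta[j]))
            beta[j] = updated
        if max_change < tol:
            break
    return beta


def classify(x, mu0, mu1, beta):
    """Return 1 if x projects past the midpoint of the class means (equal priors)."""
    score = sum((xi - (m0 + m1) / 2) * b for xi, m0, m1, b in zip(x, mu0, mu1, beta))
    return 1 if score > 0 else 0


sigma = [[1.0, 0.5], [0.5, 1.0]]  # hypothetical latent correlation matrix
mean_diff = [1.0, 0.2]            # hypothetical mu1 - mu0
print([round(b, 6) for b in sparse_lda_direction(sigma, mean_diff, lam=0.1)])  # [1.0, -0.2]
print([round(b, 6) for b in sparse_lda_direction(sigma, mean_diff, lam=0.3)])  # [0.7, 0.0]
```

Raising λ from 0.1 to 0.3 sets the second coordinate exactly to zero. This is the variable-selection effect that makes the resulting discriminant direction easier to interpret.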

Main Results:

  • The proposed model effectively handles skewed and zero-inflated data.
  • Demonstrated robustness to the choice of data transformation.
  • Achieved superior classification accuracy compared to existing approaches.

Conclusions:

  • The novel framework offers a robust and interpretable solution for discriminant analysis of high-dimensional biological data.
  • The method shows promise for applications in microbiome research, cancer genomics, and single-cell RNA sequencing.
  • This approach overcomes limitations of traditional methods when dealing with complex biological datasets.