Theoretical framework for the difference of two negative binomial distributions and its application in comparative analysis of sequencing data

  • Department of Biological and Biomedical Sciences, Rowan University, Glassboro, New Jersey 08028, USA.

Summary

This summary is machine-generated.

We introduce DEGage, a new method for detecting differentially expressed genes (DEGs) in single-cell RNA sequencing (scRNA-seq) data. DEGage outperforms existing tools, offering robust and sensitive analysis for high-throughput sequencing applications.

Area Of Science

  • Genomics
  • Computational Biology
  • Statistical Genetics

Background

  • High-throughput sequencing (HTS) is crucial for biological research at bulk and single-cell levels.
  • Comparative analysis of HTS data often uses the difference of two negative binomial distributions (DOTNB), but theoretical results are limited.
  • Existing methods for detecting differentially expressed genes (DEGs) in single-cell RNA sequencing (scRNA-seq) data have limitations.
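For orientation, the DOTNB can be written out as follows. This is a minimal sketch under assumptions the summary does not state: the two counts are taken to be independent, and each negative binomial is parameterized by a mean and a dispersion (size) parameter, with symbols chosen here for illustration only.

```latex
% Assumed setup (not given in the summary): X and Y independent,
% X ~ NB(mu_1, r_1), Y ~ NB(mu_2, r_2), with Var[X] = mu_1 + mu_1^2 / r_1.
\begin{align}
  Z &= X - Y, \qquad Z \in \{\ldots, -2, -1, 0, 1, 2, \ldots\} \\
  P(Z = z) &= \sum_{y = \max(0,\, -z)}^{\infty} P(X = y + z)\, P(Y = y) \\
  \mathbb{E}[Z] &= \mu_1 - \mu_2 \\
  \operatorname{Var}[Z] &= \mu_1 + \frac{\mu_1^2}{r_1} + \mu_2 + \frac{\mu_2^2}{r_2}
\end{align}
```

The mean and variance follow from linearity of expectation and from independence. The infinite sum has no simple elementary form in general, which is one reason theoretical results for this distribution are of interest.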

Purpose Of The Study

  • To derive theoretical results for DOTNB and examine its asymptotic properties.
  • To introduce DEGage, a novel computational method for DEG detection in scRNA-seq data.
  • To validate DEGage's performance against existing DEG analysis tools.

Main Methods

  • Derivation of basic analytical results and examination of asymptotic properties for DOTNB.
  • Development of DEGage, a computational tool utilizing DOTNB for DEG identification in scRNA-seq data.
  • Extensive validation using simulated and real scRNA-seq datasets, comparing DEGage with DESeq2, DEsingle, edgeR, Monocle3, and scDD.
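To make the central object concrete, the probability mass function of a difference of two negative binomial counts can be evaluated numerically by a truncated convolution. The sketch below is illustrative only and is not DEGage's implementation: the function names `nb_pmf` and `dotnb_pmf`, the mean/size parameterization, the independence assumption, and the truncation rule are all assumptions introduced here. Both means must be positive.

```python
import math

def nb_pmf(k, mean, size):
    """Negative binomial pmf at count k.

    Assumed parameterization: variance = mean + mean**2 / size, mean > 0.
    """
    if k < 0:
        return 0.0
    p = size / (size + mean)  # "success" probability in the classic form
    log_pmf = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
               + size * math.log(p) + k * math.log1p(-p))
    return math.exp(log_pmf)

def dotnb_pmf(z, mean1, size1, mean2, size2, tol=1e-12, max_terms=1_000_000):
    """P(X - Y = z) for independent X ~ NB(mean1, size1), Y ~ NB(mean2, size2).

    Sums P(X = y + z) * P(Y = y) over y, stopping once all but `tol`
    of Y's probability mass has been covered.
    """
    total = 0.0
    y_mass = 0.0  # running P(Y <= y), used only as the stopping rule
    for y in range(max_terms):
        p_y = nb_pmf(y, mean2, size2)
        total += nb_pmf(y + z, mean1, size1) * p_y  # zero when y + z < 0
        y_mass += p_y
        if y_mass >= 1.0 - tol:
            break
    return total

if __name__ == "__main__":
    # Hypothetical parameters chosen for illustration, not taken from the paper.
    for z in (-2, 0, 2):
        print(z, dotnb_pmf(z, mean1=5.0, size1=2.0, mean2=3.0, size2=2.0))
```

Working in log space with `math.lgamma` avoids overflow in the gamma-function ratio for large counts. Because every omitted term is at most the leftover probability of Y, the truncation error in each returned value is bounded by `tol`.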

Main Results

  • DEGage demonstrates superior performance compared to five popular DEG analysis tools.
  • The method is robust against high dropout rates and shows enhanced sensitivity for both balanced and imbalanced datasets, even with small sample sizes.
  • DEGage successfully identified marker genes in prostate cancer and potential memory-related genes in mouse neurons.

Conclusions

  • DEGage offers a powerful and reliable approach for DEG analysis in scRNA-seq data.
  • The theoretical advancements in DOTNB and the DEGage software have broad applicability for HTS data analysis.
  • This work facilitates comparative analyses of dispersed count data and addresses significant research questions in genomics and beyond.
