Search research articles

ABOUT JoVE

Overview Leadership Blog JoVE Help Center

AUTHORS

Publishing Process Editorial Board Scope & Policies Peer Review FAQ Submit

LIBRARIANS

Testimonials Subscriptions Access Resources Library Advisory Board FAQ

RESEARCH

JoVE Journal Methods Collections JoVE Encyclopedia of Experiments Archive

EDUCATION

JoVE Core JoVE Business JoVE Science Education JoVE Lab Manual Faculty Resource Center Faculty Site

Terms & Conditions of Use

Related Concept Videos

Expected Frequencies in Goodness-of-Fit Tests

Expected Frequencies in Goodness-of-Fit Tests

A goodness-of-fit test is conducted to determine whether the observed frequency values are statistically similar to the frequencies expected for the dataset. Suppose the expected frequencies for a dataset are equal such as when predicting the frequency of any number appearing when casting a die. In that case, the expected frequency is the ratio of the total number of observations (n) to the number of categories (k).

Quantifying and Rejecting Outliers: The Grubbs Test

Quantifying and Rejecting Outliers: The Grubbs Test

Sometimes, a data set can have a recorded numerical observation that greatly deviates from the rest of the data. Assuming that the data is normally distributed, a statistical method called the Grubbs test can be used to determine whether the observation is truly an outlier. To perform a two-tailed Grubbs test, first, calculate the absolute difference between the outlier and the mean. Then, calculate the ratio between this difference and the standard deviation of the sample. This...

Unusual Results

Unusual Results

Unusual results are those that have a very low chance of occurring. Unusual results can be identified using probabilities and the range rule of thumb. In problems involving probability, unusual results can be observed in 2 instances – an unusually high number of successes or an unusually low number of successes.
According to the range rule of thumb, any value above or below two standard deviations, 2σ from the mean, μ is considered unusual.
Maximum unusual value =...

Wald-Wolfowitz Runs Test I

Wald-Wolfowitz Runs Test I

The Wald-Wolfowitz test, also known as the runs test, is a nonparametric statistical test used to assess the randomness of a sequence of two different types of elements (e.g., positive/negative values, successes/failures). It examines whether the order of the elements in a sequence is random or if there is a pattern or trend present. This nonparametric test applies to any ordered data despite the population and sample data distribution, even if a higher sample size is available.
The test works...

Cluster Sampling Method

Cluster Sampling Method

Appropriate sampling methods ensure that samples are drawn without bias and accurately represent the population. Because measuring the entire population in a study is not practical, researchers use samples to represent the population of interest.
To choose a cluster sample, divide the population into clusters (groups) and then randomly select some of the clusters. All the members from these clusters are in the cluster sample. For example, if you randomly sample four departments from your...

Multi-species Conserved Sequences

Multi-species Conserved Sequences

Next-generation sequencing technologies have created large genomic databases of a variety of animals and plants. Ever since the human genome project was completed, scientists studied the genome of primates, mammals, and other phylogenetically distant living beings. Such large-scale studies have provided new insights into the evolutionary relationship between organisms.
Although the genome of each species varies greatly from each other, a few sequences are highly conserved. Such conserved...

You might also read

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Sort by

Same author

A Kernelization Algorithm for Finding a Perfect Phylogeny From Mixed Tumor Samples.

IEEE transactions on computational biology and bioinformatics·2025

Same author

Faster Algorithms for Constructing Frequency Difference Consensus Trees.

IEEE transactions on computational biology and bioinformatics·2025

Same author

Fast Algorithms for the Simplified Partial Digest Problem.

Journal of computational biology : a journal of computational molecular cell biology·2022

Same author

A Faster Algorithm for Computing the Kernel of Maximum Agreement Subtrees.

IEEE/ACM transactions on computational biology and bioinformatics·2019

Same author

Fast Algorithms for Computing Path-Difference Distances.

IEEE/ACM transactions on computational biology and bioinformatics·2018

Same author

Constructing a Gene Team Tree in Almost O (n lg n) Time.

IEEE/ACM transactions on computational biology and bioinformatics·2015

Same journal

circ2DGNN: circRNA-Disease Association Prediction via Transformer-Based Graph Neural Network.

IEEE/ACM transactions on computational biology and bioinformatics·2024

Same journal

Hierarchical Hypergraph Learning in Association- Weighted Heterogeneous Network for miRNA- Disease Association Identification.

IEEE/ACM transactions on computational biology and bioinformatics·2024

Same journal

Discriminative Domain Adaption Network for Simultaneously Removing Batch Effects and Annotating Cell Types in Single-Cell RNA-Seq.

IEEE/ACM transactions on computational biology and bioinformatics·2024

Same journal

MLW-BFECF: A Multi-Weighted Dynamic Cascade Forest Based on Bilinear Feature Extraction for Predicting the Stage of Kidney Renal Clear Cell Carcinoma on Multi-Modal Gene Data.

IEEE/ACM transactions on computational biology and bioinformatics·2024

Same journal

An End-to-End Knowledge Graph Fused Graph Neural Network for Accurate Protein-Protein Interactions Prediction.

IEEE/ACM transactions on computational biology and bioinformatics·2024

Same journal

Generative Biomedical Event Extraction With Constrained Decoding Strategy.

IEEE/ACM transactions on computational biology and bioinformatics·2024

See all related articles

Search research articles

Related Experiment Video

Updated: Mar 10, 2026

Selecting Multiple Biomarker Subsets with Similarly Effective Binary Classification Performances

Selecting Multiple Biomarker Subsets with Similarly Effective Binary Classification Performances

Published on: October 11, 2018

A New Efficient Algorithm for the Frequent Gene Team Problem.

Biing-Feng Wang

IEEE/ACM Transactions on Computational Biology and Bioinformatics

|December 14, 2016

Summary

This summary is machine-generated.

A new algorithm efficiently solves the frequent gene team problem by identifying gene teams across multiple genomes. This method is significantly faster than previous approaches, especially for large datasets, enabling large-scale genome analysis.

More Related Videos

Cloud-Based Phrase Mining and Analysis of User-Defined Phrase-Category Association in Biomedical Publications

Cloud-Based Phrase Mining and Analysis of User-Defined Phrase-Category Association in Biomedical Publications

Published on: February 23, 2019

Heuristic Mining of Hierarchical Genotypes and Accessory Genome Loci in Bacterial Populations

Heuristic Mining of Hierarchical Genotypes and Accessory Genome Loci in Bacterial Populations

Published on: December 7, 2021

Related Experiment Videos

Last Updated: Mar 10, 2026

Selecting Multiple Biomarker Subsets with Similarly Effective Binary Classification Performances

Selecting Multiple Biomarker Subsets with Similarly Effective Binary Classification Performances

Published on: October 11, 2018

Cloud-Based Phrase Mining and Analysis of User-Defined Phrase-Category Association in Biomedical Publications

Cloud-Based Phrase Mining and Analysis of User-Defined Phrase-Category Association in Biomedical Publications

Published on: February 23, 2019

Heuristic Mining of Hierarchical Genotypes and Accessory Genome Loci in Bacterial Populations

Heuristic Mining of Hierarchical Genotypes and Accessory Genome Loci in Bacterial Populations

Published on: December 7, 2021

Area of Science:

Bioinformatics
Computational Biology
Genomics

Background:

The frequent gene team problem involves identifying gene teams present in a minimum number (quorum, μ) of m genomes.
Existing algorithms are often inefficient for large quorum values.
There is a need for scalable algorithms to analyze large genomic datasets.

Purpose of the Study:

To present a novel algorithm for the frequent gene team problem.
To develop an algorithm with time complexity independent of the quorum parameter μ.
To provide an efficient tool for large-scale genome analyses.

Main Methods:

A new algorithm is introduced that avoids examining all combinations of μ genomes.
The algorithm's time complexity is independent of μ.
Theoretical runtime is estimated as O(n) under realistic assumptions, where n is the maximum genome length.

Main Results:

The algorithm demonstrates extreme efficiency in experimental tests.
Processing 100 bacterial genomes takes under 1 second for any μ.
Processing 2,000 genomes takes approximately 10 minutes.

Conclusions:

The presented algorithm offers a significant improvement for solving the frequent gene team problem.
Its efficiency makes it suitable for large-scale genomic data analysis.
This tool can advance research in comparative genomics and evolutionary biology.