ntCard: a streaming algorithm for cardinality estimation in genomics data.

Hamid Mohamadi¹,², Hamza Khan¹,², Inanc Birol¹,²

¹Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, V5Z 4S6, Canada.

Bioinformatics (Oxford, England)

April 29, 2017

Summary

This summary is machine-generated.

ntCard is a new streaming algorithm that efficiently estimates k-mer frequencies in large genomics datasets. It offers faster and more accurate k-mer analysis for bioinformatics tools.

Area of Science:

Bioinformatics
Computational Biology
Genomics

Background:

Many bioinformatics algorithms analyze k-mers, that is, sequences of a uniform length k.
Efficiently calculating k-mer frequencies is crucial for downstream analysis, genome size estimation, and error rate measurement.
Computing k-mer frequencies from large-scale sequencing data remains challenging for current methods.
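
To make the idea of a k-mer frequency histogram concrete, the following is a minimal Python sketch of the exact, in-memory computation that becomes expensive on large datasets. The function names and the toy reads are illustrative assumptions and are not taken from ntCard; the sketch also treats each strand separately, without merging reverse complements.

```python
from collections import Counter


def kmer_counts(sequences, k):
    """Count every k-mer (substring of length k) across the input sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def kmer_histogram(counts):
    """Map multiplicity m to the number of distinct k-mers seen exactly m times."""
    return Counter(counts.values())


# Toy input (hypothetical reads, for illustration only).
reads = ["ACGTACGT", "CGTAAA"]
counts = kmer_counts(reads, k=4)
hist = kmer_histogram(counts)

print(len(counts))   # number of distinct k-mers
print(dict(hist))    # multiplicity -> number of distinct k-mers
```

Because this approach stores every distinct k-mer, its memory use grows with the number of distinct k-mers in the data, which is the cost a streaming estimator aims to avoid.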

Purpose of the Study:

To present ntCard, a novel streaming algorithm for estimating k-mer frequencies in genomics datasets.
To provide an efficient and accurate tool for k-mer histogram generation.
To enable large-scale genomics applications through improved k-mer analysis.

Main Methods:

ntCard utilizes the ntHash algorithm for efficient k-mer hashing.
It samples the hashed k-mers and builds a reduced-representation multiplicity table that describes the sample distribution.
A statistical model reconstructs the population distribution from the sampled data.
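
The three steps above can be sketched in Python under heavy simplifying assumptions. This is not ntCard's implementation: BLAKE2b from the standard library stands in for ntHash, a dictionary keyed by the full hash stands in for the reduced multiplicity table, and a plain scale-up by the sampling rate stands in for the statistical model. The names `hash64`, `estimate_histogram`, and `sample_bits` are hypothetical.

```python
import hashlib
import random
from collections import Counter


def hash64(kmer):
    """64-bit hash of a k-mer (BLAKE2b stand-in; ntCard itself uses ntHash)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def estimate_histogram(sequences, k, sample_bits):
    """Estimate the k-mer multiplicity histogram from a hash-based sample.

    A k-mer is kept only if the top `sample_bits` bits of its hash are zero,
    so roughly 1 in 2**sample_bits distinct k-mers is sampled, and every
    occurrence of a sampled k-mer is counted.
    """
    shift = 64 - sample_bits
    sampled = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            h = hash64(seq[i:i + k])
            if h >> shift == 0:
                sampled[h] += 1
    # Naive reconstruction (assumption): scale each sampled bin by the
    # inverse of the sampling rate.
    scale = 2 ** sample_bits
    sample_hist = Counter(sampled.values())
    return {m: n * scale for m, n in sample_hist.items()}


# Usage on a synthetic random sequence (illustrative data only).
random.seed(0)
genome = "".join(random.choice("ACGT") for _ in range(20000))
est = estimate_histogram([genome], k=21, sample_bits=3)
distinct_estimate = sum(est.values())
print(distinct_estimate)
```

Sampling by hash value rather than by position means every occurrence of a chosen k-mer is counted, so the multiplicities in the sample are exact and only the number of k-mers per multiplicity has to be extrapolated. With `sample_bits=0` the sketch keeps every k-mer and reduces to exact counting.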

Main Results:

In the authors' benchmarks, ntCard estimates k-mer frequencies more than 15x faster than state-of-the-art algorithms.
It does so with higher accuracy while using a comparable amount of memory.
Performance was validated on large datasets (up to 2.4 TB) from human and white spruce genomes.

Conclusions:

ntCard is a highly efficient and accurate tool for estimating k-mer frequencies in large genomics datasets.
It represents a potentially enabling technology for large-scale genomics applications.
The algorithm is implemented in C++ and freely available under the GPL license.