ntCard: a streaming algorithm for cardinality estimation in genomics data.

Hamid Mohamadi1,2, Hamza Khan1,2, Inanc Birol1,2

  • 1Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, V5Z 4S6, Canada.

Bioinformatics (Oxford, England) | April 29, 2017
Summary
This summary is machine-generated.

ntCard is a streaming algorithm that estimates k-mer frequencies in large genomics datasets. It runs more than 15 times faster than state-of-the-art algorithms while using a comparable amount of memory, making k-mer analysis practical for bioinformatics tools that work at scale.

Area of Science:

  • Bioinformatics
  • Computational Biology
  • Genomics

Background:

  • Many bioinformatics algorithms operate on k-mers, which are subsequences of a fixed length k.
  • Efficiently calculating k-mer frequencies is crucial for downstream analysis, genome size estimation, and error rate measurement.
  • Existing methods for computing k-mer frequencies scale poorly to large sequencing datasets.
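
To make the terms above concrete, here is a minimal sketch of exact k-mer counting and the frequency histogram that tools like ntCard aim to estimate. It is an illustration only, not code from ntCard: the function names `kmer_counts` and `frequency_histogram` and the toy reads are assumptions, and the sketch ignores reverse complements and ambiguous bases.

```python
from collections import Counter


def kmer_counts(sequences, k):
    """Count every k-mer (substring of length k) across the input sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # A sequence of length n contains n - k + 1 overlapping k-mers.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def frequency_histogram(counts):
    """Map multiplicity i to the number of distinct k-mers seen exactly i times."""
    return Counter(counts.values())


# Toy input (hypothetical reads, chosen only for illustration).
reads = ["ACGTACGT", "CGTACG"]
counts = kmer_counts(reads, k=4)
hist = frequency_histogram(counts)

distinct = len(counts)        # number of distinct k-mers
total = sum(counts.values())  # number of k-mer occurrences

print(dict(hist), distinct, total)
```

Exact counting of this kind keeps one table entry per distinct k-mer, which is the memory cost that becomes a problem on large datasets and that an estimation approach tries to avoid.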

Purpose of the Study:

  • To present ntCard, a novel streaming algorithm for estimating k-mer frequencies in genomics datasets.
  • To provide an efficient and accurate tool for k-mer histogram generation.
  • To enable large-scale genomics applications through improved k-mer analysis.

Main Methods:

  • ntCard utilizes the ntHash algorithm for efficient k-mer hashing.
  • It samples the hashed k-mers to build a reduced-representation multiplicity table.
  • A statistical model then reconstructs the k-mer frequency distribution of the full dataset from the sampled data.
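
The general idea of hashing k-mers, keeping only a sample, and scaling up can be sketched as follows. This is a simplified stand-in under stated assumptions, not ntCard's implementation: it uses BLAKE2b from Python's `hashlib` in place of ntHash, counts sampled k-mers exactly in a dictionary, and replaces the statistical model with a plain multiplication by the sampling rate. The names `hash64`, `sampled_histogram`, and `sample_bits` are hypothetical.

```python
import hashlib
from collections import Counter


def hash64(kmer):
    """Deterministic 64-bit hash of a k-mer (a stand-in, not ntHash)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sampled_histogram(sequences, k, sample_bits):
    """Estimate the k-mer frequency histogram from a hash-selected sample.

    A k-mer is kept only if the low `sample_bits` bits of its hash are zero,
    so about 1 in 2**sample_bits distinct k-mers is tracked. Selection depends
    on the k-mer itself, so every occurrence of a kept k-mer is counted and
    its multiplicity in the sample is exact.
    """
    mask = (1 << sample_bits) - 1
    sample = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if hash64(kmer) & mask == 0:
                sample[kmer] += 1
    # Naive reconstruction: scale each histogram bin by the sampling rate.
    scale = 1 << sample_bits
    return {mult: n * scale for mult, n in Counter(sample.values()).items()}


# With sample_bits=0 nothing is skipped, so this matches exact counting.
reads = ["ACGTACGT", "CGTACG"]
print(sampled_histogram(reads, k=4, sample_bits=0))
```

Raising `sample_bits` shrinks the table at the cost of noisier estimates, especially in sparsely populated histogram bins. Reconstructing the full distribution more carefully than this uniform scaling is the role the summary assigns to the statistical model.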

Main Results:

  • ntCard estimates k-mer frequencies more than 15 times faster than state-of-the-art algorithms.
  • The algorithm achieves high accuracy while using a comparable amount of memory.
  • Performance was validated on large datasets (up to 2.4 TB) from human and white spruce genomes.

Conclusions:

  • ntCard is a highly efficient and accurate tool for estimating k-mer frequencies in large genomics datasets.
  • It represents a potentially enabling technology for large-scale genomics applications.
  • The algorithm is implemented in C++ and is freely available under the GPL.