GenAp: a distributed SQL interface for genomic data.

Christos Kozanitis¹, David A Patterson²

¹Department of Computer Science, University of California Berkeley, Soda Hall, Berkeley, 94720, California, USA. kozanitis@eecs.berkeley.edu.

BMC Bioinformatics

February 6, 2016

Summary

This summary is machine-generated.

Researchers modified Spark SQL, a distributed SQL execution engine, to query large genomic datasets efficiently. The modified engine runs genomic interval joins over 50x faster than brute-force methods, simplifying genomic data analysis for genetic disease research.

Area of Science:

Genomics
Bioinformatics
Computational Biology

Background:

Advancements in genome sequencing technology have led to massive data generation for genetic disease research.
Handling and accessing terabytes of genomic data presents a significant challenge for researchers.
Efficient data retrieval is crucial for understanding disease mechanisms and developing targeted therapies.

Purpose of the Study:

To address the challenge of providing on-demand access to large-scale genomic data.
To improve the efficiency of querying genomic intervals within distributed databases.
To reduce the complexity and development effort required for genomic data analysis.
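To make the idea of querying genomic intervals through SQL concrete, here is a minimal sketch of an interval-overlap join written as a declarative query. It uses Python's built-in sqlite3 module on a single machine, not Spark SQL or GenAp. The table names (reads, genes), the column names, and the half-open interval convention are assumptions made for illustration, not taken from the paper.

```python
import sqlite3


def overlap_query(reads, genes):
    """Return (read, gene) name pairs whose intervals overlap.

    Rows are (name, chrom, start_pos, end_pos) with half-open
    intervals [start_pos, end_pos). Schema is hypothetical.
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE reads (name TEXT, chrom TEXT, start_pos INTEGER, end_pos INTEGER)"
    )
    con.execute(
        "CREATE TABLE genes (name TEXT, chrom TEXT, start_pos INTEGER, end_pos INTEGER)"
    )
    con.executemany("INSERT INTO reads VALUES (?, ?, ?, ?)", reads)
    con.executemany("INSERT INTO genes VALUES (?, ?, ?, ?)", genes)
    # Two half-open intervals overlap when each starts before the other ends.
    rows = con.execute(
        """
        SELECT r.name, g.name
        FROM reads AS r
        JOIN genes AS g
          ON r.chrom = g.chrom
         AND r.start_pos < g.end_pos
         AND g.start_pos < r.end_pos
        ORDER BY r.name, g.name
        """
    ).fetchall()
    con.close()
    return rows


reads = [("r1", "chr1", 100, 150), ("r2", "chr1", 400, 450), ("r3", "chr2", 100, 150)]
genes = [("geneA", "chr1", 120, 300), ("geneB", "chr2", 500, 900)]
print(overlap_query(reads, genes))
```

The query states only which rows belong together; how the join is executed is left to the engine, which is where a distributed engine can substitute a faster interval-aware strategy.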

Main Methods:

Modification of Spark SQL, a distributed SQL execution engine.
Implementation of efficient join operations using genomic intervals as keys.
Benchmarking performance against existing brute-force and distributed approaches.
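The contrast between a brute-force join and a join that uses genomic intervals as keys can be sketched as follows. This is a single-machine illustration under stated assumptions: intervals are (chrom, start, end) tuples with half-open coordinates, and the faster variant is a generic sort-and-sweep join. The summary does not describe the algorithm used in the modified Spark SQL, so this should not be read as that implementation.

```python
from collections import defaultdict


def brute_force_join(left, right):
    """Test every pair of intervals: len(left) * len(right) comparisons."""
    return sorted(
        (a, b)
        for a in left
        for b in right
        if a[0] == b[0] and a[1] < b[2] and b[1] < a[2]
    )


def sweep_join(left, right):
    """Group by chromosome, sort by start, and sweep each group once."""
    by_chrom = defaultdict(lambda: ([], []))
    for iv in left:
        by_chrom[iv[0]][0].append(iv)
    for iv in right:
        by_chrom[iv[0]][1].append(iv)

    out = []
    for ls, rs in by_chrom.values():
        ls.sort(key=lambda iv: iv[1])
        rs.sort(key=lambda iv: iv[1])
        active = []  # right intervals admitted so far that may still overlap
        j = 0
        for a in ls:
            # Admit right intervals that start before the current left one ends.
            while j < len(rs) and rs[j][1] < a[2]:
                active.append(rs[j])
                j += 1
            # Left starts never decrease, so a right interval ending at or
            # before this start cannot overlap any later left interval.
            active = [b for b in active if b[2] > a[1]]
            # Left ends are not sorted, so re-check the start condition.
            out.extend((a, b) for b in active if b[1] < a[2])
    return sorted(out)


left = [("chr1", 0, 100), ("chr1", 10, 20), ("chr2", 5, 15)]
right = [("chr1", 50, 60), ("chr1", 15, 30), ("chr2", 15, 25)]
print(sweep_join(left, right) == brute_force_join(left, right))
```

Both functions return the same pairs. The sweep avoids comparing intervals on different chromosomes or far apart on the same one, which is the kind of saving an interval-keyed join aims for, although its worst case is still quadratic when many intervals overlap.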

Main Results:

The modified Spark SQL achieves over 50x speedup for genomic interval joins compared to brute-force methods.
The system demonstrates an 8x performance improvement over similar distributed implementations.
Data querying requires an order of magnitude less software code.

Conclusions:

Modified Spark SQL offers a highly efficient solution for querying large genomic datasets.
This advancement can accelerate genetic disease research by simplifying data access and analysis.
The approach has the potential to replace current practices for genomic data retrieval and analysis.