Ultra-long Read Sequencing for Whole Genomic DNA Analysis
Published on: March 15, 2019
Siegfried Schloissnig¹, Samarendra Pani²,³, Jana Ebler²,³
¹Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Vienna, Austria.
Long-read sequencing of 1,019 humans revealed over 100,000 genomic structural variants (SVs) and 300,000 tandem repeats. This advances understanding of genetic diversity and disease by characterizing SVs across diverse populations.
Background:
The human genome contains large architectural differences that influence phenotypic diversity and clinical outcomes. Prior research has shown that these large-scale alterations, often exceeding fifty base pairs, account for a significant portion of the genetic variation between individuals. Traditional short-read technologies frequently fail to resolve the complex or repetitive regions where these variants cluster. Existing databases lacked the resolution to capture the full spectrum of insertions, inversions, and tandem repeats across global populations. As a result, many pathogenic variants are thought to remain hidden within these "dark" regions of the genome. Understanding the evolutionary history of these rearrangements requires a more granular view than was previously available. This gap motivated the development of higher-fidelity mapping strategies to catalog these elusive genomic features.
Based on this study's findings, long interspersed nuclear element-1 (L1) and SINE-VNTR-Alu (SVA) retrotransposition activities mediate the transduction of unique sequence stretches. The transduced sequence lies 5' or 3' of the mobile element, depending on the source mobile element class and the chromosomal locus involved.
The researchers uncovered over 100,000 sequence-resolved biallelic structural variants and genotyped 300,000 multiallelic variable number of tandem repeats. These findings were derived from an intermediate-coverage resource encompassing 1,019 diverse humans from 26 distinct populations within the 1000 Genomes Project.
The study integrated graph genome-based analyses with linear methods to resolve complex structural variants that short-read surveys often miss. This dual approach enabled the characterization of over 100,000 biallelic variants and 300,000 multiallelic variable number of tandem repeats across diverse human populations.
The findings are confined to an intermediate-coverage resource based on 1,019 individuals from 26 populations within the 1000 Genomes Project. While it advances structural variant characterization, the authors suggest that further investigation is required to prioritize variants in specific patient genomes.
The study's authors propose that this open-access resource underscores the value of long-read sequencing in advancing structural variant characterization. They conclude that the dataset can guide variant prioritization in patient genomes, potentially improving diagnostic accuracy for genetic diseases linked to complex rearrangements.
Purpose Of The Study:
This investigation sought to establish a comprehensive, sequence-resolved resource of structural diversity across a globally representative cohort. Researchers aimed to leverage advanced sequencing technologies to overcome the limitations of previous population-scale surveys. The project focused on identifying biallelic and multiallelic variations within twenty-six distinct human groups. Scientists intended to clarify the mechanisms driving the formation of deletions, duplications, and mobile element insertions. The team prioritized the creation of a reference that accounts for the unique genetic backgrounds of diverse ethnic lineages. By mapping these variants, the study intended to bridge the gap between raw sequence data and functional biological insights. The team worked to provide an open-access framework for prioritizing variants in clinical diagnostics.
Main Methods:
The study used long-read sequencing to generate intermediate-coverage data for 1,019 participants representing twenty-six global populations. Bioinformaticians integrated linear reference alignments with graph genome-based analyses to detect complex rearrangements across the human genome. The pipeline genotyped 300,000 multiallelic variable number of tandem repeats and characterized retrotransposon-mediated events. Computational tools characterized the breakpoints of deletions and insertions to identify underlying mutational signatures and homology-mediated processes. The researchers built on the 1000 Genomes Project cohort to ensure population-level relevance and diversity. The team also distinguished between biallelic and multiallelic states in highly repetitive or complex loci. Statistical frameworks were applied to validate the frequency of these structural changes across the diverse lineages sampled in the cohort.
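The distinction between biallelic and multiallelic sites, and the per-population frequency tallies mentioned above, can be illustrated with a minimal Python sketch. This is not the study's pipeline. The genotype encoding (tuples of allele indices, with 0 for the reference allele and None for a missing call), the function names `classify_site` and `allele_frequencies_by_population`, and the population labels are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def classify_site(genotypes):
    """Classify a site by how many distinct non-reference alleles are observed.

    `genotypes` is a list of tuples of allele indices (hypothetical encoding):
    0 = reference allele, 1, 2, ... = alternate alleles, None = missing call.
    """
    observed = {allele for gt in genotypes for allele in gt if allele is not None}
    non_ref = observed - {0}
    if not non_ref:
        return "monomorphic"
    return "biallelic" if len(non_ref) == 1 else "multiallelic"


def allele_frequencies_by_population(genotypes, populations):
    """Return {population: {allele: frequency}} for one site.

    `populations[i]` is the (hypothetical) population label of the sample
    whose genotype is `genotypes[i]`. Missing alleles are skipped.
    """
    counts = defaultdict(Counter)
    for gt, pop in zip(genotypes, populations):
        for allele in gt:
            if allele is not None:
                counts[pop][allele] += 1
    return {
        pop: {allele: n / sum(c.values()) for allele, n in c.items()}
        for pop, c in counts.items()
    }
```

For example, `classify_site([(0, 1), (1, 1), (0, 0), (0, 2)])` reports a multiallelic site because two different alternate alleles are present. A tandem repeat with many observed copy numbers would fall into the same category, which is why such loci need genotyping methods that do not assume a single alternate allele.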
Main Results:
The analysis uncovered more than 100,000 sequence-resolved biallelic structural variants across the diverse cohort of 1,019 individuals. Genotyping efforts identified 300,000 multiallelic variable number of tandem repeats within the twenty-six human populations. Long interspersed nuclear element-1 and SINE-VNTR-Alu activities were found to mediate transductions of unique sequence stretches. These transductions occurred at either the 5' or 3' end of the element, depending on the source mobile element class and locus. Breakpoint evaluations revealed that homology-mediated processes contribute significantly to recurrent deletion events and to structural variant formation overall. The data showed that insertions and inversions are distributed unevenly across chromosomal regions and population groups. The study resolved complex loci that were previously considered inaccessible to the short-read sequencing platforms used in earlier surveys.
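To make the idea of breakpoint homology concrete, the following Python sketch measures how many bases of identical sequence flank a deletion, which is one common way to flag a deletion as potentially homology-mediated. It is an illustrative simplification and not the method used in the study. The function name `deletion_breakpoint_homology`, the 0-based half-open coordinates, and the toy sequences are assumptions.

```python
def deletion_breakpoint_homology(ref, start, end):
    """Count bases of identical sequence at the breakpoints of a deletion.

    The deletion removes ref[start:end] (0-based, half-open; a hypothetical
    convention). Homology is the number of positions by which the deletion
    could be shifted right or left without changing the resulting sequence.
    In long tandem repeats this count can exceed the deletion length.
    """
    if not (0 <= start < end <= len(ref)):
        raise ValueError("expected 0 <= start < end <= len(ref)")

    # Shift right: the first deleted bases match the bases just after the deletion.
    right = 0
    while end + right < len(ref) and ref[start + right] == ref[end + right]:
        right += 1

    # Shift left: the bases just before the deletion match the last deleted bases.
    left = 0
    while start - 1 - left >= 0 and ref[start - 1 - left] == ref[end - 1 - left]:
        left += 1

    return left + right
```

With the toy reference `"TTACGTGGGACGTCC"` and a deletion of `ref[2:9]`, the function returns 4, because the deleted segment begins with the same `ACGT` that follows the deletion. A production analysis would work from aligned assemblies or variant calls and would also need to handle inexact homology, which this sketch ignores.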
Conclusions:
The resulting dataset provides an unusually detailed view of the architectural complexity of the human genome across global populations. These findings indicate that long-read technologies are essential for capturing variation missed by previous short-read methodologies. The cataloged variants offer a new foundation for understanding how structural changes influence disease susceptibility and genetic diversity. Clinicians may be able to use this resource to improve the prioritization of candidate variants in patient genomes for diagnostic purposes, although further investigation is needed for specific patient genomes. Future genomic studies will likely rely on these high-resolution maps to interpret the functional impact of non-coding variation. The open-access nature of this resource allows researchers worldwide to integrate these findings into their own diagnostic pipelines. This work marks a significant shift toward a more inclusive and accurate representation of the global human pangenome.