Misexpression of inactive genes in whole blood is associated with nearby rare structural variants

Affiliations
  • 1Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK.
  • 2Human Technopole, Fondazione Human Technopole, Milan, Italy.
  • 3Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK; Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Puddicombe Way, Cambridge, UK.
  • 4British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK; Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK; Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK.
  • 5Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK; Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia.
  • 6British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK.
  • 7Radcliffe Department of Medicine, John Radcliffe Hospital, Oxford, UK; Clinical Services, NHS Blood and Transplant, Oxford Centre, John Radcliffe Hospital, Oxford, UK.
  • 8Human Technopole, Fondazione Human Technopole, Milan, Italy; British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK; Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK; British Heart Foundation Centre of Research Excellence, University of Cambridge, Cambridge, UK; National Institute for Health and Care Research Blood and Transplant Research Unit in Donor Health and Behaviour, University of Cambridge, Cambridge, UK; Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK.
  • 9Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK; British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK; Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK; British Heart Foundation Centre of Research Excellence, University of Cambridge, Cambridge, UK; National Institute for Health and Care Research Blood and Transplant Research Unit in Donor Health and Behaviour, University of Cambridge, Cambridge, UK; Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK.
  • 10Translational Science and Experimental Medicine, Research and Early Development, Respiratory and Immunology, BioPharmaceuticals R&D, AstraZeneca, Molndal, Sweden.
  • 11Translational Science and Experimental Medicine, Research and Early Development, Respiratory and Immunology, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK.
  • 12Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK; Human Technopole, Fondazione Human Technopole, Milan, Italy; Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Puddicombe Way, Cambridge, UK; British Heart Foundation Centre of Research Excellence, University of Cambridge, Cambridge, UK; National Institute for Health and Care Research Blood and Transplant Research Unit in Donor Health and Behaviour, University of Cambridge, Cambridge, UK.
  • 13Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK. Electronic address: ed5@sanger.ac.uk.

Abstract

Gene misexpression is the aberrant transcription of a gene in a context where it is usually inactive. Despite its known pathological consequences in specific rare diseases, we have a limited understanding of its wider prevalence and mechanisms in humans. To address this, we analyzed gene misexpression in 4,568 whole-blood bulk RNA sequencing samples from INTERVAL study blood donors. We found that while individual misexpression events occur rarely, in aggregate they are present in almost all samples and affect a third of inactive protein-coding genes. Using 2,821 paired whole-genome and RNA sequencing samples, we found that misexpression events are enriched in cis for rare structural variants (SVs). We established putative mechanisms through which a subset of SVs lead to gene misexpression, including transcriptional readthrough, transcript fusions, and gene inversion. Overall, we develop misexpression as a type of transcriptomic outlier analysis and extend our understanding of the variety of mechanisms by which genetic variants can influence gene expression.
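
To make the idea of misexpression as a transcriptomic outlier analysis concrete, the sketch below flags samples in which a normally silent gene is expressed. It is an illustration only, not the method used in the study: the abstract does not state how inactive genes or misexpression events were defined, so the thresholds (`INACTIVE_TPM`, `INACTIVE_FRACTION`, `MISEXPRESSED_TPM`, `MIN_Z`), the function names, and the toy data are all assumptions.

```python
from statistics import mean, pstdev

# Hypothetical thresholds chosen for illustration; none come from the abstract.
INACTIVE_TPM = 0.1        # at or below this, a sample counts as not expressing the gene
INACTIVE_FRACTION = 0.95  # share of samples that must not express the gene
MISEXPRESSED_TPM = 0.5    # minimum expression level for a misexpression call
MIN_Z = 2.0               # minimum z-score relative to all samples


def is_inactive(tpms):
    """Treat a gene as inactive if nearly all samples show no expression."""
    silent = sum(1 for t in tpms if t <= INACTIVE_TPM)
    return silent / len(tpms) >= INACTIVE_FRACTION


def misexpression_events(expression):
    """Return (gene, sample_index) pairs where an inactive gene is an expression outlier.

    `expression` maps a gene name to a list of TPM values, one per sample.
    """
    events = []
    for gene, tpms in expression.items():
        if not is_inactive(tpms):
            continue
        mu, sigma = mean(tpms), pstdev(tpms)
        if sigma == 0:
            continue  # no variation across samples, so no outliers
        for i, t in enumerate(tpms):
            if t > MISEXPRESSED_TPM and (t - mu) / sigma > MIN_Z:
                events.append((gene, i))
    return events


# Toy data with 40 samples per gene (made-up values).
toy = {
    "GENE_A": [0.0] * 39 + [3.0],                # silent except in the last sample
    "GENE_B": [5.0 + i for i in range(40)],      # active gene, ignored
    "GENE_C": [0.0] * 40,                        # silent in every sample
}

print(misexpression_events(toy))  # prints [('GENE_A', 39)]
```

With these assumed cut-offs, only the single sample expressing `GENE_A` is reported, which mirrors the abstract's observation that individual events are rare even though many inactive genes can show one somewhere in a large cohort.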
