Long-read proteogenomics to connect disease-associated sQTLs to the protein isoform effectors of disease

Affiliations
  • 1 Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA; Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA.
  • 2 Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA; Department of Public Health Sciences, University of Virginia, Charlottesville, VA 22908, USA.
  • 3 Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA.
  • 4 Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA; Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA; UVA Comprehensive Cancer Center, University of Virginia, Charlottesville, VA, USA. Electronic address: gs9yr@virginia.edu.

Abstract

A major fraction of loci identified by genome-wide association studies (GWASs) mediate alternative splicing, but mechanistic interpretation is hindered by the technical limitations of short-read RNA sequencing (RNA-seq), which cannot directly link splicing events to full-length protein isoforms. Long-read RNA-seq is a powerful tool for characterizing transcript isoforms and, more recently, for inferring the existence of protein isoforms. Here, we present an approach that integrates information from GWASs, splicing quantitative trait loci (sQTLs), and PacBio long-read RNA-seq in a disease-relevant model to infer the effects of sQTLs on the protein isoforms they ultimately encode. We demonstrate the utility of our approach using bone mineral density (BMD) GWAS data. We identified 1,863 sQTLs from the Genotype-Tissue Expression (GTEx) project in 732 protein-coding genes that colocalized with BMD associations (H4PP ≥ 0.75). We generated PacBio Iso-Seq data (N = ∼22 million full-length reads) on human osteoblasts, identifying 68,326 protein-coding isoforms, of which 17,375 (25%) were unannotated. By casting the sQTLs onto protein isoforms, we connected 809 sQTLs to 2,029 protein isoforms from 441 genes expressed in osteoblasts. Overall, we found that 74 sQTLs influenced isoforms likely impacted by nonsense-mediated decay and 190 potentially resulted in the expression of unannotated protein isoforms. Finally, we functionally validated colocalizing sQTLs in TPM2: siRNA-mediated knockdown in osteoblasts showed that two TPM2 isoforms had opposing effects on mineralization, whereas knockdown of the entire gene had no effect. Our approach should generalize across diverse clinical traits and provide insights into protein isoform activities modulated by GWAS loci.
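The abstract describes two filtering steps: keeping sQTLs that colocalize with BMD associations (H4PP ≥ 0.75) and then "casting" them onto long-read protein isoforms. The Python sketch below illustrates one possible way to do that linkage; it is not the authors' implementation. It assumes, hypothetically, that each sQTL is represented by the single intron (chromosome, start, end) whose splicing it affects, that each isoform is a sorted list of exon coordinates in a 1-based inclusive convention, and that an sQTL is linked to an isoform of the same gene when that isoform's splice junctions include the sQTL's intron. The names `SQTL`, `Isoform`, `colocalized`, and `cast_sqtls_onto_isoforms` are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass

# Colocalization threshold reported in the abstract (H4PP >= 0.75).
H4PP_THRESHOLD = 0.75


@dataclass(frozen=True)
class SQTL:
    """Hypothetical sQTL record: the variant, its gene, and the affected intron."""
    variant_id: str
    gene: str
    intron: tuple[str, int, int]  # (chrom, start, end), assumed 1-based inclusive
    h4pp: float  # posterior probability of a shared causal variant with the GWAS signal


@dataclass(frozen=True)
class Isoform:
    """Hypothetical long-read isoform: sorted (start, end) exon coordinates."""
    isoform_id: str
    gene: str
    chrom: str
    exons: tuple[tuple[int, int], ...]

    def introns(self) -> set[tuple[str, int, int]]:
        # Each intron spans from one exon's end + 1 to the next exon's start - 1.
        return {
            (self.chrom, left_end + 1, right_start - 1)
            for (_, left_end), (right_start, _) in zip(self.exons, self.exons[1:])
        }


def colocalized(sqtls: list[SQTL], threshold: float = H4PP_THRESHOLD) -> list[SQTL]:
    """Keep only sQTLs whose colocalization posterior meets the threshold."""
    return [s for s in sqtls if s.h4pp >= threshold]


def cast_sqtls_onto_isoforms(
    sqtls: list[SQTL], isoforms: list[Isoform]
) -> dict[str, list[str]]:
    """Map each colocalized sQTL to same-gene isoforms that contain its intron."""
    links: dict[str, list[str]] = {}
    for s in colocalized(sqtls):
        hits = [
            iso.isoform_id
            for iso in isoforms
            if iso.gene == s.gene and s.intron in iso.introns()
        ]
        if hits:
            links[s.variant_id] = hits
    return links


# Toy example with made-up identifiers and coordinates.
sqtls = [
    SQTL("rs_a", "GENE1", ("chr1", 201, 299), 0.92),
    SQTL("rs_b", "GENE1", ("chr1", 401, 499), 0.40),  # below threshold, dropped
]
isoforms = [
    Isoform("GENE1-iso1", "GENE1", "chr1", ((100, 200), (300, 400), (500, 600))),
    Isoform("GENE1-iso2", "GENE1", "chr1", ((100, 200), (500, 600))),  # skips middle exon
]
print(cast_sqtls_onto_isoforms(sqtls, isoforms))  # {'rs_a': ['GENE1-iso1']}
```

Exact junction matching is the simplest rule one could choose here. A real pipeline would also have to reconcile coordinate conventions between the sQTL caller and the long-read annotation, and it might treat isoforms that skip the intron as the alternative allele's product rather than ignoring them.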
