MATES: a deep learning-based model for locus-specific quantification of transposable elements in single cell

Affiliations
  • 1School of Computer Science, McGill University, Montreal, Quebec, Canada.
  • 2Meakins-Christie Laboratories, Translational Research in Respiratory Diseases Program, Research Institute of the McGill University Health Centre, Montreal, Quebec, Canada.
  • 3Department of Medicine, McGill University, Montreal, Quebec, Canada.
  • 4Quantitative Life Sciences, Faculty of Medicine & Health Sciences, McGill University, Montreal, Quebec, Canada.
  • 5Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
  • 6Department of Neurosurgery, Baylor College of Medicine, Temple, TX, USA.
  • 7College of Medicine and Irma Lerma Rangel College of Pharmacy, Texas A&M University, College Station, TX, USA.
  • 8LIVESTRONG Cancer Institutes and Department of Oncology, Dell Medical School, The University of Texas at Austin, Austin, TX, USA.
  • 9Neuroscience Institute and Department of Neurosurgery, Baylor Scott & White Health, Temple, TX, USA.
  • 10MyCellome LLC., Pittsburgh, PA, USA.
  • 11Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA. tao.wu@bcm.edu.
  • 12School of Computer Science, McGill University, Montreal, Quebec, Canada. jun.ding@mcgill.ca.
  • 13Meakins-Christie Laboratories, Translational Research in Respiratory Diseases Program, Research Institute of the McGill University Health Centre, Montreal, Quebec, Canada. jun.ding@mcgill.ca.
  • 14Department of Medicine, McGill University, Montreal, Quebec, Canada. jun.ding@mcgill.ca.
  • 15Quantitative Life Sciences, Faculty of Medicine & Health Sciences, McGill University, Montreal, Quebec, Canada. jun.ding@mcgill.ca.
  • 16Mila-Quebec AI Institute, Montreal, Quebec, Canada. jun.ding@mcgill.ca.

Abstract

Transposable elements (TEs) are crucial for genetic diversity and gene regulation. Current single-cell quantification methods often assign multi-mapping reads to either ‘best-mapped’ or ‘random-mapped’ locations and categorize them at the subfamily level, overlooking the biological need for accurate, locus-specific TE quantification. Moreover, these methods are designed primarily for transcriptomics data, which restricts their adaptability to single-cell data of other modalities. To address these challenges, here we introduce MATES, a deep-learning approach that accurately allocates multi-mapping reads to specific TE loci, utilizing context from adjacent read alignments flanking each TE locus. When applied to diverse single-cell omics datasets, MATES shows improved performance over existing methods, enhancing the accuracy of TE quantification and aiding the identification of marker TEs for the identified cell populations. This development facilitates the exploration of single-cell heterogeneity and gene regulation through the lens of TEs, offering an effective transposon quantification tool for the single-cell genomics community.
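
To make the idea of context-based allocation concrete, the following is a minimal toy sketch, not the MATES model itself. MATES is described above as a deep-learning approach; this sketch swaps in a simple proportional rule, assuming that the number of uniquely mapped reads in the flanks of each candidate locus is a usable proxy for how likely a multi-mapping read is to originate there. The function name, the locus identifiers, and the pseudocount parameter are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def allocate_multimapping_reads(
    flank_unique_counts: Dict[str, int],
    multi_reads: List[List[str]],
    pseudocount: float = 1.0,
) -> Dict[str, float]:
    """Fractionally assign each multi-mapping read to its candidate TE loci.

    flank_unique_counts: hypothetical per-locus counts of uniquely mapped
        reads in the regions flanking each TE locus.
    multi_reads: one entry per multi-mapping read, listing the loci it
        aligns to.
    pseudocount: added to every flank count so that loci with no flanking
        unique reads still receive a small share.

    NOTE: this proportional rule is an illustrative stand-in, not the
    learned allocation used by MATES.
    """
    allocated = {locus: 0.0 for locus in flank_unique_counts}
    for candidates in multi_reads:
        weights = [flank_unique_counts[locus] + pseudocount for locus in candidates]
        total = sum(weights)
        if total == 0:
            # No flanking evidence at all: fall back to a uniform split.
            weights = [1.0] * len(candidates)
            total = float(len(candidates))
        for locus, weight in zip(candidates, weights):
            allocated[locus] += weight / total
    return allocated


if __name__ == "__main__":
    # Two hypothetical copies of the same TE subfamily.
    flanks = {"TE_copy_A": 9, "TE_copy_B": 0}
    reads = [["TE_copy_A", "TE_copy_B"]] * 11
    print(allocate_multimapping_reads(flanks, reads))
```

In contrast to a ‘best-mapped’ or ‘random-mapped’ strategy, each read here contributes fractional counts to every candidate locus, and the fractions for a single read always sum to one, so the total read count is preserved.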