Using artificial intelligence to document the hidden RNA virosphere

Affiliations
  • 1 National Key Laboratory of Intelligent Tracking and Forecasting for Infectious Diseases, State Key Laboratory for Biocontrol, School of Medicine, Shenzhen Campus of Sun Yat-sen University, Sun Yat-sen University, Shenzhen, China.
  • 2 Apsara Lab, Alibaba Cloud Intelligence, Alibaba Group, Hangzhou, China.
  • 3 Wuhan Centers for Disease Control and Prevention, Wuhan, China.
  • 4 Polar Research Institute of China, Shanghai, China.
  • 5 Department of Nursing, The Fifth Affiliated Hospital, Sun Yat-sen University, Zhuhai, China.
  • 6 School of Geospatial Engineering and Science, Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Sun Yat-sen University, Zhuhai, China; Key Laboratory of Comprehensive Observation of Polar Environment, Ministry of Education, Sun Yat-sen University, Zhuhai, China.
  • 7 Ministry of Education Key Laboratory of Biodiversity Science and Ecological Engineering, National Observations and Research Station for Wetland Ecosystems of the Yangtze Estuary, Institute of Biodiversity Science and Institute of Eco-Chongming, School of Life Sciences, Fudan University, Shanghai, China.
  • 8 Centre for Virus Research, Westmead Institute for Medical Research, Westmead, NSW, Australia; School of Medical Sciences, The University of Sydney, Sydney, NSW 2006, Australia.
  • 9 College of Life Sciences, Zhejiang University, Hangzhou, China.
  • 10 School of Life Science, Guangzhou University, Guangzhou, China.
  • 11 Key Laboratory of Pathogen Infection Prevention and Control (MOE), State Key Laboratory of Respiratory Health and Multimorbidity, National Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China; School of Public Health (Shenzhen), Shenzhen Campus of Sun Yat-sen University, Sun Yat-sen University, Shenzhen, China.
  • 12 Guangzhou National Laboratory, Guangzhou International Bio-Island, Guangzhou, China.
  • 13 Department of Infectious Diseases and Public Health, Jockey Club College of Veterinary Medicine and Life Sciences, City University of Hong Kong, Hong Kong SAR, China.
  • 14 School of Medical Sciences, The University of Sydney, Sydney, NSW 2006, Australia; Laboratory of Data Discovery for Health Limited, Hong Kong SAR, China. Electronic address: edward.holmes@sydney.edu.au.
  • 15 Apsara Lab, Alibaba Cloud Intelligence, Alibaba Group, Hangzhou, China. Electronic address: zhaorong.lzr@alibaba-inc.com.


Abstract

Current metagenomic tools can fail to identify highly divergent RNA viruses. We developed a deep learning algorithm, termed LucaProt, to discover highly divergent RNA-dependent RNA polymerase (RdRP) sequences in 10,487 metatranscriptomes generated from diverse global ecosystems. LucaProt integrates both sequence and predicted structural information, enabling the accurate detection of RdRP sequences. Using this approach, we identified 161,979 potential RNA virus species and 180 RNA virus supergroups, including many previously poorly studied groups, as well as RNA virus genomes of exceptional length (up to 47,250 nucleotides) and genomic complexity. A subset of these novel RNA viruses was confirmed by RT-PCR and RNA/DNA sequencing. Newly discovered RNA viruses were present in diverse environments, including air, hot springs, and hydrothermal vents, with virus diversity and abundance varying substantially among ecosystems. This study advances virus discovery, highlights the scale of the virosphere, and provides computational tools to better document the global RNA virome.
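The abstract states that LucaProt combines sequence information with predicted structural information to detect RdRP sequences, but it does not describe how the two are combined. The toy sketch below only illustrates the general idea of fusing two feature views of a protein into one classifier score. Everything in it is a hypothetical assumption rather than LucaProt's design: the feature choices (amino-acid composition and H/E/C secondary-structure fractions), the names `sequence_features`, `structure_features`, and `FusionClassifier`, and the single logistic output unit. LucaProt itself is a deep learning model, not a linear classifier.

```python
import math
from dataclasses import dataclass

# Hypothetical feature extractors for illustration only; NOT LucaProt's encoders.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sequence_features(seq: str) -> list[float]:
    """Sequence view: fraction of each of the 20 standard amino acids."""
    seq = seq.upper()
    n = max(len(seq), 1)  # avoid division by zero on empty input
    return [seq.count(aa) / n for aa in AMINO_ACIDS]


def structure_features(ss: str) -> list[float]:
    """Structure view: fractions of helix (H), strand (E) and coil (C) states.

    `ss` is assumed to be a per-residue secondary-structure string produced by
    some external predictor; the H/E/C alphabet is an assumption.
    """
    n = max(len(ss), 1)
    return [ss.count(state) / n for state in "HEC"]


@dataclass
class FusionClassifier:
    weights: list[float]  # one weight per concatenated feature (20 + 3)
    bias: float

    def predict_proba(self, seq: str, ss: str) -> float:
        """Early fusion: concatenate both views, then apply a logistic unit."""
        x = sequence_features(seq) + structure_features(ss)
        if len(x) != len(self.weights):
            raise ValueError("weight vector length must match feature length")
        z = self.bias + sum(w * v for w, v in zip(self.weights, x))
        return 1.0 / (1.0 + math.exp(-z))


# Usage with untrained (all-zero) parameters: the score is exactly 0.5.
clf = FusionClassifier(weights=[0.0] * 23, bias=0.0)
score = clf.predict_proba("MKTAYIAKQR", "CCHHHHHHCC")
```

In a real detector the weights would be learned from labeled RdRP and non-RdRP sequences; concatenating the two views before scoring is one simple fusion strategy among several.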
