New groups of highly divergent proteins in families as old as cellular life with important biological functions in the ocean

  • 1Institut de Systématique, Evolution, Biodiversité (ISYEB), Sorbonne Université, CNRS, Museum National d'Histoire Naturelle, EPHE, Université des Antilles, Paris, France. duncan.sussfeld@gmail.com.
  • 2Génomique Métabolique, Genoscope, Institut François-Jacob, CEA, CNRS, Université d'Evry, Université Paris-Saclay, Evry, 91000, France. duncan.sussfeld@gmail.com.
  • 3Institut de Systématique, Evolution, Biodiversité (ISYEB), Sorbonne Université, CNRS, Museum National d'Histoire Naturelle, EPHE, Université des Antilles, Paris, France.
  • 4Génomique Métabolique, Genoscope, Institut François-Jacob, CEA, CNRS, Université d'Evry, Université Paris-Saclay, Evry, 91000, France.
  • 5Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara Oceans GOSEE, Paris, 75016, France.

Abstract

BACKGROUND

Metagenomics has considerably broadened our knowledge of microbial diversity, unravelling fascinating adaptations and characterising multiple novel major taxonomic groups, e.g. CPR bacteria, DPANN and Asgard archaea, and novel viruses. Such findings profoundly reshaped the structure of the known Tree of Life and emphasised the central role of investigating uncultured organisms. However, despite significant progress, a large portion of the proteins predicted from metagenomes remains unannotated today, both taxonomically and functionally, across many biomes and in particular in oceanic waters.

RESULTS

Here, we used an iterative, network-based approach for remote homology detection to probe a dataset of 40 million ORFs predicted in marine environments. We assessed the environmental diversity of 53 core gene families broadly distributed across the Tree of Life, with essential functions including translational, replication and trafficking processes. For nearly half of them, we identified clusters of remote environmental homologues whose divergence from the known genetic diversity was comparable to the divergence between Archaea and Bacteria, with representatives distributed across all the oceans. In particular, we report the detection of environmental clades with new structural variants of essential SMC (Structural Maintenance of Chromosomes) genes, divergent polymerase subunits forming deep-branching clades in the polymerase tree, and variant DNA recombinases in Bacteria as well as viruses.
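
The general idea of an iterative, network-based homology search can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `iterative_homologue_search`, the `hits` dictionary (standing in for precomputed pairwise similarity hits) and the sequence identifiers are all hypothetical. Starting from reference sequences, each round recruits the sequences that match those found in the previous round; sequences that are only reached in later rounds have no direct detectable similarity to the references and are candidates for remote homologues.

```python
def iterative_homologue_search(seeds, hits, max_rounds=3):
    """Recruit sequences round by round over a similarity network.

    seeds      -- iterable of reference sequence ids (round 0)
    hits       -- dict: sequence id -> ids it has a significant hit to
                  (hypothetical, precomputed similarity results)
    max_rounds -- maximum number of search iterations
    Returns a dict mapping each recruited id to the round it was found in.
    """
    round_found = {s: 0 for s in seeds}
    frontier = set(seeds)
    for rnd in range(1, max_rounds + 1):
        next_frontier = set()
        for query in frontier:
            for target in hits.get(query, ()):
                if target not in round_found:
                    round_found[target] = rnd
                    next_frontier.add(target)
        if not next_frontier:  # nothing new recruited: search has converged
            break
        frontier = next_frontier
    return round_found


# Toy similarity network (illustrative ids, not real data).
hits = {
    "ref_A": ["env_1"],
    "env_1": ["ref_A", "env_2"],
    "env_2": ["env_1", "env_3"],
    "env_3": ["env_2"],
    "env_9": ["env_8"],  # unconnected to the reference, never recruited
}

found = iterative_homologue_search(["ref_A"], hits, max_rounds=2)
# Sequences reached only indirectly (round >= 2) are remote candidates.
remote = sorted(s for s, r in found.items() if r >= 2)
```

In this toy network, `env_1` is a direct homologue of the reference, whereas `env_2` is linked to it only through `env_1`; `env_3` lies beyond the two-round limit and is not recruited.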

CONCLUSIONS

These results indicate that significant environmental diversity may yet be unravelled even in strongly conserved gene families. Protein sequence similarity network approaches, in particular, appear well-suited to highlight potential sources of biological novelty and make better sense of microbial dark matter across taxonomical scales.
