COPO – Managing sample metadata for biodiversity: considerations from the Darwin Tree of Life project

Affiliations
  • 1Earlham Institute, Norwich, Norfolk, NR4 7UH, UK.
  • 2EMBL European Bioinformatics Institute, Hinxton, Cambridgeshire, CB10 1SD, UK.
  • 3Department of Zoology, University of Oxford, Oxford, Oxfordshire, OX1 2JD, UK.
  • 4Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, CB10 1RQ, UK.

Abstract

Large-scale reference genome sequencing projects for all of biodiversity are underway, and common standards have been in place for some years to enable the understanding and sharing of sequence data. However, the metadata that describe the collection, processing and management of samples, and that link to the associated sequencing and genome data, are not yet adequately developed and standardised for these projects. At the time of writing, the Darwin Tree of Life (DToL) Project is over two years into its ten-year ambition to sequence all described eukaryotic species in Britain and Ireland. We have sought consensus from a wide range of scientists across taxonomic domains to determine the minimal set of metadata that we collectively deem critically important to accompany each sequenced specimen. These metadata are made available throughout the subsequent laboratory processes and, once collected, need to be adequately managed to fulfil the requirements of good data management practice. Because of the scale of management required, software tools are needed. These tools need to implement rigorous development pathways and change management procedures to ensure that effective research data management of key project and sample metadata is maintained. Tracking of sample properties through the sequencing process is handled by Lab Information Management Systems (LIMS), so publication of the sequenced data is achieved via technical integration of LIMS and data management tools. Discussions with community members on how metadata standards need to be managed within large-scale programmes are a priority in the planning process. Here we report on the standards we developed with respect to a robust and reusable mechanism of metadata collection, in the hope that other projects, forthcoming or underway, will adopt these practices for metadata.
