ENVO annotations for MIxS v5

Pier Luigi Buttigieg edited this page Nov 3, 2020 · 13 revisions

General notes and guidance

The Minimum Information about any (x) Sequence (MIxS) is a checklist from the Genomic Standards Consortium. This checklist features three mandatory fields for environmental description using ENVO classes.

This guidance is relevant to version 5 of the MIxS checklist, available as an XLSX document, here

Field name: env_broad_scale
Full name: broad-scale environmental context
Description: In this field, report which major environmental system your sample or specimen came from. The systems identified should have a coarse spatial grain, to provide the general environmental context of where the sampling was done (e.g. were you in the desert or a rainforest?).
Comments: We recommend using subclasses of ENVO’s biome class: http://purl.obolibrary.org/obo/ENVO_00000428

Field name: env_local_scale
Full name: local environmental context
Description: In this field, report the entity or entities which are in your sample or specimen’s local vicinity and which you believe have significant causal influences on your sample or specimen.
Comments: Please use terms that are present in ENVO and which are of smaller spatial grain than your entry for env_broad_scale.

Field name: env_medium
Full name: environmental medium
Description: In this field, report which environmental material or materials (pipe separated) immediately surrounded your sample or specimen prior to sampling.
Comments: Please use one or more subclasses of ENVO’s environmental material class: http://purl.obolibrary.org/obo/ENVO_00010483
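
The class links above are OBO PURLs, while the annotation fields use the shorter CURIE form of the same identifiers. The following is a minimal Python sketch of converting between the two, assuming the common OBO convention that the PURL ends in `PREFIX_NUMBER` and the CURIE is written `PREFIX:NUMBER`. The function names `purl_to_curie` and `curie_to_purl` are hypothetical helpers, not part of MIxS or ENVO tooling.

```python
# Base of OBO PURLs, as seen in the ENVO class links above.
OBO_PURL_BASE = "http://purl.obolibrary.org/obo/"


def purl_to_curie(purl: str) -> str:
    """Convert an OBO PURL (.../obo/ENVO_00000428) to a CURIE (ENVO:00000428)."""
    if not purl.startswith(OBO_PURL_BASE):
        raise ValueError(f"not an OBO PURL: {purl!r}")
    local_part = purl[len(OBO_PURL_BASE):]
    prefix, sep, number = local_part.partition("_")
    if not (prefix and sep and number):
        raise ValueError(f"unexpected PURL ending: {local_part!r}")
    return f"{prefix}:{number}"


def curie_to_purl(curie: str) -> str:
    """Convert a CURIE (ENVO:00010483) back to its OBO PURL."""
    prefix, sep, number = curie.partition(":")
    if not (prefix and sep and number):
        raise ValueError(f"not a CURIE: {curie!r}")
    return f"{OBO_PURL_BASE}{prefix}_{number}"


print(purl_to_curie("http://purl.obolibrary.org/obo/ENVO_00000428"))
print(curie_to_purl("ENVO:00010483"))
```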

Format

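The valid format consists of the term label, followed by a space, and then the term's ID in CURIE format, enclosed in square brackets. Multiple terms are separated by a pipe character (|), with no surrounding spaces.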

Format (single term): termLabel [termID]
Format (multiple terms): termLabel [termID]|termLabel [termID]|termLabel [termID]

Example (single term): tropical moist broadleaf forest biome [ENVO:01000228]
Example (multiple terms): canopy [ENVO:00000047]|herb and fern layer [ENVO:01000337]
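
A field value in this format can be checked mechanically before submission. The sketch below is one possible approach in Python, not an official MIxS validator. The regular expression is an assumption about what counts as a well-formed term: a label, one space, and a CURIE made of a letter-only prefix, a colon, and digits inside square brackets. The name `parse_annotation` is hypothetical.

```python
import re

# Assumed shape of one term: "<label> [<PREFIX>:<digits>]".
TERM_PATTERN = re.compile(r"^(?P<label>[^\[\]|]+?) \[(?P<curie>[A-Za-z]+:\d+)\]$")


def parse_annotation(value: str) -> list[tuple[str, str]]:
    """Split a pipe-separated field value into (label, CURIE) pairs.

    Raises ValueError on the first term that does not match the assumed format.
    """
    terms = []
    for chunk in value.split("|"):
        match = TERM_PATTERN.match(chunk.strip())
        if match is None:
            raise ValueError(f"malformed term: {chunk!r}")
        terms.append((match.group("label"), match.group("curie")))
    return terms


print(parse_annotation("canopy [ENVO:00000047]|herb and fern layer [ENVO:01000337]"))
```

Under these assumptions, a term whose ID lacks its prefix (for example `[01000335]`) or uses an underscore instead of a colon (for example `[ENVO_00002005]`) is rejected with a `ValueError`. Note that the check is purely syntactic: it does not confirm that the ID exists in ENVO or that the label matches it.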

Example annotations

Consider a sample of leaf litter taken from the understory of the Amazon rainforest. An appropriate annotation would be:

env_broad_scale : tropical moist broadleaf forest biome [ENVO:01000228]
env_local_scale : understory [ENVO:01000335]
env_medium      : plant matter [ENVO:01001121]

If this were a sample of plant matter pooled from several layers of the forest's vegetation, a valid annotation might resemble:

env_broad_scale : tropical moist broadleaf forest biome [ENVO:01000228]
env_local_scale : canopy [ENVO:00000047]|herb and fern layer [ENVO:01000337]|litter layer [ENVO:01000338]|understory [ENVO:01000335]|shrub layer [ENVO:01000336]
env_medium      : plant matter [ENVO:01001121]

Consider the following annotation for a water sample from a seasonal thermocline in the well-lit waters of the Atlantic Ocean:

env_broad_scale : oceanic epipelagic zone biome [ENVO:01000033]
env_local_scale : seasonal thermocline [ENVO:01000107]
env_medium      : ocean water [ENVO:00002151]

If one is dealing with larger organisms, multiple env_medium terms may be needed. For example, consider a duck paddling in water:

env_medium : pond water [ENVO:00002228]|air [ENVO:00002005]
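
When annotations are generated by a script rather than typed by hand, building the field values from structured (label, ID) pairs helps keep the bracket and pipe formatting consistent. The following Python sketch illustrates this for the three environmental fields. The helper names `format_terms` and `build_annotation` are hypothetical, and the labels and IDs are taken from the ocean water example above.

```python
def format_terms(terms):
    """Join (label, CURIE) pairs into one pipe-separated MIxS field value."""
    return "|".join(f"{label} [{curie}]" for label, curie in terms)


def build_annotation(broad, local, medium):
    """Return the three environmental fields as a dict of formatted strings."""
    return {
        "env_broad_scale": format_terms(broad),
        "env_local_scale": format_terms(local),
        "env_medium": format_terms(medium),
    }


annotation = build_annotation(
    broad=[("oceanic epipelagic zone biome", "ENVO:01000033")],
    local=[("seasonal thermocline", "ENVO:01000107")],
    medium=[("ocean water", "ENVO:00002151")],
)
for field, value in annotation.items():
    print(f"{field} : {value}")
```

Passing more than one pair for a field yields the multi-term form, with terms joined by `|`.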

Requesting new terms

If needed, request new terms on the ENVO tracker. Please include a definition of the new term, citing relevant sources.

Notes on annotating microscale or microbial samples

Many MIxS users will be working with microbial communities. We offer the following recommendations for these users:

General considerations

  • When filling in the env_broad_scale field, it's tempting to assume that macroscale environments do not matter to the microbial assemblage sampled. Nonetheless, we strongly encourage these larger scale systems (e.g. urban biomes or subtropical desert biomes) to be identified, in addition to any smaller scale features. These annotations will be instrumental for leveraging data in global analyses and for microbial biogeography. Keep in mind, users can always include any smaller-scale environments using the multi-term format noted above.
  • When filling in the env_local_scale field, attempt to add new, more fine-grained information relative to the env_broad_scale field. For example, a MIxS entry that includes env_broad_scale: village biome [ENVO:01000246] and env_local_scale: village [ENVO:01000773] is not as useful as one that includes env_broad_scale: village biome [ENVO:01000246] and env_local_scale: farm [ENVO:00000078].

Host-associated microbial samples

General advice

  • When annotating microbial communities living in or on host organisms, we recommend the following:
    • Ensure the taxonomic information of the host (e.g. the NCBI TaxID) is filled out in the MIxS host fields.
    • env_broad_scale entries should reflect the ecosystem the host is found in (e.g. an urban biome [ENVO:01000249] or a tundra biome [ENVO:01000180]).
    • env_local_scale entries should use terms from an ontology such as UBERON or PO to describe the anatomical parts of the host that are most causally influential to the microbial communities sampled (e.g. skin of eyelid [UBERON:0001457] or tepal apex [PO:0025143]).
    • env_medium should either be a term from the ENVO environmental material hierarchy, or one from the anatomical ontologies relevant to the host organism.
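
The recommendations above can be turned into a rough sanity check on which ontology each field draws from. The Python sketch below flags term IDs whose prefix is unexpected for a host-associated sample. The prefix sets are an assumption based on the ontologies named above (ENVO, UBERON, PO); other anatomical ontologies may be equally appropriate, so the table `HOST_ASSOCIATED_PREFIXES` and the function `unexpected_prefixes` are illustrative only.

```python
# Assumed prefix sets for host-associated samples; extend as needed
# for other anatomical ontologies relevant to the host organism.
HOST_ASSOCIATED_PREFIXES = {
    "env_broad_scale": {"ENVO"},
    "env_local_scale": {"UBERON", "PO"},
    "env_medium": {"ENVO", "UBERON", "PO"},
}


def unexpected_prefixes(field, curies):
    """Return the CURIEs whose ontology prefix is not expected for the field."""
    allowed = HOST_ASSOCIATED_PREFIXES[field]
    return [curie for curie in curies if curie.split(":", 1)[0] not in allowed]


# An ENVO term in env_local_scale is flagged; the UBERON term is not.
print(unexpected_prefixes("env_local_scale", ["UBERON:0001457", "ENVO:01000249"]))
```

A flagged term is a prompt for review rather than an error, since the guidance recommends these ontologies without forbidding others.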

Notes on the use of other ontologies in MIxS environment fields

Using existing terms from other OBO ontologies in the env_broad_scale, env_local_scale, or env_medium fields is advised to prevent the creation of near-duplicate hierarchies (so-called "shadow hierarchies") in ENVO. Such hierarchies would contain terms like ENVO:'liver environment' that shadow UBERON:liver. The semantic precision gained is typically outweighed by the overhead of maintaining shadow hierarchies.

If high semantic precision is needed, we recommend post-composing terms, following patterns like:

  • 'liver ecosystem' = ecosystem [ENVO:01001110] and determined by some liver [UBERON:0002107]
  • 'mucus material' = 'environmental material' [ENVO:00010483] and composed primarily of some mucus [UBERON:0000912]