lauralwd/2OGD_phylogeny

This repository contains a phylogenetic analysis of the 2-OGD superfamily of enzymes. This work aims to place Azolla filiculoides 2-OGD sequences in the context of the broader evolution of this group, using data from the 1kP project. All code, data, and most intermediate files are stored in this repository for reproducibility and documentation purposes. The phylogeny created here is shown in Güngör et al. (in prep).

Quick links:

Phylogeny of 2-OGD sequences in plants, first of the ANS, FLS and JOX subclades:

Second, of the entire 2-OGD superfamily:

Main text figure:

Supplemental figure:

View the pdf for all details.

Guide through directories and files

Directories

The data folder contains (unaligned) fasta files, lists of sequence names, and aligned sequences in both trimmed and untrimmed versions. File names tend to be long, but are meant to reflect the history of that specific file. For example: orthogroup_AtLDOX_AT4g22880_selection-v2_guide-v5_aligned-mafft-einsi_trim-gt30.fasta contains sequences from the 1kP orthogroup retrieved with LDOX from Arabidopsis thaliana, from which a manual selection was taken (v2). Next, several sequences were added (guide-v5): a set of guide sequences (sequences whose function has been verified), optionally some outgroup sequences, and Azolla filiculoides sequences. These sequences were then aligned with mafft-einsi and trimmed with trimAl using the setting -gt .30.
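The naming convention described above can be unpacked programmatically. The following is a minimal sketch, not part of the repository: the function name parse_history is hypothetical, and it assumes that underscores separate the steps and that the first hyphen in a step separates the step name from its setting, as in the example file name. File names that do not follow this pattern will not parse meaningfully.

```python
from pathlib import Path


def parse_history(filename):
    """Split a data file name into plain labels and processing steps.

    Hypothetical helper, assuming the pattern of the example file name:
    underscores separate steps, and the first hyphen in a step separates
    the step name from its setting.
    """
    stem = Path(filename).stem  # drop the .fasta extension
    labels = []  # tokens without a hyphen, e.g. the orthogroup and gene names
    steps = {}   # tokens with a hyphen, e.g. selection-v2 -> {"selection": "v2"}
    for token in stem.split("_"):
        key, sep, value = token.partition("-")
        if sep:
            steps[key] = value
        else:
            labels.append(token)
    return labels, steps


labels, steps = parse_history(
    "orthogroup_AtLDOX_AT4g22880_selection-v2_guide-v5_aligned-mafft-einsi_trim-gt30.fasta"
)
print(labels)
print(steps)
```

For the example file name this separates the starting dataset (orthogroup, AtLDOX, AT4g22880) from the four processing steps (selection, guide, aligned, trim).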

The analyses folder contains tree inferences. These are organised in folders by starting dataset, and within those in folders by alignment and trimming strategy. A single folder may still contain several tree inferences made with IQ-TREE. The final part of each file name summarises the settings used to create that particular tree file. Note that intermediate trees are just that: intermediate results.

The figures folder contains the final versions of the figures shown in Güngör et al. (in prep) in several formats. These were made by importing a .treefile into iTOL, adding annotation manually, and downloading the result as .svg files. These .svg files were then finalised in Inkscape to their published form and exported as pdf or png.

Files

The workflows for which data is shared here are documented in Jupyter notebooks (*.ipynb). The workflow describing the final version of the complete tree is 2OGD_tree_v5. The workflow describing the final version of the subsetted tree is v2g5_JOX-ANS-FLS-subset. The other workflows are explorative and should be interpreted as such. A blank version of the workflow is maintained here: Laura's phylogeny workflow. Note that figures embedded in the Jupyter notebooks are not displayed properly online on GitHub. You may download the .ipynb files to display them locally, including images.
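As an alternative to opening a downloaded notebook in Jupyter, embedded figures can be pulled out directly, since an .ipynb file is plain JSON. The sketch below is not part of the repository: the function name extract_png_outputs is hypothetical, and it assumes the figures are stored as base64-encoded image/png cell outputs. Figures embedded in another way (for example as markdown attachments or links to local files) would not be found.

```python
import base64
import json
from pathlib import Path


def extract_png_outputs(notebook_path, out_dir):
    """Write every PNG cell output of a notebook to out_dir.

    Hypothetical helper; assumes figures are stored as base64-encoded
    "image/png" entries in the outputs of code cells.
    """
    notebook = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for cell_index, cell in enumerate(notebook.get("cells", [])):
        # Markdown cells have no "outputs" key, so default to an empty list.
        for output_index, output in enumerate(cell.get("outputs", [])):
            png = output.get("data", {}).get("image/png")
            if png is None:
                continue
            if isinstance(png, list):  # multi-line strings may be stored as lists
                png = "".join(png)
            target = out_dir / f"cell{cell_index}_output{output_index}.png"
            target.write_bytes(base64.b64decode(png))
            written.append(target)
    return written


# Example call; the notebook file name and output folder are assumptions:
# extract_png_outputs("2OGD_tree_v5.ipynb", "notebook_figures")
```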

Finally, the environment.yaml file details the names and versions of all software used in this project. This file may be used to recreate the exact software environment for this analysis using Miniconda. To do so, issue a command such as: conda env create -f ./environment.yaml

Data sources used in this project

In building these trees, we have made use of publicly available data, and a novel assembly and annotation of the Azolla filiculoides genome (Azfi v2). Azolla automated annotations (Afi v1) are available on FernBase. The novel assembly and annotation will be publicly available soon. Sequences of relevance to this particular analysis are stored in data/ANS-likes_Azolla-filiculoides_v4.fasta.

Notably, we have made use of data made available by the 1000 plant transcriptomes project (1kP). First, we made use of the 1kP orthogroup extractor. Unfortunately, this website was taken offline shortly after publication of the 1kP project, and to the best of our knowledge the data is not accessible in any other manner. The orthogroups we extracted are stored in this repository. Second, we made use of the online sample list viewer to create a subset of the orthogroup, taking care to sample across the tree of all plants with extra attention to seed-free plants. The subset used here is online in Google Sheets, and the resulting lists are stored here in the data directory.

The 1kP project provides a wealth of sequencing information on plant taxa for which few sequences are available from genome sequences, let alone sequences whose function has been verified. Therefore, we gratefully made use of the sequences collected in literature and online databases, most notably in Kawai et al. 2014: Evolution and diversity of the 2-oxoglutarate-dependent dioxygenase superfamily in plants.
