SortMeRNA is a local sequence alignment tool for filtering, mapping and clustering.
The core algorithm is based on approximate seeds and allows for sensitive analysis of NGS reads. The main application of SortMeRNA is filtering rRNA from metatranscriptomic data. SortMeRNA takes as input a file of reads (FASTA or FASTQ format) and one or more rRNA database files, and sorts the aligned and rejected reads into two files specified by the user. Additional applications include clustering and taxonomy assignment, available through QIIME v1.9.1 (http://qiime.org). SortMeRNA works with Illumina, Ion Torrent and PacBio data, and can produce SAM and BLAST-like alignments.
Visit http://bioinfo.lifl.fr/RNA/sortmerna/ for more information.
- Support
- Documentation
- Getting Started
- Compilation
- Install compilers, ZLIB and autoconf
- Tests
- Third-party libraries
- Wrappers and packages
- Taxonomies
- Citation
- Contributors
For questions and comments, please use the SortMeRNA forum.
If you have Doxygen installed, you can generate the documentation by modifying the following lines in doxygen_configure.txt:
INPUT = /path/to/sortmerna/include /path/to/sortmerna/src
IMAGE_PATH = /path/to/sortmerna/algorithm
and running the following command:
doxygen doxygen_configure.txt
This command will generate a folder named html in the directory from which the command was run.
There are 3 methods to install SortMeRNA:
(1) GitHub repository development version (master branch)
    - Installation instructions
(2) GitHub releases (tarballs)
    - Installation instructions Linux
    - Installation instructions Mac OS
(3) BioInfo releases (tarballs including compiled binaries)
Option (3) is the most straightforward, as it does not require autoconf and provides pre-compiled binaries for various operating systems.
NOTE: You will need autoconf to build from the git-cloned repository or from the "Source code" tarballs under the release downloads on GitHub.
On Linux:
(0) Prepare your build system for compilation:
bash autogen.sh
(1) Check that your GCC compiler is version 4.0 or above:
gcc --version
(2) Run configure and make scripts:
./configure
make
(3) To install:
make install
You can define an alternative installation directory by specifying --prefix=/path/to/installation/dir to configure.
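The compiler check in step (1) can also be scripted. The following is a minimal sketch, assuming that the first line of the `gcc --version` output contains a dotted version number outside any parenthesised vendor string. The helper names `parse_compiler_version`, `meets_minimum` and `check_local_gcc` are hypothetical and are not part of SortMeRNA.

```python
import re
import subprocess

MIN_GCC = (4, 0)  # minimum GCC version named in step (1)


def parse_compiler_version(version_output):
    """Extract the version from `gcc --version` output as a tuple of ints.

    Returns None when no dotted version number is found on the first line.
    """
    lines = version_output.splitlines()
    first_line = lines[0] if lines else ""
    # Drop parenthesised vendor strings such as "(Ubuntu 4.8.5-4ubuntu2)",
    # which may contain version-like numbers of their own.
    cleaned = re.sub(r"\([^)]*\)", " ", first_line)
    match = re.search(r"\d+(?:\.\d+)+", cleaned)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version, minimum=MIN_GCC):
    """Return True if `version` is known and at least `minimum`."""
    return version is not None and version >= minimum


def check_local_gcc():
    """Run `gcc --version` and report whether it meets MIN_GCC."""
    try:
        result = subprocess.run(
            ["gcc", "--version"], capture_output=True, text=True, check=True
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return False  # gcc is missing or could not report a version
    return meets_minimum(parse_compiler_version(result.stdout))
```

Calling `check_local_gcc()` before `./configure` gives an early yes/no answer. The sketch only compares version numbers. It does not tell GCC apart from another compiler installed under the name `gcc`.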
On Mac OS:
(1) Check the version of your C/C++ compiler:
gcc --version
(2a) If the compiler is Clang or GCC, proceed to run configure and make scripts:
./configure
make
Note: If the compiler is Clang, you will not have access to multithreading.
(2b) If the compiler is LLVM-GCC, you will need to change it (see Deprecation and Removal Notice).
Set your compiler either to Clang or to the original GCC compiler, following the corresponding instructions.
(3) To install:
make install
To set the compiler to Clang on Mac OS:
(1) Check if you have Clang installed:
clang --version
(2a) If Clang is installed, set your compiler to Clang:
export CC=clang
export CXX=clang++
(2b) If Clang is not installed, see Clang for Mac OS for installation instructions.
To set the compiler to the original GCC on Mac OS:
(1) Check if you have GCC installed:
gcc --version
(2a) If GCC is installed, set your compiler to GCC:
export CC=gcc-mp-4.8
export CXX=g++-mp-4.8
(2b) If GCC is not installed, see Install GCC through MacPorts for installation instructions.
(3) Next, if you would like Zlib support (reading compressed .zip and .gz FASTA/FASTQ files), Zlib should also be installed via MacPorts. See section Install GCC and Zlib through MacPorts for installation instructions.
(4a) If you want the compression feature and have Zlib installed, run the configure and make scripts:
./configure --with-zlib="/opt/local"
make
(4b) Otherwise, if you do not need to read compressed files:
./configure --without-zlib
make
You can define an alternative installation directory by specifying --prefix=/path/to/installation/dir to configure.
NOTE: The Clang compiler on Mac (distributed through Xcode) does not support OpenMP (multithreading). A preliminary implementation of OpenMP for Clang is available at "http://clang-omp.github.io", though it has not yet been incorporated into the Clang mainline. You may follow the steps outlined at that link to install the version of Clang with multithreading support, though this version has not yet been tested with SortMeRNA. Otherwise, we recommend installing the original GCC compiler via MacPorts, which has full multithreading support.
Clang for Mac OS:
Installing Xcode (free through the App Store) and the Xcode command line tools automatically installs the latest version of Clang supported by Xcode.
After installing Xcode, the Xcode command line tools may be installed via:
Xcode -> Preferences -> Downloads
Under "Components", click to install "Command Line Tools"
Install GCC and Zlib through MacPorts:
Assuming you have MacPorts installed, type:
sudo port selfupdate
sudo port install gcc48
sudo port install zlib
After the installation, you should find the compiler installed in /opt/local/bin/gcc-mp-4.8 and /opt/local/bin/g++-mp-4.8, as well as Zlib in /opt/local/lib/libz.dylib and /opt/local/include/zlib.h.
To install GNU autoconf from source:
wget http://ftp.gnu.org/gnu/autoconf/autoconf-2.69.tar.gz
tar -zxvf autoconf-2.69.tar.gz
cd autoconf-2.69
./configure
make
make install
You can define an alternative installation directory by specifying --prefix=/path/to/installation/dir to configure (before calling make).
If installing in a directory other than those listed in $PATH, add the installation directory to $PATH:
export PATH=$PATH:/path/to/installation/dir
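After exporting, it can be useful to confirm that the directory really is one of the `$PATH` entries. The following is a minimal sketch. The function name `is_on_path` is hypothetical, and the comparison is purely textual after path normalisation, so it does not resolve symbolic links.

```python
import os


def is_on_path(directory, path_value=None):
    """Return True if `directory` is one of the entries of a PATH-style string.

    When `path_value` is omitted, the current process's PATH is inspected.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    target = os.path.normpath(directory)
    # Empty entries (for example from a trailing separator) are ignored.
    entries = [entry for entry in path_value.split(os.pathsep) if entry]
    return any(os.path.normpath(entry) == target for entry in entries)
```

For example, `is_on_path("/path/to/installation/dir")` checks the environment of the Python process itself, so it should be run from the same shell session in which `export` was issued.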
Usage tests can be run with the following commands:
python ./tests/test_sortmerna.py
python ./tests/test_sortmerna_zlib.py
Make sure the data folder is in the same directory as test_sortmerna.py.
Users require scikit-bio 0.5.0 to run the tests.
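Before running the tests, the installed scikit-bio version can be inspected from Python. The following is a minimal sketch, assuming Python 3.8 or later (for `importlib.metadata`). The helper names are hypothetical, and the sketch only reports what it finds. Whether versions other than 0.5.0 also work is not stated here, so it does not enforce anything.

```python
from importlib import metadata

REQUIRED_SCIKIT_BIO = "0.5.0"  # version named in the text above


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def describe_requirement(package="scikit-bio", required=REQUIRED_SCIKIT_BIO):
    """Return a short message comparing the installed version to `required`."""
    found = installed_version(package)
    if found is None:
        return "{} is not installed".format(package)
    if found != required:
        return "{} {} found, tests name version {}".format(package, found, required)
    return "{} {} found".format(package, found)
```

Printing `describe_requirement()` before invoking the test scripts makes a missing or different scikit-bio installation visible up front.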
Various features in SortMeRNA are dependent on third-party libraries, including:
- ALP: computes statistical parameters for Gumbel distribution (K and Lambda)
- CMPH: C Minimal Perfect Hashing Library
- KSEQ: FASTA/FASTQ parser (including compressed files)
- PARASAIL: Pairwise Sequence Alignment Library
Thanks to Björn Grüning and Nicola Soranzo, an up-to-date Galaxy wrapper exists for SortMeRNA. Please visit Björn's GitHub page for installation instructions.
Thanks to the Debian Med team, SortMeRNA 2.0 is now a package in Debian. Thanks to Andreas Tille for the sortmerna and indexdb_rna man pages (version 2.0). These have been updated for 2.1 in the master repository.
Thanks to Ben Woodcroft for adding SortMeRNA 2.1 to GNU Guix, where the package can be found.
SortMeRNA 2.0 can be used in QIIME's pick_closed_reference_otus.py, pick_open_reference_otus.py and assign_taxonomy.py scripts.
Note: At the moment, only 2.0 is compatible with QIIME.
The archive rRNA_databases/silva_ids_acc_tax.tar.gz
contains SILVA taxonomy strings (extracted from the XML file generated by ARB)
for each of the reference sequences in the representative databases. Each file has three tab-separated columns:
the reference sequence ID, the accession number and the taxonomy.
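The three-column layout described above can be read with a few lines of Python. The following is a minimal sketch, assuming one record per line with no header row. The function name `read_taxonomy` is hypothetical, and the taxonomy string is kept whole because its internal format is not specified here.

```python
def read_taxonomy(lines):
    """Map reference sequence ID -> (accession number, taxonomy string).

    `lines` is any iterable of text lines, for example an open file handle.
    """
    table = {}
    for line_number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(
                "line {}: expected 3 tab-separated columns, found {}".format(
                    line_number, len(fields)
                )
            )
        seq_id, accession, taxonomy = fields
        table[seq_id] = (accession, taxonomy)
    return table
```

After extracting the archive, an open handle to one of the extracted files can be passed directly, for example `read_taxonomy(open(path))`, where `path` is whichever file you extracted.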
If you use SortMeRNA, please cite: Kopylova E., Noé L. and Touzet H., "SortMeRNA: Fast and accurate filtering of ribosomal RNAs in metatranscriptomic data", Bioinformatics (2012), doi: 10.1093/bioinformatics/bts611.
See AUTHORS for a list of contributors to this project.