SortMeRNA: a sequence analysis tool for filtering, mapping and clustering NGS reads.

License

GPL-3.0 (see LICENSE.txt and COPYING)
allemathor/sortmerna

 
 


sortmerna

SortMeRNA is a local sequence alignment tool for filtering, mapping and clustering.

The core algorithm is based on approximate seeds and allows for sensitive analysis of NGS reads. The main application of SortMeRNA is filtering rRNA from metatranscriptomic data. SortMeRNA takes as input a file of reads (FASTA or FASTQ format) and one or more rRNA database files, and sorts the aligned and rejected reads into two separate files specified by the user. Additional applications include clustering and taxonomy assignment, available through QIIME v1.9.1 (http://qiime.org). SortMeRNA works with Illumina, Ion Torrent and PacBio data, and can produce SAM and BLAST-like alignments.
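As a rough illustration of the filtering workflow described above, the sketch below assembles a command line and prints it for inspection instead of running it. The option names (--ref, --reads, --aligned, --other, --fastx, --log) are given as we understand the 2.x command-line interface and should be confirmed with sortmerna -h for your version. The helper name sortmerna_filter_cmd and all file names are hypothetical. In the 2.x series the reference is expected to be indexed beforehand with indexdb_rna.

```shell
# Print (rather than run) a filtering command so it can be checked first.
# Arguments: reference FASTA, index prefix, reads file, output prefix.
# Option names are assumptions to verify against `sortmerna -h`.
sortmerna_filter_cmd() {
    ref=$1; index=$2; reads=$3; out=$4
    printf 'sortmerna --ref %s,%s --reads %s --aligned %s_rRNA --other %s_non_rRNA --fastx --log\n' \
        "$ref" "$index" "$reads" "$out" "$out"
}

# Hypothetical inputs: one database, its index prefix and one reads file.
sortmerna_filter_cmd db.fasta db-index reads.fastq sample
```

Printing the command first makes it easy to review the paired reference and index arguments before launching a long run.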

Visit http://bioinfo.lifl.fr/RNA/sortmerna/ for more information.

Support

For questions and comments, please use the SortMeRNA forum.

Documentation

If you have Doxygen installed, you can generate the documentation by modifying the following lines in doxygen_configure.txt:

INPUT = /path/to/sortmerna/include /path/to/sortmerna/src
IMAGE_PATH = /path/to/sortmerna/algorithm

and running the following command:

doxygen doxygen_configure.txt

This command generates a folder named html in the directory from which it was run.

Getting Started

There are three ways to install SortMeRNA:

  1. GitHub repository development version (master branch): see Installation instructions
  2. GitHub releases (tarballs): see Installation instructions Linux and Installation instructions Mac OS
  3. BioInfo releases (tarballs including compiled binaries)

Option (3) is the most straightforward, as it does not require autoconf and provides pre-compiled binaries for various operating systems.

SortMeRNA Compilation

NOTE: You will need autoconf to build from the cloned git repository or from the source code tarballs available under GitHub releases.

(0) Prepare your build system for compilation:

bash autogen.sh

Linux OS

(1) Check that your GCC compiler is version 4.0 or above:

gcc --version

(2) Run configure and make scripts:

./configure
make

(3) To install:

make install

You can define an alternative installation directory by specifying --prefix=/path/to/installation/dir to configure.
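Step (1) of the Linux instructions can be scripted. The sketch below compares only the major version number, which is enough for a "4.0 or above" check; the helper name major_at_least is hypothetical, and gcc -dumpversion is used because it prints just the version string.

```shell
# Succeed if the dotted version in $1 has a major number of at least $2.
major_at_least() {
    major=${1%%.*}
    [ "$major" -ge "$2" ]
}

if command -v gcc >/dev/null 2>&1; then
    ver=$(gcc -dumpversion)
    if major_at_least "$ver" 4; then
        echo "gcc $ver is recent enough"
    else
        echo "gcc $ver is older than 4.0" >&2
    fi
else
    echo "gcc not found" >&2
fi
```

Once the check passes, an installation into a custom location follows the steps above with, for example, ./configure --prefix="$HOME/sortmerna" before make and make install (the directory is only an example).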

Mac OS

(1) Check the version of your C/C++ compiler:

gcc --version

(2a) If the compiler is Clang or GCC, proceed to run configure and make scripts:

./configure
make

Note: If the compiler is Clang, you will not have access to multithreading.

(2b) If the compiler is LLVM-GCC, you will need to change it (see Deprecation and Removal Notice).

Set your compiler either to Clang (see Set Clang compiler for Mac OS) or to the original GCC compiler (see Set GCC compiler for Mac OS).

(3) To install:

make install

Set Clang compiler for Mac OS

(1) Check if you have Clang installed:

clang --version

(2a) If Clang is installed, set your compiler to Clang:

export CC=clang
export CXX=clang++

(2b) If Clang is not installed, see Clang for Mac OS for installation instructions.
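The check-then-export steps above can be combined into one helper. This is a sketch only: the function name select_compiler and the "cc:cxx" pair notation are illustrative and not part of SortMeRNA.

```shell
# Export CC and CXX for the first compiler pair whose C compiler is on PATH.
# Each argument is a "cc:cxx" pair, tried in the order given.
select_compiler() {
    for pair in "$@"; do
        cc=${pair%%:*}; cxx=${pair#*:}
        if command -v "$cc" >/dev/null 2>&1; then
            export CC="$cc" CXX="$cxx"
            return 0
        fi
    done
    return 1
}

select_compiler clang:clang++ || echo "clang not found" >&2
```

Several pairs can be listed to express a preference, for example select_compiler gcc-mp-4.8:g++-mp-4.8 clang:clang++ to prefer the MacPorts GCC when it is present.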

Set GCC compiler for Mac OS

(1) Check if you have GCC installed:

gcc --version

(2a) If GCC is installed, set your compiler to GCC:

export CC=gcc-mp-4.8
export CXX=g++-mp-4.8

(2b) If GCC is not installed, see GCC and Zlib through MacPorts for installation instructions.

(3) Next, if you would like zlib support (reading compressed .zip and .gz FASTA/FASTQ files), Zlib should also be installed via MacPorts. See section GCC and Zlib through MacPorts for installation instructions.

(4a) If Zlib is installed and you want support for compressed files, run the configure and make scripts:

./configure --with-zlib="/opt/local"
make

(4b) Otherwise, if you do not need to read compressed files:

./configure --without-zlib
make

You can define an alternative installation directory by specifying --prefix=/path/to/installation/dir to configure.
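The choice between steps (4a) and (4b) can be made automatically by testing for the Zlib header. A minimal sketch, assuming the header sits at include/zlib.h under the given prefix (as it does for MacPorts under /opt/local); the helper name zlib_configure_flag is hypothetical.

```shell
# Print the configure flag that matches whether zlib.h exists under a prefix.
# Argument: installation prefix to look in, e.g. /opt/local for MacPorts.
zlib_configure_flag() {
    prefix=$1
    if [ -f "$prefix/include/zlib.h" ]; then
        printf '%s\n' "--with-zlib=$prefix"
    else
        printf '%s\n' "--without-zlib"
    fi
}
```

It can then be used as ./configure "$(zlib_configure_flag /opt/local)" followed by make.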

Install compilers, ZLIB and autoconf

NOTE: The Clang compiler on Mac (distributed through Xcode) does not support OpenMP (multithreading). A preliminary implementation of OpenMP for Clang is available at http://clang-omp.github.io, but it has not yet been incorporated into the Clang mainline. You may follow the steps outlined at that link to install a version of Clang with multithreading support, although this version has not yet been tested with SortMeRNA. Otherwise, we recommend installing the original GCC compiler via MacPorts, which has full multithreading support.

Clang for Mac OS

Installing Xcode (free through the App Store) and Xcode command line tools will automatically install the latest version of Clang supported with Xcode.

After installing Xcode, the Xcode command line tools may be installed via:

Xcode -> Preferences -> Downloads

Under "Components", click to install "Command Line Tools"

GCC and Zlib through MacPorts

Assuming you have MacPorts installed, type:

sudo port selfupdate
sudo port install gcc48
sudo port install zlib

After the installation, you should find the compiler installed in /opt/local/bin/gcc-mp-4.8 and /opt/local/bin/g++-mp-4.8, as well as Zlib in /opt/local/lib/libz.dylib and /opt/local/include/zlib.h.
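A quick way to confirm that the files listed above are in place is to test each path. The helper name check_paths is hypothetical; it only reports missing paths and does not inspect their contents.

```shell
# Report which of the given paths are missing; succeed only if all exist.
check_paths() {
    missing=0
    for p in "$@"; do
        if [ ! -e "$p" ]; then
            echo "missing: $p" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For the MacPorts installation this would be called as check_paths /opt/local/bin/gcc-mp-4.8 /opt/local/bin/g++-mp-4.8 /opt/local/lib/libz.dylib /opt/local/include/zlib.h.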

autoconf

wget http://ftp.gnu.org/gnu/autoconf/autoconf-2.69.tar.gz
tar -zxvf autoconf-2.69.tar.gz
cd autoconf-2.69
./configure
make
make install

You can define an alternative installation directory by specifying --prefix=/path/to/installation/dir to configure (before calling make).

If installing in a directory other than those listed in $PATH, add the installation directory to $PATH:

export PATH=$PATH:/path/to/installation/dir
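When the export line above is placed in a shell startup file, it appends the directory again every time the file is sourced. The sketch below adds the directory only if it is not already listed; the function name path_append is hypothetical.

```shell
# Append a directory to PATH only if it is not already one of its entries.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                       # already present, nothing to do
        *) PATH="$PATH:$1"; export PATH ;;
    esac
}
```

Usage: path_append /path/to/installation/dir/bin, assuming the binaries were installed into the bin subdirectory of the chosen prefix.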

Tests

Usage tests can be run with the following commands:

python ./tests/test_sortmerna.py
python ./tests/test_sortmerna_zlib.py

Make sure the data folder is in the same directory as test_sortmerna.py.

The tests require scikit-bio 0.5.0.

Third-party libraries

Various features in SortMeRNA are dependent on third-party libraries, including:

  • ALP: computes statistical parameters for Gumbel distribution (K and Lambda)
  • CMPH: C Minimal Perfect Hashing Library
  • KSEQ: FASTA/FASTQ parser (including compressed files)
  • PARASAIL: Pairwise Sequence Alignment Library

Wrappers and Packages

Galaxy

Thanks to Björn Grüning and Nicola Soranzo, an up-to-date Galaxy wrapper exists for SortMeRNA. Please visit Björn's GitHub page for installation.

Debian

Thanks to the Debian Med team, SortMeRNA 2.0 is now a package in Debian. Thanks to Andreas Tille for the sortmerna and indexdb_rna man pages (version 2.0). These have been updated for 2.1 in the master repository.

GNU Guix

Thanks to Ben Woodcroft for adding SortMeRNA 2.1 to GNU Guix; the package can be found here.

QIIME

SortMeRNA 2.0 can be used in QIIME's pick_closed_reference_otus.py, pick_open_reference_otus.py and assign_taxonomy.py scripts.

Note: At the moment, only 2.0 is compatible with QIIME.

Taxonomies

The archive rRNA_databases/silva_ids_acc_tax.tar.gz contains SILVA taxonomy strings (extracted from the XML file generated by ARB) for each of the reference sequences in the representative databases. Each file has three tab-separated columns: the reference sequence ID, the accession number and the taxonomy.
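Because the files are plain tab-separated text, a taxonomy string can be looked up with awk. This is a sketch under the three-column layout described above; the function name taxonomy_for_id is hypothetical, and the names of the files inside the archive are not assumed here.

```shell
# Print the taxonomy (column 3) for a reference sequence ID (column 1) in a
# three-column, tab-separated file. Exits non-zero if the ID is not found.
# Arguments: sequence ID, path to an extracted taxonomy file.
taxonomy_for_id() {
    awk -F '\t' -v id="$1" '
        ($1 "") == (id "") { print $3; found = 1; exit }  # compare as strings
        END { exit !found }
    ' "$2"
}
```

After extracting the archive with tar -xzf rRNA_databases/silva_ids_acc_tax.tar.gz, call the function with an ID and one of the extracted files. The string concatenation in the comparison keeps numeric-looking IDs from being compared as numbers.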

Citation

If you use SortMeRNA, please cite: Kopylova E., Noé L. and Touzet H., "SortMeRNA: Fast and accurate filtering of ribosomal RNAs in metatranscriptomic data", Bioinformatics (2012), doi: 10.1093/bioinformatics/bts611.

Contributors

See AUTHORS for a list of contributors to this project.
