Skip to content

Commit

Permalink
Freeze version (#544)
Browse files Browse the repository at this point in the history
* fix result script in case of no significant results

* fix bug merge in wrong order coverage dt

* Update prepare.rst

* added saving of granges of split counts .as tsv.gz

* Update DNA_RNA_matrix_plot.R

* change ln to softlink

* add help to index

* Update DNA_RNA_matrix_plot.R

fix bug when multiple matching

* Update create_matrix_dna_rna_cor.R

Remove unnecessary step for the QC matrix

* Update DNA_RNA_matrix_plot.R

fix bug in not annotated QC table

* Update DNA_RNA_matrix_plot.R

fix bug in check matches function

* Update DNA_RNA_matrix_plot.R

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update 08_extract_results_FraseR.R

* Update README.md

* update GHA with snakemake version range and activate PR checks

* fix #532 knitr parsing

* remove redundent rules to create dict and fai files

* fix last outrider version install

* Fix FRASER and OUTRIDER versions

* OUTRIDER 1.16.2 & FRASER 1.99.0

* pin OUTRIDER to 1.20.1

* pin FRASER to commit #d6a422c

* update version nr for bioconda

* Update README.md

* Update README.md

---------

Co-authored-by: Ines Scheller <scheller@in.tum.de>
Co-authored-by: Nick <smithnickh@gmail.com>
Co-authored-by: Christian Mertes <mertes@in.tum.de>
Co-authored-by: AtaJadidAhari <atajadidahari@gmail.com>
  • Loading branch information
5 people committed May 17, 2024
1 parent 594d7da commit ef7e11e
Show file tree
Hide file tree
Showing 25 changed files with 125 additions and 106 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/python-package-conda.yml
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
name: Build

on: [push]
on: [push,pull_request]

jobs:
build-linux:
Expand Down
43 changes: 25 additions & 18 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,26 +3,16 @@
[![Version](https://img.shields.io/github/v/release/gagneurlab/drop?include_prereleases)](https://github.com/gagneurlab/drop/releases)
[![Version](https://readthedocs.org/projects/gagneurlab-drop/badge/?version=latest)](https://gagneurlab-drop.readthedocs.io/en/latest)

The detection of RNA Outliers Pipeline (DROP) is an integrative workflow to detect aberrant expression, aberrant splicing, and mono-allelic expression from raw sequencing files.
The detection of RNA Outliers Pipeline (DROP) is an integrative workflow to detect aberrant expression, aberrant splicing, and mono-allelic expression from raw sequencing data.

The manuscript is available in [Nature Protocols](https://www.nature.com/articles/s41596-020-00462-5). [SharedIt link.](https://rdcu.be/cdMmF)

<img src="drop_sticker.png" alt="drop logo" width="200" class="center"/>
The manuscript is available in [Nature Protocols](https://www.nature.com/articles/s41596-020-00462-5).

The website containing the different reports of the Geuvadis demo dataset described in the paper can be found [here](https://cmm.in.tum.de/public/paper/drop_analysis/webDir/drop_analysis_index.html).

## What's new
This [video](https://www.youtube.com/watch?v=XvgjiFQClhM&t=2761s) introduces the tools used in DROP and their application to rare disease diagnostics.

Versions 1.3.3, 1.3.2 and 1.3.1 fix some bugs.
Version 1.3.0 introduces the option to use FRASER 2.0 which is an improved version of FRASER that uses the Intron Jaccard Index metric instead of percent spliced in and splicing efficiency to quantify and later call aberrant splicing. To run FRASER 2.0, modify the `FRASER_version` parameter in the aberrantSplicing dictionary in the config file and adapt the `quantileForFiltering` and `deltaPsiCutoff` parameters. See the [config template](https://github.com/gagneurlab/drop/blob/master/drop/template/config.yaml) for more details. When switching between FRASER versions, we recommend running DROP in a
separate folder for each version. Moreover, DROP now allows users to provide lists of genes to focus on and do the multiple testing correction instead of the usual transcriptome-wide approach. Refer to the [documentation](https://gagneurlab-drop.readthedocs.io/en/latest/prepare.html#limiting-fdr-correction-to-subsets-of-genes-of-interest).

`Snakemake v.7.8` introduced some changes in which changes in parameters can cause rules to be re-executed. More info [here](https://github.com/snakemake/snakemake/issues/1694). This affects DROP and causes certain rules in the AS and QC modules to be triggered even if they were already completed and there were no changes in the sample annotation or scripts. The workaround is to run DROP by adding the parameter `--rerun-triggers mtime`, e.g. `snakemake -n --rerun-triggers mtime` or `snakemake --cores 10 --rerun-triggers mtime`. We will investigate the rules in DROP to fix this.

Version 1.2.3 simplifies the plots in the AE Summary Script. In addition, there's a new heatmap in the sampleQC Summary that allows to better identify DNA-RNA mismatches.

As of version 1.2.1 DROP has a new module that performs RNA-seq variant calling. The input are BAM files and the output either a single-sample or a multi-sample VCF file (option specified by the user) annotated with allele frequencies from gnomAD (if specified by the user). The sample annotation table does not need to be changed, but several new parameters in the config file have to be added and tuned. For more info, refer to the [documentation](https://gagneurlab-drop.readthedocs.io/en/latest/prepare.html#rna-variant-calling-dictionary).
<img src="drop_sticker.png" alt="drop logo" width="200" class="center"/>

Also, as of version 1.2.1 the integration of external split and non-split counts to detect aberrant splicing is now possible. Simply specify in a new column in the sample annotation the directory containing the counts. For more info, refer to the [documentation](https://gagneurlab-drop.readthedocs.io/en/latest/prepare.html#external-count-examples).

## Quickstart
DROP is available on [bioconda](https://anaconda.org/bioconda/drop).
Expand All @@ -31,9 +21,9 @@ We recommend using a dedicated conda environment (`drop_env` in this example). I
mamba create -n drop_env -c conda-forge -c bioconda drop --override-channels
```

In the case of mamba/conda troubles we recommend using the fixed `DROP_<version>.yaml` installation file we make available on our [public server](https://www.cmm.in.tum.de/public/paper/drop_analysis/). Install the current version and use the full path in the following command to install the conda environment `drop_env`
In the case of mamba/conda troubles, we recommend using the fixed `DROP_<version>.yaml` installation file we make available on our [public server](https://www.cmm.in.tum.de/public/paper/drop_analysis/). Install the current version and use the full path in the following command to install the conda environment `drop_env`
```
mamba env create -f DROP_1.3.3.yaml
mamba env create -f DROP_1.3.4.yaml
```

Test installation with demo project
Expand All @@ -55,6 +45,20 @@ Expected runtime: 25 min
For more information on different installation options, refer to the
[documentation](https://gagneurlab-drop.readthedocs.io/en/latest/installation.html)

## What's new

Due to snakemake updates affecting wBuild and the way we installed FRASER, installing DROP 1.3.3 no longer works. Version 1.3.4 simply fixes the FRASER version to ensure reproducibility and fixes certain scripts affected by the snakemake update. Running the pipeline with the version 1.3.4 should provide the same outlier results as 1.3.3.

Version 1.3.0 introduces the option to use FRASER 2.0 which is an improved version of FRASER that uses the Intron Jaccard Index metric instead of percent spliced in and splicing efficiency to quantify and later call aberrant splicing. To run FRASER 2.0, modify the `FRASER_version` parameter in the aberrantSplicing dictionary in the config file and adapt the `quantileForFiltering` and `deltaPsiCutoff` parameters. See the [config template](https://github.com/gagneurlab/drop/blob/master/drop/template/config.yaml) for more details. When switching between FRASER versions, we recommend running DROP in a
separate folder for each version. Moreover, DROP now allows users to provide lists of genes to focus on and do the multiple testing correction instead of the usual transcriptome-wide approach. Refer to the [documentation](https://gagneurlab-drop.readthedocs.io/en/latest/prepare.html#limiting-fdr-correction-to-subsets-of-genes-of-interest).

`Snakemake v.7.8` introduced some changes in which changes in parameters can cause rules to be re-executed. More info [here](https://github.com/snakemake/snakemake/issues/1694). This affects DROP and causes certain rules in the AS and QC modules to be triggered even if they were already completed and there were no changes in the sample annotation or scripts. The workaround is to run DROP by adding the parameter `--rerun-triggers mtime`, e.g. `snakemake -n --rerun-triggers mtime` or `snakemake --cores 10 --rerun-triggers mtime`. We will investigate the rules in DROP to fix this.

As of version 1.2.1 DROP has a new module that performs RNA-seq variant calling. The input are BAM files and the output either a single-sample or a multi-sample VCF file (option specified by the user) annotated with allele frequencies from gnomAD (if specified by the user). The sample annotation table does not need to be changed, but several new parameters in the config file have to be added and tuned. For more info, refer to the [documentation](https://gagneurlab-drop.readthedocs.io/en/latest/prepare.html#rna-variant-calling-dictionary).

Also, as of version 1.2.1 the integration of external split and non-split counts to detect aberrant splicing is now possible. Simply specify in a new column in the sample annotation the directory containing the counts. For more info, refer to the [documentation](https://gagneurlab-drop.readthedocs.io/en/latest/prepare.html#external-count-examples).


## Set up a custom project
Install the drop module according to [installation](#installation) and initialize the project in a custom project directory.
### Prepare the input data
Expand Down Expand Up @@ -100,7 +104,10 @@ If you want to contribute with your own count matrices, please contact us: yepez

If you use DROP in research, please cite our [manuscript](https://www.nature.com/articles/s41596-020-00462-5).

Furthermore, if you use the aberrant expression module, also cite [OUTRIDER](https://doi.org/10.1016/j.ajhg.2018.10.025); if you use the aberrant splicing module, also cite [FRASER](https://www.nature.com/articles/s41467-020-20573-7); and if you use the MAE module, also cite the [Kremer, Bader et al study](https://www.nature.com/articles/ncomms15824) and [DESeq2](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0550-8).
Furthermore, if you use:
* the aberrant expression module, also cite [OUTRIDER](https://doi.org/10.1016/j.ajhg.2018.10.025)
* the aberrant splicing module, also cite [FRASER](https://www.nature.com/articles/s41467-020-20573-7) or [FRASER2](https://www.sciencedirect.com/science/article/pii/S0002929723003671?dgcid=coauthor), depending on the version that you use
* the MAE module, also cite the [Kremer, Bader et al study](https://www.nature.com/articles/ncomms15824) and [DESeq2](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0550-8).

For the complete set of tools used by DROP (e.g. for counting), see the [manuscript](https://www.nature.com/articles/s41596-020-00462-5).

Expand Down
2 changes: 1 addition & 1 deletion docs/source/conf.py
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@
author = 'Michaela Müller'

# The full version, including alpha/beta/rc tags
release_ = '1.3.3'
release_ = '1.3.4'



Expand Down
4 changes: 2 additions & 2 deletions docs/source/help.rst
Original file line number Diff line number Diff line change
Expand Up @@ -8,11 +8,11 @@ Common errors

The ``MAE:mae_allelicCounts`` step is susceptible to fail if:

1. the chromosomes styles of the reference genome and the BAM files do not match
1. The chromosomes styles of the reference genome and the BAM files do not match

Solution: Identify the chromosomes style of the BAM file. Obtain an appropriate reference genome file and specify it in the config file.

2. the BAM file does not have the correct ``Read Groups`` documentation both the header and reads. You can often identify if the BAM file has any problems by using the command ``gatk ValidateSamFile -I path/to/bam_file.bam``
2. The BAM file does not have the correct ``Read Groups`` documentation for both the header and reads. You can often identify if the BAM file has any problems by using the command ``gatk ValidateSamFile -I path/to/bam_file.bam``

Solution:
To fix this is often dependent on the individual case, but some combination of the following tools is quite helpful:
Expand Down
2 changes: 1 addition & 1 deletion docs/source/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@ Then, DROP can be executed in multiple ways (:doc:`pipeline`).
pipeline
output
license
troubleshooting
help

Quickstart
-----------
Expand Down
2 changes: 1 addition & 1 deletion docs/source/installation.rst
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@ Install the latest version and use the full path in the following command to ins

.. code-block:: bash
mamba env create -f DROP_1.3.3.yaml
mamba env create -f DROP_1.3.4.yaml
Installation time: ~ 10min

Expand Down
11 changes: 7 additions & 4 deletions docs/source/prepare.rst
Original file line number Diff line number Diff line change
Expand Up @@ -314,9 +314,9 @@ Limiting FDR correction to subsets of genes of interest
------------------------------------
In addition to returning transcriptome-wide results, DROP provides the option to
limit the FDR correction to user-provided genes of interest in the
``aberrantExpression`` and ``aberrantSplicing`` modules. These could e.g. be all
``aberrantExpression`` and ``aberrantSplicing`` modules. These could, for example, be all
OMIM genes. It is also possible to provide sample-specific genes such as all
genes with a rare splice region variant for each sample.
genes with a rare splice-region variant for each sample.
To use this feature, a YAML file containing the set(s) of genes to test
(per sample or for all samples) needs to be specified in the ``genesToTest`` parameter
of the ``aberrantExpression`` and ``aberrantSplicing`` modules in the config file.
Expand All @@ -330,11 +330,14 @@ Creating the YAML file specifying subsets of genes to test
The file containing the list of genes (HGNC symbols) to be tested must be a YAML file,
where the variable names specify the name of each set of tested genes. In the output
of DROP, this name will be used to identify the set in the results table. Each set
can either be a list of genes, in which case the set will be tested for all samples. Alternatively
(and additionally), sample-specific sets can be created by giving the RNA_ID of the sample
can either be: i) a list of genes, in which case the set will be tested for all samples, or ii)
sample-specific sets that can be created by giving the RNA_ID of the sample
for which the set should be used as the name (see example below).
This YAML file can be created in R using ``yaml::write_yaml(subsetList, filepath)``,
where ``subsetList`` is a named list of named lists containing the sets of genes to test.
The gene names must match those from the provided gtf file. We currently do not support Ensembl ids as input.
The table with extracted gene names from the gtf file is located under:
``root/processed_data/preprocess/{geneAnnotation}/gene_name_mapping_{geneAnnotation}.tsv``.
In the following example, the name of the global set of genes is ``Genes_to_test_on_all_samples``
and the name of the sample-specific set is ``Genes_with_rare_splice_variants``:

Expand Down
2 changes: 1 addition & 1 deletion drop/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,5 +4,5 @@
from . import utils
from . import demo

__version__ = "1.3.3"
__version__ = "1.3.4"

2 changes: 1 addition & 1 deletion drop/cli.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@

@click.group()
@click_log.simple_verbosity_option(logger)
@click.version_option('1.3.3',prog_name='drop')
@click.version_option('1.3.4',prog_name='drop')


def main():
Expand Down
10 changes: 6 additions & 4 deletions drop/installRPackages.R
Original file line number Diff line number Diff line change
Expand Up @@ -23,18 +23,20 @@ if (file.exists(args[1])){
package=gsub("=.*", "", unlist(args)),
version=gsub(".*=", "", unlist(args)),
ref="")
packages[package == version, version:=NA]
packages[package == version, c("min_version", "max_version") := ""]
}
installed <- as.data.table(installed.packages())

for (pckg_name in packages$package) {
package_dt <- packages[package == pckg_name]
pckg_name <- gsub(".*/", "", pckg_name)
version <- package_dt$version
min_version <- package_dt$min_version
max_version <- package_dt$max_version
branch <- package_dt$ref

if (!pckg_name %in% installed$Package || (!is.na(version) && compareVersion(
installed[Package == pckg_name, Version], version) < 0)) {
if (!pckg_name %in% installed$Package || (min_version != "" && (compareVersion(
installed[Package == pckg_name, Version], min_version) < 0 || compareVersion(
installed[Package == pckg_name, Version], max_version) > 0))) {

package <- package_dt$package
message(paste("install", package))
Expand Down
11 changes: 6 additions & 5 deletions drop/modules/aberrant-expression-pipeline/Counting/Summary.R
Original file line number Diff line number Diff line change
Expand Up @@ -48,13 +48,14 @@ cnts_mtx <- counts(ods, normalized = F)
#' Consider removing samples with too low or too high size factors.
#'
bam_coverage <- fread(snakemake@input$bam_cov)
bam_coverage[, sampleID := as.character(sampleID)]
bam_coverage[, RNA_ID := as.character(sampleID)]
bam_coverage[, sampleID := NULL]
setnames(bam_coverage, 'record_count', 'total_count')
coverage_dt <- merge(bam_coverage,
data.table(sampleID = colnames(ods),
coverage_dt <- merge(data.table(RNA_ID = colnames(ods),
read_count = colSums(cnts_mtx),
isExternal = ods@colData$isExternal),
by = "sampleID", sort = FALSE)
bam_coverage,
by = "RNA_ID", sort = FALSE)
# read counts
coverage_dt[, count_rank := rank(read_count)]

Expand Down Expand Up @@ -94,7 +95,7 @@ p_sf

setnames(coverage_dt, old = c('total_count', 'read_count', 'size_factors'),
new = c('Reads Mapped', 'Reads Counted', 'Size Factors'))
DT::datatable(coverage_dt[, .(sampleID, `Reads Mapped`, `Reads Counted`, `Size Factors`)][order(`Reads Mapped`)],
DT::datatable(coverage_dt[, .(RNA_ID, `Reads Mapped`, `Reads Counted`, `Size Factors`)][order(`Reads Mapped`)],
caption = 'Reads summary statistics')

#' # Filtering
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -54,6 +54,10 @@ splitCountRanges <- rowRanges(splitCounts)
# Annotate granges from the split counts
splitCountRanges <- FRASER:::annotateSpliceSite(splitCountRanges)
saveRDS(splitCountRanges, snakemake@output$gRangesSplitCounts)
# additionally save as tsv.gz (for easier AbSplice input)
fwrite(as.data.table(splitCountRanges),
gsub(".Rds", ".tsv.gz", snakemake@output$gRangesSplitCounts,
ignore.case=TRUE))

# Create ranges for non split counts
# Subset by minExpression
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -75,7 +75,7 @@ if(nrow(res_junc_dt) > 0){
res_junc_dt <- merge(res_junc_dt, as.data.table(colData(fds)), by = "sampleID")
res_junc_dt[, c("bamFile", "pairedEnd", "STRAND", "RNA_BAM_FILE", "DNA_VCF_FILE", "COUNT_MODE", "COUNT_OVERLAPS") := NULL]
} else{
warning("The aberrant splicing pipeline gave 0 results for the ", dataset, " dataset.")
warning("The aberrant splicing pipeline gave 0 intron-level results for the ", dataset, " dataset.")
}

# Extract full results by gene
Expand All @@ -102,7 +102,7 @@ res_genes_dt <- res_genes_dt[do.call(pmin, c(res_genes_dt[,padj_cols, with=FALSE
abs(deltaPsi) >= snakemake@params$deltaPsiCutoff &
totalCounts >= 5,]

if(length(res_gene) > 0){
if(nrow(res_genes_dt) > 0){
res_genes_dt <- merge(res_genes_dt, as.data.table(colData(fds)), by = "sampleID")
res_genes_dt[, c("bamFile", "pairedEnd", "STRAND", "RNA_BAM_FILE", "DNA_VCF_FILE", "COUNT_MODE", "COUNT_OVERLAPS") := NULL]

Expand All @@ -115,34 +115,35 @@ if(length(res_gene) > 0){
}
}
} else{
res_genes_dt <- data.table()
warning("The aberrant splicing pipeline gave 0 gene-level results for the ", dataset, " dataset.")
}

# Annotate results with spliceEventType and blacklist region overlap
# load reference annotation
library(AnnotationDbi)
txdb <- loadDb(snakemake@input$txdb)

# annotate the type of splice event and UTR overlap
res_junc_dt <- annotatePotentialImpact(result=res_junc_dt, txdb=txdb, fds=fds)
res_genes_dt <- annotatePotentialImpact(result=res_genes_dt, txdb=txdb, fds=fds)

# set genome assembly version to load correct blacklist region BED file (hg19 or hg38)
assemblyVersion <- snakemake@config$genomeAssembly
if(grepl("grch37", assemblyVersion, ignore.case=TRUE)){
assemblyVersion <- "hg19"
if(nrow(res_junc_dt) > 0){
res_junc_dt <- annotatePotentialImpact(result=res_junc_dt, txdb=txdb, fds=fds)
}
if(grepl("grch38", assemblyVersion, ignore.case=TRUE)){
assemblyVersion <- "hg38"
if(nrow(res_genes_dt) > 0){
res_genes_dt <- annotatePotentialImpact(result=res_genes_dt, txdb=txdb, fds=fds)
}

# set genome assembly version to load correct blacklist region BED file (hg19 or hg38)
assemblyVersion <- snakemake@config$genomeAssembly
if(grepl("grch37", assemblyVersion, ignore.case=TRUE)) assemblyVersion <- "hg19"
if(grepl("grch38", assemblyVersion, ignore.case=TRUE)) assemblyVersion <- "hg38"

# annotate overlap with blacklist regions
if(assemblyVersion %in% c("hg19", "hg38")){
res_junc_dt <- flagBlacklistRegions(result=res_junc_dt,
if(nrow(res_junc_dt) > 0){
res_junc_dt <- flagBlacklistRegions(result=res_junc_dt,
assemblyVersion=assemblyVersion)
res_genes_dt <- flagBlacklistRegions(result=res_genes_dt,
}
if(nrow(res_genes_dt) > 0){
res_genes_dt <- flagBlacklistRegions(result=res_genes_dt,
assemblyVersion=assemblyVersion)
}
} else{
message(date(), ": cannot annotate blacklist regions as no blacklist region\n",
"BED file is available for genome assembly version ", assemblyVersion,
Expand Down
2 changes: 1 addition & 1 deletion drop/modules/mae-pipeline/MAE/Results.R
Original file line number Diff line number Diff line change
Expand Up @@ -107,7 +107,7 @@ res[, MAE_ALT := MAE == TRUE & altRatio >= allelicRatioCutoff]
#'
#' Number of samples with significant MAE for alternative events: `r uniqueN(res[MAE_ALT == TRUE, ID])`

#+echo=F
#+ echo=F

# Save full results zipped
res[, altRatio := round(altRatio, 3)]
Expand Down
Loading

0 comments on commit ef7e11e

Please sign in to comment.