Merge external splicing counts #247

Merged
nickhsmith merged 73 commits into dev from new_external_merge_splicing
Apr 22, 2022

Conversation

c-mertes
Contributor

This is the fresh take on the merging of external splicing counts. #169

vyepez88
vyepez88 previously approved these changes Apr 1, 2022
@@ -50,3 +52,5 @@ nonSplitCounts <- getNonSplitReadCountsForAllSamples(fds=fds,
longRead=params$longRead)

message(date(), ":", dataset, " nonSplit counts done")

file.create(snakemake@output$done)
Collaborator

Suggested change
file.create(snakemake@output$done)
file.create(snakemake@output$done)

minExpressionInOneSample = minExpressionInOneSample,
minDeltaPsi = minDeltaPsi,
filter=FALSE)
fds <- saveFraserDataSet(fds)
Collaborator

If external counts are used, save a new copy of the fds object;
otherwise, create a symlink to the existing fds object.

Use the new copy or link in all subsequent processing.
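
A minimal sketch of that copy-or-link step, assuming the saved fds lives in a directory on disk. `stage_fds_dir` and its arguments are hypothetical names for illustration, not part of DROP or FRASER:

```r
# Hypothetical helper, not part of DROP or FRASER: makes `dest_dir` either a
# full copy of `src_dir` (external counts were merged) or a symlink to it.
stage_fds_dir <- function(src_dir, dest_dir, has_external) {
  stopifnot(dir.exists(src_dir), !file.exists(dest_dir))
  if (has_external) {
    # The merged object differs from the local one, so it gets its own copy.
    dir.create(dest_dir, recursive = TRUE)
    entries <- list.files(src_dir, full.names = TRUE,
                          all.files = TRUE, no.. = TRUE)
    ok <- file.copy(entries, dest_dir, recursive = TRUE)
  } else {
    # Nothing was merged: a link avoids duplicating the data on disk.
    ok <- file.symlink(normalizePath(src_dir), dest_dir)
  }
  stopifnot(all(ok))
  invisible(dest_dir)
}
```

Downstream rules would then read only from `dest_dir`, so they do not need to know whether external counts were involved.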

Comment on lines +1 to +3
Results and Output of DROP
===========================

Collaborator

follow slack comments

Comment on lines +39 to +42
has_external <- !(all(ods@colData$GENE_COUNTS_FILE == "") || is.null(ods@colData$GENE_COUNTS_FILE))
if(has_external){
ods@colData$isExternal <- as.factor(ods@colData$GENE_COUNTS_FILE != "")
}else{
Collaborator

Please add a comment explaining this logic.
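
For reference, a possible commented version of the check, written as a standalone sketch. `mark_external` is a hypothetical helper, not DROP code. Treating `NA` like an empty entry and returning all `FALSE` when the column is absent are assumptions, since the `else` branch is not visible in the diff:

```r
# Hypothetical helper; `counts_file` stands in for ods@colData$GENE_COUNTS_FILE.
mark_external <- function(counts_file, n_samples = length(counts_file)) {
  # Column absent: no sample can be external (assumed behaviour).
  if (is.null(counts_file)) {
    counts_file <- rep("", n_samples)
  }
  # A sample is external when it points to a precomputed gene counts file.
  # NA is treated like an empty entry (assumption).
  is_external <- !is.na(counts_file) & counts_file != ""
  factor(is_external, levels = c(FALSE, TRUE))
}
```

Fixing the factor levels to `FALSE` and `TRUE` keeps both levels present even when a dataset has no external samples.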

Comment on lines 125 to 133
res[MAE == TRUE & MAE_ALT == FALSE, N_MAE_REF := .N, by = ID]
res[MAE_ALT == TRUE, N_MAE_ALT := .N, by = ID]
res[MAE == TRUE & MAE_ALT == FALSE & rare == TRUE, N_MAE_REF_RARE := .N, by = ID]
res[MAE_ALT == TRUE & rare == TRUE, N_MAE_ALT_RARE := .N, by = ID]

rd <- unique(res[,.(ID, N, N_MAE, N_MAE_REF, N_MAE_ALT, N_MAE_REF_RARE, N_MAE_ALT_RARE)])

# rd contains duplicate entries for each ID. IE when MAE==F N_MAE for ID1 is both .N and 0
# summarize these duplicates by taking the maximum of each column for each ID
Collaborator

Please clean this up.
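
One possible cleanup, as a sketch on toy data: aggregate once per `ID` so that no duplicate rows arise and no maximum has to be taken afterwards. The column names follow the snippet and the values are made up. Note that, unlike the `:=` version, IDs without a match get `0` here rather than `NA`; whether that is acceptable downstream is an assumption.

```r
library(data.table)

# Toy input; column names follow the snippet, values are invented.
res <- data.table(
  ID      = c("s1", "s1", "s1", "s2", "s2"),
  MAE     = c(TRUE, TRUE, FALSE, TRUE, FALSE),
  MAE_ALT = c(FALSE, TRUE, FALSE, FALSE, FALSE),
  rare    = c(TRUE, TRUE, FALSE, FALSE, TRUE)
)

# One row per ID: each count is the number of rows meeting the condition.
rd <- res[, .(
  N_MAE_REF      = sum(MAE & !MAE_ALT, na.rm = TRUE),
  N_MAE_ALT      = sum(MAE_ALT, na.rm = TRUE),
  N_MAE_REF_RARE = sum(MAE & !MAE_ALT & rare, na.rm = TRUE),
  N_MAE_ALT_RARE = sum(MAE_ALT & rare, na.rm = TRUE)
), by = ID]
```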

Contributor Author

@c-mertes c-mertes left a comment

Please adapt the code as discussed.

DROP is intended to help researchers use RNA-Seq data in order to detect genes with aberrant expression,
aberrant splicing and mono-allelic expression. By simplifying the workflow process we hope to provide
easy to read and interpret html files and output files. This section is dedicated to explaining the relevant
results files. We will use the results of the ``demo`` to explain the files generated.::
Contributor Author

Suggested change
results files. We will use the results of the ``demo`` to explain the files generated.::
results files. We will use the results of the ``demo`` to explain the files generated by the following commands:

Aberrant Expression
+++++++++++++++++++

html file
Contributor Author

Suggested change
html file
HTML file


DROP is intended to help researchers use RNA-Seq data in order to detect genes with aberrant expression,
aberrant splicing and mono-allelic expression. By simplifying the workflow process we hope to provide
easy to read and interpret html files and output files. This section is dedicated to explaining the relevant
Contributor Author

Suggested change
easy to read and interpret html files and output files. This section is dedicated to explaining the relevant
easy to read and interpret HTML files and output files. This section is dedicated to explaining the relevant


* Counting Summaries
* For each aberrant expression group
* split of local vs external sample counts
Contributor Author

Suggested change
* split of local vs external sample counts
* number of local vs external samples

* information about the expressed genes within each sample and as a dataset
* Outrider Summaries
* For each aberrant expression group
* the number of aberrantly expressed gene per sample
Contributor Author

Suggested change
* the number of aberrantly expressed gene per sample
* the number of aberrantly expressed genes per sample

Comment on lines 39 to 45
* Files
* OUTRIDER files for each aberrant expression group
* For each of these files you can follow the `OUTRIDER vignette for individual analysis <https://www.bioconductor.org/packages/devel/bioc/vignettes/OUTRIDER/inst/doc/OUTRIDER.pdf>`_.
* tsv files
* For each aberrant expression group
* results.tsv
* this tsv file contains only the significant genes and samples that meet the cutoffs defined in the ``config.yaml`` for ``padjCutoff`` and ``zScoreCutoff``
Contributor Author

Suggested change
* Files
* OUTRIDER files for each aberrant expression group
* For each of these files you can follow the `OUTRIDER vignette for individual analysis <https://www.bioconductor.org/packages/devel/bioc/vignettes/OUTRIDER/inst/doc/OUTRIDER.pdf>`_.
* tsv files
* For each aberrant expression group
* results.tsv
* this tsv file contains only the significant genes and samples that meet the cutoffs defined in the ``config.yaml`` for ``padjCutoff`` and ``zScoreCutoff``
* Files (for each aberrant expression group)
* OUTRIDER data files (RDS)
* You can follow the `OUTRIDER vignette for further individual analysis <https://www.bioconductor.org/packages/devel/bioc/vignettes/OUTRIDER/inst/doc/OUTRIDER.pdf>`_.
* results files (TSV)
* the result file contains only the significant genes and samples that meet the cutoffs defined in the ``config.yaml`` for ``padjCutoff`` and ``zScoreCutoff``

* For each aberrant splicing group
* split of local (from internal BAM files) vs external sample counts
* split of local vs merged with external sample splicing/intron counts
* comparison of local and external log mean counts
Contributor Author

Suggested change
* comparison of local and external log mean counts
* comparison of local and external mean counts

@@ -16,13 +16,14 @@ exportCounts:
- v29
Contributor Author

Do we need to maintain this file twice, once in the code base and once in the resource tar file?

Collaborator

it's not in the resource tar

@@ -1,23 +1,25 @@
RNA_ID RNA_BAM_FILE DNA_VCF_FILE DNA_ID DROP_GROUP PAIRED_END COUNT_MODE COUNT_OVERLAPS STRAND HPO_TERMS GENE_COUNTS_FILE GENE_ANNOTATION GENOME
RNA_ID RNA_BAM_FILE DNA_VCF_FILE DNA_ID DROP_GROUP PAIRED_END COUNT_MODE COUNT_OVERLAPS STRAND HPO_TERMS GENE_COUNTS_FILE GENE_ANNOTATION GENOME SPLICE_COUNTS_DIR
Contributor Author

Same as above, do we need to maintain it twice?

@@ -6,13 +6,13 @@
#' - snakemake: '`sm str(tmp_dir / "AS" / "{dataset}" / "splitReads" / "{sample_id}.Rds")`'
#' params:
#' - setup: '`sm cfg.AS.getWorkdir() + "/config.R"`'
#' - workingDir: '`sm cfg.getProcessedDataDir() + "/aberrant_splicing/datasets"`'
#' - workingDir: '`sm cfg.getProcessedDataDir() + "/aberrant_splicing/datasets/fromBam"`'
Contributor Author

use ..../datasets/raw-local-{dataset} raw-{dataset} {dataset}

nickhsmith
nickhsmith previously approved these changes Apr 22, 2022
@nickhsmith nickhsmith merged commit 8a88adb into dev Apr 22, 2022
@nickhsmith nickhsmith deleted the new_external_merge_splicing branch April 22, 2022 16:35
This pull request was closed.
4 participants