Pipeline crashed when no variant is found #14

Closed
mfoll opened this issue Sep 11, 2015 · 16 comments

Comments

@mfoll
Member

mfoll commented Sep 11, 2015

It seems that when the R_regression process finds no variants, it produces no pdf output. The errorStrategy 'ignore' makes the missing pdf acceptable, but as a side effect the empty vcf is not sent to the vcf channel:

output:
     file "${region_tag}.vcf" into vcf
     file '*.pdf' into PDF

This creates an error in the collect_vcf_result process as there is no vcf there.

@mfoll mfoll self-assigned this Sep 11, 2015
@tdelhomme
Member

Do you have any example file to test this?

@mfoll
Member Author

mfoll commented Sep 16, 2015

If you use the tiny bed file on this test data set it will crash:

git clone --depth=1 https://github.com/mfoll/NGS_data_test.git
cd NGS_data_test/1000G_CEU_TP53/
nextflow run mfoll/robust-regression-caller -with-docker mfoll/robust-regression-caller --bed \
               TP53_tiny.bed --bam_folder BAM/ --fasta_ref 17.fasta.gz

We can either:

  • Add errorStrategy 'ignore' in collect_vcf_result: but then the pipeline will output nothing when no variant is found, and I don't much like this option because it makes a real error harder to spot
  • Check if there is no vcf file produced and in this case output an empty vcf
  • Ask if nextflow people could change this behaviour (when one output is missing, still send the others to their channels). This would be my favorite option of course.
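
The second option above could be sketched in shell roughly as follows. This is only an illustration under assumptions, not the code the pipeline uses: the function name ensure_vcf and the file name region_1.vcf are hypothetical, and the header written here is just the minimal VCF 4.1 column line with no variant records.

```shell
#!/bin/sh
# Sketch (assumption): guarantee that a VCF file exists even when the
# caller produced none, so a downstream collecting step always finds one.
set -eu

# ensure_vcf is a hypothetical helper; $1 is the path of the expected VCF.
ensure_vcf() {
    if [ ! -e "$1" ]; then
        # Header-only VCF: the fileformat line plus the fixed column line.
        printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > "$1"
    fi
}

workdir=$(mktemp -d)
ensure_vcf "$workdir/region_1.vcf"   # hypothetical file name
cat "$workdir/region_1.vcf"
```

An existing file is left untouched, so the helper is safe to call unconditionally after the variant-calling step.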

@mfoll mfoll added this to the First official release v0.1 milestone Sep 16, 2015
@mfoll
Member Author

mfoll commented Sep 16, 2015

@pditommaso suggestion:

the easiest way to handle this is creating an empty pdf file in the BASH script
and eventually filtering out the empty file from pdf channel
something like this
PDF.filter { it.size() > 0 }.set { pdf_2 }
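
A shell analogue of this suggestion might look like the sketch below: always create a zero-byte placeholder so that a '*.pdf' pattern matches something, then keep only the files that have content. The file names (sample_1.pdf, empty.pdf, non_empty.txt) are made up for illustration, and the size test stands in for the channel filter quoted above rather than reproducing it.

```shell
#!/bin/sh
# Sketch (assumption): placeholder pdf plus a per-file "non-empty" filter.
set -eu

workdir=$(mktemp -d)

# Pretend the R script produced one real plot (hypothetical file name).
printf 'dummy plot content\n' > "$workdir/sample_1.pdf"

# Always create a zero-byte placeholder so the '*.pdf' pattern matches
# even when no variant was plotted (hypothetical file name).
touch "$workdir/empty.pdf"

# Keep only the pdf files that actually have content ([ -s ] is true
# for files that exist and are larger than zero bytes).
for f in "$workdir"/*.pdf; do
    if [ -s "$f" ]; then
        basename "$f"
    fi
done > "$workdir/non_empty.txt"

cat "$workdir/non_empty.txt"
```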

@mfoll
Member Author

mfoll commented Sep 16, 2015

That complicates the pipeline just to produce an empty vcf output... I would prefer option 2 that I mentioned above.

@pditommaso
Contributor

As far as I understand, this happens only with the test data. Is it not possible to create a test dataset that produces at least one entry in the vcf?

@mfoll
Member Author

mfoll commented Sep 16, 2015

No, it happened to me on real data and then I created a test replicating the issue.

@pditommaso
Contributor

So the pileup_nbrr_caller_vcf.r script creates an empty vcf and, in that case, does not create the pdf file.

In my opinion for consistency it should create an empty pdf file (or none of them).

mfoll added a commit that referenced this issue Sep 17, 2015
Creates an empty pdf in all cases and delete them afterward
@mfoll mfoll mentioned this issue Sep 17, 2015
tdelhomme added a commit that referenced this issue Sep 17, 2015
@mfoll mfoll closed this as completed Sep 17, 2015
@mfoll mfoll reopened this Sep 17, 2015
@mfoll
Member Author

mfoll commented Sep 17, 2015

I am trying to delete the empty pdf files. But the empty pdf is only deleted when it is the only pdf produced. There must be an error in this line: https://github.com/mfoll/robust-regression-caller/blob/dev/samtools_regression_somatic_vcf.nf#L149
@pditommaso can you help please?

@pditommaso
Contributor

I'm a bit confused about that code. Why do you need a PDF output channel if it is not consumed by anybody?

@mfoll
Member Author

mfoll commented Sep 17, 2015

To copy it where I want it to be using storeDir, but maybe there is another way? And actually, if I remember correctly, omitting into PDF leads to an infinite loop in nextflow without any error message (but I need to double-check that).

@mfoll
Member Author

mfoll commented Sep 17, 2015

OK, I was wrong: the into PDF is not necessary, and omitting it does not lead to an infinite loop.
But I do need it to delete the empty pdf file that I produce. So do you know what is wrong with PDF.filter { it.size() == 0 }.subscribe { it.delete() }?

@pditommaso
Contributor

Does it report an error message? What's the problem?

@mfoll
Member Author

mfoll commented Sep 17, 2015

No. When the R_regression process produces a single pdf output (the empty one I create here), it is properly deleted, but when R_regression also creates others, the empty one is not deleted.

@mfoll
Member Author

mfoll commented Sep 17, 2015

If you want to replicate the issue you can use my test data:

git clone --depth=1 https://github.com/mfoll/NGS_data_test.git
cd NGS_data_test/1000G_CEU_TP53/

Then:

nextflow run mfoll/robust-regression-caller -with-docker mfoll/robust-regression-caller --bed TP53_tiny.bed --bam_folder BAM/ --fasta_ref 17.fasta.gz

Creates the empty pdf and deletes it (absent from BAM/VCF).

nextflow run mfoll/robust-regression-caller -with-docker mfoll/robust-regression-caller --bed TP53_exon2_11.bed --bam_folder BAM/ --fasta_ref 17.fasta.gz

Creates the empty pdf and others, and then doesn't delete the empty one (present in BAM/VCF).

@pditommaso
Contributor

OK, I think the problem is that when there is more than one pdf, the items in the PDF channel are list objects, so it.size() returns the size of the list instead of the size of the file. You should declare the output as flatten.

That said, I've noticed that you have used storeDir on every process. Are you aware that it will cause all subsequent runs of the pipeline to skip the execution of those processes?
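
The group-versus-file distinction described above can be illustrated with a small shell sketch. It is only an analogy, not the Nextflow fix itself: counting the entries of a group never yields zero once real plots exist, whereas testing each file on its own finds the empty one. All file names here are hypothetical.

```shell
#!/bin/sh
# Sketch (assumption): why a size check on the whole group misses the
# empty file, and how a per-file check catches it.
set -eu

outdir=$(mktemp -d)
printf 'plot data\n' > "$outdir/variant_a.pdf"   # hypothetical real plots
printf 'plot data\n' > "$outdir/variant_b.pdf"
: > "$outdir/empty.pdf"                          # zero-byte placeholder

# Group-level "size": the number of entries, which is 3 here and
# therefore never equal to 0, so nothing would be deleted.
count=$(ls "$outdir" | wc -l | tr -d ' ')
echo "entries in group: $count"

# Per-file check: visit each pdf individually and remove the empty ones.
for f in "$outdir"/*.pdf; do
    if [ ! -s "$f" ]; then
        rm -- "$f"
    fi
done

ls "$outdir"
```

Visiting the files one at a time plays the same role as flattening the channel so that each emitted item is a single file.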

mfoll added a commit that referenced this issue Sep 18, 2015
@mfoll mfoll mentioned this issue Sep 18, 2015
@mfoll mfoll closed this as completed in #19 Sep 18, 2015
mfoll added a commit that referenced this issue Sep 18, 2015
@mfoll
Member Author

mfoll commented Sep 18, 2015

Sorry @pditommaso I should have figured this out by myself (you should stop being so helpful, or users like me will become lazy!). It's working perfectly now.

Yes, I know the behaviour with storeDir and I actually like it. It's true I could remove some in the production version of the pipeline, but it's really helpful at the moment to keep all intermediate files outside the work folder in an organised way. It's also very convenient for restarting the pipeline at a given point: I just need to delete the folders created after this point and nextflow automatically skips the previous steps. Maybe something to add to nextflow would be a command-line option to ignore and overwrite the existing files in this case (the opposite of the resume option).

@mfoll mfoll removed the bug label Apr 25, 2016