error in mae #518

Open
Acetyl-ZHOU opened this issue Feb 7, 2024 · 8 comments

Comments

@Acetyl-ZHOU

I ran the MAE module but ran into a problem:

```
[W::hts_idx_load2] The index file is older than the data file: /Users/huizhou/Documents/1_allelic/drop_demo/qc_vcf_1000G_hg19.vcf.gz.tbi
Error: BiocParallel errors
  2 remote errors, element index: 1, 2
  0 unevaluated and other errors
  first remote error: invalid class “ScanVcfParam” object: ScanVcfParam: 'geno' cannot be specified if 'samples' is 'NA'
Execution halted
```

Can you give me some help?
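The first line of that log is only a warning: htslib compares the modification times of `qc_vcf_1000G_hg19.vcf.gz` and its `.tbi` index, and here the index is older, which usually means the VCF was rewritten after it was indexed. Rebuilding the index (for example with `tabix -f -p vcf file.vcf.gz`) removes the warning. A minimal stdlib sketch for spotting stale indexes before a run, assuming the index sits next to the data file with a `.tbi` suffix; `index_is_stale` is a hypothetical helper, not part of DROP:

```python
import os


def index_is_stale(vcf_path: str, index_suffix: str = ".tbi") -> bool:
    """Return True if the tabix index is missing or older than the VCF.

    This is the same modification-time comparison that triggers htslib's
    "index file is older than the data file" warning.
    """
    index_path = vcf_path + index_suffix
    if not os.path.exists(index_path):
        # A missing index has to be built, so treat it as stale.
        return True
    return os.path.getmtime(index_path) < os.path.getmtime(vcf_path)


if __name__ == "__main__":
    import sys

    # Usage: python check_index.py a.vcf.gz b.vcf.gz ...
    for path in sys.argv[1:]:
        if index_is_stale(path):
            print(f"{path}: rebuild the index with 'tabix -f -p vcf {path}'")
```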

@vyepez88
Collaborator

vyepez88 commented Feb 7, 2024

Hi, it seems that something is wrong with one of your VCF files or with the IDs of the RNA BAM files. At which step exactly did this happen?

@Acetyl-ZHOU
Author

I ran `snakemake --cores all sampleAnnotation mae` and got that error.
My VCF and BAM files are also hg19.
This is my config.yaml:
```yaml
projectTitle: "Detection of RNA Outliers Pipeline"
root: /Users/huizhou/Documents/1_allelic/drop_demo/RNAdata/Output # root directory of all output objects and tables
htmlOutputPath: /Users/huizhou/Documents/1_allelic/drop_demo/RNAdata/Output/html # path for HTML rendered reports
indexWithFolderName: true # whether the root base name should be part of the index name

hpoFile: null # if null, downloads it from webserver
sampleAnnotation: /Users/huizhou/Documents/1_allelic/drop_demo/RNAdata/sampleAnnotation.tsv # path to sample annotation (see documentation on how to create it)

geneAnnotation:
  v45: /Users/huizhou/Documents/1_allelic/drop_demo/gencode.v45lift37.basic.annotation.gtf
genomeAssembly: hg19
genome: /Users/huizhou/Documents/1_allelic/drop_demo/hg19.fa

exportCounts:
  # specify which gene annotations to include and which
  # groups to exclude when exporting counts
  geneAnnotations:
    - v45
  excludeGroups:
    - null

aberrantExpression:
  run: false
  groups:
    - all
  fpkmCutoff: 1
  implementation: autoencoder
  padjCutoff: 0.05
  zScoreCutoff: 0
  genesToTest: null
  maxTestedDimensionProportion: 3
  yieldSize: 2000000

aberrantSplicing:
  run: false
  groups:
    - all
  recount: false
  longRead: false
  keepNonStandardChrs: false
  filter: true
  minExpressionInOneSample: 20
  quantileMinExpression: 10
  minDeltaPsi: 0.05
  implementation: PCA
  padjCutoff: 0.1
  maxTestedDimensionProportion: 6
  genesToTest: null
  ### FRASER1 configuration
  FRASER_version: "FRASER"
  deltaPsiCutoff : 0.3
  quantileForFiltering: 0.95

mae:
  run: true
  groups:
    - all
  gatkIgnoreHeaderCheck: true
  padjCutoff: 0.05
  allelicRatioCutoff: 0.8
  addAF: true
  maxAF: 0.001
  maxVarFreqCohort: 0.05
  # VCF-BAM matching
  qcVcf: /Users/huizhou/Documents/1_allelic/drop_demo/qc_vcf_1000G_hg19.vcf.gz
  qcGroups:
    - all
  dnaRnaMatchCutoff: 0.85

rnaVariantCalling:
  run: true
  groups:
    - all
  highQualityVCFs:
    - /Users/huizhou/Documents/1_allelic/drop_demo/Mills_and_1000G_gold_standard.indels.hg19.sites.chrPrefix.vcf.gz
    - /Users/huizhou/Documents/1_allelic/drop_demo/1000G_phase1.snps.high_confidence.hg19.sites.chrPrefix.vcf.gz
  dbSNP: /Users/huizhou/Documents/1_allelic/drop_demo/00-All.vcf.gz
  repeat_mask: /Users/huizhou/Documents/1_allelic/drop_demo/hg19_repeatMasker_sorted.chrPrefix.bed
  createSingleVCF: true
  addAF: true
  maxAF: 0.001
  maxVarFreqCohort: 0.05
  hcArgs: ""
  minAlt: 3
  yieldSize: 100000

tools:
  gatkCmd: gatk
  bcftoolsCmd: bcftools
  samtoolsCmd: samtools
```

@Acetyl-ZHOU
Author

I also found another problem:

```
[W::vcf_parse_filter] FILTER 'VarFreq,VarMapQual,MinMMQSdiff' is not defined in the header
[E::bcf_hdr_parse_line] Could not parse the header line: "##FILTER=<ID=VarFreq,VarMapQual,MinMMQSdiff,Description=\"Dummy\">"
[E::vcf_parse_filter] Could not add dummy header for FILTER 'VarFreq,VarMapQual,MinMMQSdiff' at chr1:10048
[W::vcf_parse_filter] FILTER 'NoReadCounts' is not defined in the header
[W::vcf_parse_filter] FILTER 'MinMMQSdiff' is not defined in the header
```

What can I do about the header?
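These htslib messages point at two separate issues. First, `VarFreq,VarMapQual,MinMMQSdiff` is reported as a single FILTER ID: the VCF specification separates multiple failed filters with semicolons, so a comma-joined value is read as one ID, and htslib cannot even declare it as a dummy because the comma breaks the `##FILTER=<ID=...>` header syntax. Second, `NoReadCounts` and `MinMMQSdiff` have no `##FILTER` line in the header at all. A small stdlib-only sketch that lists both kinds of problems in a plain or gzip/BGZF-compressed VCF; `filter_problems` is a hypothetical helper and the header parsing is deliberately simplified:

```python
import gzip
import re

# The ID is the first key of a ##FILTER line and ends at the next ',' or '>'.
FILTER_ID = re.compile(r"##FILTER=<ID=([^,>]+)")


def filter_problems(path):
    """Return (undeclared filter IDs, comma-joined FILTER values) of a VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    declared, undeclared, comma_joined = set(), set(), set()
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("##"):
                match = FILTER_ID.match(line)
                if match:
                    declared.add(match.group(1))
                continue
            if line.startswith("#"):
                continue  # the #CHROM column header line
            fields = line.split("\t")
            if len(fields) < 8:
                continue  # not a complete data line
            # ';' is the separator defined by the VCF specification.
            for value in fields[6].split(";"):
                if "," in value:
                    comma_joined.add(value)
                for fid in value.split(","):
                    if fid not in ("", ".", "PASS") and fid not in declared:
                        undeclared.add(fid)
    return sorted(undeclared), sorted(comma_joined)
```

Running it on the DNA VCF should report `MinMMQSdiff` and `NoReadCounts` as undeclared if the header matches the one the log complains about.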

@vyepez88
Collaborator

vyepez88 commented Feb 9, 2024

Hi, be sure to double-check the format of your VCF files, for example:

  • It should contain a column named after the sample that holds that sample's genotype. That sample name is what you should specify in the DNA_ID column of the sample annotation.
  • The header must define every value present in the FILTER, INFO and FORMAT columns.
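The first point can be checked with a short script. A sketch, assuming the sample annotation is tab-separated and stores the DNA sample name in `DNA_ID` and the VCF path in `DNA_VCF_FILE` (pass other column names if yours differ); `vcf_sample_names` and `missing_dna_ids` are hypothetical helpers. Every pair it reports is a DNA_ID that its VCF does not contain as a sample column, which is one plausible cause of a `'samples' is 'NA'` error like the one in the first log.

```python
import csv
import gzip


def vcf_sample_names(path):
    """Return the sample column names from a VCF's #CHROM header line."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # Columns 1-9 are fixed (CHROM ... FORMAT); samples follow.
                return line.rstrip("\r\n").split("\t")[9:]
            if not line.startswith("#"):
                break  # reached data without finding a #CHROM line
    return []


def missing_dna_ids(annotation_path, id_col="DNA_ID", vcf_col="DNA_VCF_FILE"):
    """List (DNA_ID, VCF path) pairs whose ID is not a sample in that VCF."""
    missing = []
    samples_by_vcf = {}  # read each VCF header only once
    with open(annotation_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            dna_id = (row.get(id_col) or "").strip()
            vcf = (row.get(vcf_col) or "").strip()
            if not dna_id or not vcf:
                continue  # RNA-only rows have nothing to match
            if vcf not in samples_by_vcf:
                samples_by_vcf[vcf] = set(vcf_sample_names(vcf))
            if dna_id not in samples_by_vcf[vcf]:
                missing.append((dna_id, vcf))
    return missing
```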

@Acetyl-ZHOU
Author

In my DNA VCF file, the header looks like this:

```
##INFO=<ID=ADP,Number=1,Type=Integer,Description="Average per-sample depth of bases with Phred score >= 15">
##INFO=<ID=WT,Number=1,Type=Integer,Description="Number of samples called reference (wild-type)">
##INFO=<ID=HET,Number=1,Type=Integer,Description="Number of samples called heterozygous-variant">
##INFO=<ID=HOM,Number=1,Type=Integer,Description="Number of samples called homozygous-variant">
##INFO=<ID=NC,Number=1,Type=Integer,Description="Number of samples not called">
##FILTER=<ID=str10,Description="Less than 10% or more than 90% of variant supporting reads on one strand">
##FILTER=<ID=indelError,Description="Likely artifact due to indel reads at this position">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=SDP,Number=1,Type=Integer,Description="Raw Read Depth as reported by SAMtools">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Quality Read Depth of bases with Phred score >= 15">
##FORMAT=<ID=RD,Number=1,Type=Integer,Description="Depth of reference-supporting bases (reads1)">
##FORMAT=<ID=AD,Number=1,Type=Integer,Description="Depth of variant-supporting bases (reads2)">
##FORMAT=<ID=FREQ,Number=1,Type=String,Description="Variant allele frequency">
##FORMAT=<ID=PVAL,Number=1,Type=String,Description="P-value from Fisher's Exact Test">
##FILTER=<ID=VarCount,Description="Fewer than 4 variant-supporting reads">
##FILTER=<ID=VarFreq,Description="Variant allele frequency below 0.05">
##FILTER=<ID=VarReadPos,Description="Relative average read position < 0.01">
##FILTER=<ID=VarDist3,Description="Average distance to effective 3' end < 0.01">
##FILTER=<ID=VarMMQS,Description="Average mismatch quality sum for variant reads > 100">
##FILTER=<ID=VarMapQual,Description="Average mapping quality of variant reads < 15">
##FILTER=<ID=VarBaseQual,Description="Average base quality of variant reads < 28">
##FILTER=<ID=Strand,Description="Strand representation of variant reads < 0.01">
##FILTER=<ID=RefMapQual,Description="Average mapping quality of reference reads < 15">
##FILTER=<ID=RefBaseQual,Description="Average base quality of reference reads < 28">
##FILTER=<ID=MMQSdiff,Description="Mismatch quality sum difference (ref - var) > 50">
##FILTER=<ID=MapQualDiff,Description="Mapping quality difference (ref - var) > 50">
##FILTER=<ID=ReadLenDiff,Description="Average supporting read length difference (ref - var) > 0.25">
```

Do I need to add this information to the sample annotation, or change the VCF file?

@vyepez88
Collaborator

It seems that MinMMQSdiff is not defined in the VCF header. You would need to modify your VCF files.
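One way to do that, sketched in plain Python as an illustration and not as DROP's own method: rewrite the file so that comma-joined FILTER values use the semicolon separator from the VCF specification, and add a placeholder `##FILTER` line for every ID the header still does not declare. Replacing the commas matters as much as the missing definitions, since htslib reads `VarFreq,VarMapQual,MinMMQSdiff` as one invalid ID until it is split. `fix_filters` is a hypothetical helper; the placeholder description is generic, so if you know what a filter really means, put that in the description instead.

```python
import gzip
import re

FILTER_ID = re.compile(r"##FILTER=<ID=([^,>]+)")


def _filter_ids(value):
    """Split a FILTER value on ';' (the VCF separator) and on stray commas."""
    return [f for f in re.split(r"[;,]", value) if f not in ("", ".", "PASS")]


def fix_filters(src, dst):
    """Copy VCF `src` to plain-text `dst` with a repaired FILTER column.

    Commas in FILTER become semicolons, and a placeholder ##FILTER line is
    added before #CHROM for each ID the header does not declare.
    Returns the IDs that received a placeholder.
    """
    opener = gzip.open if src.endswith(".gz") else open
    declared, used = set(), set()
    with opener(src, "rt") as handle:  # pass 1: collect declared and used IDs
        for line in handle:
            if line.startswith("##"):
                match = FILTER_ID.match(line)
                if match:
                    declared.add(match.group(1))
            elif not line.startswith("#"):
                fields = line.split("\t")
                if len(fields) >= 8:
                    used.update(_filter_ids(fields[6]))
    missing = sorted(used - declared)
    # dst is written uncompressed on purpose: gzip.open does not produce the
    # BGZF blocks tabix needs, so compress the result with bgzip afterwards.
    with opener(src, "rt") as handle, open(dst, "w") as out:  # pass 2: rewrite
        for line in handle:
            if line.startswith("#CHROM"):
                for fid in missing:
                    out.write(f'##FILTER=<ID={fid},Description="Placeholder: not defined in original header">\n')
            elif not line.startswith("#"):
                fields = line.split("\t")
                if len(fields) >= 8:
                    fields[6] = fields[6].replace(",", ";")
                    line = "\t".join(fields)
            out.write(line)
    return missing
```

For example, `fix_filters("dna.vcf.gz", "dna.fixed.vcf")` (hypothetical file names), then `bgzip dna.fixed.vcf` and `tabix -p vcf dna.fixed.vcf.gz`, and point the sample annotation at the new file.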

@Acetyl-ZHOU
Author

So can I just write a definition for MinMMQSdiff myself?
I don't know how to add it to my VCF.

@vyepez88
Collaborator

Consider validating your VCF files beforehand, for example by running only the VCF format tests of GATK's ValidateVariants:
https://gatk.broadinstitute.org/hc/en-us/articles/360037057272-ValidateVariants
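A small wrapper for running that format-only check over several files, assuming GATK4 is available as `gatk` (the `gatkCmd` value in the config above) and that `--validation-type-to-exclude ALL` is the switch that limits ValidateVariants to the VCF format tests, as described on the linked page. `validate_variants_cmd` and `validate_all` are hypothetical helpers:

```python
import subprocess


def validate_variants_cmd(vcf_path, gatk_cmd="gatk"):
    """Build a GATK4 ValidateVariants call that runs only the VCF format tests."""
    return [
        gatk_cmd,
        "ValidateVariants",
        "-V", vcf_path,
        # Excluding ALL of the strict checks (REF, IDs, alleles, chromosome
        # counts) leaves only the basic VCF format validation.
        "--validation-type-to-exclude", "ALL",
    ]


def validate_all(vcf_paths, gatk_cmd="gatk"):
    """Run the format check on each VCF and return the paths that failed."""
    failed = []
    for path in vcf_paths:
        result = subprocess.run(validate_variants_cmd(path, gatk_cmd))
        if result.returncode != 0:
            failed.append(path)
    return failed
```

For instance, `validate_all(["dna.vcf.gz", "qc_vcf_1000G_hg19.vcf.gz"])` returns the files whose validation failed; GATK's own log output explains why each one was rejected.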
