PredPSI-SVR was designed to predict the change of percent spliced in (delta-PSI, or ΔΨ) caused by genetic variants for the CAGI 5 Vex-seq challenge.
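As a rough illustration of the quantity being predicted, delta-PSI can be written as the difference between two PSI values. The sketch below is not taken from the PredPSI-SVR code: the function name `delta_psi`, the sign convention (variant minus reference), and the use of fractions in [0, 1] rather than percentages are all assumptions made for the example.

```python
def delta_psi(psi_reference: float, psi_variant: float) -> float:
    """Return the change in percent spliced in (variant minus reference).

    Assumption: both PSI values are fractions in [0, 1], not percentages.
    """
    for value in (psi_reference, psi_variant):
        if not 0.0 <= value <= 1.0:
            raise ValueError("PSI values are expected as fractions in [0, 1]")
    return psi_variant - psi_reference


# Hypothetical numbers: the exon is included in 80% of transcripts with the
# reference allele and in 55% of transcripts with the variant allele.
change = delta_psi(0.80, 0.55)
print(f"delta-PSI = {change:+.2f}")
```

A negative value under this convention means the variant reduces exon inclusion.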
Send questions and comments to chenkenbio@gmail.com
- Operating system: Unix/Linux
- Memory: at least 4 GB
- Perl in your PATH
- Python 2
- Python 3 (with the numpy package installed)

If you have trouble installing Python 3 or numpy, you can try miniconda:

    cd ~/Downloads
    wget -c https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
    chmod +x Miniconda3-latest-Linux-x86_64.sh
    ./Miniconda3-latest-Linux-x86_64.sh  # note the installation path; this tutorial uses the default, "$HOME/miniconda3"
    source $HOME/miniconda3/bin/activate
    pip install numpy

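Before going further, it may help to confirm that the interpreters listed above are actually reachable. The following is a minimal sketch, to be run with Python 3. The helper names are made up for this example, and the command names `perl`, `python2`, and `python3` are assumptions, since some systems expose Python 2 as plain `python`.

```python
import importlib.util
import shutil


def missing_executables(names):
    """Return the executables from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


def has_module(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Assumed command names; adjust them if your system uses different ones.
for name in missing_executables(["perl", "python2", "python3"]):
    print(f"not found on PATH: {name}")

# This only checks the interpreter that runs this script.
if not has_module("numpy"):
    print("numpy is not installed for this Python 3 interpreter")
```

Note that the numpy check only covers the interpreter running the script, so it should be the same Python 3 that PredPSI-SVR will use.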
Note: If you already have the following packages installed on your system, you can skip installing them and just edit the paths in src/init.sh.
By default, we put PredPSI-SVR in the $HOME directory.
- Download PredPSI-SVR:

      cd ~
      git clone https://github.com/chenkenbio/PredPSI-SVR

- Download ANNOVAR (http://annovar.openbioinformatics.org/en/latest/user-guide/download/), libsvm (https://www.csie.ntu.edu.tw/~cjlin/libsvm), and samtools (http://www.htslib.org/download/), and move them to PredPSI-SVR/tools.
- Extract the packages and build libsvm and samtools:

      cd ~/PredPSI-SVR/tools
      tar -xzvf annovar.latest.tar.gz
      tar -xzvf libsvm-3.23.tar.gz
      tar -xjvf samtools-1.9.tar.bz2
      cd libsvm-3.23
      make all
      cd ../samtools-1.9
      make all
      cd ..

- Download the basic annotation databases for ANNOVAR:

      cd ~/PredPSI-SVR/tools/annovar
      ./annotate_variation.pl -buildver hg19 -downdb -webfrom annovar ensGene ./humandb/

- Download the third-party database SPIDEX from http://www.openbioinformatics.org/annovar/spidex_download_form.php. Move it to ~/PredPSI-SVR/tools/annovar/humandb/ and decompress it with unzip:

      unzip hg19_spidex.zip  # working directory: PredPSI-SVR/tools/annovar/humandb

- Download the hg19 genome:

      cd ~/PredPSI-SVR/genome
      wget -c http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz
      tar -xzvf chromFa.tar.gz  # unpack the per-chromosome .fa files before concatenating
      cat *.fa > hg19.fasta
      $HOME/PredPSI-SVR/tools/samtools-1.9/samtools faidx hg19.fasta

- Finally, check the variables in src/init.sh and edit them to fit your system.
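The installation steps above leave a number of files in fixed places under the PredPSI-SVR directory, so a quick existence check can catch a skipped step before the first run. This is a minimal sketch, not part of the package: `missing_paths` and `EXPECTED_PATHS` are names invented here, the list only covers paths mentioned in the steps above, and it assumes the default layout under `$HOME/PredPSI-SVR` with the tool versions shown in this README.

```python
from pathlib import Path

# Paths relative to the PredPSI-SVR directory, taken from the installation steps.
EXPECTED_PATHS = [
    "tools/annovar/annotate_variation.pl",
    "tools/annovar/humandb",
    "tools/libsvm-3.23",
    "tools/samtools-1.9/samtools",
    "genome/hg19.fasta",
    "genome/hg19.fasta.fai",  # index written by "samtools faidx"
]


def missing_paths(root, relative_paths=EXPECTED_PATHS):
    """Return the entries of `relative_paths` that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in relative_paths if not (root / rel).exists()]


# Assumed default location; change it if you cloned the repository elsewhere.
install_root = Path.home() / "PredPSI-SVR"
for rel in missing_paths(install_root):
    print(f"missing: {install_root / rel}")
```

If you installed the tools elsewhere and pointed src/init.sh at them, the list would need to be adjusted to match.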
Example:
cd ~/PredPSI-SVR/
## PredPSI-SVR, with "-p" option
./main.sh example/sample.vcf -p example/sample.psi -o example/outdir
## PredPSI-SVR-noPSI, without "-p"
./main.sh example/sample.vcf -o example/outdir
The result file is example/outdir/OUTPUT.dpsi.
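If you want to call the pipeline from a Python script, the two invocations above differ only in the optional `-p` argument. The wrapper below is a sketch under that assumption: `build_command` and `run_predpsi` are hypothetical helpers, and the only main.sh options used are the ones shown in the example (`-p` and `-o`).

```python
import subprocess


def build_command(vcf, outdir, psi=None, script="./main.sh"):
    """Build the main.sh argument list; "-p" is added only when a PSI file is given."""
    command = [script, vcf]
    if psi is not None:
        command += ["-p", psi]
    command += ["-o", outdir]
    return command


def run_predpsi(vcf, outdir, psi=None, cwd="."):
    """Run main.sh from the PredPSI-SVR directory `cwd`, raising on a non-zero exit."""
    subprocess.run(build_command(vcf, outdir, psi), cwd=cwd, check=True)


# Print the command for the PredPSI-SVR example without executing it.
print(" ".join(build_command("example/sample.vcf", "example/outdir", psi="example/sample.psi")))
```

Calling `run_predpsi(..., cwd=...)` with the path to your PredPSI-SVR checkout would execute the command. On Unix-like systems the relative `./main.sh` is resolved against `cwd`.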
PredPSI-SVR first filters the VCF file to remove variants that lie in intergenic regions or far from splice sites (more than 200 bp). Therefore, you will sometimes find fewer variants in OUTPUT.dpsi than in your input VCF file.
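To make the distance rule concrete, here is a toy version of the splice-site part of that filter. It is an illustration only, not the package's implementation: it assumes all positions are on one chromosome, takes splice-site coordinates as a plain list, keeps variants at exactly 200 bp, and does not model the intergenic filter. All names and coordinates are hypothetical.

```python
from bisect import bisect_left

MAX_DISTANCE_BP = 200  # variants farther than this from every splice site are dropped


def distance_to_nearest(position, sorted_sites):
    """Return the distance from `position` to the closest entry of `sorted_sites`."""
    index = bisect_left(sorted_sites, position)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = sorted_sites[max(index - 1, 0):index + 1]
    return min(abs(position - site) for site in candidates)


def keep_near_splice_sites(variant_positions, splice_sites, max_distance=MAX_DISTANCE_BP):
    """Keep variant positions within `max_distance` bp of at least one splice site."""
    sites = sorted(splice_sites)
    if not sites:
        return []
    return [p for p in variant_positions if distance_to_nearest(p, sites) <= max_distance]


# Hypothetical coordinates: two splice sites and four variants on the same chromosome.
kept = keep_near_splice_sites([900, 1300, 4800, 5201], [1000, 5000])
print(kept)  # [900, 4800]
```

In this example the variants at 1300 and 5201 are 300 bp and 201 bp from the nearest splice site, so they would be absent from the output.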
PredPSI-SVR/tools/ese3/ese3_mod.py is modified from a script in the SilVA package (Paper: https://www.ncbi.nlm.nih.gov/pubmed/23736532, GitHub: https://github.com/buske/silva).
Chen, K., Lu, Y., Zhao, H., & Yang, Y. (2019). Predicting the change of exon splicing caused by genetic variant using support vector regression. Human Mutation, 40(9), 1235–1242. https://doi.org/10.1002/humu.23785