Added enformer predictor #2

Open
wants to merge 32 commits into
base: main
02d834e
added enformer pipeline
gtsitsiridis Aug 7, 2024
0739ca0
added enformer pipeline
gtsitsiridis Aug 7, 2024
56014f6
added enformer pipeline
gtsitsiridis Aug 7, 2024
9ffe6bd
added enformer pipeline
gtsitsiridis Aug 7, 2024
17301b0
added enformer pipeline
gtsitsiridis Aug 7, 2024
0eee3a2
added enformer pipeline
gtsitsiridis Aug 7, 2024
602f2c8
added enformer pipeline
gtsitsiridis Aug 7, 2024
dea4bbd
added enformer pipeline
gtsitsiridis Aug 7, 2024
e4f1878
added enformer pipeline
gtsitsiridis Aug 7, 2024
719c116
added enformer defaults in config
gtsitsiridis Aug 8, 2024
61044ef
added enformer defaults in config
gtsitsiridis Aug 8, 2024
567612a
added enformer defaults in config
gtsitsiridis Aug 8, 2024
c03dc4d
added enformer defaults in config
gtsitsiridis Aug 8, 2024
b36bf25
added enformer defaults in config
gtsitsiridis Aug 8, 2024
9d31278
download enformer reference logic
gtsitsiridis Aug 8, 2024
a77ebf2
download enformer reference logic
gtsitsiridis Aug 8, 2024
315af28
download enformer reference logic
gtsitsiridis Aug 8, 2024
58f6fe5
download enformer reference logic
gtsitsiridis Aug 8, 2024
21509fb
download enformer reference logic
gtsitsiridis Aug 8, 2024
d9d9601
download enformer reference logic
gtsitsiridis Aug 8, 2024
4a099a2
download enformer reference logic
gtsitsiridis Aug 8, 2024
34ae679
download enformer reference logic
gtsitsiridis Aug 8, 2024
d93d1d1
download enformer reference logic
gtsitsiridis Aug 8, 2024
bc46e3e
enformer refactoring
gtsitsiridis Aug 12, 2024
ee124d9
enformer refactoring
gtsitsiridis Aug 12, 2024
2f30730
enformer refactoring
gtsitsiridis Aug 22, 2024
1178fcd
renamed enformer run config
gtsitsiridis Aug 22, 2024
2c8bd58
renamed enformer run config
gtsitsiridis Aug 22, 2024
3ea517b
renamed enformer run config
gtsitsiridis Aug 22, 2024
f1c271c
removed duplicate gtf2parquet rule
gtsitsiridis Aug 22, 2024
99ca6b2
enformer protein_coding_only flag
gtsitsiridis Aug 22, 2024
0b2b1e9
enformer chromosomes flag
gtsitsiridis Aug 22, 2024
4 changes: 3 additions & 1 deletion Snakefile
@@ -23,6 +23,8 @@ with open("system_config.yaml", "r") as fd:
with open("download_urls.yaml", "r") as fd:
    download_urls = yaml.safe_load(fd)

# TODO upload enformer reference; download reference rule

SNAKEMAKE_DIR = os.path.abspath(os.path.dirname(workflow.snakefile))

CONDA_ENV_YAML_DIR=f"{SNAKEMAKE_DIR}/envs"
@@ -85,7 +87,7 @@ rule all:
expand(
f"{RESULTS_DIR}/predict/{{model_type}}/{{vcf_file}}.parquet",
vcf_file=vcf_input_file_names,
model_type=config["predict_abexp_models"],
model_type=config["predict_abexp_models"],
),
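
For readers less familiar with Snakemake: the `expand(...)` call in `rule all` requests one parquet file per combination of `model_type` and `vcf_file`. The plain-Python sketch below mimics that cross product without importing Snakemake. The helper name `expand_targets`, the results directory, the VCF names and the second model name are illustrative assumptions, not values from this PR, and the ordering of the paths may differ from what Snakemake produces.

```python
from itertools import product

# Illustrative values only; the real ones come from the workflow config.
RESULTS_DIR = "results"
vcf_input_file_names = ["sample_a.vcf.gz", "sample_b.vcf.gz"]
predict_abexp_models = ["abexp_v1.0", "example_model"]


def expand_targets(results_dir, model_types, vcf_files):
    """Return one target path per (model_type, vcf_file) combination."""
    pattern = "{results_dir}/predict/{model_type}/{vcf_file}.parquet"
    return [
        pattern.format(results_dir=results_dir, model_type=m, vcf_file=v)
        for v, m in product(vcf_files, model_types)
    ]


targets = expand_targets(RESULTS_DIR, predict_abexp_models, vcf_input_file_names)
for target in targets:
    print(target)
```

Two models and two VCF files yield four targets, so every entry added to `predict_abexp_models` multiplies the work requested by `rule all`.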


2 changes: 2 additions & 0 deletions config.yaml
@@ -21,3 +21,5 @@ gtf_file: "example/chr22.gencode.v34.annotation.gtf.gz"

# mapping of chromosome names. By default, maps them to "chr" prefix.
chrom_alias_tsv: "{SNAKEMAKE_DIR}/resources/chromAlias.tsv"

use_enformer_gpu: false
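
The diff does not show where `use_enformer_gpu` is consumed. One plausible use, given the two environment files added in this PR, is to switch between them. The sketch below assumes exactly that; `select_enformer_env` is a hypothetical helper, not code from the PR.

```python
def select_enformer_env(config, conda_env_dir):
    """Pick the conda env file for the Enformer step (hypothetical helper)."""
    # Assumption: a missing key behaves like the `false` default in config.yaml.
    use_gpu = bool(config.get("use_enformer_gpu", False))
    env_name = "abexp-enformer-gpu.yaml" if use_gpu else "abexp-enformer.yaml"
    # The Snakefile builds paths with "/" (see CONDA_ENV_YAML_DIR), so do the same.
    return f"{conda_env_dir}/{env_name}"


print(select_enformer_env({"use_enformer_gpu": False}, "envs"))
print(select_enformer_env({"use_enformer_gpu": True}, "envs"))
```

Keeping the choice in one function means a rule's `conda:` directive would only need to reference a single value, whichever environment is active.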
26 changes: 25 additions & 1 deletion defaults.yaml
@@ -101,11 +101,35 @@ absplice:
- Vagina
- Whole_Blood
use_spliceai_rocksdb: True
spliceai_rocksdb_path:
spliceai_rocksdb_path:
hg19: '{RESOURCES_DIR}/spliceai_rocksdb/spliceAI_hg19_chr{chromosome}.db'
hg38: '{RESOURCES_DIR}/spliceai_rocksdb/spliceAI_hg38_chr{chromosome}.db'
spliceai_rocksdb_chromosomes: ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', 'X', 'Y']
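
The path templates above carry two placeholders, `{RESOURCES_DIR}` and `{chromosome}`. How the pipeline substitutes them is not visible in this diff; the sketch below assumes plain `str.format` and uses a made-up resources directory. `spliceai_db_paths` is a hypothetical helper.

```python
def spliceai_db_paths(template, resources_dir, chromosomes):
    """Map each chromosome name to its database path by filling the template."""
    return {
        chrom: template.format(RESOURCES_DIR=resources_dir, chromosome=chrom)
        for chrom in chromosomes
    }


# Template copied from the hg38 entry; the resources directory is illustrative.
template = "{RESOURCES_DIR}/spliceai_rocksdb/spliceAI_hg38_chr{chromosome}.db"
paths = spliceai_db_paths(template, "/path/to/resources", ["1", "22", "X"])
for chrom, path in paths.items():
    print(chrom, path)
```

Because the template already contains the `chr` prefix, the chromosome list holds bare names such as `1` or `X`.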

enformer:
  batch_size: 2
  isoform_aggregation_mode: canonical
  variant_upstream_tss: 50
  variant_downstream_tss: 200
  # Whether to calculate the variant effect only for the canonical transcripts
  canonical_only: true
  # Whether to calculate the variant effect only for the protein-coding transcripts
  protein_coding_only: true
  # The chromosomes in which to look for transcripts
  chromosomes: ["chr1", "chr2", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", "chr10", "chr11", "chr12",
                "chr13", "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr20", "chr21", "chr22", "chrX", "chrY"]
  # How many central bins to average over
  num_agg_central_bins: 3
  # Enformer will be run 3 times: once centered on the TSS, and once each at +/-<shift> from the TSS
  shift: 43
  # How many bins to save from each run. A smaller number saves space, since
  # we only care about the central bins (close to the TSS)
  num_output_bins: 21
  tissue_mapper_pkl: "{SNAKEMAKE_DIR}/resources/enformer/tissue_mapper.pkl"
  tracks_yml: "{SNAKEMAKE_DIR}/resources/enformer/tracks.yaml"
  # Whether to download the reference or generate it locally
  download_reference: true
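
To make the `shift`, `num_output_bins`, `num_agg_central_bins` and `variant_*_tss` defaults concrete, here is a minimal sketch of what the comments describe. It is not the kipoi_enformer implementation: all three function names are hypothetical, the inclusive window bounds and the strand handling are assumptions, and the coordinates and track values are made up.

```python
def run_centers(tss, shift):
    """Centers of the three Enformer runs: TSS - shift, TSS, TSS + shift."""
    return [tss - shift, tss, tss + shift]


def average_central_bins(bins, num_agg_central_bins):
    """Average the middle `num_agg_central_bins` values of one run's saved bins."""
    if num_agg_central_bins > len(bins):
        raise ValueError("cannot aggregate more bins than were saved")
    start = (len(bins) - num_agg_central_bins) // 2
    central = bins[start:start + num_agg_central_bins]
    return sum(central) / len(central)


def variant_in_tss_window(pos, tss, strand, upstream, downstream):
    """Strand-aware check that a variant lies in the window around the TSS."""
    if strand == "+":
        # Upstream means lower coordinates on the forward strand.
        return tss - upstream <= pos <= tss + downstream
    # On the reverse strand, upstream means higher coordinates.
    return tss - downstream <= pos <= tss + upstream


# Defaults from the enformer block: shift=43, num_output_bins=21,
# num_agg_central_bins=3, variant_upstream_tss=50, variant_downstream_tss=200.
centers = run_centers(tss=1_000_000, shift=43)
saved = [float(i) for i in range(21)]
score = average_central_bins(saved, num_agg_central_bins=3)
print(centers, score)
print(variant_in_tss_window(1_000_150, 1_000_000, "+", upstream=50, downstream=200))
```

With 21 saved bins and 3 aggregated bins, only bins 9 to 11 (zero-based) contribute to the score; the remaining saved bins leave room to change the aggregation later without rerunning the model.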

models:
abexp_v1.0:
model: "{SNAKEMAKE_DIR}/resources/models/abexp_v1.0/model.joblib"
2 changes: 2 additions & 0 deletions download_urls.yaml
@@ -9,3 +9,5 @@ splicemap:
psi3: 'https://zenodo.org/record/6408906/files/{tissue}_splicemap_psi3_method%3Dkn_event_filter%3Dmedian_cutoff.csv.gz?download=1'
psi5: 'https://zenodo.org/record/6408906/files/{tissue}_splicemap_psi5_method%3Dkn_event_filter%3Dmedian_cutoff.csv.gz?download=1'

# todo: add the correct URL
enformer_reference_tar: "/data/nasif12/home_if12/tsi/downloads/enformer_ref.tar"
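
Note that `enformer_reference_tar` currently points to a local filesystem path rather than a URL, which the TODO acknowledges. Until the real URL is known, a download step could accept both forms. The sketch below is one way to do that with the standard library only; `fetch_reference_tar` is a hypothetical helper and the PR's actual download rule is not shown here.

```python
import shutil
import tarfile
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def fetch_reference_tar(source, dest_dir):
    """Copy or download a reference tarball into dest_dir and unpack it there."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    parsed = urlparse(source)
    tar_path = dest_dir / Path(parsed.path).name

    if parsed.scheme in ("http", "https"):
        # Remote source: stream the download to disk.
        with urllib.request.urlopen(source) as response, open(tar_path, "wb") as out:
            shutil.copyfileobj(response, out)
    else:
        # Local source, as in the placeholder entry above.
        shutil.copyfile(source, tar_path)

    # Only unpack archives from a trusted source.
    with tarfile.open(tar_path) as tar:
        tar.extractall(dest_dir)
    return tar_path
```

Calling `fetch_reference_tar(download_urls["enformer_reference_tar"], some_directory)` would then work unchanged once the placeholder is replaced with an HTTP(S) URL.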
22 changes: 22 additions & 0 deletions envs/abexp-enformer-gpu.yaml
@@ -0,0 +1,22 @@
name: abexp-enformer-gpu
channels:
  - conda-forge
  - bioconda
dependencies:
  - python>=3.11
  - attrs<=21.4.0 # because of kipoi
  - cyvcf2>=0.30.28
  - numpy>=1.26.4
  - pandas>=2.2.1
  - pyarrow>=16.1.0
  - pyranges>=0.0.129
  - polars>=0.20.19
  - scikit-learn>=1.4.1.post1
  - lightgbm>=4.3.0
  - zarr>=2.15.0
  - xarray>=2024.5.0
  - tensorflow-hub==0.16.1
  - tensorflow==2.16.1
  - tensorflow-gpu==2.16.1
  - pip:
    - "git+https://github.com/gtsitsiridis/kipoi_veff_analysis.git@aparent2#egg=kipoi-enformer&subdirectory=pkgs/kipoi_enformer"
21 changes: 21 additions & 0 deletions envs/abexp-enformer.yaml
@@ -0,0 +1,21 @@
name: abexp-enformer
channels:
  - conda-forge
  - bioconda
dependencies:
  - python>=3.11
  - attrs<=21.4.0 # because of kipoi
  - cyvcf2>=0.30.28
  - numpy>=1.26.4
  - pandas>=2.2.1
  - pyarrow>=16.1.0
  - pyranges>=0.0.129
  - polars>=0.20.19
  - scikit-learn>=1.4.1.post1
  - lightgbm>=4.3.0
  - zarr>=2.15.0
  - xarray>=2024.5.0
  - tensorflow-hub==0.16.1
  - tensorflow==2.16.1
  - pip:
    - "git+https://github.com/gtsitsiridis/kipoi_veff_analysis.git@aparent2#egg=kipoi-enformer&subdirectory=pkgs/kipoi_enformer"
Binary file added resources/enformer/tissue_mapper.pkl
Binary file not shown.