- Used the transcription factor list given in Cacace et al. (2020) in Current Topics in Developmental Biology, figure 3.
- Used http://www.informatics.jax.org/batch/ to get gene symbols from ENSEMBL IDs.
- Used fgsea for pre-ranked gene set enrichment analysis.
- Used msigdbr v7.1.1 as the database of gene sets.
First, load the database(s) to use from MSigDB:
library(msigdbr); library(data.table)
library(magrittr) # provides the %>% pipe used in the cleaning step
df_C3_TFTLegacy = msigdbr(species = "Mus musculus", category = "C3", subcategory = "TFT:TFT_Legacy")
df_C3_TFTGTRD = msigdbr(species = "Mus musculus", category = "C3", subcategory = "TFT:GTRD")
df_C2_CPREACTOME = msigdbr(species = "Mus musculus", category = "C2", subcategory = "CP:REACTOME")
df_C5 = msigdbr(species = "Mus musculus", category = "C5")
df_C5_BP = msigdbr(species = "Mus musculus", category = "C5", subcategory = "BP")
# Cleaning: turn each table into a named list of gene-symbol vectors, one entry per gene set
C3_TFTLegacy = df_C3_TFTLegacy %>% split(x = .$gene_symbol, f = .$gs_name)
C3_TFTGTRD = df_C3_TFTGTRD %>% split(x = .$gene_symbol, f = .$gs_name)
C2_CPREACTOME = df_C2_CPREACTOME %>% split(x = .$gene_symbol, f = .$gs_name)
C5_BP = df_C5_BP %>% split(x = .$gene_symbol, f = .$gs_name) #GO
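df_C7 = msigdbr(species = "Mus musculus", category = "C7") # C7 must be loaded before it can be split
C7 = df_C7 %>% split(x = .$gene_symbol, f = .$gs_name)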
# Remove the intermediate tables, keeping only the named lists
rm(df_C3_TFTLegacy, df_C3_TFTGTRD, df_C2_CPREACTOME, df_C5, df_C5_BP, df_C7)
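To make the cleaning step concrete, here is a minimal sketch of what `split` does to a table with `gs_name` and `gene_symbol` columns. The data frame `toy_df` and its contents are made up for illustration and only stand in for an `msigdbr()` result; base R is enough to run it.

```r
# Toy stand-in for an msigdbr() result: one row per (gene set, gene) pair.
# The set and gene names are invented for illustration.
toy_df <- data.frame(
  gs_name     = c("SET_A", "SET_A", "SET_B", "SET_B", "SET_B"),
  gene_symbol = c("Gene1", "Gene2", "Gene2", "Gene3", "Gene4"),
  stringsAsFactors = FALSE
)

# Same call shape as in the cleaning step: group the gene symbols by gene set name.
toy_sets <- split(x = toy_df$gene_symbol, f = toy_df$gs_name)

# toy_sets is a named list with one character vector of gene symbols per gene set.
str(toy_sets)
```

A named list in this shape is what fgsea takes as its `pathways` argument.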
Select the genes above the threshold, and convert the ENSEMBL IDs (liger@W matrix) to gene symbols using the output from http://www.informatics.jax.org/batch/ (Excel table).
W <- liger_obj@W # gene names have to be gene symbols; the `merge` function can be used for the conversion
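The ID conversion and thresholding could look roughly like the sketch below. It is not the exact procedure used here: the toy matrix, the mapping table `id_map`, its column names `ensembl_id` and `symbol`, and the cutoff `thres` are all assumptions for illustration. It also assumes factors are in rows and genes in columns; transpose first if your `W` is laid out the other way.

```r
# Toy stand-in for liger_obj@W (assumed layout: factors in rows, genes in columns).
W_toy <- matrix(c(0.9, 0.1, 0.0,
                  0.2, 0.8, 0.5),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("factor1", "factor2"),
                                c("ENS_A", "ENS_B", "ENS_C")))

# Hypothetical mapping table standing in for the batch-query export;
# the column names are assumptions, not the real export headers.
id_map <- data.frame(ensembl_id = c("ENS_C", "ENS_A", "ENS_B"),
                     symbol     = c("GeneC", "GeneA", "GeneB"),
                     stringsAsFactors = FALSE)

# merge() does not preserve input order, so carry the column position along.
cols <- data.frame(ensembl_id = colnames(W_toy),
                   pos        = seq_len(ncol(W_toy)),
                   stringsAsFactors = FALSE)
mapped <- merge(cols, id_map, by = "ensembl_id", all.x = TRUE)
mapped <- mapped[order(mapped$pos), ]

# Drop genes without a symbol, then rename the remaining columns.
keep <- !is.na(mapped$symbol)
W_sym <- W_toy[, mapped$pos[keep], drop = FALSE]
colnames(W_sym) <- mapped$symbol[keep]

# For each factor, keep the genes whose loading exceeds the (illustrative) threshold.
thres <- 0.4
W_thres <- lapply(setNames(rownames(W_sym), rownames(W_sym)),
                  function(f) colnames(W_sym)[W_sym[f, ] > thres])
```

The result is a named list with one vector of gene symbols per factor. Restoring the original column order after `merge` matters, because otherwise the symbols would be attached to the wrong loadings.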
setClass(Class = "TF_obj",
slots = c(W_thres = "list",
common_TF =