DRPPM - EASY - Integration

Introduction

This is an extension of the DRPPM Expression Analysis ShinY (EASY) App that integrates result files and compares expression matrices using differential gene expression and gene set enrichment analysis (GSEA). It supports comparing data sets such as single-sample GSEA (ssGSEA) scores and expression matrices. Comparing expression data is one of the main features of this app. The user may upload two expression matrices with their corresponding meta data; in our example we uploaded mRNA and protein expression data. The DRPPM-EASY-Integration app compares the log fold change of expression between the two matrices and generates gene sets of the significantly up-regulated and down-regulated genes. These gene sets are then used for GSEA and ssGSEA on the opposing matrix to illustrate any similarity in gene regulation. Additionally, these gene sets are compared with GMT files of published gene sets from the Molecular Signatures Database (MSigDB), with statistical calculations to rank their similarity, such as Fisher's Exact Test, Cohen's Kappa, and the Jaccard Index. In the flow chart of the DRPPM-EASY pipeline, this Integration app represents segment C.

Installation

Download the ZIP file from https://github.com/shawlab-moffitt/DRPPM-EASY-Integration
Unzip the archive and open the directory as a project in R Studio
Open the ‘App.R’ script and enter the user input files and options as directed at the top of the script
- The ‘App.R’ script is preconfigured with the example files available on the front page and within the ExampleData folder
Press the ‘Run App’ button in R Studio to run the app in an application or browser window and enjoy!
- The app script will install any required packages that are missing from the user's local R library

Requirements

R - https://cran.r-project.org/src/base/R-4/
R Studio - https://www.rstudio.com/products/rstudio/download/

R Dependencies


shiny_1.6.0	shinythemes_1.2.0	shinyjqui_0.4.0	shinycssloaders_1.0.0	stringi_1.7.6
dplyr_1.0.7	tidyr_1.1.3	readr_2.0.1	stringr_1.4.0	DT_0.18
ggplot2_3.3.5	plotly_4.9.4.1	enrichplot_1.12.2	ggVennDiagram_1.2.0	ggrepel_0.9.1
rhoR_1.3.0.3	limma_3.48.3	clusterProfiler_4.0.5	GSVA_1.40.1
BiocManager_1.30.16	reshape2_1.4.4	ggpubr_0.4.0

Required Files for User Input

Expression Matrix (.tsv/.txt):
- Must be tab delimited, with gene symbols in the first column; each subsequent column has a sample name as its header and that sample's expression values below it.
- The app currently expects data with lowly expressed genes filtered out and normalized to either FPKM or TMM.
  - Larger files may cause memory issues on your local computer.
Meta Data (.tsv/.txt):
- Must be tab delimited with two columns: sample names in the first column and the phenotype grouping of the samples in the second.
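
The two input layouts described above can be sketched in a few lines. This is illustrative Python, not part of the app (which is written in R); the function names and sample data are hypothetical, and the presence of a header row in the meta data file is an assumption.

```python
import csv
import io

def read_expression_matrix(handle):
    """Read a tab-delimited matrix: gene symbol first, then one column per sample."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]  # sample names come from the header row
    matrix = {}
    for row in reader:
        if row:  # skip blank lines
            matrix[row[0]] = [float(v) for v in row[1:]]
    return samples, matrix

def read_meta(handle):
    """Read a two-column table: sample name, phenotype group."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # assumption: the first row is a header
    return {row[0]: row[1] for row in reader if row}

def samples_missing_from_meta(samples, meta):
    """Return matrix sample names that have no entry in the meta data."""
    return [s for s in samples if s not in meta]

# Hypothetical example data in the expected layout.
expr_text = "Gene\tS1\tS2\tS3\nTP53\t5.1\t4.8\t9.2\nMYC\t2.0\t2.2\t7.5\n"
meta_text = "Sample\tGroup\nS1\tControl\nS2\tControl\nS3\tTreated\n"

samples, matrix = read_expression_matrix(io.StringIO(expr_text))
meta = read_meta(io.StringIO(meta_text))
missing = samples_missing_from_meta(samples, meta)
```

Checking that every sample in the matrix also appears in the meta data before running an analysis catches the most common formatting mismatch early.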

Required Files for Setup - Provided

MSigDB Gene Set Names:
- These gene set files were gathered from the Molecular Signatures Database (MSigDB) as separate collections and processed through R to generate a master gene set file with categorical labels for use in GSEA and ssGSEA analysis.
- This file is used mainly by the UI for GMT category selection.
MSigDB Gene Set RData List:
- The RData gene set list is a more refined format of the gene set table.
- This is a named list with over 32,000 gene sets from MSigDB paired with the genes they consist of.
- This list is used for the back end analysis.
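
A named list of gene sets can be pictured as a mapping from each gene set name to its member genes. The sketch below builds such a mapping from GMT-formatted lines (name, description, then genes, all tab separated). It is illustrative Python rather than the app's R code, and the gene set names and genes shown are hypothetical.

```python
def parse_gmt(lines):
    """Map each gene set name to its list of genes.

    Each GMT line holds: set name, description, then one gene per tab-separated field.
    """
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, genes = fields[0], fields[2:]  # fields[1] is the description
        gene_sets[name] = [g for g in genes if g]  # drop empty trailing fields
    return gene_sets

# Hypothetical GMT content.
gmt_lines = [
    "EXAMPLE_SET_A\tdescription A\tTP53\tMYC\tEGFR\n",
    "EXAMPLE_SET_B\tdescription B\tBRCA1\tBRCA2\n",
]
gene_sets = parse_gmt(gmt_lines)
```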

App Features

Scatter Plot Comparison

The user may upload two files generated by ssGSEA analysis in the main DRPPM-EASY app and compare them
- The plot takes its values from the third column of each uploaded file, provided the first two columns of the files are the same. Any data that follows this format should generate a plot.
  - More adjustments will be made in future edits to ensure compatibility and freedom to use other data sets
The user may choose to log transform either axis
The axis label names may also be adjusted
The table shown below the figure may be downloaded for future use and analysis and even used in the following tab for the correlation rank plot.
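
The pairing of the two uploaded files can be sketched as a join on the first two columns, keeping the third column of each file as the x and y values. This is illustrative Python, not the app's implementation; the function name and the example rows are hypothetical.

```python
def merge_on_first_two(rows_x, rows_y):
    """Pair the third-column values of two tables whose first two columns match."""
    y_lookup = {(r[0], r[1]): float(r[2]) for r in rows_y}
    merged = []
    for r in rows_x:
        key = (r[0], r[1])
        if key in y_lookup:  # keep only rows present in both files
            merged.append((r[0], r[1], float(r[2]), y_lookup[key]))
    return merged

# Hypothetical rows: sample name, sample type, score.
file_x = [("S1", "Control", "0.20"), ("S2", "Treated", "0.65"), ("S3", "Treated", "0.40")]
file_y = [("S1", "Control", "0.50"), ("S2", "Treated", "0.90")]

merged = merge_on_first_two(file_x, file_y)
```

Each merged row carries one x value and one y value, which is the shape a scatter plot needs.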

Ranked Feature Correlation

The user must upload an expression matrix and a primary feature file, which consists of three columns: sample name, sample type, and a value.
- Typically, and as shown in the example data, we use an ssGSEA score file as the primary feature file.
- In future edits more compatibility will be added
When an ssGSEA score file is used as the primary feature file, the names of the gene sets within the file are listed so the user can choose which one to correlate the expression matrix with.
- The user may use the merged ssGSEA table from the previous scatter plot tab as input.
The user may choose the correlation method (Spearman, Pearson, or Kendall) as well as the correlation cutoff values used to color the genes on the figure
The table of correlation values below the figure may be downloaded with a file name of your choice.
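
The ranking idea can be sketched as follows: correlate one feature value per sample with each gene's expression across the same samples, then sort genes by the coefficient. This is illustrative Python using only the standard library, not the app's R code; the function names and data are hypothetical, and Kendall's method is omitted for brevity.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # assumes neither input is constant

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def rank_genes(feature, matrix, method=spearman):
    """Sort genes from most positively to most negatively correlated."""
    scores = {gene: method(feature, values) for gene, values in matrix.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical data: one feature value and one expression value per sample.
feature = [0.1, 0.4, 0.6, 0.9]
matrix = {"GENE_A": [1, 2, 3, 4], "GENE_B": [4, 3, 2, 1], "GENE_C": [1, 3, 2, 4]}
ranked = rank_genes(feature, matrix)
```

A correlation cutoff then amounts to comparing each coefficient in `ranked` against the chosen thresholds to decide how a gene is colored.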

Matrix Comparison with Reciprocal GSEA

Matrix Upload

Assign names to the two matrices the user will be comparing for easier identification
Upload expression matrices
Upload meta data

Fold Change Scatter Plot

The user may choose which sample type to compare expression fold change for
Select genes to annotate on the plot
When hovering over a point, a text box appears with information on that gene
The table below the figure may be downloaded for future use
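
To make the quantity on each axis concrete, the sketch below computes a log2 fold change per gene in two matrices and pairs the values for genes present in both. It is illustrative Python and a deliberate simplification: it uses a plain ratio of group means with a pseudocount (an assumption), not a limma model, and all names and data are hypothetical.

```python
import math

def log2_fold_change(values, groups, case, control, pseudocount=1.0):
    """log2 ratio of the mean expression in `case` samples over `control` samples."""
    case_vals = [v for v, g in zip(values, groups) if g == case]
    ctrl_vals = [v for v, g in zip(values, groups) if g == control]
    case_mean = sum(case_vals) / len(case_vals)
    ctrl_mean = sum(ctrl_vals) / len(ctrl_vals)
    # The pseudocount (an assumption) avoids division by zero for unexpressed genes.
    return math.log2((case_mean + pseudocount) / (ctrl_mean + pseudocount))

def paired_fold_changes(matrix_a, matrix_b, groups, case, control):
    """Return {gene: (log2FC in A, log2FC in B)} for genes found in both matrices."""
    shared = sorted(set(matrix_a) & set(matrix_b))
    return {
        gene: (
            log2_fold_change(matrix_a[gene], groups, case, control),
            log2_fold_change(matrix_b[gene], groups, case, control),
        )
        for gene in shared
    }

# Hypothetical data: four samples, the same sample order in both matrices.
groups = ["Control", "Control", "Treated", "Treated"]
mrna = {"GENE_A": [3, 3, 7, 7], "GENE_B": [7, 7, 3, 3], "GENE_X": [1, 1, 1, 1]}
protein = {"GENE_A": [1, 1, 3, 3], "GENE_B": [3, 3, 1, 1]}

points = paired_fold_changes(mrna, protein, groups, "Treated", "Control")
```

Each entry in `points` corresponds to one point on the scatter plot: the x value from the first matrix and the y value from the second.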

Reciprocal GSEA

The user may select the number of top genes to include when generating gene sets from the matrices, and the P-value cutoff for the GSEA calculation
- For example, after performing differential gene expression on both matrices, the top 100 up-regulated and top 100 down-regulated genes of each matrix can be selected
- The gene sets from each matrix are then used to perform GSEA on the opposing matrix with the designated P-value cutoff
The normalized enrichment score and P-value for each gene set are displayed above the enrichment plots
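
The reciprocal idea can be sketched in two steps: take the top N up- and down-regulated genes from one matrix, then score those sets against the fold-change ranking of the other matrix. The Python below is illustrative only. It uses an unweighted running-sum enrichment score as a stand-in and does not compute normalized enrichment scores or P-values; the app's actual GSEA calculation is not reproduced here, and all names and values are hypothetical.

```python
def top_gene_sets(log_fc, n):
    """Return the top-n up-regulated and top-n down-regulated genes by fold change."""
    ranked = sorted(log_fc.items(), key=lambda kv: kv[1], reverse=True)
    up = [gene for gene, fc in ranked[:n] if fc > 0]
    down = [gene for gene, fc in ranked[::-1][:n] if fc < 0]
    return up, down

def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum score: step up on set members, down otherwise.

    Returns the running-sum value with the largest absolute deviation from zero.
    """
    members = set(gene_set) & set(ranked_genes)
    n_hit = len(members)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0
    running, best = 0.0, 0.0
    for gene in ranked_genes:
        running += 1.0 / n_hit if gene in members else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

# Hypothetical fold changes for matrix A and a fold-change ranking for matrix B.
fc_a = {"G1": 2.0, "G2": 1.5, "G3": 0.1, "G4": -0.2, "G5": -1.8, "G6": -2.5}
ranked_b = ["G2", "G1", "G4", "G3", "G6", "G5"]  # highest to lowest fold change in B

up_a, down_a = top_gene_sets(fc_a, n=2)
es_up = enrichment_score(ranked_b, up_a)
es_down = enrichment_score(ranked_b, down_a)
```

A positive score for the up-regulated set and a negative score for the down-regulated set indicate that the genes move in the same direction in both matrices.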

Reciprocal Single Sample GSEA

The user may choose the ssGSEA scoring method: ssGSEA, GSVA, z-score, or plage
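
Of the listed methods, the z-score approach is the simplest to illustrate: each gene is standardized across samples, and a sample's score for a gene set is the sum of its gene z-scores divided by the square root of the number of genes. The Python below is a sketch of that idea under those assumptions, not the GSVA package's implementation; names and data are hypothetical.

```python
import math
import statistics

def gene_z_scores(values):
    """Standardize one gene's expression values across samples."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return [(v - mean) / sd for v in values]

def combined_z_score(matrix, gene_set):
    """Per-sample score: sum of gene z-scores divided by sqrt(number of genes used)."""
    # Use only genes that are in the matrix and vary across samples.
    usable = [g for g in gene_set if g in matrix and statistics.stdev(matrix[g]) > 0]
    if not usable:
        raise ValueError("no usable genes from the gene set in the matrix")
    z_rows = [gene_z_scores(matrix[g]) for g in usable]
    n_samples = len(z_rows[0])
    return [
        sum(row[i] for row in z_rows) / math.sqrt(len(usable))
        for i in range(n_samples)
    ]

# Hypothetical matrix with three samples.
matrix = {"G1": [1, 2, 3], "G2": [2, 4, 6], "G3": [5, 5, 6]}
scores = combined_z_score(matrix, ["G1", "G2", "NOT_IN_MATRIX"])
```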

Venn Diagram and Statistical Analyses

The user may select the fold change and P-value significance cutoffs used to determine the differentially expressed genes in each matrix, which are then compared to the gene sets
The user may select a category of gene sets, based on the MSigDB categories, to compare to the differentially expressed genes
The user may also upload their own .gmt file to analyze with their matrices.
- The proper format for .gmt files is documented by the Broad Institute
The statistics table may be downloaded for future use
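
The three similarity statistics named in the introduction can be sketched for a pair of gene sets as follows. This is illustrative Python using only the standard library, not the app's R code. The background universe size and the choice of a one-sided (over-representation) Fisher test are assumptions, and the gene names are hypothetical.

```python
import math

def jaccard_index(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def cohens_kappa(a, b, universe_size):
    """Agreement between two set memberships over a gene universe, corrected for chance."""
    a, b = set(a), set(b)
    both = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    neither = universe_size - both - only_a - only_b
    observed = (both + neither) / universe_size
    expected = (len(a) * len(b) + (universe_size - len(a)) * (universe_size - len(b))) / universe_size ** 2
    return (observed - expected) / (1 - expected)

def fisher_exact_greater(a, b, universe_size):
    """One-sided Fisher's exact test: P(overlap >= observed) under the hypergeometric model."""
    a, b = set(a), set(b)
    overlap = len(a & b)
    total = math.comb(universe_size, len(a))
    tail = sum(
        math.comb(len(b), k) * math.comb(universe_size - len(b), len(a) - k)
        for k in range(overlap, min(len(a), len(b)) + 1)
    )
    return tail / total

# Hypothetical gene sets drawn from an assumed universe of 20 genes.
de_genes = ["G1", "G2", "G3", "G4", "G5"]
published_set = ["G4", "G5", "G6", "G7", "G8"]
universe_size = 20

jaccard = jaccard_index(de_genes, published_set)
kappa = cohens_kappa(de_genes, published_set, universe_size)
p_value = fisher_exact_greater(de_genes, published_set, universe_size)
```

Ranking published gene sets by any of these values orders them from most to least similar to the differentially expressed genes; the statistics weigh set size and background differently, so their rankings need not agree.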

Future Enhancements

More compatibility with a wider range of files for integrative analysis comparison
Mouse-to-human gene name conversion for comparison analysis

Questions and Comments

Please email Alyssa Obermayer at alyssa.obermayer@moffitt.org if you have any comments or questions regarding the R Shiny Application.
