Machine learning for systems serology data

Workshop session #1 of the "Systems Biology of Infectious Diseases" workshop (Feb 24-26, 2020)

Goal

We will introduce a general machine learning workflow for analyzing systems serology datasets.

Getting started

Before the workshop, please install the required programs and packages so that we can get started right away. In particular:

  1. Install R and RStudio. If you use macOS, please follow the instructions found here to also install XQuartz and Xcode.

  2. Install the following packages:

  • readxl
  • ggpubr
  • corrr
  • ropls
  • glmnet
  • DMwR
  • pheatmap
  • ggplot2
  • ggrepel
  • RColorBrewer
  • igraph
  • ggraph
  • tidyverse
install.packages(c("readxl", "ggpubr", "corrr", "glmnet", "DMwR", "pheatmap", "ggplot2", 
                   "ggrepel", "RColorBrewer", "igraph", "ggraph", "tidyverse"))

The package ropls needs to be installed from Bioconductor using:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("ropls")
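To confirm that the installation worked before the workshop, a quick check along the following lines may help. It is only a sketch in base R: the names `required`, `is_installed`, and `missing_pkgs` are illustrative and not part of the workshop material.

```r
# Packages listed above; ropls comes from Bioconductor, the rest from CRAN.
required <- c("readxl", "ggpubr", "corrr", "ropls", "glmnet", "DMwR", "pheatmap",
              "ggplot2", "ggrepel", "RColorBrewer", "igraph", "ggraph", "tidyverse")

# requireNamespace() returns TRUE if a package can be loaded, without attaching it.
is_installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)

# Report anything that still needs to be installed.
missing_pkgs <- required[!is_installed]
if (length(missing_pkgs) == 0) {
  message("All required packages are installed.")
} else {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
}
```

If any packages are reported as missing, rerun the corresponding installation command above.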

If you have little or no experience with R, consider working through a general R workshop beforehand, for example one of those found here: https://github.com/nuitrcs/rworkshops. The hands-on exercises are also offered in several versions, so there is one for every experience level.

Dataset

The data for this session is taken from the following publication:

Lu et al., IFN-γ-independent immune markers of Mycobacterium tuberculosis exposure, Nature Medicine (2019)

To import the data and get an overview of it, run notebook part 1.

Workflow

The basic workflow for machine learning on systems serology data consists of feature selection using LASSO (Least Absolute Shrinkage and Selection Operator), followed by PLS-DA (partial least squares discriminant analysis). There are different versions of the exercises for different levels of programming skill:

The solution notebook provides code for the whole workflow including more detailed explanations of the results.
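As a rough orientation, the two workflow steps could be sketched as follows with glmnet and ropls. This is not the code from the workshop notebooks: the data are simulated rather than taken from Lu et al., and the z-scoring step, the group labels, the feature names, and the choice of `lambda.min` and two PLS-DA components are all assumptions made for illustration.

```r
library(glmnet)
library(ropls)

set.seed(1)

# Simulated stand-in for a serology matrix (assumption): 60 samples x 20 features.
n <- 60
p <- 20
X <- matrix(rnorm(n * p), nrow = n,
            dimnames = list(paste0("sample_", 1:n), paste0("feature_", 1:p)))
y <- factor(rep(c("groupA", "groupB"), each = n / 2))

# Shift three features in groupB so that there is a signal to find.
X[y == "groupB", 1:3] <- X[y == "groupB", 1:3] + 1.5

# Z-score each feature (a common preprocessing choice, assumed here).
X <- scale(X)

# Step 1: LASSO (alpha = 1) logistic regression; lambda chosen by cross-validation.
cv_fit <- cv.glmnet(X, y, family = "binomial", alpha = 1)
coefs <- as.matrix(coef(cv_fit, s = "lambda.min"))

# Keep the features with non-zero coefficients, dropping the intercept.
selected <- setdiff(rownames(coefs)[coefs[, 1] != 0], "(Intercept)")
stopifnot(length(selected) >= 2)  # two components need at least two features

# Step 2: PLS-DA on the selected features (a factor response makes opls() fit PLS-DA).
plsda_fit <- opls(X[, selected, drop = FALSE], y, predI = 2)

# Sample scores on the two predictive components, e.g. for a score plot.
scores <- getScoreMN(plsda_fit)
head(scores)
```

Selecting features first keeps the PLS-DA model small and easier to interpret. Note that running the selection on all samples, as in this sketch, gives an optimistic picture of performance; a proper evaluation would repeat the selection inside each cross-validation fold.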