This repository has been archived by the owner on Jan 12, 2018. It is now read-only.

Lab materials for "Targeted Learning in Biomedical Big Data" (Fall 2016, UC Berkeley)

vanderLaan-Group/tlbbd-fa2016

PB HLTH 295: Big Data Seminar material

Theme: Targeted Learning in the era of Big Data

This course provides both theoretical and practical tools for analyzing big data generated by modern biomedical studies. It may be of interest to Ph.D. students in quantitative fields who want to learn about recent theoretical developments in the area, as well as to MA students who want to learn practical skills for analyzing big data.

We will discuss problems arising from traditional data analysis in biomedical big data settings and study the targeted learning roadmap for causal inference as a solution to these problems. We will cover fundamental topics in causal inference, including causal models, the definition of causal quantities that answer the scientific questions of interest, and the causal assumptions under which a causal quantity can be identified from the observed data. Specific examples of questions of interest that will be covered include precision medicine, stochastic interventions, and time-to-event outcomes. We will discuss practical tools for estimating causal quantities using state-of-the-art machine learning techniques, including the SuperLearner and h2oEnsemble R packages. We will discuss how such techniques can be used to construct asymptotically efficient estimators through one-step estimation and targeted minimum loss-based estimation (TMLE), and how these estimators facilitate the construction of scalable confidence intervals and statistical hypothesis tests. Finally, we will discuss recent extensions of super learning and TMLE to the online estimation setting, which provide statistical estimation and inference for arbitrarily large data sets.

Schedule

  • Week 1: No Lab
  • Week 2: Intro to R
  • Week 3: Make-up Lecture 2
  • Week 4: Guest speaker 1 (Oleg Sofrygin) -- simcausal
  • Week 5: Bias-Variance Tradeoff
  • Week 6: SuperLearner: Part I
  • Week 7: SuperLearner: Part II
  • Week 8: Review
  • Week 9: h2oEnsemble (Chris Kennedy) + SuperLearner: Part III (David)
  • Week 10: Git + Amazon EC2
  • Week 11: Guest speaker 2 (BRC) -- Savio & parallelization
  • Week 12: TMLE
  • Week 13: No Lab (Thanksgiving)

License

© 2016 Mark J. van der Laan, David C Benkeser, Wilson Cai

This repository is licensed under the MIT license. See LICENSE for details.
