Stella Biderman edited this page Jan 30, 2021 · 4 revisions

Welcome to the gpt-neox wiki!

The purpose of this wiki is to organize the terminology and ideas introduced in the DeepSpeed papers: how they connect to each other, what benefits they provide, and why they matter for gpt-neox.

To Do List:

Optimizations

  • ZeRO
  • ZeRO Stage 1 vs 2 vs 3
  • Pipeline Parallelism
  • Kernel Optimization

Checkpointing

  • Model Checkpointing
  • Activation Checkpointing

Optimizers

  • Adam
  • 1-Bit Adam

Networking

  • TCP
  • InfiniBand
  • PCIe
  • NVLink
  • MPI
  • NCCL