Release MS-AMP v0.4.0

@tocean released this 26 Feb 10:34
· 2 commits to main since this release
51f34ac

MS-AMP Improvements

  • Improve GPT-3 performance by optimizing FP8 gradient accumulation with kernel fusion
  • Support FP8 in FSDP
  • Support DeepSpeed+TE+MSAMP and add a CIFAR-10 example
  • Support MSAMP+TE+DDP
  • Update DeepSpeed to the latest version
  • Update TransformerEngine to v1.1 and flash-attn to the latest version
  • Support CUDA 12.2
  • Fix several bugs in DeepSpeed integration

MS-AMP-Examples Improvements

  • Improve documentation for data processing in GPT-3
  • Add a launch script for pretraining GPT-6b7
  • Use the new TransformerEngine API in Megatron-LM

Documentation Improvements

  • Add Docker usage to the Installation page
  • Explain how to run the FSDP and DeepSpeed+TE+MSAMP examples on the "Run Examples" page