fix some bugs for latest TE #160

Merged

tocean merged 2 commits into main from yuxiang/te_bugfix on Feb 22, 2024

Conversation

@tocean tocean (Contributor) commented on Feb 21, 2024

Description
Fix some bugs for latest TE and add UT for it.

  1. TE allocates the fp8 weight only for the first micro batch. MS-AMP allocates a zero-size tensor for the fp8 weight because tex.fp8_cast_transpose_fused will allocate the memory for it. However, the latest TE introduces a Float8Tensor data structure that stores the underlying fp8 tensor in _data. When comparing shapes in set_fp8_weights, we should therefore compare against the shape of _data; otherwise TE allocates a zero-size tensor for every non-first micro batch (see the sketch after this list).
  2. It seems that Megatron-LM can't converge when using the latest TE (tested with GPT-345M). The newest TE version that converges is v1.1, so pin TE back to v1.1.
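
Below is a minimal sketch of the intended shape check. Only the Float8Tensor/_data detail comes from TE; the helper names, attribute handling, and dtype are illustrative assumptions, not MS-AMP's actual code.

```python
import torch


def _fp8_weight_needs_realloc(existing, expected_shape):
    """Return True if the cached fp8 weight buffer must be (re)allocated.

    Hypothetical helper: only the Float8Tensor/_data detail is from TE.
    """
    if existing is None:
        return True
    # The latest TE wraps the fp8 buffer in a Float8Tensor whose real storage
    # lives in `_data`; compare that shape, not the wrapper's, so a buffer
    # filled by the first micro batch is recognized and reused.
    shape = existing._data.shape if hasattr(existing, "_data") else existing.shape
    return shape != torch.Size(expected_shape)


def set_fp8_weight(module, attr_name, expected_shape, device="cuda"):
    """Allocate a zero-size placeholder only when no usable buffer exists.

    tex.fp8_cast_transpose_fused fills in the real storage on the first micro
    batch; later micro batches must keep that buffer instead of replacing it.
    """
    existing = getattr(module, attr_name, None)
    if _fp8_weight_needs_realloc(existing, expected_shape):
        setattr(module, attr_name, torch.empty(0, dtype=torch.uint8, device=device))
```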

@tocean tocean enabled auto-merge (squash) February 21, 2024 09:02
@tocean tocean disabled auto-merge February 22, 2024 03:53
@tocean tocean changed the title from "fix bug for non first microbatch in TE" to "fix some bugs for latest TE" on Feb 22, 2024
@tocean tocean merged commit 9ac98df into main Feb 22, 2024
9 checks passed
@tocean tocean deleted the yuxiang/te_bugfix branch February 22, 2024 08:25