fix gpt2 train loss NaN problem by adding a line __syncthreads in BlockR… #33659
PR types
Bug fixes
PR changes
OPs
Describe
Background:
During GPT-2 training, the loss became unstable, failed to converge, and eventually turned into NaN.
Investigation found:
1) Training was normal on P40 but abnormal on V100.
2) With one extra line of log printing, training was normal; without the log printing, training was abnormal.
3) With the original serial-addition reduction, training was normal; with BlockReduceSum, training was abnormal.
Adding one __syncthreads() finally fixed the abnormal training.
The same synchronization fix was also applied to the other two BlockReduceSum variants.
The shared-memory size of 32 used by shared[32] comes from:
int wid = threadIdx.x / warpSize;
On NVIDIA GPUs the maximum blockDim is 1024 and warpSize is 32, so this size never exceeds maxBlockDim / warpSize = 32.
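For context, below is a minimal CUDA sketch (not the actual Paddle source) of the common shuffle-based BlockReduceSum pattern this PR touches. The names WarpReduceSum and SumKernel are illustrative, and the exact line at which the patch inserts __syncthreads() may differ from this sketch; the comments only mark the shared[] accesses that need synchronization and why 32 entries are enough.

```cuda
#include <cuda_runtime.h>

// Warp-level sum via shuffle-down: after the loop, lane 0 holds the warp's total.
template <typename T>
__device__ T WarpReduceSum(T val) {
  for (int offset = warpSize / 2; offset > 0; offset /= 2) {
    val += __shfl_down_sync(0xFFFFFFFFu, val, offset);
  }
  return val;
}

// Block-level sum: each warp reduces its own values, lane 0 of every warp
// publishes its partial sum into shared[], then warp 0 reduces those partials.
// shared[32] suffices because blockDim.x <= 1024 and warpSize == 32,
// so a block contains at most 1024 / 32 = 32 warps.
template <typename T>
__device__ T BlockReduceSum(T val) {
  __shared__ T shared[32];           // one slot per warp
  int lane = threadIdx.x % warpSize;
  int wid = threadIdx.x / warpSize;

  val = WarpReduceSum(val);          // each warp reduces its own 32 values

  // Barrier so a fast warp cannot overwrite shared[] while another warp is
  // still reading partial sums from a previous call to BlockReduceSum.
  __syncthreads();

  if (lane == 0) shared[wid] = val;  // lane 0 of each warp writes its partial

  // Barrier so warp 0 does not read shared[] before every warp has written
  // its partial sum; a missing barrier around shared[] is exactly the kind
  // of race that showed up as NaN loss on V100.
  __syncthreads();

  // Only the first blockDim.x / warpSize entries are valid; the rest read 0.
  val = (threadIdx.x < blockDim.x / warpSize) ? shared[lane] : T(0);
  if (wid == 0) val = WarpReduceSum(val);  // final reduction within warp 0
  return val;
}

// Hypothetical usage: each block sums one chunk of the input array.
__global__ void SumKernel(const float* in, float* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  float v = (i < n) ? in[i] : 0.0f;
  v = BlockReduceSum(v);
  if (threadIdx.x == 0) out[blockIdx.x] = v;
}
```

The sketch assumes blockDim.x is a multiple of warpSize (e.g. a 256-thread launch), so every lane participates in the shuffle with the full 0xFFFFFFFF mask.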
cherry-pick from:
#33658