Minimal implementation of a Transformer model and a training script for language modeling in PyTorch.
Supports multi-GPU training via Distributed Data Parallel (DDP).
Single-GPU training:

```shell
python train.py --config=config/config.yaml
```

Multi-GPU training (e.g. 4 GPUs on a single node):

```shell
torchrun --nnodes=1 --nproc_per_node=4 train.py --config=code/config/sweep.yaml
```
- Define Hyperparameters:
  Create a single YAML file with lists of hyperparameter values. Each value in the list represents a different configuration, e.g.:

  ```yaml
  lr: [0.1, 0.01]
  wd: [0.1, 0.2, 0.5]
  ...
  ```
- Submit the Sweep:
  Use `job_idx` to specify which configuration to use. The `job_idx` should range from `0` to `n-1`, where `n` is the number of configurations in the YAML. This is done automatically by `condor.sub`. Python takes care of assigning the corresponding configuration to each job based on the `job_idx`.
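The mapping from `job_idx` to a configuration can be sketched as follows. This is a minimal illustration, assuming the sweep YAML has been parsed into a dict of lists (as PyYAML would return for the example above); the actual helper names in the repo may differ:

```python
from itertools import product

# Parsed sweep config; values mirror the YAML example above.
sweep = {
    "lr": [0.1, 0.01],
    "wd": [0.1, 0.2, 0.5],
}

def config_for_job(sweep: dict, job_idx: int) -> dict:
    """Expand the lists into their Cartesian product and pick entry job_idx."""
    keys = sorted(sweep)  # fixed key order so every job sees the same indexing
    combos = list(product(*(sweep[k] for k in keys)))
    if not 0 <= job_idx < len(combos):
        raise IndexError(f"job_idx must be in [0, {len(combos) - 1}]")
    return dict(zip(keys, combos[job_idx]))

# n = 2 * 3 = 6 configurations here, so job_idx ranges over 0..5.
print(config_for_job(sweep, 0))
```

With this scheme, each HTCondor job receives its `job_idx` from `condor.sub` and deterministically resolves the same configuration, independent of the other jobs.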
- data loading
- improve readability
- add seed to `DistributedSampler`
- test macOS Metal (MPS) support
- add `LinearCooldown` compatible with `WarmupConstant`
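The planned `LinearCooldown` on top of `WarmupConstant` could look roughly like this. This is only a sketch using plain multiplier functions in the style of `torch.optim.lr_scheduler.LambdaLR`; the function names and step counts are assumptions, not the repo's actual API:

```python
def warmup_constant(step: int, warmup_steps: int) -> float:
    """Linear warmup from 0 to 1, then a constant multiplier of 1."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return 1.0

def with_linear_cooldown(step: int, warmup_steps: int,
                         cooldown_start: int, total_steps: int) -> float:
    """WarmupConstant until cooldown_start, then decay linearly to 0.

    At step == cooldown_start the multiplier is still 1.0, so the
    schedule is continuous with the constant phase.
    """
    if step < cooldown_start:
        return warmup_constant(step, warmup_steps)
    remaining = max(0, total_steps - step)
    return remaining / max(1, total_steps - cooldown_start)

# Example trajectory: warm up for 10 steps, stay constant, cool down
# linearly over the final 20 of 100 total steps.
mults = [with_linear_cooldown(s, 10, 80, 100) for s in range(101)]
```

In a training script, such a function would typically be bound with `functools.partial` and handed to `LambdaLR` as `lr_lambda`.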