
Pretrain a Transformer on language modeling.

Minimal implementation of a Transformer model and a training script for language modeling in PyTorch.
Supports multi-GPU training via Distributed Data Parallel (DDP).
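
As a rough illustration of what the DDP support involves (a sketch of the standard PyTorch pattern, not the exact code in train.py; function names here are illustrative), multi-GPU training boils down to initializing a process group and wrapping the model:

  # Illustrative sketch of the standard PyTorch DDP setup; not the exact train.py code.
  import os
  import torch
  import torch.distributed as dist
  from torch.nn.parallel import DistributedDataParallel as DDP

  def setup_ddp():
      # torchrun exports RANK, LOCAL_RANK and WORLD_SIZE for each process.
      dist.init_process_group(backend="nccl")
      local_rank = int(os.environ["LOCAL_RANK"])
      torch.cuda.set_device(local_rank)
      return local_rank

  def wrap_model(model, local_rank):
      # DDP synchronizes gradients across processes during backward().
      return DDP(model.to(local_rank), device_ids=[local_rank])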

Usage

Single GPU/CPU:
  python train.py --config=config/config.yaml
Multiple GPUs:
  torchrun --nnodes=1 --nproc_per_node=4 train.py --config=code/config/sweep.yaml
Run a sweep:
  1. Define hyperparameters: create a single YAML file with lists of hyperparameter values. Each combination of listed values defines one configuration, e.g.:
    lr: [0.1, 0.01]
    wd: [0.1, 0.2, 0.5]
    ...
  2. Submit the sweep: use job_idx to select which configuration to run. job_idx should range from 0 to n-1, where n is the number of configurations defined by the YAML; condor.sub sets it automatically, and the Python script assigns the corresponding configuration to each job (see the sketch after this list).
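
A minimal sketch of how job_idx can map to one configuration, assuming the sweep is the Cartesian product of the listed values (the helper name expand_sweep is hypothetical, not the repository's actual function):

  # Hypothetical helper: expand list-valued entries into a grid of configurations.
  import itertools
  import yaml

  def expand_sweep(path):
      with open(path) as f:
          cfg = yaml.safe_load(f)
      sweep_keys = [k for k, v in cfg.items() if isinstance(v, list)]
      fixed = {k: v for k, v in cfg.items() if not isinstance(v, list)}
      grid = itertools.product(*(cfg[k] for k in sweep_keys))
      return [{**fixed, **dict(zip(sweep_keys, values))} for values in grid]

  # With lr: [0.1, 0.01] and wd: [0.1, 0.2, 0.5] this yields n = 6 configurations,
  # and job_idx in [0, n-1] picks one of them:
  #   config = expand_sweep("config/sweep.yaml")[job_idx]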

TODO:

  • data loading
    • improve readability
    • add seed to DistributedSampler (see the sketch below)
  • test macOS metal support
  • add LinearCooldown compatible with WarmupConstant
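
For the DistributedSampler seed item above, the standard PyTorch pattern would look roughly like this (a sketch, not the repository's current code; train_dataset, config and num_epochs are placeholder names):

  # Sketch: pass an explicit seed to DistributedSampler and call set_epoch()
  # every epoch so the shuffle changes across epochs but stays reproducible.
  from torch.utils.data import DataLoader
  from torch.utils.data.distributed import DistributedSampler

  sampler = DistributedSampler(train_dataset, shuffle=True, seed=config.seed)
  loader = DataLoader(train_dataset, batch_size=config.batch_size, sampler=sampler)

  for epoch in range(num_epochs):
      sampler.set_epoch(epoch)
      for batch in loader:
          ...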
