Commit 6b6a2da

Merge branch 'PaddlePaddle:develop' into develop

fightfat committed Sep 18, 2024
2 parents 4a3297f + 0540e97 commit 6b6a2da

Showing 82 changed files with 3,199 additions and 1,535 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -126,6 +126,6 @@ FETCH_HEAD
./ppdiffusers/ppdiffusers/version.py

# third party
-csrc/gpu/cutlass_kernels/cutlass
+csrc/third_party/
dataset/
output/
2 changes: 1 addition & 1 deletion Makefile
@@ -46,7 +46,7 @@ unit-test:

.PHONY: install
install:
-pip install paddlepaddle==0.0.0 -f https://www.paddlepaddle.org.cn/whl/linux/cpu-mkl/develop.html
+pip install --pre paddlepaddle -i https://www.paddlepaddle.org.cn/packages/nightly/cpu/
pip install -r requirements-dev.txt
pip install -r requirements.txt
pip install -r paddlenlp/experimental/autonlp/requirements.txt
7 changes: 5 additions & 2 deletions README.md
@@ -42,7 +42,8 @@

### <a href=#多硬件训推一体> 🔧 Unified Multi-Hardware Training and Inference </a>

-Supports large model training and inference on a range of hardware, including NVIDIA GPUs, Kunlun XPUs, Ascend NPUs, Enflame GCUs, and Hygon DCUs; the toolkit's interfaces allow fast hardware switching, greatly reducing the R&D cost of migrating between platforms.
+Supports large model and natural language understanding (NLU) model training and inference on a range of hardware, including NVIDIA GPUs, Kunlun XPUs, Ascend NPUs, Enflame GCUs, and Hygon DCUs; the toolkit's interfaces allow fast hardware switching, greatly reducing the R&D cost of migrating between platforms.
+Currently supported NLU models: [multi-hardware NLU model list](./docs/model_zoo/model_list_multy_device.md)

### <a href=#高效易用的预训练> 🚀 Efficient and Easy-to-Use Pre-training </a>

@@ -174,13 +175,14 @@ PaddleNLP provides a convenient and easy-to-use Auto API for quickly loading models and Tokenizers
>>> model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B", dtype="float16")
>>> input_features = tokenizer("你好!请自我介绍一下。", return_tensors="pd")
>>> outputs = model.generate(**input_features, max_length=128)
->>> print(tokenizer.batch_decode(outputs[0]))
+>>> print(tokenizer.batch_decode(outputs[0], skip_special_tokens=True))
['我是一个AI语言模型,我可以回答各种问题,包括但不限于:天气、新闻、历史、文化、科学、教育、娱乐等。请问您有什么需要了解的吗?']
```
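The effect of the `skip_special_tokens=True` argument in the call above can be sketched with a toy decoder. The token ids and vocabulary below are made up for illustration (they are not Qwen2's real ones): special token ids are simply filtered out before the remaining ids are mapped back to text.

```python
# Toy sketch of skip_special_tokens: drop special token ids (pad/eos here,
# with hypothetical id values) before mapping the remaining ids back to text.
SPECIAL_IDS = {0, 2}  # e.g. <pad>=0, <eos>=2 (made-up ids)
ID_TO_TOKEN = {0: "<pad>", 1: "Hello", 2: "<eos>", 3: "world"}

def decode(ids, skip_special_tokens=False):
    if skip_special_tokens:
        ids = [i for i in ids if i not in SPECIAL_IDS]
    return " ".join(ID_TO_TOKEN[i] for i in ids)

print(decode([1, 3, 2]))                            # Hello world <eos>
print(decode([1, 3, 2], skip_special_tokens=True))  # Hello world
```

This is why the updated call prints only the model's reply, without trailing end-of-sequence markers.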

### Large Model Pre-training

```shell
+git clone https://github.com/PaddlePaddle/PaddleNLP.git && cd PaddleNLP  # skip this step if PaddleNLP is already cloned or downloaded
mkdir -p llm/data && cd llm/data
wget https://bj.bcebos.com/paddlenlp/models/transformers/llama/data/llama_openwebtext_100k.bin
wget https://bj.bcebos.com/paddlenlp/models/transformers/llama/data/llama_openwebtext_100k.idx
@@ -191,6 +193,7 @@ python -u -m paddle.distributed.launch --gpus "0,1,2,3,4,5,6,7" run_pretrain.py
### Large Model SFT Fine-tuning

```shell
+git clone https://github.com/PaddlePaddle/PaddleNLP.git && cd PaddleNLP  # skip this step if PaddleNLP is already cloned or downloaded
mkdir -p llm/data && cd llm/data
wget https://bj.bcebos.com/paddlenlp/datasets/examples/AdvertiseGen.tar.gz && tar -zxvf AdvertiseGen.tar.gz
cd .. # change folder to PaddleNLP/llm
4 changes: 3 additions & 1 deletion README_en.md
@@ -93,13 +93,14 @@ PaddleNLP provides a convenient and easy-to-use Auto API, which can quickly load
>>> model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B", dtype="float16")
>>> input_features = tokenizer("你好!请自我介绍一下。", return_tensors="pd")
>>> outputs = model.generate(**input_features, max_length=128)
->>> print(tokenizer.batch_decode(outputs[0]))
+>>> print(tokenizer.batch_decode(outputs[0], skip_special_tokens=True))
['我是一个AI语言模型,我可以回答各种问题,包括但不限于:天气、新闻、历史、文化、科学、教育、娱乐等。请问您有什么需要了解的吗?']
```

### Pre-training for large language models

```shell
+git clone https://github.com/PaddlePaddle/PaddleNLP.git && cd PaddleNLP  # skip this step if PaddleNLP is already cloned or downloaded
mkdir -p llm/data && cd llm/data
wget https://bj.bcebos.com/paddlenlp/models/transformers/llama/data/llama_openwebtext_100k.bin
wget https://bj.bcebos.com/paddlenlp/models/transformers/llama/data/llama_openwebtext_100k.idx
@@ -110,6 +111,7 @@ python -u -m paddle.distributed.launch --gpus "0,1,2,3,4,5,6,7" run_pretrain.py
### SFT fine-tuning for large language models

```shell
+git clone https://github.com/PaddlePaddle/PaddleNLP.git && cd PaddleNLP  # skip this step if PaddleNLP is already cloned or downloaded
mkdir -p llm/data && cd llm/data
wget https://bj.bcebos.com/paddlenlp/datasets/examples/AdvertiseGen.tar.gz && tar -zxvf AdvertiseGen.tar.gz
cd .. # change folder to PaddleNLP/llm
13 changes: 12 additions & 1 deletion csrc/README.md
@@ -10,6 +10,12 @@ pip install -r requirements.txt

## Compiling the CUDA Operators

+Generate the FP8 cutlass operators (compilation takes a long time):
+```shell
+python generate_code_gemm_fused_kernels.py
+```
+
+Compile:
```shell
python setup_cuda.py install
```
@@ -20,9 +26,14 @@ python setup_cuda.py install
2. Pull the code:
   git clone -b v3.5.0 --single-branch https://github.com/NVIDIA/cutlass.git

-3. Place the downloaded `cutlass` directory at `csrc/gpu/cutlass_kernels/cutlass`
+3. Place the downloaded `cutlass` directory at `csrc/third_party/cutlass`

4. Recompile the CUDA operators
```shell
python setup_cuda.py install
```

+### FP8 GEMM Auto-tuning
+```shell
+sh tune_fp8_gemm.sh
+```
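The `tune_fp8_gemm.sh` step above is an auto-tuner: it benchmarks candidate kernel configurations for each GEMM shape and keeps the fastest. The general idea can be sketched in pure Python; the tile sizes, shapes, and function names below are illustrative only, not the actual cutlass tuner.

```python
# Conceptual sketch of GEMM auto-tuning: time a tiled matmul under several
# candidate tile sizes and pick the fastest for a given problem shape.
import time

def matmul_tiled(A, B, tile):
    """Blocked matrix multiply C = A @ B with square tiles of size `tile`."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for ii in range(0, n, tile):
        for jj in range(0, m, tile):
            for kk in range(0, k, tile):
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, m)):
                        s = C[i][j]
                        for p in range(kk, min(kk + tile, k)):
                            s += A[i][p] * B[p][j]
                        C[i][j] = s
    return C

def autotune(shape, candidates=(4, 8, 16)):
    """Benchmark each candidate tile size on `shape` and return the fastest."""
    n, k, m = shape
    A = [[1.0] * k for _ in range(n)]
    B = [[1.0] * m for _ in range(k)]
    timings = {}
    for tile in candidates:
        t0 = time.perf_counter()
        matmul_tiled(A, B, tile)
        timings[tile] = time.perf_counter() - t0
    return min(timings, key=timings.get)

best = autotune((32, 32, 32))
```

A real tuner sweeps many more parameters (tile shapes, warp layouts, pipeline stages) and caches the winner per shape so later runs skip the search.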