creating a layer with Docker/docker-compose #633

Merged: 27 commits, Apr 7, 2023

Commits
2c8e535
creating a layer with Docker/docker-compose
loeken Mar 29, 2023
c0f3347
using nvidia image in second stage as well to provide required libraries
loeken Mar 29, 2023
c67594f
Merge branch 'main' of https://github.com/oobabooga/text-generation-w…
loeken Mar 31, 2023
67a81da
Merge branch 'main' of https://github.com/oobabooga/text-generation-w…
loeken Mar 31, 2023
cf8196b
GPTQ switch to cuda branch, minor update to nvidia/cuda:11.8.0-devel-…
loeken Mar 31, 2023
1797fd5
docs for ubuntu 22.04/manjaro installation of dependencies
loeken Apr 1, 2023
b4886a2
Merge branch 'dockerize' of github.com:loeken/text-generation-webui i…
loeken Apr 1, 2023
d83a10c
unified arguments WEBUI_VERSION and GPTQ_VERSION
loeken Apr 1, 2023
6f05f2e
didnt save file
loeken Apr 1, 2023
1fc2dca
changes suggested by deece to allow running version with uncommitted c…
loeken Apr 1, 2023
ecd5538
Merge branch 'main' of github.com:oobabooga/text-generation-webui int…
loeken Apr 1, 2023
657ce70
updated version of gptq, linked in links to models used in testing
loeken Apr 1, 2023
4551df7
webui version line to not fail if no WEBUI_VERSION provided
loeken Apr 2, 2023
fb49dbe
Merge branch 'main' of github.com:oobabooga/text-generation-webui int…
loeken Apr 3, 2023
0ba16a8
replaced devel with runtime for final stage, removed env vars as alre…
loeken Apr 4, 2023
df48ddb
added comment to point users with old cards to using an older GPTQ ve…
loeken Apr 4, 2023
50ba320
added venv to Dockerfile to avoid error failing for transformers, rela…
loeken Apr 4, 2023
9571be8
Update Dockerfile
loeken Apr 4, 2023
e8ed319
Update Dockerfile
loeken Apr 4, 2023
7d0286b
Update Dockerfile
loeken Apr 4, 2023
de45b5c
updating pip prior to running pip installs
loeken Apr 4, 2023
9a5e278
tested 8bit, added examples for 8bit model download/cli args to start
loeken Apr 4, 2023
7d9728b
added .env and dockerfile to .dockerignore
loeken Apr 6, 2023
43fe224
Merge branch 'main' into loeken-dockerize
oobabooga Apr 7, 2023
4806703
Switch to oobabooga/GPTQ-for-LLaMa
oobabooga Apr 7, 2023
be7b3b7
Add vim to the requirements
oobabooga Apr 7, 2023
6b479cd
Add files to .dockerignore
oobabooga Apr 7, 2023
2 changes: 2 additions & 0 deletions .dockerignore
@@ -0,0 +1,2 @@
/loras
/models
26 changes: 26 additions & 0 deletions .env.example
@@ -0,0 +1,26 @@
# by default the Dockerfile specifies these versions: 3.5;5.0;6.0;6.1;7.0;7.5;8.0;8.6+PTX
# however, for it to work I had to specify the exact compute capability for my card (RTX 2060): 7.5
# you can look up the value for your card here: https://developer.nvidia.com/cuda-gpus
TORCH_CUDA_ARCH_LIST=7.5

# these commands worked for me with roughly 4.5GB of vram
CLI_ARGS=--model llama-7b-4bit --wbits 4 --listen --auto-devices
# example running 13b with 4bit/128 groupsize : CLI_ARGS=--model llama-13b-4bit-128g --wbits 4 --listen --groupsize 128 --pre_layer 25
# example with loading api extension and public share: CLI_ARGS=--model llama-7b-4bit --wbits 4 --listen --auto-devices --no-stream --extensions api --share

# the port the webui binds to on the host
HOST_PORT=7860
# the port the webui binds to inside the container
CONTAINER_PORT=7860

# the port the api binds to on the host
HOST_API_PORT=5000
# the port the api binds to inside the container
CONTAINER_API_PORT=5000

# the version used to install GPTQ from, defaults to cuda
GPTQ_VERSION=608f3ba71e40596c75f8864d73506eaf57323c6e
# older cards such as the k80 might have more luck with this GPTQ_VERSION=841feedde876785bc8022ca48fd9c3ff626587e2 https://github.com/qwopqwop200/GPTQ-for-LLaMa/issues/88#issuecomment-1485897212

# the version used to install text-generation-webui from
WEBUI_VERSION=HEAD
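docker-compose reads this file as plain KEY=value pairs: blank lines and `#` comments are skipped, and only the first `=` splits key from value (so CLI_ARGS may itself contain `=`). A minimal illustrative parser — not docker-compose's actual implementation — to make the format explicit:

```python
# Illustrative parser for the KEY=value format of .env files.
# docker-compose has its own, more complete parser; this only shows
# the basic rules used by the file above.
def parse_env(text: str) -> dict:
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # comments and blank lines are ignored
        key, _, value = line.partition("=")  # split on first '=' only
        env[key.strip()] = value.strip()
    return env

sample = """
# example
TORCH_CUDA_ARCH_LIST=7.5
CLI_ARGS=--model llama-7b-4bit --wbits 4 --listen
HOST_PORT=7860
"""
print(parse_env(sample)["TORCH_CUDA_ARCH_LIST"])  # → 7.5
```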
62 changes: 62 additions & 0 deletions Dockerfile
@@ -0,0 +1,62 @@
FROM nvidia/cuda:11.8.0-devel-ubuntu22.04 as builder

RUN apt-get update && \
apt-get install --no-install-recommends -y git build-essential python3-dev python3-pip && \
rm -rf /var/lib/apt/lists/*

RUN --mount=type=cache,target=/root/.cache/pip pip3 install virtualenv

RUN git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa /build

WORKDIR /build

ARG GPTQ_VERSION
RUN git checkout ${GPTQ_VERSION}

RUN virtualenv /build/venv
RUN . /build/venv/bin/activate && \
pip3 install torch torchvision torchaudio && \
pip3 install -r requirements.txt

# https://developer.nvidia.com/cuda-gpus
# for a rtx 2060: ARG TORCH_CUDA_ARCH_LIST="7.5"
ARG TORCH_CUDA_ARCH_LIST="3.5;5.0;6.0;6.1;7.0;7.5;8.0;8.6+PTX"
RUN . /build/venv/bin/activate && \
python3 setup_cuda.py bdist_wheel -d .

FROM nvidia/cuda:11.8.0-runtime-ubuntu22.04

LABEL maintainer="Your Name <your.email@example.com>"
LABEL description="Docker image for GPTQ-for-LLaMa and Text Generation WebUI"

RUN apt-get update && \
apt-get install --no-install-recommends -y git python3 python3-pip && \
rm -rf /var/lib/apt/lists/*

RUN --mount=type=cache,target=/root/.cache/pip pip3 install virtualenv

COPY . /app/

WORKDIR /app

ARG WEBUI_VERSION
RUN test -n "${WEBUI_VERSION}" && git reset --hard ${WEBUI_VERSION} || echo "Using provided webui source"

RUN virtualenv /app/venv
RUN . /app/venv/bin/activate && \
pip3 install torch torchvision torchaudio && \
pip3 install -r requirements.txt

COPY --from=builder /build /app/repositories/GPTQ-for-LLaMa
RUN . /app/venv/bin/activate && \
pip3 install /app/repositories/GPTQ-for-LLaMa/*.whl

ENV CLI_ARGS=""

RUN --mount=type=cache,target=/root/.cache/pip . /app/venv/bin/activate && cd extensions/api && pip3 install -r requirements.txt
RUN --mount=type=cache,target=/root/.cache/pip . /app/venv/bin/activate && cd extensions/elevenlabs_tts && pip3 install -r requirements.txt
RUN --mount=type=cache,target=/root/.cache/pip . /app/venv/bin/activate && cd extensions/google_translate && pip3 install -r requirements.txt
RUN --mount=type=cache,target=/root/.cache/pip . /app/venv/bin/activate && cd extensions/silero_tts && pip3 install -r requirements.txt
RUN --mount=type=cache,target=/root/.cache/pip . /app/venv/bin/activate && cd extensions/whisper_stt && pip3 install -r requirements.txt

CMD . /app/venv/bin/activate && python3 server.py ${CLI_ARGS}
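The CMD uses shell form deliberately: the unquoted `${CLI_ARGS}` is expanded and word-split at container start, so each flag reaches `server.py` as a separate argument. A quick illustration of that splitting, runnable outside Docker:

```shell
# Illustration of how the shell-form CMD expands CLI_ARGS: the unquoted
# expansion is word-split, so server.py receives separate arguments.
CLI_ARGS="--model llama-7b-4bit --wbits 4 --listen"
set -- ${CLI_ARGS}   # same splitting the CMD line relies on
echo "argc=$#"       # → argc=5
echo "first=$1"      # → first=--model
```

This is also why exec-form CMD (`["python3", "server.py", "${CLI_ARGS}"]`) would not work here: exec form performs no shell expansion.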
2 changes: 1 addition & 1 deletion README.md
@@ -114,7 +114,7 @@ As an alternative to the recommended WSL method, you can install the web UI nati

### Alternative: Docker

https://github.com/oobabooga/text-generation-webui/issues/174, https://github.com/oobabooga/text-generation-webui/issues/87
[docker/docker-compose instructions](docs/README_docker.md)

## Downloading models

32 changes: 32 additions & 0 deletions docker-compose.yml
@@ -0,0 +1,32 @@
version: "3.3"
services:
text-generation-webui:
build:
context: .
args:
# specify the CUDA compute capability your card supports: https://developer.nvidia.com/cuda-gpus
TORCH_CUDA_ARCH_LIST: ${TORCH_CUDA_ARCH_LIST}
GPTQ_VERSION: ${GPTQ_VERSION}
WEBUI_VERSION: ${WEBUI_VERSION}
env_file: .env
ports:
- "${HOST_PORT}:${CONTAINER_PORT}"
- "${HOST_API_PORT}:${CONTAINER_API_PORT}"
stdin_open: true
tty: true
volumes:
- ./characters:/app/characters
- ./extensions:/app/extensions
- ./loras:/app/loras
- ./models:/app/models
- ./presets:/app/presets
- ./prompts:/app/prompts
- ./softprompts:/app/softprompts
- ./training:/app/training
deploy:
resources:
reservations:
devices:
- driver: nvidia
device_ids: ['0']
capabilities: [gpu]
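Compose substitutes the `${VAR}` references above from the `.env` file before applying the configuration; the mechanism is roughly like Python's `string.Template` (an illustration only — Compose's real interpolation engine additionally supports defaults and escaping):

```python
# Rough illustration of Compose's ${VAR} substitution for the ports
# entry above, using Python's stdlib string.Template.
from string import Template

env = {"HOST_PORT": "7860", "CONTAINER_PORT": "7860"}
ports_entry = Template("${HOST_PORT}:${CONTAINER_PORT}").substitute(env)
print(ports_entry)  # → 7860:7860
```

Changing HOST_PORT in `.env` therefore remaps only the host side of the binding; the container side stays whatever CONTAINER_PORT says.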
97 changes: 97 additions & 0 deletions docs/README_docker.md
@@ -0,0 +1,97 @@
- [Linux](#linux)
- [Ubuntu 22.04](#ubuntu-2204)
- [update the drivers](#update-the-drivers)
- [reboot](#reboot)
- [docker \& container toolkit](#docker--container-toolkit)
- [Manjaro](#manjaro)
- [update the drivers](#update-the-drivers-1)
- [reboot](#reboot-1)
- [docker \& container toolkit](#docker--container-toolkit-1)
- [prepare environment \& startup](#prepare-environment--startup)
- [place models in models folder](#place-models-in-models-folder)
- [prepare .env file](#prepare-env-file)
- [startup docker container](#startup-docker-container)
- [Windows](#windows)
# Linux

## Ubuntu 22.04

### update the drivers
in the “software updater”, update the drivers to the latest version of the proprietary driver.

### reboot
reboot to switch to the new driver

### docker & container toolkit
```bash
sudo apt update
sudo apt-get install curl

sudo mkdir -m 0755 -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg

echo \
"deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
"$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

sudo apt update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin docker-compose -y

sudo usermod -aG docker $USER
newgrp docker
```

then install the NVIDIA container toolkit so containers can access the GPU:
```bash
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

echo "deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://nvidia.github.io/libnvidia-container/stable/ubuntu22.04/amd64 /" | \
sudo tee /etc/apt/sources.list.d/nvidia.list > /dev/null

sudo apt update

sudo apt install nvidia-docker2 -y
sudo systemctl restart docker
```

## Manjaro

### update the drivers
```bash
sudo mhwd -a pci nonfree 0300
```
### reboot
```bash
reboot
```
### docker & container toolkit
```bash
yay -S docker docker-compose buildkit gcc nvidia-docker
sudo usermod -aG docker $USER
newgrp docker
sudo systemctl restart docker # required by nvidia-container-runtime
```

## prepare environment & startup

### place models in models folder
download and place the models inside the models folder. tested with:

- https://github.com/oobabooga/text-generation-webui/pull/530#issuecomment-1483891617
- https://github.com/oobabooga/text-generation-webui/pull/530#issuecomment-1483941105

### prepare .env file
edit .env values to your needs
```bash
cp .env.example .env
nano .env
```

### startup docker container
```bash
docker-compose up --build
```
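The compose file bind-mounts `./models` into the container, so the model named in CLI_ARGS must already be on the host before startup. A small pre-flight check you might run first (illustrative; the MODEL name is an assumption — use whatever you passed to `--model`):

```shell
# Illustrative pre-flight check before `docker-compose up`: verify the
# model referenced by CLI_ARGS exists under ./models on the host.
MODEL="llama-7b-4bit"   # assumed name; match your CLI_ARGS --model value
if [ -e "models/$MODEL" ] || [ -e "models/$MODEL.safetensors" ]; then
    echo "model found"
else
    echo "model missing: place it in ./models before starting" >&2
fi
```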


# Windows
coming soon