Implement new paper: Dreambooth-StableDiffusion, Google Imagen based Textual Inversion alternative #914

Closed
underlines opened this issue Sep 23, 2022 · 11 comments
Labels
enhancement New feature or request

Comments

@underlines

Is your feature request related to a problem? Please describe.
The current Textual Inversion implementation optimizes only the word embeddings, whereas Dreambooth fine-tunes the diffusion model as a whole. That's revolutionary.
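To make the contrast concrete, here is a rough toy sketch of which parts of a model each method would update. The component names and sizes are hypothetical placeholders, not taken from any real checkpoint, and it assumes Dreambooth updates only the denoising network while the text encoder stays frozen.

```python
from dataclasses import dataclass


@dataclass
class ParamGroup:
    name: str
    size: int  # toy parameter count, NOT a real model size
    trainable: bool = False


def make_toy_model():
    # Hypothetical component names and toy sizes, for illustration only.
    return [
        ParamGroup("new_token_embedding", 1),
        ParamGroup("other_token_embeddings", 10),
        ParamGroup("text_encoder_layers", 100),
        ParamGroup("diffusion_unet", 1000),
    ]


def mark_trainable(groups, method):
    """Flag the groups that the given method would update."""
    if method == "textual_inversion":
        # Only the embedding of the newly added word is optimized.
        wanted = {"new_token_embedding"}
    elif method == "dreambooth":
        # Assumption: the whole denoising network is fine-tuned.
        wanted = {"diffusion_unet"}
    else:
        raise ValueError(f"unknown method: {method}")
    for group in groups:
        group.trainable = group.name in wanted
    return groups


def trainable_size(groups):
    """Sum the toy sizes of all groups flagged as trainable."""
    return sum(group.size for group in groups if group.trainable)
```

With these toy numbers, Textual Inversion touches 1 unit of parameters and Dreambooth touches 1000, which is also a hint at why the two differ so much in memory use.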

Describe the solution you'd like
Add https://github.com/XavierXiao/Dreambooth-Stable-Diffusion

Describe alternatives you've considered

Additional context

The training images are taken from the issue in the Textual Inversion repository; they are 3 images of a large trash container. Regularization images are generated with the prompt "photo of a container". Regularization images are shown here:

After training, generated images with the prompt "photo of a sks container":

https://github.com/XavierXiao/Dreambooth-Stable-Diffusion/blob/main/assets/a-container-0038.jpg

Generated images with the prompt "photo of a red sks container":

https://github.com/XavierXiao/Dreambooth-Stable-Diffusion/blob/main/assets/a-red-sks-container-0021.jpg

Generated images with the prompt "a dog on top of sks container":

https://github.com/XavierXiao/Dreambooth-Stable-Diffusion/blob/main/assets/a-dog-on-top-of-sks-container-0023.jpg
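The prompts above follow a simple pattern: a class prompt for the regularization images and the same prompt with a rare identifier ("sks") for the subject. A minimal sketch of how the two image sets might be paired with their prompts follows. The function names and file names are hypothetical, and it assumes the subject images are trained with the identifier prompt, which the text above only shows being used at generation time.

```python
def class_prompt(class_noun):
    """Prompt for regularization images, e.g. 'photo of a container'."""
    return f"photo of a {class_noun}"


def instance_prompt(identifier, class_noun):
    """Prompt for the subject images, e.g. 'photo of a sks container'."""
    return f"photo of a {identifier} {class_noun}"


def build_training_pairs(instance_images, regularization_images,
                         identifier, class_noun):
    """Pair every image with the prompt it would be trained against."""
    pairs = [(img, instance_prompt(identifier, class_noun))
             for img in instance_images]
    pairs += [(img, class_prompt(class_noun))
              for img in regularization_images]
    return pairs
```

For example, `build_training_pairs(["trash1.jpg"], ["reg1.jpg"], "sks", "container")` pairs the subject image with "photo of a sks container" and the regularization image with "photo of a container".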

@bmaltais

bmaltais commented Sep 23, 2022

I think we can already consume the resulting ckpt... but to create them you need to use the project you linked to. Not sure if it is worth integrating the creation part in this project when there is a dedicated repo for that.

It also requires a GPU with 35GB of VRAM or more... so most regular folks don't have access to that. So pretty much a non-starter. https://github.com/JoePenna/Dreambooth-Stable-Diffusion/

@grexzen

grexzen commented Sep 23, 2022

Yeah no point for this gen UI, but for re training that is an awesome find.

OP needs to create a comparison though between textual inversion and this to see if there are real advantages across many prompts and image sets.

@ExponentialML

> Yeah no point for this gen UI, but for re training that is an awesome find.
>
> OP needs to create a comparison though between textual inversion and this to see if there are real advantages across many prompts and image sets.

The difference between TI and Dreambooth is pretty sizable, with the latter having a major advantage.
There are a lot of examples here.

Also, there's really no need to implement Dreambooth in this. It finetunes the entire model, meaning you simply just replace the default model with the trained one afterwards. There are no embeddings to use here.
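As a rough illustration of "just replace the default model", using a trained checkpoint comes down to putting the file where the UI looks for models. The helper below is a hypothetical sketch, not part of the webui; the models directory is passed in as an argument because its location depends on your install.

```python
import shutil
from pathlib import Path


def install_checkpoint(trained_ckpt, models_dir, name=None):
    """Copy a trained checkpoint into the folder the UI loads models from.

    Both paths are supplied by the caller; nothing here is specific to
    any particular UI layout (hypothetical helper, for illustration).
    """
    src = Path(trained_ckpt)
    if not src.is_file():
        raise FileNotFoundError(src)
    dest_dir = Path(models_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Keep the original file name unless the caller asks for another one.
    dest = dest_dir / (name or src.name)
    shutil.copy2(src, dest)
    return dest
```

No embedding files are involved: the copied checkpoint is a complete model, so it is selected and used like any other.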

@jd-3d
Copy link

jd-3d commented Sep 25, 2022

I would love to see the training part of the implementation put into the webui. There is a new memory tweak that just came out that allows the training to run on 24GB of VRAM which really opens things up to a lot of people:

See here for setting it up with the memory optimizations:
https://github.com/gammagec/Dreambooth-SD-optimized
and here for more info:
https://www.reddit.com/r/StableDiffusion/comments/xmkwmp/i_got_dreambooth_for_sd_to_work_on_my_3090_w24_gb/

Another example of how well it works:
https://www.reddit.com/r/StableDiffusion/comments/xn1jln/i_used_googles_dreambooth_to_finetune_the/

@underlines
Author

underlines commented Sep 26, 2022

Nice explanation of the paper, showing what's possible:

https://www.youtube.com/watch?v=NnoTWZ9qgYg&t=5s

@LiJT

LiJT commented Sep 29, 2022

Please!!!!!!!!

@jd-3d

jd-3d commented Oct 2, 2022

The VRAM requirement is down to 10GB now. This would be an amazing feature. More info here:
https://www.reddit.com/r/StableDiffusion/comments/xtc25y/dreambooth_stable_diffusion_training_in_10_gb/

@bmaltais

bmaltais commented Oct 11, 2022 via email

How much VRAM does this version require?

From: d8ahazard
Sent: Tuesday, October 4, 2022 2:52:56 PM
Subject: Re: [AUTOMATIC1111/stable-diffusion-webui] Implement new paper: Dreambooth-StableDiffusion, Google Imagen based Textual Inversion alternative (Issue #914)

Added a q&d port of the "Optimized-Dreambooth-SD" repo's version for training checkpoints via #1655. Still needs to be implemented and added to the UI, but the basic bit to do the work should be there.

@d8ahazard
Collaborator

#2002

The PR here can be run with 8GB using the --medvram flag on launch, but it's VERY slow ATM. Testing with WSL and DeepSpeed to see if I can't make it faster.
