can't reproduce the results #8

Closed
andorxornot opened this issue Aug 24, 2022 · 21 comments

@andorxornot

andorxornot commented Aug 24, 2022

hi! i trained ldm with three images and the token "container":



training lasted a few hours and the loss jumps around, but i got exactly the same result as without training:

the config is loaded correctly. are there any logs besides the loss?

@rinongal
Owner

What text are you using for inference?
Unless you changed the config, the placeholder word for your concept is *, so your sentences should be of the form: "a photo of *" (and not "a photo of a container")

@andorxornot
Author

yeah, i used the "a photo of *" prompt, but still got a generic container

@rinongal
Owner

Can you please:
(1) Post your full inference command?
(2) Check your logs folder images to see if the samples_scaled_gs images look like your input data?

@andorxornot
Author

andorxornot commented Aug 24, 2022

hm, \logs\images...\testtube\version_0\media is empty for me; there are no images

train:

python main.py --data_root ./images \
               --base ./configs/latent-diffusion/txt2img-1p4B-finetune.yaml \
               -t \
               -n run_01 \
               --actual_resume ./models/ldm/text2img-large/model.ckpt \
               --init_word container \
               --gpus 0

inference:

python scripts/txt2img.py --ddim_eta 0.0 \
                          --n_samples 3 \
                          --n_iter 2 \
                          --scale 10.0 \
                          --ddim_steps 50 \
                          --embedding_path ./logs/images2022-08-23T21-03-11_run_01/checkpoints/embeddings.pt \
                          --ckpt ./models/ldm/text2img-large/model.ckpt \
                          --prompt "a photo of *"

@rinongal
Owner

The images should be in your ./logs/images2022-08-23T21-03-11_run_01/images/ directory.
Either way, when you run txt2img, try to run with:
--embedding_path ./logs/images2022-08-23T21-03-11_run_01/checkpoints/embeddings_gs-5xxx.pt where 5xxx is whatever checkpoint you have there which is closest to 5k.
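For example (a sketch only; embeddings_gs-4999.pt is a hypothetical filename, use whichever checkpoint actually exists in your checkpoints folder):

python scripts/txt2img.py --ddim_eta 0.0 \
                          --n_samples 3 \
                          --n_iter 2 \
                          --scale 10.0 \
                          --ddim_steps 50 \
                          --embedding_path ./logs/images2022-08-23T21-03-11_run_01/checkpoints/embeddings_gs-4999.pt \
                          --ckpt ./models/ldm/text2img-large/model.ckpt \
                          --prompt "a photo of *"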

@johndpope

johndpope commented Aug 24, 2022

fyi - I've got it working and I'm very impressed. I'm interested to know how to boost the quality / dimensions of the output; I'll have to dig into the docs.
HOW TO

I trained all of Gregory Crewdson's photos as "cinematic":

python main.py --base configs/latent-diffusion/txt2img-1p4B-finetune.yaml \
               -t \
               --actual_resume ../stable-diffusion/models/ldm/text2img-large/model.ckpt \
               -n leavanny_attempt_one \
               --gpus 0, \
               --data_root "/home/jp/Downloads/ImageAssistant_Batch_Image_Downloader/www.google.com/gregory_crewdson_-_Google_Search" \
               --init_word=cinematic
(I gave up at 10,000 training iterations.)

I can then prime it with

 a photo of *
 pixelart of *
 watercolor of *

python scripts/txt2img.py --ddim_eta 0.0 \
                          --n_samples 8 \
                          --n_iter 2 \
                          --scale 10.0 \
                          --ddim_steps 50 \
                          --embedding_path /home/jp/Documents/gitWorkspace/textual_inversion/logs/gregory_crewdson_-_Google_Search2022-08-24T23-09-43_leavanny_attempt_one/checkpoints/embeddings_gs-9999.pt \
                          --ckpt_path ../stable-diffusion/models/ldm/text2img-large/model.ckpt \
                          --prompt "pixelart of *"

a-photo-of-*
pixelart-of-*
watercolor-of-*

@rinongal
Owner

@johndpope Glad to see some positive results 😄
Regarding quality / dimensions: I'm still working on the Stable Diffusion port which will probably help with that. At the moment inversion is working fairly well, but I'm having some trouble finding a 'sweet spot' where editing (by reusing * in new prompts) works as expected. It might require moving beyond just parameter changes.

As a temporary alternative, you should be able to just invert these results into the stable diffusion model and let it come up with new variations at a higher resolution (using just 'a photo of *').

@XavierXiao

Hi, when I train the embedding and run the generation command, I can obtain samples that share some high-level similarity with my training inputs, but they still look quite different in their details (far less similar than the demo images in the paper). Given that the reconstruction is perfect, is there a way to control the variation and make the generated samples look more similar to the inputs? Thanks!

@rinongal
Owner

@XavierXiao First of all, just to make sure, you're using the LDM version, yes?

If that's the case, then you have several options:

  1. Re-invert with a higher learning rate (e.g. edit the learning rate in the config to 1.0e-2). The higher the learning rate, the higher the image similarity after editing, but more prompts will fail to change the image at all.
  2. Try to re-invert with another seed (using the --seed argument). Unfortunately sometimes the optimization just falls into a bad spot.
  3. Try the same prompt engineering tricks you'd try with text. For example, use the placeholder several times ("a photo of * on the beach. A * on the beach").

Other than that, you'll see in our paper that the results we report are typically 'best of 16'. There are certainly cases where only 3-4 images out of a batch of 16 were 'good'. And of course, as with all txt2img models, some prompts just don't work.

If you can show me some examples, I could maybe point you towards specific solutions.
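For concreteness, here is a minimal sketch of options (1) and (2). The base_learning_rate field name is assumed from the LDM-style config layout, and the paths, run name and seed value are placeholders, so adapt them to your setup:

# (1) raise the learning rate in configs/latent-diffusion/txt2img-1p4B-finetune.yaml, e.g.:
#       model:
#         base_learning_rate: 1.0e-02
# (2) re-invert with a different seed:
python main.py --data_root ./your_images \
               --base ./configs/latent-diffusion/txt2img-1p4B-finetune.yaml \
               -t \
               -n run_02 \
               --actual_resume ./models/ldm/text2img-large/model.ckpt \
               --init_word your_word \
               --gpus 0, \
               --seed 42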

@XavierXiao

XavierXiao commented Aug 24, 2022

Thanks! I am using the LDM version, with the default settings from the readme. I will give the things you mentioned a try, especially the lr. Here are some examples: I am trying to invert some images from MVTec for industrial quality inspection, and I attached the inputs (some capsules) and the generated samples at 5k steps. Does this look reasonable? The inputs have very little variation (they look very similar to each other); could that be the cause?

inputs_gs-005000_e-000050_b-000000

samples_gs-005000_e-000050_b-000000

@rinongal
Owner

rinongal commented Aug 24, 2022

The one on the right is more or less what I'd expect to get. If you're still having bad results during training, then seed changes etc. probably won't help. Either increase LR, or have a look at the output images and see if there's still progress, in which case you can probably just train for more time.

I'll try a run myself and see what I can get.

@rinongal
Owner

@andorxornot This is what I get with your data:

Training outputs (@5k):

samples_scaled_gs-005000_e-000131_b-000022

Watercolor painting of *:

watercolor-painting-of-*

A photo of * on the beach:

a-photo-of-*-on-the-beach

@rinongal
Owner

@XavierXiao I cropped out and trained on these 2 samples from your image:
Picture1 Picture2

Current outputs @4k steps with default parameters:

samples_scaled_gs-004000_e-000160_b-000000

If you're using the default parameters but only 1 GPU, the difference might be because the LDM training script automatically scales LR by your number of GPUs and the batch size. Your effective LR is half of mine, which might be causing the difference. Can you try training with double the LR and letting me know if that improves things? If so, I might need to disable this scaling by default / add a warning to the readme.
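For reference, that scaling means effective LR = base_learning_rate (from the config) x number of GPUs x batch size, so at the same base value and batch size, one GPU gives half the effective LR of two. To compensate on a single GPU, double base_learning_rate in the config (check your copy for the actual base value).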

@XavierXiao

Thanks for the reply. I am using two GPUs, so that shouldn't be the issue. I tried a larger LR, but it is hard to say whether it brings an improvement; I can obtain results similar to yours. The resulting images are clearly less realistic than the trash-container examples in this thread, so maybe the input images are less familiar to the LDM model.

Some possibly unrelated things:

  1. I get the following warning after every epoch; is that expected?
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
  2. In the first epoch I got the following warning:
home/.conda/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/utilities/data.py:59: UserWarning: Trying to infer the `batch_size` from an ambiguous collection. The batch size we found is 20. To avoid any miscalculations, use `self.log(..., batch_size=batch_size)`.
I use the default config with bs=4, and I have 8 training images, so I am not sure what caused this.
  3. In one overnight run, it seems that if I don't manually kill the process, it will run for 1000 epochs, which is the pytorch-lightning maximum. So the max_step = 6100 setting is not effective?

@rinongal
Owner

@XavierXiao Both warnings should be fine.
max_step: it should be working; I'll look into it.
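As an aside, if the tokenizers warning is too noisy, it can be silenced exactly as the message suggests, by setting the environment variable before launching training:

export TOKENIZERS_PARALLELISM=false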

@andorxornot
Author

thanks for your tests! it seems that on one machine i had to raise the lr tenfold

@rinongal
Owner

@andorxornot Well, if everything's working now, feel free to close the issue 😄 Otherwise let me know if you need more help

@XodrocSO

XodrocSO commented Aug 25, 2022

@rinongal I think I'm having a similar issue, but I'm not familiar with the format of the learning rate, so I'm not sure how to increase it.

EDIT: I noticed I'm getting the following from pytorch lightning's lr_monitor.py: "RuntimeWarning: You are using LearningRateMonitor callback with models that have no learning rate schedulers. Please see documentation for configure_optimizers method."

Also, this is using the stable diffusion v1_finetune.yaml, and my samples_scaled images all just look like noise at and well after 5000 global steps. The loss pretty much stays at 1 or 0.99.

I'll create a new issue if need be.

@rinongal
Owner

@XodrocSO I think it might be worth a new issue, but when you open it could you please:

  1. Check the input and reconstruction images in your log directory to see that they look fine.
  2. Paste the config file you're using and let me know if you're using the official repo or some re-implementation and whether you changed anything else.
  3. Upload an example of your current samples_scaled results.

Hopefully that will be enough to get started on figuring out the problem :)
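On the learning-rate format question: in these LDM-style YAML configs the value to edit is normally the base_learning_rate field near the top of the file. This is an assumption about your exact v1_finetune.yaml, so verify it in your copy; a sketch of what to look for:

# in your v1_finetune.yaml (field name and value assumed, verify in your copy):
#   model:
#     base_learning_rate: 5.0e-03   # scientific notation; raise this value to increase the LR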

@Zxlan

Zxlan commented May 23, 2023

@andorxornot Would it be convenient for you to share your images?

@yuhbai

yuhbai commented Oct 4, 2023

(quoting @XavierXiao's earlier comment with the MVTec capsule inputs and 5k-step samples)

How did the images you generated later turn out?
