This repository has been archived by the owner on Aug 6, 2024. It is now read-only.

Thanks for the shout out <3 Also I was wondering if you could help me out with something #1

Extraltodeus opened this issue Aug 14, 2023 · 29 comments


@Extraltodeus

Extraltodeus commented Aug 14, 2023

That was a pleasant surprise! :D

I used GPT4 to help me out with upscaling latents through lanczos this afternoon.
You can find it here.

Now of course the results I am getting are as blurry as you can imagine, so about as viable as any other method.

01688UI_00001_
00362UI_00001_

You can find a workflow here

I was visiting your repository, wondering if there would be a way to somehow rearrange the latent into an image, so as to be able to do a proper lanczos upscale and then "put it back" into a latent. Since you know a lot more than I do about the subject, I figured I would just ask :)

There wouldn't even be a need to modify the data to make it RGB, just to be able to use lanczos on it with the provided functions within my node and reshape it to the correct latent format.

This, combined with a fractal noise generator, would allow upscaling mid-generation and open up new levels of creativity. For now my only option is to decode -> upscale the image with lanczos (I made two small nodes using PIL that you can find in my linked repository if you want) -> re-encode with the VAE, then finish the work with the refiner. While this is not a completely bad method, it is definitely not fast. Given that my knowledge of how the VAE works is pretty low, I also suspect that doing this mid-generation might reduce the quality or complexity. It is also pretty limited overall.
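For reference, the decode -> upscale -> re-encode round trip described above could be sketched like this. The `vae_decode`/`vae_encode` wrappers here are placeholders for whatever the UI actually exposes, not a real API, and since torch has no Lanczos filter, bicubic stands in for it:

```python
import torch
import torch.nn.functional as F

def upscale_via_pixels(latent, vae_decode, vae_encode, scale=2.0):
    """Decode to pixel space, upscale there, then re-encode to a latent.

    `vae_decode` and `vae_encode` are hypothetical stand-ins for the
    real VAE nodes; torch lacks Lanczos, so bicubic is used instead.
    """
    image = vae_decode(latent)                       # (B, 3, H*8, W*8) pixels
    up = F.interpolate(image, scale_factor=scale,
                       mode="bicubic", align_corners=False)
    return vae_encode(up)                            # back to a (B, 4, h, w) latent
```

The slow part is exactly the two VAE passes, which is what motivates doing the upscale in latent space instead.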

Here is a Perlin Merlin Rabbit who used the wrong spell:

00370UI_00001_

@city96
Owner

city96 commented Aug 14, 2023

Figured you'd appreciate being linked to; didn't really feel right naming my repo "advanced noise" when it isn't really, well, advanced.

I was visiting your repository, wondering if there would be a way to somehow rearrange the latent into an image, so as to be able to do a proper lanczos upscale and then "put it back" into a latent.
Since you know a lot more than I do about the subject, I figured I would just ask :)

Fair, though I don't actually have a background in any of this AI stuff so I'd still double-check anything I say with GPT4 if I were you lol.

Anyway, to address your question, I think the main problem with upscaling latents is that not all features map 1:1 properly. Especially around edges/borders/fine details, it seems to be non-linear, at least to some degree. This part makes sense, right? How else would it fit a 3x512x512 image into a 4x64x64 latent?

Another way to think of it is like this: If you grab a brush in photoshop and set it to 4px wide, then it will be the same width at any resolution, both 512x512 and 1024x1024. Now, if you upscale your 512x512 drawing to 1024x1024 and start drawing with your 4px brush, your lines will no longer line up to the old ones. The "VAE" expects all "lines" to look the same, regardless of resolution.

Here's an example. This image is perfectly split in the middle and the res is divisible by 8, meaning even when it gets upscaled, it should retain that "split" in the middle. It works for black and white (mostly), but add any colors and suddenly the weird behavior appears again. I think this might be because the blue channel is negative in some of the latent channels, meaning to make the image "more" blue you have to make the actual numbers in the latent space smaller, but then again, no clue, I haven't messed with this enough to know and the SDXL latent space is completely different to begin with.

COLOR_VS_GRAY

There wouldn't even be a need to modify the data to make it RGB, just to be able to use lanczos on it with the provided functions within my node and reshape it to the correct latent format.

I know even less about image manipulation algorithms, but can't you directly run lanczos on your 4D vector if you're not turning it into an image? I doubt lanczos would mess up the details inside the latent space any less than bicubic does but who knows, might still be worth a try. Unless you mean fixing the issue with the details getting muddled, in which case I have zero idea. I guess you could get some content-aware algo going but that seems... painful.
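For what it's worth, running Lanczos directly on the 4D latent is possible by treating each of the 4 channels as a separate float image. A sketch using PIL (just the general idea, untested against the actual nodes):

```python
import numpy as np
import torch
from PIL import Image

def lanczos_latent(latent, scale=2.0):
    """Run PIL's Lanczos filter directly on a (B, 4, H, W) latent tensor.

    PIL only resizes 2D images, so each channel is resampled separately
    in float32 ("F") mode; nothing is converted to RGB or clipped to 0-255.
    """
    b, c, h, w = latent.shape
    nh, nw = int(h * scale), int(w * scale)
    out = torch.empty(b, c, nh, nw)
    for i in range(b):
        for j in range(c):
            img = Image.fromarray(latent[i, j].numpy())  # float32 -> mode "F"
            out[i, j] = torch.from_numpy(
                np.array(img.resize((nw, nh), Image.LANCZOS))
            )
    return out
```

Whether Lanczos actually muddles the latent details any less than bicubic is an open question, as said above.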

This, combined with a fractal noise generator, would allow upscaling mid-generation and open up new levels of creativity.
I also suspect that doing this mid-generation might reduce the quality or complexity.

Now, the problem here is that any time you force it back into the image, the "leftover noise" gets lost (I think). If I understand it correctly, your repo solves that by generating the same noise but at a different resolution, which is pretty genius. Your problem is the actual image part, which you'd have to somehow beat into the right shape to go into the second sampler.

My idea/solution for something like this would be to train a neural network for it. Probably something like a modified ESRGAN that works on latents instead of images, trained from scratch. This would (probably) preserve enough of the noise + the details to not ruin your image, and would also avoid the VAE encode/decode stuff.

Not sure how viable this is and it's not like I have an A100 to train on (all I have is 2xP40s and a crappy 10GB 3080 lol) but I can give it a shot if you want.

Anyway, I hope my rambling helps you at least somewhat. If it doesn't just ask me again, I'm horrible with explanations but happy to help lmao

@city96
Owner

city96 commented Aug 15, 2023

Actually, wait, it doesn't even need ESRGAN. I can literally get a working upscaler with a 2MB model lol.

		module_list = [
			nn.Conv2d(4, 64, kernel_size=5, padding=2),
			nn.ReLU(),
			nn.Upsample(scale_factor=2.0),
			nn.ReLU(),
			nn.Conv2d(64, 64, kernel_size=7, padding=3),
			nn.ReLU(),
			nn.Conv2d(64, 64, kernel_size=7, padding=3),
			nn.ReLU(),
			nn.Conv2d(64, 32, kernel_size=7, padding=3),
			nn.ReLU(),
			nn.Conv2d(32, 4, kernel_size=5, padding=2),
		]

LATENT_UPSCALER
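Wrapped in `nn.Sequential`, that module list becomes a runnable (if untrained) 2x latent upscaler, something like:

```python
import torch
import torch.nn as nn

# The module_list from the comment above, assembled into a model.
# The weights would of course still need training to be useful.
upscaler = nn.Sequential(
    nn.Conv2d(4, 64, kernel_size=5, padding=2),
    nn.ReLU(),
    nn.Upsample(scale_factor=2.0),
    nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=7, padding=3),
    nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=7, padding=3),
    nn.ReLU(),
    nn.Conv2d(64, 32, kernel_size=7, padding=3),
    nn.ReLU(),
    nn.Conv2d(32, 4, kernel_size=5, padding=2),
)

latent = torch.randn(1, 4, 64, 64)   # a 512x512 image's v1 latent
with torch.no_grad():
    up = upscaler(latent)            # doubled spatial resolution
```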

@Extraltodeus
Author

I think this might be because the blue channel is negative in some of the latent channels, meaning to make the image "more" blue you have to make the actual numbers in the latent space smaller, but then again, no clue, I haven't messed with this enough to know and the SDXL latent space is completely different to begin with.

That might explain why I had blue dots on my images during my attempts at creating usable noise. I was only able to solve that issue by "compressing towards zero" the values within the latent.

Example:

image
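One possible reading of "compressing towards zero" is a tanh soft-clamp; whether this matches what the perlin node actually does is a guess:

```python
import torch

def compress_towards_zero(latent, limit=None):
    """Hypothetical soft-clamp: small values pass through almost
    unchanged, large outliers get squashed towards +/-limit."""
    s = limit if limit is not None else max(latent.std().item(), 1e-8)
    return torch.tanh(latent / s) * s
```

Since tanh(u) <= u for u >= 0, every value's magnitude can only shrink, which is what kills the outlier "dots".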

I don't actually have a background in any of this AI stuff

Me neither as you might have guessed lol!

Anyway, I hope my rambling helps you at least somewhat. If it doesn't just ask me again, I'm horrible with explanations but happy to help lmao

You did indeed help me understand the problem better, and I thank you for your detailed answer! :)

Actually, wait, it doesn't even need ESRGAN. I can literally get a working upscaler with a 2MB model lol.

		module_list = [
			nn.Conv2d(4, 64, kernel_size=5, padding=2),
			nn.ReLU(),
			nn.Upsample(scale_factor=2.0),
			nn.ReLU(),
			nn.Conv2d(64, 64, kernel_size=7, padding=3),
			nn.ReLU(),
			nn.Conv2d(64, 64, kernel_size=7, padding=3),
			nn.ReLU(),
			nn.Conv2d(64, 32, kernel_size=7, padding=3),
			nn.ReLU(),
			nn.Conv2d(32, 4, kernel_size=5, padding=2),
		]

LATENT_UPSCALER

WAIT WHAT?! How do you do that? I have yet to know how to do such things.

@city96
Owner

city96 commented Aug 15, 2023

WAIT WHAT?! How do you do that? I have yet to know how to do such things.

I just changed the training code from my latent interposer to take v1 latents on both sides then "designed" and trained a small neural net (the code part) that scales it up by a fixed amount. I'll clean up the code a bit and cook up some models overnight.

Should have your HQ latent upscaler by tomorrow ;D

(Only real problem is that it can only do fixed ratios, would x1.25, x1.5 and x2.0 scaling be enough or should I do more?)
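A minimal sketch of what that training setup could look like: pairs of v1 latents for the same image at two resolutions, with MSE between the upscaled low-res latent and the real high-res one. The architecture is only illustrative and the dataset is faked with random tensors here:

```python
import torch
import torch.nn as nn

# Toy stand-in for the fixed-ratio latent upscaler training loop.
model = nn.Sequential(
    nn.Conv2d(4, 64, 5, padding=2), nn.ReLU(),
    nn.Upsample(scale_factor=2.0),
    nn.Conv2d(64, 4, 5, padding=2),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

for step in range(2):                   # real training runs far longer
    lo = torch.randn(2, 4, 32, 32)      # fake low-res latents
    hi = torch.randn(2, 4, 64, 64)      # fake matching high-res latents
    loss = loss_fn(model(lo), hi)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The fixed `scale_factor` baked into the network is exactly why only fixed ratios are possible.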

@Extraltodeus
Author

Extraltodeus commented Aug 15, 2023

That would be so awesome! Thank you!

(Only real problem is that it can only do fixed ratios, would x1.25, x1.5 and x2.0 scaling be enough or should I do more?)

x4 max if it's not too much bother, but these ratios seem pretty good already! And it is always possible to try to loop back to get bigger sizes.

My perlin latent noise generator is quite limited regarding ratios anyway.

@city96
Owner

city96 commented Aug 15, 2023

And it is always possible to try to loop back to get bigger sizes.

4x is pushing it, my training code/dataset is probably just awful. Though you do make a good point, chaining two of the 2x ones should work as a stopgap for the madman who wants to directly 4x his latents lol.
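Assuming the upscaler takes and returns a plain latent tensor, chaining for 4x is just applying the same 2x model twice:

```python
import torch
import torch.nn as nn

def upscale_4x(latent, upscaler_2x):
    """Chain one fixed-ratio 2x latent upscaler twice for an effective 4x."""
    with torch.no_grad():
        return upscaler_2x(upscaler_2x(latent))
```

Any per-pass error compounds, of course, which is why 4x in one model (or two chained passes) is pushing it.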

@Extraltodeus
Author

Yeah, don't bother then, I would rather loop through anyway!

@city96
Owner

city96 commented Aug 15, 2023

Greetings. I finished the models/repo. It's available here: https://github.com/city96/SD-Latent-Upscaler

I have models for x1.25, x1.5 and x2.0, for both XL and v1.5. I tested chaining two of the 2x ones for x4 and it works just fine. LMK if it works for your use case.

LATENT_UPSCALER_COLOR

@Extraltodeus
Author

I have been frantically refreshing the page since yesterday, lol. Thank you so much!

@city96
Owner

city96 commented Aug 15, 2023

No problem. These models are still relatively undertrained but from my testing they seem to work OK. I could have trained them longer, but I just set them to a fixed epoch count and left for work, hoping they'd be finished by the time I got back. Thankfully they were lol.

SD XL behaves a bit weird, but then again, it always does.

@Extraltodeus
Author

Well, it is still a better way to upscale latents than anything that has been made so far, so congrats!

@Extraltodeus
Author

Extraltodeus commented Aug 15, 2023

I am currently trying to get results by combining it with the perlin-based noise generator, but I wonder if I am pushing it too far by multiplying the layers 6 times. While the overall pattern still matches, the results I am getting so far are blurry. My generator is surely hard to set up correctly. Earlier I was able to get correct results, without upscaling, with 3 layers (I mean the "noise_iteration" value) after adding an option to make the mean value of each produced perlin layer 0 instead of subtracting it at the end. But trying to understand what "kind of mess" should be passed to the refiner after an upscale is finicky.

@city96
Owner

city96 commented Aug 15, 2023

If you send me a sample workflow I can check it out. I'm currently trying to further finetune the upscaler. It looks like I'll be able to fix the odd hue shift stuff at least.

@Extraltodeus
Author

Extraltodeus commented Aug 15, 2023

00288UI_00001_

It is more about keeping the in-between mess, I guess, in order to give the refiner something to work with.

But then again, my noise generator at 6 iterations might just be too "smooth" to be helpful for such upscale.

@city96
Owner

city96 commented Aug 15, 2023

I think your main problem is that your first advanced K-sampler doesn't return the leftover noise. I don't think you can avoid doing that, even if you inject your own perlin noise? (The step count might also be messed up, I just converted those back to widgets for testing.)

Not sure if it's possible to properly scale the actual leftover noise from the sampler. My crappy model certainly struggles to do it.

You can re-inject your perlin noise, as long as you don't scale it:

house

@Extraltodeus
Author

Yeah, so far my best attempt was this one:
01116gimgimg_00001_

01007gimgimg_00001_

On this one I accidentally "rewinded" after the upscale without reinjecting noise, but since I was using euler a, I'd say that it doesn't count :-/

Using your latent interposer to do xl->1.5 -> upscale -> 1.5->xl made it better, but so far my results are either blurry or have some sort of scanlines.

@city96
Owner

city96 commented Aug 16, 2023

Using your latent interposer to do xl->1.5 -> upscale -> 1.5->xl made it better, but so far my results are either blurry or have some sort of scanlines.

How the hell does this even work lmao. I would've expected it to destroy the image completely.

CHAINED

I have a quick question. Is your perlin noise generator specific to SDXL? I was testing on v1 and noticed it was outputting garbage most of the time.

Also, I got this semi-coherent example by upscaling the noise separately and slightly overlapping the two samplers, like mentioned in the "BlenderNeko/ComfyUI_Noise" readme/example, though this is on v1 with the BNK noisy latent image node.

Anyway, I'm out of time for today.

ComfyUI_temp_mkmpc_00056_

@Extraltodeus
Author

Good call! Rewinding 10 steps works.
Not only does it work, but doing a x2 by generating a new and bigger layer of perlin noise (just as intended :D) actually enhances it to a freaky level!

00011UI_00001_

You could almost read the plate number of the car on the left lol

@Extraltodeus
Author

To try the opposite, I tested upscaling regular noise, re-using the same settings as for the house above, and:
00025UI_00001_

@Extraltodeus
Author

Extraltodeus commented Aug 16, 2023

So as not to fool myself, I decided to inject regular noise at the higher scale after the upscale, for an honest comparison.

So here are the results with regular noise, as if we had never tried anything different:

00057UI_00001_
00050UI_00001_
00051UI_00001_
00052UI_00001_
00054UI_00001_
00055UI_00001_
00056UI_00001_

It's just as smooth as usual for SD1.5; even though the model is good, the details seem smudged.

Now a fractal-based batch, using your latent upscaler! :D

00074UI_00001_
00076UI_00001_
00077UI_00001_
00082UI_00001_
00085UI_00001_

I think we can say that it works.

@Extraltodeus
Author

00194UI_00001_

@city96
Owner

city96 commented Aug 17, 2023

Just a quick heads-up, I re-did all the models on the upscaler. Completely new model arch and training code so image quality is a lot better now. A simple git pull should get you up to speed.

LATENT_UPSCALER_V2

@ntdviet

ntdviet commented Aug 18, 2023

@city96 The update also handles leftover noise quite well. Very clean, just the hue got changed a bit (but that's not a big deal for post-processing).
Left is the original perlin noise after XL sampling & before upscale, top right is the default bislerp upscale, bottom right is yours. Yours is much, much better, great job!
image

@city96
Owner

city96 commented Aug 18, 2023

@ntdviet Sorry, I saw you started the proper discussion but didn't have the time to post it there as well.

I'm glad it works. It seems a lot more flexible now, but I didn't really test it with the leftover noise stuff. It's kinda amazing that it works, especially on XL (though I think the v1 model still outperforms the XL one).

@suede299

suede299 commented Oct 9, 2023

WAIT WHAT?! How do you do that? I have yet to know how to do such things.

I just changed the training code from my latent interposer to take v1 latents on both sides then "designed" and trained a small neural net (the code part) that scales it up by a fixed amount. I'll clean up the code a bit and cook up some models overnight.

Should have your HQ latent upscaler by tomorrow ;D

(Only real problem is that it can only do fixed ratios, would x1.25, x1.5 and x2.0 scaling be enough or should I do more?)

When using LatentUpscaler, a ratio of 1.25-2 is sufficient.
What limits its wider use is that the "res is divisible by 8" requirement means the width and height don't align with common image resolutions after upscaling.
I often need to enlarge a 1920x1080 image for processing and then restore it to 1920x1080, which is currently only possible with x2; but after doubling, the image is too large, and even with tiling it's a pain to process with the SDXL model.
When you train the latent upscaler models, choosing scales that align exactly with common image resolutions would greatly improve their practicality.

Translated with www.DeepL.com

@ntdviet

ntdviet commented Oct 10, 2023

WAIT WHAT?! How do you do that? I have yet to know how to do such things.

I just changed the training code from my latent interposer to take v1 latents on both sides then "designed" and trained a small neural net (the code part) that scales it up by a fixed amount. I'll clean up the code a bit and cook up some models overnight.
Should have your HQ latent upscaler by tomorrow ;D
(Only real problem is that it can only do fixed ratios, would x1.25, x1.5 and x2.0 scaling be enough or should I do more?)

When using LatentUpscaler, a ratio of 1.25-2 is sufficient. What limits its wider use is that the "res is divisible by 8" requirement means the width and height don't align with common image resolutions after upscaling. I often need to enlarge a 1920x1080 image for processing and then restore it to 1920x1080, which is currently only possible with x2; but after doubling, the image is too large, and even with tiling it's a pain to process with the SDXL model. When you train the latent upscaler models, choosing scales that align exactly with common image resolutions would greatly improve their practicality.

Translated with www.DeepL.com

Nx1080 is an odd resolution. We start with 512, 768 or 1024 for SDXL, so if you want it to be performant, keep it in scale with these figures and crop your pic down to 1080 later.

@suede299

WAIT WHAT?! How do you do that? I have yet to know how to do such things.

I just changed the training code from my latent interposer to take v1 latents on both sides then "designed" and trained a small neural net (the code part) that scales it up by a fixed amount. I'll clean up the code a bit and cook up some models overnight.
Should have your HQ latent upscaler by tomorrow ;D
(Only real problem is that it can only do fixed ratios, would x1.25, x1.5 and x2.0 scaling be enough or should I do more?)

When using LatentUpscaler, a ratio of 1.25-2 is sufficient. What limits its wider use is that the "res is divisible by 8" requirement means the width and height don't align with common image resolutions after upscaling. I often need to enlarge a 1920x1080 image for processing and then restore it to 1920x1080, which is currently only possible with x2; but after doubling, the image is too large, and even with tiling it's a pain to process with the SDXL model. When you train the latent upscaler models, choosing scales that align exactly with common image resolutions would greatly improve their practicality.
Translated with www.DeepL.com

Nx1080 is an odd resolution. We start with 512, 768 or 1024 for SDXL so if you want it to be performant, keep it in scale with these figures and crop your pic down to 1080 later.

If the scaling is something like 1.2 / 1.4, then a 1080 image is perfectly fine.

@city96
Owner

city96 commented Oct 10, 2023

@suede299
If I understand correctly, your main problem is that after the upscale the image doesn't align properly to 1920x1080 (e.g. downscaling would make it blurry)? The only thing I could really do is train a latent downscaler you can put after the x2 one but that's not really much of a solution...

The problem with resolutions like 1.2 and 1.4 is that they don't align with my training dataset properly. My max resolution is 1024x1024 for the x2 one, and my input is 512x512. From that, x1.25 is 640x640 and x1.5 is 768x768. x1.2 would be 614.4x614.4 and x1.4 would come out at 716.8x716.8.
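The alignment arithmetic above is easy to check directly; only ratios whose product with the base resolution is a whole number line up:

```python
# Which fixed ratios keep a 512px input on whole pixels?
# This reproduces the numbers quoted above.
base = 512
results = {s: base * s for s in (1.2, 1.25, 1.4, 1.5, 2.0)}
for scale, out in results.items():
    status = "aligned" if out.is_integer() else "misaligned"
    print(f"x{scale}: {out} ({status})")
```

The same check against a 1920x1080 target shows why x1.2 and x1.4 can't come from a 512-trained dataset without fractional pixels.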

You could try out NNLatentUpscale which can scale by arbitrary amounts. It's probably better than my solution anyway, since it used visual loss during training instead of just estimating loss across the latents.
