How to get the image a man rides a horse? #2

Open
hujunchao opened this issue Jun 19, 2023 · 4 comments


hujunchao commented Jun 19, 2023

I tried this project. It's amazing and interesting.
But now I have a problem: it's hard for me to get a good image from the text "a man rides a horse".
Can you give me some advice?
Thank you!

hujunchao changed the title from "How to get the image a man ride a horse?" to "How to get the image a man rides a horse?" on Jun 19, 2023
Owner

TonyLianLong commented Jun 19, 2023

Some initial attempts (you can improve on these by trying more options and seeds):

image

image

image

You may wonder why the man's face looks weird. This is a known artifact of Stable Diffusion on small objects, and fixing it is out of our scope. Generating a man whose face takes up a larger proportion of the image may help.

@hujunchao
Author

Thank you for your reply!

@hujunchao
Author

When two objects do not interact, it is easy to use a layout to get a perfect image. But when two objects interact, it may be hard to get a good image from a layout. How can the action between objects be shown? For example, "a man and a horse" may be easy, "a man rides a horse" may be difficult, and "a man is chasing a horse" may be even more difficult.

Owner

TonyLianLong commented Jun 20, 2023

Good question! This is why the space allows specifying a prompt for the overall generation. Without it, a default prompt is used and the object interaction is left unspecified (SD will try to guess the interaction, so it could just as well generate a man standing close to a horse at the specified location). With it, you get the interaction you ask for (e.g., with "a man riding the horse", SD knows the man is supposed to ride the horse, as shown in the generation above).

image

However, adding more fine-grained control over object interactions is a very useful future direction. This paper proposes the idea of "text -> intermediate representation -> image". You are encouraged to extend it to more representations (e.g., a scene graph or an LLM-generated SVG that captures more information).

Examples:
Same config, overall prompt: A man standing nearby a horse (I didn't play around with the hyperparameters)
image

Same config, overall prompt: A man riding a horse
image
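
To make the comparison above concrete, here is a minimal Python sketch of a layout that is held fixed while only the overall prompt changes. It is an illustration only: the names `LayoutBox`, `LayoutSpec`, `overall_prompt`, and `generation_prompt`, the box coordinates, and the fallback prompt are all hypothetical and are not this project's actual API or its real default prompt.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class LayoutBox:
    """One object in the layout (hypothetical structure).

    `box` is (x0, y0, x1, y1), normalized to the range [0, 1].
    """
    caption: str
    box: Tuple[float, float, float, float]


@dataclass
class LayoutSpec:
    """A layout plus an optional prompt for the overall generation."""
    boxes: List[LayoutBox]
    overall_prompt: Optional[str] = None

    def generation_prompt(self) -> str:
        # With an overall prompt, the interaction is stated explicitly.
        if self.overall_prompt:
            return self.overall_prompt
        # Assumed fallback: only list the objects, so the interaction
        # between them is left for the diffusion model to guess.
        return ", ".join(b.caption for b in self.boxes)


# Same (made-up) boxes for every variant; only the overall prompt differs.
boxes = [
    LayoutBox("a man", (0.35, 0.10, 0.65, 0.60)),
    LayoutBox("a horse", (0.15, 0.40, 0.85, 0.95)),
]
default = LayoutSpec(boxes)
standing = LayoutSpec(boxes, overall_prompt="A man standing nearby a horse")
riding = LayoutSpec(boxes, overall_prompt="A man riding a horse")

for spec in (default, standing, riding):
    print(spec.generation_prompt())
```

In this sketch the boxes only say where each object goes, while the overall prompt carries the interaction. A richer intermediate representation, such as a scene graph, would move the interaction into the structure itself.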
