
tracker: move prepare_inputs_for_generation into the generation mixin 🧹 #32685

Open · 1 of 8 tasks
gante opened this issue on Aug 14, 2024 · 2 comments
Assignees: gante, ydshieh
Labels: Generation, WIP

gante (Member) commented on Aug 14, 2024

🧹 This is a tracker regarding the move of prepare_inputs_for_generation into the generation mixin 🧹

Why?

  1. prepare_inputs_for_generation is not part of the core modeling, but rather a utility for generate
  2. it should greatly reduce the need to touch modeling code when generate changes. Fewer modeling changes -> improved model stability
  3. greatly reduced number of lines of code 🙏

Tracker

Ordered list of tasks:

  • Fix related slow tests before we start — all llama, generate, and cache_utils [except sink cache, broken at the moment] slow tests should be passing to ensure we don’t break anything (Llama: make slow tests green 🟢 #33138)
  • Make PreTrainedModel stop inheriting from GenerationMixin, so that can_generate() becomes independent of whether prepare_inputs_for_generation is overwritten (Generation: deprecate PreTrainedModel inheriting from GenerationMixin #33203); see the can_generate() sketch after this list
  • Move llama’s prepare_inputs_for_generation to the generation mixin. This also implies moving the function that prepares the 4D attention mask (the one called from within it); see the shared-function sketch after this list
  • Add tests for the generalist prepare_inputs_for_generation — currently we don’t test it directly, and we should (see the test sketch after this list)
  • Address the case of synced_gpus in generate: when synced_gpus is set and cache_position is out of bounds, take the latest available input_ids for dummy computations (see Fix synced GPUs #33252; should fix Multi GPU generate with llama shape error #32885, Shape mismatch when generating with multiple processes #32603, and Bugfix for generation with an early-stopping process #32641); see the dummy-token sketch after this list
  • Delete prepare_inputs_for_generation from as many models as possible. There may be merge conflicts here, due to the 4D mask function. Try to iron out as many trivial cases as possible
  • Change prepare_inputs_for_generation to forward **kwargs from its input to its output. With minimal changes, this should enable most VLMs to use the shared function (they forward pixel_values from the input to the output); the shared-function sketch after this list shows this pattern
  • By this point, most cases of prepare_inputs_for_generation should be removed 🤗 We will need to check the remaining ones individually; there may be further simplification patterns available!
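
For reference, a minimal sketch of how can_generate() could work once the inheritance change lands. This is illustrative, not the exact transformers implementation: the class bodies are stubs, and the real method has more checks.

```python
# Minimal sketch, assuming the post-#33203 world: once `PreTrainedModel` no
# longer inherits from `GenerationMixin`, `can_generate()` can simply test for
# the mixin itself, instead of inspecting whether
# `prepare_inputs_for_generation` was overridden.

class GenerationMixin:
    def generate(self, *args, **kwargs):
        ...  # the full generation loop lives here


class PreTrainedModel:  # no longer inherits from GenerationMixin
    @classmethod
    def can_generate(cls) -> bool:
        # A model can generate iff it opted into the mixin explicitly.
        return issubclass(cls, GenerationMixin)


class LlamaForCausalLM(PreTrainedModel, GenerationMixin):
    pass


assert LlamaForCausalLM.can_generate()
assert not PreTrainedModel.can_generate()
```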
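
And a sketch of what the shared prepare_inputs_for_generation could look like on the mixin, written as a standalone function for brevity. Illustrative only: the real method also has to build the 4D attention mask and handle several cache types.

```python
import torch


def prepare_inputs_for_generation(
    input_ids: torch.LongTensor,
    past_key_values=None,
    attention_mask=None,
    cache_position=None,
    **kwargs,
):
    # With a cache, `cache_position` indexes the tokens the model has not
    # processed yet, so we only feed those.
    if past_key_values is not None:
        input_ids = input_ids[:, cache_position]

    model_inputs = {
        "input_ids": input_ids,
        "past_key_values": past_key_values,
        "attention_mask": attention_mask,
        "cache_position": cache_position,
    }
    # Forward extra model-specific inputs (e.g. `pixel_values` for VLMs)
    # untouched from input to output -- this is the pattern that lets most
    # VLMs reuse the shared function without overriding it.
    model_inputs.update(kwargs)
    return model_inputs
```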
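
The direct tests could then look something like this (hypothetical test names, exercising the sketch above rather than the library's test suite):

```python
import torch


def test_prepare_inputs_slices_with_cache():
    input_ids = torch.tensor([[1, 2, 3, 4]])
    cache_position = torch.tensor([3])  # only the last token is new
    out = prepare_inputs_for_generation(
        input_ids, past_key_values=object(), cache_position=cache_position
    )
    assert out["input_ids"].tolist() == [[4]]


def test_prepare_inputs_forwards_extra_kwargs():
    input_ids = torch.tensor([[1, 2, 3]])
    pixel_values = torch.zeros(1, 3, 224, 224)
    out = prepare_inputs_for_generation(input_ids, pixel_values=pixel_values)
    assert out["pixel_values"] is pixel_values
```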
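
Finally, the synced_gpus case, sketched as a hypothetical helper: a process that finished generating early must keep doing dummy forward passes so that collective ops stay in sync across GPUs, and if cache_position already points past the end of input_ids it should fall back to the latest available tokens instead of indexing out of bounds.

```python
import torch


def select_input_tokens(
    input_ids: torch.LongTensor, cache_position: torch.LongTensor
) -> torch.LongTensor:
    # Hypothetical helper: when this process finished early (synced_gpus) and
    # `cache_position` overshoots, reuse the latest tokens as dummy inputs.
    if cache_position[-1] >= input_ids.shape[1]:
        return input_ids[:, -cache_position.shape[0] :]
    return input_ids[:, cache_position]
```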
gante self-assigned this on Aug 14, 2024
gante (Member, Author) commented on Aug 14, 2024

@ydshieh edit the tracker above as soon as you start working on a task, so we don't risk doing redundant work 🤗 (e.g. by adding the link to a draft PR)

I'll do the same!

ydshieh self-assigned this on Aug 14, 2024
ydshieh (Collaborator) commented on Aug 14, 2024

Thanks
