Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generation: PreTrainedModel no longer inherits GenerationMixin 🚨 🚨 #33150

Closed
wants to merge 6 commits into from

Conversation

gante
Copy link
Member

@gante gante commented Aug 27, 2024

What does this PR do?

Step 2 of #32685 - Removes the GenerationMixin inheritance from PreTrainedModel. Instead, models classes with generative capabilities directly inherit GenerationMixin.

Why?

Currently, we have a circular dependency between PreTrainedModel and GenerationMixin:

  • PreTrainedModel 👈 GenerationMixin: PreTrainedModel has a can_generate() method, which depends on methods that exist in GenerationMixin. Depending on the value of can_generate(), it may hold a GenerationConfig object.
  • GenerationMixin 👈 PreTrainedModel: GenerationMixin needed to inspect the type of the model instance, to throw informative exceptions at the user. This was needed because ALL our models could call generate, but most of them didn't support it.

This PR breaks this circular dependency:

  1. GenerationMixin becomes a stand-alone class with no dependencies on PreTrainedModel. It is now a proper mixin: it may be used with other model base classes, if users desire to do so.
  2. PreTrainedModel doesn't inherit GenerationMixin. This means that non-generative models will become less bloated :)

What else can we improve as a result of this change?

  1. [added in this PR] can_generate() can be simplified: if a model is a subclass of GenerationMixin then it can generate
  2. [added in this PR] no need to validate the model class in generate -- all GenerationMixin subclasses can call generate
  3. Because of 1., all the changes planned in tracker: move prepare_inputs_for_generation into the generation mixin 🧹  #32685 become much simpler to implement (can_generate() no longer depends on prepare_inputs_for_generation -> easier to make structural changes there) 🤗
  4. Perhaps in the future, we can move the GenerationConfig instance to GenerationMixin, so that non-generative models don't hold a generation_config attribute.

🚨🚨 Caveats 🚨🚨

The changes in this PR have no visible consequences in the following cases:
✅ A user loads a transformers model, like LlamaForCausalLM
✅ A user loads custom modeling code from the hub with our auto classes, like this example

However, there are breaking changes in the following situations:
❌ A user has custom code, inheriting PreTrainedModel, and wants to call generate

@gante gante changed the title Generation: deprecate GenerationMixin inherited by PreTrainedModel Generation: PreTrainedModel no longer inherits GenerationMixin Aug 28, 2024
@gante gante changed the title Generation: PreTrainedModel no longer inherits GenerationMixin Generation: PreTrainedModel no longer inherits GenerationMixin 🚨 🚨 Aug 28, 2024
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@gante
Copy link
Member Author

gante commented Aug 29, 2024

hold up, found a way to make it fully BC 💛

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants