Generation: `PreTrainedModel` no longer inherits `GenerationMixin` 🚨 🚨 #33150
What does this PR do?

Step 2 of #32685 - Removes the `GenerationMixin` inheritance from `PreTrainedModel`. Instead, model classes with generative capabilities directly inherit `GenerationMixin`.

Why?
Currently, we have a circular dependency between `PreTrainedModel` and `GenerationMixin`:

- `PreTrainedModel` 👈 `GenerationMixin`: `PreTrainedModel` has a `can_generate()` method, which depends on methods that exist in `GenerationMixin`. Depending on the value of `can_generate()`, it may hold a `GenerationConfig` object.
- `GenerationMixin` 👈 `PreTrainedModel`: `GenerationMixin` needed to inspect the type of the model instance to throw informative exceptions at the user. This was needed because ALL our models could call `generate`, but most of them didn't support it.

This PR breaks this circular dependency:
- `GenerationMixin` becomes a stand-alone class with no dependencies on `PreTrainedModel`. It is now a proper mixin: it may be used with other model base classes, if users desire to do so.
- `PreTrainedModel` doesn't inherit `GenerationMixin`. This means that non-generative models become less bloated, since they no longer carry `generate` and its related methods :)

What else can we improve as a result of this change?
- `can_generate()` can be simplified: if a model is a subclass of `GenerationMixin`, then it can generate
- `generate` -- all `GenerationMixin` subclasses can call `generate`
- `prepare_inputs_for_generation` into the generation mixin 🧹 #32685 become much simpler to implement (`can_generate()` no longer depends on `prepare_inputs_for_generation` -> easier to make structural changes there) 🤗
- `GenerationConfig` instance to `GenerationMixin`, so that non-generative models don't hold a `generation_config` attribute.

🚨🚨 Caveats 🚨🚨
The changes in this PR have no visible consequences in the following cases:

- ✅ A user loads a `transformers` model, like `LlamaForCausalLM`
- ✅ A user loads custom modeling code from the hub with our auto classes, like this example
However, there are breaking changes in the following situations:

- ❌ A user has custom code, inheriting `PreTrainedModel`, and wants to call `generate`
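A minimal sketch of the ❌ case and its fix. To run without `transformers` installed, it uses hypothetical stand-in classes (`PreTrainedModelStub`, `GenerationMixinStub`) and a hypothetical `next_token` hook; the toy `generate` loop is not the library's decoding logic. It also mirrors the simplified `can_generate()` check described above (a model can generate if it subclasses the mixin). With the real library, the equivalent fix is to add `GenerationMixin` to the custom class's bases, e.g. `class MyModel(PreTrainedModel, GenerationMixin)`.

```python
# Hypothetical stand-ins for transformers classes, so this sketch runs
# without the library. They only mimic the inheritance layout in this PR.


class GenerationMixinStub:
    """Stand-alone mixin: no dependency on any model base class."""

    def generate(self, input_ids, max_new_tokens=3):
        # Toy decoding loop (NOT the real algorithm): repeatedly append the
        # token returned by the model's hypothetical `next_token` hook.
        tokens = list(input_ids)
        for _ in range(max_new_tokens):
            tokens.append(self.next_token(tokens))
        return tokens


class PreTrainedModelStub:
    """Model base class that no longer inherits the generation mixin."""

    @classmethod
    def can_generate(cls):
        # Simplified check: a model can generate iff it subclasses the mixin.
        return issubclass(cls, GenerationMixinStub)


class CustomLMBefore(PreTrainedModelStub):
    """The breaking case: custom code that relied on inheriting `generate`."""

    def next_token(self, tokens):
        return tokens[-1] + 1


class CustomLMAfter(PreTrainedModelStub, GenerationMixinStub):
    """The fix: list the generation mixin as an explicit base class."""

    def next_token(self, tokens):
        return tokens[-1] + 1


print(CustomLMBefore.can_generate())           # False
print(hasattr(CustomLMBefore(), "generate"))   # False
print(CustomLMAfter.can_generate())            # True
print(CustomLMAfter().generate([1, 2]))        # [1, 2, 3, 4, 5]
```

Because the check is a plain `issubclass`, it does not need to inspect `prepare_inputs_for_generation` or any other method, which is the decoupling the PR describes.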