Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

python: generate Pydantic v2 + typing complete code #16624

Merged
merged 32 commits into from
Sep 28, 2023

Conversation

multani
Copy link
Contributor

@multani multani commented Sep 19, 2023

was: python: Generate fully described Python types

Warning

Sorry, it's a rather large change, but I think it all helps to produce better type information.

I'm looking for feedback to know if this is the good way to integrate this kind of change, and if this has a chance of being merged.

If needed, I can try to make this big change shorter by splitting this into multiple PRs; it may be harder to see the big picture with smaller PRs though.


The goal of this change is to replace the mypy incompatible types (conlist, conint, constr, etc.) by their more modern versions using Annotated types with Pydantic v2.

To do so, the code generator changes in several ways:

  • The generator introduces several new inner classes to help the code generation:
    • PythonType is a recursive object that hold the information about a particular type that the generator is trying to build
    • PydanticType, directly translated from the previous getPydanticType methods, creates PythonType instances from the OpenAPI specifications and produces the final Python types as strings.
    • An initial Imports class holds all the new Python's import statements that the generator produces along the way.

Via these new classes, the code generation gets improved by the following:

  • The string-based formater approach is now replaced by a tree of PythonType objects:
    • Each object can hold all its information: type constraints, metadata, etc.
    • By being a tree of itself, types can become naturally nested and allow to easily express complex types
  • Most of the code duplication between the 2 getPydanticType methods have been unified to remove duplication:
    • Each OpenAPI type definition has a matching OpenAPI type -> Python type translation method
    • Common type translations between CodegenParamerer and CodegenProperty have been unified
  • The new Imports class can be passed between calls to accumulate all the imports:
    • This removes several flags that were present only to track if something needed to be imported or not.
    • An instance of the class can be shared across multiple method calls to accumulate all the imports that have to be done
    • It reduces the amount of objects to carry around as it can hold any kind of imports.
    • Ultimately, it also allows to control better which Python resources are imported from which modules, and should help to reduce the number of unused imports in the generated code.

References I used to implement the new types:

Related:

Note that this improves only a subset of the code generator, but there are still several places that could be improved:

  • I think once generalized, usage of the Imports class instead of the numerous Set instances could help to track all the import dependencies, reduce the generated code (and unused imports) and the code generator itself.
  • A whole part of the code generator has not been touched and could benefit from the same kind of improvements.

PR checklist

  • Read the contribution guidelines.
  • Pull Request title clearly describes the work in the pull request and Pull Request description provides details about how to validate the work. Missing information here may result in delayed response from the community.
  • Run the following to build the project and update samples:
    ./mvnw clean package 
    ./bin/generate-samples.sh ./bin/configs/*.yaml
    ./bin/utils/export_docs_generators.sh
    
    Commit all changed files.
    This is important, as CI jobs will verify all generator outputs of your HEAD commit as it would merge with master.
    These must match the expectations made by your contribution.
    You may regenerate an individual generator by passing the relevant config(s) as an argument to the script, for example ./bin/generate-samples.sh bin/configs/java*.
    For Windows users, please run the script in Git BASH.
  • File the PR against the correct branch: master (upcoming 7.1.0 minor release - breaking changes with fallbacks), 8.0.x (breaking changes without fallbacks)
  • If your PR is targeting a particular programming language, @mention the technical committee members, so they are more likely to review the pull request.

@krjakbrjak as Python tech-comittee
@wing328 as the previous major editor of the code generator

@multani multani marked this pull request as ready for review September 21, 2023 20:03
@multani
Copy link
Contributor Author

multani commented Sep 21, 2023

I removed the poetry.lock files I added during my experiment, and it divided the size of the pull request by 2 :D

@wing328
Copy link
Member

wing328 commented Sep 22, 2023

cc @OpenAPITools/generator-core-team

Copy link
Contributor

@robertschweizer robertschweizer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the PR! I checked some of the generated classes.

@@ -30,11 +32,11 @@ class Color(BaseModel):
RGB array, RGBA array, or hex string.
"""
# data type: List[int]
oneof_schema_1_validator: Optional[conlist(conint(strict=True, le=255, ge=0), max_items=3, min_items=3)] = Field(None, description="RGB three element array with values 0-255.")
oneof_schema_1_validator: Optional[Annotated[List[Annotated[int, Field(le=255, strict=True, ge=0)]], Field(min_items=3, max_items=3)]] = Field(default=None, description="RGB three element array with values 0-255.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not get an error if I construct this with oneof_schema_1_validator=[1, 2]. Maybe pydantic is thrown off by Field() being used on both sides of =? Also pydantic might not take into account Field() restrictions in the first argument of Annotated[].

Best add a unit-test to make sure validation works, or is there one already?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will check this 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added more tests: this works with Pydantic v2 but not with Pydantic v1. We'll focus on the former anyway and I'll merge my PR in a Pydantic v2-only generator 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, thanks a lot for moving forward to pydantic v2! Then things work of course :)

@multani
Copy link
Contributor Author

multani commented Sep 22, 2023

Thanks to the merge of #16643, I have plenty of test failures now :)

I will turn this back to draft while I'm fixing the comments and the tests 👍

@multani multani marked this pull request as draft September 22, 2023 10:33
@multani
Copy link
Contributor Author

multani commented Sep 25, 2023

  • I completely ignored the warnings for deprecated APIs so there are several Pydantic APIs that still need to be changed: the validator decorators, min_length vs instead of min_items, etc.

This is still not done, but the code seems to be working and this could be done in a 2nd step.

There are a couple of things that still don't work:

Circular dependencies

The "circular dependency" test (CircularReferenceModel imports FirstRef, which imports SecondRef, which imports CircularReferenceModel) in petstore_api/models/circular_reference_model.py totally doesn't work 🤦 This seems to be more a Python/Pydantic issue though, see:

* [Models referencing each other from separate files pydantic/pydantic#707](https://github.com/pydantic/pydantic/issues/707)

* [How to avoid circular imports when using models (defined in different modules) which refers to each other? pydantic/pydantic#1873](https://github.com/pydantic/pydantic/issues/1873)

To be honest, I don't know how it even worked before :)

This is solvable, if we bundle all the circular dependencies in the same module, and then use Pydantic model_rebuild function, see docs.pydantic.dev/2.3/usage/postponed_annotations/#cyclic-references

I "fixed" this by not calling Model.model_rebuild() in the code. For now, the typing won't fully work for these dependent modules.
I added a TODO: pydantic v2 in the code to re-enable this.

float

It took me a while to figure this one, but format_test.py contains something like this:

from pydantic import BaseModel

class FormatTest(BaseModel):
    number: float
    float: Optional[float] = None

Then, running this:

print(FormatTest(number=3.2))

fails with:

pydantic_core._pydantic_core.ValidationError: 1 validation error for FormatTest
number
  Input should be None [type=none_required, input_value=3.2, input_type=float]
    For further information visit https://errors.pydantic.dev/2.3/v/none_required

It took me a while to understand, but the float type gets aliases to None :)

I'm not sure how to fix this at the moment, I commented the related code. Note that simply loading a model which contains such a definition fails all the tests.

I "fixed" this by commenting the float inside the OpenAPI YAML file :)

This is fixable, but in the general case, it's a bit hard:

  • Everytime the generator generates a type (or subtype) for a model property:
    • We need to check if there's another model property with the same name as the type we are trying to generate:
      • If no, there it's all good
      • If a model property exist with the same name as the type (or subtype), we need to:
        • Create an alias for the type, for instance Float = float
        • The alias must also not be another model property
        • Then substitute the actual type with the alias
        • Generate the list of all the aliases before generating the model

It's not impossible, but a bit of work. I deferred that to later, there are TODO: pydantic v2 in the code.

actual_instance

There are still some test failures related to actual_instance, for instance in test_model.py.

This is probably fixable but I don't understand exactly the purpose of the code here and the tests are a bit weird (from_dict("a")?)

Other failures

There are a few other failures, but I think they should be solvable.

This is fixed now. I was using Pydantic's .model_dump_json(), but it didn't play well with how objects are currently composed.

I think the generated could be simplified and improved by reusing more Pydantic methods instead of implementing our own.

This could also be improved later on.

About typing itself

I also had a look at the output of mypy in the samples echo_client, and I think mypy should be not too far to validate almost all the code once we finish the migration (removing the Pydantic v2 warnings, etc.)

I'm not too familiar with all the features of the generated code, but it feels there are also opportunities to simplify the generated code by not generating part of it, or by reusing what Pydantic offers builtin now (__properties list attributes, custom implementation of to_json(), to_dict() and related from_*() methods, etc.)

I think this can be done in a second step.

I think this could be looked at again once we have fixed all (most of) the warnings for Pydantic v2. There are still a lot of typing errors, but more than 60% should go away once the migration has progressed further.

@multani multani marked this pull request as ready for review September 25, 2023 21:19
if (cp.hasValidation) {
List<String> fieldCustomization = new ArrayList<>();
List<String> intFieldCustomization = new ArrayList<>();
public String toEnumVariableName(String name, String datatype) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is from old version, line 1615

return pt.getType(cp);
}

public void setMapNumberTo(String mapNumberTo) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is form old version, line 1605

LOGGER.warn("Codegen property is null (e.g. map/dict of undefined type). Default to typing.Any.");
typingImports.add("Any");
return "Any";
void createImportMapOfSet(String modelName, Map<String, CodegenModel> codegenModelMap) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is from old version, line 1660

Comment on lines 305 to 307
elif isinstance(obj, set):
return {self.sanitize_for_serialization(sub_obj)
for sub_obj in obj}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not useful for now. Annotated[List, Field(unique_items=True)] should be replaced by Annotated[Set], but this has further implications (sets lose the ordering, not serializable as JSON automatically, etc.)

Suggested change
elif isinstance(obj, set):
return {self.sanitize_for_serialization(sub_obj)
for sub_obj in obj}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I wanted to remove this block of code, as it's unused right now.
There's a comment in the related part of the Java code:

        private PythonType arrayType(IJsonSchemaValidationProperties cp) {
            PythonType pt = new PythonType();
            if (cp.getMaxItems() != null) {
                pt.constrain("max_items", cp.getMaxItems());
            }
            if (cp.getMinItems()!= null) {
                pt.constrain("min_items", cp.getMinItems());
            }
            if (cp.getUniqueItems()) {
                // A unique "array" is a set
                // TODO: pydantic v2: Pydantic suggest to convert this to a set, but this has some implications:
                // https://github.com/pydantic/pydantic-core/issues/296
                // Also, having a set instead of list creates complications:
                // random JSON serialization order, unable to easily serialize
                // to JSON, etc.                                                                                                                                                                                                                                                
                //pt.setType("Set");
                //typingImports.add("Set");
                pt.setType("List");
                typingImports.add("List");
            } else {
                pt.setType("List");
                typingImports.add("List");
            }
            pt.addTypeParam(getType(cp.getItems()));
            return pt;
        }

Comment on lines 2143 to 2145
if (cp.baseName != null && !cp.baseName.equals(cp.name)) { // base name not the same as name
pt.annotate("serialization_alias", cp.baseName);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This used to be alias, but in Pydantic v2, alias is both a validator and a serializer.

From the tests, I assume alias was used for serialization only (to produce the aliased field), not for validation (we still want to be able to read the original Pythonic field)

Note that in Pydantic v1, both forms (the field name and the alias name) could be used to create a new instance of the model. This is not possible anymore, only the field name is accepted.

@@ -7,9 +7,6 @@ import io
import warnings

from pydantic import validate_arguments, ValidationError
{{#asyncio}}
from typing import overload, Optional, Union, Awaitable
{{/asyncio}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why these lines are removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 reasons:

  • These imports are not asyncio-specific
  • The imported names are not used in the template

The names could be used by the code generated by the Java code, in which case the imports would be provided from the Java code instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👌

@@ -1009,19 +997,27 @@ private ModelsMap postProcessModelsMap(ModelsMap objs) {
model.getVendorExtensions().putIfAbsent("x-py-readonly", readOnlyFields);

// import models one by one
if (!modelImports.isEmpty()) {
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why removed the check here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember, I will add it back.

I need to change the place where x-py-model-imports (there could be resources to imports even if there no dependencies between models), I will move it outside the if block.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I re-added the condition and moved the otherImports.exports() imports out of the block.

@wing328 wing328 merged commit 04fa53b into OpenAPITools:master Sep 28, 2023
46 checks passed
@multani multani deleted the python-typing-1 branch September 30, 2023 07:23
AlanCitrix pushed a commit to AlanCitrix/openapi-generator that referenced this pull request Oct 26, 2023
* python: improve type generation with more specific typing

* Annotate function parameters

* Remove unused imports

* remove unused files

* remove temporary hack

* remove lock file

* fix Annotated import

* support Python 3.7

* Regenerate code with typing-extensions

* Fix setup.py

* More Pydantic v2 compatibility

* depend on pydantic v2

* fix client_echo tests

* fix JSON serialization

* Fix references

* Skip circular dependency tests for now

* Temporarily hide the "float" property

The "float" property aliases the "float" type and completely breaks the
model: all the properties that were "float" now become the type of the
"float" property instead.

* Fix errors

* Import Literal from typing_extensions

* Fix GitHub Action workflows

* Fix Python 3.7 failure

* Fix quotes

* Apply suggestions from code review

* Fix tests

* split model imports from other modules imports

* fix workflow

* Comment the array unique items convertion, remove set translation

* Replace alias usage
@@ -24,16 +28,16 @@ class {{classname}}({{#parent}}{{{.}}}{{/parent}}{{^parent}}BaseModel{{/parent}}
{{vendorExtensions.x-py-name}}: {{{vendorExtensions.x-py-typing}}}
{{/composedSchemas.anyOf}}
if TYPE_CHECKING:
actual_instance: Union[{{#anyOf}}{{{.}}}{{^-last}}, {{/-last}}{{/anyOf}}]
actual_instance: Optional[Union[{{#anyOf}}{{{.}}}{{^-last}}, {{/-last}}{{/anyOf}}]] = None
Copy link

@b0g3r b0g3r Dec 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@multani Hi Jonathan! Do you remember, by any chance, why did you introduce Optional here? Is it possible to have None value here even with one_of validator?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey 👋 Sorry, I don't remember the details of this particular change :/ Does it create a problem for you?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, unfortunately. Before this changes non-optional anyOf had correct typing (Union[A, B]), but now it has additional None in the union, which isn't right and stops me from using proper typing. Should I try to change it back and create PR for it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, of course, feel free to create a pull request!

I just checked again, and there's also a is None check in the to_dict method at the end of the file, it's probably why I added the Optional type in the first place. If you remove Optional, can you also remove that check?

multani added a commit to multani/openapi-generator that referenced this pull request Dec 31, 2023
In OpenAPITools#16624, I introduced a new mechanism to record imports to other
modules, instead of having specialized datetime/typing/pydantic objects
to manage imports for these modules.

This change reuses the mechanism from OpenAPITools#16624 and replace the specialized
import managers by the generic one. Unused imports from various
.mustache templates are also cleaned up.
@multani multani mentioned this pull request Dec 31, 2023
5 tasks
wing328 pushed a commit that referenced this pull request Jan 3, 2024
In #16624, I introduced a new mechanism to record imports to other
modules, instead of having specialized datetime/typing/pydantic objects
to manage imports for these modules.

This change reuses the mechanism from #16624 and replace the specialized
import managers by the generic one. Unused imports from various
.mustache templates are also cleaned up.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants