Rewrote protobuf generation scripts in Python #12527
base: main
Conversation
requirements-tests.txt
Outdated
@@ -13,6 +13,8 @@ ruff==0.5.4 # must match .pre-commit-config.yaml

# Libraries used by our various scripts.
aiohttp==3.10.2
grpcio-tools
I'm sure there's a minimum version of protoc shipped with grpcio-tools that we should be using, but I can't recall it off the top of my head and would have to search through past PRs to find what it was.
scripts/sync_protobuf/_helpers.py
Outdated
from http.client import HTTPResponse
from pathlib import Path
from typing import TYPE_CHECKING, Iterable
from urllib.request import urlopen
I'm purposefully avoiding requests here, so as not to add requests and types-requests to requirements-tests.txt.
def extract_python_version(file_path: Path) -> str:
    """Extract the Python version from https://github.com/protocolbuffers/protobuf/blob/main/version.json"""
    with open(file_path) as file:
        data: dict[str, dict[str, dict[str, str]]] = json.load(file)
Technically the type is dict[str, dict[str, str | dict[str, str]]], so I'm being a bit cheesy here, but I think it's fine over an Any. This is the type for the tree of key-values we traverse; everything else doesn't matter. A TypedDict felt like overkill just to prevent a typo in data.values().
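To make the trade-off concrete, here is a minimal sketch (with a made-up payload mirroring the shape of protobuf's version.json; the keys and values are assumptions for illustration) showing that the narrower annotation only has to describe the path actually traversed:

```python
import json

# Hypothetical payload shaped like protobuf's version.json: a single root
# key (the protobuf source version) whose value holds a "languages" dict.
sample_text = """
{
  "27.2": {
    "protoc_version": "27.2",
    "languages": {
      "cpp": "5.27.2",
      "python": "5.27.2"
    }
  }
}
"""

# The annotation only describes the keys we walk through; sibling values
# of other types (like "protoc_version") are never looked at, so the
# slightly-wrong-but-simple type never causes a problem at runtime.
data: dict[str, dict[str, dict[str, str]]] = json.loads(sample_text)
python_version = next(iter(data.values()))["languages"]["python"]
print(python_version)
```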
# grpc install only fails on Windows, but let's avoid building sdist on other platforms
# https://github.com/grpc/grpc/issues/36201
grpcio-tools; python_version < "3.13"
xref grpc/grpc#36201 & grpc/grpc#34922
Thanks! I'm not really familiar with protobuf or protobuf stub generation, so my comments are limited to general issues.
In general, there are a few places that use / as the path separator, so I'd expect problems on Windows, but I'm fine with that for now.
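One way to sidestep the separator issue (a sketch, not a claim about what this PR does everywhere) is to build paths with pathlib and only join with the / operator, letting each flavour render its own separator:

```python
from pathlib import Path, PurePosixPath, PureWindowsPath

# Joining Path objects with "/" lets pathlib pick the right separator for
# the current platform, instead of hard-coding "/" inside strings.
extracted = Path("protobuf-27.2")  # hypothetical extracted-archive dir
build_file = extracted / "python" / "dist" / "BUILD.bazel"

# The same logical path renders differently per flavour:
posix = PurePosixPath("python/dist") / "BUILD.bazel"
windows = PureWindowsPath("python/dist") / "BUILD.bazel"
print(posix)    # forward slashes
print(windows)  # backslashes
```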
resp: HTTPResponse = urlopen(url)
if resp.getcode() != 200:
    raise RuntimeError(f"Error downloading {url}")
with open(destination, "wb") as file:
    file.write(resp.read())
Suggested change:

resp: HTTPResponse
with urlopen(url) as resp:
    if resp.getcode() != 200:
        raise RuntimeError(f"Error downloading {url}")
    with open(destination, "wb") as file:
        file.write(resp.read())
data: dict[str, dict[str, dict[str, str]]] = json.load(file)
# The root key will be the protobuf source code version
return next(iter(data.values()))["languages"]["python"]
I'd like to see some validation of the version, considering it's coming from an outside source. Something like:

data = json.load(file)
# The root key will be the protobuf source code version
version = next(iter(data.values()))["languages"]["python"]
assert isinstance(version, str)
assert re.fullmatch(r"...", version)  # proper re here
return version
This way we're also sure (at runtime) that version has the correct type and format.
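A possible concrete shape for that check (the exact format protobuf uses is an assumption here: MAJOR.MINOR.PATCH with an optional rcN pre-release suffix, as in recent Python protobuf releases):

```python
import re

# Assumed version shape, e.g. "5.27.2" or "5.28.0rc1". The regex is a
# sketch; adjust it if protobuf's versioning scheme differs.
VERSION_RE = re.compile(r"\d+\.\d+\.\d+(?:rc\d+)?")

def validate_version(version: object) -> str:
    """Raise ValueError unless `version` is a string matching VERSION_RE."""
    if not isinstance(version, str) or not VERSION_RE.fullmatch(version):
        raise ValueError(f"Unexpected protobuf Python version: {version!r}")
    return version

print(validate_version("5.27.2"))
```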
https://github.com/protocolbuffers/protobuf/blob/main/python/dist/BUILD.bazel
"""
with open(temp_dir / EXTRACTED_PACKAGE_DIR / "python" / "dist" / "BUILD.bazel") as file:
    matched_lines = filter(None, (re.search(PROTO_FILE_PATTERN, line) for line in file.readlines()))
Nit:

matched_lines = filter(None, (re.search(PROTO_FILE_PATTERN, line) for line in file))
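The nit works because file objects already iterate over lines lazily, so readlines() only adds an extra list in memory. A small self-contained illustration of the same idiom (the file content and regex below are stand-ins, not the script's actual BUILD.bazel or PROTO_FILE_PATTERN):

```python
import io
import re

# Stand-in for the BUILD.bazel file being scanned.
fake_build = io.StringIO(
    'srcs = [\n'
    '    "google/protobuf/any.proto",\n'
    '    "google/protobuf/duration.proto",\n'
    ']\n'
)
# Illustrative pattern: grab quoted .proto paths.
pattern = re.compile(r'"([^"]+\.proto)"')

# Iterating the file object yields lines one at a time; filter(None, ...)
# drops the lines where re.search returned None (no match).
matches = filter(None, (pattern.search(line) for line in fake_build))
proto_files = [m.group(1) for m in matches]
print(proto_files)
```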
temp_dir = Path(tempfile.mkdtemp())
# Fetch s2clientprotocol (which contains all the .proto files)
archive_path = temp_dir / ARCHIVE_FILENAME
download_file(ARCHIVE_URL, archive_path)
extract_archive(archive_path, temp_dir)

# Remove existing pyi
for old_stub in STUBS_FOLDER.rglob("*_pb2.pyi"):
    old_stub.unlink()

PROTOC_VERSION = run_protoc(
    proto_paths=(f"{EXTRACTED_PACKAGE_DIR}/src",),
    mypy_out=STUBS_FOLDER,
    proto_globs=extract_proto_file_paths(temp_dir),
    cwd=temp_dir,
)

PYTHON_PROTOBUF_VERSION = extract_python_version(temp_dir / EXTRACTED_PACKAGE_DIR / "version.json")

# Cleanup after ourselves, this is a temp dir, but it can still grow fast if run multiple times
shutil.rmtree(temp_dir)
To make sure the temp directory is always cleaned up:

Suggested change:

with tempfile.TemporaryDirectory() as td:
    temp_dir = Path(td)
    # Fetch s2clientprotocol (which contains all the .proto files)
    archive_path = temp_dir / ARCHIVE_FILENAME
    download_file(ARCHIVE_URL, archive_path)
    extract_archive(archive_path, temp_dir)

    # Remove existing pyi
    for old_stub in STUBS_FOLDER.rglob("*_pb2.pyi"):
        old_stub.unlink()

    PROTOC_VERSION = run_protoc(
        proto_paths=(f"{EXTRACTED_PACKAGE_DIR}/src",),
        mypy_out=STUBS_FOLDER,
        proto_globs=extract_proto_file_paths(temp_dir),
        cwd=temp_dir,
    )

    PYTHON_PROTOBUF_VERSION = extract_python_version(temp_dir / EXTRACTED_PACKAGE_DIR / "version.json")
def main() -> None:
    temp_dir = Path(tempfile.mkdtemp())
See above.
Closes #12511
This is so much faster on Windows: it takes a few seconds to run (including downloads and pre-commit), compared to the over a minute it used to take me on WSL.
I have two design questions:
Merge everything into a single generic entry point (sync_stubs_with_proto) that takes all of a script's special needs as parameters (including a "post-run" Callable), or keep the 3 scripts separate with shared helper functions.
I'm also open to naming suggestions.
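The first option could look roughly like this. All names, parameters, and behaviour below are invented for illustration; this is only a sketch of the "one generic entry point with a post-run hook" shape, not the PR's actual code:

```python
from __future__ import annotations

from collections.abc import Callable, Iterable
from pathlib import Path

# Hypothetical generic sync function: each per-project special need
# (archive location, output folder, proto search paths, a post-run hook)
# becomes a keyword-only parameter.
def sync_stubs_with_proto(
    *,
    archive_url: str,
    stubs_folder: Path,
    proto_paths: Iterable[str],
    post_run: Callable[[Path], None] | None = None,
) -> Path:
    # Placeholder for: download archive, remove old *_pb2.pyi, run protoc.
    print(f"would download {archive_url} and regenerate {stubs_folder}")
    # The hook lets one caller do extra cleanup without branching here.
    if post_run is not None:
        post_run(stubs_folder)
    return stubs_folder

sync_stubs_with_proto(
    archive_url="https://example.invalid/protobuf.tar.gz",
    stubs_folder=Path("stubs/protobuf"),
    proto_paths=["src"],
    post_run=lambda folder: print(f"post-run tweak in {folder}"),
)
```

The alternative (three scripts plus shared helpers) keeps each script trivially readable at the cost of some duplication; the generic function concentrates the logic but grows a parameter for every special case.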