Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python: Add doctest to tox #4285

Merged
merged 2 commits into from
Mar 22, 2022
Merged

Python: Add doctest to tox #4285

merged 2 commits into from
Mar 22, 2022

Conversation

Fokko
Copy link
Contributor

@Fokko Fokko commented Mar 7, 2022

Closes #4284

While going through the Python code, I noticed that not all the examples are valid. I've added doctest to check the examples in the CI.

Some observations:

  • The s3 tests are currently failing because the bucket is invalid (we could add some credentials to the CI to read/write to a single bucket).
  • It looks like the requirements weren't installed.

@github-actions github-actions bot added the python label Mar 7, 2022
@Fokko Fokko force-pushed the fd-add-doctest branch 3 times, most recently from 8a36467 to f8e75bc Compare March 7, 2022 23:04
Copy link
Collaborator

@samredai samredai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, thanks @Fokko! I left a few comments.

python/setup.py Outdated Show resolved Hide resolved
python/src/iceberg/expressions.py Outdated Show resolved Hide resolved
>>> file_content = input_file.open().read() # Read the contents of the PyArrowFile instance
>>> # Read the contents of the PyArrowFile instance
>>> # Make sure that you have permissions to read/write
>>> # file_content = input_file.open().read()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense for now to get the checkdocs test passing but we talked about either mocking out S3 with some conftest monkeypatches or using a minio instance for more robustness. I'm not exactly sure if there's a way to inject mocks into checkdocs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me dive into this and see if I can get this working 👍🏻

@samredai
Copy link
Collaborator

samredai commented Mar 9, 2022

We should also move the extras_require in setup.py over to setup.cfg so that we're not setting package info in two places. This stuff collides easily and causes confusion. We can then just have a barebones setup.py and have everything contained in setup.cfg:

setup.py

from setuptools import setup

setup()

Let me know what you think.

@Fokko
Copy link
Contributor Author

Fokko commented Mar 9, 2022

Hey @samredai thanks for the feedback. I think we can remove the setup.py all together. Less is more :)

Copy link
Collaborator

@samredai samredai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great but I left 2 small comments.

python/setup.cfg Outdated
fastavro>=1.3.2<1.4.0
hmsclient==0.1.1
boto3
pyarrow
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few of these will be added here as part of PR #4262 and some will be part of upcoming PR's (like boto3 being added as part of a BotoFileIO PR and hmsclient being added as part of a HiveCatalog PR). I think this list can be reduced to just pyarrow for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should be just for tests, right? We've discussed keeping hard requirements to a minimum and using primarily optional dependencies. For example, arrow is required if you load ArrowFileIO.

Copy link
Collaborator

@samredai samredai Mar 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point then I think it's worth adding this to the extras_require block, so something like this:

[options.extras_require]
arrow =
    pyarrow

This would allow someone to do pip3 install iceberg[arrow] which will provide all of the iceberg core dependencies + optional arrow dependencies and as we expand this, something like pip3 install iceberg[arrow,jdbc] would be valid. This still allows you to just do pip3 install iceberg if you already have a specific pyarrow version in your environment that you want to use instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @samredai @rdblue Great point, and I agree. Pulling in boto might be too much if you don't use it at all, same for Hive.

Ps. Maybe later on we also might want to look into a full dependency management system for the python codebase. Something like Poetry or PDM. We don't really pin versions of packages, which might cause problems when a downstream package is released with a breaking change (which unfortunately still happens quite a lot).

python/setup.cfg Outdated Show resolved Hide resolved
Copy link
Collaborator

@samredai samredai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @Fokko!

@rdblue rdblue merged commit 4a7e70f into apache:master Mar 22, 2022
@rdblue
Copy link
Contributor

rdblue commented Mar 22, 2022

Looks great! Thank you for getting this working, @Fokko!

I really like having doctests enforced. This will be useful for the transforms PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add doctest for Python
3 participants