Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python code cleanup, first work toward CPython 3 compatibility #33036

Closed
wants to merge 2 commits into from
Closed

Python code cleanup, first work toward CPython 3 compatibility #33036

wants to merge 2 commits into from

Conversation

sanmai-NL
Copy link

This is a first attempt at making the Python codebase dual CPython 2 and 3 compatible.

I was guided by this official porting guide. I used futurize 0.15.2. In this PR, I restricted myself to stage 1 modifications, described as:

Modernize Python 2 code only; no compatibility with Python 3 (or dependency on future
This tools seems to rely on 2to3, the official porting tool.

Some print statements became a little uglier by additional parentheses. That can be prevented with a 2to3 parameter, by why not clean that up later if needed at all ... Let's for now just see if the build still succeeds.

@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @nikomatsakis (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@luqmana
Copy link
Member

luqmana commented Apr 16, 2016

@sanmai-NL Thanks for the pull! Since each of those commits is a small change and the commit messages are the same, you should probably squash them all to one commit.

@alexcrichton
Copy link
Member

Thanks! Most of these scripts aren't actually used anyway, for example almost all of these scripts are on the way out except for bootstrap.py. Perhaps that's the only script that needs to change for now?

Also would it be possible to just change how we print things to be a little more compatible? (parens are all we need, right?).

@nikomatsakis
Copy link
Contributor

r? @alexcrichton

@sanmai-NL
Copy link
Author

sanmai-NL commented Apr 19, 2016

@luqmana:
Thanks for the tip. I intended it this way, so that commits can be removed selectively. The commits have the same commit message to make it clear what (automated) manipulation they underwent. Since it so happens that @alexcrichton brought up limiting this PR to src/bootstrap/bootstrap.py, it seems squashing now is unhelpful?

So as these Python files other than src/bootstrap/bootstrap.py aren't so important, and since the changes are only minimal compatibility changes (with only one, very well understood change in bootstrap.py), there's even less reason to think that merging any of the changes will have any negative impact. Do you agree @alexcrichton?

Questions
I'm not sure about three things:

  1. How does bootstrap.py relate to e.g. rustup.rs? Shouldn't the latter replace the former (even if bootstrap.py is more build internal)?
  2. If most of the Python code isn't useful/important in general, why is it still around? What about removing/splitting off such non-essential stuff?
  3. In Python build scripts: 2.x or 3.x? #33026 @rkruppe suggests that the LLVM Python code isn't critical for the Rust build. Could you comment?

Minimal changes required to gain CPython 3 compatibility for bootstrap.py
@alexcrichton: The print incompatibilities are actually nicely/accidentally circumvented in this script, see #33026. I just tested importing this script in CPython 3.5.1, and this pointed to incompatibility in handling of strings vs. bytes:

TypeError                                 Traceback (most recent call last)
<ipython-input-1-6a5394a49675> in <module>()
--> 1 import bootstrap

rust/src/bootstrap/bootstrap.py in <module>()
    328 
    329 # Fetch/build the bootstrap
--> 330 rb.build = rb.build_triple()
    331 rb.parse_nightly_dates()
    332 rb.download_rust_nightly()

rust/src/bootstrap/bootstrap.py in build_triple(self)
    258         elif ostype == 'Darwin':
    259             ostype = 'apple-darwin'
--> 260         elif ostype.startswith('MINGW'):
    261             # msys' `uname` does not print gcc configuration, but prints msys
    262             # configuration. so we cannot believe `uname -m`:

TypeError: startswith first arg must be bytes or a tuple of bytes, not str

I think the way to move forward is to merge this PR and if bootstrap.py is to stick around, then make it CPython 3 compatible as a first step. If not, I hope you can point out what would be a better time investment to contribute to cleaning up the Rust build process.

@hanna-kruppe
Copy link
Contributor

hanna-kruppe commented Apr 19, 2016

@sanmai-NL

If most of the Python code isn't useful/important in general, why is it still around? What about removing/splitting off such non-essential stuff?

Some of the scripts are irrelevant to the build process but very much tied to the Rust source code, e.g., because they generate source code that goes into libstd or its dependencies. Moving them to a different repo would have no advantages (they're not reusable) and cause minor headaches (keeping the repos in sync). But there are also some scripts that seem obsolete, which nobody bothered to remove (e.g., #31642). Speaking of which:

If not, I hope you can point out what would be a better time investment to contribute to cleaning up the Rust build process.

Identifying and removing such obsolete scripts is thankless but not useless work. It also makes for a cool diffstat 😄

@alexcrichton
Copy link
Member

How does bootstrap.py relate to e.g. rustup.rs?

A good question! The bootstrap.py script is intended to kick off the src/bootstrap build system (written in Rust) to build the compiler itself. It does a similar thing to rustup in that it downloads a pinned version of the compiler/Cargo to work with, but unfortunately rustup won't even replace it as we then wouldn't have a Rust compiler to compiler rustup!

If most of the Python code isn't useful/important in general, why is it still around? What about removing/splitting off such non-essential stuff?

A bunch of it is on the way out but in general we don't clean up src/etc all that regularly, so if a script is checked in it just takes awhile to remove.

I'd definitely be ok removing anything we don't need, but stuff like src/etc/unicode.py are run occasionally and need to stay checked in (but aren't run as part of the normal build system) as @rkruppe mentioned.

In #33026 @rkruppe suggests that the LLVM Python code isn't critical for the Rust build. Could you comment?

Ah I believe what's going on here is @rkruppe is pointing out that LLVM requires Python 2.7 as the version to build it. We just, however, build LLVM, we don't test it. It may be the case that building LLVM has a looser restriction for Python than testing it, but neither @rkruppe nor I seem to know many details here :)

Either way, we're not gonna be rushing to update Python on our bots any time soon, but it's certainly a good idea to whip all our own stuff in to shape for using Python 3 eventually.

I just tested importing this script in CPython 3.5.1, and this pointed to incompatibility in handling of strings vs. bytes:

Oh dear! Feel free to include a fix for that in this PR!


I guess my main question is this: So most of this PR is adding this to the top of all Python files:

from __future__ import print_function

Is that necessary? Shouldn't it only matter if we call print in "flavorful ways", which we should in theory just avoid in our scripts?

@hanna-kruppe
Copy link
Contributor

hanna-kruppe commented Apr 19, 2016

@alexcrichton

Ah I believe what's going on here is @rkruppe is pointing out that LLVM requires Python 2.7 as the version to build it. We just, however, build LLVM, we don't test it. It may be the case that building LLVM has a looser restriction for Python than testing it, but neither @rkruppe nor I seem to know many details here :)

I'm basing this off footnote 2 on http://llvm.org/releases/3.8.0/docs/GettingStarted.html#software which seems to imply no Python is needed at all if you don't run tests. Of course I never actually tried that.

Is that necessary? Shouldn't it only matter if we call print in "flavorful ways", which we should in theory just avoid in our scripts?

It is indeed unnecessary if you restrict yourself to the form print(<single expression>). But that's even uglier than the statement form of print. Updating print calls to use the function form would result in slightly nicer code, so I'd be in favor of that, at least for new code.

@hanna-kruppe
Copy link
Contributor

hanna-kruppe commented Apr 19, 2016

Actually, spamming from __future__ import print_function would have one advantage even if no code changes: It would cause a syntax error when print is used as statement, thus highlighting lines that need to be fixed to be 3.x compatible.

@sanmai-NL
Copy link
Author

sanmai-NL commented Apr 20, 2016

Yes @rkruppe, that was exactly my reason for adding these lines everywhere. Is guess that was unclear. This is actually part of any first attempt at dual Python 2/3 compatibility.

If CPython turns out not be a build dependency at all, then wouldn't it be better if bootstrap.py gets replaced by a native executable? What about shipping a precompiled rustup.rs as part of the distribution? I think it is very valuable if a major build dependency like CPython could be dropped, especially if it is a legacy edition. CPython 2 is space hog even in a minimalistic/modular GNU/Linux distro such as Arch Linux:

Package Size: 11.9 MB
Installed Size: 74.7 MB

Not to mention it is slow and behaves unpredictably across platforms and environments. Probably some people are going to have issues down the road because something is unexpected about their CPython setup (e.g., similar to #32994).

Dropping CPython reduces compilation time and will reduce the size of any virtualization image (e.g., Docker container image) that ships a toolchain to build Rust from source (based on the stated dependencies).

@bors
Copy link
Contributor

bors commented Apr 20, 2016

☔ The latest upstream changes (presumably #32942) made this pull request unmergeable. Please resolve the merge conflicts.

@sanmai-NL
Copy link
Author

Not sure whether I have successfully rebased to resolve the merge conflicts now.

@alexcrichton: it would be helpful if you could make some graphical and/or verbal overview of the build process as public documentation. It will reduce time needed to prepare for contribution.

@hanna-kruppe
Copy link
Contributor

@sanmai-NL

Dropping CPython reduces compilation time and will reduce the size of any virtualization image (e.g., Docker container image) that ships a toolchain to build Rust from source (based on the stated dependencies).

I think this may be overstating the benefits. But regardless,

If CPython turns out not be a build dependency at all, then wouldn't it be better if bootstrap.py gets replaced by a native executable? What about shipping a precompiled rustup.rs as part of the distribution?

Perhaps this topic would be better discussed in a thread over at http://internals.rust-lang.org/ since it's more far-future planning and has little to do with the contents of this PR (which seems borked by the way, as 100+ commits by various authors show up).

@alexcrichton
Copy link
Member

Yeah we're probably not gonna drop Python just yet, but if it turns out that LLVM doesn't need it we're surely much closre! Also as @rkruppe mentioned you may need a git rebase for this.

@sanmai-NL
Copy link
Author

sanmai-NL commented Apr 21, 2016

Oops, sorry for the borked PR state. I've now reset the PR state to the last of my initial commits, recloned, then again tried to follow the project's tips creating the appropriate remotes etc., rebasing worked without conflict (odd?), then tried to push a merge commit to my fork. But that didn't work out so well because master and my branch have diverged. I've finally done a git pull --rebase based on a cursory read of this guide. Still, that also seems to have pulled in a bunch of other commits. I'm okay with redoing the PR or simply closing it, but would like to learn the required Git workflow better for future situations ...

@rkruppe: I questioned the very use of Python to find out whether there's any merit in me spending time on the Python files if they are more-or-less redundant/in flux. Given your and @alexcrichton' s comments and recent commits, I suppose I'd better leave this rather internal part of the source tree to you. Regardless, it would be good the build process (i.e., the purpose and status of the dependencies) is documented schematically somewhere, for outsiders.

@mitaa
Copy link
Contributor

mitaa commented Apr 21, 2016

git reset --hard 57e6b9e should work in this case
(more general would be git rebase master -i and simply removing the unwanted commits)

@alexcrichton
Copy link
Member

Looks like there may still be some lingering commits?

@sanmai-NL
Copy link
Author

@mitaa: thanks. That would reset back to my last commit, but bors reported the PR was unmergable in that state. @alexcrichton: any tips what to do, should I indeed reset to my last commit?

@hanna-kruppe
Copy link
Contributor

@sanmai-NL Reset to your last commit to get rid of the nonsense commits. Then rebase again, and hopefully it works the second time around.

@sanmai-NL
Copy link
Author

@alexcrichton: It's mergable now. 😃

@@ -1,3 +1,4 @@
from __future__ import print_function
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could these imports go under the license header? Or is it required that they're the first thing in the file?

Sander Maijers added 2 commits April 24, 2016 17:43
* Remove redundant parentheses, imports.
* Improve CPython 3 compatibility.
* Add or modify shebang for scripts predictably depending on specific CPython
  version.
* Simplify conditional control structures.
* Define RustBuild constructor instead of mutating object unnecessarily.
* Apply style normalization according to PEP guides.
* Et cetera.
@sanmai-NL
Copy link
Author

@alexcrichton: Yes, import statements be anywhere, and that didn't look so nice. I have revised the commits and added more work now to clean up the Python codebase, as well as fix some bugs that could cause runtime errors in some unusual branches.

There are also definite bugs in bootstrap.py (spotted through program analysis in my IDE) that I did not fix just yet, in order not to muddle the PR any further and perhaps make mistakes of my own.

If you wish to remove some Python code, please do so after merging this PR, and I if time permits I'll try to refactor the remaining code. Finding out what scripts are important/in use by your team is too much work for me to do I think. For example, it seems one script src/etc/indenter.py is not in use, but I can't be sure.

@alexcrichton
Copy link
Member

@sanmai-NL hm looks like there's a huge number more of related changes going on here now? Perhaps the stylistic ones can be left out for a future PR, and just the Python 3 compatibility ones can be here?

@sanmai-NL
Copy link
Author

sanmai-NL commented Apr 24, 2016

@alexcrichton, I fully agree that PRs should ideally be well circumscribed. One thing I think we already agree about, is that there's a lot to improve about the Python code needed for building and testing Rust.

The trade-off I made here takes into account reviewing/process overhead between submitting multiple PRs vs. submitting one larger PR. Practically of the changes are basic cleanup style changes, just to prepare the ground for more substantial refactoring still to come. I'll make sure that the number, scope and type of commits will be more productive next time. I'm sorry, I think the Python commit isn't disentangleable without excessive effort at this point.

The PR started out as a first, small batch of automated CPython 3 compatibility refactorings. After some exchanges and me having to learn a few things along the way, you then requested a stylistic fixup. But making a stylistic fixup in one or a few files isn't so optimal, without consistency with the other related Python code. And stylistic changes are I think best primarily done with an automated tool following common norms, instead of manually fixing one aspect (i.e., import line position). Ultimately, I've fixed them all up in one go. Most importantly, my PR has remained very easy to review, reason about and understand the motivation for by persons with intermediate Python knowledge.

I didn't find a way to look what the merge conflicts are that GitHub warns me about at the end of this page.

@sanmai-NL sanmai-NL changed the title CPython 3 compatibility Python code cleanup, first work toward CPython 3 compatibility Apr 24, 2016
@alexcrichton
Copy link
Member

Can we keep this PR constrained to just Python 3 compatibility? Let's move the stylistic changes (e.g. whitespace, etc) to a separate PR as they can be somewhat more controversial.

The merge conflicts mostly just indicate that the PR needs a rebase from master.

@sanmai-NL
Copy link
Author

I'll do my best, may have to wait until next weekend though.

@alexcrichton
Copy link
Member

Closing due to inactivity, but feel free to resubmit with a rebase!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants