Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some refactoring of string encoding/decoding utilities #1342

Closed
wants to merge 2 commits into from

Conversation

embray
Copy link
Contributor

@embray embray commented Sep 21, 2018

This is a bit of refactoring pulled out from my Cygwin branch #998.

I think these new string utilities functions are very clear and logical in their respective purposes. I also took a close look at the current state of the _psutil_windows.c module, and I believe it is quite consistent about returning unicode strings, so the several calls to unicode2str should be safe.

I also moved open_text and open_binary from the linux module to the common module. They're still only used in the linux module, but my experience from #998 shows that there should be use for them elsewhere as well.

@embray
Copy link
Contributor Author

embray commented Sep 21, 2018

Hmm--I'm having trouble saying for sure, since I also get some test failures on Linux even against master, but I think at least some of these test fails might be caused by my changes... I will take a closer look.

@embray
Copy link
Contributor Author

embray commented Sep 21, 2018

Ok, with this push I think I at least fixed all the failures that were caused by this change (it only required updating some of the tests that mocked psutil._pslinux.open).

@embray
Copy link
Contributor Author

embray commented Sep 21, 2018

That worked on Travis, I think, in that the failures there are the same as on master. Not sure what went wrong with AppVeyor but it seems like a local problem there.

@embray
Copy link
Contributor Author

embray commented Sep 26, 2018

Rebased to skip the changes to open_text/open_binary since those ended up being included in #1341.

@giampaolo
Copy link
Owner

I have mixed feelings about this.

  • the only thing I'm really in favor of is bytes2str() to be put in _common.py.
  • str2bytes() is only used once (by _pslinux.py) so it doesn't seem necessary for it to live in _common.py
  • unicode2str() is only used by _pswindows.py and Windows in general is the only platform with a C extension returning unicode so I suggest to keep this utility function in _pswindows.py (the py2_strencode -> unicode2str rename is a good one).

@embray
Copy link
Contributor Author

embray commented Sep 26, 2018

If the question is just a matter of where the functions should live, I think it makes more sense to keep them all grouped together. I'll add that most, if not all of these will wind up being used in the Cygwin implementation, even as I am in the process of redoing that. I think many of them could also find uses in other existing modules, but I haven't done an exhaustive search yet so as to avoid unnecessary diffs.

If it's a question of too much clutter in _common.py these could also go in _compat.py since they really are primarily useful for Python 2/3 compatibility and are not really needed for pure Python 3.

@giampaolo
Copy link
Owner

Kinda busy in these days. Lemme get back to this next week.

@embray
Copy link
Contributor Author

embray commented Sep 26, 2018

Ok no rush.

@giampaolo
Copy link
Owner

Hello. Sorry for taking so long.
I think str2bytes is not necessary because it's only used once. Also I think it makes sense to define unicode2str in _pswindows.py. If you'll need it for the Cygwin PR you can move it later. bytes2str looks good.

@embray
Copy link
Contributor Author

embray commented Nov 29, 2018

I understand the tendency in this code base to keep function definitions only in the modules that use them. However, I don't agree with that as a strict principle, especially when it comes to very generic utility functions. In particular, there's no point in putting a function in one specific module when I can say for sure that it will be used elsewhere.

In the case of str2bytes it is only used once in this PR for the sake of keeping it minimal, but it can and will be used in more places as #998 demonstrates (there it was called py2_strencode. Even if it is very simple it makes sense to have as a dedicated function: It has the obvious symmetry to bytes2str, and code like this is meant primarily for Python 2/3 straddling, so keeping cross-compatibility details out of functional code makes it less cluttered and easier to refactor later (e.g. when Python 2 is dropped). It's much clearer than something like s if PY2 else s.encode('ascii') or something.

@embray
Copy link
Contributor Author

embray commented Nov 29, 2018

In other words, although this PR is simple, please consider that I have a bigger picture in mind here (resulting from breaking up a large PR like #998 into smaller, more digestible pieces).

@giampaolo
Copy link
Owner

giampaolo commented Nov 30, 2018

Hello @embray. I understand your rationale and that this PR is a preparation for #998, but #998 is still largely incomplete and has been in that state for a very long time. We're not even sure if it will ever be merged actually, so I don't think it makes sense to make assumptions related to a PR which is still a WIP. Code changes should be made in relation to the situation of the current master branch, not another unmerged PR. If you need str2bytes in #998 I think it makes more sense to just do it as part of #998 (after all it's a simple change/refactoring).

@giampaolo giampaolo closed this Mar 1, 2019
@embray
Copy link
Contributor Author

embray commented Mar 1, 2019

Is there a rationale for closing this? Do you have some better alternative now?

@embray
Copy link
Contributor Author

embray commented Mar 1, 2019

As I explained I would like to try to add Cygwin support, but in smaller pieces at a time so that it's easier to deal with (e.g. start with only supporting minimal functionality and then build on that). But your last comment was very discouraging so I haven't felt very motivated to work on this lately, which is not to say I won't, because maintaining a downstream patch from version to version of psutils sucks.

@embray
Copy link
Contributor Author

embray commented Mar 1, 2019

Your comment

I think str2bytes is not necessary because it's only used once.

still doesn't make sense. Since when has "it's only used once" been a reason on its own not to write a function for something, especially in case like this where it has an obvious symmetry with another function (bytes2str)?

If it will help get this merged I can rework a bit to do things your way but it doesn't really make sense to me.

@giampaolo
Copy link
Owner

giampaolo commented Mar 1, 2019

As I said, the PR should be made in respect to the master GIT branch, not to another PR which is still incomplete. I understand you that this change may make sense related to the Cygwin PR, but it makes less sense in relation to the the current "Cygwin-less" code. It's not wrong per se, it's just that I have mixed feelings about it, so I prefer not to include it exactly as you propose. As the code currently stands I think bytes2str is a win, so if you remove the rest I'll be happy to merge it. I think the removed parts can easily be added either to _pswindows.py (if Cygwin is gonna live there) or in _common.py (if Cygwin is gonna live in _pscygwin.py). The fact that we're not sure yet where Cygwin code is gonna live is one of the reasons I'm rejecting this PR as it stands.

As for Cygwin, don't let this tiny change demotivate you. Cygwin support is something I'm very happy to include and that I'm looking forward to. If you decide to go back working on that I can give you full support in the integration process (I know it's challenging).

@giampaolo
Copy link
Owner

Also, I think this should be included:

unicode2str() is only used by _pswindows.py and Windows in general is the only platform with a C extension returning unicode so I suggest to keep this utility function in _pswindows.py (the py2_strencode -> unicode2str rename is a good one)

@giampaolo giampaolo reopened this Mar 1, 2019
@embray
Copy link
Contributor Author

embray commented Mar 1, 2019

My point is that I do have other changes in the waiting that use this (as you say, very minor) addition, so from my perspective I can already see what's going to be used where. Would it help if I went ahead and added some additional PRs that are based on top of this one, so that that point would be clearer? Because you seem to be uncertain about how some things are going to go, whereas from my perspective there is total certainty (modulo any changes I might make in response to review).

Otherwise, like I said, it's your project so I'm happy to do things your way for now.

@giampaolo
Copy link
Owner

giampaolo commented Mar 1, 2019

I would have to properly review the cygwin code again (I was actually looking at it right now). As far as the Python related code goes, by the look of it, it seems that we have a lot in common with Linux (more than with Windows), so I guess it would make sense to land cygwin code in _pslinux.py. If the python logic differs a lot it may make sense to have _pslinux.py import _pscygwin.py and do if CYGWIN: return _pscygwin.fun(). But let's talk about that in #998.

@embray
Copy link
Contributor Author

embray commented Mar 1, 2019

Thank you again for re-opening this issue.

I think maybe there's a point I'm not making clearly enough: I don't plan to continue work on #998. It's too old at this point, and it suffered from trying to do too much at once. I already long since started work on this over again in my local branches (though obviously keeping much from the original attempt). But I'm trying to move toward a new attempt that will start with just a very minimally functional Cygwin port that to start with lacks all but the most basic functionality (a handful of Process methods at the most).

Cygwin is different enough from most other platforms that it will require its own support module, though being largely POSIX-compliant it will use some functionality from the posix module. It may also be able to reuse some parts of the linux module, but on the whole it's different enough from linux that it would result very quickly in unmanageability if it tried to use the existing linux module directly.

Also, in principle, Cygwin should not use any functionality from the Windows module. In #998 I ended up doing so for a few more advanced cases that were not possibly natively through Cygwin. For my rework I want to avoid doing that entirely, at least for the first version, in order to keep things simpler. Later I would like to add more shared functionality from the windows module (parts of which will also possibly take the form of upstream patches to Cygwin, e.g. to add better support for /proc/net) but not at first.

Does that make sense?

@embray
Copy link
Contributor Author

embray commented Mar 1, 2019

I think I'll go ahead and put together a PR for my simplified bare-minimum implementation which I think will be a lot clearer to look at than #998, and might also help clarify why it's not appropriate to use _pslinux.py (at least as-is) on Cygwin.

@giampaolo
Copy link
Owner

That sounds good. That code is indeed too old. Please keep in mind that the main goal should be avoiding the duplication of functionality. If you feel the need to copy something the answer should be to move stuff around and refactor instead. I will give you a detailed reply in #998.

@embray
Copy link
Contributor Author

embray commented Mar 1, 2019

Yep. This time I'm avoiding duplication from the start.

@embray
Copy link
Contributor Author

embray commented Apr 10, 2019

Come on man, I'm happy to drop this code if you're not happy with it. I just wanted an opportunity to demonstrate how I planned to use it, but it's not so important--if it's still not convincing I can drop it.

Will you please consider reopening or #1483? Or perhaps I can start a new one with less toxicity, and dropping the commit from this PR.

I want to work with you on getting this done. My main frustration is just your unwillingness to explain your reasoning behind things. Open source is a collaboration and if you want to take code from other people and force into your way of thinking I think it at least deserves an explanation, so that we can work more easily toward an agreeable solution. "Because I said so" is seldom the way to do that. In the case of this I don't care but in other cases it's reasonable...

@embray
Copy link
Contributor Author

embray commented Apr 10, 2019

Following up here since you locked the other PR without giving me a chance to respond

Crucial things (because interconnected with other functionality) like Process create_time(), ppid(), pids(), memory_info(), name(), are either not implemented or skipped in unit-tests, meaning they're broken,

I don't know what you mean here. Every last one of those are implemented and working quite well. Originally I had a few tests for them marked to skip before they were implemented. I thought I removed the skip decorators on those later but I migh have missed a couple?

and you ignored my initial inline comments to this PR. I have and would have more of those as we proceed, because this is how the code review workflow is supposed to work, and for good reasons,
not because I don't like you or something.

If there were any inline comments I didn't see them? Maybe when I did a force-push they were hidden, although I thought GH has gotten better about that lately. Anyways I never saw them and it's too bad because I would have loved to see some concrete commentary on the code, as I knew there would be some questions and areas of concern about specific things. Anyways I know how code review works...

There is no point in committing this as-is and it is inappropriate of you to tell me what should be part of a new release so insistently.

I didn't tell you what to do. I just said what I thought made sense and explained my reasoning. If you don't want to make a release with experimental functionality fine. My follow-up suggestion (which I think maybe I wasn't clear about) was to merge a partial implementation into master, or maybe even some other branch, so that I could build on it in additional PRs before making a release from that branch. This is very easy to do in git and I thought might be a more digestible approach.

If you reject that workflow too then fine, I can try to do everything in PR. That just gets more difficult over time because the longer it takes to finish the more it gets out of sync with master and the more rebasing is needed and the like. I can do that anyways but I think it's worth explaining my reasoning.

My aim at delivering a close-to-complete implementation is because I believe it can be achieved.

That's easy to say when you're not the one doing the actual work =)

And even in case we decide to do otherwise (minimal functionality) as we go, it would at the very least pass the quality standards, meaning engaging in a code review first and progressively go from "draft" to "minimal" (but stable).

That's totally fine! I'd say it'd already "minimal but stable" but that's obviously pending changes based on review.

I'm sorry you don't like my tone in other cases. I'll try to be mindful of that.

@giampaolo
Copy link
Owner

Feel free to open a new PR @embray so we can forget about the old one. But I want to be clear: I do not want to go into the whole design-related and decision-making dance again, OK? And above all let's avoid this:

<<I'm frustrated by your unwillingness to explain your reasoning behind things>>

I am not that. I think I motivated my decisions multiple times.

If there were any inline comments I didn't see them? Maybe when I did a force-push they were hidden, although I thought GH has gotten better about that lately.

I also can't see them anymore. Not sure what happened (I guess it's the forced push). I will re-add them to the new PR and we'll restart from there.

I'm sorry you don't like my tone in other cases. I'll try to be mindful of that.

Deal. Let's move on.

@giampaolo
Copy link
Owner

giampaolo commented Apr 11, 2019

My follow-up suggestion (which I think maybe I wasn't clear about) was to merge a partial implementation into master, or maybe even some other branch, so that I could build on it in additional PRs before making a release from that branch.

Makes sense. I created a branch here:
https://github.com/giampaolo/psutil/tree/cygwin
You can push your previous patch minus the str utility functions in there and make a new PR (+ next ones) against that branch.

@giampaolo
Copy link
Owner

For the files returning an empty string (e.g. /proc/pid/stat) for now just duplicate the logic/method in _pscygwin.py, even if it's identical to Linux (just add a comment saying the method is identical, so we can keep track).

Same goes for _psutil_cygwin.c in which we can brutally copy/paste stuff from _psutil_windows.c, but let's leave that for later and aim at an initial PR which touches python files only.

The idea is to duplicate code now and refactor later when we have the full picture.

Also please keep the changes to the bare minimum (resist the temptation to refactor unrelated stuff or add new one, like PY2 (but moving encode, decode etc. into _common.py is fine)).

@giampaolo
Copy link
Owner

Moved common utilities into _common.py (master): c0aba35

@embray
Copy link
Contributor Author

embray commented Apr 24, 2019

I'll open a new PR when I feel up to dealing with it again.

@giampaolo giampaolo mentioned this pull request Nov 24, 2020
16 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants