
Rewrite _create_code() with Structural Pattern Matching (limited to tuples) #496

Merged (9 commits) on Jul 3, 2022

Conversation

@leogama (Contributor) commented Jun 2, 2022

This is a proposal to simplify the `_create_code()` logic using Structural Pattern Matching. Sadly, it's a Python 3.10+-only feature, but we need only a limited subset of its functionality, which can be implemented with a simple class.

It seems to work across Python versions: I get the same failures for the same version combinations as with the current implementation. However, I'm not sure the logic is 100% equivalent (mainly for lnotab).

Note: `match` and `case` are "soft keywords" and can be used freely as identifiers outside a match statement.

Related to #488 and #495.
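The idea of emulating a limited subset of pattern matching with a simple class could be sketched like this (hypothetical names and a reduced three-member tuple; the real `_create_code()` dispatches on the full 15–18-member code-attribute tuples):

```python
class TupleMatch:
    """Match a tuple of values against a pattern of types.

    Ellipsis acts as a wildcard. This is an illustrative sketch, not
    dill's actual implementation.
    """
    def __init__(self, args):
        self.args = args

    def case(self, *pattern):
        if len(self.args) != len(pattern):
            return False
        return all(expected is ... or isinstance(value, expected)
                   for value, expected in zip(self.args, pattern))

# Dispatch on a (reduced) code-member tuple:
m = TupleMatch((2, "f", b"bytecode"))
if m.case(int, str, bytes):        # newer layout: str name
    argcount, name, codestring = m.args
elif m.case(int, bytes, bytes):    # hypothetical older layout: bytes name
    argcount, name, codestring = m.args
    name = name.decode()
```

Unlike the built-in `match` statement, a class like this works on Python 3.7+, at the cost of re-checking common leading members in every `case()` call.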

@leogama leogama marked this pull request as ready for review June 8, 2022 17:17
@mmckerns mmckerns self-requested a review June 8, 2022 19:56
@mmckerns mmckerns added this to the dill-0.3.6 milestone Jun 8, 2022
Review thread on dill/_shims.py (outdated, resolved)
@mmckerns (Member) left a comment:

This is a nice cleanup of the logic, and makes it more maintainable. Have you checked what kind of impact it has on speed? Also, as noted in comments, I'm not convinced that _shims is the right place for the match class. Please address the questions in the comments above.

@leogama (Contributor, Author) commented Jun 20, 2022

I'm glad you asked about performance, because my first version of the function using pattern matching was performing really poorly: the first 10+ arguments almost always matched (they are common to almost all patterns and, unlike the built-in implementation, mine doesn't do anything clever to optimize these cases). I dialed the degree of abstraction back a bit; I'll push in a couple of minutes.

Edit: removed unused features from the match class; they can be added later if needed.

Here is a summary of a microbenchmark of unpickling with `dill.loads(obj)`, done with pyperf on my cheap notebook. Unpickling a function with these changes takes 70% more time in CPython and 15% more in PyPy, which I think is acceptable, as these operations are on the order of 10 μs.

| Version | Empty code object* | Small function (5 LOC) | Median function (30 LOC) |
|---|---|---|---|
| python3.7 | 1.9 | 1.5 | 1.3 |
| python3.8 | 2.2 | 1.9 | 1.7 |
| python3.9 | 2.2 | 1.8 | 1.7 |
| python3.10 | 2.1 | 1.8 | 1.6 |
| python3.11a | 2.7 | 1.9 | 1.7 |
| pypy3.8 | 1.5 | 1.2 | 1.1 |

Structural Pattern Matching time divided by procedural time (current implementation). \*Empty code object: `(def f(): pass).__code__`
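The general shape of such a measurement could look like the following (the thread used pyperf; this stdlib-only sketch uses `timeit` instead, and times stdlib `pickle` on a plain tuple, since pickling code objects requires dill):

```python
import pickle
import timeit

# Stand-in payload; the PR benchmarks dill.loads() on pickled
# functions/code objects instead.
payload = pickle.dumps((2, "f", b"bytecode"))

def unpickle():
    return pickle.loads(payload)

# Best of 3 runs of 10,000 calls, reported as microseconds per call.
best = min(timeit.repeat(unpickle, number=10_000, repeat=3))
per_call_us = best / 10_000 * 1e6
```

pyperf adds calibration and statistical treatment on top of this basic loop, which is why it is preferred for ratios as small as the ones in the table.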


What I don't get is why the overhead seems to depend on the function size in CPython:

| Version | Small func / Empty code | Median func / Empty code |
|---|---|---|
| python3.7 | 1.1 | 2.0 |
| python3.8 | 1.4 | 2.6 |
| python3.9 | 1.3 | 2.7 |
| python3.10 | 1.4 | 2.9 |
| python3.11a | 1.1 | 2.3 |
| pypy3.8 | 1.0 | 1.8 |

Ratio of time overheads.

@mmckerns (Member) left a comment:

Can you profile to see if the speed hit is localized to something, and potentially try to mitigate it a bit? I'm a bit focused on speed here because we aren't gaining any new features... only making maintenance easier. I like this PR, and hopefully we can squeeze a bit more speed out of it.

@leogama (Contributor, Author) commented Jun 26, 2022

I actually did some experiments trying to optimize it, with few gains, but the code was heading more and more towards the old implementation, with lots of if statements and manipulation of args. I couldn't localize the overhead, though. The best-performing version was still ~60% slower (instead of 70%) than the current one. Even using the actual match statement in Python 3.10 yielded similar results (I didn't repeat the comparison after my first version, but I think it would be slightly worse than this). Maybe the overhead comes from the dictionary operations after the matching...

Edit: the overhead is not concentrated in the later dict operations; see the comment below. Also, the final version I reached is a compromise between logical clarity and the really minor extra efficiency that could still be squeezed out of this.

@leogama (Contributor, Author) commented Jun 26, 2022

I'm playing with line_profiler here... It isn't shedding much light on the localization issue; it seems to depend on the Python version.

@leogama (Contributor, Author) commented Jun 26, 2022

New benchmark! I did manage to squeeze out some extra performance. Changes:

  • Merged the Py3.7 case with the Py3.8/3.9/3.10 case;
  • Added an optimization for the most common case, in which the pickle was created by the same Python version as the unpickler;
  • Got rid of a dictionary comprehension, moving the encoding step into the args creation prior to the `CodeType(*args)` call.
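The third change amounts to decoding members inline while assembling the positional args, instead of building an intermediate dict first. A reduced sketch with hypothetical member names (not dill's actual code):

```python
def b2s(obj):
    """Decode bytes members coming from older pickle formats."""
    return obj.decode() if isinstance(obj, bytes) else obj

members = {'co_argcount': 2, 'co_filename': b'script.py', 'co_name': b'f'}

# Before: an intermediate dict comprehension, then keyword expansion.
decoded = {k: b2s(v) for k, v in members.items()}

# After: decode inline while building the args tuple that is passed
# straight to CodeType(*args), saving one full pass over the members.
args = (members['co_argcount'],
        b2s(members['co_filename']),
        b2s(members['co_name']))
```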

And here are the time results (with an empty code object):

| Version | New (optimized) | Native Python SPM | Small function |
|---|---|---|---|
| python3.7 | 1.35 | – | 1.17 |
| python3.8 | 1.44 | – | 1.26 |
| python3.9 | 1.52 | – | 1.29 |
| python3.10 | 1.43 | 1.35 | 1.30 |
| python3.11b | 1.91 | 1.90 | 1.43 |
| pypy38 | 1.11 | – | 1.06 |

Time ratio over the current implementation.

So this is a mean overhead of 53% over the supported CPython versions (against the previous 124%), which is diluted down to 29% for a small function (78% previously). For better maintainability and a smaller risk of bugs (Python 3.12a was just released on Friday 🙄), I think it's a good compromise.

@leogama leogama requested a review from mmckerns June 26, 2022 16:15
@mmckerns (Member) left a comment:

To me, this is still too much of a speed hit for something that will be called very regularly and is purely for maintainability. If there's no way to squeeze out more performance, I'd suggest we close this. I really appreciate the effort here, and I'd rather see you get the time down further than have us close it... I'd expect that if you used match from python STL in 3.10+, then you might see some additional performance for 3.10+. Let's say that we don't want to significantly impact the speed of 3.8+; so if that means things like rearranging some if statements for incremental gains, then do it.

@leogama (Contributor, Author) commented Jul 2, 2022

I understand. I'll see if I can get any more microseconds out of this. The middle column in my last comment is a variation using the match statement in Python 3.10 and 3.11b instead of the match class, and you can see that the performance cost is equivalent. And what is Python STL? Standard Library? (I'm confused, because match is an addition to the language, not to the library.)

@mmckerns (Member) commented Jul 2, 2022

> And what is Python STL? Standard Library? (I'm confused because match is an addition to the language, not to the library.)

Ah... I meant using the python match addition to the language. Ok, now I understand the middle column.

@leogama (Contributor, Author) commented Jul 3, 2022

Edit: put the Python 3.11a case last, as it's the least likely.


Look, if we don't check the code members' types, the average overhead among supported CPython versions goes down to a minimum of 10% for an empty code object (against 53% with type checks) and 6% for a small function (was 29% with type checks). Also, it's apparently faster than the current implementation in PyPy. I haven't committed this version yet.

| Version | Empty code obj | Small function |
|---|---|---|
| python3.7 | 1.06 | 0.97 |
| python3.8 | 1.08 | 1.07 |
| python3.9 | 1.09 | 1.05 |
| python3.10 | 1.09 | 1.12 |
| python3.11b | 1.19 | 1.09 |
| pypy38 | 0.97 | 0.97 |

Ratio between new implementation time and current implementation time.

The only difference in behavior is that, if the code object changes again to a format with the same number of members as a previous version but a different composition, `_create_code()` will raise a TypeError from `CodeType(*args)` instead of a nice `UnpicklingError("pattern match for code object failed")`. Maybe we can live with that. 🤔
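If the friendlier message were ever wanted back without reinstating per-member type checks, the constructor call could simply be wrapped (a sketch under that assumption, not the committed code):

```python
import types
from pickle import UnpicklingError

def create_code(args):
    """Rebuild a code object, translating a constructor failure (e.g. a
    same-length args tuple with a different member composition) back
    into the nicer unpickling error. Illustrative sketch only."""
    try:
        return types.CodeType(*args)
    except TypeError as error:
        raise UnpicklingError(
            "pattern match for code object failed") from error
```

This keeps the cheap length-only dispatch on the happy path and pays for diagnostics only when construction actually fails.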

I think this version is generally faster than native pattern matching (I didn't test it this time), as there are no variable/keyword bindings in the most common case. Anyway, if it's ever necessary to distinguish between code objects of the same size but different member types, the framework to do it is there.

@leogama leogama requested a review from mmckerns July 3, 2022 03:01
@mmckerns mmckerns merged commit dc35b66 into uqfoundation:master Jul 3, 2022
@leogama leogama deleted the matching-backport branch July 3, 2022 12:22