Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support recursive and empty closure cells #443

Merged
merged 46 commits into from
Jan 27, 2022

Conversation

anivegesana
Copy link
Contributor

@anivegesana anivegesana commented Dec 11, 2021

This PR uses a better workaround for super objects. Instead of forcing _byref to False, this PR keeps track of what classes are in the process of being created. When a cell whose value is an incomplete class is encountered, it is saved to a list. Its initial value is None to prevent infinite recursion. Once a class is fully created, all of the cells that recursively reference the class are updated with the completed class.

This PR adds generalized machinery to handle thunks by saving a list of postprocessing steps for each potentially recursive object, like changing the value of a cell. The _postproc can be used for other operations, like registering ABCs and updating the __globals__ and __dict__ of recursive functions and enum classes.

This PR also adds a _shims module to handle version incompatibilities in Python and dill more seamlessly.

Fixes #229, fixes #239, fixes #300, fixes #344, fixes #399, fixes #211

@mmckerns
Copy link
Member

mmckerns commented Dec 14, 2021

Yours is a pretty nice solution...
I think something like the following would work in python 2 (and python 3).

def _create_reference_cell():
    contents = [None]
    def updater(value):
        contents[0] = value
    updater(updater)
    return (lambda: contents[0]).__closure__[0]

@anivegesana
Copy link
Contributor Author

Unfortunately, the value of the cell would become the list instead of the element inside the list, but thank you for the suggestion. I was able to use it to make a solution that works for both Python 2 and Python 3 using the vars function to effectively pass the scope of the outer function to an inner class. For reasons I do not understand, this doesn't work if a function is used instead of a class.

def _create_reference_cell():
    contents = None
    v = vars()
    class Updater(object):
        def __call__(self, value):
            v['contents'] = value
    contents = Updater()
    return (lambda: contents).__closure__[0]

Also, what are you plans with regards to compatibility? In Python 3.7+, cell_contents is assignable directly in Python, so there is no need to coerce Python into changing the value of the variable, we can set it directly using setattr as in fce797e. Python 3.6 will reach end of life in 10 days, so we could just go the easy route and use setattr and not introduce any new functions to dill._dill. Your call.

@mmckerns
Copy link
Member

mmckerns commented Dec 14, 2021

I figured that either a list or a dict could serve as a stand-in for nonlocal. Thanks for making the extension to python 2.7. With regard to compatibility, we will sunset 2.7 and 3.6 soon. The plan is to cut a release this month that is 2.7 compatible, and then turn off 2.7 shortly afterward. Turning off 2.7 has been delayed as we have been holding on to support for pypy2.7. I believe the following release, or so, we would then sunset 3.6. We generally need to keep support alive for a EOL version of python as a use of dill is to store pickled objects, so we generally wait for a threshold of other packages to stop supporting an EOL version before we also drop support.

tests/test_recursive.py Outdated Show resolved Hide resolved
tests/test_recursive.py Outdated Show resolved Hide resolved
dill/_dill.py Outdated Show resolved Hide resolved
dill/_dill.py Outdated Show resolved Hide resolved
Copy link
Member

@mmckerns mmckerns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, given the changes requested inline

@anivegesana
Copy link
Contributor Author

Thank you for the help! I'll comb through the issues and see if there are new test cases I need to add. Will let you know when I am done.

@anivegesana
Copy link
Contributor Author

Alright. I did some playing around and found out that creating the pickling of the cell object the way I did does not actually work in Python 2. I just noticed that this issue only applies to Python 3 not Python 2. What I mean is that super is treated totally differently in both Python 2 and Python 3. In Python 3, it is treated as a cell object with a recursive value and in Python 2, it is set as special attributes of the __init__ function:

>>> # Python 2.7
>>> import dill
>>> class A:
...     def __init__(self):
...             super(A, self).__init__()
... 
>>> dill.copy(A)
<class __main__.A at 0x10bb4f7a0>
>>> A
<class __main__.A at 0x10bb4f6d0>
>>> dir(A.__init__)
['__call__', '__class__', '__cmp__', '__delattr__', '__doc__', '__format__', '__func__', '__get__', '__getattribute__', '__hash__', '__init__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__self__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'im_class', 'im_func', 'im_self']
>>> A.__init__.im_class
<class __main__.A at 0x10bb4f6d0>

Because of this, we do not need a copy of _create_reference_cell for Python 2. Having one for Python 3 is sufficient.

@anivegesana
Copy link
Contributor Author

Also, I noticed that a PicklingWarning is thrown warning that The byref setting is on..., but the invoking code doesn't result in an error, and instead passes the test. For example, try the klepto tests (seen in at least python 3.7). I assumed that the intent of the warning was to be more informative for the error that is expected to follow. However, if the warning is thrown "often" and there's still a pickling "success"... is this a good thing?

The warning was appearing whenever byref was used instead of when the object was recursive. It should be fixed now and the message should only show up in cases that cause UnpicklingErrors when the pickle is loaded (both in current and older versions of dill.)

@mmckerns
Copy link
Member

I don't feel like there's anything else needed for this particular PR. Please follow up (with at least a comment) on each conversation above where you have been tagged... and let's get this merged.

@anivegesana
Copy link
Contributor Author

Alright. LGTM.

@mmckerns
Copy link
Member

Thanks for the follow up, the change looks good. Can you comment on these two dangling threads?

  1. The @move_to decorator must be paired with the overloading by the Getattr later in the module, correct? It seems like this could be consolidated. It's worth thinking on that a bit. Is it reasonable to punt and open a new ticket for their refactor (i.e. there should be no impact to using them as is, and then refactoring later)?
  2. Do you think there is any impact to the usability of old pickles that relied on the _super hack you fixed (i.e. would this potentially need a shims entry)?

@anivegesana
Copy link
Contributor Author

  1. Sorry about that. Got lost in the giant maze of threads we have here.
  2. There is no need for a shims entry for the _super hack and older pickles that relied on it still load. The reference cycle that it tried to prevent (class -> __init__ function -> closure cell -> class) is already broken in a different place (save_cell) instead of save_function, so the hack is now reduntant. The _super hack was only used in the pickler side (in the save_function). The load functions were not touched, so they will still be able to handle the older pickles that were already created without any shims.

Copy link
Member

@mmckerns mmckerns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@mmckerns
Copy link
Member

  • Sorry about that. Got lost in the giant maze of threads we have here.

No problem. Feel free to merge. I'll open up a new ticket for 'explorations' on (1) above.

@anivegesana
Copy link
Contributor Author

Sounds good! I cannot merge the branch because I lack write access to the repo, but I think it is ready for you to merge. I have a couple of other PRs that I am opening now with more changes that were not related to this PR.

@mmckerns
Copy link
Member

Riiiiight. After this long PR, it feels like you should be able to merge. I'll do it now.

@mmckerns mmckerns merged commit 0392e14 into uqfoundation:master Jan 27, 2022
mmckerns added a commit to uqfoundation/mystic that referenced this pull request Mar 12, 2022
@mmckerns
Copy link
Member

@anivegesana: see interesting new failure case in f6f71f5

@anivegesana
Copy link
Contributor Author

anivegesana commented Mar 12, 2022

Yep. I thought that this particular recursion would be uncommon. I only handled the recursion that goes (function/class -> ... -> cell -> function/class). This recursion is (function/class -> ... -> list/dict/set -> function/class). This can be easily fixed by checking that no elements in the dictionary are recursive before picking the dictionary, and adding the dictionary to pickler._postproc. The reason that I didn't address this was that I couldn't find a solution for (function/class -> ... -> list/dict/set/cell -> (tuple/frozenset/object ->)+ function/class), so I didn't do it with the other collections either, in case if there was a more elegant solution to handle all 5, instead of creating placeholder elements and updating them later.

Will look into this later next week. Thanks for keeping me posted!

@mmckerns
Copy link
Member

mmckerns commented Mar 12, 2022

Thanks for the follow-up. To me, placeholder elements are not a bad solution... I've used it several times in _dill. Can you open a new issue and add the details, as above? It will help me get up to speed on it.

@mmckerns
Copy link
Member

Also, @anivegesana, just FYI for the future -- intentionally removing functionality should involve an in-depth management decision. In a package with M/2 dl/day, uncommon cases are going to be encountered regularly. So, I want to be clear here, your PR is superb... and I'd also appreciate you not intentionally dropping corner cases without discussion. I guess there should be tests that fully cover the expected level of functionality, but there's obviously not if I generally run the multiprocess and pathos test suites upon major changes to dill...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment