Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pickling fails for immutable objects that occur in recursive structures #458

Closed
anivegesana opened this issue Mar 17, 2022 · 5 comments
Closed
Labels
Milestone

Comments

@anivegesana
Copy link
Contributor

Minimal failure case:

def f():
  d = {}
  def g():
    return d
  d['g'] = g
  return g

import dill
dill.dumps(f())

g.__closure__ contains g, but not directly. Currently, save_cell only breaks cycles that directly contain the function, not collections or objects that contain the function.

CloudPickle's solution of breaking the cycles within functions as opposed to within cells (like dill does) may be undesirable in some situations:

import cloudpickle

class C:
  def __init__(self, fx):
    self.fx = fx
    print(self.fx.d)
  def __reduce__(self):
    return (C, (self.fx,))

def f():
  t = ((lambda: t),)
  t[0].d = 0
  t[0].c = C(t[0])
  return t

pkl = cloudpickle.dumps(f())
cloudpickle.loads(pkl)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in __init__
AttributeError: 'function' object has no attribute 'd'

Therefore, a more dynamic cycle breaking mechanism would be preferred, but tradeoffs of each method should be deliberated.

Another example:

uqfoundation/mystic@f6f71f5

@mmckerns mmckerns added the bug label Mar 18, 2022
@mmckerns
Copy link
Member

mmckerns commented Apr 9, 2022

@anivegesana: I'm really hesitant to put out a new release that removes functionality, however we are at that point after your PR #443. What are your plans for this issue? I may have to roll back #443, or try to fix this issue, before I put out the next release. I've been delaying it in hopes this gets resolved. Essentially, it's my fault that I didn't catch you'd removed functionality in #443. However, let me know what your plans are with addressing this, or if I need to do it.

@anivegesana
Copy link
Contributor Author

anivegesana commented Apr 9, 2022

I was unable to find any significant cases where cloudpickle's approach (break cycles in the functions) fails. The only two cases that their approach fails are ones where the __closure__ member of a function is used in its own definition and ones that depend on the insertion order of __dict__ (like the one I mentioned above.) Neither case would appear in normal Python code.

I guess I only need to know if it is fine to exclude these two rarer edge cases so that we can adopt this simpler solution. This, a complete solution, or not making any new changes will not remove functionality since this feature didn't work before #443. The only changes of a more complete solution would be reimplementations of save_list, save_dict, save_tuple, etc. that can gracefully handle RecursionErrors, so adding a more complete solution later wouldn't break backwards compatibility since none of the Unpickler functions would have to be removed or added.

The implementation of the simpler solution is at #461.

@mmckerns
Copy link
Member

sounds like a good plan, thanks for the quick follow-up

@mmckerns
Copy link
Member

@anivegesana: What do you want to do with this issue in light of #461 and #448?

@anivegesana
Copy link
Contributor Author

I think we can close it. Although there are some objects that cannot be handled by the PR, a custom __reduce__ function can be written to manually break the circularity. Another issue can be opened if this is not the case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants