Pickling error with numba gufunc #7929
Comments
Sorry for the late reply. I can reproduce this. Here is another reproducer (note that
import numpy as np
from numba import guvectorize, int64, jit

@jit(nopython=True, cache=True)
def f(x, y):
    return x + y

@guvectorize([(int64[:], int64, int64[:])], '(n),()->(n)', cache=True)
def g(x, y, res):
    for i in range(x.shape[0]):
        res[i] = x[i] + y

from distributed import Client

# `arr` was not defined in the original snippet; any int64 array should do here
arr = np.arange(5, dtype=np.int64)

with Client() as client:
    client.submit(f, 1, 2).result()
    print("first success")
    # The following fails
    client.submit(g, arr, 2).result()
|
Sorry, I believe the error I am reproducing with my example is actually a little different. My example only fails if the
The dask-examples version fails with a different exception. |
Looking into this, I git-bisected it to #7564, but I think that change just brought out an issue that has always been there. It worked before because we didn't try to pickle it. It will also 'fail' in dask without distributed if one tries to pickle the output array from the numba function, i.e.:
import numba
import dask.array as da

@numba.guvectorize(["int8,int8[:]"], "()->()")
def double(x, out):
    out[:] = x * 2

def main():
    x = da.random.randint(0, 127, size=(10, 10, 10), chunks=('1 MB', None, None), dtype='int8')
    y = double(x)

    import pickle
    pickle.dumps(double)  # Can pickle the function directly.

    # But not when the generated function from numba is part of the dask array.
    # Fails w/ PicklingError: Can't pickle <ufunc 'double'>: it's not the same object as __main__.double
    pickle.dumps(y)

    y.max().compute()

if __name__ == '__main__':
    main()
|
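To narrow down which part of a graph refuses to pickle, one option is to try each task on its own. The sketch below is only an illustration on a plain dict; `find_unpicklable` and `toy_graph` are made-up names, and the lambda merely stands in for a callable that pickle cannot look up by name.

```python
import operator
import pickle


def find_unpicklable(graph):
    """Return {key: exception} for every task in a mapping that fails to pickle."""
    failures = {}
    for key, task in graph.items():
        try:
            pickle.dumps(task)
        except Exception as exc:  # pickle can raise several exception types
            failures[key] = exc
    return failures


# Hypothetical toy graph: tuples of (callable, *args), keyed by task name.
toy_graph = {
    "ok": (operator.add, 1, 2),    # builtin, pickled by reference
    "bad": (lambda x: x * 2, 3),   # a lambda cannot be looked up by name
}

failures = find_unpicklable(toy_graph)
for key, exc in failures.items():
    print(f"{key}: {type(exc).__name__}")
```

With a real dask collection, something like `find_unpicklable(dict(y.__dask_graph__()))` might point at the offending task, but that call is an assumption and was not tested against this issue.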
Essentially, I think someone, numba or dask, is generating a bad wrapper around the function, and that only surfaces once it is attached to the graph. For example, this will raise the same error, because we're changing what the name "foo" refers to: pickle thinks the function is on __main__, but it's been 'moved' into the Foo wrapper:
class Foo:
    def __init__(self, func):
        self.func = func

    def __call__(self):
        return self.func()

def decorator(func):
    return Foo(func)

@decorator
def foo():
    pass

def main():
    import pickle
    pickle.dumps(foo)  # Fails w/ same error "...it's not the same object as __main__.foo"

if __name__ == '__main__':
    main()
|
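One way a wrapper like this can stay picklable is to serialize itself as a reference to the global name it replaced, which pickle supports when `__reduce__` returns a string. The sketch below is not what numba or dask do; `NamedFoo`, `naive` and `named` are hypothetical names used only for illustration, and it assumes the snippet runs as a script or an importable module.

```python
import functools
import pickle


class Foo:
    """Naive wrapper as in the example above: the original name gets rebound to it."""

    def __init__(self, func):
        self.func = func

    def __call__(self):
        return self.func()


class NamedFoo(Foo):
    """Hypothetical variant that pickles as a reference to its own global name."""

    def __init__(self, func):
        super().__init__(func)
        # Copies __module__, __name__ and __qualname__ from func onto the wrapper.
        functools.update_wrapper(self, func)

    def __reduce__(self):
        # A string return value makes pickle store only a "module.name" reference.
        return self.__qualname__


@Foo
def naive():
    return 1


@NamedFoo
def named():
    return 2


try:
    pickle.dumps(naive)
    naive_error = None
except pickle.PicklingError as exc:
    naive_error = exc  # same kind of error as in the example above

# The by-name wrapper round-trips to the very same module-level object.
restored = pickle.loads(pickle.dumps(named))
```

The trade-off is that the receiving process must be able to import the same module and find the same name there, which is the usual requirement for pickling functions by reference anyway.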
xref #3450 |
The problem is that a numba gufunc is a pure-python wrapper around a locally-compiled numpy ufunc. When you pickle the wrapper, you trigger serialization code which ships over the wrapped pure-python function and recompiles it upon unpickle:
However, the wrapper undoes itself on
By the time control reaches

Workaround 1
If you just need an elementwise operation, you can use numba.vectorize:
@numba.vectorize(["int8(int8)"])
def double(x):
    return x * 2

x = da.random.randint(...)
y = double(x)
pickle.dumps(y)  # Works

Dynamic vectorize works as well:
@numba.vectorize()
def double(x):
    return x * 2

Workaround 2
If you do need to operate on vectors, you can hack the call as follows:
@numba.guvectorize(["f8,f8[:]"], "()->()")
def double(x, out):
    out[:] = x * 2

x = da.random.randint(...)
y = x.__array_ufunc__(double, "__call__", x)
pickle.dumps(y)  # Works

The Solution
This issue has been around at least since 2019. I did some archaeology and found my own ticket, never resolved, on the numba board: numba/numba#4314
Here I'm suggesting a solution: numba/numba#4314 (comment) |
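The wrapper-versus-ufunc split described in this comment can be modelled with a toy in plain Python. This is not numba's code: `ToyUfunc`, `ToyGUFunc` and `_double_impl` are hypothetical stand-ins, and the only real behaviour borrowed here is that numpy ufuncs pickle by global name. The sketch assumes it is run as a script or importable module.

```python
import pickle


class ToyUfunc:
    """Stand-in for the compiled ufunc (hypothetical): pickles by global name."""

    def __init__(self, py_func, name):
        self.py_func = py_func
        self.__name__ = name
        self.__module__ = py_func.__module__

    def __call__(self, x):
        return self.py_func(x)

    def __reduce__(self):
        # Returning a string asks pickle to store a "module.name" reference.
        return self.__name__


class ToyGUFunc:
    """Stand-in for the pure-Python wrapper (hypothetical)."""

    def __init__(self, py_func, name):
        self.py_func = py_func
        self.name = name
        self.ufunc = ToyUfunc(py_func, name)  # models the local compilation step

    def __call__(self, x):
        return self.ufunc(x)

    def __reduce__(self):
        # Ship the Python function and rebuild ("recompile") on unpickle.
        return (ToyGUFunc, (self.py_func, self.name))


def _double_impl(x):
    return x * 2


# The global name "double" refers to the wrapper, not to the inner ufunc.
double = ToyGUFunc(_double_impl, "double")

wrapped_task = (double, 21)          # a graph that keeps the wrapper
unwrapped_task = (double.ufunc, 21)  # a graph that keeps the inner ufunc

func, arg = pickle.loads(pickle.dumps(wrapped_task))
wrapped_result = func(arg)

try:
    pickle.dumps(unwrapped_task)
    unwrapped_error = None
except pickle.PicklingError as exc:
    unwrapped_error = exc  # "... not the same object as <module>.double"
```

In this model a task that holds the wrapper round-trips, while a task that holds the inner object fails with the "not the same object" error, which matches the symptom reported above.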
I'm trying this example from the dask-examples repo. When computing the graph it gives a PicklingError.
This happens only when using distributed. I launch a cluster on Slurm via ipyparallel's become_dask. Something like this.
Without using distributed the example works fine.
These are the versions of some relevant packages I'm using:
Not sure if this is a bug. Probably it's something in my environment. I found similar serialization errors but nothing recent, only solved issues from before 2022.