Some unit tests failing with xarray 2024.3.0 #958

Closed
pont-us opened this issue Apr 2, 2024 · 2 comments · Fixed by #959

pont-us commented Apr 2, 2024

See e.g. https://ci.appveyor.com/project/bcdev/xcube/builds/49524952/job/6jrh4bi9r3o9unpj.

FAILED test/core/store/fs/test_registry.py::NewCubeDataTestMixin::test_open_unpacked - AssertionError: <class 'numpy.float32'> != dtype('float64')
FAILED test/core/store/fs/test_registry.py::FileFsDataStoresTest::test_mldataset_levels - AssertionError: <class 'numpy.float32'> != dtype('float64')
FAILED test/core/store/fs/test_registry.py::MemoryFsDataStoresTest::test_mldataset_levels - AssertionError: <class 'numpy.float32'> != dtype('float64')
FAILED test/core/store/fs/test_registry.py::S3FsDataStoresTest::test_mldataset_levels - AssertionError: <class 'numpy.float32'> != dtype('float64')
FAILED test/core/test_timeslice.py::TimeSliceTest::test_append_time_slice - ValueError: Specified zarr chunks encoding['chunks']=(180, 2) for variable named 'lat_bnds' would overlap multiple dask chunks ((90, 90), (2,)). Writing this array in parallel with dask could lead to corrupted data. Consider either rechunking using `chunk()`, deleting or modifying `encoding['chunks']`, or specify `safe_chunks=False`.

pont-us commented Apr 2, 2024

The <class 'numpy.float32'> != dtype('float64') failures are due to pydata/xarray#8713 fixing pydata/xarray#2304 in xarray 2024.3.0. In summary:

  • The failures were from tests using a Zarr with a float variable encoded as int16.
  • Previously, xarray produced a float32 when decoding this variable from the Zarr.
  • In the issue I linked above, it was noted that the NetCDF standard says: "When packed data is read, it should be unpacked to the type of the scale_factor and add_offset attributes, which must have the same type if both are present." xarray 2024.3.0 now implements this behaviour.
  • In a Zarr, these attributes are stored in a JSON file which represents them as decimal numbers without an associated dtype.
  • When xarray reads the Zarr, the attribute values are read as native Python floats (which are 64-bit) and then converted to NumPy floats, which are therefore float64.
  • Per the NetCDF standard, this np.float64 type is then used for the actual variable data.

So as far as I can see, a variable in a Zarr with scale_factor and add_offset encoding attributes will from now on always be read as float64.
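
To illustrate, a minimal sketch of the new behaviour (file and variable names invented here, not taken from our tests):

```python
import numpy as np
import xarray as xr

# Write a float variable packed as int16 with scale/offset encoding.
ds = xr.Dataset({"temp": ("x", np.array([12.5, 13.0, 13.5]))})
ds.to_zarr(
    "packed.zarr",
    mode="w",
    encoding={
        "temp": {
            "dtype": "int16",
            "scale_factor": 0.5,  # stored in .zattrs as a plain JSON number
            "add_offset": 10.0,
            "_FillValue": -9999,
        }
    },
)

# The attributes come back from JSON as native (64-bit) Python floats, so
# per the NetCDF unpacking rule the decoded variable is now float64.
decoded = xr.open_zarr("packed.zarr")
print(decoded["temp"].dtype)  # float64 with xarray >= 2024.3.0 (float32 before)
```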


pont-us commented Apr 2, 2024

The append failure is due to pydata/xarray#8459 fixing pydata/xarray#8882. append_time_slice unchunks coordinate variables after every append, which breaks the next append, since the coordinates of the incoming slice are still chunked and now conflict with the stored full-size chunk encoding.
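
Here is a minimal sketch (data invented, not taken from the test) of the kind of conflict behind the failure, along with the workarounds the error message suggests:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    coords={
        "lat": np.arange(180.0),
        "lat_bnds": (("lat", "bnds"), np.zeros((180, 2))),
    }
).chunk({"lat": 90})

# Simulate the stale, full-size chunk encoding left behind by a previous
# unchunking step:
ds["lat_bnds"].encoding["chunks"] = (180, 2)

# ds.to_zarr("conflict.zarr", mode="w")  # raises the ValueError quoted above

# Workarounds named in the error message: drop/modify the stale encoding ...
ds["lat_bnds"].encoding.pop("chunks")
# ... or rechunk to match it: ds = ds.chunk({"lat": 180})
ds.to_zarr("conflict.zarr", mode="w")
```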

pont-us added a commit that referenced this issue Apr 2, 2024
We now expect float64, not float32, when decoding and scaling variables
from a Zarr. See Issue #958.
pont-us added a commit that referenced this issue Apr 3, 2024
- Add a data_vars_only parameter to chunk_dataset and
  update_dataset_chunk_encoding.

- Update test_append_time_slice to use this parameter in order to
  ensure compatibility with xarray 2024.3.0.

Addresses Issue #958.
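
A hedged sketch of the fix's intent; only chunk_dataset and the new data_vars_only parameter are taken from the commit message above, and the rest of the call (module path, other arguments) is assumed:

```python
import xarray as xr
from xcube.core.chunk import chunk_dataset  # assumed module path

def rechunk_for_append(cube: xr.Dataset) -> xr.Dataset:
    # Rechunk only the data variables; coordinate variables and their chunk
    # encodings are left untouched, so the next append still lines up.
    return chunk_dataset(
        cube,
        chunk_sizes={"time": 1},  # illustrative chunking
        format_name="zarr",
        data_vars_only=True,      # the parameter added by this fix
    )
```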
pont-us self-assigned this Apr 3, 2024