test_repr_multiindex* and test_array_repr_dtypes_unix tests failing on 32-bit platforms #9127

Closed
2 of 5 tasks
mgorny opened this issue Jun 14, 2024 · 7 comments · Fixed by #9128
Labels: bug, needs triage (issue that has not been reviewed by an xarray team member)

Comments

@mgorny
Contributor

mgorny commented Jun 14, 2024

What happened?

When running the test suite on 32-bit platforms (e.g. x86), I'm getting the following test failures:

FAILED xarray/tests/test_dataarray.py::TestDataArray::test_repr_multiindex - ...
FAILED xarray/tests/test_dataarray.py::TestDataArray::test_repr_multiindex_long
FAILED xarray/tests/test_dataset.py::TestDataset::test_repr_multiindex - Asse...
FAILED xarray/tests/test_formatting.py::test_array_repr_dtypes_unix - Asserti...

(log below)

I think the tests assume specific object sizes for 64-bit platforms.

This is Python 3.11.9 on 32-bit x86, retested on 380979f.
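
The sizes in the log are consistent with that reading. A NumPy object array stores one pointer per element, so its byte count scales with the platform pointer size, while `int64` data stays at 8 bytes per element. The sketch below redoes that arithmetic with the standard library only; `repr_sizes` is a hypothetical helper written for illustration, not part of xarray or its test suite.

```python
import struct


def repr_sizes(pointer_size: int) -> dict[str, int]:
    """Byte counts that show up in the failing reprs for a given pointer size."""
    obj_4 = 4 * pointer_size    # object coordinate with 4 elements
    obj_32 = 32 * pointer_size  # object coordinate with 32 elements
    int64_4 = 4 * 8             # int64 coordinate with 4 elements, platform independent
    return {
        "multiindex_4": obj_4,
        "multiindex_32": obj_32,
        # Dataset total: two object coordinates (x, level_1) plus one int64 coordinate
        "dataset_total": 2 * obj_4 + int64_4,
    }


print(repr_sizes(4))  # 32-bit: {'multiindex_4': 16, 'multiindex_32': 128, 'dataset_total': 64}
print(repr_sizes(8))  # 64-bit: {'multiindex_4': 32, 'multiindex_32': 256, 'dataset_total': 96}
print(repr_sizes(struct.calcsize("P")))  # whatever the running interpreter uses
```

The 32-bit values (16B, 128B, 64B) match the actual output in the log, and the 64-bit values (32B, 256B, 96B) match the hard-coded expectations.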

What did you expect to happen?

Tests passing.

Minimal Complete Verifiable Example

# on a 32-bit architecture, literally:
python -m pytest

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

=================================== FAILURES ===================================
______________________ TestDataArray.test_repr_multiindex ______________________

self = <xarray.tests.test_dataarray.TestDataArray object at 0xe9e027f0>

    def test_repr_multiindex(self) -> None:
        expected = dedent(
            """\
            <xarray.DataArray (x: 4)> Size: 32B
            array([0, 1, 2, 3], dtype=uint64)
            Coordinates:
              * x        (x) object 32B MultiIndex
              * level_1  (x) object 32B 'a' 'a' 'b' 'b'
              * level_2  (x) int64 32B 1 2 1 2"""
        )
>       assert expected == repr(self.mda)
E       AssertionError: assert '<xarray.Data...4 32B 1 2 1 2' == '<xarray.Data...4 32B 1 2 1 2'
E         
E         Skipping 97 identical leading characters in diff, use -v to show
E         Skipping 43 identical trailing characters in diff, use -v to show
E         - x) object 16B MultiIndex
E         ?           ^^
E         + x) object 32B MultiIndex
E         ?           ^^...
E         
E         ...Full output truncated (4 lines hidden), use '-vv' to show

/tmp/xarray/xarray/tests/test_dataarray.py:122: AssertionError
___________________ TestDataArray.test_repr_multiindex_long ____________________

self = <xarray.tests.test_dataarray.TestDataArray object at 0xe9e029d0>

    def test_repr_multiindex_long(self) -> None:
        mindex_long = pd.MultiIndex.from_product(
            [["a", "b", "c", "d"], [1, 2, 3, 4, 5, 6, 7, 8]],
            names=("level_1", "level_2"),
        )
        mda_long = DataArray(
            list(range(32)), coords={"x": mindex_long}, dims="x"
        ).astype(np.uint64)
        expected = dedent(
            """\
            <xarray.DataArray (x: 32)> Size: 256B
            array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
                   17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31],
                  dtype=uint64)
            Coordinates:
              * x        (x) object 256B MultiIndex
              * level_1  (x) object 256B 'a' 'a' 'a' 'a' 'a' 'a' ... 'd' 'd' 'd' 'd' 'd' 'd'
              * level_2  (x) int64 256B 1 2 3 4 5 6 7 8 1 2 3 4 ... 5 6 7 8 1 2 3 4 5 6 7 8"""
        )
>       assert expected == repr(mda_long)
E       AssertionError: assert '<xarray.Data...2 3 4 5 6 7 8' == '<xarray.Data...2 3 4 5 6 7 8'
E         
E         Skipping 228 identical leading characters in diff, use -v to show
E         Skipping 124 identical trailing characters in diff, use -v to show
E         - x) object 128B MultiIndex
E         ?           - ^
E         + x) object 256B MultiIndex
E         ?            ^^...
E         
E         ...Full output truncated (4 lines hidden), use '-vv' to show

/tmp/xarray/xarray/tests/test_dataarray.py:143: AssertionError
_______________________ TestDataset.test_repr_multiindex _______________________

self = <xarray.tests.test_dataset.TestDataset object at 0xe9d08170>

    def test_repr_multiindex(self) -> None:
        data = create_test_multiindex()
        expected = dedent(
            """\
            <xarray.Dataset> Size: 96B
            Dimensions:  (x: 4)
            Coordinates:
              * x        (x) object 32B MultiIndex
              * level_1  (x) object 32B 'a' 'a' 'b' 'b'
              * level_2  (x) int64 32B 1 2 1 2
            Data variables:
                *empty*"""
        )
        actual = "\n".join(x.rstrip() for x in repr(data).split("\n"))
        print(actual)
>       assert expected == actual
E       AssertionError: assert '<xarray.Data...\n    *empty*' == '<xarray.Data...\n    *empty*'
E         
E         Skipping 71 identical trailing characters in diff, use -v to show
E         - <xarray.Dataset> Size: 64B
E         ?                         -
E         + <xarray.Dataset> Size: 96B
E         ?                        +
E           Dimensions:  (x: 4)...
E         
E         ...Full output truncated (9 lines hidden), use '-vv' to show

/tmp/xarray/xarray/tests/test_dataset.py:351: AssertionError
----------------------------- Captured stdout call -----------------------------
<xarray.Dataset> Size: 64B
Dimensions:  (x: 4)
Coordinates:
  * x        (x) object 16B MultiIndex
  * level_1  (x) object 16B 'a' 'a' 'b' 'b'
  * level_2  (x) int64 32B 1 2 1 2
Data variables:
    *empty*
_________________________ test_array_repr_dtypes_unix __________________________

    @pytest.mark.skipif(
        ON_WINDOWS,
        reason="Default numpy's dtypes vary according to OS",
    )
    def test_array_repr_dtypes_unix() -> None:
    
        # Signed integer dtypes
    
        ds = xr.DataArray(np.array([0]), dims="x")
        actual = repr(ds)
        expected = """
    <xarray.DataArray (x: 1)> Size: 8B
    array([0])
    Dimensions without coordinates: x
            """.strip()
>       assert actual == expected
E       AssertionError: assert '<xarray.Data...oordinates: x' == '<xarray.Data...oordinates: x'
E         
E         Skipping 37 identical trailing characters in diff, use -v to show
E         - <xarray.DataArray (x: 1)> Size: 8B
E         ?                                 ^
E         + <xarray.DataArray (x: 1)> Size: 4B
E         ?                                 ^
E           array([

/tmp/xarray/xarray/tests/test_formatting.py:1090: AssertionError

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: 380979f
python: 3.11.9 (main, May 6 2024, 20:29:08) [GCC 13.2.1 20240210]
python-bits: 32
OS: Linux
OS-release: 6.9.4-gentoo-dist
machine: x86_64
processor: AMD Ryzen 5 3600 6-Core Processor
byteorder: little
LC_ALL: None
LANG: C.UTF8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2024.6.0
pandas: 2.2.2
numpy: 1.26.4
scipy: 1.13.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
zarr: None
cftime: 1.6.4
nc_time_axis: None
iris: None
bottleneck: 1.4.0rc5
dask: None
distributed: None
matplotlib: 3.9.0
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 69.5.1
pip: None
conda: None
pytest: 8.2.2
mypy: None
IPython: None
sphinx: None

@mgorny mgorny added the bug and needs triage labels Jun 14, 2024
@max-sixty
Collaborator

I see some of those tests are skipped on Windows; possibly we should also check whether the platform is 32-bit and skip on those?

@mgorny
Contributor Author

mgorny commented Jun 14, 2024

Either that, or adjusting expected sizes depending on the platform.

@max-sixty
Collaborator

These are looking at the reprs, so adjusting the expectations isn't easy.

We would definitely take a PR to skip them based on the platform though!

@mgorny
Contributor Author

mgorny commented Jun 14, 2024

Does skipping based on the value of sys.maxsize sound about right? I think that's the simplest way of determining whether we're dealing with a 64-bit platform.

@dcherian
Contributor

That seems to be the recommended way
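
A minimal sketch of the `sys.maxsize` check discussed here, using only the standard library. `IS_64BIT` and the test class are hypothetical names for illustration, not code from xarray. With pytest, the same condition could be passed to `pytest.mark.skipif(not IS_64BIT, reason=...)`.

```python
import sys
import unittest

# sys.maxsize is 2**31 - 1 on 32-bit CPython builds and 2**63 - 1 on 64-bit ones.
IS_64BIT = sys.maxsize > 2**32


class ReprSizeTests(unittest.TestCase):
    @unittest.skipUnless(IS_64BIT, "expected repr sizes assume 64-bit pointers")
    def test_pointer_sized_repr(self) -> None:
        # Placeholder body: a real test would compare a repr against
        # expectations written for a 64-bit platform.
        self.assertGreater(sys.maxsize, 2**32)
```

Skipping keeps the expectations readable, at the cost of leaving the repr code untested on 32-bit builds.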

@keewis
Collaborator

keewis commented Jun 14, 2024

We can also try hard-coding the dtype for those variables. Since the difference is in the repr, the default dtype of the platform is not actually important.
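
To illustrate why pinning the dtype makes the size predictable, the sketch below compares a fixed-width 64-bit integer with the platform's C `long`, which NumPy 1.x uses as its default integer type. It uses `ctypes` as a stand-in for NumPy, and `expected_size_label` is a hypothetical helper, not an xarray function. Note that this only addresses the integer case: object arrays hold pointers, so their size still follows the platform.

```python
import ctypes

FIXED_WIDTH = ctypes.sizeof(ctypes.c_int64)  # always 8 bytes
NATIVE_LONG = ctypes.sizeof(ctypes.c_long)   # 4 or 8 bytes, depending on platform and ABI


def expected_size_label(n_elements: int, itemsize: int) -> str:
    """Format a byte count the way the reprs in the log show small sizes."""
    return f"Size: {n_elements * itemsize}B"


print(expected_size_label(1, FIXED_WIDTH))  # Size: 8B on every platform
print(expected_size_label(1, NATIVE_LONG))  # varies with the platform
```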

@mgorny
Contributor Author

mgorny commented Jun 15, 2024

Actually, fixing expectations doesn't seem that hard, so I'm going to try doing that first. If you don't like that, we can look into other solutions.

mgorny added a commit to mgorny/xarray that referenced this issue Jun 15, 2024
Adjust the expectations in repr tests to account for different object
sizes and numpy type representations across platforms, particularly
fixing the tests on 32-bit platforms.

Firstly, this involves getting the object type size from NumPy and using
it to adjust the expectations in DataArray and Dataset tests.  The tests
were already using int64 type consistently, so only the sizes used
for Python objects needed to be adjusted.

Secondly, this involves fixing `test_array_repr_dtypes_unix`.  The test
specifically focuses on testing a 32-bit, 64-bit and "native" data type,
which affect both size and actual representation (NumPy skips the dtype
attribute for the native data type).  Get the expected size from NumPy
for the native int type, and reuse `repr()` from NumPy for all array
types.
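
The first part of that approach can be sketched as follows. This is an illustration under stated assumptions, not the code from the commit: `OBJECT_ITEMSIZE` uses `struct.calcsize("P")` as a standard-library stand-in for the object item size the commit takes from NumPy, and `expected_multiindex_repr` is a hypothetical helper.

```python
import struct
from textwrap import dedent

# Pointer size in bytes: 4 on 32-bit builds, 8 on 64-bit builds.
OBJECT_ITEMSIZE = struct.calcsize("P")


def expected_multiindex_repr() -> str:
    """Build the expected repr with object sizes computed for this platform."""
    obj_bytes = 4 * OBJECT_ITEMSIZE  # four elements per object coordinate
    return dedent(
        f"""\
        <xarray.DataArray (x: 4)> Size: 32B
        array([0, 1, 2, 3], dtype=uint64)
        Coordinates:
          * x        (x) object {obj_bytes}B MultiIndex
          * level_1  (x) object {obj_bytes}B 'a' 'a' 'b' 'b'
          * level_2  (x) int64 32B 1 2 1 2"""
    )


print(expected_multiindex_repr())
```

Only the two object lines are parameterized; the `uint64` data and the `int64` level keep their fixed 32B because their item size does not depend on the platform.
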
max-sixty pushed a commit that referenced this issue Jun 16, 2024
* adjust repr tests to account for different platforms (#9127)

* Try combining Unix and Windows dtype repr tests
dcherian added a commit to dcherian/xarray that referenced this issue Jun 21, 2024
* main:
  Split out distributed writes in zarr docs (pydata#9132)
  Update zendoo badge link (pydata#9133)
  Support duplicate dimensions in `.chunk` (pydata#9099)
  Bump the actions group with 2 updates (pydata#9130)
  adjust repr tests to account for different platforms (pydata#9127) (pydata#9128)
dcherian added a commit that referenced this issue Jul 24, 2024
* main: (48 commits)
  Add test for #9155 (#9161)
  Remove mypy exclusions for a couple more libraries (#9160)
  Include numbagg in type checks (#9159)
  Improve zarr chunks docs (#9140)
  groupby: remove some internal use of IndexVariable (#9123)
  Improve `to_zarr` docs (#9139)
  Split out distributed writes in zarr docs (#9132)
  Update zendoo badge link (#9133)
  Support duplicate dimensions in `.chunk` (#9099)
  Bump the actions group with 2 updates (#9130)
  adjust repr tests to account for different platforms (#9127) (#9128)
  Grouper refactor (#9122)
  Update docstring in api.py for open_mfdataset(), clarifying "chunks" argument (#9121)
  Add test for rechunking to a size string (#9117)
  Move Sphinx directives out of `See also` (#8466)
  new whats-new section (#9115)
  release v2024.06.0 (#9113)
  release notes for 2024.06.0 (#9092)
  [skip-ci] Try fixing hypothesis CI trigger (#9112)
  Undo custom padding-top. (#9107)
  ...