Address latest pandas-related upstream test failures #9081

spencerkclark · 2024-06-09T16:36:55Z

This PR addresses the upstream failures described in #8844 (comment) with a few minor changes to ensure that, for the time being, nanosecond-precision times continue to be used in xarray. These failures stem from pandas-dev/pandas#55901, which causes pandas.to_datetime to infer the precision to use based on its input instead of always using nanosecond precision.

See the review comments for an explanation of the changes.

spencerkclark · 2024-06-09T17:49:42Z

xarray/tests/test_variable.py

+        (datetime(2000, 1, 1), has_pandas_3),
+        (np.array([datetime(2000, 1, 1)]), has_pandas_3),


With pandas 3, pd.Series(datetime.datetime(...)) will produce a Series with np.datetime64[us] values instead of np.datetime64[ns] values, so this conversion now warns.

spencerkclark · 2024-06-09T17:52:11Z

xarray/tests/test_groupby.py

-    dd = times.to_pydatetime()
-    reference_dates = [dd[0], dd[2]]
+    reference_dates = [times[0], times[2]]


As far as I can tell, whether reference_dates started as datetime.datetime objects or np.datetime64[ns] values was not material to this test, so I removed the conversion to datetime.datetime to avoid the conversion warning under pandas 3 (the times would previously get converted back to datetime64[ns] values in the DataArray constructor).

spencerkclark · 2024-06-09T17:55:26Z

xarray/tests/test_dataarray.py

-        roundtripped = DataArray.from_dict(da.to_dict())
+        with warnings.catch_warnings():
+            warnings.filterwarnings("ignore", message="Converting non-nanosecond")
+            roundtripped = DataArray.from_dict(da.to_dict())


da.to_dict() produces datetime.datetime objects, which under pandas 3 lead to a conversion warning in the DataArray constructor.

If we have this pattern in multiple modules, it might be worth adding the code as a dedicated context manager in xarray.tests.__init__. Something like this might work (I didn't check):

    from contextlib import contextmanager
    import warnings

    @contextmanager
    def ignore_warnings(category=None, pattern=None):
        if category is None and pattern is None:
            raise ValueError("need at least one of category and pattern")

        try:
            with warnings.catch_warnings():
                warnings.filterwarnings("ignore", message=pattern, category=category)
                yield
        finally:
            pass

Thanks. I ended up switching to marking these tests with @pytest.mark.filterwarnings("ignore:Converting non-nanosecond"), since that is a pattern we already use elsewhere in the tests.
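For reference, here is a minimal runnable sketch of the context manager suggested above. It is not code from the PR, which used the pytest marker instead. As far as I can tell, the snippet as posted would fail when only one argument is given, because warnings.filterwarnings expects a string for message and a Warning subclass for category. The sketch substitutes the defaults "" and Warning in that case. The helper name ignore_warnings comes from the suggestion; roundtrip_like_operation is a hypothetical stand-in for the code that warns.

```python
import warnings
from contextlib import contextmanager


@contextmanager
def ignore_warnings(category=None, pattern=None):
    """Suppress warnings matching a category and/or a message pattern."""
    if category is None and pattern is None:
        raise ValueError("need at least one of category and pattern")
    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore",
            # filterwarnings needs a str and a Warning subclass, so fall
            # back to "match anything" values when an argument is omitted.
            message=pattern if pattern is not None else "",
            category=category if category is not None else Warning,
        )
        yield


def roundtrip_like_operation():
    # Hypothetical stand-in for code that emits the conversion warning.
    warnings.warn("Converting non-nanosecond precision datetime values", UserWarning)
    return "done"


# Record warnings to show that the helper only suppresses inside its block.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    with ignore_warnings(pattern="Converting non-nanosecond"):
        result = roundtrip_like_operation()
    suppressed_count = len(caught)
    roundtrip_like_operation()
    unsuppressed_count = len(caught)
```

The pattern is matched against the start of the warning message, which is the same matching rule the pytest marker's message field uses.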

spencerkclark · 2024-06-09T17:56:59Z

xarray/tests/test_plot.py

-        darray = DataArray(data, dims=["time"])
-        darray.coords["time"] = np.array([datetime(2017, m, 1) for m in month])
+        times = pd.date_range(start="2017-01-01", freq="ME", periods=12)
+        darray = DataArray(data, dims=["time"], coords=[times])


Use of datetime.datetime objects was immaterial to this test, so we use pd.date_range to produce the dates instead to avoid the non-nanosecond conversion warning.

spencerkclark · 2024-06-09T17:58:22Z

xarray/tests/test_variable.py

+            with warnings.catch_warnings():
+                warnings.filterwarnings("ignore", message="Converting non-nanosecond")
+                expected = self.cls("t", dates)


This is needed since dates sometimes consists of datetime.datetime objects, which leads to a conversion warning under pandas 3.

spencerkclark · 2024-06-09T18:00:43Z

xarray/tests/test_backends.py

@@ -529,7 +529,7 @@ def test_roundtrip_string_encoded_characters(self) -> None:
            assert actual["x"].encoding["_Encoding"] == "ascii"

    def test_roundtrip_numpy_datetime_data(self) -> None:
-        times = pd.to_datetime(["2000-01-01", "2000-01-02", "NaT"])
+        times = pd.to_datetime(["2000-01-01", "2000-01-02", "NaT"], unit="ns")


pandas.to_datetime will infer the precision from the input in pandas 3, so we explicitly specify the desired precision now.
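As an illustration of pinning the precision independently of what pandas infers, the sketch below converts the parsed index to nanoseconds after the fact. This is not the change made in the PR. It assumes pandas 2.0 or later, where DatetimeIndex.as_unit is available, and the variable names are chosen for the example only.

```python
import pandas as pd

# Parse the same inputs as the test; the resulting precision may depend
# on the installed pandas version.
times = pd.to_datetime(["2000-01-01", "2000-01-02", "NaT"])

# Explicitly convert to nanosecond precision (assumes pandas >= 2.0).
times_ns = times.as_unit("ns")

dtype_name = str(times_ns.dtype)
missing_mask = times_ns.isna().tolist()
```

After the conversion the dtype no longer depends on the inferred precision, and the NaT entry is preserved as a missing value.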

xarray/tests/test_combine.py

keewis

Thanks for the quick fixes! This looks good to me.

I know I have not been doing that either for the numpy>=2 changes, but I wonder if we should add a whats-new entry (internal changes)?

dcherian · 2024-06-10T15:49:37Z

Thanks @spencerkclark

* Address pandas-related upstream test failures
* Address more warnings
* Don't lose coverage for pandas < 3
* Address one more warning
* Fix accidental change from MS to ME
* Use datetime64[ns] arrays
* Switch to @pytest.mark.filterwarnings

* don't remove `netcdf4` from the upstream-dev environment
* also stop removing `h5py` and `hdf5`
* hard-code the precision (I believe this was missed in #9081)
* don't remove `h5py` either
* use on-disk _FillValue as standard expects, use view instead of cast to prevent OverflowError.
* whats-new
* unpin `numpy`
* rework UnsignedCoder
* add test
* Update xarray/coding/variables.py

Co-authored-by: Justus Magin <keewis@users.noreply.github.com>
Co-authored-by: Kai Mühlbauer <kai.muehlbauer@uni-bonn.de>
Co-authored-by: Kai Mühlbauer <kmuehlbauer@wradlib.org>
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>

Address pandas-related upstream test failures

891fd6e

spencerkclark added the run-upstream Run upstream CI label Jun 9, 2024

spencerkclark force-pushed the upstream-failures-2024-06-09 branch 2 times, most recently from ed23adc to ad164b7 Compare June 9, 2024 17:30

Address more warnings

bd875d3

spencerkclark force-pushed the upstream-failures-2024-06-09 branch from ad164b7 to bd875d3 Compare June 9, 2024 17:39

spencerkclark added 2 commits June 9, 2024 13:48

Don't lose coverage for pandas < 3

616c179

Address one more warning

1a3bdf6

spencerkclark commented Jun 9, 2024

Fix accidental change from MS to ME

334d118

keewis mentioned this pull request Jun 10, 2024

Fix upcasting with python builtin numbers and numpy 2 #8946

Merged

4 tasks

spencerkclark added 3 commits June 10, 2024 07:39

Use datetime64[ns] arrays

36a005a

Switch to @pytest.mark.filterwarnings

85c95a1

Merge branch 'main' into upstream-failures-2024-06-09

d181b97

keewis approved these changes Jun 10, 2024

dcherian merged commit ef709df into pydata:main Jun 10, 2024
26 of 28 checks passed

spencerkclark deleted the upstream-failures-2024-06-09 branch June 11, 2024 01:10

This was referenced Jun 12, 2024

⚠️ Nightly upstream-dev CI failed ⚠️ #9098

Closed

release notes for 2024.06.0 #9092

Merged

keewis added a commit to keewis/xarray that referenced this pull request Jun 19, 2024

hard-code the precision (I believe this was missed in pydata#9081)

429f87d
