Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add days_in_year and decimal_year to dt accessor #9105

Merged
merged 20 commits into from
Sep 10, 2024

Conversation

aulemahal
Copy link
Contributor

@aulemahal aulemahal commented Jun 12, 2024

  • Closes #xxxx
  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst
  • New functions/methods are listed in api.rst

Changes:

  • Added is_leap_year to the CFTimeIndex.
  • Added two dt accessors:
    • days_in_year : the number of days in the year of the datetime, based on the previous method for cftime and the already existing one for numpy.
    • decimal_year : the date as the year + fraction of the elapsed year. The underlying function is used for converting 360_day calendars and for interp_calendar. This can be useful for certain astronomical calculations, among others.
  • Rewrote xr.coding.calendar_ops._days_in_year to make it more explicit and more performant. It now accepts both DataArrays or scalars, returning the appropriate type. This is used for 360 day calendar conversion. I could not use the dt accessor as there is a need for a calendar override here.

Copy link
Member

@spencerkclark spencerkclark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aulemahal—this seems useful. I realize it adds a bit of extra work, but what do you think about adding days_in_year and decimal_year attributes to cftime.datetime objects themselves? It seems cleaner for the calendar math / logic to be defined there rather than xarray—then adding accessors is fairly straightforward.

Admittedly it may not have been totally planned this way, but this is how the days_in_month accessor came to be (#3935).

@aulemahal
Copy link
Contributor Author

I didn't think of that, but indeed, this seems to make most sense. I guess this also means implementing it in pandas ? The current decimal_year uses stuff from cftime, but could be rewritten without in pandas.

Well, I'll do that, but it will take more time 😅

@spencerkclark
Copy link
Member

True, yeah, let me ponder it a bit more before we make issues in a bunch of places. There may at least be a way we can take better advantage of existing public cftime and pandas code.

Copy link
Member

@spencerkclark spencerkclark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed I think there are ways we can simplify a lot of this. Here are some initial thoughts—let me know if they make sense.

While it would be nice if cftime and pandas supported these as attributes of datetime-like objects, for some of these calendar operations it seems like you still need to have a function that allows you to query how many days are in an integer year (unattached to a datetime) with a given calendar, so for now we might as well use that function in implementing the days_in_year accessor.

It is also nice not to need to worry about only allowing these accessors for certain versions of cftime and pandas. We could still think about trying to upstream some of this functionality eventually and switching to that once minimum versions were appropriate, but that could come later.

xarray/coding/calendar_ops.py Outdated Show resolved Hide resolved
xarray/coding/calendar_ops.py Outdated Show resolved Hide resolved
xarray/core/accessor_dt.py Outdated Show resolved Hide resolved
xarray/core/accessor_dt.py Outdated Show resolved Hide resolved
@aulemahal
Copy link
Contributor Author

@spencerkclark It took me a long time to come back to this! I tried to address your comments the best I could.

The top post was edited to mirror how I changed this PR. I will try to find time and open PRs in pandas and cftime.

Copy link
Member

@spencerkclark spencerkclark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aulemahal! I think this is close—just a few minor suggestions.

Could you also add days_in_year and decimal_year as fields in the test_dask_field_access test within test_accessor_dt.py? This will confirm these are / remain appropriately dask and multi-dimensional array compatible.

xarray/core/accessor_dt.py Outdated Show resolved Hide resolved
xarray/coding/calendar_ops.py Outdated Show resolved Hide resolved
doc/whats-new.rst Outdated Show resolved Hide resolved
xarray/core/accessor_dt.py Show resolved Hide resolved
xarray/coding/calendar_ops.py Outdated Show resolved Hide resolved
@spencerkclark
Copy link
Member

The test failures are all in environments with NumPy < 2, and the issue looks reminiscent of #9387 (comment). The dask issues with the rollback approach were unrelated and seemed more readily solvable, which I tried to address in Ouranosinc#18. @aulemahal let me know if those changes make sense.

@spencerkclark
Copy link
Member

Actually my approach in Ouranosinc#18 suffers from the same problem as the _yearstart_np approach, since _rollback similarly returns a datetime64[ns] array 🤦. I'll see if I can come up with a different workaround, but worst case we can skip the dask test with NumPy < 2.

@spencerkclark
Copy link
Member

OK I think I have an alternative approach that should work now in Ouranosinc#18 (basically keeping the decimal year computation fully within NumPy and not exposing datetime64[ns] arrays as dask outputs).

I tested this interactively for cftime dates too, but I'll work on adding some official tests for this.

@spencerkclark
Copy link
Member

I updated the tests for decimal_year in Ouranosinc#18 to now cover all combinations of datetime64[ns] / cftime and NumPy / dask.

Fix dask compatibility issues in `_decimal_year`
Copy link
Member

@spencerkclark spencerkclark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your patience @aulemahal—this looks good to me now!

@spencerkclark spencerkclark added the plan to merge Final call for comments label Sep 9, 2024
@dcherian
Copy link
Contributor

Thanks @aulemahal and @spencerkclark

@dcherian dcherian merged commit cc74d3a into pydata:main Sep 10, 2024
34 checks passed
dcherian added a commit to dcherian/xarray that referenced this pull request Sep 17, 2024
* main: (26 commits)
  Forbid modifying names of DataTree objects with parents (pydata#9494)
  DAS-2155 - Merge datatree documentation into main docs. (pydata#9033)
  Make illegal path-like variable names when constructing a DataTree from a Dataset (pydata#9378)
  Ensure TreeNode doesn't copy in-place (pydata#9482)
  `open_groups` for zarr backends (pydata#9469)
  Update pyproject.toml (pydata#9484)
  New whatsnew section (pydata#9483)
  Release notes for v2024.09.0 (pydata#9480)
  Fix `DataTree.coords.__setitem__` by adding `DataTreeCoordinates` class (pydata#9451)
  Rename DataTree's "ds" and "data" to "dataset" (pydata#9476)
  Update DataTree repr to indicate inheritance (pydata#9470)
  Bump pypa/gh-action-pypi-publish in the actions group (pydata#9460)
  Repo checker (pydata#9450)
  Add days_in_year and decimal_year to dt accessor (pydata#9105)
  remove parent argument from DataTree.__init__ (pydata#9465)
  Fix inheritance in DataTree.copy() (pydata#9457)
  Implement `DataTree.__delitem__` (pydata#9453)
  Add ASV for datatree.from_dict (pydata#9459)
  Make the first argument in DataTree.from_dict positional only (pydata#9446)
  Fix typos across the code, doc and comments (pydata#9443)
  ...
dcherian added a commit to dcherian/xarray that referenced this pull request Sep 17, 2024
* main: (29 commits)
  Release notes for v2024.09.0 (pydata#9480)
  Fix `DataTree.coords.__setitem__` by adding `DataTreeCoordinates` class (pydata#9451)
  Rename DataTree's "ds" and "data" to "dataset" (pydata#9476)
  Update DataTree repr to indicate inheritance (pydata#9470)
  Bump pypa/gh-action-pypi-publish in the actions group (pydata#9460)
  Repo checker (pydata#9450)
  Add days_in_year and decimal_year to dt accessor (pydata#9105)
  remove parent argument from DataTree.__init__ (pydata#9465)
  Fix inheritance in DataTree.copy() (pydata#9457)
  Implement `DataTree.__delitem__` (pydata#9453)
  Add ASV for datatree.from_dict (pydata#9459)
  Make the first argument in DataTree.from_dict positional only (pydata#9446)
  Fix typos across the code, doc and comments (pydata#9443)
  DataTree should not be "Generic" (pydata#9445)
  Disallow passing a DataArray as data into the DataTree constructor (pydata#9444)
  Support additional dtypes in `resample` (pydata#9413)
  Shallow copy parent and children in DataTree constructor (pydata#9297)
  Bump minimum versions for dependencies (pydata#9434)
  Always include at least one category in random test data (pydata#9436)
  Avoid deep-copy when constructing groupby codes (pydata#9429)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
plan to merge Final call for comments
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants