Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update DataTree repr to indicate inheritance #9470

Merged
merged 5 commits into from
Sep 10, 2024
Merged

Conversation

shoyer
Copy link
Member

@shoyer shoyer commented Sep 9, 2024

Still need a few more tests.

The HTML formatter will also need updates (not as part of this PR)

xarray/core/formatting.py Outdated Show resolved Hide resolved
@shoyer shoyer marked this pull request as ready for review September 10, 2024 01:40
@shoyer
Copy link
Member Author

shoyer commented Sep 10, 2024

This is ready for review now.

"""
<xarray.DataTree 'child'>
Group: /child
Dimensions: (x: 1, y: 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this one incorrect? x should not be inherited in this case, because it's defined only on parents' data variables, not parents' coordinate variables?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do still want to include it here, because it is included in the alignment checks. If you tried to insert an array with a different sized x dimension on "child" it would raise an error.

Copy link
Contributor

@TomNicholas TomNicholas Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. But it would raise an error basically because it didn't align with a variable that only exists in the parent (and is not inherited)? I feel like our discussions about this are being a bit vague in distinguishing between "aligns with data in a node" and "aligns with whole tree".

@TomNicholas TomNicholas added the topic-DataTree Related to the implementation of a DataTree class label Sep 10, 2024
Copy link
Contributor

@TomNicholas TomNicholas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking really really good - I'm really pleased with how neat this is turning out!

Only one comment about a case that confused me.

Comment on lines +1073 to +1074
def _inherited_vars(mapping: ChainMap) -> dict:
return {k: v for k, v in mapping.parents.items() if k not in mapping.maps[0]}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there is anywhere else that this function might be useful.

@shoyer
Copy link
Member Author

shoyer commented Sep 10, 2024

@TomNicholas does this look good to merge to you?

@TomNicholas
Copy link
Contributor

I don't have any changes to suggest but I'm still wrapping my head around this:

#9470 (comment)

So basically alignment is relative to all dimensions in the tree, not just inherited coordinates? (Because we check up the ancestors recursively.)

Therefore alignment with ancestors and inheritance of coordinates are really two separate new features, in that we could have theoretically implemented the former without the latter.

And your logic (both here and in the repr of .coords) is that Dimensions should display anything relevant to alignment. And the conclusion about Inherited Dimensions Vs just Dimensions was that the distinction is unimportant because both are equally relevant to alignment, and the user cannot manipulate dimensions directly.

I think that all makes sense to me now.

@shoyer
Copy link
Member Author

shoyer commented Sep 10, 2024

So basically alignment is relative to all dimensions in the tree, not just inherited coordinates? (Because we check up the ancestors recursively.)

Correct -- alignment in Xarray is defined by both coordinates and dimensions. So it depends on inherited dimensions, not just inherited coordinates.

Therefore alignment with ancestors and inheritance of coordinates are really two separate new features, in that we could have theoretically implemented the former without the latter.

Yes, in theory. Though in practice, inheriting coordinates (and dimensions) on sub-trees is the easiest way to ensure alignment. Otherwise you could pull out a sub-tree and use it somewhere else without coordinates, and it could align erroroneously.

@shoyer shoyer merged commit a6bacfe into pydata:main Sep 10, 2024
33 of 34 checks passed
@shoyer shoyer deleted the datatree-repr2 branch September 10, 2024 23:36
dcherian added a commit to dcherian/xarray that referenced this pull request Sep 17, 2024
* main: (26 commits)
  Forbid modifying names of DataTree objects with parents (pydata#9494)
  DAS-2155 - Merge datatree documentation into main docs. (pydata#9033)
  Make illegal path-like variable names when constructing a DataTree from a Dataset (pydata#9378)
  Ensure TreeNode doesn't copy in-place (pydata#9482)
  `open_groups` for zarr backends (pydata#9469)
  Update pyproject.toml (pydata#9484)
  New whatsnew section (pydata#9483)
  Release notes for v2024.09.0 (pydata#9480)
  Fix `DataTree.coords.__setitem__` by adding `DataTreeCoordinates` class (pydata#9451)
  Rename DataTree's "ds" and "data" to "dataset" (pydata#9476)
  Update DataTree repr to indicate inheritance (pydata#9470)
  Bump pypa/gh-action-pypi-publish in the actions group (pydata#9460)
  Repo checker (pydata#9450)
  Add days_in_year and decimal_year to dt accessor (pydata#9105)
  remove parent argument from DataTree.__init__ (pydata#9465)
  Fix inheritance in DataTree.copy() (pydata#9457)
  Implement `DataTree.__delitem__` (pydata#9453)
  Add ASV for datatree.from_dict (pydata#9459)
  Make the first argument in DataTree.from_dict positional only (pydata#9446)
  Fix typos across the code, doc and comments (pydata#9443)
  ...
dcherian added a commit to dcherian/xarray that referenced this pull request Sep 17, 2024
* main: (29 commits)
  Release notes for v2024.09.0 (pydata#9480)
  Fix `DataTree.coords.__setitem__` by adding `DataTreeCoordinates` class (pydata#9451)
  Rename DataTree's "ds" and "data" to "dataset" (pydata#9476)
  Update DataTree repr to indicate inheritance (pydata#9470)
  Bump pypa/gh-action-pypi-publish in the actions group (pydata#9460)
  Repo checker (pydata#9450)
  Add days_in_year and decimal_year to dt accessor (pydata#9105)
  remove parent argument from DataTree.__init__ (pydata#9465)
  Fix inheritance in DataTree.copy() (pydata#9457)
  Implement `DataTree.__delitem__` (pydata#9453)
  Add ASV for datatree.from_dict (pydata#9459)
  Make the first argument in DataTree.from_dict positional only (pydata#9446)
  Fix typos across the code, doc and comments (pydata#9443)
  DataTree should not be "Generic" (pydata#9445)
  Disallow passing a DataArray as data into the DataTree constructor (pydata#9444)
  Support additional dtypes in `resample` (pydata#9413)
  Shallow copy parent and children in DataTree constructor (pydata#9297)
  Bump minimum versions for dependencies (pydata#9434)
  Always include at least one category in random test data (pydata#9436)
  Avoid deep-copy when constructing groupby codes (pydata#9429)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-DataTree Related to the implementation of a DataTree class
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consider indicating inherited coordinates & dimensions in DataTree repr
2 participants