xr.concat concatenates along dimensions that it wasn't asked to #8231

Open
4 tasks done
TomNicholas opened this issue Sep 25, 2023 · 4 comments
Labels
bug topic-combine combine/concat/merge

Comments

@TomNicholas
Contributor

What happened?

Here are two toy datasets designed to represent sections of a dataset that has variables living on a staggered grid. This type of dataset is common in fluid modelling (it's why xGCM exists).

import xarray as xr

ds1 = xr.Dataset(
    coords={
        'x_center': ('x_center', [1, 2, 3]),
        'x_outer':  ('x_outer',  [0.5, 1.5, 2.5, 3.5]),  
    },
)

ds2 = xr.Dataset(
    coords={
        'x_center': ('x_center', [4, 5, 6]),
        'x_outer':  ('x_outer',  [4.5, 5.5, 6.5]),  
    },
)

Calling xr.concat on these with dim='x_center' happily concatenates them:

xr.concat([ds1, ds2], dim='x_center')
<xarray.Dataset>
Dimensions:   (x_outer: 7, x_center: 6)
Coordinates:
  * x_outer   (x_outer) float64 0.5 1.5 2.5 3.5 4.5 5.5 6.5
  * x_center  (x_center) int64 1 2 3 4 5 6
Data variables:
    *empty*

but notice that the returned result has been concatenated along both x_center and x_outer.

What did you expect to happen?

I did not expect this to work. I definitely didn't expect the datasets to be concatenated along a dimension I didn't ask them to be concatenated along (i.e. x_outer).

What I expected was that (since coords='different' is the default) xarray would attempt to concatenate both variables along the x_center dimension, which would succeed for the x_center variable but fail for the x_outer variable. Indeed, if I rename the variables so that they are no longer coordinate variables, that is what happens:

import xarray as xr

ds1 = xr.Dataset(
    data_vars={
        'a': ('x_center', [1, 2, 3]),
        'b':  ('x_outer',  [0.5, 1.5, 2.5, 3.5]),  
    },
)

ds2 = xr.Dataset(
    data_vars={
        'a': ('x_center', [4, 5, 6]),
        'b':  ('x_outer',  [4.5, 5.5, 6.5]),  
    },
)
xr.concat([ds1, ds2], dim='x_center', data_vars='different') 
ValueError: cannot reindex or align along dimension 'x_outer' because of conflicting dimension sizes: {3, 4}

Minimal Complete Verifiable Example

No response

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

I was trying to create an example for which you would need the automatic combined concat/merge that happens within xr.combine_by_coords.

Environment

xarray 2023.8.0

TomNicholas added the bug, needs triage, and topic-combine labels on Sep 25, 2023
dcherian removed the needs triage label on Sep 27, 2023
@TomNicholas
Contributor Author

A consequence of the alignment behavior described in #6806
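To make the link to alignment concrete, here is a minimal sketch that applies an outer join to the two coordinate-only datasets from the report while excluding the concatenation dimension. It assumes that xr.concat aligns its inputs in roughly this way before concatenating (default join='outer'); the variable names a1 and a2 are only for illustration.

```python
import xarray as xr

ds1 = xr.Dataset(
    coords={
        "x_center": ("x_center", [1, 2, 3]),
        "x_outer": ("x_outer", [0.5, 1.5, 2.5, 3.5]),
    },
)
ds2 = xr.Dataset(
    coords={
        "x_center": ("x_center", [4, 5, 6]),
        "x_outer": ("x_outer", [4.5, 5.5, 6.5]),
    },
)

# Outer-join every indexed dimension except the one being concatenated.
# Assumption: this mirrors the alignment step that concat performs by default.
a1, a2 = xr.align(ds1, ds2, join="outer", exclude=["x_center"])

# Both aligned datasets now share the union of the two x_outer indexes,
# while x_center is left untouched in each.
print(dict(a1.sizes))
print(dict(a2.sizes))
```

If the assumption holds, the x_outer index of each aligned dataset is the union of the two original indexes, which would explain why the concatenated result has x_outer of length 7 even though only x_center was requested.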

@etienneschalk
Contributor

The PR suggests using the proposed join='strict' kwarg.

test_concat_join_coordinate_variables_non_asked_dims tests:

ds1 = xr.Dataset(
    coords={
        'x_center': ('x_center', [1, 2, 3]),
        'x_outer':  ('x_outer',  [0.5, 1.5, 2.5, 3.5]),  
    },
)

xr.concat([ds1, ds2], dim='x_center') will still produce the current surprising behavior, but xr.concat([ds1, ds2], dim='x_center', join='strict') would raise an error. The issue I see here is that strict may not really be a join mode but rather a separate parameter: we might want strict dimension-name checks regardless of whether the join type is inner, outer, etc. For now, strict is just an even more restrictive exact, adding more checks in several places inside the aligner.py module.

test_concat_join_non_coordinate_variables tests:

ds1 = xr.Dataset(
    data_vars={
        'a': ('x_center', [1, 2, 3]),
        'b':  ('x_outer',  [0.5, 1.5, 2.5, 3.5]),  
    },
)

This test just enforces that the expected behavior occurs.
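As a user-side illustration of what a strict check that is independent of the join mode could look like, here is a rough sketch. The helper concat_strict is hypothetical: it is not part of xarray and not the implementation in the PR. It only compares the sizes of every dimension other than the concatenation dimension before delegating to xr.concat; ds3 is an extra dataset introduced here purely to show a passing case.

```python
import xarray as xr


def concat_strict(datasets, dim):
    """Hypothetical helper: refuse to concatenate when any dimension other
    than `dim` is missing from, or differently sized in, one of the inputs."""
    first = datasets[0]
    for ds in datasets[1:]:
        for name in set(first.sizes) | set(ds.sizes):
            if name == dim:
                continue  # the concatenation dimension is allowed to differ
            if first.sizes.get(name) != ds.sizes.get(name):
                raise ValueError(
                    f"dimension {name!r} differs between inputs: "
                    f"{first.sizes.get(name)} vs {ds.sizes.get(name)}"
                )
    # join='exact' additionally requires the index labels themselves to match.
    return xr.concat(datasets, dim=dim, join="exact")


ds1 = xr.Dataset(
    coords={
        "x_center": ("x_center", [1, 2, 3]),
        "x_outer": ("x_outer", [0.5, 1.5, 2.5, 3.5]),
    },
)
ds2 = xr.Dataset(
    coords={
        "x_center": ("x_center", [4, 5, 6]),
        "x_outer": ("x_outer", [4.5, 5.5, 6.5]),
    },
)
# Same x_outer index as ds1, so only x_center grows.
ds3 = xr.Dataset(
    coords={
        "x_center": ("x_center", [4, 5, 6]),
        "x_outer": ("x_outer", [0.5, 1.5, 2.5, 3.5]),
    },
)

try:
    concat_strict([ds1, ds2], dim="x_center")
except ValueError as err:
    print("refused:", err)

print(dict(concat_strict([ds1, ds3], dim="x_center").sizes))
```

Because the size comparison does not rely on indexes, the same check would also apply to dimensions that carry no coordinate variable, as in the data_vars example.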

@dcherian
Contributor

Wouldn't join="exact" raise an error here?

@etienneschalk
Contributor

Indeed join='exact' raises an error:

import xarray as xr

ds1 = xr.Dataset(
    coords={
        'x_center': ('x_center', [1, 2, 3]),
        'x_outer':  ('x_outer',  [0.5, 1.5, 2.5, 3.5]),  
    },
)

ds2 = xr.Dataset(
    coords={
        'x_center': ('x_center', [4, 5, 6]),
        'x_outer':  ('x_outer',  [4.5, 5.5, 6.5]),  
    },
)
xr.concat([ds1, ds2], dim='x_center', join='exact')
ValueError: cannot align objects with join='exact' where index/labels/sizes are not equal along these coordinates (dimensions): 'x_outer' ('x_outer',)
