Need a way to specify the names of the coordinates created from the indexes dropped by DataArray.reset_index. #5874

Closed
weipeng1999 opened this issue Oct 18, 2021 · 3 comments


weipeng1999 commented Oct 18, 2021

When I tried to use different coordinates as the index of a dimension, I noticed the API added in v0.9 and provided by DataArray.set_index:

>>> import numpy as np
>>> import xarray as xr
>>> arr = xr.DataArray(np.r_[:16].reshape(4,4),dims=('t','x'))
>>> arr['x_str'] = "x",["a","b","c","d"]
>>> arr["x_num"] = "x",[1,2,3,4]; arr
<xarray.DataArray (t: 4, x: 4)>
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
Coordinates:
    x_str    (x) <U1 'a' 'b' 'c' 'd'
    x_num    (x) int64 1 2 3 4
Dimensions without coordinates: t, x
>>> arr = arr.set_index(x='x_str'); arr
<xarray.DataArray (t: 4, x: 4)>
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
Coordinates:
    x_num    (x) int64 1 2 3 4
  * x        (x) object 'a' 'b' 'c' 'd'
Dimensions without coordinates: t
>>> #... some code with "arr[{'x':[... some str index ...]}]" ...

That's really convenient, what a nice API.
But when I wanted to switch to another coordinate, I found that I could not restore arr to the state it was in before I called set_index:

>>> arr=arr.reset_index('x'); arr # why does the coordinate that was used as the index lose its name "x_str"?
<xarray.DataArray (t: 4, x: 4)>
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
Coordinates:
    x_num    (x) int64 1 2 3 4
    x_       (x) object 'a' 'b' 'c' 'd'
Dimensions without coordinates: t, x
>>> arr=arr.set_index(x="x_num");arr # anyway, carry on to the code that uses the "x_num" coordinate as the index
>>> #... some code with "arr[{'x':[... some int index ...]}]" ...
>>> arr=arr.reset_index('x');arr # now I need the "x_str" coordinate as the index, here we go
<xarray.DataArray (t: 4, x: 4)>
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
Coordinates:
    x_       (x) int64 1 2 3 4
Dimensions without coordinates: t, x
>>> # Oops! The "x_num" coordinate has overwritten the "x_str" coordinate, so I can't access the latter any more :(

I can think of the following ways to solve this problem:

  • Add a new function that does this.
    • Probably the most direct way.
    • But it requires designing a new API and implementing a new function.
  • Stop DataArray.set_index from changing the coordinate name.
    • The benefit is that no new function is needed, so it is the most convenient option.
    • But it may break the existing contract of DataArray.set_index, so it may not be practical.
    • Here is an example of how it could be used:
>>> arr = arr.set_index({'x':'x_str'})
>>> #... some code with "arr[{'x':[... some str index ...]}]" ...
>>> arr = arr.set_index({'x':'x_num'})
>>> #... some code with "arr[{'x':[... some int index ...]}]" ...
>>> arr = arr.set_index({'x':'x_str'}) # ...
  • Store the coordinate name in the DataArray when calling DataArray.set_index and restore it when calling DataArray.reset_index.
    • Only slightly more complex than the previous option.
    • But it requires storing redundant data inside the DataArray, so it may not be practical either.
    • Example:
>>> arr = arr.set_index({'x':'x_str'})
>>> #... some code with "arr[{'x':[... some str index ...]}]" ...
>>> arr = arr.reset_index('x').set_index({'x':'x_num'})
>>> #... some code with "arr[{'x':[... some int index ...]}]" ...
>>> arr = arr.reset_index('x').set_index({'x':'x_str'}) # ...
  • Let DataArray.reset_index accept a Mapping as its names parameter, using the keys as the dims whose indexes are reset and the values as the names of the coordinates converted from those indexes.
    • More complex.
    • Probably the option that causes the least change, so I prefer it.
    • Example:
>>> arr = arr.set_index({'x':'x_str'})
>>> #... some code with "arr[{'x':[... some str index ...]}]" ...
>>> arr = arr.reset_index({'x':'x_str'}).set_index({'x':'x_num'})
>>> #... some code with "arr[{'x':[... some int index ...]}]" ...
>>> arr = arr.reset_index({'x':'x_num'}).set_index({'x':'x_str'}) # ...
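
Until one of the options above exists, one possible workaround is to avoid set_index/reset_index altogether: leave both coordinates unindexed and translate labels to positions by hand. This is only a minimal sketch. The helper name `sel_by_coord` is hypothetical (it is not part of xarray), and it assumes the coordinate is one-dimensional with unique labels:

```python
import numpy as np
import pandas as pd
import xarray as xr


def sel_by_coord(arr, coord, labels):
    """Hypothetical helper: label-based selection on a 1-D, non-index coordinate."""
    (dim,) = arr[coord].dims  # fails loudly if the coordinate is not 1-D
    # Translate labels to integer positions; missing labels come back as -1.
    positions = pd.Index(arr[coord].values).get_indexer(labels)
    if (positions < 0).any():
        raise KeyError(f"some labels are missing from coordinate {coord!r}")
    return arr.isel({dim: positions})


arr = xr.DataArray(np.arange(16).reshape(4, 4), dims=("t", "x"))
arr = arr.assign_coords(
    x_str=("x", ["a", "b", "c", "d"]),
    x_num=("x", [1, 2, 3, 4]),
)

# Neither coordinate is renamed, so both stay available for later lookups.
by_str = sel_by_coord(arr, "x_str", ["b", "d"])
by_num = sel_by_coord(arr, "x_num", [2, 4])
```

Because nothing is ever promoted to the dimension index, neither "x_str" nor "x_num" can be renamed or overwritten. The cost is that a temporary pandas index is rebuilt on every call.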
weipeng1999 changed the title from "Need a way to speciefy the names of coordinates, for the indices droped by DataArray.reset_index." to "Need a way to speciefy the names of coordinates from the indices which droped by DataArray.reset_index." on Oct 18, 2021
Member

shoyer commented Oct 18, 2021

You currently need reset_index because Xarray requires that indexes match a dimension.

We plan to relax this constraint soon, as part of the ongoing "Explicit index" refactor being led by @benbovy.

The desired behavior is the possibility to set x_str as an index without changing its name.

Member

benbovy commented Sep 27, 2022

The desired behavior is the possibility to set x_str as an index without changing its name.

This will be enabled by #6971, hopefully merged before the next release:

arr = arr.set_xindex("x_str").set_xindex("x_num")

arr
# <xarray.DataArray (t: 4, x: 4)>
# array([[ 0,  1,  2,  3],
#        [ 4,  5,  6,  7],
#        [ 8,  9, 10, 11],
#        [12, 13, 14, 15]])
# Coordinates:
#   * x_str    (x) <U1 'a' 'b' 'c' 'd'
#   * x_num    (x) int64 1 2 3 4
# Dimensions without coordinates: t, x

arr.sel(x_str="a")
# <xarray.DataArray (t: 4)>
# array([ 0,  4,  8, 12])
# Coordinates:
#     x_str    <U1 'a'
#     x_num    int64 1
# Dimensions without coordinates: t

arr.sel(x_num=1)
# <xarray.DataArray (t: 4)>
# array([ 0,  4,  8, 12])
# Coordinates:
#     x_str    <U1 'a'
#     x_num    int64 1
# Dimensions without coordinates: t
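
With that API, switching between the two lookups should no longer need reset_index at all, and the coordinate names stay untouched. The following is a self-contained sketch, assuming an xarray version that ships `set_xindex` and `drop_indexes` (both were introduced together; older versions will not have them):

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(
    np.arange(16).reshape(4, 4),
    dims=("t", "x"),
    coords={
        "x_str": ("x", ["a", "b", "c", "d"]),
        "x_num": ("x", [1, 2, 3, 4]),
    },
)

# Index both coordinates under their own names; "x" itself stays index-free.
indexed = arr.set_xindex("x_str").set_xindex("x_num")

# Either coordinate can now be used for label-based selection.
subset_str = indexed.sel(x_str=["b", "d"])
subset_num = indexed.sel(x_num=[2, 4])

# Dropping the indexes keeps the coordinates and their names.
restored = indexed.drop_indexes(["x_str", "x_num"])
```

Here `restored` still carries the "x_str" and "x_num" coordinates, which is the round trip the original report asked for.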

Member

benbovy commented Sep 28, 2022

Closed in #6971.
