
(feat): Support for pandas ExtensionArray #8723

Merged
merged 101 commits into from
Apr 18, 2024


ilan-gold
Contributor

@ilan-gold ilan-gold commented Feb 8, 2024

Some outstanding points/decisions brought up by this PR:

  • Confirm the type promotion rules and write them out. As it stands, if all arrays share the same extension array type, they are passed through unchanged; otherwise, they are converted to numpy. (related: Avoid coercing to numpy in as_shared_dtypes #8714)
    - [ ] Acceptance of plum as a dispatch method. Without it, behavior should fall back to what it was before (casting to numpy types). I am a big fan of dispatching and think it could serve as a model going forward for making support for other data types/arrays more feasible. The other option, I think, would be to use the underlying array of the ExtensionDuckArray class to decide, and then have a central registry that serves as the basis for a decorator (like the API for accessors via _CachedAccessor). That being said, the current defaults are quite good, so this is most likely a marginal feature.
  • Do we accept only pandas ExtensionArray objects directly, or should we also accept Series?
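A minimal sketch of the promotion rule described in the first bullet, written as a standalone function. `promote_to_shared_type` is a hypothetical name, not part of xarray, and "same extension array type" is interpreted here as "same Python class"; the PR's actual check may differ.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionArray


def promote_to_shared_type(arrays):
    """Hypothetical helper illustrating the promotion rule described above.

    If every input is a pandas ExtensionArray of the same class, the inputs
    are passed through unchanged; otherwise all of them are cast to NumPy.
    """
    if not arrays:
        return []
    first = arrays[0]
    all_same_extension_type = all(
        isinstance(arr, ExtensionArray) and type(arr) is type(first)
        for arr in arrays
    )
    if all_same_extension_type:
        return list(arrays)
    # Fallback: cast everything (extension arrays included) to NumPy arrays.
    return [np.asarray(arr) for arr in arrays]


cats = [pd.Categorical(["a", "b"]), pd.Categorical(["b", "c"])]
mixed = [pd.Categorical(["a", "b"]), np.array([1, 2])]

kept = promote_to_shared_type(cats)
cast = promote_to_shared_type(mixed)
print([type(a).__name__ for a in kept])  # ['Categorical', 'Categorical']
print([type(a).__name__ for a in cast])  # ['ndarray', 'ndarray']
```

Comparing classes rather than dtypes keeps the rule simple, but it means two `Categorical`s with different categories still pass through together, so whatever consumes them has to handle mismatched categories; comparing `arr.dtype` instead would be the stricter variant.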

I may be missing something else! Let me know!
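For comparison with plum, the registry-plus-decorator alternative from the first bullet might look roughly like this. It is a sketch under stated assumptions: a plain module-level dict keyed by (function name, array class), with `implements`, `dispatch`, and `_REGISTRY` as hypothetical names rather than xarray or plum APIs. It dispatches on the raw array rather than on an ExtensionDuckArray wrapper, and the fallback mirrors the "cast to numpy" default.

```python
import numpy as np
import pandas as pd

# Hypothetical central registry: (function name, array class) -> implementation.
_REGISTRY = {}


def implements(func_name, array_type):
    """Hypothetical decorator registering `func_name` for `array_type`."""
    def decorator(impl):
        _REGISTRY[(func_name, array_type)] = impl
        return impl
    return decorator


def dispatch(func_name, array, *args, **kwargs):
    """Look up an implementation by the array's class, walking its MRO."""
    for cls in type(array).__mro__:
        impl = _REGISTRY.get((func_name, cls))
        if impl is not None:
            return impl(array, *args, **kwargs)
    # No registration found: fall back to casting to NumPy.
    return getattr(np, func_name)(np.asarray(array), *args, **kwargs)


@implements("unique", pd.Categorical)
def _unique_categorical(array):
    # Categorical.unique() keeps the categorical dtype instead of casting.
    return array.unique()


res_cat = dispatch("unique", pd.Categorical(["a", "b", "a"]))
res_np = dispatch("unique", np.array([3, 1, 3]))
print(list(res_cat))  # ['a', 'b']
print(res_np)  # [1 3]
```

Walking the MRO lets a subclass reuse its parent's registration, which is roughly what a type-based dispatcher such as plum provides out of the box, minus multiple dispatch on several arguments.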

Checklist:


welcome bot commented Feb 8, 2024

Thank you for opening this pull request! It may take us a few days to respond here, so thank you for being patient.
If you have questions, some answers may be found in our contributing guidelines.

@TomNicholas
Contributor

This looks like a big feature addition!

I'm very ignorant of ExtensionArrays, but is it possible to imagine a design where ExtensionDuckArray conforms closely enough to the duck-array interface xarray expects that it could live entirely outside of xarray? I'm also curious why we need isinstance(data, ExtensionDuckArray) rather than calling our already-existing is_duck_array(data).
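To make the question concrete, here is a sketch of an attribute-based duck-array check next to a pandas Categorical. `looks_like_duck_array` is a hypothetical function written in the spirit of xarray's `is_duck_array`; the exact attributes it tests are an assumption, not a copy of xarray's code. One plausible reason an explicit isinstance check is needed is that an ExtensionArray's dtype is a pandas ExtensionDtype rather than a NumPy dtype.

```python
import numpy as np
import pandas as pd


def looks_like_duck_array(value):
    """Hypothetical attribute-based check in the spirit of is_duck_array.

    Not xarray's actual implementation; the attributes checked are assumed.
    """
    if isinstance(value, np.ndarray):
        return True
    has_basics = all(hasattr(value, attr) for attr in ("ndim", "shape", "dtype"))
    has_protocol = (
        hasattr(value, "__array_function__") and hasattr(value, "__array_ufunc__")
    ) or hasattr(value, "__array_namespace__")
    return has_basics and has_protocol


cat = pd.Categorical(["a", "b", "a"])
print(looks_like_duck_array(np.arange(3)))  # True
print(looks_like_duck_array([1, 2, 3]))  # False
# An ExtensionArray exposes ndim/shape/dtype, but its dtype is a pandas
# ExtensionDtype, so code assuming np.dtype semantics cannot rely on it.
print(type(cat.dtype).__name__, isinstance(cat.dtype, np.dtype))  # CategoricalDtype False
```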

@TomNicholas TomNicholas added the topic-arrays (related to flexible array support) label Feb 8, 2024
@ilan-gold
Contributor Author

What's the status here?

xarray/tests/__init__.py (outdated, resolved)
@dcherian dcherian added the plan to merge (Final call for comments) label Apr 13, 2024
@dcherian
Contributor

Sorry, this fell off my radar. Can you open an issue regarding the to_dataframe problem?

@dcherian dcherian enabled auto-merge (squash) April 16, 2024 15:10
@ilan-gold ilan-gold disabled auto-merge April 17, 2024 08:48
@dcherian dcherian merged commit 9eb180b into pydata:main Apr 18, 2024
31 checks passed

welcome bot commented Apr 18, 2024

Congratulations on completing your first pull request! Welcome to Xarray! We are proud of you, and hope to see you again!

dcherian added a commit to djhoese/xarray that referenced this pull request Apr 18, 2024
* main:
  (feat): Support for `pandas` `ExtensionArray` (pydata#8723)
  Migrate datatree mapping.py (pydata#8948)
  Add mypy to dev dependencies (pydata#8947)
  Convert 360_day calendars by choosing random dates to drop or add (pydata#8603)
Labels
plan to merge (Final call for comments), topic-arrays (related to flexible array support)
Development

Successfully merging this pull request may close these issues.

Categorical Array Support for pandas Extension Arrays
6 participants