
Micro optimizations to improve indexing #9002

Merged: 20 commits merged into pydata:main from fastestpath, Jun 11, 2024
Conversation

@hmaarrfk (Contributor) commented May 5, 2024

These are some micro optimizations to improve indexing within xarray objects.

Generally speaking, the use of isinstance in Python is pretty slow. It's pretty hard to avoid given the amount of "niceness" that xarray provides.
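
As a rough, machine-dependent illustration of the kind of overhead being described (this is not code from the PR, just a sketch):

```python
# Rough illustration of per-call isinstance overhead on a hot path;
# timings are machine-dependent and only show the order of magnitude.
import timeit

key = (0, slice(None), slice(2, 10))

def validated(key):
    # validation-style branch, as commonly found on hot indexing paths
    for k in key:
        if not isinstance(k, (int, slice)):
            raise TypeError(type(k))
    return key

def fastpath(key):
    return key

print("validated:", timeit.timeit(lambda: validated(key), number=1_000_000))
print("fastpath: ", timeit.timeit(lambda: fastpath(key), number=1_000_000))
```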

However, I feel like these changes are "ok" and kick up the performance of
#9001

from

h5netcdf  lazy      : 100%|███████████| 100000/100000 [00:11<00:00, 8951.20it/s]

to

h5netcdf  lazy      : 100%|███████████| 100000/100000 [00:09<00:00, 10079.44it/s]

With my "controversial" fix for _apply_indexes_fast:

h5netcdf  lazy      : 100%|█████████████████| 100000/100000 [00:06<00:00, 14410.26it/s]

without it:

h5netcdf  lazy      : 100%|█████████████████| 100000/100000 [00:07<00:00, 12895.45it/s]

a nice 40% boost!

Benchmarked on

model name      : AMD Ryzen Threadripper 2950X 16-Core Processor

It's not the best processor, but it doesn't throttle!

I'll mark this as non-draft when it is ready for review. Done!

Concepts used (sketched in code after this list):

  • Fastpath added for BasicIndexer construction.
  • Tuple extension used instead of list + tuple cast. I believe CPython improved tuple performance somewhere around the 3.9/3.10 era, so this is now "faster" than the old 3.6 tricks.
  • LazilyIndexedArray now stores the shape at creation time. Otherwise the shape was recomputed many times during indexing operations that use len or .shape to broadcast the indexing tuple.
  • isel now avoids _id_coord_names by avoiding xindexes.group_by_index(), giving a 20% speedup.
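
Roughly, the first three items look like this; a minimal sketch with hypothetical class names, not xarray's actual implementation:

```python
# Minimal sketch of the patterns above; class names are hypothetical
# stand-ins, not xarray's actual code.

class BasicIndexerSketch:
    __slots__ = ("key",)

    def __init__(self, key: tuple, fastpath: bool = False):
        if fastpath:
            # Caller guarantees `key` is already a validated tuple of
            # ints/slices, so skip the per-element isinstance checks.
            self.key = key
            return
        new_key = ()
        for k in key:
            if not isinstance(k, (int, slice)):
                raise TypeError(f"unexpected indexer type: {type(k)}")
            new_key += (k,)  # tuple extension instead of list.append + tuple()
        self.key = new_key


class LazilyIndexedArraySketch:
    def __init__(self, array, key: tuple):
        self.array = array
        self.key = key
        # Compute the shape once at construction time; recomputing it on
        # every len()/.shape access is what showed up in the profiles.
        self._shape = tuple(
            len(range(*k.indices(size)))
            for size, k in zip(array.shape, key)
            if isinstance(k, slice)
        )

    @property
    def shape(self) -> tuple:
        return self._shape
```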

Findings

  • _id_coord_names has terrible performance... I had to avoid it to get isel to speed up. It is likely that other code paths can be sped up the same way. I just don't know how to exercise those code paths, so I'm hesitant to go down that rabbit hole myself.

  • Closes #xxxx
  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst
  • New functions/methods are listed in api.rst

xref: #2799
xref: #7045

@max-sixty (Collaborator) left a comment


Looks great!

@hmaarrfk force-pushed the fastestpath branch 4 times, most recently from d975fba to c2a065d, May 5, 2024 18:39
        return self.__id_coord_names

    def _create_id_coord_names(self) -> None:
@hmaarrfk (Contributor Author) commented:

It's strange that _create_id_coord_names gets hit all the time by the profiler. I would have thought that once the dataset is created, this function would only be called once.
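
For readers following along, the caching pattern under discussion is roughly the following; a hypothetical sketch, not the actual Indexes class in xarray/core/indexes.py:

```python
# Hypothetical sketch of the lazily built cache discussed above; the real
# Indexes class in xarray/core/indexes.py is more involved.
class IndexesSketch:
    def __init__(self, indexes: dict):
        self._indexes = indexes
        self.__id_coord_names = None  # built on first access, then reused

    @property
    def _id_coord_names(self) -> dict:
        if self.__id_coord_names is None:
            self._create_id_coord_names()
        return self.__id_coord_names

    def _create_id_coord_names(self) -> None:
        # Group coordinate names by the id() of their index object; ideally
        # this runs once per Indexes object, not once per isel call.
        id_coord_names: dict = {}
        for name, idx in self._indexes.items():
            id_coord_names[id(idx)] = id_coord_names.get(id(idx), ()) + (name,)
        self.__id_coord_names = id_coord_names
```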

@hmaarrfk force-pushed the fastestpath branch 2 times, most recently from 8c49aba to 2b5e936, May 5, 2024 19:27
@kmuehlbauer marked this pull request as ready for review, May 5, 2024 19:49
@kmuehlbauer (Contributor) commented:

@hmaarrfk Sorry Marc, moving to ready-for-review was an accident. I still have trouble using the GitHub Android app properly.

@hmaarrfk marked this pull request as draft, May 5, 2024 19:52
@hmaarrfk (Contributor Author) commented May 5, 2024

> GitHub Android app.

The GitHub app is terrible... I keep making similar mistakes on my phone.

@hmaarrfk force-pushed the fastestpath branch 8 times, most recently from 5d4e9b2 to f958953, May 5, 2024 22:48
@hmaarrfk (Contributor Author) commented May 5, 2024

OK, enough for today.

Generally speaking, I think the main problem, broad compatibility with Pandas, is a known issue.

However, some things are simply strange, in the sense that certain isel operations just seem too slow for comfort. It may be that we can add enough fastpath variables to remove a lot of the expensive validation.

[screenshot: profiler output showing the timing numbers referenced below]

You can see some of the numbers that I use.

I use 100_000 selections because it amplifies the problem and helps the profiler surface the problematic hotspots.
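
A minimal sketch of the kind of tqdm loop behind the "it/s" numbers quoted above; the file name and the "time" dimension here are hypothetical stand-ins:

```python
# Minimal sketch of the benchmark harness implied by the tqdm-style output;
# the file path and variable layout are hypothetical.
import xarray as xr
from tqdm import tqdm

ds = xr.open_dataset("recording.nc", engine="h5netcdf")  # lazily indexed
for i in tqdm(range(100_000), desc="h5netcdf  lazy"):
    ds.isel(time=i % ds.sizes["time"])
```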

The function _apply_indexes_fast might be controversial. I am happy to remove it and just document it (hmaarrfk#29) for when I have time again.

Removing it drops the performance on my Threadripper 2950X from 14k it/s to 12k it/s, still an improvement over the 9k it/s where pull request #9001 leaves off.

@hmaarrfk force-pushed the fastestpath branch 4 times, most recently from 2c202ff to 1a659bc, May 5, 2024 23:01
@hmaarrfk marked this pull request as ready for review, May 5, 2024 23:01
@hmaarrfk (Contributor Author) commented May 5, 2024

Feel free to push any cleanups.

hmaarrfk added a commit to hmaarrfk/xarray that referenced this pull request May 6, 2024
This targets optimization for datasets with many "scalar" variables
(that is variables without any dimensions). This can happen in the
context where you have many pieces of small metadata that relate to
various facts about an experimental condition.

For example, we have about 80 of these in our datasets (and I want to
increase this number).

Our datasets are quite large (on the order of 1 TB uncompressed), so we
often have one dimension in the tens of thousands.

However, it has become quite slow to index into the dataset.

We therefore often "carefully slice out the metadata we need" prior to
doing anything with our dataset, but that isn't quite possible when you
want to orchestrate things with a parent application.

These optimizations are likely "minor" but considering the results of
the benchmark, I think they are quite worthwhile:

* main (as of pydata#9001) - 2.5k its/s
* With pydata#9002 - 4.2k its/s
* With this Pull Request (on top of pydata#9002) -- 6.1k its/s

Thanks for considering.
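
For context, the dataset shape described in this commit message looks roughly like the following synthetic example (not the author's actual data):

```python
# Synthetic example of the shape described above: one long dimension plus
# many dimensionless "scalar" metadata variables.
import numpy as np
import xarray as xr

data_vars = {"signal": ("time", np.zeros(50_000))}
# ~80 scalar variables carrying experimental metadata
data_vars.update({f"meta_{i}": ((), float(i)) for i in range(80)})
ds = xr.Dataset(data_vars)

ds.isel(time=0)  # each isel touches all ~81 variables, scalars included
```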
@@ -2624,6 +2624,10 @@ def __init__(self, dims, data, attrs=None, encoding=None, fastpath=False):
        if self.ndim != 1:
            raise ValueError(f"{type(self).__name__} objects must be 1-dimensional")

        # Avoid further checks if fastpath is True
        if fastpath:
            return
@hmaarrfk (Contributor Author) commented:

This is causing:

=============================================================================== FAILURES ===============================================================================
___________________________________________________________________ test_field_access[365_day-year] ____________________________________________________________________

data = <xarray.DataArray 'data' (lon: 10, lat: 10, time: 100)> Size: 80kB
array([[[0.5488135 , 0.71518937, 0.60276338, ..., 0....222 4.444 6.667 ... 13.33 15.56 17.78 20.0
  * time     (time) object 800B 2000-01-01 00:00:00 ... 2000-01-05 03:00:00
field = 'year'

    @requires_cftime
    @pytest.mark.parametrize(
        "field", ["year", "month", "day", "hour", "dayofyear", "dayofweek"]
    )
    def test_field_access(data, field) -> None:
        result = getattr(data.time.dt, field)
        expected = xr.DataArray(
            getattr(xr.coding.cftimeindex.CFTimeIndex(data.time.values), field),
            name=field,
            coords=data.time.coords,
            dims=data.time.dims,
        )

>       assert_equal(result, expected)
E       AssertionError: Left and right DataArray objects are not equal

/home/mark/git/xarray/xarray/tests/test_accessor_dt.py:442: AssertionError

@hmaarrfk (Contributor Author):

With the removal of the fastpath:

| Change   | Before [95c1ef72] <main>   | After [4a7411b9] <fastestpath>   |   Ratio | Benchmark (Parameter)                                         |
|----------|----------------------------|----------------------------------|---------|---------------------------------------------------------------|
| -        | 167±0.7μs                  | 149±0.9μs                        |    0.89 | indexing.Assignment.time_assignment_basic('1scalar')          |
| -        | 164±1μs                    | 146±0.4μs                        |    0.89 | indexing.Assignment.time_assignment_basic('2slicess-1scalar') |
| -        | 182±1μs                    | 162±0.9μs                        |    0.89 | indexing.Indexing.time_indexing_basic('1slice-1scalar')       |
| -        | 161±0.5μs                  | 142±0.4μs                        |    0.88 | indexing.Assignment.time_assignment_basic('1slice-1scalar')   |
| -        | 149±0.6μs                  | 129±1μs                          |    0.86 | indexing.Indexing.time_indexing_basic('1slice')               |
| -        | 250±1μs                    | 209±1μs                          |    0.84 | indexing.Indexing.time_indexing_basic('2slicess-1scalar')     |
| -        | 165±0.6μs                  | 133±1μs                          |    0.81 | indexing.Assignment.time_assignment_basic('1slice')           |
| -        | 86.9±0.1μs                 | 64.4±0.9μs                       |    0.74 | indexing.HugeAxisSmallSliceIndexing.time_indexing             |
| -        | 607±3μs                    | 326±2μs                          |    0.54 | indexing.AssignmentOptimized.time_assign_identical_indexes    |

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

We get most of the speedups; we seem to lose out on '2d-1scalar'.

Honestly, I'll take it. I understand that fastpaths are not always good to add, for maintainability's sake.

Let me clean up and rerun the benchmarks.

@hmaarrfk (Contributor Author):

Running the benchmarks again:

| Change   | Before [95c1ef72] <main>   | After [d526ae43] <fastestpath>   |   Ratio | Benchmark (Parameter)                                         |
|----------|----------------------------|----------------------------------|---------|---------------------------------------------------------------|
| -        | 244±0.7μs                  | 222±0.8μs                        |    0.91 | indexing.Assignment.time_assignment_outer('2d-1scalar')       |
| -        | 183±1μs                    | 163±0.8μs                        |    0.89 | indexing.Indexing.time_indexing_basic('1slice-1scalar')       |
| -        | 171±0.7μs                  | 150±0.8μs                        |    0.88 | indexing.Assignment.time_assignment_basic('1scalar')          |
| -        | 168±0.4μs                  | 147±0.3μs                        |    0.88 | indexing.Assignment.time_assignment_basic('2slicess-1scalar') |
| -        | 166±0.8μs                  | 144±0.6μs                        |    0.87 | indexing.Assignment.time_assignment_basic('1slice-1scalar')   |
| -        | 150±0.5μs                  | 131±0.4μs                        |    0.87 | indexing.Indexing.time_indexing_basic('1slice')               |
| -        | 253±1μs                    | 220±1μs                          |    0.87 | indexing.Indexing.time_indexing_basic('2slicess-1scalar')     |
| -        | 168±0.6μs                  | 138±0.5μs                        |    0.82 | indexing.Assignment.time_assignment_basic('1slice')           |
| -        | 87.2±0.5μs                 | 66.7±0.3μs                       |    0.76 | indexing.HugeAxisSmallSliceIndexing.time_indexing             |
| -        | 617±5μs                    | 331±2μs                          |    0.54 | indexing.AssignmentOptimized.time_assign_identical_indexes    |

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

@hmaarrfk (Contributor Author):

Thanks for the fix; I guess that was a merge conflict?

@dcherian (Contributor):

Ah, one last request: can you add a whats-new entry?

@hmaarrfk (Contributor Author):

Done.

@dcherian added the "plan to merge (Final call for comments)" label, Jun 11, 2024
@hmaarrfk closed this, Jun 11, 2024
@hmaarrfk reopened this, Jun 11, 2024
@dcherian merged commit d9e4de6 into pydata:main, Jun 11, 2024
40 of 47 checks passed
hmaarrfk added a commit to hmaarrfk/xarray that referenced this pull request Jun 12, 2024
hmaarrfk added a commit to hmaarrfk/xarray that referenced this pull request Jun 12, 2024
dcherian added a commit to dcherian/xarray that referenced this pull request Jun 13, 2024
* upstream/main:
  [skip-ci] Try fixing hypothesis CI trigger (pydata#9112)
  Undo custom padding-top. (pydata#9107)
  add remaining core-dev citations [skip-ci][skip-rtd] (pydata#9110)
  Add user survey announcement to docs (pydata#9101)
  skip the `pandas` datetime roundtrip test with `pandas=3.0` (pydata#9104)
  Adds Matt Savoie to CITATION.cff (pydata#9103)
  [skip-ci] Fix skip-ci for hypothesis (pydata#9102)
  open_datatree performance improvement on NetCDF, H5, and Zarr files (pydata#9014)
  Migrate datatree io.py and common.py into xarray/core (pydata#9011)
  Micro optimizations to improve indexing (pydata#9002)
  (fix): don't handle time-dtypes as extension arrays in `from_dataframe` (pydata#9042)
andersy005 pushed a commit that referenced this pull request Jun 14, 2024
* conda instead of mamba

* Make speedups using fastpath

* Change core logic to apply_indexes_fast

* Always have fastpath=True in one path

* Remove basicindexer fastpath=True

* Duplicate a comment

* Add comments

* revert asv changes

* Avoid fastpath=True assignment

* Remove changes to basicindexer

* Do not do fast fastpath for IndexVariable

* Remove one unnecessary change

* Remove one more fastpath

* Revert unneeded change to PandasIndexingAdapter

* Update xarray/core/indexes.py

* Update whats-new.rst

* Update whats-new.rst

* fix whats-new

---------

Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
Co-authored-by: Deepak Cherian <deepak@cherian.net>
hmaarrfk added a commit to hmaarrfk/xarray that referenced this pull request Jun 19, 2024
hmaarrfk added a commit to hmaarrfk/xarray that referenced this pull request Jun 22, 2024
Labels: plan to merge (Final call for comments), run-benchmark (Run the ASV benchmark workflow)

5 participants