Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stores are no longer MutableMappings #2023

Open
Tracked by #182
ghidalgo3 opened this issue Jul 10, 2024 · 3 comments
Open
Tracked by #182

Stores are no longer MutableMappings #2023

ghidalgo3 opened this issue Jul 10, 2024 · 3 comments
Labels
bug Potential issues with the zarr-python library

Comments

@ghidalgo3
Copy link

Zarr version

3.0.0a0

Numcodecs version

0.12.1

Python Version

3.10.12

Operating System

Linux

Installation

pip install zarr==3.0.0a0

Description

This is a downstream issue for Kerchunk.

In ZarrV2, BaseStore inherited from MutableMapping which brought with it the ability to copy() and call items() on a BaseStore, which Kerchunk depended on here.

In ZarrV3, neither StorePath nor Store are copy()-able, and of course __iter__, __setitem__, and __getitem__ are not available to make it anything like a MutableMapping.

@martindurant do you think this is something Kerchunk should handle or should ZarrV3 preserve the behavior of stores?

Steps to reproduce

This is the definition of the function Kerchunk is calling:

from zarr.store.core import MemoryStore, StorePath

def _encode_for_JSON(store):
    """Make store JSON encodable"""
    for k, v in store.copy().items():
        if isinstance(v, list):
            continue
        else:
            try:
                # minify JSON
                v = ujson.dumps(ujson.loads(v))
            except (ValueError, TypeError):
                pass
            try:
                store[k] = v.decode() if isinstance(v, bytes) else v
            except UnicodeDecodeError:
                store[k] = "base64:" + base64.b64encode(v).decode()
    return store

_encode_for_JSON(StorePath(MemoryStore(mode="w"))) # Will throw

Additional output

No response

@ghidalgo3 ghidalgo3 added the bug Potential issues with the zarr-python library label Jul 10, 2024
@martindurant
Copy link
Member

Kerchunk doesn't need an actual subclass of MutableMapping, it could have a simple method something like

def to_dict_with_copy(self):
    return {k: self.get(k) for k in self.list()}

(but now we need to worry about calling into async code and awaiting the coroutines)

However, when we know it's a reference-based store, we can get the underlying FS and reference set anyway.

There will be more things than just this broken for zarr v3, since in a number of places, kerchunk assumes the .zarray/.zattrs layout of v2, so careful thought will have to go into making it work for both versions.

@martindurant
Copy link
Member

(note that kerchunk should be happily able to produce references with zarr v2, which can then be used by an environment with v3)

@d-v-b
Copy link
Contributor

d-v-b commented Jul 10, 2024

should ZarrV3 preserve the behavior of stores?

We moved stores away from implementing MutableMapping deliberately, so I think this particular aspect of stores won't change. But I could imagine a mutable mapping API exposed an attribute of a store. But I think we would need some use case internal to zarr python for this to make sense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Potential issues with the zarr-python library
Projects
None yet
Development

No branches or pull requests

3 participants