Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: support dataset creation options (use metadata to split dataset/layer creation options) #189

Merged
merged 5 commits into from
Jan 5, 2023

Conversation

jorisvandenbossche
Copy link
Member

Closes #177

This PR adds a minimal implementation of querying metadata items (based on Toblerity/Fiona#950), not yet publicly exposed (we should do that as well at some point), but just enough to use internally.

We then use the metadata to check the dataset and layer creation options for the specific driver, so we can split the **kwargs in separate layer and dataset creation kwargs.

@jorisvandenbossche jorisvandenbossche marked this pull request as ready for review December 31, 2022 08:10
elif k in layer_option_names:
layer_kwargs[k] = v
else:
raise ValueError(f"unrecognized option '{k}' for driver '{driver}'")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we OK with raising an error here in case an option is not supported for the driver in question?

At the moment, we pass it through and the the GDAL warning message gets printed (so in effect we ignore the option).

For example:

In [18]: df = geopandas.read_file(geopandas.datasets.get_path("naturalearth_lowres"))

In [19]: pyogrio.write_dataframe(df, "test.geojson", spatial_index=False)
Warning 6: dataset test.geojson does not support layer creation option SPATIAL_INDEX

But with this change, we actually raise an error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Raising an error seems reasonable because users have to pass in the options intentionally, which means it is easy to fix by omitting the unsupported option and trying again.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, also added a test for this

Copy link
Member

@brendan-ward brendan-ward left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @jorisvandenbossche ! A few minor comments to consider, but otherwise this is ready to merge.

We should add an example to docs/introduction.md showcasing these options (added #193 to track that) in a separate PR.

pyogrio/raw.py Outdated
if xml:
root = ET.fromstring(xml)
for option in root.iter("Option"):
if option.attrib.get("scope", "vector") == "raster":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion:

if option.attrib.get("scope", "vector") != "raster":
    options.append(option.attrib["name"])

pyogrio/raw.py Outdated
to `SPATIAL_INDEX="YES"`.
"""
if not isinstance(options, dict):
raise TypeError(f"Expected a dict as options, got {type(options)}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
raise TypeError(f"Expected a dict as options, got {type(options)}")
raise TypeError(f"Expected options to be a dict, got {type(options)}")

elif k in layer_option_names:
layer_kwargs[k] = v
else:
raise ValueError(f"unrecognized option '{k}' for driver '{driver}'")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Raising an error seems reasonable because users have to pass in the options intentionally, which means it is easy to fix by omitting the unsupported option and trying again.

@@ -471,21 +471,64 @@ def test_write_read_empty_dataframe_unsupported(tmp_path, ext):
_ = read_dataframe(filename)


def test_write_dataframe_gdalparams(tmp_path, naturalearth_lowres):
original_df = read_dataframe(naturalearth_lowres)
def test_write_dataframe_gdal_options(tmp_path, naturalearth_lowres):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test could probably be parametrized instead, perhaps something like (untested)

@pytest.mark.parametrize("spatial_index", [False, True])
def test_write_dataframe_gdal_options(tmp_path, naturalearth_lowres, spatial_index):
    df = read_dataframe(naturalearth_lowres)

    outfilename1 = tmp_path / "test1.shp"
    write_dataframe(df, outfilename1, SPATIAL_INDEX="YES" if spatial_index else "NO")
    assert outfilename.exists() is True

    index_filename1 = tmp_path / "test1.qix"
    assert index_filename1.exists() is spatial_index

    # using explicit layer_options instead
    outfilename2 = tmp_path / "test2.shp"
    write_dataframe(df, outfilename2, layer_options=dict(spatial_index=spatial_index))
    assert outfilename.exists() is True
    index_filename2 = tmp_path / "test2.qix"
    assert test_noindex_index_filename.exists() is spatial_index

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that looks better!

@jorisvandenbossche
Copy link
Member Author

@brendan-ward thanks for the review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unable to set dataset creations exporting a data frame to GeoPackage
2 participants