ENH: support dataset creation options (use metadata to split dataset/layer creation options) #189

jorisvandenbossche · 2022-12-29T21:08:38Z

Closes #177

This PR adds a minimal implementation of querying metadata items (based on Toblerity/Fiona#950), not yet publicly exposed (we should do that as well at some point), but just enough to use internally.

We then use the metadata to look up the dataset and layer creation options supported by the specific driver, so that we can split `**kwargs` into separate dataset and layer creation kwargs.
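A minimal sketch of that splitting step, assuming the option names for the driver have already been collected from its metadata. The function name `split_creation_options` and the example option sets are hypothetical and only for illustration; they are not pyogrio's API, and the real option names come from GDAL at runtime.

```python
def split_creation_options(kwargs, dataset_option_names, layer_option_names, driver):
    """Split keyword arguments into dataset and layer creation options.

    Hypothetical helper for illustration; the option-name collections are
    assumed to have been read from the driver's metadata beforehand.
    """
    dataset_kwargs = {}
    layer_kwargs = {}
    for k, v in kwargs.items():
        if k in dataset_option_names:
            dataset_kwargs[k] = v
        elif k in layer_option_names:
            layer_kwargs[k] = v
        else:
            # Neither list knows the option: fail loudly instead of ignoring it
            raise ValueError(f"unrecognized option '{k}' for driver '{driver}'")
    return dataset_kwargs, layer_kwargs


# Illustrative option names only; the real lists are driver-specific.
dataset_names = {"VERSION"}
layer_names = {"SPATIAL_INDEX"}

dataset_kwargs, layer_kwargs = split_creation_options(
    {"VERSION": "1.2", "SPATIAL_INDEX": "NO"}, dataset_names, layer_names, "GPKG"
)
```

An option that appears in neither collection raises `ValueError`, which is the behaviour discussed in the review comments.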

jorisvandenbossche · 2023-01-02T08:07:05Z

pyogrio/raw.py

+            elif k in layer_option_names:
+                layer_kwargs[k] = v
+            else:
+                raise ValueError(f"unrecognized option '{k}' for driver '{driver}'")


Are we OK with raising an error here in case an option is not supported for the driver in question?

At the moment, we pass it through and the GDAL warning message gets printed (so in effect we ignore the option).

For example:

In [18]: df = geopandas.read_file(geopandas.datasets.get_path("naturalearth_lowres"))

In [19]: pyogrio.write_dataframe(df, "test.geojson", spatial_index=False)
Warning 6: dataset test.geojson does not support layer creation option SPATIAL_INDEX

But with this change, we actually raise an error.

Raising an error seems reasonable because users have to pass in the options intentionally, which means it is easy to fix by omitting the unsupported option and trying again.

OK, also added a test for this

brendan-ward

Thanks for working on this @jorisvandenbossche ! A few minor comments to consider, but otherwise this is ready to merge.

We should add an example to docs/introduction.md showcasing these options (added #193 to track that) in a separate PR.

brendan-ward · 2023-01-04T22:39:17Z

pyogrio/raw.py

+    if xml:
+        root = ET.fromstring(xml)
+        for option in root.iter("Option"):
+            if option.attrib.get("scope", "vector") == "raster":


Suggestion:

            if option.attrib.get("scope", "vector") != "raster":
                options.append(option.attrib["name"])
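For context, a self-contained sketch of the parsing this suggestion applies to. The XML string is a made-up stand-in for a driver's creation option list, and `option_names` is a hypothetical helper name; options without a `scope` attribute are treated as vector options, as in the diff above.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the shape of a GDAL creation option list (assumption).
SAMPLE_XML = """
<CreationOptionList>
  <Option name="SPATIAL_INDEX" type="boolean" />
  <Option name="TILE_FORMAT" type="string-select" scope="raster" />
  <Option name="VERSION" type="string-select" scope="vector" />
</CreationOptionList>
"""


def option_names(xml):
    """Return the names of all options that are not raster-only."""
    options = []
    if xml:
        root = ET.fromstring(xml.strip())
        for option in root.iter("Option"):
            # A missing scope attribute is assumed to mean "vector"
            if option.attrib.get("scope", "vector") != "raster":
                options.append(option.attrib["name"])
    return options


names = option_names(SAMPLE_XML)
```

Here the raster-only `TILE_FORMAT` entry is skipped, and an empty or missing XML string yields an empty list.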

brendan-ward · 2023-01-04T22:40:14Z

pyogrio/raw.py

+    to `SPATIAL_INDEX="YES"`.
+    """
+    if not isinstance(options, dict):
+        raise TypeError(f"Expected a dict as options, got {type(options)}")


Suggested change

-        raise TypeError(f"Expected a dict as options, got {type(options)}")
+        raise TypeError(f"Expected options to be a dict, got {type(options)}")
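As a rough sketch of the kind of preprocessing this type check guards, assuming (from the docstring fragment in the diff) that option keys are upper-cased and boolean values are mapped to GDAL-style strings. `normalize_options` is a hypothetical name, and the exact value mapping used in pyogrio may differ.

```python
def normalize_options(options):
    """Normalize user-supplied creation options (illustrative sketch only)."""
    if not isinstance(options, dict):
        raise TypeError(f"Expected options to be a dict, got {type(options)}")

    result = {}
    for key, value in options.items():
        if value is None:
            # Assumption: None means "option not set", so it is dropped
            continue
        if isinstance(value, bool):
            # Assumption: booleans become "YES" / "NO", per the docstring fragment
            value = "YES" if value else "NO"
        else:
            value = str(value)
        result[key.upper()] = value
    return result


normalized = normalize_options({"spatial_index": True, "version": 1.2, "fid": None})
```

Passing anything other than a dict, for example a list of `"KEY=VALUE"` strings, raises the `TypeError` with the reworded message.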

brendan-ward · 2023-01-04T22:49:16Z

pyogrio/raw.py

+            elif k in layer_option_names:
+                layer_kwargs[k] = v
+            else:
+                raise ValueError(f"unrecognized option '{k}' for driver '{driver}'")


Raising an error seems reasonable because users have to pass in the options intentionally, which means it is easy to fix by omitting the unsupported option and trying again.

brendan-ward · 2023-01-04T22:57:42Z

pyogrio/tests/test_geopandas_io.py

@@ -471,21 +471,64 @@ def test_write_read_empty_dataframe_unsupported(tmp_path, ext):
        _ = read_dataframe(filename)


-def test_write_dataframe_gdalparams(tmp_path, naturalearth_lowres):
-    original_df = read_dataframe(naturalearth_lowres)
+def test_write_dataframe_gdal_options(tmp_path, naturalearth_lowres):


This test could probably be parametrized instead, perhaps something like (untested)

@pytest.mark.parametrize("spatial_index", [False, True])
def test_write_dataframe_gdal_options(tmp_path, naturalearth_lowres, spatial_index):
    df = read_dataframe(naturalearth_lowres)

    # option passed directly as a keyword argument
    outfilename1 = tmp_path / "test1.shp"
    write_dataframe(df, outfilename1, SPATIAL_INDEX="YES" if spatial_index else "NO")
    assert outfilename1.exists() is True
    index_filename1 = tmp_path / "test1.qix"
    assert index_filename1.exists() is spatial_index

    # using explicit layer_options instead
    outfilename2 = tmp_path / "test2.shp"
    write_dataframe(df, outfilename2, layer_options=dict(spatial_index=spatial_index))
    assert outfilename2.exists() is True
    index_filename2 = tmp_path / "test2.qix"
    assert index_filename2.exists() is spatial_index

Yes, that looks better!

jorisvandenbossche · 2023-01-05T07:52:33Z

@brendan-ward thanks for the review!

ENH: support dataset creation options (use metadata to split dataset/…

945734e

…layer creation options)

jorisvandenbossche mentioned this pull request Dec 30, 2022

Unable to set dataset creations exporting a data frame to GeoPackage #177

Closed

explicit dataset_options and layer_options + add test

539612b

jorisvandenbossche marked this pull request as ready for review December 31, 2022 08:10

properly clean up options

87551ec

jorisvandenbossche mentioned this pull request Dec 31, 2022

Expose GDAL driver metadata #103

Open

jorisvandenbossche commented Jan 2, 2023

jorisvandenbossche mentioned this pull request Jan 2, 2023

Release version 0.5.0 #192

Closed

jorisvandenbossche added this to the 0.5.0 milestone Jan 2, 2023

brendan-ward mentioned this pull request Jan 4, 2023

DOC: add example of dataset / layer creation options to Introduction #193

Closed

brendan-ward reviewed Jan 4, 2023

brendan-ward mentioned this pull request Jan 5, 2023

DOC: Add examples of dataset & layer creation options #194

Merged

jorisvandenbossche added 2 commits January 5, 2023 08:27

Merge remote-tracking branch 'upstream/main' into metadata

6708eeb

address feedback

2f96624

jorisvandenbossche merged commit 8e1a32c into geopandas:main Jan 5, 2023

jorisvandenbossche deleted the metadata branch January 5, 2023 07:52

jorisvandenbossche mentioned this pull request Feb 13, 2023

ENH: Auto-detect driver from filename extension when writing #220

Closed
