Unable to set dataset creation options when exporting a data frame to GeoPackage #177

Closed
felnne opened this issue Nov 14, 2022 · 6 comments · Fixed by #189

@felnne

felnne commented Nov 14, 2022

Expansion of comment in #71 (comment).

I am trying to export a data frame to a GeoPackage with two dataset creation options set, specifically VERSION and ADD_GPKG_OGR_CONTENTS, as documented in https://gdal.org/drivers/vector/gpkg.html#dataset-creation-options.

The code I'm using:

from pathlib import Path

from pyogrio import read_dataframe, write_dataframe

geojson_path = Path('input.geojson')
output_path = Path('output.gpkg')

write_dataframe(read_dataframe(geojson_path), path=str(output_path), VERSION=1.3, ADD_GPKG_OGR_CONTENTS='NO')

Running this code gives the following:

Warning 6: dataset output.gpkg does not support layer creation option VERSION
Warning 6: dataset output.gpkg does not support layer creation option ADD_GPKG_OGR_CONTENTS

From this I assume all **kwargs are being set as layer creation options rather than dataset creation options, but I'm unsure how to set dataset creation options (without using another tool).

Python version: 3.9.1
pyogrio version: 0.4.2

$ gdalinfo --version
GDAL 3.5.2, released 2022/09/02

Happy to provide any other information and apologies if I've missed anything useful to debug this.

@felnne
Author

felnne commented Nov 14, 2022

To sanity check that the ADD_GPKG_OGR_CONTENTS option isn't being applied, I checked whether the generated GPKG had a gpkg_ogr_contents table (I expected it not to):

$ sqlite3 output.gpkg
SQLite version 3.39.4 2022-09-29 15:55:41
Enter ".help" for usage hints.
sqlite> .tables  
gpkg_ogr_contents
gpkg_contents
gpkg_spatial_ref_sys
gpkg_extensions
gpkg_tile_matrix
gpkg_geometry_columns
gpkg_tile_matrix_set
input

@jorisvandenbossche
Member

jorisvandenbossche commented Nov 14, 2022

Yes, so currently we pass options (**kwargs) only to the layer creation step (GDALDatasetCreateLayer):

pyogrio/pyogrio/_io.pyx

Lines 1275 to 1287 in bdd7bf4

# Setup other layer creation options
for k, v in kwargs.items():
    if v is None:
        continue
    k = k.upper().encode('UTF-8')
    if isinstance(v, bool):
        v = ('ON' if v else 'OFF').encode('utf-8')
    else:
        v = str(v).encode('utf-8')
    options = CSLAddNameValue(options, <const char *>k, <const char *>v)
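
For readers less familiar with Cython, the normalisation done by the loop above can be sketched in plain Python. This is only an illustration: `encode_creation_options` is a hypothetical helper name, not part of pyogrio, and it returns a list of `NAME=VALUE` strings instead of building a GDAL string list.

```python
def encode_creation_options(**kwargs):
    """Return GDAL-style NAME=VALUE strings for the given keyword options.

    Hypothetical helper mirroring the Cython loop: None values are skipped,
    names are upper-cased, booleans become ON/OFF, anything else is str()-ed.
    """
    options = []
    for key, value in kwargs.items():
        if value is None:
            continue  # unset options are not passed on
        if isinstance(value, bool):
            value = "ON" if value else "OFF"
        options.append(f"{key.upper()}={value}")
    return options


print(encode_creation_options(spatial_index=True, fid="id", geometry_name=None))
# → ['SPATIAL_INDEX=ON', 'FID=id']
```

Every keyword ends up in the same list, which is why VERSION and ADD_GPKG_OGR_CONTENTS reach layer creation and trigger the warnings reported above.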

So passing dataset creation options is not possible right now (and I am not aware of any workaround).

But this is something we should certainly solve.

One option is to have explicit dataset_options and layer_options keywords that each take a dict (as mentioned in #71 (comment)), or at least to have this for the dataset options and leave the generic kwargs for the layer creation options.

Another option could be to split the user-passed kwargs automatically into dataset and layer creation options, using the driver metadata (cf. #103).

A third option, passing the kwargs to both dataset and layer creation, doesn't seem desirable, since that causes warnings.
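
As a rough sketch of the second option (automatic splitting), the driver metadata items are XML documents whose `Option` elements carry a `name` attribute, so the option names can be extracted and used to route each keyword. The XML strings below are abbreviated, hand-written stand-ins for the GPKG driver metadata, not the full lists, and `option_names` and `split_options` are hypothetical names. Sending unrecognised keywords to the layer options is an assumption made here to preserve the current behaviour.

```python
import xml.etree.ElementTree as ET

# Abbreviated stand-ins for the GPKG metadata items DMD_CREATIONOPTIONLIST
# and DS_LAYER_CREATIONOPTIONLIST (illustrative only, not the full lists).
DATASET_XML = """<CreationOptionList>
  <Option name="VERSION" type="string-select"/>
  <Option name="ADD_GPKG_OGR_CONTENTS" type="boolean"/>
</CreationOptionList>"""
LAYER_XML = """<LayerCreationOptionList>
  <Option name="GEOMETRY_NAME" type="string"/>
  <Option name="SPATIAL_INDEX" type="boolean"/>
</LayerCreationOptionList>"""


def option_names(xml_text):
    """Collect the name attribute of every <Option> element."""
    if not xml_text:
        return set()
    return {opt.get("name") for opt in ET.fromstring(xml_text).iter("Option")}


def split_options(kwargs, dataset_names):
    """Route keywords to (dataset, layer) dicts by upper-cased name.

    Assumption: anything not recognised as a dataset option goes to the
    layer options, as all kwargs do today.
    """
    dataset, layer = {}, {}
    for key, value in kwargs.items():
        name = key.upper()
        (dataset if name in dataset_names else layer)[name] = value
    return dataset, layer


dataset, layer = split_options(
    {"VERSION": 1.3, "ADD_GPKG_OGR_CONTENTS": "NO", "spatial_index": True},
    option_names(DATASET_XML),
)
print(dataset)  # → {'VERSION': 1.3, 'ADD_GPKG_OGR_CONTENTS': 'NO'}
print(layer)    # → {'SPATIAL_INDEX': True}
```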

@martinfleis
Member

> Another option could be to split the user-passed kwargs automatically into dataset and layer creation options

Are they always exclusive to dataset or layer?

@jorisvandenbossche
Member

> > Another option could be to split the user-passed kwargs automatically into dataset and layer creation options
>
> Are they always exclusive to dataset or layer?

Checking this using the code in #189:

In [17]: for driver in pyogrio.list_drivers().keys():
    ...:     dataset_options = pyogrio.raw._parse_options_names(pyogrio.raw._get_driver_metadata_item(driver, "DMD_CREATIONOPTIONLIST"))
    ...:     layer_options = pyogrio.raw._parse_options_names(pyogrio.raw._get_driver_metadata_item(driver, "DS_LAYER_CREATIONOPTIONLIST"))
    ...:     common_options = set(dataset_options).intersection(set(layer_options))
    ...:     print(f"{driver}: {list(common_options) if common_options else '-'}")
    ...: 
    ...: 
ESRIC: -
FITS: -
PCIDSK: -
netCDF: -
PDS4: -
VICAR: -
JP2OpenJPEG: -
PDF: -
MBTiles: ['MINZOOM', 'DESCRIPTION', 'NAME', 'MAXZOOM']
BAG: -
EEDA: -
OGCAPI: -
ESRI Shapefile: -
MapInfo File: ['ENCODING']
UK .NTF: -
LVBAG: -
OGR_SDTS: -
S57: -
DGN: -
OGR_VRT: -
REC: -
Memory: -
CSV: ['GEOMETRY']
NAS: -
GML: -
GPX: -
LIBKML: ['LISTSTYLE_ICON_HREF', 'VISIBILITY', 'OPEN', 'SNIPPET', 'DESCRIPTION', 'LISTSTYLE_TYPE', 'NAME']
KML: -
GeoJSON: -
GeoJSONSeq: -
ESRIJSON: -
TopoJSON: -
Interlis 1: -
Interlis 2: -
OGR_GMT: -
GPKG: -
SQLite: -
OGR_DODS: -
WAsP: -
PostgreSQL: -
OpenFileGDB: -
DXF: -
CAD: -
FlatGeobuf: -
Geoconcept: -
GeoRSS: -
GPSTrackMaker: -
VFK: -
PGDUMP: -
OSM: -
GPSBabel: -
OGR_PDS: -
WFS: -
OAPIF: -
EDIGEO: -
SVG: -
CouchDB: -
Cloudant: -
Idrisi: -
ARCGEN: -
XLS: -
ODS: -
XLSX: -
Elasticsearch: -
Carto: -
AmigoCloud: -
SXF: -
Selafin: ['DATE']
JML: -
PLSCENES: -
CSW: -
VDV: -
GMLAS: -
MVT: ['MINZOOM', 'DESCRIPTION', 'NAME', 'MAXZOOM']
NGW: ['KEY', 'DESCRIPTION']
MapML: -
TIGER: -
AVCBin: -
AVCE00: -
HTTP: -

So there are some drivers where an option name can be passed to both dataset creation and layer creation (although this is not the case for the most common ones, so automatic splitting might still be convenient for those).
(I didn't check the specific cases; for some it might make no difference in practice whether the option is passed as a dataset or a layer option.)

We could also have both: explicit dataset_options and layer_options keyword where you need to pass a dict, but still allow **kwargs as well that is passed automatically to either of those. Then we give the convenience for most cases, but give the explicit option when needed.
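
A minimal sketch of how this combined approach might resolve options, assuming explicit dicts take precedence and that a keyword valid at both levels is rejected. Both of those choices are assumptions for illustration, and `resolve_options` is a hypothetical name, not the code in #189.

```python
def resolve_options(dataset_names, layer_names,
                    dataset_options=None, layer_options=None, **kwargs):
    """Merge explicit option dicts with automatically routed keywords.

    Assumptions of this sketch: explicit dicts win over keywords, keywords
    valid at both levels raise, and unrecognised keywords go to the layer.
    """
    dataset = {k.upper(): v for k, v in (dataset_options or {}).items()}
    layer = {k.upper(): v for k, v in (layer_options or {}).items()}
    for key, value in kwargs.items():
        name = key.upper()
        if name in dataset_names and name in layer_names:
            raise ValueError(
                f"{name} is both a dataset and a layer creation option; "
                "pass it via dataset_options or layer_options"
            )
        target = dataset if name in dataset_names else layer
        target.setdefault(name, value)  # an explicit entry is kept
    return dataset, layer


# GPKG-like case: no overlap, so plain keywords are enough.
gpkg_dataset = {"VERSION", "ADD_GPKG_OGR_CONTENTS"}
gpkg_layer = {"GEOMETRY_NAME", "SPATIAL_INDEX"}
print(resolve_options(gpkg_dataset, gpkg_layer, VERSION="1.3", spatial_index=False))
# → ({'VERSION': '1.3'}, {'SPATIAL_INDEX': False})

# MBTiles-like case: the overlapping names from the listing above need the explicit form.
shared = {"NAME", "DESCRIPTION", "MINZOOM", "MAXZOOM"}
print(resolve_options(shared, shared, dataset_options={"NAME": "tiles"}))
# → ({'NAME': 'tiles'}, {})
```

With this shape, the call from the original report would work with plain keywords, while drivers such as MBTiles or LIBKML would require the explicit dicts for the shared names.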

@martinfleis
Member

> We could also have both: explicit dataset_options and layer_options keyword where you need to pass a dict, but still allow **kwargs as well that is passed automatically to either of those. Then we give the convenience for most cases, but give the explicit option when needed.

That is not a bad solution. +1

@jorisvandenbossche
Member

OK, I updated #189 to take that route

This issue was closed.