Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Add append support to write functions #197

Merged
merged 5 commits into from
Jan 13, 2023
Merged

Conversation

brendan-ward
Copy link
Member

@brendan-ward brendan-ward commented Jan 5, 2023

Resolves #137

This adds basic support to append additional records to existing layers when supported by the driver.

This uses driver metadata plus a known list of drivers to exclude in order to determine which drivers support write capability (and updates listing from list_drivers()), and raises errors when attempting to write to unsupported drivers.

This adds a specific GDAL version test when appending to FlatGeobuf to avoid segfaults due to OSGeo/gdal#5739; anything prior to GDAL 3.5.1 appears to segfault.

I could not find a driver metadata key that indicates if a driver supports append capability. Rather than try to maintain a list of known drivers that support append, I think we probably just need to let GDAL raise the errors when a given driver / GDAL version does not support append.

@brendan-ward brendan-ward added this to the 0.5.0 milestone Jan 5, 2023
@brendan-ward brendan-ward added the enhancement New feature or request label Jan 5, 2023
@jorisvandenbossche
Copy link
Member

With the other PRs merged, this has some conflicts now

pyogrio/_io.pyx Outdated Show resolved Hide resolved
@brendan-ward brendan-ward marked this pull request as ready for review January 6, 2023 01:07
docs/source/introduction.md Outdated Show resolved Hide resolved
pyogrio/_ogr.pyx Outdated
def ogr_driver_supports_write(driver):
# exclude drivers known to be unsupported by pyogrio even though they are
# supported for write by GDAL
if driver in {"CSV", "XLSX"}:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CSV actually does work if specify GEOMETRY='AS_WKT'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also wondering (but for a different issue/PR), do we want to support writing only the attributes without geometry (like we support reading with ignoring the geometry)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah - didn't know that; probably means that XLSX would work with that as well. In which case, maybe we should just pass through those drivers that are supported for write, and let the user figure out what additional options may be required if GDAL returns an error.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes: #105 addresses writing without geometry; was not trying to fast-track that for the upcoming release though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

XLSX does not support geometry, so we need to keep this exclusion for XLSX until we do #105 and allow writing to it without geometry (though probably cleaner for user to do this through Pandas to_excel(...))

pyogrio/geopandas.py Outdated Show resolved Hide resolved
@@ -254,6 +255,11 @@ def write_dataframe(
layer_options : dict, optional
Layer creation option (format specific) passed to OGR. Specify as
a key-value dictionary.
append : bool, optional (default: False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something else: I like append=True/False, but note that geopandas / fiona uses mode="w"/"a" (but fiona has this because it mimics the open(..) API, and in geopandas we can easily map the keyword for compat; we currently even explicitly raise an error for mode="a" on the geopandas side in case of pyogrio engine)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that pandas supports the same modes as the open() API for things like to_csv(). So we may want to be consistent but only support a subset: write_dataframe would never support mode="r", for instance.

I'm not sure what the meaning of mode="r+" would be here, since we don't expose a stateful API, and we don't (yet) let you update existing records an existing data layer.

In which case, read_dataframe would only ever use mode="r" (so parameter is not needed), and write_dataframe could only ever use mode="w" or mode="a" - which ends up being an awful lot like our boolean append.

Would be nice not to change this API later, so I'm certainly open to using mode in order to make this more compatible with GeoPandas if that is what you'd like here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, for a direct user of pyogrio, I find append=True the nicer and easier API.

Comment on lines +11 to +12
get_gdal_version,
get_gdal_version_string,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
get_gdal_version,
get_gdal_version_string,
get_gdal_version,

(the latter is not used here?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both are used here; using the string version on line 289 for the error message

@jorisvandenbossche jorisvandenbossche merged commit cc62d2c into main Jan 13, 2023
@jorisvandenbossche jorisvandenbossche deleted the issue137 branch January 13, 2023 20:26
@jorisvandenbossche
Copy link
Member

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: support appending to existing files while writing
2 participants