Yield batches from ogr_read_arrow #205
Do you envision that there would be a counterpart function to write data in batches via the Arrow I/O API (once available)?
That seems entirely dependent on GDAL? https://gdal.org/development/rfc/rfc86_column_oriented_api.html says
I would of course love for that to be added to GDAL, and getting greater adoption of RFC 86 seems very helpful for that. #206 appeared to work on my local machine 🤷‍♂️. If a maintainer could enable CI on that PR, that would be helpful!
If GDAL were to add it, I think a similar API could make sense:

```python
with write_ogr_batches("file.gpkg", arrow_schema) as writer:
    writer.write_batch(batch)
```

But for my own needs I think I'm more likely to only write to GeoParquet and thus not use OGR as much for writing.
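To make the proposed shape concrete, here is a minimal sketch of what such a batch-writing context manager could look like. Everything here is hypothetical: `write_ogr_batches` does not exist in pyogrio, and `_FakeDataset` stands in for a real OGR dataset handle that the context manager would keep open for the duration of the `with` block.

```python
from contextlib import contextmanager


class _FakeDataset:
    """Hypothetical stand-in for an open OGR dataset handle."""

    def __init__(self, path, schema):
        self.path = path
        self.schema = schema
        self.batches = []
        self.closed = False

    def write_batch(self, batch):
        # A real implementation would append Arrow record batches
        # to the OGR layer; here we just collect them.
        self.batches.append(batch)

    def close(self):
        self.closed = True


@contextmanager
def write_ogr_batches(path, schema):
    # Open the dataset once, keep it alive across multiple writes,
    # and guarantee it is flushed/closed when the block exits.
    ds = _FakeDataset(path, schema)
    try:
        yield ds
    finally:
        ds.close()


# Usage mirroring the API sketched in the comment above:
with write_ogr_batches("file.gpkg", {"geometry": "Point"}) as writer:
    writer.write_batch({"id": [1, 2]})
    writer.write_batch({"id": [3]})
```

The key design point is the one raised later in this thread: the context manager owns the dataset's lifetime, so the caller can stream batches without reopening the file per write.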
@kylebarron thanks for looking into this! That seems like a really nice idea; I didn't think about a context manager to keep the dataset alive.
This was closed by #206 |
I'm playing around more with reading Arrow tables from pyogrio and it's really exciting. It does feel like having some API to yield batches would be helpful for working with larger datasets. @jorisvandenbossche wrote in #155:
I've never touched ogr bindings before, but naively it seems the easiest way to do this is by using a context manager:
would that work? just putting a yield here?
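As an illustration of "just putting a yield here", the pattern would be a generator that yields one batch at a time while the underlying dataset stays open, and releases it only once iteration finishes. The names below (`_open_dataset`, `ogr_read_arrow_batches`) are hypothetical, and a plain list of rows stands in for real OGR/Arrow record batches:

```python
def _open_dataset(path):
    # Stand-in for opening an OGR dataset; a real version would
    # hold a GDAL handle that must stay alive while reading.
    return {"path": path, "rows": list(range(10)), "open": True}


def ogr_read_arrow_batches(path, batch_size=4):
    ds = _open_dataset(path)
    try:
        rows = ds["rows"]
        for start in range(0, len(rows), batch_size):
            # The dataset is still open when each batch is
            # handed to the caller.
            yield rows[start:start + batch_size]
    finally:
        # Released only after the caller finishes (or abandons)
        # iteration -- this is what the `yield` buys us.
        ds["open"] = False


batches = list(ogr_read_arrow_batches("file.gpkg"))
```

Because the cleanup sits in a `finally`, the handle is released even if the consumer stops iterating early, which is the main thing a context-manager or generator design has to get right here.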