Releases: geopandas/pyogrio

Version 0.9.0

17 Jun 20:09

Improvements

  • Add on_invalid parameter to read_dataframe (#422).
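As a rough illustration of the new parameter, the sketch below wraps read_dataframe in a hypothetical helper, read_lenient. It assumes on_invalid accepts "raise", "warn", or "ignore"; those values come from the pyogrio documentation rather than from this release note. The helper name and the file name in the usage comment are placeholders.

```python
# Values on_invalid is assumed to accept (taken from the pyogrio docs,
# not from this changelog entry).
ON_INVALID_CHOICES = ("raise", "warn", "ignore")


def read_lenient(path, on_invalid="warn"):
    """Hypothetical helper: read a file, choosing how invalid geometries are handled."""
    if on_invalid not in ON_INVALID_CHOICES:
        raise ValueError(f"on_invalid must be one of {ON_INVALID_CHOICES}")

    # Imported lazily so the option check above works without pyogrio installed.
    from pyogrio import read_dataframe

    return read_dataframe(path, on_invalid=on_invalid)


# Usage (requires pyogrio >= 0.9.0 and geopandas; "parcels.gpkg" is a placeholder):
# gdf = read_lenient("parcels.gpkg", on_invalid="warn")
```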

Bug fixes

  • Fixed bug transposing longitude and latitude when writing files with
    coordinate transformation from EPSG:4326 (#421).
  • Fix bug preventing reading from file paths containing hashes in read_dataframe (#412).

Packaging

  • macOS wheels are now only available for macOS 12+. For older unsupported macOS
    versions, pyogrio can still be built from source (requires GDAL to be installed) (#417).
  • Remove usage of deprecated distutils in setup.py (#416).

Version 0.8.0

06 May 22:08
46c35a7

Improvements

  • Support for writing based on Arrow as the transfer mechanism of the data
    from Python to GDAL (requires GDAL >= 3.8). This is provided through the
    new pyogrio.raw.write_arrow function, or by using the use_arrow=True
    option in pyogrio.write_dataframe (#314, #346).
  • Add support for fids filter to read_arrow and open_arrow, and to
    read_dataframe with use_arrow=True (#304).
  • Add some missing properties to read_info, including layer name, geometry name
    and FID column name (#365).
  • read_arrow and open_arrow now provide
    GeoArrow-compliant extension metadata,
    including the CRS, when using GDAL 3.8 or higher (#366).
  • The open_arrow function can now be used without a pyarrow dependency. By
    default, it will now return a stream object implementing the
    Arrow PyCapsule Protocol
    (i.e. having an __arrow_c_stream__ method). This object can then be consumed
    by your Arrow implementation of choice that supports this protocol. To keep
    the previous behaviour of returning a pyarrow.RecordBatchReader, specify
    use_pyarrow=True (#349).
  • Warn when reading from a multilayer file without specifying a layer (#362).
  • Allow writing to a new in-memory datasource using io.BytesIO object (#397).
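The open_arrow change above could be used along these lines. This is a minimal sketch, not the library's reference usage: read_layer_as_table and supports_arrow_c_stream are hypothetical helpers, pyarrow is chosen only as one possible consumer of the stream, and pyarrow.RecordBatchReader.from_stream is assumed to be available (pyarrow 15 or newer).

```python
def supports_arrow_c_stream(obj):
    """Return True if obj exposes the Arrow PyCapsule stream method."""
    return callable(getattr(obj, "__arrow_c_stream__", None))


def read_layer_as_table(path):
    """Hypothetical helper: read one layer into a pyarrow.Table via open_arrow."""
    # Imported lazily; pyarrow is just one consumer of the PyCapsule stream.
    import pyarrow as pa
    from pyogrio.raw import open_arrow

    with open_arrow(path) as (meta, stream):
        if not supports_arrow_c_stream(stream):
            raise TypeError("expected an object implementing __arrow_c_stream__")
        # Assumption: RecordBatchReader.from_stream exists (pyarrow >= 15).
        reader = pa.RecordBatchReader.from_stream(stream)
        # Read everything before the context manager closes the data source.
        table = reader.read_all()
    return meta, table
```

Any other library that understands the Arrow PyCapsule Protocol could take the place of pyarrow inside the with block.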

Bug fixes

  • Fix error in write_dataframe if input has a date column and
    non-consecutive index values (#325).
  • Fix encoding issues on Windows for some formats (e.g. ".csv") and always write ESRI
    Shapefiles using UTF-8 by default on all platforms (#361).
  • Raise exception in read_arrow or read_dataframe(..., use_arrow=True) if
    a boolean column is detected due to error in GDAL reading boolean values for
    FlatGeobuf / GPKG drivers (#335, #387); this has been fixed in GDAL >= 3.8.3.
  • Properly ignore fields not listed in columns parameter when reading from
    the data source not using the Arrow API (#391).
  • Properly handle decoding of ESRI Shapefiles with user-provided encoding
    option for read, read_dataframe, and open_arrow, and correctly encode
    Shapefile field names and text values to the user-provided encoding for
    write and write_dataframe (#384).
  • Fixed bug preventing reading from bytes or file-like in read_arrow /
    open_arrow (#407).

Packaging

  • The GDAL library included in the wheels is updated from 3.7.2 to GDAL 3.8.5.

Potentially breaking changes

  • Using a where expression combined with a list of columns that does not include
    the column referenced in the expression is not recommended. It now returns
    results based on driver-dependent behavior, which may mean either returning
    empty results (even if the where parameter should yield non-empty results)
    or raising an exception (#391). Previous versions of pyogrio incorrectly
    set ignored fields against the data source, allowing it to return non-empty
    results in these cases.
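One way to stay clear of the driver-dependent behavior described above is to make sure every column used in the where expression is also requested in columns. The sketch below assumes the caller lists those columns explicitly; columns_including and read_filtered are hypothetical helpers, and no attempt is made to parse the where expression.

```python
def columns_including(columns, where_columns):
    """Return columns plus any where_columns not already listed, order preserved."""
    merged = list(columns)
    for name in where_columns:
        if name not in merged:
            merged.append(name)
    return merged


def read_filtered(path, columns, where, where_columns):
    """Hypothetical helper: read with a where filter while keeping its columns."""
    # Imported lazily so columns_including can be used without pyogrio installed.
    from pyogrio import read_dataframe

    return read_dataframe(
        path,
        columns=columns_including(columns, where_columns),
        where=where,
    )


# Usage (file and column names are placeholders):
# gdf = read_filtered("cities.gpkg", ["name"], "population > 100000", ["population"])
```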

Version 0.7.2

30 Oct 19:11
71acde5

Bug fixes

  • Add packaging as a dependency (#320).
  • Fix conversion of WKB to geometries with missing values when using
    pandas.ArrowDtype (#321).

Version 0.7.1

26 Oct 23:25
97d9dee

Bug fixes

  • Fix unspecified dependency on packaging (#318).

Version 0.7.0

25 Oct 19:21
f0c82b6

Improvements

  • Support reading and writing datetimes with timezones (#253).
  • Support writing dataframes without geometry column (#267).
  • Calculate feature count by iterating over features if GDAL returns an
    unknown count for a data layer (e.g., OSM driver); this may have significant
    performance impacts for some data sources that would otherwise return an
    unknown count (count is used in read_info, read, read_dataframe) (#271).
  • Add arrow_to_pandas_kwargs parameter to read_dataframe and reduce memory usage
    with use_arrow=True (#273).
  • In read_info, the result now also contains the total_bounds of the layer as well
    as some extra capabilities of the data source driver (#281).
  • Raise error if read or read_dataframe is called with parameters to read no
    columns, geometry, or fids (#280).
  • Automatically detect the supported driver by file extension for all available
    write drivers, and add the detect_write_driver function (#270).
  • Addition of mask parameter to open_arrow, read, read_dataframe,
    and read_bounds functions to select only the features in the dataset that
    intersect the mask geometry (#285). Note: GDAL < 3.8.0 returns features that
    intersect the bounding box of the mask when using the Arrow interface for
    some drivers; this has been fixed in GDAL 3.8.0.
  • Removed warning when no features are read from the data source (#299).
  • Add support for force_2d=True with use_arrow=True in read_dataframe (#300).

Other changes

  • Test suite requires Shapely >= 2.0.
  • Using skip_features greater than the number of features available in a data
    layer now returns empty arrays for read and an empty DataFrame for
    read_dataframe instead of raising a ValueError (#282).
  • Enabled skip_features and max_features for read_arrow and
    read_dataframe(path, use_arrow=True). Note that this incurs overhead
    because all features up to the next batch size above max_features (or size
    of data layer) will be read prior to slicing out the requested range of
    features (#282).
  • The use_arrow=True option can be enabled globally for testing using the
    PYOGRIO_USE_ARROW=1 environment variable (#296).
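For test runs, the environment variable can be toggled from Python as well as from the shell. The context manager below is a small sketch with a hypothetical name, pyogrio_arrow_enabled. It assumes pyogrio checks PYOGRIO_USE_ARROW when a read function is called rather than once at import time, which this changelog does not state.

```python
import os
from contextlib import contextmanager


@contextmanager
def pyogrio_arrow_enabled():
    """Temporarily set PYOGRIO_USE_ARROW=1, restoring the previous value on exit."""
    previous = os.environ.get("PYOGRIO_USE_ARROW")
    os.environ["PYOGRIO_USE_ARROW"] = "1"
    try:
        yield
    finally:
        if previous is None:
            # The variable was not set before, so remove it again.
            os.environ.pop("PYOGRIO_USE_ARROW", None)
        else:
            os.environ["PYOGRIO_USE_ARROW"] = previous


# Usage (assumes the variable is read at call time; file name is a placeholder):
# with pyogrio_arrow_enabled():
#     gdf = pyogrio.read_dataframe("roads.gpkg")
```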

Bug fixes

  • Fix int32 overflow when reading int64 columns (#260).
  • Fix fid_as_index=True not setting the FID as the index in read_dataframe
    with use_arrow=True (#265).
  • Fix errors reading OSM data due to invalid feature count and incorrect
    reading of OSM layers beyond the first layer (#271).
  • Always raise an exception if there is an error when writing a data source
    (#284).

Potentially breaking changes

  • In read_info (#281):
    • the features property in the result will now be -1 if calculating the
      feature count is an expensive operation for this driver. You can force it to be
      calculated using the force_feature_count parameter.
    • for boolean values in the capabilities property, the values will now be
      booleans instead of 1 or 0.
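Code that relied on the features property may need to handle the new -1 value. The sketch below shows one possible fallback using the force_feature_count parameter mentioned above; feature_count is a hypothetical helper, and its read_info argument exists only so the logic can be exercised with a stand-in function.

```python
def feature_count(path, layer=None, read_info=None):
    """Hypothetical helper: return the feature count, forcing it only if needed."""
    if read_info is None:
        # Imported lazily; a stand-in can be passed instead for testing.
        from pyogrio import read_info

    info = read_info(path, layer=layer)
    if info["features"] == -1:
        # The driver reported the count as expensive; ask for it explicitly.
        info = read_info(path, layer=layer, force_feature_count=True)
    return info["features"]


# Usage (file and layer names are placeholders):
# n = feature_count("extract.osm.pbf", layer="points")
```

Forcing the count only after a -1 result avoids paying for a full scan on drivers that can report the count cheaply.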

Packaging

  • The GDAL library included in the wheels is updated from 3.6.4 to GDAL 3.7.2.

Version 0.6.0

27 Apr 08:01

Improvements

  • Add automatic detection of 3D geometries in write_dataframe (#223, #229).
  • Add "driver" property to read_info result (#224).
  • Add support for dataset open options to read, read_dataframe, and
    read_info (#233).
  • Add support for pandas' nullable data types in write_dataframe, or
    specifying a mask manually for missing values in write (#219).
  • Standardized 3-dimensional geometry type labels from "2.5D <type>" to
    "<type> Z" for consistency with well-known text (WKT) formats (#234).
  • Failure and warning error messages from GDAL are no longer printed to
    stderr: failures were already translated into Python exceptions
    and warning messages are now translated into Python warnings (#236, #242).
  • Add access to low-level pyarrow RecordBatchReader via
    pyogrio.raw.open_arrow, which allows iterating over batches of Arrow
    tables (#205).
  • Add support for writing dataset and layer metadata (where supported by
    driver) to write and write_dataframe, and add support for reading
    dataset and layer metadata in read_info (#237).

Packaging

  • The GDAL library included in the wheels is updated from 3.6.2 to GDAL 3.6.4.
  • Wheels are now available for Linux aarch64 / arm64.

Version 0.5.1

27 Jan 04:42

Bug fixes

  • Fix memory leak in reading files (#207)
  • Fix to only use transactions for writing records when supported by the
    driver (#203)

Version 0.5.0

16 Jan 20:58

Major enhancements

  • Support for reading based on Arrow as the transfer mechanism of the data
    from GDAL to Python (requires GDAL >= 3.6 and pyarrow to be installed).
    This can be enabled by passing use_arrow=True to pyogrio.read_dataframe
    (or by using pyogrio.raw.read_arrow directly), and provides a further
    speed-up (#155, #191).
  • Support for appending to an existing data source when supported by GDAL by
    passing append=True to pyogrio.write_dataframe (#197).

Potentially breaking changes

  • In floating point columns, NaN values are now by default written as "null"
    instead of NaN, but with an option to control this (pass nan_as_null=False
    to keep the previous behaviour) (#190).

Improvements

  • It is now possible to pass GDAL's dataset creation options in addition
    to layer creation options in pyogrio.write_dataframe (#189).
  • When specifying a subset of columns to read, unnecessary IO or parsing
    is now avoided (#195).

Packaging

  • The GDAL library included in the wheels is updated from 3.4 to GDAL 3.6.2,
    and is now built with GEOS and sqlite with rtree support enabled
    (which allows writing a spatial index for GeoPackage).
  • Wheels are now available for Python 3.11.
  • Wheels are now available for macOS arm64.

Version 0.4.2

06 Oct 20:02
b1bbecd

Improvements

  • new get_gdal_data_path() utility function to check the path of the data
    directory detected by GDAL (#160)

Bug fixes

  • register GDAL drivers during initial import of pyogrio (#145)
  • support writing "not a time" (NaT) values in a datetime column (#146)
  • fixes an error when reading GPKG with bbox filter (#150)
  • properly raises error when invalid where clause is used on a GPKG (#150)
  • avoid duplicate count of available features (#151)

Version 0.4.1

25 Jul 20:12
f16009e
Update changes for 0.4.1