Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: fix write of kml lon/lat transpose #421

Merged
merged 21 commits into from
Jun 14, 2024

Conversation

nicholas-ys-tan
Copy link
Contributor

@nicholas-ys-tan nicholas-ys-tan commented Jun 9, 2024

Closes #420

My cython is atrocious.

Followed what was done in fiona, qgis and gdal documentation to find this is probably what needs to be done. But I am probably doing it too broadly and may have unintended consequences in other filetypes.

Still need to write tests.

Need to investigate how this may impact other file types.

@nicholas-ys-tan nicholas-ys-tan marked this pull request as ready for review June 10, 2024 07:25
Copy link
Member

@brendan-ward brendan-ward left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @nicholas-ys-tan !

Given that Fiona is setting traditional order after instantiating the CRS object, it seems reasonable for us to do the same, though I'm not aware of what the negative side effects of doing so would be.

A few comments below.

Would you mind adding a test case for GeoJSON written with RFC7946 as described here

Something like this should work:

def test_write_geojson_rfc7946_coordinates(tmp_path, use_arrow=use_arrow):
    points = [Point(10, 20), Point(30, 40), Point(50, 60)]
    gdf = gp.GeoDataFrame(geometry=points, crs="EPSG:4326")
    output_path = tmp_path / "test.geojson"
    write_dataframe(
        gdf,
        output_path,
        layer="tmp_layer",
        driver="GeoJSON",
        RFC7946=True,
        use_arrow=use_arrow,
    )

    gdf_in = read_dataframe(output_path, use_arrow=use_arrow)

    assert np.array_equal(gdf_in.geometry.values, points)

pyogrio/_io.pyx Outdated Show resolved Hide resolved
pyogrio/tests/test_geopandas_io.py Outdated Show resolved Hide resolved
pyogrio/tests/test_geopandas_io.py Outdated Show resolved Hide resolved
pyogrio/tests/test_geopandas_io.py Outdated Show resolved Hide resolved
pyogrio/tests/test_geopandas_io.py Outdated Show resolved Hide resolved
pyogrio/tests/test_geopandas_io.py Outdated Show resolved Hide resolved
pyogrio/tests/test_geopandas_io.py Outdated Show resolved Hide resolved
pyogrio/tests/test_geopandas_io.py Outdated Show resolved Hide resolved
@jorisvandenbossche
Copy link
Member

We should probably also test this for appending to an existing file (not entirely sure how that will work in that case, because then we don't create the layer object with constructed CRS object ?)

nicholas-ys-tan and others added 8 commits June 11, 2024 22:21
Co-authored-by: Brendan Ward <bcward@astutespruce.com>
Co-authored-by: Brendan Ward <bcward@astutespruce.com>
Co-authored-by: Brendan Ward <bcward@astutespruce.com>
Co-authored-by: Brendan Ward <bcward@astutespruce.com>
Co-authored-by: Brendan Ward <bcward@astutespruce.com>
Co-authored-by: Brendan Ward <bcward@astutespruce.com>
Co-authored-by: Brendan Ward <bcward@astutespruce.com>
@nicholas-ys-tan
Copy link
Contributor Author

Thanks @brendan-ward ,

Have added your changes and test, will just need to do this one tomorrow:

We should probably also test this for appending to an existing file (not entirely sure how that will work in that case, because then we don't create the layer object with constructed CRS object ?)

Copy link
Member

@theroggy theroggy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small detail: can you add an entry to CHANGES.md for you PR.

pyogrio/_io.pyx Outdated Show resolved Hide resolved
Co-authored-by: Pieter Roggemans <pieter.roggemans@gmail.com>
Copy link
Member

@brendan-ward brendan-ward left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I forgot to mention that the tests needed @pytest.mark.requires_arrow_write_api since they are writing using the Arrow API

This should fix the test failures on CI.

pyogrio/tests/test_geopandas_io.py Show resolved Hide resolved
pyogrio/tests/test_geopandas_io.py Outdated Show resolved Hide resolved
nicholas-ys-tan and others added 2 commits June 13, 2024 21:33
Co-authored-by: Brendan Ward <bcward@astutespruce.com>
Co-authored-by: Brendan Ward <bcward@astutespruce.com>
@nicholas-ys-tan
Copy link
Contributor Author

I am currently investigating this:

We should probably also test this for appending to an existing file (not entirely sure how that will work in that case, because then we don't create the layer object with constructed CRS object ?)

There seems to be an issue unrelated to this ticket - I was testing the appending to a .kml file, and it introduces a Z-dimension. I am not sure if it is known behaviour but I couldn't find an existing ticket about it. Interestingly, it has nothing to do with append but happens every time a .kml file is over-written.

points = [Point(10, 20), Point(30, 40), Point(50, 60)]
gdf = gpd.GeoDataFrame(geometry=points, crs="EPSG:4326")
output_path = r'/home/nicholas/dev/data/ogr_test/temporary_kml_file.kml'
write_dataframe(
    gdf, output_path, layer="tmp_layer", driver="KML", use_arrow=True
)

gdf_read = read_dataframe(output_path, use_arrow=True)
print(gdf_read.geometry)
0    POINT (10 20)
1    POINT (30 40)
2    POINT (50 60)
Name: geometry, dtype: geometry
points_append = [Point(70, 80), Point(90, 100), Point(110, 120)]
gdf_append = gpd.GeoDataFrame(geometry=points_append, crs="EPSG:4326")
write_dataframe(
    gdf_append, output_path, layer='tmp_layer', driver="KML", use_arrow=True, append=True
)
gdf_read_appended = read_dataframe(output_path, use_arrow=True)

print(gdf_read_appended.geometry)
0      POINT Z (10 20 0)
1      POINT Z (30 40 0)
2      POINT Z (50 60 0)
3      POINT Z (70 80 0)
4     POINT Z (90 100 0)
5    POINT Z (110 120 0)
Name: geometry, dtype: geometry

This doesn't seem to be related to the append feature, but to over-writing a kml file.

points = [Point(10, 20), Point(30, 40), Point(50, 60)]
gdf = gpd.GeoDataFrame(geometry=points, crs="EPSG:4326")
output_path = r'/home/nicholas/dev/data/ogr_test/temporary_kml_file2.kml'
write_dataframe(
    gdf, output_path, layer="tmp_layer", driver="KML", use_arrow=True
)

write_dataframe(
    gdf, output_path, layer="tmp_layer", driver="KML", use_arrow=True
)

print(gdf_read.geometry)
0    POINT Z (10 20 0)
1    POINT Z (30 40 0)
2    POINT Z (50 60 0)
Name: geometry, dtype: geometry

On looking into the KML, can confirm it gets restructured

Before:

<?xml version="1.0" encoding="utf-8" ?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document id="root_doc">
<Folder><name>tmp_layer</name>
  <Placemark>
      <Point><coordinates>10,20</coordinates></Point>
  </Placemark>
  <Placemark>
      <Point><coordinates>30,40</coordinates></Point>
  </Placemark>
  <Placemark>
      <Point><coordinates>50,60</coordinates></Point>
  </Placemark>
</Folder>
</Document></kml>

After over-writing with same data:

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document id="root_doc">
    <Document id="tmp_layer">
      <name>tmp_layer</name>
      <Placemark id="tmp_layer.1">
        <Point>
          <coordinates>
            10,20,0
          </coordinates>
        </Point>
      </Placemark>
      <Placemark id="tmp_layer.2">
        <Point>
          <coordinates>
            30,40,0
          </coordinates>
        </Point>
      </Placemark>
      <Placemark id="tmp_layer.3">
        <Point>
          <coordinates>
            50,60,0
          </coordinates>
        </Point>
      </Placemark>
    </Document>
  </Document>
</kml>

This happens in the main branch too, but with the reverse co-ordinates.

I assume this is not something to be addressed within this ticket. I haven't yet confirmed if this is an upstream bug but I am currently assuming it is. As a work-around, I've modified the test for the appended KML file to only compare the xy-coordinates by reading with force_2d=True.

If this isn't already a known issue, I'll do some digging as to why this is happening.

@nicholas-ys-tan
Copy link
Contributor Author

the ubuntu-small CIs don't seem to like to open the produced KML files for appending - will need to look more into it this weekend.

@theroggy
Copy link
Member

theroggy commented Jun 13, 2024

the ubuntu-small CIs don't seem to like to open the produced KML files for appending - will need to look more into it this weekend.

Not sure why this leads to appending not working... but FYI: apparently ubuntu-small doesn't contain the LIBKML driver, so according to de doc kml handling will fallback to the KML driver in that case.

@nicholas-ys-tan
Copy link
Contributor Author

nicholas-ys-tan commented Jun 13, 2024

Not sure why this leads to appending nog working... but FYI: apparently ubuntu-small doesn't contain the LIBKML driver, so according to de doc kml handling will fallback to the KML driver in that case.

Oh that is interesting, one thing I did note but neglected to mention in my above text about the Z-dimension is that when I run write_dataframe with LIBKML, it always writes out the Z-dimension even on first write.

This seems to suggest when there is an existing KML file, LIBKML is used to over-write the KML file, hence why the Z-dimension is introduced, and why it is failing in ubuntu-small.

points = [Point(10, 20), Point(30, 40), Point(50, 60)]
gdf = gpd.GeoDataFrame(geometry=points, crs="EPSG:4326")
output_path = r'/home/nicholas/dev/data/ogr_test/temporary_kml_file7.kml'
write_dataframe(
    gdf, output_path, layer="tmp_layer", driver="LIBKML", use_arrow=True
)
gdf_read = read_dataframe(output_path, use_arrow=True)

print(gdf_read.geometry)
0    POINT Z (10 20 0)
1    POINT Z (30 40 0)
2    POINT Z (50 60 0)
Name: geometry, dtype: geometry

and outputs the same KML format as the over-written file from my earlier comment:

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document id="root_doc">
    <Document id="tmp_layer">
      <name>tmp_layer</name>
      <Placemark id="tmp_layer.1">
        <Point>
          <coordinates>
            10,20,0
          </coordinates>
        </Point>
      </Placemark>
      <Placemark id="tmp_layer.2">
        <Point>
          <coordinates>
            30,40,0
          </coordinates>
        </Point>
      </Placemark>
      <Placemark id="tmp_layer.3">
        <Point>
          <coordinates>
            50,60,0
          </coordinates>
        </Point>
      </Placemark>
    </Document>
  </Document>
</kml>

I'll investigate further to confirm tomorrow morning.

@brendan-ward
Copy link
Member

It looks like always adding the Z value when using LIBKML is an artifact of that particular driver and also possibly because Z value is expected by the KML spec? Whereas the KML driver does not automatically add Z values.

Can you log a separate issue that appending a KML file also adds Z value when using the KML driver? Then we can look into that separately from this PR. I'm wondering if in the case of append GDAL switches to using the LIBKML driver if available.

CHANGES.md Outdated Show resolved Hide resolved
@jorisvandenbossche
Copy link
Member

the ubuntu-small CIs don't seem to like to open the produced KML files for appending - will need to look more into it this weekend.

Not sure why this leads to appending not working... but FYI: apparently ubuntu-small doesn't contain the LIBKML driver, so according to de doc kml handling will fallback to the KML driver in that case.

The KML driver might just not support appending? In that case you will have to skip the test (or the second part of the test) when LIBKML driver is not available (you can use something like "LIBKML" not in pyogrio.list_drivers())

BTW, the error message we give is a bit confusing. We are adding the suggestion "It might help to specify the correct driver explicitly by prefixing the file path with ':', e.g. 'CSV:path'", which is strange because when writing, the user already explicitly mentions the driver (which in practice is not used when appending I think, but that's not really clear for the user)

@nicholas-ys-tan
Copy link
Contributor Author

The KML driver might just not support appending? In that case you will have to skip the test (or the second part of the test) when LIBKML driver is not available (you can use something like "LIBKML" not in pyogrio.list_drivers())

Thanks @jorisvandenbossche , I think it's not just appending but also over-writing. It seems to use LIBKML if over-writing an existing file regardless of append=True which is quite interesting.

Thanks for the pointer, I was looking for how pyogrio is able to check what drivers are available. Will push the fixes shortly.

points = [Point(10, 20), Point(30, 40), Point(50, 60)]
gdf = gpd.GeoDataFrame(geometry=points, crs="EPSG:4326")
output_path = r'/home/nicholas/dev/data/ogr_test/temporary_kml_file2.kml'
write_dataframe(
    gdf, output_path, layer="tmp_layer", driver="KML", use_arrow=True
)

write_dataframe(
    gdf, output_path, layer="tmp_layer", driver="KML", use_arrow=True
)

print(gdf_read.geometry)
0    POINT Z (10 20 0)
1    POINT Z (30 40 0)
2    POINT Z (50 60 0)
Name: geometry, dtype: geometry

Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
@jorisvandenbossche
Copy link
Member

I think it's not just appending but also over-writing. It seems to use LIBKML if over-writing an existing file regardless of append=True which is quite interesting.

Yes, but this test it's the append case that matters (so that's the reason I mentioned it explicitly).
It seems that the KML driver essentially doesn't allow to open a file in Update mode (although I don't directly see anything in the GDAL code to support that), and when you are appending or writing to an existing file, in both cases our code in ogr_open will call GDALOpenEx with the GDAL_OF_UPDATE flag (when reading, we will call that with GDAL_OF_READONLY flag, that seems to be the main difference). Only in case of overwriting, we will then delete the layer if we could open the file and the layer exists, while in case of append=True we don't do that.

@nicholas-ys-tan
Copy link
Contributor Author

Yes, but this test it's the append case that matters (so that's the reason I mentioned it explicitly). It seems that the KML driver essentially doesn't allow to open a file in Update mode (although I don't directly see anything in the GDAL code to support that), and when you are appending or writing to an existing file, in both cases our code in ogr_open will call GDALOpenEx with the GDAL_OF_UPDATE flag (when reading, we will call that with GDAL_OF_READONLY flag, that seems to be the main difference). Only in case of overwriting, we will then delete the layer if we could open the file and the layer exists, while in case of append=True we don't do that.

Thanks Joris, that all makes sense - I'll include this information in a separate ticket to document.

Copy link
Member

@brendan-ward brendan-ward left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @nicholas-ys-tan !

@brendan-ward brendan-ward merged commit a9b26d3 into geopandas:main Jun 14, 2024
20 checks passed
@jorisvandenbossche
Copy link
Member

With this fix in, shall we do a 0.8.1 release? (this seems like an important fix to get out for when geopandas switches to use pyogrio by default)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: long, lat geometry swapped ! geopandas.to_file( "out.kml", driver="KML",engine='pyogrio' )
4 participants