Default to handle float NaN as null + add option to control it #190

Merged 3 commits on Jan 5, 2023

jorisvandenbossche

Closes #122

@jorisvandenbossche jorisvandenbossche added this to the 0.5.0 milestone Jan 2, 2023

@brendan-ward brendan-ward left a comment

Thanks for working on this @jorisvandenbossche !

A few minor comments to consider but otherwise this is ready to merge.

@@ -522,6 +522,72 @@ def test_read_write_null_geometry(tmp_path, ext):
    assert np.array_equal(result_fields[0], field_data[0])


def test_write_float_nan_null(tmp_path):
    geometry = np.array(

Please add comments denoting the WKT when using geometries created from hex (I know we use the same one elsewhere but it is helpful each time).
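
For reference, the hex WKB used in this test (010100000000000000000000000000000000000000) can be decoded by hand to confirm the WKT that such a comment should state. The following is a minimal sketch using only the standard library; `wkb_point_to_wkt` is a hypothetical helper written for illustration, not part of this project, and it only handles 2D points.

```python
import struct


def wkb_point_to_wkt(wkb: bytes) -> str:
    """Hypothetical helper: decode a 2D WKB point into its WKT string."""
    # Byte 0 is the byte-order flag: 1 = little endian, 0 = big endian.
    byte_order = "<" if wkb[0] == 1 else ">"
    # Bytes 1-4 hold the geometry type as an unsigned 32-bit integer (1 = Point).
    (geom_type,) = struct.unpack_from(byte_order + "I", wkb, 1)
    if geom_type != 1:
        raise ValueError(f"only 2D points are handled here, got type {geom_type}")
    # Bytes 5-20 hold the x and y coordinates as two 64-bit floats.
    x, y = struct.unpack_from(byte_order + "2d", wkb, 5)
    return f"POINT ({x:g} {y:g})"


wkt = wkb_point_to_wkt(
    bytes.fromhex("010100000000000000000000000000000000000000")
)
print(wkt)  # POINT (0 0)
```

So a comment such as `# WKB for POINT (0 0)` next to the hex literal would cover this request.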

content = f.read()
assert '{ "col": NaN }' in content

# by default, GDAL will skip the property for GeoJSON if the value is NaN

Suggest moving this before the case starting on line 541, so the order is the default GDAL behavior first and then the override of it, and adding a comment that WRITE_NON_FINITE_VALUES="YES" is required to force GDAL to write the value as NaN.
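
As background on why writing NaN needs an explicit opt-in: NaN is not a valid JSON value, so a writer has to skip it, write null, or emit a non-standard token. The sketch below uses Python's own `json` module purely as an analogy; it does not call GDAL, and `nan_to_none` is a hypothetical helper introduced for illustration.

```python
import json
import math


def nan_to_none(value):
    """Hypothetical helper: map a float NaN to None, leave other values alone."""
    if isinstance(value, float) and math.isnan(value):
        return None
    return value


record = {"col": float("nan")}

# Python's default is permissive and emits the non-standard token NaN.
as_nan = json.dumps(record)

# Converting NaN to None first yields a standard JSON null instead.
as_null = json.dumps({key: nan_to_none(val) for key, val in record.items()})

# A strict encoder refuses to serialize NaN at all.
try:
    json.dumps(record, allow_nan=False)
    strict_ok = True
except ValueError:
    strict_ok = False

print(as_nan)     # {"col": NaN}
print(as_null)    # {"col": null}
print(strict_ok)  # False
```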

table = pyarrow.feather.read_table(fname)
assert table["col"].is_null().to_pylist() == [False, True]

# default nan_as_null=True

Suggested change
# default nan_as_null=True

(comment is not correct)

        [bytes.fromhex("010100000000000000000000000000000000000000")] * 2,
        dtype=object,
    )
    field_data = [np.array([1.5, np.nan], dtype="float64")]

Just to make sure we have it covered, would you mind parametrizing this test with both float64 and float32?
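
One possible shape for that parametrization is sketched below, assuming pytest and NumPy are available. The test body is reduced to the dtype handling only; `null_mask` is a hypothetical stand-in for the real write and read round trip, which is not reproduced here.

```python
import numpy as np
import pytest


def null_mask(values):
    """Hypothetical stand-in: True where a float value is NaN."""
    return np.isnan(values)


@pytest.mark.parametrize("dtype", ["float32", "float64"])
def test_float_nan_detected(dtype):
    # Same field values as in the test under review, built once per dtype.
    field_data = [np.array([1.5, np.nan], dtype=dtype)]
    assert field_data[0].dtype == np.dtype(dtype)
    # The second value is NaN regardless of float width.
    assert null_mask(field_data[0]).tolist() == [False, True]
```

With this decorator pytest collects the test twice, once per dtype, so a failure report names the float width that broke.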

@jorisvandenbossche jorisvandenbossche merged commit da36773 into geopandas:main Jan 5, 2023
@jorisvandenbossche jorisvandenbossche deleted the nan-as-nulls branch January 5, 2023 08:15
Successfully merging this pull request may close these issues.

How to handle NaNs in the input when writing? (as NaN or as Null)
2 participants