FIX: correctly use GDAL auto-decoding of shapefiles when encoding is set #384

brendan-ward · 2024-04-05T02:44:29Z

As described in #380, GDAL attempts to auto-detect the native encoding of a shapefile and will automatically use that to decode to UTF-8 before returning strings to us. This runs into conflicts where the user provides encoding because we then apply the user's encoding to UTF-8 text, which produces incorrect text.

This sets SHAPE_ENCODING="" (GDAL config option) to disable auto-decoding before opening the shapefile. Setting the dataset open option ENCODING="" also works, but is specific to Shapefiles, and we don't know the path resolves to a Shapefile until after opening it - which is too late to set the option.

This now correctly handles encoding="cp936" for the shapefiles described in #380. I'm still trying to create some test cases here, but running into issues saving to the correct native encoding to prove that.
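
The double-decoding problem described in the first paragraph can be reproduced without GDAL. The following is a minimal sketch in plain Python; the sample string and variable names are illustrative only, and the "GDAL" step is simulated by a decode/encode round trip rather than an actual GDAL call.

```python
# Simulate the double-decoding bug with plain Python (no GDAL involved).
text = "中文"

# Bytes as they would be stored in a cp936-encoded shapefile.
native_bytes = text.encode("cp936")

# Simulated GDAL auto-decoding: native encoding -> UTF-8 bytes.
utf8_bytes = native_bytes.decode("cp936").encode("utf-8")

# Bug: applying the user's encoding to bytes that are already UTF-8.
garbled = utf8_bytes.decode("cp936", errors="replace")

# Fix: decode each byte string exactly once, with the encoding it is in.
from_native = native_bytes.decode("cp936")
from_utf8 = utf8_bytes.decode("utf-8")

print(garbled == text)      # False
print(from_native == text)  # True
print(from_utf8 == text)    # True
```

Either path works as long as only one side does the decoding: GDAL decodes and we read UTF-8, or GDAL's auto-decoding is disabled and we decode the native bytes ourselves.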

theroggy · 2024-04-05T07:35:19Z

> This sets SHAPE_ENCODING="" (GDAL config option) to disable auto-decoding before opening the shapefile. Setting the dataset open option ENCODING="" also works, but is specific to Shapefiles, and we don't know the path resolves to a Shapefile until after opening it - which is too late to set the option.

The only disadvantage I see from using the config option vs the dataset open option is that it can lead to inter-thread side effects. So if someone in one thread uses the encoding parameter, all other threads running in the same process at the same time to read/write data can be influenced by this and get encoding issues.
This can be avoided by first opening the file to check the driver when encoding is specified, so that the dataset open option can be used instead... but I do agree it sounds like a far-fetched scenario, even for a multithreading race condition.

A final alternative is to pass the encoding specified to GDAL as a dataset open option, but this also needs a driver check to be sure it is a shapefile. Not sure if GDAL does some special/other things involving support of encodings compared to what we do in pyogrio... but if this is the case it might be "safer" just to delegate this to GDAL.

brendan-ward · 2024-04-05T15:42:10Z

Fiona used to use CPLSetThreadLocalConfigOption to set this option specifically within a thread; we can do that here too.

My concern with opening the file twice is that it might incur a lot of overhead for remote sources (e.g., shapefile in a zip on S3), but I don't have remote examples easy at hand to measure / prove that increase in overhead, it's just a guess. In contrast, setting the config option (so long as it doesn't negatively clobber other threads) should be low overhead.
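
To illustrate why a thread-local override avoids the side effect raised above, here is a small model in plain Python. It is only a sketch of the lookup order (thread-local value first, then process-wide value); it does not call GDAL, and `set_global_option`, `get_option`, and `thread_local_option` are hypothetical names, not GDAL or pyogrio functions.

```python
import threading
from contextlib import contextmanager

# Simplified model of config lookup: thread-local overrides take precedence
# over process-wide values. This is NOT GDAL's implementation.
_global_options = {}
_thread_local = threading.local()
_MISSING = object()


def set_global_option(key, value):
    _global_options[key] = value


def get_option(key, default=None):
    overrides = getattr(_thread_local, "overrides", {})
    if key in overrides:
        return overrides[key]
    return _global_options.get(key, default)


@contextmanager
def thread_local_option(key, value):
    """Override `key` for the current thread only, restoring it on exit."""
    if not hasattr(_thread_local, "overrides"):
        _thread_local.overrides = {}
    overrides = _thread_local.overrides
    previous = overrides.get(key, _MISSING)
    overrides[key] = value
    try:
        yield
    finally:
        if previous is _MISSING:
            del overrides[key]
        else:
            overrides[key] = previous


def demo():
    set_global_option("SHAPE_ENCODING", "cp1252")
    seen = {}
    overridden = threading.Event()
    checked = threading.Event()

    def reader():
        with thread_local_option("SHAPE_ENCODING", ""):
            seen["reader"] = get_option("SHAPE_ENCODING")
            overridden.set()
            checked.wait(timeout=5)  # keep the override active for now

    def bystander():
        overridden.wait(timeout=5)
        # Runs while the reader's override is still in effect.
        seen["bystander"] = get_option("SHAPE_ENCODING")
        checked.set()

    threads = [threading.Thread(target=reader), threading.Thread(target=bystander)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    seen["after"] = get_option("SHAPE_ENCODING")
    return seen


print(demo())
```

In this model the bystander thread still sees the process-wide value while the reader's override is active, and the process-wide value is unchanged afterwards.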

stonereese · 2024-04-05T16:53:31Z

> Fiona used to use CPLSetThreadLocalConfigOption to set this option specifically within a thread; we can do that here too.
>
> My concern with opening the file twice is that it might incur a lot of overhead for remote sources (e.g., shapefile in a zip on S3), but I don't have remote examples easy at hand to measure / prove that increase in overhead, it's just a guess. In contrast, setting the config option (so long as it doesn't negatively clobber other threads) should be low overhead.

Thank you so much for your help. When I noticed garbled text in the feedback, I realized that I forgot to mention that the shp file also displayed garbled text when using the sql parameter in read_dataframe(), regardless of the specified encoding parameter. I believe your fix has also resolved this issue. I am currently on vacation today, but I will verify it tomorrow.

brendan-ward · 2024-04-06T01:30:48Z

Unfortunately I'm finding that using sql with a non-UTF-8 encoded file is returning incorrect column names / values when encoding is not also specified, even if GDAL detects the correct native encoding and should be decoding that for us. We might be missing a decoding step somewhere; investigating now.

I'm also finding that it is hard to use non-UTF-8 encodings with use_arrow=True because we don't decode the individual values; the raw values go directly into an Arrow table and immediately break certain things because they can't be decoded nicely. I was able to work around this for shapefiles by letting GDAL do the decoding for us, but I'm not sure how to do this for other drivers. Writing non-UTF-8 values for some of those drivers is likely invalid, so it might not be an issue for those, but it seems theoretically possible for some of them (e.g., GeoJSON).

One option is to disable the encoding option altogether for ogr_open_arrow, though it does work for shapefiles because GDAL decodes those to UTF-8 for us (per e738f7b).

@jorisvandenbossche I'm unfamiliar with handling of non-UTF-8 column names / text values in pyarrow - should we even be trying to allow non-UTF-8 content through the Arrow API? I'm not seeing how we would intercept and decode them on our end like we do for the non-Arrow API.

brendan-ward · 2024-04-06T03:48:04Z

So it turns out that - at least for shapefiles - the OGRLayer instance returned when using an SQL query always fails the test for UTF-8 capabilities - even if there is a valid .cpg file present.

The workaround was to always test capabilities for shapefiles on the only layer that is present in the shapefile (e.g., the base layer underneath the SQL query layer).

I'm not sure the degree to which this might be present for other drivers; I think most of those already return true when testing for UTF-8 capabilities, so we didn't run into the same issue.
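
A rough sketch of that workaround, written against duck-typed layer objects so it runs without GDAL. The function name `layer_supports_utf8` and the `FakeLayer` stand-in are hypothetical; the constant is assumed to match the string value of `ogr.OLCStringsAsUTF8`. This is not the code in pyogrio/_io.pyx.

```python
# Assumed to be the string value behind ogr.OLCStringsAsUTF8.
OLC_STRINGS_AS_UTF8 = "StringsAsUTF8"


def layer_supports_utf8(driver_name, layer, base_layer=None):
    """Return True if strings read from `layer` can be treated as UTF-8.

    For shapefiles, prefer the capability reported by the base layer when
    one is supplied, because the SQL result layer reports False.
    """
    if driver_name == "ESRI Shapefile" and base_layer is not None:
        return bool(base_layer.TestCapability(OLC_STRINGS_AS_UTF8))
    return bool(layer.TestCapability(OLC_STRINGS_AS_UTF8))


class FakeLayer:
    """Stand-in for an OGR layer, used only to exercise the function."""

    def __init__(self, utf8):
        self._utf8 = utf8

    def TestCapability(self, name):
        return name == OLC_STRINGS_AS_UTF8 and self._utf8


base = FakeLayer(utf8=True)         # base layer of the shapefile
sql_result = FakeLayer(utf8=False)  # layer returned by an SQL query

print(layer_supports_utf8("ESRI Shapefile", sql_result))        # False
print(layer_supports_utf8("ESRI Shapefile", sql_result, base))  # True
```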

theroggy · 2024-04-06T07:55:45Z

> Fiona used to use CPLSetThreadLocalConfigOption to set this option specifically within a thread; we can do that here too.

Yep, that would be great.

> My concern with opening the file twice is that it might incur a lot of overhead for remote sources (e.g., shapefile in a zip on S3), but I don't have remote examples easy at hand to measure / prove that increase in overhead, it's just a guess. In contrast, setting the config option (so long as it doesn't negatively clobber other threads) should be low overhead.

Hmm... I didn't consider this, and I also don't have any experience with using remote files.

Not really relevant anymore because of the thread-safe solution above, but if the overhead is in the driver detection, it can be solved by specifying in GDALOpenEx that the file is an "ESRI Shapefile", so the driver detection is avoided/minimized. If this turns out to be a significant overhead, it might be a good idea in general to offer the option to specify the driver(s) to take into consideration when opening the file.

theroggy · 2024-04-06T08:43:08Z

> So it turns out that - at least for shapefiles - the OGRLayer instance returned when using an SQL always fails the test for UTF-8 capabilities - even if there is a valid .cpg file present.

Sounds like a bug in GDAL? The problem occurs for both sql dialects ('OGRSQL' and 'SQLITE')?

> The workaround was to always test capabilities for shapefiles on the only layer that is present in the shapefile (e.g., the base layer underneath the SQL query layer).
>
> I'm not sure the degree to which this might be present for other drivers; I think most of those already return true when testing for UTF-8 capabilities, so we didn't run into the same issue.

brendan-ward · 2024-04-06T14:05:59Z

> Sounds like a bug in GDAL?

Not sure, it could also be a bug in how we are trying to use it on our end in this case. The GDAL Python bindings also return False for the capability but still return the correctly decoded value:

from osgeo import ogr

drv = ogr.GetDriverByName("ESRI Shapefile")
ds = drv.Open("/tmp/test.shp", 0)

lyr = ds.GetLayerByIndex(0)
print(f"Supports UTF-8: {lyr.TestCapability(ogr.OLCStringsAsUTF8)}")
# True
print(lyr.schema[0].name)
# 中文

lyr = ds.ExecuteSQL("select * from test where \"中文\" = '中文' ", None, "")
print(f"Supports UTF-8: {lyr.TestCapability(ogr.OLCStringsAsUTF8)}")
# False
print(lyr.schema[0].name)
# 中文

The Python bindings are a bit hard to trace through, I'm still trying to find where it determines the schema / field name in this case to see if it is doing something different than we are here.

theroggy · 2024-04-08T20:44:45Z

pyogrio/_io.pyx

+            encoding_b = encoding.encode("UTF-8")
+            encoding_c = encoding_b


Most likely there is a good reason, but I wonder... why isn't this moved inside the function as well?

brendan-ward · 2024-04-08

I originally wanted to keep override_threadlocal_config_option so that it only worked with C types, in case we ever wanted to pass in C values equivalent to what you'd get back from the function. Otherwise, I think we'd want to have it encode both the key and value, even if the key passed in is a string defined in code (already a const char* I think?) rather than runtime.

brendan-ward · 2024-04-08

> even if the key passed in is a string defined in code (already a const char* I think?)

Hmm, not entirely sure that is correct: the Cython docs say those are str (unicode) instances, yet show examples like this: cdef char* hello_world = 'hello world', which implies that we at least get the conversion without extra overhead when defining string literals.

jorisvandenbossche

Very clear and informative comments and tests!

What I don't fully understand is how this works for non-Shapefiles. From the recent discussions/PRs I had thought that eventually this encoding keyword actually only makes sense for shapefiles. But you document and test that you can use it (in theory) also for other formats. But since for other formats there is no such thing as SHAPE_ENCODING, GDAL will still assume it is reading those files as UTF-8 and is returning UTF-8 values to us, right? How does that not give errors within GDAL?

Something else I am wondering: do we actually test a format that does not have OLCStringsAsUTF8 (or where we override GDAL like OSM and xlsx and GeoJSONSeq)? i.e. a format where we actually end up using locale.getpreferredencoding().
Is CSV an example of that?

jorisvandenbossche · 2024-04-12T08:45:17Z

CHANGES.md

@@ -19,6 +18,9 @@
 -   Raise exception in `read_arrow` or `read_dataframe(..., use_arrow=True)` if
    a boolean column is detected due to error in GDAL reading boolean values (#335)
    this has been fixed in GDAL >= 3.8.3.
+-   Properly handle decoding of ESRI Shapefiles with user-provided `encoding`
+    option for `read`, `read_dataframe`, and `open_arrow`, and correctly encode
+    Shapefile field names and text values to the user-provided `encoding` (#384).


What does this last sentence mean exactly? Is that for writing?

brendan-ward · 2024-04-17

Yes, that was for writing. Will make more explicit.

jorisvandenbossche · 2024-04-12T09:39:45Z

> @jorisvandenbossche I'm unfamiliar with handling of non-UTF-8 column names / text values in pyarrow - should we even be trying to allow non-UTF-8 content through the Arrow API? I'm not seeing how we would intercept and decode them on our end like we do for the non-Arrow API.

By definition Arrow strings are UTF-8; using anything else would be considered invalid Arrow data. The only solution one has is to store it as a column with the variable-size binary data type (and keep track of the actual encoding as a user). But that only works for column values; I don't think there is a workaround for column names.

Given that you already solved this for Shapefiles, and that for other file formats this seems a dubious use case anyway(?), it's probably fine to not support this for reading with Arrow?
I assume the only option to get this working is to have an upstream change in GDAL, for example by having an encoding option in OGR_L_GetArrowStream to allow the user to override this and let GDAL decode the read values to UTF-8 before putting that in the Arrow data.
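
The constraint that string data must be valid UTF-8, with raw bytes as the fallback, can be sketched in plain Python. This does not use pyarrow; `classify_value` is a hypothetical helper, and the "string" / "binary" labels only mirror the two kinds of column mentioned above.

```python
def classify_value(raw: bytes):
    """Return ("string", text) if `raw` is valid UTF-8, else ("binary", raw)."""
    try:
        return ("string", raw.decode("utf-8"))
    except UnicodeDecodeError:
        # Not valid UTF-8: keep the bytes untouched; the caller has to
        # remember the real encoding and decode later.
        return ("binary", raw)


utf8_value = "中文".encode("utf-8")
cp936_value = "中文".encode("cp936")

print(classify_value(utf8_value)[0])   # string
print(classify_value(cp936_value)[0])  # binary

# The user, who knows the real encoding, can still recover the text.
kind, payload = classify_value(cp936_value)
print(payload.decode("cp936") == "中文")  # True
```

Note that this check only catches byte sequences that are invalid as UTF-8; bytes in another encoding that happen to form valid UTF-8 would pass silently with the wrong text.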

jorisvandenbossche · 2024-04-12T09:45:17Z

> But since for other formats there is no such thing as SHAPE_ENCODING, GDAL will still assume it is reading those files as UTF-8 and is returning UTF-8 values to us, right? How does that not give errors within GDAL?

Or GDAL just reads the data and passes it on, assuming it is UTF-8 but not actually doing anything with it? So as long as we then decode those bytes with the user-provided encoding instead of UTF-8, this just happens to "work"? (But, e.g., looking at that file with ogrinfo would still give you garbage, because at that point GDAL will assume UTF-8?)

brendan-ward · 2024-04-12T15:59:52Z

> Or GDAL just reads the data and passes them on, assuming it is UTF-8 but not actually doing anything with it? So as long as we then decode those bytes with the user provided encoding instead of UTF-8, this just happens to "work"

This is my assumption, based on the testing here. GDAL and other readers of those other drivers (GPKG, FGB, etc) will show miscoded field names / values. So it isn't terribly useful - seems like it enables breaking other tools in the ecosystem - but we don't (yet) specifically prevent it. I don't know if other tools are able to validly write non-UTF-8 encodings to some of these, but it appears to be possible in practice by converting shapefiles to some of them without correctly accounting for alternative encodings (e.g., GDAL #7458).

It would take a bit of research to determine the list, but it would in theory be possible to define a list of drivers that we allow to be non-UTF-8 on write (e.g., shapefile, maybe CSV, XLSX?), and raise an error on others.

We can leave it a bit more open-ended for reading, in part to allow users to specifically force a recoding to clean up data. But we have to restrict that to the non-Arrow API.

theroggy · 2024-04-13T09:18:43Z

My current general understanding is as follows. There are two general types of format:

- "Rude" formats that don't have a fixed encoding and don't carry metadata to indicate the encoding.
  - OLCStringsAsUTF8 is False
  - e.g. "CSV"
  - for writing: GDAL doesn't do any conversions, it just writes the string/column data as it is.
  - for reading: GDAL doesn't do any conversions, it just returns the string/column data as it is.
- Formats that support one or more encodings and have metadata about it, or for which the encoding can be deduced by GDAL.
  - OLCStringsAsUTF8 is True
  - e.g. "ESRI Shapefile", most xml or json formats,...
  - for writing: GDAL expects to receive UTF-8 and will write it in the default encoding of the format. As far as I know/found, "ESRI Shapefile" is the only one that supports specifying the encoding to use for writing (via layer creation option "ENCODING"), and then GDAL will do the conversion to the encoding specified.
  - for reading: GDAL will by default (try to) convert the data to UTF-8 based on the metadata found in the file. Depending on the file format, there are dataset open options to specify the encoding to be used for reading, and then GDAL will do the conversion to UTF-8.
    - For "ESRI Shapefile", this option is called ENCODING, and you can specify ENCODING="" to avoid any recoding. It is probably a pity that this name is the same as the encoding parameter in pyogrio. If this wasn't the case, we could have just said: please specify the SHP_ENCODING open option.
    - For "DXF", this option is called DXF_ENCODING, and you can specify DXF_ENCODING="UTF-8" to avoid any recoding.
    - For most XML-based files (e.g. "GPX", "KML",...), the encoding should be correct in the xml file; you cannot overrule it in software.
    - For json-based formats, I suppose GDAL autodetects the encoding based on the content.

To conclude, I think the "encoding" parameter is strictly speaking necessary for the first type of files (not sure if there are any others than CSV)... for the other formats everything could be controlled via open options, but due to the naming clash with the one for shapefile...
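
The two format types above suggest a simple rule for choosing which encoding to decode with on read. The sketch below is a simplification based on this discussion, not pyogrio's actual code: `resolve_read_encoding` is a hypothetical helper, `strings_as_utf8` stands for the result of the OLCStringsAsUTF8 capability test, and the special handling of shapefiles (where GDAL does the decoding) is left out.

```python
import locale


def resolve_read_encoding(user_encoding, strings_as_utf8):
    """Pick the encoding used to decode field names and string values.

    user_encoding: value of the `encoding` parameter, or None.
    strings_as_utf8: whether the layer reports OLCStringsAsUTF8.
    """
    if user_encoding:
        # An explicit choice by the user always wins.
        return user_encoding
    if strings_as_utf8:
        # Second type of format: GDAL hands back UTF-8.
        return "UTF-8"
    # First type of format (e.g. CSV): fall back to the platform default.
    return locale.getpreferredencoding()


print(resolve_read_encoding("cp936", True))
print(resolve_read_encoding(None, True))
print(resolve_read_encoding(None, False))
```

The last call returns a platform-dependent value, which is why a format without OLCStringsAsUTF8 can read differently on different machines unless `encoding` is passed explicitly.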

jorisvandenbossche · 2024-04-13T09:43:18Z

> For "ESRI Shapefile", this options is called ENCODING, and you can specify ENCODING="" to avoid any recoding. Probably it is a pity this name is the same as the encoding parameter in pyogrio. If this wasn't the case, we could have just said: please specify the SHP_ENCODING open option.

BTW if there is a clash with one of our own keywords, you can always explicitly pass the GDAL layer/dataset open option through layer_options=dict(encoding="..")

theroggy · 2024-04-13T18:57:24Z

> BTW if there is a clash with one of our own keywords, you can always explicitly pass the GDAL layer/dataset open option through layer_options=dict(encoding="..")

Indeed, but only for write_dataframe at the moment, not (yet) for read_dataframe.

jorisvandenbossche · 2024-04-17T08:31:13Z

> Indeed, but only for write_dataframe at the moment, not (yet) for read_dataframe.

Ah, that's something we should probably add then.

For this PR, shall we merge it? (It just needs some merge conflicts fixed.)

theroggy · 2024-04-17T19:37:30Z

Am I right that if a user passes encoding= for a file format having OLCStringsAsUTF8 == True that is not "ESRI Shapefile", it will be ignored?
If so, maybe it is useful to show a warning in that case?

brendan-ward · 2024-04-17T20:25:40Z

> if a user passes encoding= for a file format having OLCStringsAsUTF8 == True that is not "ESRI Shapefile", it will be ignored

For read (non-Arrow API) of other drivers than shapefile with the encoding parameter, we will attempt to use that encoding to decode the field names and string values regardless of OLCStringsAsUTF8 support. This breaks using the Arrow API because pyarrow expects everything in UTF-8 (per above).

For write of other drivers than shapefile (where we force UTF-8), we use the encoding parameter passed by the user to encode the field names and string values. This then renders them as potentially unreadable in other tools that expect UTF-8.

I'm not sure if we should equate support for OLCStringsAsUTF8 as an indication that the driver expects everything in UTF-8, only an indication that GDAL knows how to get to / from a different encoding and UTF-8 (isn't that what you were getting at above?).

If we know that a given driver must be in UTF-8, we could raise an Exception when attempting to write an alternative encoding.

Are you saying we should raise a warning if OLCStringsAsUTF8 == True that is not "ESRI Shapefile" during read? i.e., they know the data source is in a different encoding regardless of GDAL's capabilities to convert to UTF-8. Since `encoding` is opt-in, it seems like that is a deliberate choice and the warning just nags them about that, right? Maybe I don't follow what you are getting at here.

theroggy · 2024-04-17T21:38:17Z

> > if a user passes encoding= for a file format having OLCStringsAsUTF8 == True that is not "ESRI Shapefile", it will be ignored
>
> For read (non-Arrow API) of other drivers than shapefile with the encoding parameter, we will attempt to use that encoding to decode the field names and string values regardless of OLCStringsAsUTF8 support. This breaks using the Arrow API because pyarrow expects everything in UTF-8 (per above).
>
> For write of other drivers than shapefile (where we force UTF-8), we use the encoding parameter passed by the user to encode the field names and string values. This then renders them as potentially unreadable in other tools that expect UTF-8.
>
> I'm not sure if we should equate support for OLCStringsAsUTF8 as an indication that the driver expects everything in UTF-8, only an indication that GDAL knows how to get to / from a different encoding and UTF-8 (isn't that what you were getting at above?).

Yes, sorry, not sure why I wrote that, but it being ignored is indeed not at all the case... It is used, but I think the cases where it leads to something useful are very limited. Not 100% sure, but I personally think indeed that OLCStringsAsUTF8 is an indication that the driver almost always expects to receive UTF-8 data. The only possible exceptions I see are cases where the user specifies some specific GDAL options instructing GDAL not to do any encoding changes (e.g. https://gdal.org/drivers/vector/dxf.html#character-encodings).
If data is provided in another encoding and the GDAL driver writes "UTF-8" by default anyway it won't recode and won't crash on it, as is the case I think in your tests. But if a recoding is needed and data is not provided to GDAL in UTF-8 I guess this can lead to errors.

> If we know that a given driver must be in UTF-8, we could raise an Exception when attempting to write an alternative encoding.

I think this is the case... but as there are that many drivers, I'm not 100% sure, which is why I was thinking about a warning.

> Are you saying we should raise a warning if OLCStringsAsUTF8 == True that is not "ESRI Shapefile" during read? i.e., they know the data source is in a different encoding regardless of GDAL's capabilities to convert to UTF-8. Since `encoding` is opt-in, it seems like that is a deliberate choice and the warning just nags them about that, right? Maybe I don't follow what you are getting at here.

True, I suppose, about the nagging. I'm afraid that few users will really know what they are doing when specifying `encoding=` for a `OLCStringsAsUTF8 == True` file, but if they do know, it is probably annoying that the warning is shown.

Possibly/probably it is best to just "let it go" and solve any issues if they would surface...

brendan-ward · 2024-04-17T21:59:20Z

I've tried to address some of the discussion in the latest changes:

- For the Arrow API (open / read arrow), it will raise an exception if an alternative encoding is provided and the driver is not shapefile.
- We now specifically raise an exception for shapefiles if the encoding parameter is combined with an "ENCODING"="<whatever>" open / layer creation option, because this leads to order-of-operations issues: we are specifically setting SHAPE_ENCODING or ENCODING on the user's behalf. Since SHAPE_ENCODING may be set in advance as a configuration option by the user (possibly re-used across many data sources), we don't raise an exception; we just silently override it (so there isn't a case where they can set SHAPE_ENCODING="" to disable recoding by GDAL and then pass in an alternative encoding parameter). This seems like advanced territory, so I didn't think it necessary to raise a warning until we start seeing issues.
- This does not disable writing to an alternative encoding if provided by the user (hopefully they know the right thing to do for the target driver); I think we can leave this as is until we see issues.
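
The two checks that raise an exception can be summarized in a short sketch. The function name `check_encoding_args`, its signature, and the use of `ValueError` are assumptions made for illustration; the actual checks live in pyogrio and may differ in exception type and message.

```python
import codecs


def _is_utf8(encoding):
    # Normalize aliases such as "utf8" / "UTF-8" via the codec registry.
    return codecs.lookup(encoding).name == "utf-8"


def check_encoding_args(driver, encoding=None, options=None, use_arrow=False):
    """Raise if `encoding` is combined with arguments it conflicts with.

    options: dataset open options or layer creation options, as a dict.
    """
    if encoding is None:
        return
    is_shapefile = driver == "ESRI Shapefile"
    option_names = {str(key).upper() for key in (options or {})}

    if is_shapefile and "ENCODING" in option_names:
        raise ValueError(
            'cannot combine the encoding parameter with the "ENCODING" '
            "open / layer creation option for shapefiles"
        )
    if use_arrow and not is_shapefile and not _is_utf8(encoding):
        raise ValueError(
            "a non-UTF-8 encoding is only supported for shapefiles "
            "when using the Arrow API"
        )


# Allowed: GDAL decodes shapefiles to UTF-8 before Arrow sees the data.
check_encoding_args("ESRI Shapefile", encoding="cp936", use_arrow=True)

# Rejected: non-shapefile driver with an alternative encoding via Arrow.
try:
    check_encoding_args("GPKG", encoding="cp936", use_arrow=True)
except ValueError as exc:
    print(exc)
```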

brendan-ward · 2024-04-17T22:56:15Z

Added a specific test to verify that locale.getpreferredencoding() is used for CSV files (only runs on Windows though) when not providing encoding parameter.

brendan-ward · 2024-04-26T01:38:12Z

Added more tests for the Arrow write API, slimmed down the comments and repeated code around encoding in _io.pyx, and fixed an issue that would have bitten us immediately with the forthcoming GDAL 3.9: FlatGeobuf no longer allows writing non-UTF-8 values. We're already likely in unsupported territory with the tests that prove we can write non-UTF-8 values to drivers that likely expect UTF-8, but I left the other drivers in place for now because it proves the encoding works. Alternatively, we could simply block writing non-UTF-8 values for non-shapefile drivers where OLCStringsAsUTF8 is True.

jorisvandenbossche · 2024-04-30T08:00:26Z

Didn't notice it before (and haven't yet looked in detail at what might be causing it), but merging this caused a bunch of failures when testing the wheels for ubuntu and older macos: https://github.com/geopandas/pyogrio/actions/runs/8848749467/job/24299655017

Use SHAPE_ENCODING to disable auto-decoding of shapefiles

2bfc862

brendan-ward added this to the 0.8.0 milestone Apr 5, 2024

Correctly handle I/O for non-UTF-8 shapefiles

2b38f94

Use thread local config, add more tests

e738f7b

Fix detection of encoding for shapefile layers via SQL

3362dae

brendan-ward changed the title ~~FIX: disable GDAL auto-decoding of shapefiles to UTF-8~~ FIX: correctly use GDAL auto-decoding of shapefiles when encoding is set Apr 6, 2024

brendan-ward mentioned this pull request Apr 6, 2024

UTF8 layer capability returns false for SQL result layer of shapefile but true for base layer OSGeo/gdal#9648

Closed

brendan-ward added 3 commits April 6, 2024 08:19

Expand encoding test cases

b991038

Use ANSI code pages for alternative encodings (not DOS) and slightly improve docs

ecb56f6

Improve docs a little more

994be50

brendan-ward marked this pull request as ready for review April 6, 2024 21:45

brendan-ward mentioned this pull request Apr 7, 2024

DO NOT MERGE: Refactor dataframe where tests, reduce IF conditional compilation blocks #386

Closed

theroggy reviewed Apr 7, 2024

theroggy reviewed Apr 7, 2024

brendan-ward added 3 commits April 8, 2024 06:48

consolidate operations per PR feedback

901320f

verify SHAPE_ENCODING global option is retained

4313fca

Merge branch 'main' into issue380

5c4657d

theroggy reviewed Apr 8, 2024

Cleanup duplicate override to UTF-8

ef602d5

brendan-ward mentioned this pull request Apr 10, 2024

Write Arrow Table/RecordBatchReader to GDAL #346

Merged

jorisvandenbossche reviewed Apr 12, 2024

Merge branch 'main' into issue380

0e7e932

Merge branch 'main' into issue380

146967d

brendan-ward added 2 commits April 17, 2024 14:41

prevent combining encoding parameter and ENCODING open / creation option

2bdf4dc

Fix missing pyarrow constraint for test

b8cc902

brendan-ward added 3 commits April 17, 2024 15:33

Try to verify that platform default encoding is used for CSV by default

e7db258

split CSV platform encoding test out to verify it executes on Windows

eaecb18

Fix bug in skip of CSV encoding test

a623501

brendan-ward added 7 commits April 25, 2024 15:37

Merge branch 'main' into issue380

d615fd7

cleanup, add arrow I/O tests

d2b43f3

Fix missing test annotation

815620a

Don't fail in error handler if message cannot be decoded to UTF-8

6eca265

Fix bug in attempted fix

34bacc3

Fix failing test for GDAL >= 3.9

e088311

Fix other failing FlatGeobuff tests

58d1d3f

jorisvandenbossche approved these changes Apr 26, 2024

jorisvandenbossche merged commit db58724 into main Apr 26, 2024
20 checks passed

jorisvandenbossche deleted the issue380 branch April 26, 2024 13:05
