-
Notifications
You must be signed in to change notification settings - Fork 216
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pygmt.which: Refactor to get rid of temporary files #3148
Conversation
pygmt/src/which.py
Outdated
if not path: | ||
result = lib.virtualfile_to_dataset(vfname=vouttbl, output_type="pandas") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can set output_type to numpy
to simplify the code below no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying to think what's the most efficient way to get the output from which
. It seems a bit overkill going virtualfile
-> pandas
-> numpy
-> list[str]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's not very efficient. Instead, we can add another internal type output_type="string"
, which returns the vector of the trailing text only. It's useful if we know that a module only outputs text strings like which
. Actually, GMT provides an API function GMT_Get_Strings
which does exactly the same thing (if I understand it correctly). We can also wrap that API function instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead, we can add another internal type
output_type="string"
, which returns the vector of the trailing text only. It's useful if we know that a module only outputs text strings likewhich
.
Done in PR #3157.
Actually, GMT provides an API function
GMT_Get_Strings
which does exactly the same thing (if I understand it correctly). We can also wrap that API function instead.
This API function requires the data family to be GMT_IS_VECTOR
/GMT_IS_MATRIX
, so it cannot be used in this case (family is GMT_IS_DATASET
). xref: https://github.com/GenericMappingTools/gmt/blob/fcb795ef196714fe51f7e5e68d30a18e981294d0/src/gmt_api.c#L15785
CodSpeed Performance ReportMerging #3148 will degrade performances by 46.35%Comparing Summary
Benchmarks breakdown
|
…ay of trailing texts
Ping @GenericMappingTools/pygmt-maintainers for final reviews. The |
The cache GMT artifacts workflow is failing at https://github.com/GenericMappingTools/pygmt/actions/runs/8680209408/job/23800357490: Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/Users/runner/work/pygmt/pygmt/pygmt/helpers/caching.py", line 103, in cache_data
which(fname=datasets, download="a")
File "/Users/runner/work/pygmt/pygmt/pygmt/helpers/decorators.py", line 607, in new_module
return module_func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/runner/work/pygmt/pygmt/pygmt/helpers/decorators.py", line 780, in new_module
return module_func(*bound.args, **bound.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/runner/work/pygmt/pygmt/pygmt/src/which.py", line 67, in which
paths = lib.virtualfile_to_dataset(vfname=vouttbl, output_type="strings")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/runner/work/pygmt/pygmt/pygmt/clib/session.py", line 1940, in virtualfile_to_dataset
return result.to_strings()
^^^^^^^^^^^^^^^^^^^
File "/Users/runner/work/pygmt/pygmt/pygmt/datatypes/dataset.py", line 156, in to_strings
return np.char.decode(textvector) if textvector else np.array([], dtype=str)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/runner/micromamba/envs/pygmt/lib/python3.12/site-packages/numpy/core/defchararray.py", line 615, in decode
_vec_string(a, object_, 'decode', _clean_args(encoding, errors)))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: string operation on non-string array The error doesn't really indicate what happened, so I added a
It seems like already downloaded files might return as None? |
It should return the path if the file is already downloaded. Not sure where |
It seems to happen when I have a mix of downloaded and not downloaded files. I get |
Now I can reproduce the issue locally. It doesn't always happen for a mix of downloaded/undownloaded files, for example:
|
Here is a minimum example to reproduce the issue. From what I can see, the issue only occurs if we try to download a tile like
|
Address #2730.
Output to a pandas.DataFrame via virtualfiles and then convert the DataFrame to a list.