Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor plot and plot3d to use virtualfile_from_data #990

Merged
merged 8 commits into from
Mar 8, 2021

Conversation

weiji14
Copy link
Member

@weiji14 weiji14 commented Mar 2, 2021

Description of proposed changes

Added an additional extra_arrays parameter to thevirtualfile_from_data function to accept optional numpy arrays from the plot and plot3d functions.

TODO: May need to tweak the data_kind function a little, to output matrix for true numpy array matrices and vectors for numpy arrays with mixed dtypes. This is to fix a segmentation fault on test_plot3d_matrix_color. Edit: see comment at #990 (comment)

Part of #949. Once this is done, it will simplify the geopandas integration (#608) (which will be done in a follow-up Pull Request at #1000).

Also fixes #1021

Reminders

  • Run make format and make check to make sure the code follows the style guide.
  • Add tests for new features or tests that would have caught the bug that you're fixing.
  • Add new public functions/methods/classes to doc/api/index.rst.
  • Write detailed docstrings for all functions/methods.
  • If adding new functionality, add an example to docstrings or tutorials.

Slash Commands

You can write slash commands (/command) in the first line of a comment to perform
specific operations. Supported slash commands are:

  • /format: automatically format and lint the code
  • /test-gmt-dev: run full tests on the latest GMT development version

Added an additional `extra_arrays` parameter to the
`virtualfile_from_data` function to accept optional
numpy arrays from the plot and plot3d functions.
@weiji14 weiji14 added the maintenance Boring but important stuff for the core devs label Mar 2, 2021
More efficient to pass in whole 2D numpy array matrix
as a virtualfile to GMT, and this fixes the segmentation
fault crash on test_plot3d_matrix_color when the data
was passed in via virtualfile_from_vectors instead.
Comment on lines 1443 to 1456
elif kind == "matrix": # turn 2D arrays into list of vectors
try:
# pandas.DataFrame and xarray.Dataset types
_data = [array for _, array in data.items()]
except AttributeError:
# Python lists, tuples, and numpy ndarray types
_data = np.atleast_2d(np.asanyarray(data).T)
try:
# Just use virtualfile_from_matrix for 2D
# numpy.ndarray which are not datetime (M) types
assert data.ndim == 2 and not data.dtype.kind == "M"
_virtualfile_from = self.virtualfile_from_matrix
_data = (data,)
except (AssertionError, AttributeError):
# Python lists, tuples, and numpy ndarray types
_data = np.atleast_2d(np.asanyarray(data).T)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would appreciate advice on improving/refactoring this chunk of code. I'm half thinking whether to move some of the logic to pygmt/helpers/utils.py, i.e. have a new 'kind' besides file/grid/matrix/vectors:

if isinstance(data, str):
kind = "file"
elif isinstance(data, xr.DataArray):
kind = "grid"
elif data is not None:
kind = "matrix"
else:
kind = "vectors"

Or we could just keep things like this as it is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't read the codes carefully, but I think the "matrix" kind is quite confusing.

The GMT API function GMT_Put_Matrix(), the PyGMT wrapper put_matrix() and the virtualfile function virtualfile_from_matrix() all require a simple 2d matrix with a single dtype (e.g., np.float or np.double).

However, currently, data types like pandas.DataFrame are also "matrix". So we have to check the data types to choose either virtualfile_from_vector or virtualfile_from_matrix.

I agree with you that we can/shoudl add a new kind to distinguish between a GMT-compatible "matrix" and a more complicated data structure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we keep this logic as is for now to resolve #1021 (which helps with #1020), and refactor things later to have more specific 'matrix' types?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

@seisman seisman mentioned this pull request Mar 8, 2021
@seisman seisman added this to the 0.3.1 milestone Mar 8, 2021
@weiji14 weiji14 marked this pull request as ready for review March 8, 2021 19:00
@@ -1378,6 +1380,9 @@ def virtualfile_from_data(self, check_kind=None, data=None, x=None, y=None, z=No
raster grid, a vector matrix/arrays, or other supported data input.
x/y/z : 1d arrays or None
x, y and z columns as numpy arrays.
extra_arrays : list
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
extra_arrays : list
extra_arrays : list of 1d arrays

try:
# Just use virtualfile_from_matrix for 2D
# numpy.ndarray which are not datetime (M) types
assert data.ndim == 2 and not data.dtype.kind == "M"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert data.ndim == 2 and not data.dtype.kind == "M"

The GMT_Put_Matrix() function only supports a few numeric data types, but numpy.dtype.kind can have values other than M, for example, data.dtype.kind == "O" will pass the assert statement but is not supported by virtualfile_from_matrix. Perhaps check data.dtype.kind in 'iuf'?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll try data.dtype.kind in 'iuf' and see if it works on the test suite.

@seisman
Copy link
Member

seisman commented Mar 8, 2021

One more comment: perhaps we should add a new test to make sure #1021 is fixed.

Copy link
Member

@seisman seisman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK to merge after all tests pass and the styling issue is fixed.

Co-Authored-By: Dongdong Tian <seisman.info@gmail.com>
@weiji14 weiji14 force-pushed the virtualfile_from_data/plot_plot3d branch from dbc8ded to b0ab97d Compare March 8, 2021 22:42
@weiji14 weiji14 merged commit c2684ba into master Mar 8, 2021
@weiji14 weiji14 deleted the virtualfile_from_data/plot_plot3d branch March 8, 2021 23:11
@weiji14 weiji14 mentioned this pull request Mar 22, 2021
5 tasks
sixy6e pushed a commit to sixy6e/pygmt that referenced this pull request Dec 21, 2022
…Tools#990)

Added an additional `extra_arrays` parameter to the
`virtualfile_from_data` function to accept optional
numpy arrays from the plot and plot3d functions.

* Just use virtualfile_from_matrix for non-datetime 2D numpy arrays

More efficient to pass in whole 2D numpy array matrix
as a virtualfile to GMT, and this fixes the segmentation
fault crash on test_plot3d_matrix_color when the data
was passed in via virtualfile_from_vectors instead.

* Add docstring on extra_arrays parameter in virtualfile_from_data
* Use virtualfile_from_matrix on int/uint/float types and add a test

Co-authored-by: Dongdong Tian <seisman.info@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maintenance Boring but important stuff for the core devs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support 2D list as input
2 participants