-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
【Hackathon 5th No.35】为 Paddle 新增 histogramdd API -part #57880
Changes from 10 commits
685c3c9
00883e3
8f9ea47
78500a0
90488ae
40ea529
0ef396b
3d9762b
28587d0
885eac3
e008db3
c7df6d7
10cb3ac
74e1953
5b027a2
d6ad6e9
11d6bce
12ce04f
9bb3115
6db3d71
d1dff5e
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -29,6 +29,8 @@ | |
|
||
__all__ = [] | ||
|
||
_range = range | ||
|
||
# Consistent with kDefaultDim from C++ Backend | ||
K_DEFAULT_DIM = 9 | ||
|
||
|
@@ -3724,3 +3726,219 @@ def cdist( | |
return paddle.linalg.norm( | ||
x[..., None, :] - y[..., None, :, :], p=p, axis=-1 | ||
) | ||
|
||
|
||
def histogramdd( | ||
sample, bins=10, range=None, density=False, weights=None, name=None | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. according to API naming conventions, enter the name of Tensor using There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OK, I will modify it soon~ |
||
): | ||
r""" | ||
Computes a multi-dimensional histogram of the values in a tensor. | ||
Interprets the elements of an input tensor whose innermost dimension has size N as a collection of N-dimensional points. Maps each of the points into a set of N-dimensional bins and returns the number of points (or total weight) in each bin. | ||
input must be a tensor with at least 2 dimensions. If input has shape (M, N), each of its M rows defines a point in N-dimensional space. If input has three or more dimensions, all but the last dimension are flattened. | ||
Each dimension is independently associated with its own strictly increasing sequence of bin edges. Bin edges may be specified explicitly by passing a sequence of 1D tensors. Alternatively, bin edges may be constructed automatically by passing a sequence of integers specifying the number of equal-width bins in each dimension. | ||
cocoshe marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
Args: | ||
sample (Tensor): The input tensor. | ||
bins (Tensor[], int[], or int): If Tensor[], defines the sequences of bin edges. If int[], defines the number of equal-width bins in each dimension. If int, defines the number of equal-width bins for all dimensions. | ||
range (sequence of float, optional): Defines the leftmost and rightmost bin edges in each dimension. If is None, set the minimum and maximum as leftmost and rightmost edges for each dimension. | ||
density (bool, optional): If False (default), the result will contain the count (or total weight) in each bin. If True, each count (weight) is divided by the total count (total weight), then divided by the volume of its associated bin. | ||
weights (Tensor, optional): By default, each value in the input has weight 1. If a weight tensor is passed, each N-dimensional coordinate in input contributes its associated weight towards its bin’s result. The weight tensor should have the same shape as the input tensor excluding its innermost dimension N. | ||
name (str, optional): For details, please refer to :ref:`api_guide_Name`. Generally, no setting is required. Default: None. | ||
|
||
Returns: | ||
N-dimensional Tensor containing the values of the histogram. bin_edges(Tensor[]): sequence of N 1D Tensors containing the bin edges. | ||
cocoshe marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
|
||
Examples: | ||
.. code-block:: python | ||
|
||
>>> import paddle | ||
>>> x = paddle.to_tensor([[0., 1.], [1., 0.], [2.,0.], [2., 2.]]) | ||
>>> bins = [3,3] | ||
>>> weights = paddle.to_tensor([1., 2., 4., 8.]) | ||
>>> paddle.histogramdd(x, bins=bins, weights=weights) | ||
(Tensor(shape=[3, 3], dtype=float32, place=Place(gpu:0), stop_gradient=True, | ||
[[0., 1., 0.], | ||
[2., 0., 0.], | ||
[4., 0., 8.]]), [Tensor(shape=[4], dtype=float32, place=Place(gpu:0), stop_gradient=True, | ||
[0. , 0.66666669, 1.33333337, 2. ]), Tensor(shape=[4], dtype=float32, place=Place(gpu:0), stop_gradient=True, | ||
[0. , 0.66666669, 1.33333337, 2. ])]) | ||
|
||
|
||
>>> y = paddle.to_tensor([[0., 0.], [1., 1.], [2., 2.]]) | ||
>>> bins = [2,2] | ||
>>> range = [0., 1., 0., 1.] | ||
>>> density = True | ||
>>> paddle.histogramdd(y, bins=bins, range=range, density=density) | ||
(Tensor(shape=[2, 2], dtype=float32, place=Place(gpu:0), stop_gradient=True, | ||
[[2., 0.], | ||
[0., 2.]]), [Tensor(shape=[3], dtype=float32, place=Place(gpu:0), stop_gradient=True, | ||
[0. , 0.50000000, 1. ]), Tensor(shape=[3], dtype=float32, place=Place(gpu:0), stop_gradient=True, | ||
[0. , 0.50000000, 1. ])]) | ||
|
||
|
||
""" | ||
|
||
def __check_sample(sample): | ||
assert ( | ||
len(sample.shape) >= 2 | ||
), "input sample must be a tensor with at least 2 dimensions." | ||
check_variable_and_dtype( | ||
sample, | ||
'sample', | ||
[ | ||
'float32', | ||
'float64', | ||
], | ||
'histogramdd', | ||
) | ||
|
||
def __check_bins(bins, sample): # when Tensor[], check dtype | ||
for bins_tensor in bins: | ||
bins_tensor = paddle.to_tensor(bins_tensor) | ||
check_variable_and_dtype( | ||
bins_tensor, | ||
'bins', | ||
[ | ||
'float32', | ||
'float64', | ||
], | ||
'histogramdd', | ||
) | ||
assert ( | ||
bins_tensor.dtype == sample.dtype | ||
), "When bins is Tensor[], the dtype of bins must be the same as sample.\n" | ||
|
||
def __check_weights(sample, weights): | ||
if weights is None: | ||
return | ||
sample_shape, weights_shape = sample.shape, weights.shape | ||
assert len(sample_shape) == len(weights_shape) + 1, ( | ||
"if weight tensor is provided," | ||
"it should have the same shape as the input tensor excluding its innermost dimension.\n" | ||
) | ||
for i, _ in enumerate(weights_shape): | ||
assert weights_shape[i] == sample_shape[i], ( | ||
"if weight tensor is provided," | ||
"it should have the same shape as the input tensor excluding its innermost dimension.\n" | ||
) | ||
check_variable_and_dtype( | ||
weights, | ||
'weights', | ||
[ | ||
'float32', | ||
'float64', | ||
], | ||
'histogramdd', | ||
) | ||
assert ( | ||
weights.dtype == sample.dtype | ||
), "The dtype of weights must be the same as sample.\n" | ||
|
||
def __check_range(D, range): | ||
if range is None: | ||
return | ||
check_type(range, 'range', (list, tuple), 'histogramdd') | ||
assert D * 2 == len(range), "The length of range list must be %d\n" % ( | ||
D * 2 | ||
) | ||
|
||
check_type(density, 'density', bool, 'histogramdd') | ||
|
||
__check_sample(sample) | ||
# weights | ||
__check_weights(sample, weights) | ||
D = sample.shape[-1] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 是否需要对sample的shape有判断?是所有的shape都能支持吗 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 支持维度大于等于2,新加了判断,辛苦review~ |
||
reshaped_input = sample.reshape([-1, D]) | ||
N = reshaped_input.shape[0] | ||
reshaped_weights = None | ||
if weights is not None: | ||
weights = weights.astype('float32') | ||
reshaped_weights = weights.reshape([N]) | ||
assert reshaped_weights.shape[0] == N, ( | ||
"The size of weight must be %d" % N | ||
) | ||
# range | ||
__check_range(D, range) | ||
if range is None: | ||
range = paddle.zeros([D, 2], dtype=paddle.float32) | ||
maxv = paddle.max(reshaped_input, axis=0).reshape([-1]) | ||
minv = paddle.min(reshaped_input, axis=0).reshape([-1]) | ||
|
||
if paddle.in_dynamic_mode(): | ||
range[:, 0] = minv | ||
range[:, 1] = maxv | ||
else: | ||
range = paddle.static.setitem(range, (slice(None), 0), minv) | ||
range = paddle.static.setitem(range, (slice(None), 1), maxv) | ||
else: | ||
range = paddle.to_tensor(range, dtype=paddle.float32).reshape([D, 2]) | ||
# bins to edges | ||
edges = [] | ||
hist_shape = [] | ||
dedges = [] | ||
if isinstance(bins, (int, list)): # int or int[] | ||
if isinstance(bins, int): | ||
bins = [bins] * D | ||
assert len(bins) == D, ( | ||
"The length of bins must be %d when bins is a list.\n" % D | ||
) | ||
for idx, r in enumerate(range): | ||
if not isinstance(bins[idx], int): | ||
raise ValueError( | ||
"The type of %d-th element in bins list must be int." % idx | ||
) | ||
e = paddle.linspace(r[0], r[1], bins[idx] + 1, 'float32') | ||
edges.append(e) | ||
dedges.append(e.diff()) | ||
elif isinstance( | ||
bins, tuple | ||
): # tuple with D tensors for each innermost dimension | ||
__check_bins(bins, sample) | ||
for bin in bins: | ||
bin = paddle.to_tensor(bin) | ||
edges.append(bin) | ||
dedges.append(bin.diff()) | ||
else: | ||
raise ValueError("Input bins must be Tensor[], int[], or int.") | ||
hist_shape = [edge.shape[0] + 1 for edge in edges] | ||
index_list = [] | ||
# edges shape: [D, linspaced] | ||
# index_list shape: [D, N] | ||
for idx, edge in enumerate(edges): | ||
edge = paddle.to_tensor(edge) | ||
index_list.append( | ||
paddle.searchsorted(edge, reshaped_input[:, idx], right=True) | ||
) | ||
index_list = paddle.to_tensor(index_list) | ||
for i in _range(D): | ||
on_edge = reshaped_input[:, i] == edges[i][-1] | ||
if paddle.in_dynamic_mode(): | ||
index_list[i][on_edge] -= 1 | ||
else: | ||
index_list = paddle.static.setitem( | ||
index_list, (i, on_edge), index_list[i][on_edge] - 1 | ||
) | ||
index_list = tuple(index_list) | ||
lut = paddle.arange(paddle.to_tensor(hist_shape).prod()).reshape(hist_shape) | ||
flattened_index = lut[index_list] | ||
hist = paddle.bincount( | ||
flattened_index, | ||
reshaped_weights, | ||
minlength=paddle.to_tensor(hist_shape).prod(), | ||
) | ||
hist = hist.reshape(hist_shape) | ||
hist = hist.astype('float32') | ||
|
||
core = D * (slice(1, -1),) | ||
hist = hist[core] | ||
|
||
if density: | ||
s = hist.sum() | ||
for i in _range(D): | ||
shape = D * [1] | ||
shape[i] = hist_shape[i] - 2 | ||
hist = hist / dedges[i].reshape(shape) | ||
hist /= s | ||
|
||
return (hist, edges) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This writing method affects the readability of the code. If it is to avoid conflicts with the name of the
range
parameter inhistogramdd
, therange
parameter can be adjusted toranges