【Hackathon 5th No.35】Add the histogramdd API to Paddle -part #57880

cocoshe · 2023-10-05T21:28:29Z

PR types

New features

PR changes

APIs

Description

histogramdd RFC:

paddle-bot · 2023-10-05T21:28:33Z

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

python/paddle/tensor/linalg.py

zxcd · 2023-10-07T08:45:49Z

test/legacy_test/test_histogramdd_op.py

+            self.bins = tuple([paddle.to_tensor(bin) for bin in self.bins])
+        hist, edges = paddle.histogramdd(self.sample_dy, bins=self.bins, weights=self.weights_dy, range=self.range, density=self.density)
+
+        np.testing.assert_allclose(self.expect_hist, hist.numpy(), rtol=1e-4, atol=1e-4)


Can't the results match exactly, without a tolerance?

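The expect_hist in the unit test was produced by running torch by hand and pasting its output. It looks like torch rounds the values when printing. For example:
Result computed in torch: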

>>> import torch
>>> sample = torch.tensor([[0., 1.], [1., 0.], [2., 0.], [2., 2.]])
>>> bins = [3, 3]
>>> weights = torch.tensor([1., 2., 4., 8.])
>>> torch.histogramdd(sample, bins=bins, weight=weights)
torch.return_types.histogramdd(
hist=tensor([[0., 1., 0.],
        [2., 0., 0.],
        [4., 0., 8.]]),
bin_edges=(tensor([0.0000, 0.6667, 1.3333, 2.0000]), tensor([0.0000, 0.6667, 1.3333, 2.0000])))

Result computed in paddle:

>>> import paddle
>>> sample = paddle.to_tensor([[0., 1.], [1., 0.], [2., 0.], [2., 2.]])
>>> bins = [3,3]
>>> weights = paddle.to_tensor([1., 2., 4., 8.])
>>> paddle.histogramdd(sample, bins=bins, weights=weights)
W1007 19:39:17.011324 1623 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 12.2, Runtime API Version: 11.8
W1007 19:39:17.042716 1623 gpu_resources.cc:149] device: 0, cuDNN Version: 8.6.
(Tensor(shape=[3, 3], dtype=float32, place=Place(gpu:0), stop_gradient=True,
       [[0., 1., 0.],
        [2., 0., 0.],
        [4., 0., 8.]]), [Tensor(shape=[4], dtype=float32, place=Place(gpu:0), stop_gradient=True,
       [0.        , 0.66666669, 1.33333337, 2.        ]), Tensor(shape=[4], dtype=float32, place=Place(gpu:0), stop_gradient=True,
       [0.        , 0.66666669, 1.33333337, 2.        ])])
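For comparison, here is a sketch of producing the same reference values with numpy instead of copying printed torch output. It assumes numpy's default float64 is acceptable for the reference. Note that numpy, like Paddle, names the argument `weights`, while torch uses `weight`.

```python
import numpy as np

# Same sample and weights as the torch/paddle sessions above (float64 here).
sample = np.array([[0., 1.], [1., 0.], [2., 0.], [2., 2.]])
weights = np.array([1., 2., 4., 8.])

# 3 bins per dimension; the range defaults to each column's [min, max].
hist, edges = np.histogramdd(sample, bins=[3, 3], weights=weights)

print(hist)      # weighted counts, same layout as the torch/paddle hist
print(edges[0])  # approximately [0, 2/3, 4/3, 2]
```

Computing the reference this way keeps full precision, so no rounding from the printout leaks into the expected values.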

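I have now set torch's print precision to 8 digits via torch.set_printoptions, but assert_allclose still needs an atol: the printed values match, but the underlying values differ, so atol cannot stay at its default of 0. To match exactly, the expected values would have to be computed with a numpy API, but numpy's histogramdd covers fewer cases than PyTorch's. For example, numpy only supports 2-D input, whereas PyTorch supports input with 2 or more dimensions; and numpy's bins only accepts int and int[], whereas PyTorch accepts int, int[], and a tuple of tensors. So for now the printed PyTorch output is used directly as the expected output, which introduces some absolute error.
It is now changed to: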

np.testing.assert_allclose(hist_out, self.expect_hist, atol=1e-8)
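The effect described above can be reproduced with the 4-decimal bin edges from the torch printout. This sketch assumes the stored values are float32 edges equal to `np.linspace(0, 2, 4)`, as in the paddle output; it only illustrates why the default tolerance rejects pasted, rounded values.

```python
import numpy as np

# float32 edges as actually stored: [0, 0.6666667, 1.3333334, 2]
actual = np.linspace(0.0, 2.0, 4, dtype=np.float32)

# Edges copied from torch's default 4-decimal printout.
printed = np.array([0.0000, 0.6667, 1.3333, 2.0000], dtype=np.float32)

# Defaults are rtol=1e-7, atol=0: the ~3e-5 rounding error is rejected.
try:
    np.testing.assert_allclose(actual, printed)
    passed_default = True
except AssertionError:
    passed_default = False

# A small absolute tolerance absorbs the rounding and passes.
np.testing.assert_allclose(actual, printed, atol=1e-4)
```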


zxcd · 2023-10-07T08:57:15Z

python/paddle/tensor/linalg.py

+
+    # weights
+    __check_weights(sample, weights)
+    D = sample.shape[-1]


Should the shape of sample be checked? Are all shapes supported?

Inputs with 2 or more dimensions are supported, and I've added a check for this. Thanks for reviewing~
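A minimal sketch of such a shape check, using numpy arrays as stand-ins for Paddle tensors. The helper name `check_sample_and_weights` is hypothetical (it is not the PR's `__check_weights`), and the rule that weights must have shape `x.shape[:-1]` is an assumption borrowed from torch.histogramdd.

```python
import numpy as np


def check_sample_and_weights(x, weights=None):
    """Hypothetical validator; returns D, the size of the last dimension."""
    x = np.asarray(x)
    # The last axis holds the D coordinates; leading axes index the points.
    if x.ndim < 2:
        raise ValueError(f"x must have at least 2 dimensions, got shape {x.shape}")
    if weights is not None:
        weights = np.asarray(weights)
        # Assumption: one weight per point, i.e. shape x.shape[:-1].
        if weights.shape != x.shape[:-1]:
            raise ValueError(
                f"weights shape {weights.shape} must equal {x.shape[:-1]}"
            )
    return x.shape[-1]
```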

zxcd · 2023-10-12T03:53:49Z

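I suggest adding data type checks in the code to filter out unsupported data types. There should also be a check that sample and weights have the same data type. Besides the inputs already validated with check_type, it would be best to check the other inputs as well.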

The test code should also verify that these errors are raised.
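One way the requested checks could look, as a sketch with numpy dtypes. `check_dtypes` is a hypothetical name, and the supported list (float32 and float64 only) is an assumption; the dtypes the PR actually accepts may differ.

```python
import numpy as np

# Assumption: only these dtypes are accepted.
SUPPORTED_DTYPES = (np.dtype("float32"), np.dtype("float64"))


def check_dtypes(x, weights=None, density=False):
    """Hypothetical type checks in the spirit of the review comment."""
    x = np.asarray(x)
    # Filter out unsupported input dtypes.
    if x.dtype not in SUPPORTED_DTYPES:
        raise TypeError(f"x dtype must be float32 or float64, got {x.dtype}")
    # sample and weights must agree on dtype.
    if weights is not None:
        weights = np.asarray(weights)
        if weights.dtype != x.dtype:
            raise TypeError(
                f"weights dtype ({weights.dtype}) must match x dtype ({x.dtype})"
            )
    # Check the remaining non-tensor inputs too.
    if not isinstance(density, bool):
        raise TypeError(f"density must be a bool, got {type(density).__name__}")
```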

cocoshe · 2023-10-12T04:26:12Z

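Thanks for the review. May I ask whether "the test code should also verify that these errors are raised" means checking that passing a wrong type raises an error properly? If so, how does such a test usually count as passing? (A wrong input just raises an error immediately.)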

paddle-ci-bot · 2023-10-15T03:09:35Z

Sorry to inform you that 90488ae's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

cocoshe · 2023-10-17T08:02:43Z

Added some type checks and error tests. Thanks for reviewing~

luotao1 · 2023-10-19T07:32:37Z

The PR needs to pass the format check in the PR-CI-Codestyle-Check pipeline.

paddle-ci-bot · 2023-10-27T03:10:18Z

Sorry to inform you that 0ef396b's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

zxcd · 2023-11-02T03:48:40Z

You can try an approach like the one in test/legacy_test/test_reduce_op.py:
self.assertRaises(TypeError, paddle.sum, x2)
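A self-contained sketch of that pattern. So that it runs without Paddle, it calls numpy's `np.histogramdd`, which raises `ValueError` when `bins` does not match the sample's dimensionality or contains a non-positive count. In the PR's own test the callable would be `paddle.histogramdd`, and which exception Paddle raises in each case is an assumption to confirm against its implementation.

```python
import unittest

import numpy as np


class TestHistogramddErrors(unittest.TestCase):
    def test_bins_length_mismatch(self):
        sample = np.zeros((4, 2))
        # A 2-D sample with bins for only one dimension is rejected.
        self.assertRaises(ValueError, np.histogramdd, sample, bins=[3])

    def test_non_positive_bins(self):
        sample = np.zeros((4, 2))
        # The context-manager form works the same way.
        with self.assertRaises(ValueError):
            np.histogramdd(sample, bins=[0, 3])
```

The test passes when the expected exception is raised and fails if the call succeeds, which answers the question of how an error test "passes".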

cocoshe · 2023-11-02T03:54:50Z

Got it, thanks. This was already added in the earlier commit "add some type check && add error test". Thanks for reviewing~

zxcd

LGTM

jeff41404 · 2023-11-20T07:11:48Z

python/paddle/tensor/linalg.py

+
+
+def histogramdd(
+    sample, bins=10, range=None, density=False, weights=None, name=None


According to the API naming conventions, the input Tensor should be named x, and the RFC should be updated accordingly.

OK, I will modify it soon~

jeff41404 · 2023-11-20T07:49:17Z

python/paddle/tensor/linalg.py

+_range = range
+


This style hurts the code's readability. If its purpose is to avoid a clash with the range parameter of histogramdd, rename that parameter to ranges instead.
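Why the alias was needed in the first place: a parameter named `range` shadows the builtin inside the function body. The sketch below uses hypothetical helper names to contrast the aliasing workaround with the suggested rename.

```python
# Pattern flagged in review: alias the builtin before a parameter shadows it.
_range = range


def edges_aliased(bins, range=None):
    lo, hi = range if range is not None else (0.0, 1.0)
    step = (hi - lo) / bins
    # Here `range` is the parameter, so the builtin is reached via `_range`.
    return [lo + i * step for i in _range(bins + 1)]


def edges_shadowed(bins, range=None):
    # Without the alias, `range(...)` calls the parameter (None) and fails.
    return list(range(bins + 1))


def edges_renamed(bins, ranges=None):
    lo, hi = ranges if ranges is not None else (0.0, 1.0)
    step = (hi - lo) / bins
    # No shadowing: the builtin `range` is usable directly.
    return [lo + i * step for i in range(bins + 1)]
```

Renaming the parameter removes the need for the module-level alias entirely.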

… histogramdd_coco_dev

cocoshe · 2023-11-22T09:09:27Z

@jeff41404 I then removed every atol argument from assert_allclose in the unit tests, since the results should now be aligned with numpy.

jeff41404

LGTM

luotao1 · 2023-12-05T02:16:10Z

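Please submit the corresponding Chinese documentation. The CodeStyle pipeline has not passed yet; you can fix it together with @sunzhongkai588's documentation review comments once they arrive.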

luotao1 · 2023-12-05T03:40:53Z

The examples here are identical to PyTorch's. @sunzhongkai588, could you check whether that is acceptable?


sunzhongkai588 · 2023-12-05T07:09:56Z

python/paddle/tensor/linalg.py

+
+            >>> x = paddle.to_tensor([[0., 1.], [1., 0.], [2.,0.], [2., 2.]])
+            >>> bins = [3,3]
+            >>> weights = paddle.to_tensor([1., 2., 4., 8.])
+            >>> paddle.histogramdd(x, bins=bins, weights=weights)
+            (Tensor(shape=[3, 3], dtype=float32, place=Place(gpu:0), stop_gradient=True,
+                   [[0., 1., 0.],
+                    [2., 0., 0.],
+                    [4., 0., 8.]]), [Tensor(shape=[4], dtype=float32, place=Place(gpu:0), stop_gradient=True,
+                   [0.        , 0.66666669, 1.33333337, 2.        ]), Tensor(shape=[4], dtype=float32, place=Place(gpu:0), stop_gradient=True,
+                   [0.        , 0.66666669, 1.33333337, 2.        ])])
+
+
+            >>> y = paddle.to_tensor([[0., 0.], [1., 1.], [2., 2.]])
+            >>> bins = [2,2]
+            >>> ranges = [0., 1., 0., 1.]
+            >>> density = True
+            >>> paddle.histogramdd(y, bins=bins, ranges=ranges, density=density)
+            (Tensor(shape=[2, 2], dtype=float32, place=Place(gpu:0), stop_gradient=True,
+                   [[2., 0.],
+                    [0., 2.]]), [Tensor(shape=[3], dtype=float32, place=Place(gpu:0), stop_gradient=True,
+                   [0.        , 0.50000000, 1.        ]), Tensor(shape=[3], dtype=float32, place=Place(gpu:0), stop_gradient=True,
+                   [0.        , 0.50000000, 1.        ])])
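The density values in the second docstring example can be checked by hand or with numpy. This sketch assumes Paddle's flat `ranges=[0., 1., 0., 1.]` corresponds to numpy's per-dimension pairs `[(0., 1.), (0., 1.)]`. The point (2, 2) lies outside the range and is dropped, and (1, 1) sits on the closed right edge, so 2 points remain. Each bin has area 0.5 × 0.5 = 0.25, giving a density of 1 / (2 × 0.25) = 2 in each occupied bin.

```python
import numpy as np

y = np.array([[0., 0.], [1., 1.], [2., 2.]])

# numpy takes one (low, high) pair per dimension.
hist, edges = np.histogramdd(
    y, bins=[2, 2], range=[(0., 1.), (0., 1.)], density=True
)

print(hist)      # density-normalized histogram
print(edges[0])  # [0.0, 0.5, 1.0]
```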


Add an Examples section and a code block, and mind the indentation; see the API example code guidelines.

If the examples are split into multiple code blocks, add :name: to each block.
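A sketch of the docstring layout this asks for: two `code-block` directives, each with a `:name:` option placed directly under the directive line and before the blank line. The stub function and the name `examp1` are illustrative; only `examp2` appears in the PR.

```python
def histogramdd_doc_stub():
    """
    Hypothetical stub showing only the Examples layout.

    Examples:
        .. code-block:: python
            :name: examp1

            >>> import paddle
            >>> x = paddle.to_tensor([[0., 1.], [1., 0.], [2., 0.], [2., 2.]])

        .. code-block:: python
            :name: examp2

            >>> import paddle
            >>> y = paddle.to_tensor([[0., 0.], [1., 1.], [2., 2.]])
    """
```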

Co-authored-by: zachary sun <70642955+sunzhongkai588@users.noreply.github.com>

sunzhongkai588 · 2023-12-06T03:30:13Z

python/paddle/tensor/linalg.py

+                   [0.        , 0.66666669, 1.33333337, 2.        ]), Tensor(shape=[4], dtype=float32, place=Place(gpu:0), stop_gradient=True,
+                   [0.        , 0.66666669, 1.33333337, 2.        ])])
+
+            :name: examp2


Following the API example code guidelines, it needs to be written like this; also mind the indentation~

Fixed; the preview looks OK now. Thanks for reviewing~

sunzhongkai588

LGTM. Please provide the Chinese documentation.

add histogramdd api

685c3c9

paddle-bot bot added the contributor External developers label Oct 5, 2023

add some tests for different bins type

00883e3

Ligoml mentioned this pull request Oct 7, 2023

【PaddlePaddle Hackathon 5th】Open Source Contribution Individual Challenge #57262

Open