Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

【Hackathon 5th No.35】为 Paddle 新增 histogramdd API -part #57880

Merged
merged 21 commits into from
Dec 11, 2023
Merged

【Hackathon 5th No.35】为 Paddle 新增 histogramdd API -part #57880

merged 21 commits into from
Dec 11, 2023

Conversation

cocoshe
Copy link
Contributor

@cocoshe cocoshe commented Oct 5, 2023

@paddle-bot
Copy link

paddle-bot bot commented Oct 5, 2023

你的PR提交成功,感谢你对开源项目的贡献!
请关注后续CI自动化测试结果,详情请参考Paddle-CI手册
Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@paddle-bot paddle-bot bot added the contributor External developers label Oct 5, 2023
python/paddle/tensor/linalg.py Outdated Show resolved Hide resolved
python/paddle/tensor/linalg.py Outdated Show resolved Hide resolved
python/paddle/tensor/linalg.py Outdated Show resolved Hide resolved
python/paddle/tensor/linalg.py Outdated Show resolved Hide resolved
python/paddle/tensor/linalg.py Outdated Show resolved Hide resolved
python/paddle/tensor/linalg.py Outdated Show resolved Hide resolved
self.bins = tuple([paddle.to_tensor(bin) for bin in self.bins])
hist, edges = paddle.histogramdd(self.sample_dy, bins=self.bins, weights=self.weights_dy, range=self.range, density=self.density)

np.testing.assert_allclose(self.expect_hist, hist.numpy(), rtol=1e-4, atol=1e-4)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

精度不能完全一致吗?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

单侧中的expect_hist是手动跑torch然后把结果贴过来的,看样子torch是自动做了精度精确,例如:
torch中计算结果

>>> import torch
>>> sample = torch.tensor([[0., 1.], [1., 0.], [2., 0.], [2., 2.]])
>>> bins = [3, 3]
>>> weights = torch.tensor([1., 2., 4., 8.])
>>> torch.histogramdd(sample, bins=bins, weight=weights)
torch.return_types.histogramdd(
hist=tensor([[0., 1., 0.],
        [2., 0., 0.],
        [4., 0., 8.]]),
bin_edges=(tensor([0.0000, 0.6667, 1.3333, 2.0000]), tensor([0.0000, 0.6667, 1.3333, 2.0000])))

paddle计算结果:

>>> import paddle
>>> sample = paddle.to_tensor([[0., 1.], [1., 0.], [2., 0.], [2., 2.]])
>>> bins = [3,3]
>>> weights = paddle.to_tensor([1., 2., 4., 8.])
>>> paddle.histogramdd(sample, bins=bins, weights=weights)
W1007 19:39:17.011324  1623 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 12.2, Runtime API Version: 11.8
W1007 19:39:17.042716  1623 gpu_resources.cc:149] device: 0, cuDNN Version: 8.6.
(Tensor(shape=[3, 3], dtype=float32, place=Place(gpu:0), stop_gradient=True,
       [[0., 1., 0.],
        [2., 0., 0.],
        [4., 0., 8.]]), [Tensor(shape=[4], dtype=float32, place=Place(gpu:0), stop_gradient=True,
       [0.        , 0.66666669, 1.33333337, 2.        ]), Tensor(shape=[4], dtype=float32, place=Place(gpu:0), stop_gradient=True,
       [0.        , 0.66666669, 1.33333337, 2.        ])])

现在我重新在torch.set_printoptions中设置了打印精度为8位,但是在assert_allclose中还是需要设置一下atol(打印相同,但是底层值不同,atol不能达到默认的0),如果要完全一致的话,应该就要用numpy中的api来计算,但是numpy中的histogramdd比pytorch中支持的情况更少一些:例如numpy中仅支持2d输入,但是pytorch支持多维(2d及以上);numpy中bins仅支持int和int[],pytorch支持int,int[],和tuple of tensors。所以目前是通过pytorch的打印输出直接作为目标输出,会有绝对误差。
现改为:

np.testing.assert_allclose(hist_out, self.expect_hist, atol=1e-8)

python/paddle/tensor/linalg.py Outdated Show resolved Hide resolved

# weights
__check_weights(sample, weights)
D = sample.shape[-1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

是否需要对sample的shape有判断?是所有的shape都能支持吗

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

支持维度大于等于2,新加了判断,辛苦review~

@zxcd
Copy link
Contributor

zxcd commented Oct 12, 2023

代码中建议添加数据类型支持判断,即过滤不支持的数据类型。另外需增加sample和weight数据类型是否一致的判断。除了已经check_type的数据类型,最好其他的输入也能check下。

测试代码中需要加入对应报错的验证。

@cocoshe
Copy link
Contributor Author

cocoshe commented Oct 12, 2023

代码中建议添加数据类型支持判断,即过滤不支持的数据类型。另外需增加sample和weight数据类型是否一致的判断。除了已经check_type的数据类型,最好其他的输入也能check下。

测试代码中需要加入对应报错的验证。

代码中建议添加数据类型支持判断,即过滤不支持的数据类型。另外需增加sample和weight数据类型是否一致的判断。除了已经check_type的数据类型,最好其他的输入也能check下。

测试代码中需要加入对应报错的验证。

感谢review,我想问一下"测试代码中需要加入对应报错的验证。"是指给错误的类型能正常报错嘛?如果是的话一般怎么判断通过测试呢?(输入如果错误的话,直接就报错了)

@paddle-ci-bot
Copy link

paddle-ci-bot bot commented Oct 15, 2023

Sorry to inform you that 90488ae's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

@cocoshe
Copy link
Contributor Author

cocoshe commented Oct 17, 2023

代码中建议添加数据类型支持判断,即过滤不支持的数据类型。另外需增加sample和weight数据类型是否一致的判断。除了已经check_type的数据类型,最好其他的输入也能check下。

测试代码中需要加入对应报错的验证。

补充了一些type检测,并且添加了error test,辛苦review~

@luotao1
Copy link
Contributor

luotao1 commented Oct 19, 2023

需要通过 PR-CI-Codestyle-Check 流水线的格式检查

@paddle-ci-bot
Copy link

paddle-ci-bot bot commented Oct 27, 2023

Sorry to inform you that 0ef396b's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

@zxcd
Copy link
Contributor

zxcd commented Nov 2, 2023

代码中建议添加数据类型支持判断,即过滤不支持的数据类型。另外需增加sample和weight数据类型是否一致的判断。除了已经check_type的数据类型,最好其他的输入也能check下。
测试代码中需要加入对应报错的验证。

代码中建议添加数据类型支持判断,即过滤不支持的数据类型。另外需增加sample和weight数据类型是否一致的判断。除了已经check_type的数据类型,最好其他的输入也能check下。
测试代码中需要加入对应报错的验证。

感谢review,我想问一下"测试代码中需要加入对应报错的验证。"是指给错误的类型能正常报错嘛?如果是的话一般怎么判断通过测试呢?(输入如果错误的话,直接就报错了)

可以尝试使用类似test/legacy_test/test_reduce_op.py中的方法
self.assertRaises(TypeError, paddle.sum, x2)

@cocoshe
Copy link
Contributor Author

cocoshe commented Nov 2, 2023

代码中建议添加数据类型支持判断,即过滤不支持的数据类型。另外需增加sample和weight数据类型是否一致的判断。除了已经check_type的数据类型,最好其他的输入也能check下。
测试代码中需要加入对应报错的验证。

代码中建议添加数据类型支持判断,即过滤不支持的数据类型。另外需增加sample和weight数据类型是否一致的判断。除了已经check_type的数据类型,最好其他的输入也能check下。
测试代码中需要加入对应报错的验证。

感谢review,我想问一下"测试代码中需要加入对应报错的验证。"是指给错误的类型能正常报错嘛?如果是的话一般怎么判断通过测试呢?(输入如果错误的话,直接就报错了)

可以尝试使用类似test/legacy_test/test_reduce_op.py中的方法 self.assertRaises(TypeError, paddle.sum, x2)

嗯嗯谢谢,在前面的add some type check && add error test
这个commit已经添加了,辛苦review~

Copy link
Contributor

@zxcd zxcd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM



def histogramdd(
sample, bins=10, range=None, density=False, weights=None, name=None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

according to API naming conventions, enter the name of Tensor using x, and the rfc should also be modified synchronously

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I will modify it soon~

Comment on lines 32 to 33
_range = range

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This writing method affects the readability of the code. If it is to avoid conflicts with the name of the range parameter in histogramdd, the range parameter can be adjusted to ranges

@cocoshe
Copy link
Contributor Author

cocoshe commented Nov 22, 2023

@jeff41404 I then removed all atol param in assert_allclose in unittest, I think it should be aligned to numpy.

jeff41404
jeff41404 previously approved these changes Dec 5, 2023
Copy link
Contributor

@jeff41404 jeff41404 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@luotao1
Copy link
Contributor

luotao1 commented Dec 5, 2023

请提交对应的中文文档,CodeStyle 流水线没过,可以等 @sunzhongkai588 的文档review意见后一起修改。

@luotao1
Copy link
Contributor

luotao1 commented Dec 5, 2023

这里的example示例与pytorch的一样,请 @sunzhongkai588 看下是否可以?

python/paddle/tensor/linalg.py Show resolved Hide resolved
python/paddle/tensor/linalg.py Outdated Show resolved Hide resolved
python/paddle/tensor/linalg.py Outdated Show resolved Hide resolved
Comment on lines 3894 to 3916

>>> x = paddle.to_tensor([[0., 1.], [1., 0.], [2.,0.], [2., 2.]])
>>> bins = [3,3]
>>> weights = paddle.to_tensor([1., 2., 4., 8.])
>>> paddle.histogramdd(x, bins=bins, weights=weights)
(Tensor(shape=[3, 3], dtype=float32, place=Place(gpu:0), stop_gradient=True,
[[0., 1., 0.],
[2., 0., 0.],
[4., 0., 8.]]), [Tensor(shape=[4], dtype=float32, place=Place(gpu:0), stop_gradient=True,
[0. , 0.66666669, 1.33333337, 2. ]), Tensor(shape=[4], dtype=float32, place=Place(gpu:0), stop_gradient=True,
[0. , 0.66666669, 1.33333337, 2. ])])


>>> y = paddle.to_tensor([[0., 0.], [1., 1.], [2., 2.]])
>>> bins = [2,2]
>>> ranges = [0., 1., 0., 1.]
>>> density = True
>>> paddle.histogramdd(y, bins=bins, ranges=ranges, density=density)
(Tensor(shape=[2, 2], dtype=float32, place=Place(gpu:0), stop_gradient=True,
[[2., 0.],
[0., 2.]]), [Tensor(shape=[3], dtype=float32, place=Place(gpu:0), stop_gradient=True,
[0. , 0.50000000, 1. ]), Tensor(shape=[3], dtype=float32, place=Place(gpu:0), stop_gradient=True,
[0. , 0.50000000, 1. ])])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

加上 Examples 和 code block,并注意缩进,参考 API示例代码

如果分成多个代码块,要加上 :name:

Co-authored-by: zachary sun <70642955+sunzhongkai588@users.noreply.github.com>
cocoshe and others added 3 commits December 5, 2023 15:18
Co-authored-by: zachary sun <70642955+sunzhongkai588@users.noreply.github.com>
Co-authored-by: zachary sun <70642955+sunzhongkai588@users.noreply.github.com>
[0. , 0.66666669, 1.33333337, 2. ]), Tensor(shape=[4], dtype=float32, place=Place(gpu:0), stop_gradient=True,
[0. , 0.66666669, 1.33333337, 2. ])])

:name: examp2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

参考 API示例代码 得这么写,同时注意缩进~
image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已修改,preview看上去ok啦,辛苦review~
image

Copy link
Contributor

@sunzhongkai588 sunzhongkai588 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM,请提供中文

@luotao1 luotao1 merged commit 47953ac into PaddlePaddle:develop Dec 11, 2023
28 of 29 checks passed
@luotao1 luotao1 changed the title 【Hackathon 5th No.35】为 Paddle 新增 histogramdd API 【Hackathon 5th No.35】为 Paddle 新增 histogramdd API -part Dec 11, 2023
@cocoshe cocoshe deleted the histogramdd_coco_dev branch December 11, 2023 04:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants