Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

【Hackathon 5th No.18】Add Binomial and Poisson API -part #57856

Merged
merged 29 commits into from
Dec 22, 2023

Conversation

NKNaN
Copy link
Contributor

@NKNaN NKNaN commented Sep 29, 2023

@paddle-bot
Copy link

paddle-bot bot commented Sep 29, 2023

你的PR提交成功,感谢你对开源项目的贡献!
请关注后续CI自动化测试结果,详情请参考Paddle-CI手册
Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@paddle-bot paddle-bot bot added the contributor External developers label Sep 29, 2023
@NKNaN NKNaN changed the title 【Hackathon 5th】No.19: Add Binomial and Poisson API 【Hackathon 5th No.19】Add Binomial and Poisson API Oct 1, 2023
@NKNaN NKNaN changed the title 【Hackathon 5th No.19】Add Binomial and Poisson API 【Hackathon 5th No.18】Add Binomial and Poisson API Oct 11, 2023
@paddle-ci-bot
Copy link

paddle-ci-bot bot commented Oct 17, 2023

Sorry to inform you that 4c1bfd4's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

Copy link

paddle-ci-bot bot commented Nov 1, 2023

Sorry to inform you that 8a1f336's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

@NKNaN
Copy link
Contributor Author

NKNaN commented Nov 16, 2023

可以麻烦两位老师看一下 CI-Coverage 和 Paddle-Doc-Preview 是哪里的问题吗?

@luotao1
Copy link
Contributor

luotao1 commented Nov 17, 2023

image Coverage可以点进去看覆盖率不够,Doc-Preview可不用管

@NKNaN
Copy link
Contributor Author

NKNaN commented Nov 20, 2023

请问两位老师还有什么需要修改的地方吗?如果没什么问题我就开始写中文文档。

@luotao1
Copy link
Contributor

luotao1 commented Nov 21, 2023

如果没什么问题我就开始写中文文档。

可以先同步写中文文档

"""

def __init__(self, total_count, probability):
self.dtype = 'float32'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if isinstance(probability, np.ndarray):
probability = paddle.to_tensor(probability)
total_count = paddle.cast(total_count, dtype=self.dtype)
probability = paddle.cast(probability, dtype=self.dtype)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to_tensor 可以指定dtype参数,此处代码可以简化

if isinstance(total_count, np.ndarray):
total_count = paddle.to_tensor(total_count)
if isinstance(probability, np.ndarray):
probability = paddle.to_tensor(probability)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

上述文档中描述参数类型是 float|Tensor,实际支持类型和文档描述保持一致;此处建议去掉numpy类型支持

Tensor: Sampled data with shape `sample_shape` + `batch_shape`.
"""
if not isinstance(shape, Iterable):
raise TypeError('sample shape must be Iterable object.')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Iterable 和 Sequence 语义略有区别 https://stackoverflow.com/questions/72157296/what-is-the-difference-between-type-hinting-a-variable-as-an-iterable-versus-a-s ,此处应该是只需要传入Sequence

self.probability, shape=output_shape
)
if in_dynamic_mode():
broadcast_func = np.frompyfunc(binomial_sample, nin=2, nout=1)
Copy link
Contributor

@cxxly cxxly Nov 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

使用Paddle原生采样算法,numpy仅支持cpu采样; Paddle GPU底层基础采样算法使用curand进行加速,大规模数据采样性能远超过CPU

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

binomial的原生采样算法我目前没有找到直接对应哪个API,如果用 paddle.bernoulli() 来做的话,对于这种情况: Binomial( [ n_1, n_2 ], [ p_1, p_2 ] ) 的采样可能会比较麻烦,不知道这块要怎么做好一点

Copy link
Contributor

@cxxly cxxly Nov 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

binomial可以通过均匀分布逆采样实现, 参考pytorch/aten/src/ATen/native/Distributions.h sample_binomial;你的代码其实已经基于numpy实现了类似逻辑

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个题目涉及到新增c++ Kernel,难度会有所提升,已和组委会同学沟通 @luotao1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

好的,我试着写一下

break
u -= f
y += 1
f *= g / y - p_q
Copy link
Contributor

@cxxly cxxly Nov 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

使用Paddle原生采样算法及逆变换

/ 166320.0
)
if A <= tmp:
return n - y if p > 0.5 else y
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

同上

"""

def __init__(self, rate):
self.dtype = 'float32'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

同上

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已修改

super().__init__(batch_shape)

def _to_tensor(self, rate):
"""Convert the input parameters into tensors with dtype = float32
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with global default dtype.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已修改

rate = paddle.to_tensor([rate], dtype=self.dtype)
if isinstance(rate, np.ndarray):
rate = paddle.to_tensor(rate)
rate = paddle.cast(rate, dtype=self.dtype)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

同上

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已修改

x=output_rate,
out=output,
)
return output
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

同上,使用paddle原生采样算法及逆变换实现

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已修改,用 paddle.poisson()

# step H
if c * abs(u) <= py * math.exp(px + E) - fy * math.exp(fx + E):
break
return K
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

同上

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已删除

np.ndarray, binomial samples.
"""
broadcast_func = np.frompyfunc(poisson_sample, nin=1, nout=1)
return np.array(broadcast_func(rr), dtype='float32')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

同上

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已删除

self._np_mean(),
rtol=config.RTOL.get(str(self.probability.dtype)),
atol=config.ATOL.get(str(self.probability.dtype)),
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

使用 scipy.stats.binom.mean

self._np_variance(),
rtol=config.RTOL.get(str(self.probability.dtype)),
atol=config.ATOL.get(str(self.probability.dtype)),
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

使用 scipy.stats.binom.var

@NKNaN
Copy link
Contributor Author

NKNaN commented Dec 19, 2023

请解决下冲突

已解决

cxxly
cxxly previously approved these changes Dec 19, 2023
Copy link
Contributor

@cxxly cxxly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

[2.94053698, 3.00781751, 2.51124287])
"""

def __init__(self, total_count, probability):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use probs instead of probability will be better? consistent with other similar APIs like paddle.distribution.Bernoulli, paddle.distribution.Geometric and paddle.distribution.Multinomial?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rfc use probs which is inconsistent with the code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已修改


In the above equation:

* :math:`total_count = n`: is the size, meaning the total number of Bernoulli experiments.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* :math:`total_count = n`: is the size, meaning the total number of Bernoulli experiments.
* :math:`total\_count = n`: is the size, meaning the total number of Bernoulli experiments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已修改

Copy link
Contributor

@jeff41404 jeff41404 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@sunzhongkai588 sunzhongkai588 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@luotao1 luotao1 changed the title 【Hackathon 5th No.18】Add Binomial and Poisson API 【Hackathon 5th No.18】Add Binomial and Poisson API -part Dec 21, 2023
@luotao1 luotao1 merged commit ac69600 into PaddlePaddle:develop Dec 22, 2023
29 checks passed
@NKNaN NKNaN deleted the ayase/develop branch February 2, 2024 04:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants