# paddle.distribution.binomial Design Document

| API name | paddle.distribution.binomial |
|---|---|
| Author | NKNaN |
| Submission date | 2023-09-26 |
| Version | V1.2 |
| Paddle version dependency | develop |
| File name | 20230926_api_design_for_binomial.md |
# I. Overview
## 1. Background
Probabilistic programming is an important branch of machine learning, commonly used for Bayesian inference and statistical inference in general. Enriching Paddle's probability distribution APIs therefore calls for adding a `paddle.distribution.binomial` API.
The binomial distribution is one of the most fundamental probability distributions. It can be viewed as the distribution of the number of heads obtained by repeatedly tossing a possibly biased coin; equivalently, its random variable is the sum of a series of independent Bernoulli trials. It is useful for analyzing the outcomes of repeated independent experiments, in particular the probability of reaching a given threshold under a given error rate, so it is also commonly used to assess statistical significance.
## 2. Goals
Following Paddle's existing distributions, add a Binomial distribution class providing probability statistics and random sampling, with the following methods:
- mean: the mean
- variance: the variance
- sample: random sampling
- prob: probability mass function
- log_prob: log probability mass function
- entropy: entropy
- kl_divergence: KL divergence
## 3. Significance
Enrich the distribution types Paddle provides and further improve the framework for probabilistic programming.
# II. Current Status in Paddle
Paddle defines an abstract base class `Distribution`, from which distributions such as `Uniform` and `Normal` are derived. Paddle currently has no Binomial distribution, so it needs to be implemented separately, following the same approach as the existing distributions.
# III. Survey of Existing Solutions
### PyTorch
PyTorch provides the API `torch.distributions.binomial.Binomial(total_count=1, probs=None, logits=None, validate_args=None)`:
```python
class Binomial(Distribution):
r"""
Creates a Binomial distribution parameterized by :attr:`total_count` and
either :attr:`probs` or :attr:`logits` (but not both). :attr:`total_count` must be
broadcastable with :attr:`probs`/:attr:`logits`.
Example::
>>> # xdoctest: +IGNORE_WANT("non-deterministic")
>>> m = Binomial(100, torch.tensor([0 , .2, .8, 1]))
>>> x = m.sample()
tensor([ 0., 22., 71., 100.])
>>> m = Binomial(torch.tensor([[5.], [10.]]), torch.tensor([0.5, 0.8]))
>>> x = m.sample()
tensor([[ 4., 5.],
[ 7., 6.]])
Args:
total_count (int or Tensor): number of Bernoulli trials
probs (Tensor): Event probabilities
logits (Tensor): Event log-odds
"""
arg_constraints = {
"total_count": constraints.nonnegative_integer,
"probs": constraints.unit_interval,
"logits": constraints.real,
}
has_enumerate_support = True
def __init__(self, total_count=1, probs=None, logits=None, validate_args=None):
if (probs is None) == (logits is None):
raise ValueError(
"Either `probs` or `logits` must be specified, but not both."
)
if probs is not None:
(
self.total_count,
self.probs,
) = broadcast_all(total_count, probs)
self.total_count = self.total_count.type_as(self.probs)
else:
(
self.total_count,
self.logits,
) = broadcast_all(total_count, logits)
self.total_count = self.total_count.type_as(self.logits)
self._param = self.probs if probs is not None else self.logits
batch_shape = self._param.size()
super().__init__(batch_shape, validate_args=validate_args)
def expand(self, batch_shape, _instance=None):
new = self._get_checked_instance(Binomial, _instance)
batch_shape = torch.Size(batch_shape)
new.total_count = self.total_count.expand(batch_shape)
if "probs" in self.__dict__:
new.probs = self.probs.expand(batch_shape)
new._param = new.probs
if "logits" in self.__dict__:
new.logits = self.logits.expand(batch_shape)
new._param = new.logits
super(Binomial, new).__init__(batch_shape, validate_args=False)
new._validate_args = self._validate_args
return new
def _new(self, *args, **kwargs):
return self._param.new(*args, **kwargs)
@constraints.dependent_property(is_discrete=True, event_dim=0)
def support(self):
return constraints.integer_interval(0, self.total_count)
@property
def mean(self):
return self.total_count * self.probs
@property
def mode(self):
return ((self.total_count + 1) * self.probs).floor().clamp(max=self.total_count)
@property
def variance(self):
return self.total_count * self.probs * (1 - self.probs)
@lazy_property
def logits(self):
return probs_to_logits(self.probs, is_binary=True)
@lazy_property
def probs(self):
return logits_to_probs(self.logits, is_binary=True)
@property
def param_shape(self):
return self._param.size()
def sample(self, sample_shape=torch.Size()):
shape = self._extended_shape(sample_shape)
with torch.no_grad():
return torch.binomial(
self.total_count.expand(shape), self.probs.expand(shape)
)
def log_prob(self, value):
if self._validate_args:
self._validate_sample(value)
log_factorial_n = torch.lgamma(self.total_count + 1)
log_factorial_k = torch.lgamma(value + 1)
log_factorial_nmk = torch.lgamma(self.total_count - value + 1)
# k * log(p) + (n - k) * log(1 - p) = k * (log(p) - log(1 - p)) + n * log(1 - p)
# (case logit < 0) = k * logit - n * log1p(e^logit)
# (case logit > 0) = k * logit - n * (log(p) - log(1 - p)) + n * log(p)
# = k * logit - n * logit - n * log1p(e^-logit)
# (merge two cases) = k * logit - n * max(logit, 0) - n * log1p(e^-|logit|)
normalize_term = (
self.total_count * _clamp_by_zero(self.logits)
+ self.total_count * torch.log1p(torch.exp(-torch.abs(self.logits)))
- log_factorial_n
)
return (
value * self.logits - log_factorial_k - log_factorial_nmk - normalize_term
)
def entropy(self):
total_count = int(self.total_count.max())
if not self.total_count.min() == total_count:
raise NotImplementedError(
"Inhomogeneous total count not supported by `entropy`."
)
log_prob = self.log_prob(self.enumerate_support(False))
return -(torch.exp(log_prob) * log_prob).sum(0)
def enumerate_support(self, expand=True):
total_count = int(self.total_count.max())
if not self.total_count.min() == total_count:
raise NotImplementedError(
"Inhomogeneous total count not supported by `enumerate_support`."
)
values = torch.arange(
1 + total_count, dtype=self._param.dtype, device=self._param.device
)
values = values.view((-1,) + (1,) * len(self._batch_shape))
if expand:
values = values.expand((-1,) + self._batch_shape)
return values
```
`torch.distributions.binomial.Binomial` inherits from `torch.distributions.Distribution`.
### TensorFlow
TensorFlow Probability provides the API `tfp.distributions.Binomial(total_count, logits=None, probs=None, validate_args=False, allow_nan_stats=True, name=None)`:
```python
class Binomial(
distribution.DiscreteDistributionMixin,
distribution.AutoCompositeTensorDistribution):
"""Binomial distribution.
This distribution is parameterized by `probs`, a (batch of) probabilities for
drawing a `1`, and `total_count`, the number of trials per draw from the
Binomial.
def __init__(self,
total_count,
logits=None,
probs=None,
validate_args=False,
allow_nan_stats=True,
name=None):
"""Initialize a batch of Binomial distributions.
Args:
total_count: Non-negative floating point tensor with shape broadcastable
to `[N1,..., Nm]` with `m >= 0` and the same dtype as `probs` or
`logits`. Defines this as a batch of `N1 x ... x Nm` different Binomial
distributions. Its components should be equal to integer values.
logits: Floating point tensor representing the log-odds of a
positive event with shape broadcastable to `[N1,..., Nm]` `m >= 0`, and
the same dtype as `total_count`. Each entry represents logits for the
probability of success for independent Binomial distributions. Only one
of `logits` or `probs` should be passed in.
probs: Positive floating point tensor with shape broadcastable to
`[N1,..., Nm]` `m >= 0`, `probs in [0, 1]`. Each entry represents the
probability of success for independent Binomial distributions. Only one
of `logits` or `probs` should be passed in.
validate_args: Python `bool`, default `False`. When `True` distribution
parameters are checked for validity despite possibly degrading runtime
performance. When `False` invalid inputs may silently render incorrect
outputs.
allow_nan_stats: Python `bool`, default `True`. When `True`, statistics
(e.g., mean, mode, variance) use the value "`NaN`" to indicate the
result is undefined. When `False`, an exception is raised if one or
more of the statistic's batch members are undefined.
name: Python `str` name prefixed to Ops created by this class.
"""
parameters = dict(locals())
if (probs is None) == (logits is None):
raise ValueError(
'Construct `Binomial` with `probs` or `logits`, but not both.')
with tf.name_scope(name or 'Binomial') as name:
dtype = dtype_util.common_dtype([total_count, logits, probs], tf.float32)
self._total_count = tensor_util.convert_nonref_to_tensor(
total_count, dtype=dtype, name='total_count')
self._logits = tensor_util.convert_nonref_to_tensor(
logits, dtype=dtype, name='logits')
self._probs = tensor_util.convert_nonref_to_tensor(
probs, dtype=dtype, name='probs')
super(Binomial, self).__init__(
dtype=dtype,
reparameterization_type=reparameterization.NOT_REPARAMETERIZED,
validate_args=validate_args,
allow_nan_stats=allow_nan_stats,
parameters=parameters,
name=name)
@classmethod
def _parameter_properties(cls, dtype, num_classes=None):
return dict(
total_count=parameter_properties.ParameterProperties(
default_constraining_bijector_fn=parameter_properties
.BIJECTOR_NOT_IMPLEMENTED),
logits=parameter_properties.ParameterProperties(),
probs=parameter_properties.ParameterProperties(
default_constraining_bijector_fn=sigmoid_bijector.Sigmoid,
is_preferred=False))
@property
def total_count(self):
"""Number of trials."""
return self._total_count
@property
def logits(self):
"""Input argument `logits`."""
return self._logits
@property
def probs(self):
"""Input argument `probs`."""
return self._probs
def _event_shape_tensor(self):
return tf.constant([], dtype=tf.int32)
def _event_shape(self):
return tf.TensorShape([])
@distribution_util.AppendDocstring(_binomial_sample_note)
def _log_prob(self, counts):
total_count = tf.convert_to_tensor(self.total_count)
if self._logits is not None:
unnorm = _log_unnormalized_prob_logits(self._logits, counts, total_count)
else:
unnorm = _log_unnormalized_prob_probs(self._probs, counts, total_count)
norm = _log_normalization(counts, total_count)
return unnorm - norm
@distribution_util.AppendDocstring(_binomial_sample_note)
def _prob(self, counts):
return tf.exp(self._log_prob(counts))
def _cdf(self, counts):
total_count = tf.convert_to_tensor(self.total_count)
probs = self._probs_parameter_no_checks(total_count=total_count)
probs, counts = _maybe_broadcast(probs, counts)
return _bdtr(k=counts, n=total_count, p=probs)
@distribution_util.AppendDocstring(_binomial_sample_note)
def _sample_n(self, n, seed=None):
seed = samplers.sanitize_seed(seed, salt='binomial')
total_count = tf.convert_to_tensor(self._total_count)
if self._probs is None:
probs = self._probs_parameter_no_checks(total_count=total_count)
else:
probs = tf.convert_to_tensor(self._probs)
return _random_binomial(
shape=ps.convert_to_shape_tensor([n]),
counts=total_count,
probs=probs,
output_dtype=self.dtype,
seed=seed)[0]
def _mean(self, probs=None, total_count=None):
if total_count is None:
total_count = tf.convert_to_tensor(self._total_count)
if probs is None:
probs = self._probs_parameter_no_checks(total_count=total_count)
return total_count * probs
def _variance(self):
total_count = tf.convert_to_tensor(self._total_count)
probs = self._probs_parameter_no_checks(total_count=total_count)
return self._mean(probs=probs, total_count=total_count) * (1. - probs)
@distribution_util.AppendDocstring(
"""Note that when `(1 + total_count) * probs` is an integer, there are
actually two modes. Namely, `(1 + total_count) * probs` and
`(1 + total_count) * probs - 1` are both modes. Here we return only the
larger of the two modes.""")
def _mode(self):
total_count = tf.convert_to_tensor(self._total_count)
probs = self._probs_parameter_no_checks(total_count=total_count)
return tf.math.minimum(
total_count, tf.floor((1. + total_count) * probs))
def logits_parameter(self, name=None):
"""Logits computed from non-`None` input arg (`probs` or `logits`)."""
with self._name_and_control_scope(name or 'logits_parameter'):
return self._logits_parameter_no_checks()
def _logits_parameter_no_checks(self):
if self._logits is None:
probs = tf.convert_to_tensor(self._probs)
return tf.math.log(probs) - tf.math.log1p(-probs)
return tensor_util.identity_as_tensor(self._logits)
def probs_parameter(self, name=None):
"""Probs computed from non-`None` input arg (`probs` or `logits`)."""
with self._name_and_control_scope(name or 'probs_parameter'):
return self._probs_parameter_no_checks()
def _probs_parameter_no_checks(self, total_count=None):
if self._logits is None:
probs = tensor_util.identity_as_tensor(self._probs)
else:
probs = tf.math.sigmoid(self._logits)
# Suppress potentially nasty probs like `nan` b/c they don't matter where
# total_count == 0.
if total_count is None:
total_count = self.total_count
return tf.where(total_count > 0, probs, 0)
def _default_event_space_bijector(self):
return
def _parameter_control_dependencies(self, is_init):
if not self.validate_args:
return []
assertions = []
if is_init != tensor_util.is_ref(self.total_count):
total_count = tf.convert_to_tensor(self.total_count)
msg1 = 'Argument `total_count` must be non-negative.'
msg2 = 'Argument `total_count` cannot contain fractional components.'
assertions += [
assert_util.assert_non_negative(total_count, message=msg1),
distribution_util.assert_integer_form(total_count, message=msg2),
]
if self._probs is not None:
if is_init != tensor_util.is_ref(self._probs):
probs = tf.convert_to_tensor(self._probs)
one = tf.constant(1., probs.dtype)
assertions += [
assert_util.assert_non_negative(
probs, message='probs has components less than 0.'),
assert_util.assert_less_equal(
probs, one, message='probs has components greater than 1.')
]
return assertions
def _sample_control_dependencies(self, counts):
"""Check counts for proper values."""
assertions = []
if not self.validate_args:
return assertions
assertions.append(distribution_util.assert_casting_closed(
counts, target_dtype=tf.int32,
message='counts cannot contain fractional components.'))
assertions.append(assert_util.assert_non_negative(
counts, message='counts must be non-negative.'))
assertions.append(
assert_util.assert_less_equal(
counts, self.total_count,
message=('Sampled counts must be itemwise less than '
'or equal to `total_count` parameter.')))
return assertions
```
`tfp.distributions.Binomial` inherits from `tfp.distribution.DiscreteDistributionMixin` and `tfp.distribution.AutoCompositeTensorDistribution`.
According to the TFP comments, the `DiscreteDistributionMixin` class marks a distribution as discrete, because computing `log_prob` after transforming a discrete distribution differs from the continuous case. For a continuous random variable $X$ with density $f_X(\cdot)$ and a bijective transformation $Y = T(X)$, probability conservation gives $\int_{D} f_Y(y)\,dy = \int_{T^{-1}(D)} f_X(x)\,dx$, hence $f_Y(y) = f_X(T^{-1}(y))\left|\frac{dx}{dy}\right|$. The transformed continuous distribution therefore satisfies log_prob(y) = log_prob( $T^{-1}(y)$ ) + inverse_log_det_jacobian(y). For a discrete random variable no Jacobian term $\left|\frac{dx}{dy}\right|$ exists, so the transformed log_prob(y) is simply log_prob( $T^{-1}(y)$ ).
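The distinction above can be checked numerically with a small stdlib-only sketch (the Exponential and Bernoulli choices here are illustrative assumptions, not part of the proposal):

```python
import math

# Continuous case: X ~ Exponential(1), Y = T(X) = 2X.
# The transformed density needs the inverse-Jacobian term |dx/dy| = 1/2:
#   log f_Y(y) = log f_X(y / 2) + log(1 / 2)
def exp_log_pdf(x):
    """Log density of Exponential(1): f(x) = exp(-x) for x >= 0."""
    return -x

def transformed_log_pdf(y):
    return exp_log_pdf(y / 2) + math.log(0.5)

# Sanity check: with the Jacobian term, f_Y still integrates to ~1.
step = 1e-3
total = sum(math.exp(transformed_log_pdf(i * step)) * step
            for i in range(int(50 / step)))

# Discrete case: X ~ Bernoulli(p), Y = 2X. No Jacobian term exists;
# the pmf transfers directly: P(Y = 2) = P(X = 1) = p.
p = 0.3
pmf_y_at_2 = p
```

Dropping the `math.log(0.5)` term in the continuous case would make the density integrate to 2, which is exactly the error the `DiscreteDistributionMixin` distinction guards against.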
# IV. Comparative Analysis
PyTorch and TensorFlow Probability implement the distribution in broadly similar ways, deriving each statistical property from basic probability calculations. TensorFlow Probability additionally distinguishes between discrete and continuous random variables when computing `log_prob` of a transformed distribution; PyTorch makes no such distinction, and neither do Paddle's existing APIs. We therefore suggest not making this distinction for now and, if needed later, adjusting all discrete distributions uniformly.
# V. Design and Implementation
## Naming and Parameter Design
```python
paddle.distribution.Binomial(total_count, probs)
```
The parameters `total_count` and `probs` are the two parameters of the Binomial distribution.
For example, if a random variable $X$ follows a Binomial distribution, i.e. $X \sim Binomial(n, p)$, then `total_count` $=n$ and `probs` $=p$.
## Low-Level Op Design
A new op, `binomial_op`, is needed for binomial sampling.
1. Files to be added or modified:
```
paddle/phi/api/yaml/ops.yaml
paddle/phi/infermeta/binary.cc
paddle/phi/infermeta/binary.h
paddle/phi/kernels/binomial_kernel.h
paddle/phi/kernels/cpu/binomial_kernel.cc
paddle/phi/kernels/funcs/binomial_functor.h
paddle/phi/kernels/gpu/binomial_kernel.cu
python/paddle/tensor/random.py
test/legacy_test/test_binomial_op.py
```
2. API design of the binomial sampling op:
C++ API:
- Inputs: `count` and `prob`, both Tensors with matching shapes; the CPU kernel accepts float and double, and the GPU kernel accepts float, double, float16, and bfloat16.
- Return type: Tensor with dtype int64.
Python API:
- Inputs: `count` and `prob`, both Tensors; their shapes may differ but must be broadcastable to a common shape.
- Return type: Tensor with dtype int64.
3. Implementation of the binomial sampling op:
The CPU and GPU kernels are implemented similarly: when $n\cdot\min{(p, 1-p)} < 10$ they use inverse-transform sampling driven by a uniform random variable, and when $n\cdot\min{(p, 1-p)} \geq 10$ they sample via the BTRS algorithm.
The GPU kernel follows `poisson_kernel.cu`, using `CUDA_KERNEL_LOOP_TYPE` to iterate element-wise over the two input Tensors and compute the sample at each position.
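As a sketch of the small $n\cdot\min{(p, 1-p)}$ branch, the inverse-transform sampler can be written with only the Python standard library (the function name and structure are illustrative; the actual kernel is C++):

```python
import math
import random

def binomial_inverse_sample(n, p, rng=random):
    """Draw one Binomial(n, p) sample by inverting the CDF.

    Walks the pmf recursively via f(k+1) = f(k) * (n-k)/(k+1) * p/(1-p),
    accumulating until the CDF exceeds a uniform draw. This is efficient
    only when n * min(p, 1 - p) is small, which is exactly the regime the
    design reserves for this method.
    """
    if p >= 1.0:
        return n
    u = rng.random()
    ratio = p / (1.0 - p)
    pmf = (1.0 - p) ** n      # f(0)
    cdf = pmf
    k = 0
    while u > cdf and k < n:
        pmf *= (n - k) / (k + 1) * ratio
        cdf += pmf
        k += 1
    return k
```

The BTRS branch (a transformed-rejection scheme with squeeze steps) is considerably more involved; see the BTRS paper linked in the references.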
## API Implementation
Add a `Binomial` class:
```python
class Binomial(Distribution):
    def __init__(self, total_count, probs):
        self.total_count = total_count
        self.probs = probs
        super().__init__(batch_shape=self.probs.shape, event_shape=())
        ...
```
The `Binomial` class is initialized with `total_count` and `probs`. Writing `total_count` $=n$ and `probs` $=p$, its methods and their implementations are:
- `mean`: the mean, computed as $n p$
- `variance`: the variance, computed as $n p (1 - p)$
- `entropy`: the entropy, computed as $H = - \sum_x f(x) \log{f(x)}$
- `kl_divergence`: the KL divergence, computed as $D_{KL}(n_1, p_1, n_2, p_2) = \sum_x f_1(x) \log{\frac{f_1(x)}{f_2(x)}}$
- `sample`: random sampling, implemented by calling the native `paddle.binomial` sampling op once it is available
- `prob`: the probability mass function, $f(x;n,p) = \frac{n!}{x!(n-x)!}p^{x}(1-p)^{n-x}$
- `log_prob`: the log probability mass function, $\log[f(x;n,p)] = \log[\frac{n!}{x!(n-x)!}] + x \log p + (n-x) \log (1-p)$
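The closed-form quantities above can be cross-checked with a stdlib-only reference sketch (helper names are hypothetical; `lgamma` replaces the factorials for numerical stability, as PyTorch's `log_prob` also does; assumes $0 < p < 1$):

```python
import math

def binom_log_pmf(x, n, p):
    """log f(x; n, p) = log(n!/(x!(n-x)!)) + x log p + (n-x) log(1-p)."""
    log_coef = (math.lgamma(n + 1) - math.lgamma(x + 1)
                - math.lgamma(n - x + 1))
    return log_coef + x * math.log(p) + (n - x) * math.log1p(-p)

def binom_entropy(n, p):
    """H = -sum_x f(x) log f(x), summed over the support {0, ..., n}."""
    return -sum(math.exp(binom_log_pmf(x, n, p)) * binom_log_pmf(x, n, p)
                for x in range(n + 1))

def binom_kl(n, p1, p2):
    """KL(Binomial(n, p1) || Binomial(n, p2)) = sum_x f1(x) log(f1(x)/f2(x)).

    For equal total_count this reduces to the closed form
    n * (p1 * log(p1/p2) + (1-p1) * log((1-p1)/(1-p2))).
    """
    return sum(math.exp(binom_log_pmf(x, n, p1))
               * (binom_log_pmf(x, n, p1) - binom_log_pmf(x, n, p2))
               for x in range(n + 1))
```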
# VI. Testing and Acceptance Criteria
The `Binomial` class is tested against SciPy to verify the API's correctness.
1. `mean` and `variance` are verified directly.
2. `entropy`, `prob`, and `log_prob` are verified against `scipy.stats.binom.entropy`, `scipy.stats.binom.pmf`, and `scipy.stats.binom.logpmf`, respectively.
3. Generate 5000 samples with the `Binomial` class's `sample` method and check that the mean and standard deviation of these samples are correct (following the existing tests for `geometric`, `gumbel`, `laplace`, `lognormal`, `multinomial`, and `normal`).
4. `kl_divergence` is verified by reimplementing the KL divergence computation with `scipy.stats.binom.logpmf`.
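Test item 3 can be sketched as follows, with a summed-Bernoulli draw standing in for the proposed `paddle.binomial` op (the function name and tolerance are assumptions for illustration):

```python
import math
import random

def check_sample_moments(n, p, num_samples=5000, seed=0, rtol=0.1):
    """Mimic the acceptance test: draw samples, then compare the empirical
    mean and standard deviation against n*p and sqrt(n*p*(1-p))."""
    rng = random.Random(seed)
    # Stand-in sampler: a Binomial(n, p) draw as a sum of n Bernoulli trials.
    samples = [sum(rng.random() < p for _ in range(n))
               for _ in range(num_samples)]
    emp_mean = sum(samples) / num_samples
    emp_var = sum((s - emp_mean) ** 2 for s in samples) / num_samples
    mean_ok = abs(emp_mean - n * p) <= rtol * n * p
    std_ok = (abs(math.sqrt(emp_var) - math.sqrt(n * p * (1 - p)))
              <= rtol * math.sqrt(n * p * (1 - p)))
    return mean_ok, std_ok
```

In the real test suite the samples would come from `Binomial.sample` and the comparison would use `numpy.testing.assert_allclose`, as the existing distribution tests do.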
# VII. Feasibility and Schedule
- Schedule
September 27 to October 4: develop and debug the API.
October 5 to October 12: develop the test code.
# VIII. Affected Modules
This task affects the following modules:
1. `paddle.distribution`
Add a binomial.py file.
2. `./test/distribution`
Add test_distribution_binomial.py and test_distribution_binomial_static.py files.
# Glossary
- Binomial distribution
If a random variable $X \sim Binomial(n, p)$, then the probability mass function of $X$ is
$$f(x;n,p) = \frac{n!}{x!(n-x)!}p^{x}(1-p)^{n-x}$$
# Appendix and References
1. [TensorFlow Binomial documentation](https://tensorflow.google.cn/probability/api_docs/python/tfp/distributions/Binomial)
2. [PyTorch Binomial documentation](https://pytorch.org/docs/stable/distributions.html#binomial)
3. [BTRS algorithm paper](https://research.wu.ac.at/files/18967500/document.pdf)
4. [NumPy binomial documentation](https://numpy.org/doc/stable/reference/random/generated/numpy.random.binomial.html)