Using Anglican as a library: Anglican is on Clojars. To use Anglican for developing your own algorithms and applications, add [anglican "X.Y.Z"] (with a recent version in place of "X.Y.Z") to the dependencies of your project.
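For a Leiningen project, the dependency vector would typically go into the :dependencies list of project.clj. The fragment below is only a sketch: the project name, the project version, and the Clojure version are placeholders chosen for illustration, and "X.Y.Z" stands for a recent Anglican release.

```clojure
;; project.clj (sketch; names and versions other than the
;; anglican entry are illustrative placeholders)
(defproject my-app "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.8.0"]
                 [anglican "X.Y.Z"]])
```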
Proposing patches:
- Fork Anglican.
- Make changes in the fork. The code map explains the source tree layout and module contents.
- Create a pull request.
- If the pull request resolves an issue, refer to the issue in the comment.
- Use the issue tracker to report bugs and suggest features.
When suggesting fixes/changes/improvements, stick to the following rules, or discuss before breaking them knowingly.
- Keep the line width within the limit of 80 characters strictly, below 70 characters whenever possible.
- Use consistent indentation. Whatever your editor (Vim, Emacs, LightTable) suggests is most probably good enough, but do not override the indentation manually on a case-by-case basis.
- In Lisp, a closing bracket or parenthesis does not traditionally start a line. Put closing brackets/parentheses at the end of expressions they close.
- Every namespace must have a documentation string describing the purpose and essential functionality of the namespace.
- Every function must have a documentation string explaining what the function does and returns.
- Do not leave dead code (commented out code fragments) in the committed source code. Comments are for humans. Use timbre (https://github.com/ptaoussanis/timbre) if you need debugging printouts in the code.
- Comments that take up their own line (or start after an opening square bracket) should start with a double semicolon (three, four, five for headers).
- Inline comments should use a single semicolon.
- Prepare enough tests to ensure that the code works correctly, and changes that break the code are immediately identified. Place unit tests for module anglican.foo into test/anglican/foo_test.clj (namespace anglican.foo-test).
- All tests must pass (lein test) before a change is pushed to the public repository.
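The documentation, commenting, and testing rules in the list above can be illustrated with a small sketch. The module anglican.foo and its function log-sum-exp are hypothetical and exist only for this example; the two namespaces are shown in one block for brevity, but would normally live in separate files.

```clojure
;;; src/anglican/foo.clj (hypothetical module)
(ns anglican.foo
  "Hypothetical module illustrating the coding conventions.")

(defn log-sum-exp
  "Returns log(exp(a) + exp(b)), computed without overflow
  for large finite arguments."
  [a b]
  (let [m (max a b)] ; inline comment: single semicolon
    ;; Own-line comment: double semicolon.
    (+ m (Math/log (+ (Math/exp (- a m))
                      (Math/exp (- b m)))))))

;;; test/anglican/foo_test.clj
(ns anglican.foo-test
  "Unit tests for the hypothetical module anglican.foo."
  (:require [clojure.test :refer [deftest is]]
            [anglican.foo :refer [log-sum-exp]]))

(deftest test-log-sum-exp
  (is (< (Math/abs (- (log-sum-exp 0. 0.) (Math/log 2.)))
         1e-12)))
```

With this layout, lein test picks up the test namespace together with the other tests.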
Anglican uses two abstractions of random sources: a distribution and a random process. The former corresponds to an 'elementary random procedure' (ERP), the latter is related to an 'exchangeable random procedure' (XRP).
Distributions and random processes are defined through
implementation of protocols anglican.runtime/distribution and
anglican.runtime/random-process. In addition, a multivariate
distribution may optionally implement protocol
anglican.runtime/multivariate-distribution. Several distributions
are defined in anglican.runtime, and other distributions may be
defined in terms of the 'basic' distributions.
For example, the Bernoulli distribution can be defined in terms of the uniform-continuous distribution:
    (defn bernoulli
      "Bernoulli distribution"
      [p]
      (let [dist (uniform-continuous 0. 1.)]
        (reify
          distribution
          (sample* [this] (if (< (sample* dist) p) 1 0))
          (observe* [this value]
            (Math/log (case value
                        1 p
                        0 (- 1. p)
                        0.))))))
where sample* and observe* are two methods of the distribution
protocol that must be provided.
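To try the definition outside of Anglican, the following sketch declares local stand-ins for the distribution protocol and for uniform-continuous. Both stand-ins are assumptions made so that the code runs on plain Clojure; they are not the Anglican implementations. The sketch shows that sample* returns 0 or 1 and that observe* returns a log probability.

```clojure
;; Stand-in for anglican.runtime/distribution, declared locally
;; so that the sketch runs without Anglican on the classpath.
(defprotocol distribution
  (sample* [this])
  (observe* [this value]))

(defn uniform-continuous
  "Stand-in for the uniform distribution between a and b."
  [a b]
  (reify
    distribution
    (sample* [this] (+ a (* (rand) (- b a))))
    (observe* [this value]
      (if (<= a value b)
        (- (Math/log (- b a)))
        Double/NEGATIVE_INFINITY))))

(defn bernoulli
  "Bernoulli distribution"
  [p]
  (let [dist (uniform-continuous 0. 1.)]
    (reify
      distribution
      (sample* [this] (if (< (sample* dist) p) 1 0))
      (observe* [this value]
        (Math/log (case value
                    1 p
                    0 (- 1. p)
                    0.))))))

(def coin (bernoulli 0.25))
;; A thousand draws, each either 0 or 1.
(def draws (repeatedly 1000 #(sample* coin)))
;; Log probability of observing 1, that is (Math/log 0.25).
(def log-p-one (observe* coin 1))
```

Values outside of {0, 1} fall through to the default branch of case and get log probability negative infinity.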
A better and easier way to implement a distribution is the
defdist macro. In addition to defining the distribution function,
defdist assigns each distribution a record type and arranges
for pretty-printing of the distribution instances. The above
declaration using defdist is:
    (defdist bernoulli
      "Bernoulli distribution"
      [p] [dist (uniform-continuous 0. 1.)]
      (sample* [this] (if (< (sample* dist) p) 1 0))
      (observe* [this value]
        (Math/log (case value
                    1 p
                    0 (- 1. p)
                    0.))))
The first square brackets define the parameter list of the
function that creates the distribution instance. The second square
brackets define additional bindings (which may depend on the
parameters) used by the methods. Behind the scenes, defdist
does more than the reify-based definition above: it also
defines a record type bernoulli-distribution, and instantiates
print-method for the type so that the distribution instance is
printed nicely. Consult the source code in
src/anglican/runtime.clj for the implementation of defdist.
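As a rough illustration of what "a record type plus print-method" amounts to, here is a hand-written sketch in plain Clojure. It is not the actual expansion of defdist: the local distribution protocol is a stand-in, (rand) replaces the uniform-continuous binding, and the printed format is an assumption made for this example.

```clojure
;; Stand-in for anglican.runtime/distribution.
(defprotocol distribution
  (sample* [this])
  (observe* [this value]))

;; The record type carries the parameters of the distribution.
(defrecord bernoulli-distribution [p]
  distribution
  (sample* [this] (if (< (rand) p) 1 0))
  (observe* [this value]
    (Math/log (case value
                1 p
                0 (- 1. p)
                0.))))

;; Print instances as the call that creates them
;; (the format is an assumption of this sketch).
(defmethod print-method bernoulli-distribution
  [d ^java.io.Writer w]
  (.write w (str "(bernoulli " (:p d) ")")))

(defn bernoulli
  "Bernoulli distribution"
  [p]
  (->bernoulli-distribution p))
```

With these definitions, (pr-str (bernoulli 0.3)) yields "(bernoulli 0.3)" rather than the default record representation.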
Likewise, defproc is the macro for implementing random
processes. The two methods that must be implemented are
produce and absorb. produce returns a distribution
corresponding to the current state of the process instance.
absorb receives a sample and returns a new process instance
updated with the sample. For example, the Chinese Restaurant
process can be defined in the following way:
    (defproc CRP
      "Chinese Restaurant process"
      [alpha] [counts []]
      (produce [this]
        (let [dist (discrete (conj counts alpha))]
          (reify
            distribution
            (sample* [this] (sample* dist))
            (observe* [this value]
              (observe* dist (min (count counts) value))))))
      (absorb [this sample]
        (CRP alpha
             (-> counts
                 ;; Fill the counts with alpha (corresponding to
                 ;; the zero count) until the new sample.
                 (into (repeat (+ (- sample (count counts)) 1) alpha))
                 (update-in [sample] inc)))))
Of course, instead of reifying the distribution inside the
produce method, one can define a new distribution using
defdist (as in the implementation of CRP in
src/anglican/runtime.clj).
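The interplay of produce and absorb may be easier to see from the caller's side. The sketch below uses local stand-ins for both protocols and a deliberately trivial, deterministic process named counter; the stand-ins, counter, and the helper draw are hypothetical and only illustrate the calling pattern: produce a distribution, sample from it, absorb the sample, repeat.

```clojure
;; Stand-ins for the Anglican protocols, declared locally so
;; that the sketch runs without Anglican on the classpath.
(defprotocol distribution
  (sample* [this])
  (observe* [this value]))

(defprotocol random-process
  (produce [this])
  (absorb [this sample]))

(defn draw
  "Draws n samples from a process, absorbing each sample
  before producing the next distribution."
  [process n]
  (loop [process process
         samples []]
    (if (= (count samples) n)
      samples
      (let [s (sample* (produce process))]
        (recur (absorb process s) (conj samples s))))))

(defn counter
  "Toy process: the produced distribution always yields k,
  the number of samples absorbed so far."
  [k]
  (reify
    random-process
    (produce [this]
      (reify
        distribution
        (sample* [this] k)
        (observe* [this value]
          (if (= value k)
            0.0
            Double/NEGATIVE_INFINITY))))
    (absorb [this sample] (counter (inc k)))))
```

Here (draw (counter 0) 3) returns [0 1 2]. Each call to absorb returns a new process instance, so the original instance is never mutated.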
An inference algorithm must implement the
anglican.inference/infer multimethod. The method dispatches
on a keyword. If the algorithm is defined in a namespace
anglican.foo, and the keyword is :foo, the algorithm's
namespace will be loaded automatically by either
anglican.core/m! or mrepl.core/doquery. However, an algorithm
may be implemented in any namespace and loaded explicitly before
infer is called.
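The dispatch mechanism itself is ordinary Clojure. The sketch below declares a local multimethod named infer as a stand-in for anglican.inference/infer, registers a hypothetical algorithm :echo, and shows how a namespace name can be derived from the keyword following the convention above. The helper algorithm-namespace is an assumption of this sketch, not a function of Anglican.

```clojure
(defmulti infer
  "Stand-in for anglican.inference/infer: dispatches on the
  algorithm keyword passed as the first argument."
  (fn [algorithm & _] algorithm))

(defn algorithm-namespace
  "Returns the symbol of the namespace conventionally
  associated with the algorithm keyword."
  [algorithm]
  (symbol (str "anglican." (name algorithm))))

;; Hypothetical algorithm: no inference, just an infinite lazy
;; sequence of results of calling prog on value.
(defmethod infer :echo [_ prog value & options]
  (repeatedly #(prog value)))
```

A loader following the convention could then call (require (algorithm-namespace :foo)) before invoking infer with :foo.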
The simplest algorithm to implement is importance sampling:
    (ns anglican.importance
      (:refer-clojure :exclude [rand rand-int rand-nth])
      (:use [anglican state inference]))

    (derive ::algorithm :anglican.inference/algorithm)

    (defmethod infer :importance [_ prog value & {}]
      (letfn [(sample-seq []
                (lazy-seq
                  (cons (:state (exec ::algorithm
                                      prog value initial-state))
                        (sample-seq))))]
        (sample-seq)))
For more examples, look at the implementations of SMC, Particle Gibbs, and Lightweight Metropolis-Hastings. The code map points to the Clojure modules containing the implementations.
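The importance sampling code above returns an infinite lazy sequence of weighted samples, so the caller decides how many to consume. The sketch below reproduces the lazy-sequence pattern with a plain function in place of exec and computes a self-normalized weighted mean over a finite prefix. The map keys :log-weight and :value are assumptions of this sketch and do not reflect the layout of the Anglican state.

```clojure
(defn sample-seq
  "Infinite lazy sequence of the results of calling the
  zero-argument function run-once."
  [run-once]
  (lazy-seq (cons (run-once) (sample-seq run-once))))

(defn weighted-mean
  "Self-normalized importance sampling estimate of the mean
  of :value, weighting each sample by exp of :log-weight."
  [samples]
  (let [;; Subtract the largest log weight for stability.
        max-lw (apply max (map :log-weight samples))
        ws (map #(Math/exp (- (:log-weight %) max-lw))
                samples)]
    (/ (reduce + (map * ws (map :value samples)))
       (reduce + ws))))
```

For example, (weighted-mean (take 100 (sample-seq run-once))) estimates the mean from the first hundred samples, where run-once is any function returning such a map.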
Although not required, a convenient function for implementing
an inference algorithm is anglican.inference/exec. This function
runs the probabilistic program until a so-called checkpoint,
a point in execution that requires intervention of the inference
algorithm. There are three types of checkpoints:
- anglican.trap.sample
- anglican.trap.observe
- anglican.trap.result
corresponding to the sample and observe probabilistic forms, as
well as to returning the final result of a program execution,
which encapsulates, among other things, the list of predicts
and the log weight of the sample. The multimethod
anglican.inference/checkpoint should be used together with exec.
Default implementations of the multimethod for each type of
checkpoint are provided by the anglican.inference namespace,
and correspond to actions performed during importance sampling.
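The way default checkpoint handlers can be inherited and selectively overridden may be sketched in plain Clojure. Everything below is a stand-in: the record types SampleCpt, ObserveCpt, and ResultCpt, the local checkpoint multimethod, and the keyword return values are hypothetical, and the assumed dispatch on the pair of algorithm keyword and checkpoint type is a guess at the general shape, not a copy of the Anglican source. The sketch shows why an algorithm calls derive on its keyword, as in the importance sampling example.

```clojure
;; Hypothetical stand-ins for the three checkpoint types.
(defrecord SampleCpt [dist cont])
(defrecord ObserveCpt [dist value cont])
(defrecord ResultCpt [state])

(defmulti checkpoint
  "Stand-in multimethod: dispatches on the pair of the
  algorithm keyword and the checkpoint type."
  (fn [algorithm cpt] [algorithm (type cpt)]))

;; Defaults are registered for the parent keyword.
(defmethod checkpoint [::algorithm SampleCpt] [_ cpt]
  :default-sample)
(defmethod checkpoint [::algorithm ObserveCpt] [_ cpt]
  :default-observe)
(defmethod checkpoint [::algorithm ResultCpt] [_ cpt]
  :default-result)

;; A new algorithm inherits the defaults through derive
;; and overrides only the checkpoints it handles itself.
(derive ::my-algorithm ::algorithm)

(defmethod checkpoint [::my-algorithm ObserveCpt] [_ cpt]
  :custom-observe)
```

Calling checkpoint with ::my-algorithm and an ObserveCpt selects the custom method, while SampleCpt and ResultCpt fall back to the defaults registered for ::algorithm.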