Notation

    Bayes' theorem

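    Given Bayes' formula

    $$p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{p(x)}$$

    where $\theta$ denotes the parameters and $x$ the observed data, we define a name for each of the four terms: prior $p(\theta)$, posterior $p(\theta \mid x)$, likelihood $p(x \mid \theta)$ and marginal likelihood $p(x)$.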
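
    As a small illustration of the formula, the sketch below applies it to a hypothetical coin that is either fair or biased. The two candidate values of theta, the equal prior weights and the helper name `bayes_update` are assumptions made up for this example.

```python
def bayes_update(prior, likelihood):
    """Return (posterior, marginal likelihood) for a discrete set of parameter values."""
    # Numerator of Bayes' formula for each candidate theta: p(x | theta) * p(theta)
    joint = [l * p for l, p in zip(likelihood, prior)]
    # Marginal likelihood p(x): sum the joint over all candidate thetas
    marginal = sum(joint)
    # Posterior p(theta | x): normalize the joint by the marginal likelihood
    posterior = [j / marginal for j in joint]
    return posterior, marginal


# Hypothetical setup: theta is the probability of heads, either 0.5 (fair) or 0.8 (biased).
thetas = [0.5, 0.8]
prior = [0.5, 0.5]        # p(theta): both coins equally plausible beforehand
likelihood = thetas[:]    # p(x = heads | theta) is simply theta

posterior, marginal = bayes_update(prior, likelihood)
print(marginal)    # about 0.65
print(posterior)   # about [0.385, 0.615]
```

    Observing a single head shifts belief toward the biased coin, and the posterior weights still sum to 1.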

    Prior

    The prior distribution represents our knowledge about our uncertain quantity (parameters) before some evidence is taken into account.

    Posterior

    The posterior distribution represents our knowledge about our uncertain quantity (parameters) after some evidence is taken into account.

    Likelihood

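    The likelihood describes how probable the observed data is given a particular value of the uncertain quantity (parameter). It is a function of the parameters of the statistical model we have chosen to describe the data we are interested in. Viewed as a function of the parameters, it is not a probability distribution over them and need not integrate to 1.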
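
    To make the distinction between "distribution over the data" and "function of the parameter" concrete, here is a minimal sketch using a binomial model. The model choice, the numbers and the function name `binomial_likelihood` are assumptions picked for illustration.

```python
import math


def binomial_likelihood(theta, n, k):
    """p(k successes in n trials | theta)."""
    return math.comb(n, k) * theta ** k * (1 - theta) ** (n - k)


n = 10

# For a fixed theta, summing over every possible data outcome gives 1:
# here the expression is a probability distribution over the data.
theta = 0.3
sum_over_data = sum(binomial_likelihood(theta, n, k) for k in range(n + 1))

# For fixed data, integrating over theta (midpoint rule) does not give 1:
# as a function of theta it is not a probability distribution.
k = 7
steps = 10_000
h = 1.0 / steps
integral_over_theta = h * sum(
    binomial_likelihood((i + 0.5) * h, n, k) for i in range(steps)
)

print(sum_over_data)        # about 1.0
print(integral_over_theta)  # about 1 / (n + 1) = 0.0909
```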

    Marginal likelihood

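    The marginal likelihood may be referred to as the evidence. We get this distribution by marginalizing theta out of the joint distribution $p(x, \theta) = p(x \mid \theta)\, p(\theta)$, that is, by integrating over theta. Thus we can write

    $$p(x) = \int p(x \mid \theta)\, p(\theta)\, d\theta$$

    If we have updated our prior to the posterior, the formula turns into

    $$p(\tilde{x} \mid x) = \int p(\tilde{x} \mid \theta)\, p(\theta \mid x)\, d\theta$$

    where $x$ represents the old data and $\tilde{x}$ the data we want to predict.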

    The marginal likelihood is generally difficult to compute, except for the small number of cases where the prior is conjugate to the likelihood. When this is not the case, we can approximate it with, among other methods, numerical integration, discretization or Monte Carlo methods.
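
    The sketch below compares three ways of obtaining a marginal likelihood for a case where the exact answer is known: a binomial likelihood with a Beta prior. The hyperparameters, the data (7 successes in 10 trials), the step and draw counts, and all function names are assumptions chosen for the example.

```python
import math
import random


def binomial_likelihood(theta, n, k):
    """p(x | theta): probability of k successes in n trials."""
    return math.comb(n, k) * theta ** k * (1 - theta) ** (n - k)


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_pdf(theta, a, b):
    """Density of a Beta(a, b) prior at 0 < theta < 1."""
    log_kernel = (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)
    return math.exp(log_kernel - log_beta(a, b))


def marginal_exact(a, b, n, k):
    """Closed form available because the Beta prior is conjugate to the binomial."""
    return math.comb(n, k) * math.exp(log_beta(a + k, b + n - k) - log_beta(a, b))


def marginal_midpoint(a, b, n, k, steps=10_000):
    """Numerical integration of p(x | theta) p(theta) with the midpoint rule."""
    h = 1.0 / steps
    midpoints = ((i + 0.5) * h for i in range(steps))
    return h * sum(binomial_likelihood(t, n, k) * beta_pdf(t, a, b) for t in midpoints)


def marginal_monte_carlo(a, b, n, k, draws=100_000, seed=0):
    """Average the likelihood over draws of theta from the prior."""
    rng = random.Random(seed)
    total = sum(binomial_likelihood(rng.betavariate(a, b), n, k) for _ in range(draws))
    return total / draws


a, b, n, k = 2.0, 2.0, 10, 7
exact = marginal_exact(a, b, n, k)
numeric = marginal_midpoint(a, b, n, k)
sampled = marginal_monte_carlo(a, b, n, k)
print(exact, numeric, sampled)  # all three close to 0.1119
```

    The Monte Carlo estimate carries sampling noise, so it agrees with the other two only to a few decimal places for this number of draws.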

    Prior predictive

    The prior predictive density is the marginal likelihood computed using the prior:

    $$p(x) = \int p(x \mid \theta)\, p(\theta)\, d\theta$$

    Posterior predictive

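    The posterior predictive density is the marginal likelihood computed using the posterior, where $x$ is the data already observed and $\tilde{x}$ the data we want to predict:

    $$p(\tilde{x} \mid x) = \int p(\tilde{x} \mid \theta)\, p(\theta \mid x)\, d\theta$$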

    Both the prior predictive and the posterior predictive have a simple closed form if we have conjugacy.
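
    As an illustration of such a closed form, consider the probability that the next Bernoulli trial is a success under a Beta(a, b) distribution over theta, which is a / (a + b). The sketch below evaluates it for a prior and for the corresponding posterior, and checks it against a numerical integral. The hyperparameters, the data and the function names are assumptions for the example, and the posterior Beta(a + k, b + n - k) is the standard Beta-Binomial update.

```python
import math


def beta_pdf(theta, a, b):
    """Density of Beta(a, b) at 0 < theta < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta))


def predictive_success_closed_form(a, b):
    """P(next trial succeeds) when theta ~ Beta(a, b): the mean of the Beta."""
    return a / (a + b)


def predictive_success_numeric(a, b, steps=20_000):
    """Same quantity via the integral of p(success | theta) * p(theta), midpoint rule."""
    h = 1.0 / steps
    return h * sum(
        ((i + 0.5) * h) * beta_pdf((i + 0.5) * h, a, b) for i in range(steps)
    )


a, b = 2.0, 2.0   # hypothetical prior hyperparameters
n, k = 10, 7      # hypothetical data: 7 successes in 10 trials

# Prior predictive: integrate against the prior Beta(a, b).
prior_pred = predictive_success_closed_form(a, b)

# Posterior predictive: integrate against the posterior Beta(a + k, b + n - k).
post_pred = predictive_success_closed_form(a + k, b + n - k)

print(prior_pred, predictive_success_numeric(a, b))                  # both about 0.5
print(post_pred, predictive_success_numeric(a + k, b + n - k))       # both about 0.643
```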

    Conjugacy

    If the posterior and the prior belong to the same family of probability distributions, we say that we have conjugacy, and the prior and posterior distributions are called conjugate distributions. The prior is called a conjugate prior for the likelihood function.

    Some of the most common conjugacies:

    • Beta-Binomial
    • Exponential-Gamma
    • Multinomial-Dirichlet
    • Poisson-Gamma
    • Normal-Gamma
    • Normal-Normal
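
    To show what staying in the same family means in practice, here is a minimal sketch of the Poisson-Gamma case: a Gamma(shape, rate) prior on a Poisson rate is updated to Gamma(shape + sum of counts, rate + number of counts). The counts, the hyperparameters and the function names are assumptions for the example. The check at the end verifies that prior times likelihood differs from the updated Gamma density only by a constant.

```python
import math


def gamma_pdf(lam, shape, rate):
    """Density of Gamma(shape, rate) at lam > 0."""
    log_pdf = (
        shape * math.log(rate)
        - math.lgamma(shape)
        + (shape - 1) * math.log(lam)
        - rate * lam
    )
    return math.exp(log_pdf)


def poisson_pmf(x, lam):
    """Probability of observing count x under Poisson(lam)."""
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))


def gamma_poisson_update(shape, rate, counts):
    """Conjugate update: the posterior is again a Gamma distribution."""
    return shape + sum(counts), rate + len(counts)


counts = [3, 5, 4]       # hypothetical observed counts
shape, rate = 2.0, 1.0   # hypothetical prior hyperparameters
post_shape, post_rate = gamma_poisson_update(shape, rate, counts)
print(post_shape, post_rate)  # 14.0 4.0


def prior_times_likelihood_over_posterior(lam):
    likelihood = math.prod(poisson_pmf(x, lam) for x in counts)
    return gamma_pdf(lam, shape, rate) * likelihood / gamma_pdf(lam, post_shape, post_rate)


# The ratio is the same for every lam, so the posterior really is Gamma(14, 4);
# the constant itself is the marginal likelihood of the counts.
ratios = [prior_times_likelihood_over_posterior(lam) for lam in (1.0, 3.0, 6.0)]
print(ratios)
```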

    Proportionality

    When calculating the posterior we can write

    $$p(\theta \mid x) \propto p(x \mid \theta)\, p(\theta)$$

    where $\propto$ means "proportional to" and expresses that the two sides are identical up to a factor that does not involve theta. This is very useful because, as we have concluded, the marginal likelihood $p(x)$ can be tricky to compute. The trick works because the posterior must always integrate to 1, so no information is lost if we multiply or divide by factors that do not depend on theta. These factors can be reinstated at the end of the proportional-to calculation to fulfill the requirement that the posterior integrates to 1.
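
    The following sketch illustrates the trick numerically for a binomial likelihood with a Beta prior: every factor that does not involve theta is dropped, and the result is normalized only at the end. The grid size, the hyperparameters, the data and the function names are assumptions for the example.

```python
import math


def unnormalized_posterior(theta, a, b, n, k):
    """Likelihood times prior with all theta-free factors dropped.

    The binomial coefficient, the prior's normalizing constant 1 / B(a, b)
    and the marginal likelihood are all omitted.
    """
    return theta ** (a + k - 1) * (1 - theta) ** (b + n - k - 1)


def normalize_on_grid(values, h):
    """Rescale the values so that they integrate to 1 on a grid of spacing h."""
    total = sum(values) * h
    return [v / total for v in values]


a, b = 2.0, 2.0   # hypothetical prior hyperparameters
n, k = 10, 7      # hypothetical data: 7 successes in 10 trials

steps = 2_000
h = 1.0 / steps
grid = [(i + 0.5) * h for i in range(steps)]

kernel = [unnormalized_posterior(t, a, b, n, k) for t in grid]
posterior = normalize_on_grid(kernel, h)

# Normalizing at the end restores the requirement that the posterior integrates to 1.
print(sum(posterior) * h)  # about 1.0
```

    The normalized values agree with the Beta(a + k, b + n - k) density up to the discretization error of the grid, even though no normalizing constant was carried through the calculation.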