Random variable

    A random variable is a variable whose value depends on the outcome of a random event. In probability theory a random variable is understood as a measurable function defined on a probability space; it maps outcomes from the sample space to a measurable space, typically the real numbers.

    Probability mass function

    The probability mass function (PMF), also known as the discrete density function, is a function that gives the probability that a discrete random variable is exactly equal to some value. It differs from the probability density function in that it is associated with discrete rather than continuous random variables.
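    As a small illustration, here is a minimal sketch of a PMF for a hypothetical fair six-sided die (the names `pmf` and `prob` are chosen here for illustration, not standard API):

```python
# A hypothetical example: the PMF of a fair six-sided die.
# Each outcome has probability 1/6, and the PMF sums to 1.
pmf = {face: 1 / 6 for face in range(1, 7)}

def prob(x):
    """P(X = x): the probability mass at x (0 for impossible values)."""
    return pmf.get(x, 0.0)

total = sum(pmf.values())  # must equal 1 by the Kolmogorov axioms
```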

    Probability density function

    The probability density function (PDF) must be integrated over an interval to yield a probability. It is defined as follows

    $P(a \le X \le b) = \int_a^b f_X(x)\,dx$

    In the continuous case any single point always has probability 0, $P(X = x) = 0$, which is why we need to evaluate the density over an interval instead.
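    A minimal sketch of this idea, assuming the Exponential(1) density $f(x) = e^{-x}$ and a simple midpoint-rule integrator (both names are illustrative): integrating over $[0, 1]$ gives a probability, while a width-zero interval gives 0.

```python
import math

# Sketch: integrating a density numerically. Here f is the Exponential(1)
# density; P(0 <= X <= 1) equals 1 - e^(-1).
def f(x):
    return math.exp(-x)

def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

p = integrate(f, 0.0, 1.0)           # close to 1 - exp(-1)
point_mass = integrate(f, 0.5, 0.5)  # an interval of width 0 gives 0
```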

    Stochastic process

    A stochastic process is a random process that is usually defined as a family of random variables. Each random variable takes values from the same mathematical space, known as the state space. There are two types of stochastic processes, that is, discrete-time and continuous-time stochastic processes. Examples of stochastic processes are the Bernoulli process [1] and the random walk, among others. The Bernoulli process can be viewed as flipping a coin multiple times, where the sequence of flips represents several independent and identically distributed (i.i.d.) Bernoulli random variables.
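    The coin-flipping picture can be sketched as follows (a minimal simulation; `bernoulli_process` is a name chosen here, and the seed is fixed only to make the run reproducible):

```python
import random

# Sketch of a Bernoulli process: a sequence of i.i.d. 0/1 coin flips,
# each equal to 1 with probability p.
def bernoulli_process(p, n, rng):
    return [1 if rng.random() < p else 0 for _ in range(n)]

rng = random.Random(0)  # fixed seed so the run is reproducible
flips = bernoulli_process(0.5, 10_000, rng)
frequency = sum(flips) / len(flips)  # long-run proportion, close to p
```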

    Statistical inference

    Statistical inference is the process of inferring properties of an underlying probability distribution through data analysis, that is, creating logical claims that are justified by the data.

    Classical inference

    In classical (Frequentist) inference, parameters are fixed, non-random quantities and the probability statements concern only the data. For a Frequentist, the probability of an event is the proportion of that event in the long run.

    Bayesian inference

    Bayesian inference is a method used to update the probability of a model using Bayes' theorem

    $P(\theta \mid D) = \frac{P(D \mid \theta)\,P(\theta)}{P(D)}$

    Contrary to how classical inference works, Bayesian inference takes into account the uncertainty of the parameters when creating the model: the parameters themselves are random variables. The Bayesian approach bases its decisions on prior knowledge.

    Kolmogorov axioms

    The Kolmogorov axioms consist of three axioms that form the foundation of probability theory.

    First axiom

    The probability of an event is always non-negative.

    $P(E) \ge 0 \quad \text{for all } E \in F$

    where $F$ is the event space.

    Second axiom

    The probability that at least one of the outcomes in the sample space occurs is 1.

    $P(\Omega) = 1$

    where $\Omega$ is the sample space.

    Third axiom

    Any countable sequence of mutually exclusive (disjoint) events $E_1, E_2, \ldots$ satisfies

    $P\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} P(E_i)$
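    The three axioms can be checked directly on a small discrete sample space (a hypothetical fair die; `P` here is an illustrative helper, and exact rationals avoid floating-point noise):

```python
from fractions import Fraction

# Sketch: checking the Kolmogorov axioms on a fair die.
omega = {1, 2, 3, 4, 5, 6}

def P(event):
    """Probability of an event (a subset of outcomes) under uniform weights."""
    return Fraction(len(event & omega), len(omega))

even, odd = {2, 4, 6}, {1, 3, 5}
nonneg = all(P({w}) >= 0 for w in omega)      # first axiom
total = P(omega)                              # second axiom: equals 1
additive = P(even | odd) == P(even) + P(odd)  # third axiom (disjoint union)
```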

    Conditional probability

    The conditional probability of event $A$ occurring given that event $B$ has occurred is defined as

    $P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0$

    Independent events

    Two events $A$ and $B$ are independent if

    $P(A \cap B) = P(A)\,P(B)$

    Thus the following holds for independent events

    $P(A \mid B) = P(A)$
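    Independence can be verified by enumeration. In this sketch (two fair dice; the events and the helper `P` are chosen for illustration), $A$ = "first die is even" and $B$ = "the sum is 7" happen to be independent:

```python
from itertools import product
from fractions import Fraction

# Sketch: checking P(A and B) == P(A) * P(B) by enumerating all 36 outcomes.
outcomes = list(product(range(1, 7), repeat=2))

def P(pred):
    return Fraction(sum(pred(o) for o in outcomes), len(outcomes))

A = lambda o: o[0] % 2 == 0       # first die even
B = lambda o: o[0] + o[1] == 7    # sum is 7
independent = P(lambda o: A(o) and B(o)) == P(A) * P(B)
```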

    Total law of probability

    Given an event $A$, what is the probability of $A$ given every single $B_n$? The total law of probability states that if we have a sequence of events $\{B_n\}$ that partitions the sample space, the following holds

    $P(A) = \sum_n P(A \mid B_n)\,P(B_n)$
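    A minimal worked example, assuming a two-urn setup (the urn probabilities and contents are made up for illustration): urn 1 is picked with probability 0.3 and holds 40% red balls, urn 2 with probability 0.7 and holds 10% red balls.

```python
# Sketch: the total law of probability with two urns.
p_urn = {1: 0.3, 2: 0.7}            # P(B_n): a partition of the sample space
p_red_given_urn = {1: 0.4, 2: 0.1}  # P(A | B_n)

# P(A) = sum over the partition of P(A | B_n) * P(B_n)
p_red = sum(p_red_given_urn[u] * p_urn[u] for u in p_urn)
```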

    Joint distributions

    A joint distribution of multiple random variables defined on the same probability space is a probability distribution that gives the probability that each random variable falls into a particular set of values.

    It can be written in terms of conditional probabilities with the chain rule property

    $P(A, B) = P(A \mid B)\,P(B)$

    Chain rule

    The chain rule of probabilities can be described by the following example

    $P(A, B, C) = P(A \mid B, C)\,P(B \mid C)\,P(C)$
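    The two-event factorization can be checked numerically (a sketch on two fair dice; the events are illustrative):

```python
from itertools import product

# Sketch: verifying P(A, B) = P(A | B) * P(B) with
# A = "first die shows 6" and B = "sum is at least 10".
outcomes = list(product(range(1, 7), repeat=2))

def P(pred):
    return sum(pred(o) for o in outcomes) / len(outcomes)

A = lambda o: o[0] == 6
B = lambda o: o[0] + o[1] >= 10
AB = lambda o: A(o) and B(o)

p_a_given_b = P(AB) / P(B)  # conditional probability from the definition
chain_ok = abs(P(AB) - p_a_given_b * P(B)) < 1e-12
```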

    Expectation

    Expectation is the expected value a distribution takes on: the probability-weighted average of all possible outcomes (not necessarily the most common one).

    Discrete

    $E[X] = \sum_x x\,P(X = x)$

    Continuous

    $E[X] = \int_{-\infty}^{\infty} x\,f_X(x)\,dx$

    Conditional discrete

    $E[X \mid Y = y] = \sum_x x\,P(X = x \mid Y = y)$

    Conditional continuous

    $E[X \mid Y = y] = \int_{-\infty}^{\infty} x\,f_{X \mid Y}(x \mid y)\,dx$

    Total law of expectation discrete

    $E[X] = \sum_y E[X \mid Y = y]\,P(Y = y)$

    Total law of expectation continuous

    $E[X] = \int_{-\infty}^{\infty} E[X \mid Y = y]\,f_Y(y)\,dy$

    In both the discrete and the continuous case they can be written as

    $E[X] = E[E[X \mid Y]]$
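    The discrete total law of expectation can be checked on a small example (a sketch: $Y$ picks one of two biased coins, $X$ is that coin's 0/1 outcome; the numbers are made up for illustration):

```python
from fractions import Fraction

# Sketch: E[X] = E[E[X | Y]] for a mixture of two biased coins.
p_y = {1: Fraction(1, 4), 2: Fraction(3, 4)}            # P(Y = y)
p_x1_given_y = {1: Fraction(1, 2), 2: Fraction(1, 10)}  # P(X = 1 | Y = y)

# For a 0/1 variable, E[X | Y = y] is just P(X = 1 | Y = y),
# so summing E[X | Y = y] * P(Y = y) over y gives E[X].
e_x = sum(p_x1_given_y[y] * p_y[y] for y in p_y)
```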

    Linearity of expectation

    Linearity of expectation is a property which states that the expected value of a sum of random variables is equal to the sum of the individual expectations, regardless of whether the variables are independent.

    $E[X + Y] = E[X] + E[Y]$

    More generally the following holds

    $E\left[\sum_i a_i X_i\right] = \sum_i a_i\,E[X_i]$
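    A quick enumeration on two fair dice illustrates this (a sketch; exact rationals keep the check exact):

```python
from itertools import product
from fractions import Fraction

# Sketch: E[X + Y] = E[X] + E[Y] for two dice. Linearity would hold even
# for dependent variables; no independence is used in the identity.
outcomes = list(product(range(1, 7), repeat=2))
n = Fraction(len(outcomes))

e_x = sum(Fraction(o[0]) for o in outcomes) / n
e_y = sum(Fraction(o[1]) for o in outcomes) / n
e_sum = sum(Fraction(o[0] + o[1]) for o in outcomes) / n
```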

    Variance

    Variance is defined as

    $\mathrm{Var}(X) = E[(X - E[X])^2] = E[X^2] - E[X]^2$

    Total law of variance

    $\mathrm{Var}(X) = E[\mathrm{Var}(X \mid Y)] + \mathrm{Var}(E[X \mid Y])$

    Covariance

    Covariance is defined as

    $\mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]\,E[Y]$

    However, the expanded form $E[XY] - E[X]E[Y]$ is susceptible to catastrophic cancellation [2], which means that subtracting good approximations of two nearby numbers may yield a bad approximation to the difference of the original numbers.
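    The two forms can be sketched side by side (the function names are chosen here for illustration). With a large common offset relative to the spread, the naive $E[XY] - E[X]E[Y]$ form loses precision, while the two-pass centered form stays accurate:

```python
# Sketch: two ways to compute (population) covariance.
def cov_naive(xs, ys):
    """E[XY] - E[X]E[Y]: prone to catastrophic cancellation."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) / n - \
           (sum(xs) / n) * (sum(ys) / n)

def cov_two_pass(xs, ys):
    """E[(X - E[X])(Y - E[Y])]: numerically stable."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

offset = 1e9  # large mean, tiny spread: the hard case for the naive form
xs = [offset + v for v in (1.0, 2.0, 3.0)]
ys = [offset + v for v in (1.0, 2.0, 3.0)]
stable = cov_two_pass(xs, ys)  # exact answer is 2/3
naive = cov_naive(xs, ys)      # typically far from 2/3 here
```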

    Correlation

    Correlation is defined as

    $\rho_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X\,\sigma_Y}$

    where $\sigma_X$ and $\sigma_Y$ represent the standard deviations of $X$ and $Y$.
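    Computing this from the definition can be sketched as follows (population versions of covariance and standard deviation; `correlation` and `mean` are illustrative names). A perfectly linear relationship gives correlation 1:

```python
import math

# Sketch: Pearson correlation as Cov(X, Y) / (sigma_X * sigma_Y).
def mean(v):
    return sum(v) / len(v)

def correlation(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    sx = math.sqrt(mean([(x - mx) ** 2 for x in xs]))
    sy = math.sqrt(mean([(y - my) ** 2 for y in ys]))
    return cov / (sx * sy)

r = correlation([1, 2, 3], [2, 4, 6])  # perfectly linear relationship
```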

    Order statistics

    The kth order statistic of a statistical sample is equal to its kth-smallest value.
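    In code this is simply an index into the sorted sample (a sketch; 1-indexed to match the usual convention):

```python
# Sketch: the k-th order statistic is the k-th smallest value of a sample.
def order_statistic(sample, k):
    return sorted(sample)[k - 1]

sample = [9, 2, 7, 4, 11]
second_smallest = order_statistic(sample, 2)
```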

    Hoeffding inequality

    https://en.wikipedia.org/wiki/Hoeffding%27s_inequality

    Hoeffding's inequality states an upper bound on the probability that the sum of bounded independent random variables deviates from its expected value by more than a given amount. For $X_i \in [a_i, b_i]$ and $S_n = X_1 + \cdots + X_n$

    $P(|S_n - E[S_n]| \ge t) \le 2\exp\left(-\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right)$
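    For Bernoulli variables the bound takes the form $P(|S_n/n - p| \ge t) \le 2e^{-2nt^2}$, and it can be compared against the exact binomial tail (a sketch for $n$ fair coin flips; the parameters are illustrative):

```python
import math

# Sketch: Hoeffding's bound vs. the exact tail for n fair coin flips
# (each X_i in [0, 1], success probability p).
n, p, t = 20, 0.5, 0.2

def binom_pmf(n, k, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

exact_tail = sum(binom_pmf(n, k, p) for k in range(n + 1)
                 if abs(k / n - p) >= t)
hoeffding_bound = 2 * math.exp(-2 * n * t**2)  # must dominate the exact tail
```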

    Boole's inequality

    https://en.wikipedia.org/wiki/Boole%27s_inequality

    Boole's inequality is also known as the union bound. It states that for any finite or countable set of events, the probability that at least one of the events happens is no greater than the sum of the probabilities of the individual events.

    $P\left(\bigcup_i A_i\right) \le \sum_i P(A_i)$
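    A quick check on two dice (a sketch; the events are illustrative): with $A_i$ = "die $i$ shows a six", the union has probability 11/36 while the sum of the individual probabilities is 12/36, so the bound holds but is not tight.

```python
from itertools import product

# Sketch: the union bound P(A1 or A2) <= P(A1) + P(A2) on two dice.
outcomes = list(product(range(1, 7), repeat=2))

def P(pred):
    return sum(pred(o) for o in outcomes) / len(outcomes)

p_union = P(lambda o: o[0] == 6 or o[1] == 6)
p_sum = P(lambda o: o[0] == 6) + P(lambda o: o[1] == 6)
```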

    References