Random variable
A random variable is a variable whose value depends on the outcome of a random event. In probability theory a random variable is understood as a measurable function defined on a probability space $(\Omega, \mathcal{F}, P)$. A random variable maps from the sample space $\Omega$ to a measurable space $E$ (often the real numbers).
Probability mass function
The probability mass function (PMF), also known as the discrete density function, is a function that gives the probability that a discrete random variable is exactly equal to some value. It differs from the probability density function in that it is associated with discrete rather than continuous random variables.
Probability density function
The probability density function (PDF) must be integrated over an interval to yield a probability:

$$P(a \le X \le b) = \int_a^b f_X(x)\,dx$$

In the continuous case the probability of any single point is always 0, $P(X = x) = 0$, which is why we need to evaluate the PDF over an interval instead.
Stochastic process
A stochastic process is a random process that is usually defined as a family of random variables $\{X_t\}_{t \in T}$. Each random variable takes values in the same mathematical space, known as the state space $S$. There are two types of stochastic processes: discrete-time and continuous-time. Examples of stochastic processes include the Bernoulli process [1] and the random walk, among others. The Bernoulli process can be viewed as flipping a coin multiple times, where the sequence of flips forms a set of independent and identically distributed (i.i.d.) Bernoulli random variables.
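A Bernoulli process and its associated random walk can be sketched as follows (the function name and parameters are illustrative, not from the text):

```python
import random

def bernoulli_process(p, n, seed=0):
    """Simulate n i.i.d. Bernoulli(p) trials, e.g. repeated coin flips."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(n)]

flips = bernoulli_process(p=0.5, n=10)

# A simple random walk is the running sum of +/-1 steps derived from the flips.
walk = []
position = 0
for f in flips:
    position += 1 if f == 1 else -1
    walk.append(position)
```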
Statistical inference
Statistical inference is the process of inferring properties of an underlying probability distribution through data analysis: making logical claims that are justified by the data.
Classical inference
In classical (frequentist) inference, parameters are fixed, non-random quantities, and probability statements concern only the data. For a frequentist, the probability of an event is the long-run proportion of trials in which that event occurs.
Bayesian inference
Bayesian inference is a method used to update the probability of a model using Bayes' theorem

$$P(\theta \mid D) = \frac{P(D \mid \theta)\, P(\theta)}{P(D)}$$

Contrary to how classical inference works, Bayesian inference takes the uncertainty of the parameters into account when building the model: the parameters themselves are random variables. The Bayesian approach bases its decisions on prior knowledge.
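As a minimal sketch of such an update (the coin example and the Beta prior are illustrative assumptions, not from the text): with a conjugate Beta prior on a coin's heads-probability, Bayes' theorem reduces to simple counting.

```python
def beta_update(alpha, beta, data):
    """Conjugate Bayesian update: a Beta(alpha, beta) prior over a coin's
    heads-probability combined with 0/1 observations yields a Beta posterior."""
    heads = sum(data)
    tails = len(data) - heads
    return alpha + heads, beta + tails

# Uniform prior Beta(1, 1), then observe 7 heads and 3 tails.
a, b = beta_update(1, 1, [1, 1, 1, 1, 1, 1, 1, 0, 0, 0])
posterior_mean = a / (a + b)  # (1 + 7) / (2 + 10) = 2/3
```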
Kolmogorov axioms
The Kolmogorov axioms consist of three axioms that form the foundation of probability theory.
First axiom
The probability of an event is always non-negative:

$$P(E) \ge 0 \quad \text{for all } E \in \mathcal{F}$$

where $\mathcal{F}$ is the event space.
Second axiom
The probability that at least one of the outcomes in the sample space occurs is 1:

$$P(\Omega) = 1$$

where $\Omega$ is the sample space.
Third axiom
Any countable sequence of mutually exclusive (disjoint) events $E_1, E_2, \ldots$ satisfies

$$P\!\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} P(E_i)$$
Conditional probability
The conditional probability of event $A$ given that event $B$ has occurred is defined as

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0$$
Independent events
Two events $A$ and $B$ are independent if

$$P(A \cap B) = P(A)\,P(B)$$

Thus the following holds for the conditional probability of independent events:

$$P(A \mid B) = P(A)$$
Law of total probability
Given an event $A$, what is its probability taking every event $B_i$ into account? The law of total probability states that if we have a sequence of events $B_1, B_2, \ldots$ that partitions the sample space, the following holds

$$P(A) = \sum_i P(A \mid B_i)\, P(B_i)$$
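A small numerical illustration (the partition and defect-rate numbers are made up for the example): three events $B_i$ partition the sample space, and $P(A)$ is the weighted sum of the conditional probabilities.

```python
# P(B_i) for a partition of the sample space (hypothetical numbers,
# e.g. the share of production coming from each of three factories).
p_B = [0.5, 0.3, 0.2]
# P(A | B_i), e.g. the defect rate of each factory.
p_A_given_B = [0.01, 0.02, 0.05]

# Law of total probability: P(A) = sum_i P(A | B_i) * P(B_i)
p_A = sum(pa * pb for pa, pb in zip(p_A_given_B, p_B))
# 0.5*0.01 + 0.3*0.02 + 0.2*0.05 = 0.021
```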
Joint distributions
The joint distribution of multiple random variables defined on the same probability space is a probability distribution that gives the probability that each random variable falls into a particular set of values.
It can be written in terms of conditional probabilities using the chain rule:

$$P(X, Y) = P(X \mid Y)\, P(Y)$$
Chain rule
The chain rule of probability can be illustrated by the following example:

$$P(A_1 \cap A_2 \cap A_3) = P(A_1)\, P(A_2 \mid A_1)\, P(A_3 \mid A_1 \cap A_2)$$
Expectation
Expectation is the probability-weighted average of all possible values of a random variable: the long-run average outcome of a distribution (not necessarily its most common outcome, which is the mode).
Discrete

$$E[X] = \sum_x x\, p_X(x)$$

Continuous

$$E[X] = \int_{-\infty}^{\infty} x\, f_X(x)\,dx$$

Conditional discrete

$$E[X \mid Y = y] = \sum_x x\, p_{X \mid Y}(x \mid y)$$

Conditional continuous

$$E[X \mid Y = y] = \int_{-\infty}^{\infty} x\, f_{X \mid Y}(x \mid y)\,dx$$

Law of total expectation, discrete

$$E[X] = \sum_y E[X \mid Y = y]\, P(Y = y)$$

Law of total expectation, continuous

$$E[X] = \int_{-\infty}^{\infty} E[X \mid Y = y]\, f_Y(y)\,dy$$

In both the discrete and the continuous case this can be written as

$$E[X] = E\big[E[X \mid Y]\big]$$
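The law of total expectation, $E[X] = E[E[X \mid Y]]$, can be checked directly on a small discrete joint distribution (the probabilities below are arbitrary illustration values):

```python
# Joint distribution p[(x, y)] = P(X = x, Y = y) over two binary variables.
p = {(0, 0): 0.1, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.4}

# Direct expectation E[X].
e_x = sum(x * pr for (x, _), pr in p.items())

# E[E[X | Y]]: compute E[X | Y = y] for each y, then average over P(Y = y).
e_total = 0.0
for y in {y for (_, y) in p}:
    p_y = sum(pr for (_, yy), pr in p.items() if yy == y)
    e_x_given_y = sum(x * pr for (x, yy), pr in p.items() if yy == y) / p_y
    e_total += e_x_given_y * p_y
```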
Linearity of expectation
Linearity of expectation is a property stating that the expected value of a sum of random variables equals the sum of their individual expectations, regardless of whether they are independent:

$$E[X + Y] = E[X] + E[Y]$$

More generally, the following holds for constants $a_i$:

$$E\!\left[\sum_{i=1}^{n} a_i X_i\right] = \sum_{i=1}^{n} a_i\, E[X_i]$$
Variance
Variance is defined as

$$\mathrm{Var}(X) = E\big[(X - E[X])^2\big] = E[X^2] - E[X]^2$$
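A quick numerical check that the definitional form $E[(X - E[X])^2]$ and the shortcut form $E[X^2] - E[X]^2$ of variance agree (the sample values are illustrative):

```python
xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(xs)
mean = sum(xs) / n

# Definitional form: E[(X - E[X])^2]
var_def = sum((x - mean) ** 2 for x in xs) / n

# Shortcut form: E[X^2] - E[X]^2
var_short = sum(x * x for x in xs) / n - mean ** 2
# Both give 4.0 for this sample.
```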
Law of total variance

$$\mathrm{Var}(X) = E[\mathrm{Var}(X \mid Y)] + \mathrm{Var}(E[X \mid Y])$$
Covariance
Covariance is defined as

$$\mathrm{Cov}(X, Y) = E\big[(X - E[X])(Y - E[Y])\big] = E[XY] - E[X]\,E[Y]$$

However, the shortcut form $E[XY] - E[X]\,E[Y]$ is susceptible to catastrophic cancellation [2], meaning that subtracting good approximations of two nearby numbers may yield a bad approximation to their difference.
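The cancellation issue can be demonstrated by comparing the shortcut form with the numerically stable two-pass form (the data values are chosen to provoke the failure; 2/3 is the population covariance of the offsets 0, 1, 2):

```python
def cov_naive(xs, ys):
    """Shortcut form E[XY] - E[X]E[Y]; prone to catastrophic cancellation
    when the means are large relative to the spread of the data."""
    n = len(xs)
    e_xy = sum(x * y for x, y in zip(xs, ys)) / n
    return e_xy - (sum(xs) / n) * (sum(ys) / n)

def cov_two_pass(xs, ys):
    """Definitional form E[(X - E[X])(Y - E[Y])]; numerically stable."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

# A huge common offset: the true covariance is 2/3, but the naive form
# subtracts two nearly equal numbers around 1e18 and loses the answer.
xs = [1e9 + v for v in (0.0, 1.0, 2.0)]
ys = [1e9 + v for v in (0.0, 1.0, 2.0)]
stable = cov_two_pass(xs, ys)   # ~0.6667, correct
unstable = cov_naive(xs, ys)    # wildly off (0.0 on IEEE-754 doubles)
```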
Correlation
Correlation is defined as

$$\rho_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X\, \sigma_Y}$$

where $\rho_{X,Y}$ is the correlation coefficient and $\sigma_X$, $\sigma_Y$ represent the standard deviations of $X$ and $Y$.
Order statistics
The kth order statistic of a statistical sample is equal to its kth-smallest value.
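A direct implementation by sorting (the function name is illustrative):

```python
def order_statistic(sample, k):
    """Return the kth order statistic: the kth-smallest value (1-indexed)."""
    if not 1 <= k <= len(sample):
        raise ValueError("k must be between 1 and len(sample)")
    return sorted(sample)[k - 1]

data = [9, 2, 7, 4, 5]
smallest = order_statistic(data, 1)  # 2, the minimum (1st order statistic)
largest = order_statistic(data, 5)   # 9, the maximum (nth order statistic)
median = order_statistic(data, 3)    # 5, the median of five values
```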
Hoeffding's inequality
https://en.wikipedia.org/wiki/Hoeffding%27s_inequality
Hoeffding's inequality states an upper bound on the probability that the sum of bounded independent random variables deviates from its expected value by more than a given amount. If $X_1, \ldots, X_n$ are independent with $a_i \le X_i \le b_i$ and $S_n = \sum_{i=1}^n X_i$, then for all $t > 0$

$$P\big(S_n - E[S_n] \ge t\big) \le \exp\!\left(-\frac{2t^2}{\sum_{i=1}^{n}(b_i - a_i)^2}\right)$$
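An empirical sanity check of the bound for $n$ fair coin flips, where each $X_i \in [0, 1]$ so the denominator in the exponent is $n$ (the concrete numbers are chosen for illustration):

```python
import math
import random

rng = random.Random(42)
n, t, trials = 100, 10, 20000

# Count how often the sum of n fair coin flips exceeds its mean n/2 by t.
exceed = 0
for _ in range(trials):
    s = sum(1 for _ in range(n) if rng.random() < 0.5)
    if s - n / 2 >= t:
        exceed += 1

empirical = exceed / trials
# Hoeffding bound with a_i = 0, b_i = 1: exp(-2 t^2 / n) = exp(-2) ~ 0.135
bound = math.exp(-2 * t * t / n)
```

The observed tail frequency (around a few percent for these numbers) stays below the bound, as the inequality guarantees.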
Boole's inequality
https://en.wikipedia.org/wiki/Boole%27s_inequality
Boole's inequality is also known as the union bound. It states that for any finite or countable set of events, the probability that at least one of them happens is no greater than the sum of the probabilities of the individual events:

$$P\!\left(\bigcup_i A_i\right) \le \sum_i P(A_i)$$
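An empirical check of the union bound with three overlapping events defined on a uniform draw (the event intervals are arbitrary illustration choices):

```python
import random

rng = random.Random(0)
trials = 10000

union_count = 0
individual_counts = [0, 0, 0]
for _ in range(trials):
    u = rng.random()
    # Three overlapping events on a uniform draw u in [0, 1).
    events = [u < 0.3, 0.2 < u < 0.5, u > 0.8]
    union_count += any(events)
    for i, hit in enumerate(events):
        individual_counts[i] += hit

p_union = union_count / trials            # ~0.7 for these events
sum_p = sum(c / trials for c in individual_counts)  # ~0.8
```

Because the events overlap, the sum of individual probabilities overcounts, so the union bound holds with room to spare.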