Markov chains

    A Markov chain is a sequence of random variables $X_0, X_1, X_2, \ldots$ with the following property:

    $P(X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \ldots, X_0 = i_0) = P(X_{n+1} = j \mid X_n = i)$

    for all states $i_0, \ldots, i_{n-1}, i, j$ in the state space of the Markov chain. The state space is a discrete set.

    Stochastic matrix

    A stochastic matrix is a square matrix, $P$, that satisfies

    1. $P_{ij} \geq 0$ for all $i, j$
    2. $\sum_j P_{ij} = 1$ for each row $i$

    N-step transition matrix

    Let $X_0, X_1, \ldots$ be a Markov chain with transition matrix $P$. Then $P^n$ is the n-step transition matrix, and we can calculate the probability that we go from state $i$ to state $j$ in $n$ steps as

    $P(X_n = j \mid X_0 = i) = (P^n)_{ij}$
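
    As a quick numerical sketch (the two-state matrix below is a hypothetical example, chosen only for illustration), the n-step probabilities can be read off a matrix power:

        import numpy as np

        # Hypothetical two-state transition matrix (rows sum to 1).
        P = np.array([[0.9, 0.1],
                      [0.5, 0.5]])

        # 3-step transition matrix.
        P3 = np.linalg.matrix_power(P, 3)

        # Probability of going from state 0 to state 1 in 3 steps.
        print(P3[0, 1])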

    Distribution of Markov chains

    The sequence of random variables in a Markov chain is generally not identically distributed. If our Markov chain has the transition matrix $P$ and the initial distribution $\alpha$, the distribution for $X_n$ is $\alpha P^n$.

    Namely,

    $P(X_n = j) = (\alpha P^n)_j = \sum_i \alpha_i (P^n)_{ij}$

    Markov property

    The Markov property states that the past and the future are independent given the present. The present here can be viewed as the most recent past. Let $X_0, X_1, \ldots$ be a Markov chain. Then

    $P(X_{n+1} = j \mid X_0 = i_0, \ldots, X_{n-1} = i_{n-1}, X_n = i) = P(X_{n+1} = j \mid X_n = i)$

    for all $n \geq 0$ and all states $i_0, \ldots, i_{n-1}, i, j$.

    Joint distribution

    The marginal distributions of a Markov chain are determined by the initial distribution $\alpha$ and the transition matrix $P$. Consider, for example, the joint probability

    $P(X_5 = i, X_6 = j, X_9 = k)$

    The event is then moving to $i$ in five steps, then to $j$ in one step, and then to $k$ in three steps. The resulting probability is calculated with

    $P(X_5 = i, X_6 = j, X_9 = k) = (\alpha P^5)_i \, P_{ij} \, (P^3)_{jk}$

    This is obtained by combining the Markov property with conditional probability and time-homogeneity. Formally, let $X_0, X_1, \ldots$ be a Markov chain with transition matrix $P$ and initial distribution $\alpha$. Then for all $0 \leq n_1 < n_2 < \cdots < n_m$ and states $i_1, \ldots, i_m$,

    $P(X_{n_1} = i_1, \ldots, X_{n_m} = i_m) = (\alpha P^{n_1})_{i_1} (P^{n_2 - n_1})_{i_1 i_2} \cdots (P^{n_m - n_{m-1}})_{i_{m-1} i_m}$
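
    A minimal numpy sketch of this computation (reusing the hypothetical matrix $P$ from above and an assumed initial distribution $\alpha$):

        import numpy as np

        P = np.array([[0.9, 0.1],
                      [0.5, 0.5]])
        alpha = np.array([0.5, 0.5])  # assumed initial distribution

        i, j, k = 0, 1, 0
        # P(X_5 = i, X_6 = j, X_9 = k) = (alpha P^5)_i * P_ij * (P^3)_jk
        p = ((alpha @ np.linalg.matrix_power(P, 5))[i]
             * P[i, j]
             * np.linalg.matrix_power(P, 3)[j, k])
        print(p)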

    Stationary distribution

    A stationary distribution is a distribution $\pi$ such that if the distribution over states at step $n$ is $\pi$, then the distribution over states at step $n + 1$ is also $\pi$. That is,

    $\pi P = \pi$

    To find a stationary distribution, note that one of the equations in $\pi P = \pi$ is redundant, and we must use the fact that $\sum_i \pi_i = 1$. Then we are able to obtain the unique solution (when one exists).
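
    One way to sketch this numerically (assuming the same hypothetical $P$ as above): replace one equation of $\pi(P - I) = 0$ with the normalization constraint and solve the linear system.

        import numpy as np

        P = np.array([[0.9, 0.1],
                      [0.5, 0.5]])
        n = P.shape[0]

        # Transpose of (P - I), with the last equation replaced by sum(pi) = 1.
        A = (P - np.eye(n)).T
        A[-1, :] = 1.0
        b = np.zeros(n)
        b[-1] = 1.0

        pi = np.linalg.solve(A, b)
        print(pi)  # [5/6, 1/6] for this particular P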

    Limiting distribution

    A limiting distribution is a distribution $\lambda$ such that no matter what the initial distribution $\alpha$ is, the distribution over states converges to $\lambda$ as the number of steps goes to infinity:

    $\lim_{n \to \infty} (\alpha P^n)_j = \lambda_j$ for all states $j$ and all initial distributions $\alpha$

    When a limiting distribution exists, it is always a stationary distribution. However, the converse is not true: a stationary distribution is not always a limiting distribution. Think of a distribution that is stationary but such that the chain is not certain to converge to it from some other initial distribution.

    Positive matrix

    A matrix $M$ is said to be positive if all the entries of the matrix are positive.

    Regular transition matrix

    A transition matrix $P$ is said to be regular if some power $P^n$ of $P$ is positive.

    Limit theorem for regular Markov chains

    If the transition matrix $P$ is regular, a limiting distribution exists, and it is unique. All of the limiting probabilities are positive.
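
    A quick numerical illustration (again with the hypothetical $P$ from above): for a regular matrix, the rows of $P^n$ all converge to the limiting distribution.

        import numpy as np

        P = np.array([[0.9, 0.1],
                      [0.5, 0.5]])

        # For a regular chain every row of P^n approaches the limiting
        # distribution; here both rows are close to [5/6, 1/6].
        print(np.linalg.matrix_power(P, 50))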

    Communication class

    Two states communicate if each can be reached from the other in some number of steps. If a Markov chain has exactly one communication class, all states communicate with each other. If we have multiple communication classes, one state may not be able to communicate with another state in any number of steps.

    Closed communication class

    A communication class is closed if the chain cannot leave it; for finite chains, the closed classes are exactly those consisting of recurrent states.

    Irreducibility

    A Markov chain is called irreducible if it has exactly one communication class. Thus, if the matrix is regular we know the chain is also irreducible. Every finite irreducible Markov chain has a unique positive stationary distribution; if the chain is aperiodic as well, this distribution is also the limiting distribution.

    Limit theorem for finite irreducible Markov chains

    Let $\mu_j = E(T_j)$ be the expected return time to state $j$, where $T_j = \min\{n \geq 1 : X_n = j\}$. Then $\mu_j < \infty$ and the vector $\pi$ with $\pi_j = 1/\mu_j$ is the unique stationary distribution. All finite regular Markov chains are finite irreducible Markov chains. Furthermore, the long-run fraction of time the chain spends in state $j$ is $\pi_j$ with probability 1.

    Recurrent state

    A recurrent state has the property that a Markov chain starting at this state eventually returns to that state with probability 1.

    Transient state

    A transient state has the property that a Markov chain starting at this state has a positive probability of never returning to this state.

    Periodicity

    The states of a communication class all have the same period. The period of a state $i$ is defined as

    $d(i) = \gcd\{n \geq 1 : (P^n)_{ii} > 0\}$

    Thus, if a Markov chain is irreducible and all states have a period greater than one, the Markov chain is periodic.

    Aperiodic

    When the period is $1$, the state is said to be aperiodic. Thus, if a Markov chain is irreducible and all states have a period equal to one, the Markov chain is aperiodic.

    Ergodic

    A Markov chain is said to be ergodic if it is irreducible, aperiodic, and all states have finite expected return times (all states are positive recurrent). Ergodic Markov chains have positive limiting distributions. That is, let $X_0, X_1, \ldots$ be an ergodic Markov chain. Then there exists a unique positive stationary distribution $\pi$ which is also the limiting distribution for the Markov chain.

    Fundamental limit theorem of ergodic Markov chains

    For an ergodic Markov chain, there exists a unique positive stationary distribution that is also the limiting distribution of the Markov chain.

    Time reversibility

    An irreducible Markov chain is said to be time reversible if

    $\pi_i P_{ij} = \pi_j P_{ji}$ for all states $i, j$

    where $\pi$ is a stationary distribution and $P$ is the transition matrix. The equation above is called the detailed balance condition.

    Absorbing chains

    A Markov chain is called an absorbing chain if it has at least one absorbing state, that is, a state $a$ with $P_{aa} = 1$. When dealing with absorbing Markov chains we usually partition the matrix and write it like

    $P = \begin{pmatrix} Q & R \\ \mathbf{0} & I \end{pmatrix}$

    where, with $t$ transient states and $a$ absorbing states, $Q$ is a $t \times t$ matrix, $R$ is a $t \times a$ matrix, $\mathbf{0}$ is an $a \times t$ matrix full of 0s, and $I$ is an $a \times a$ identity matrix.

    Fundamental matrix

    The fundamental matrix of an absorbing Markov chain is

    $F = (I - Q)^{-1}$

    The entry $F_{ij}$ gives the expected number of visits to transient state $j$ for a chain started in transient state $i$, before absorption.

    Absorption probability

    The probability that the Markov chain is absorbed in absorbing state $j$ when starting in transient state $i$ is given by

    $(FR)_{ij}$

    Absorption time

    The expected number of steps until the Markov chain is absorbed when starting in transient state $i$ is given by the $i$th row sum of the fundamental matrix,

    $(F\mathbf{1})_i = \sum_j F_{ij}$
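
    A small sketch putting these pieces together, using a hypothetical gambler's-ruin chain on states {0, 1, 2, 3} where 0 and 3 are absorbing (the specific chain is an assumption, chosen only to exercise $F$, $FR$, and $F\mathbf{1}$):

        import numpy as np

        # Transient states {1, 2}; absorbing states {0, 3}.
        # With probability 1/2 the gambler moves up or down one unit.
        Q = np.array([[0.0, 0.5],
                      [0.5, 0.0]])   # transitions among transient states
        R = np.array([[0.5, 0.0],
                      [0.0, 0.5]])   # transitions into absorbing states

        F = np.linalg.inv(np.eye(2) - Q)   # fundamental matrix
        print(F)              # expected visits to transient states
        print(F @ R)          # absorption probabilities
        print(F.sum(axis=1))  # expected steps until absorption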

    First hitting time for irreducible chain

    The first hitting time for an irreducible chain is found by modifying the transition matrix so that the state we are interested in becomes an absorbing state. The expected first hitting time is then the absorption time of the modified chain.

    Continuous Markov chains

    Markov Property

    A continuous-time stochastic process with discrete state space, $(X_t)_{t \geq 0}$, is a continuous-time Markov chain if

    $P(X_{t+s} = j \mid X_s = i, X_u = x_u \text{ for } 0 \leq u < s) = P(X_{t+s} = j \mid X_s = i)$

    for all $s, t \geq 0$. If this probability does not depend on $s$, the process is said to be time-homogeneous:

    $P(X_{t+s} = j \mid X_s = i) = P(X_t = j \mid X_0 = i)$ for all $s, t \geq 0$.

    Transition function

    The transition probabilities can be arranged in a matrix function $P(t)$, defined for each $t \geq 0$, that is called the transition function

    $P_{ij}(t) = P(X_t = j \mid X_0 = i)$

    Chapman-Kolmogorov Equations

    For a continuous-time Markov chain with transition function $P(t)$,

    $P(s + t) = P(s) P(t)$

    for all $s, t \geq 0$.

    Holding times

    The holding time $T_i$ at a state $i$ is the length of time that a continuous-time Markov chain stays in $i$ before transitioning to a new state. $T_i$ has an exponential distribution.

    Absorbing state

    For each state $i$, let $q_i$ be the parameter of the exponential distribution for the holding time $T_i$. If $q_i$ is defined to be in the interval $[0, \infty)$, a state $i$ with $q_i = 0$ is said to be an absorbing state. This is because when the process visits state $i$ it never leaves.

    Explosive

    For each state $i$, let $q_i$ be the parameter of the exponential distribution for the holding time $T_i$. If $q_i$ is defined to be in the interval $(0, \infty]$, a state $i$ with $q_i = \infty$ is said to be explosive. This is because the process may visit state $i$ infinitely many times in a finite interval of time.

    Embedded chain

    The embedded chain in a continuous-time Markov chain is the discrete-time Markov chain with transition probabilities $\widetilde{P}_{ij}$, the probability that the process jumps to state $j$ when it leaves state $i$. The transition matrix $\widetilde{P}$ for the embedded chain is a stochastic matrix with diagonal entries 0.

    Transition rates

    The quantities $q_{ij} = q_i \widetilde{P}_{ij}$ are called the transition rates of a continuous-time process. From the transition rates we may recover the embedded-chain transition probabilities and the holding-time parameters:

    $q_i = \sum_{j \neq i} q_{ij}$ and $\widetilde{P}_{ij} = q_{ij} / q_i$
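
    A minimal sketch of these relationships (the 3-state rate matrix is a hypothetical example):

        import numpy as np

        # Hypothetical transition rates q_ij (diagonal unused).
        rates = np.array([[0.0, 2.0, 1.0],
                          [1.0, 0.0, 3.0],
                          [2.0, 2.0, 0.0]])

        q = rates.sum(axis=1)          # holding-time parameters q_i
        P_tilde = rates / q[:, None]   # embedded-chain transition matrix
        print(q)
        print(P_tilde)                 # rows sum to 1, diagonal is 0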

    Absorbing chain

    In a continuous-time absorbing Markov chain we write the generator matrix $Q$ in the following form

    $Q = \begin{pmatrix} V & * \\ \mathbf{0} & \mathbf{0} \end{pmatrix}$

    where $V$ is the $t \times t$ matrix of transition rates between the $t$ transient states.

    Fundamental matrix

    The fundamental matrix for a continuous-time Markov chain is defined as

    $F = (-V)^{-1}$

    Mean time until absorption

    The mean time until absorption for a chain that started in transient state $i$ is the $i$th row sum of $F$,

    $(F\mathbf{1})_i = \sum_j F_{ij}$

    Stationary distribution with generator matrix

    A continuous-time Markov chain with generator matrix $Q$ has stationary distribution $\pi$ if and only if

    $\pi Q = \mathbf{0}$

    To compute this we need to use the fact that $\sum_i \pi_i = 1$. One of the equations in $\pi Q = \mathbf{0}$ is therefore redundant and we may remove whichever one we like.
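
    The same linear-algebra sketch as in the discrete case works here (the generator below is an assumed example built from the rates above, with diagonal $-q_i$ so rows sum to 0):

        import numpy as np

        # Generator: off-diagonal entries are rates q_ij,
        # diagonal entries are -q_i so that rows sum to 0.
        Q = np.array([[-3.0,  2.0,  1.0],
                      [ 1.0, -4.0,  3.0],
                      [ 2.0,  2.0, -4.0]])
        n = Q.shape[0]

        A = Q.T.copy()
        A[-1, :] = 1.0   # replace one redundant equation with sum(pi) = 1
        b = np.zeros(n)
        b[-1] = 1.0
        print(np.linalg.solve(A, b))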

    Global balance

    Let $\pi$ be a stationary distribution of a continuous-time Markov chain. From $\pi Q = \mathbf{0}$ we get

    $q_i \pi_i = \sum_{j \neq i} \pi_j q_{ji}$ for all $i$

    These are called the global balance equations. They say that the transition rates into and out of any state are the same when stationary.

    Time reversibility

    A continuous-time Markov chain with generator matrix $Q$ and a unique stationary distribution $\pi$ is time reversible if

    $\pi_i q_{ij} = \pi_j q_{ji}$ for all $i \neq j$

    These are called the local balance, or detailed balance, equations, and they state that the long-term transition rate from $i$ to $j$ is equal to the long-term transition rate from $j$ to $i$.

    Little's formula

    In a queueing system we can describe the long-term properties by the following formula

    $L = \lambda W$

    where $L$ is the long-term average number of customers in the system, $\lambda$ is the rate of arrivals, and $W$ is the long-term average time that a customer spends in the system. For example, if customers arrive at rate $\lambda = 2$ per minute and spend on average $W = 3$ minutes in the system, then the system holds $L = 6$ customers on average.

    Branching process

    In a branching process all nonzero states are transient; state 0 (extinction) is absorbing.

    Mean generation size

    In a branching process the size of the nth generation is the sum of the offspring of the individuals in the previous generation,

    $Z_n = \sum_{i=1}^{Z_{n-1}} X_i$

    where the $X_i$ are independent and identically distributed offspring counts with mean $\mu$, so that $E(Z_n) = \mu^n$.

    The long-term mean generation size can be divided into three cases:

    1. $\mu < 1$ (subcritical): $E(Z_n) = \mu^n \to 0$
    2. $\mu = 1$ (critical): $E(Z_n) = 1$ for all $n$
    3. $\mu > 1$ (supercritical): $E(Z_n) = \mu^n \to \infty$

    Variance of the generation size

    By the law of total variance the following holds, where $\sigma^2$ is the offspring variance:

    $\operatorname{Var}(Z_n) = \begin{cases} \sigma^2 \mu^{n-1} \dfrac{\mu^n - 1}{\mu - 1}, & \mu \neq 1 \\ n \sigma^2, & \mu = 1 \end{cases}$

    Probability generating function

    In the discrete case, the probability generating function of the discrete random variable $X$ is

    $G(s) = E(s^X) = \sum_{k=0}^{\infty} s^k P(X = k)$

    We can see that $G(1) = 1$. If we do successive differentiations we obtain

    $G^{(k)}(0) = k! \, P(X = k)$

    This is useful for computing the probability given the generating function

    $P(X = k) = \frac{G^{(k)}(0)}{k!}$

    So if we know the generating function of a distribution we can use this to find out what distribution we have.
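
    As a sketch of this recipe (taking the known pgf $G(s) = e^{\lambda(s-1)}$ of a Poisson distribution as the example), sympy can recover the probabilities by differentiating at 0:

        import sympy as sp

        s, lam = sp.symbols('s lam', positive=True)
        G = sp.exp(lam * (s - 1))   # pgf of a Poisson(lam) random variable

        # P(X = k) = G^(k)(0) / k!
        for k in range(4):
            pk = sp.diff(G, s, k).subs(s, 0) / sp.factorial(k)
            print(k, sp.simplify(pk))  # lam**k * exp(-lam) / k!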

    Sums of independent random variables

    If we let $S = X_1 + \cdots + X_n$, where the $X_i$ are independent, the probability generating function of $S$ is

    $G_S(s) = \prod_{i=1}^{n} G_{X_i}(s)$

    If the $X_i$ are also identically distributed we can simplify:

    $G_S(s) = \left[ G_X(s) \right]^n$

    Moments

    We may find the mean and the variance with the probability generating function:

    $G'(1) = E(X)$ and $G''(1) = E(X(X-1))$

    which gives

    $E(X) = G'(1)$ and $\operatorname{Var}(X) = G''(1) + G'(1) - \left[ G'(1) \right]^2$

    Extinction forever

    We can find the probability that a branching process eventually goes extinct; in the case $\mu \leq 1$ the probability of going extinct is 1. In general we can write the generating function for the nth generation as the n-fold composition

    $G_n(s) = G(G_{n-1}(s))$

    However, this is not as useful in practical terms; it mainly serves to prove that the probability of eventual extinction is the smallest nonnegative root of the equation

    $s = G(s)$
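
    A small numerical sketch (assuming a Poisson(1.5) offspring distribution, so $G(s) = e^{1.5(s-1)}$ and $\mu = 1.5 > 1$): iterating $s \mapsto G(s)$ from $s = 0$ converges to the smallest nonnegative root.

        import math

        lam = 1.5   # assumed offspring mean (supercritical)
        G = lambda s: math.exp(lam * (s - 1))   # Poisson offspring pgf

        s = 0.0
        for _ in range(200):   # fixed-point iteration s -> G(s)
            s = G(s)
        print(s)   # extinction probability, approximately 0.417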

    Markov chain Monte Carlo

    Instead of starting with a Markov chain and learning what happens when the number of steps approaches infinity (the limiting distribution), we now start with a target distribution, the desired limiting distribution, and derive a Markov chain from it. If we collect enough samples of the Markov chain, we have an approximate sample from our target distribution. This is useful because computing the normalizing constant or marginal likelihood is challenging in many models.

    The law of large numbers

    The law of large numbers is central in probability theory. It states that if $X_1, X_2, \ldots$ is a sequence of independent and identically distributed random variables with a common mean $\mu$, then the following holds with probability 1:

    $\lim_{n \to \infty} \frac{X_1 + \cdots + X_n}{n} = \mu$

    Also, if $X$ is a random variable with the same distribution as the sequence and $r$ is a bounded, real-valued function, then the sequence $r(X_1), r(X_2), \ldots$ is also an independent and identically distributed sequence with finite mean, and with probability 1,

    $\lim_{n \to \infty} \frac{r(X_1) + \cdots + r(X_n)}{n} = E(r(X))$

    Strong law of large numbers

    Let $X_0, X_1, \ldots$ be an ergodic Markov chain with stationary distribution $\pi$. Let $X$ be a random variable with distribution $\pi$, and let $r$ be a bounded, real-valued function. Then, with probability 1,

    $\lim_{n \to \infty} \frac{1}{n} \sum_{m=0}^{n-1} r(X_m) = E(r(X))$

    where $E(r(X)) = \sum_j r(j) \pi_j$. When using this in practice, we may ignore the first elements of the sequence before computing the average to improve accuracy. This technique is called burn-in.

    Metropolis-Hastings algorithm

    The Metropolis-Hastings algorithm is one of the most common methods in Markov chain Monte Carlo. It is a method for obtaining a sequence of random samples from a probability distribution from which direct sampling is difficult [1]. The sequence is used to approximate the distribution. Metropolis-Hastings works quite well with multidimensional data, while other methods may be better for single-dimensional distributions. The algorithm constructs a reversible Markov chain whose stationary distribution is $\pi$, where $\pi$ is a discrete target probability distribution. Thus, the goal of the algorithm is to construct a Markov chain $X_0, X_1, \ldots$ with stationary distribution $\pi$ and then simulate it.
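
    A minimal sketch of the algorithm for a discrete target (the unnormalized target weights and the symmetric random-walk proposal are assumptions chosen for illustration; with a symmetric proposal the acceptance probability reduces to $\min(1, \pi_j / \pi_i)$):

        import random

        weights = [1.0, 4.0, 2.0, 3.0]   # unnormalized target pi (assumed)
        n_states = len(weights)

        def step(i):
            # Symmetric random-walk proposal on a cycle of states.
            j = (i + random.choice([-1, 1])) % n_states
            # Accept with probability min(1, pi_j / pi_i); the normalizing
            # constant cancels, which is the point of MCMC.
            if random.random() < min(1.0, weights[j] / weights[i]):
                return j
            return i

        x, counts = 0, [0] * n_states
        for n in range(100_000):
            x = step(x)
            if n >= 1_000:   # burn-in
                counts[x] += 1
        total = sum(counts)
        print([c / total for c in counts])   # approx. [0.1, 0.4, 0.2, 0.3]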

    Poisson process

    A Poisson process is a special type of counting process. Events arrive at specific time instants, starting at time $t = 0$. Then we count the number of arrivals that have occurred by time $t$. With Poisson processes we may focus on (i) the number of events that occur in a fixed time interval, (ii) when events occur, and (iii) the behavior of individual events.

    Counting process

    A counting process is a collection of nonnegative integer-valued random variables $(N_t)_{t \geq 0}$ such that $N_s \leq N_t$ whenever $s \leq t$. Contrary to Markov chains, which operate with a sequence of random variables, a counting process is an uncountable collection indexed over a continuous time interval.

    Definition

    A Poisson process with parameter $\lambda > 0$ is a counting process $(N_t)_{t \geq 0}$ with the following properties

    1. $N_0 = 0$.
    2. $N_t$ has a Poisson distribution with parameter $\lambda t$ for all $t > 0$.
    3. $N_{t+s} - N_s$ has the same distribution as $N_t$ for all $s, t \geq 0$.
    4. $N_t - N_s$ and $N_v - N_u$ are independent random variables for $0 \leq s < t \leq u < v$.

    Stationary increments

    Stationary increments is the third rule in the definition above. The distribution of the number of arrivals in an interval only depends on the length of the interval.

    Independent increments

    Independent increments is the fourth rule in the definition above. The number of arrivals on disjoint intervals are independent random variables.

    First arrival times

    If we let $X_1$ denote the first arrival time, then $X_1 > t$ if and only if there are no arrivals in the interval $(0, t]$. We have

    $P(X_1 > t) = P(N_t = 0) = e^{-\lambda t}$

    We can see that $X_1$ has an exponential distribution with parameter $\lambda$.

    Nth arrival times

    Let $S_n$ be the time of the nth arrival in a Poisson process with parameter $\lambda$. Then $S_n$ has a gamma distribution with parameters $n$ and $\lambda$, with density

    $f_{S_n}(t) = \frac{\lambda^n t^{n-1}}{(n-1)!} e^{-\lambda t}, \quad t > 0$
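
    Since the inter-arrival times are independent Exponential($\lambda$) random variables, a Poisson process can be simulated by accumulating exponential gaps; a minimal sketch (the rate and time horizon are arbitrary choices):

        import numpy as np

        rng = np.random.default_rng(0)
        lam, t_max = 2.0, 10.0   # assumed rate and time horizon

        # Accumulate iid Exponential(lam) inter-arrival times.
        arrivals = []
        t = rng.exponential(1 / lam)
        while t <= t_max:
            arrivals.append(t)
            t += rng.exponential(1 / lam)

        print(len(arrivals))   # N_{t_max}, approximately Poisson(lam * t_max)
        print(arrivals[:3])    # S_1, S_2, S_3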

    Distribution of arrival times

    Let $S_1, S_2, \ldots$ be the arrival times of a Poisson process with parameter $\lambda$. The joint distribution of $S_1, \ldots, S_n$, conditional on $N_t = n$, is the distribution of the order statistics of $n$ independent and identically distributed uniform random variables on $(0, t)$. We have

    $f(s_1, \ldots, s_n \mid N_t = n) = \frac{n!}{t^n}, \quad 0 < s_1 < \cdots < s_n < t$

    That is, if $U_1, \ldots, U_n$ are uniformly distributed random variables that are independent and identically distributed on $(0, t)$, then conditional on $N_t = n$, their order statistics $U_{(1)}, \ldots, U_{(n)}$ have the same distribution as $S_1, \ldots, S_n$.

    Memorylessness

    Memorylessness means that the waiting time distributions are the same for all observers, and all observers will wait, on average, the same amount of time. Formally, a random variable $X$ is memoryless if

    $P(X > s + t \mid X > s) = P(X > t)$

    for all $s, t > 0$.

    Thinning

    A thinned Poisson process is a subprocess of a parent Poisson process, obtained by independently keeping each arrival with some probability. If arrivals of a rate-$\lambda$ process are kept with probability $p$, the thinned process is a Poisson process with rate $p\lambda$, and it is independent of the complementary thinned process of the same parent process.

    Superposition process

    If we have independent Poisson processes $(N^{(1)}_t), \ldots, (N^{(k)}_t)$ with respective parameters $\lambda_1, \ldots, \lambda_k$, then let $N_t = N^{(1)}_t + \cdots + N^{(k)}_t$ for $t \geq 0$. $(N_t)_{t \geq 0}$ is then a Poisson process with parameter $\lambda_1 + \cdots + \lambda_k$.

    Spatial Poisson process

    A spatial Poisson process is a collection of random variables $(N_A)$, indexed by bounded sets $A \subseteq \mathbb{R}^d$, with parameter $\lambda$ if

    1. $N_A$ has a Poisson distribution with parameter $\lambda |A|$ for each bounded set $A$, where $|A|$ is the size (area, volume) of $A$.
    2. $N_A$ and $N_B$ are independent random variables if $A$ and $B$ are disjoint sets.

    Brownian motion

    Brownian motion is a continuous-time stochastic process $(B_t)_{t \geq 0}$ that has the following properties

    1. $B_0 = 0$, and $B_t \sim N(0, t)$ for $t > 0$
    2. $B_{t+s} - B_s \sim N(0, t)$, for $s, t \geq 0$ (stationary increments)
    3. $B_{t+s} - B_s$ is independent of $(B_u)_{0 \leq u \leq s}$, for $s, t \geq 0$ (independent increments)
    4. The function $t \mapsto B_t$ is continuous with probability 1
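
    A standard way to sketch a Brownian path numerically is to cumulatively sum independent $N(0, \Delta t)$ increments (the grid size and horizon below are arbitrary choices):

        import numpy as np

        rng = np.random.default_rng(1)
        t_max, n = 1.0, 1000
        dt = t_max / n

        # Increments B_{t+dt} - B_t are iid N(0, dt); continuity is
        # approximated by linear interpolation between grid points.
        increments = rng.normal(0.0, np.sqrt(dt), size=n)
        B = np.concatenate([[0.0], np.cumsum(increments)])  # B_0 = 0
        print(B[-1])   # B_1, a N(0, 1) sample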

    Martingale

    A stochastic process $(Y_t)_{t \geq 0}$ is a martingale if for all $t \geq 0$

    1. $E(|Y_t|) < \infty$
    2. $E(Y_{t+s} \mid Y_u, 0 \leq u \leq t) = Y_t$ for all $s \geq 0$

    Undirected weighted graphs

    Limiting distribution

    The limiting distribution of a random walk on an undirected weighted graph is given by the edge weights. For a node $v$ it is

    $\pi_v = \frac{w(v)}{\sum_u w(u)}$

    where $w(v) = \sum_u w(v, u)$ is the sum of the weights of the edges at node $v$. Thus, the sum of the weights of the edges at each node divided by the total weight over all nodes.
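
    A small sketch checking this formula against the general stationary-distribution computation (the 3-node weighted graph is a made-up example):

        import numpy as np

        # Symmetric weight matrix of a hypothetical undirected graph.
        W = np.array([[0.0, 2.0, 1.0],
                      [2.0, 0.0, 3.0],
                      [1.0, 3.0, 0.0]])

        w = W.sum(axis=1)   # total edge weight at each node
        pi = w / w.sum()    # pi_v = w(v) / sum_u w(u)
        print(pi)           # [0.25, 5/12, 1/3]

        # Check stationarity against the random-walk transition matrix.
        P = W / w[:, None]
        print(pi @ P)       # equals pi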

    References