Probability


Cover: The exponential distribution

Axioms of Probability

We start with the axioms of probability. The reason is that everything else can be derived from the axioms, so it’s important to know the basics well.

The sample space $\Omega$ is a non-empty set of outcomes, and the event space $\mathcal{E}$ is a set containing some subsets of $\Omega$, satisfying:

  1. $A \in \mathcal{E} \quad \Rightarrow \quad A^{c} \in \mathcal{E}$
  2. $A_{1}, A_{2}, \ldots \in \mathcal{E} \quad \Rightarrow \quad \bigcup_{i=1}^{\infty} A_{i} \in \mathcal{E}$
  3. $\mathcal{E}$ is non-empty.

If $\mathcal{E}$ satisfies these properties, then $(\Omega, \mathcal{E})$ is said to be a measurable space.

Using measure theory (which we will gladly skip over), a function $P: \mathcal{E} \to [0, 1]$ satisfies the axioms of probability if

  1. $P(\Omega)=1$
  2. $A_{1}, A_{2}, \ldots \in \mathcal{E}$ with $A_{i} \cap A_{j}=\varnothing$ for all $i \neq j$ $\quad \Rightarrow \quad P\left(\bigcup_{i=1}^{\infty} A_{i}\right)=\sum_{i=1}^{\infty} P\left(A_{i}\right)$

$P$ is called a probability distribution. The triple $(\Omega, \mathcal{E}, P)$ is called a probability space.

One can prove a bunch of facts using the axioms, such as

$$P(A) = 1 - P(A^c)$$

$$P(A \cup B) = P(A) + P(B) \quad (A, B \text{ mutually exclusive})$$

$$P(A \cap B) = P(A)\, P(B) \quad (A, B \text{ independent})$$

We will get to the generalizations of these rules soon!

Probability distributions

Almost every time we will pick $\mathcal{E} = \mathcal{P}(\Omega)$, where $\mathcal{P}$ denotes the power set.

Probability mass functions

For $\Omega$ a discrete sample space and $\mathcal{E}=\mathcal{P}(\Omega)$, a distribution then only needs to satisfy

  1. $p: \Omega \rightarrow [0,1]$

  2. $\sum_{\omega \in \Omega} p(\omega)=1$

The probability of any event $A \in \mathcal{E}$ is then:

$$P(A)=\sum_{\omega \in A} p(\omega)$$

Such a probability distribution is called a probability mass function.
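As a quick sketch of computing with a pmf (the fair-die numbers here are my own illustration, not from the notes):

```python
# A fair six-sided die as a probability mass function.
omega = {1, 2, 3, 4, 5, 6}           # discrete sample space
p = {w: 1 / 6 for w in omega}        # pmf: each outcome equally likely
assert abs(sum(p.values()) - 1) < 1e-12  # pmf sums to 1

# P(A) for the event A = "roll is even", by summing the pmf over A
A = {2, 4, 6}
P_A = sum(p[w] for w in A)
print(P_A)  # 0.5 (up to floating point)
```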

Examples of probability mass functions:

  • Bernoulli distribution: $\Omega=\{S, F\}$, $\alpha \in (0,1)$

    $$p(\omega)=\begin{cases} \alpha & \omega=S \\ 1-\alpha & \omega=F \end{cases}$$

    or alternatively, if we pick $\Omega=\{0,1\}$,

    $$p(k)=\alpha^{k}(1-\alpha)^{1-k} \quad \forall k \in \Omega$$
  • Poisson distribution: $\Omega=\{0,1,\ldots\}$, $\lambda \in (0, \infty)$

    $$p(k)=\frac{\lambda^{k} e^{-\lambda}}{k!} \quad \forall k \in \Omega$$
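The Poisson pmf can be checked numerically; this sketch (my own, with an arbitrary $\lambda = 3$) verifies that the probabilities sum to 1:

```python
import math

def poisson_pmf(k, lam):
    """Poisson pmf: p(k) = lam^k * exp(-lam) / k!"""
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 3.0
# The pmf sums to 1 over Omega = {0, 1, ...}; truncating deep in the
# tail recovers the total up to floating-point error.
total = sum(poisson_pmf(k, lam) for k in range(100))
print(total)  # ~1.0
```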

Probability density functions

For $\Omega$ a continuous sample space and $\mathcal{E}=\mathcal{B}(\Omega)$ ($\mathcal{B}$ is the Borel field, which we will not get into, but you can think of it as the continuous analogue of the power set, defined so that everything works), a distribution then only needs to satisfy

  1. $p: \Omega \rightarrow [0, \infty)$

  2. $\int_{\Omega} p(\omega)\, d\omega=1$

The probability of any event $A \in \mathcal{E}$ is then:

$$P(A)=\int_{A} p(\omega)\, d\omega$$

Such a probability distribution is called a probability density function.

For a probability mass function on a discrete sample space $\Omega$, we can always talk about any singleton event $\{\omega\} \in \mathcal{E}$, $\omega \in \Omega$:

$$P(\{\omega\})=p(\omega)$$

But for $\Omega$ a continuous space, this makes no sense: singleton sets have measure zero. Hence whenever we are talking about probability density functions, we should ask about the probability of an event over an interval.

For example, say the stopping time of a car lies in the interval $[3,15]$. What is the probability of seeing a stopping time of exactly 3.141596 seconds? Zero: a single point carries no mass in $[3,15]$. It is much more reasonable to ask the probability of stopping between 3 and 3.5 seconds.

For a probability density function, the continuous analogue on a continuous sample space $\Omega$ is the event $A=[x, x+\Delta x]$, and for small $\Delta x$,

$$P(A)=\int_{x}^{x+\Delta x} p(\omega)\, d\omega \approx p(x)\, \Delta x$$
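A small numeric check of this approximation (my own sketch; it uses the Gaussian density listed among the examples below, with an arbitrary $x$ and $\Delta x$):

```python
import math

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

x, dx = 1.0, 1e-4
# "Exact" P([x, x + dx]) via a fine midpoint Riemann sum ...
n = 1000
exact = sum(gaussian_pdf(x + (i + 0.5) * dx / n) * (dx / n) for i in range(n))
# ... versus the small-interval approximation p(x) * dx
approx = gaussian_pdf(x) * dx
print(exact, approx)  # agree to several digits
```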

Examples of probability density functions:

  • Uniform distribution: $\Omega=[a, b]$

    $$p(\omega)=\frac{1}{b-a} \quad \forall \omega \in [a, b]$$

  • Gaussian distribution: $\Omega=\mathbb{R}$, $\mu \in \mathbb{R}$, $\sigma \in \mathbb{R}^{+}$

    $$p(\omega)=\frac{1}{\sqrt{2 \pi \sigma^{2}}} e^{-\frac{1}{2 \sigma^{2}}(\omega-\mu)^{2}} \quad \forall \omega \in \mathbb{R}$$

  • Exponential distribution: $\Omega=[0, \infty)$, $\lambda>0$

    $$p(\omega)=\lambda e^{-\lambda \omega} \quad \forall \omega \geq 0$$
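As a side note on the exponential density: one standard way to draw samples from it is inverse-CDF sampling. This sketch (the fact that $-\ln(U)/\lambda$ is exponentially distributed is standard; the particular numbers are mine) checks that the sample mean comes out near $1/\lambda$:

```python
import math
import random

lam = 2.0
random.seed(0)

# Inverse-CDF sampling: if U ~ Uniform(0, 1), then -ln(U) / lam
# has the exponential density lam * exp(-lam * w).
samples = [-math.log(random.random()) / lam for _ in range(100_000)]

mean = sum(samples) / len(samples)
print(mean)  # close to 1 / lam = 0.5
```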

Random Variables

The confusing thing about random variables is that they are neither random nor a variable.

A random variable is defined as a function satisfying:

  1. $X: \Omega \rightarrow \Omega_{X}$.
  2. $\forall A \in \mathcal{B}(\Omega_{X})$ it holds that $\{\omega: X(\omega) \in A\} \in \mathcal{E}$.

To give an example, let's say $\Omega$ is a set of people. Suppose we want to compute the probability that a randomly selected person $\omega \in \Omega$ has a cold.

Define $A=\{\omega \in \Omega: \text{Disease}(\omega)=\text{cold}\}$. Disease is our new random variable, and we write $P(\text{Disease} = \text{cold})$ for $P(A)$. Hence, $\text{Disease}$ is a function that maps one sample space, $\Omega = $ the set of people, to another sample space $\Omega_X = \{\text{cold}, \text{not cold}\}$, which I'll call the target space.

Essentially, the main point of a random variable is that it transforms one sample space into another. In practice, instead of referring to the target space by name, we speak of the random variable itself; that is, we sometimes refer to $\Omega_X$ as $X$.

Multiple random variables

When we have two or more random variables, we have three extra distributions. Suppose the sample spaces are $\mathcal{X}, \mathcal{Y}$.

The first is the joint distribution, or multivariate distribution. A random variable becomes a random vector $\boldsymbol{X}=(X_{1}, X_{2}, \ldots, X_{d})$ with vector-valued outcomes $\boldsymbol{x}=(x_{1}, x_{2}, \ldots, x_{d})$, where each $x_i$ comes from $\mathcal{X}_{i}$.

To make things simple, for two variables there is a $P$ such that

$$p(x, y) \stackrel{\text{def}}{=} P(X=x, Y=y)$$

and

$$\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y)=1$$

It is easy to generalize the above. The multivariate joint distribution must satisfy:

  1. $p: \mathcal{X}_{1} \times \mathcal{X}_{2} \times \cdots \times \mathcal{X}_{d} \rightarrow [0,1]$

  2. For discrete variables:

    $$\sum_{x_{1} \in \mathcal{X}_{1}} \sum_{x_{2} \in \mathcal{X}_{2}} \cdots \sum_{x_{d} \in \mathcal{X}_{d}} p\left(x_{1}, x_{2}, \ldots, x_{d}\right)=1$$

    For continuous variables:

    $$\int_{\mathcal{X}_{1}} \int_{\mathcal{X}_{2}} \cdots \int_{\mathcal{X}_{d}} p\left(x_{1}, x_{2}, \ldots, x_{d}\right) dx_{1}\, dx_{2} \ldots dx_{d}=1$$

If we know the joint distribution, we also get two more distributions for free. The next is the marginal distribution. It is defined for a subset of $\boldsymbol{X}=(X_{1}, X_{2}, \ldots, X_{d})$ by summing or integrating out the remaining variables.

For the discrete case:

$$p(x) \stackrel{\text{def}}{=} \sum_{y \in \mathcal{Y}} p(x, y)$$

For the continuous case:

$$p(x) \stackrel{\text{def}}{=} \int_{\mathcal{Y}} p(x, y)\, dy$$

This is how $p(x)$ and $p(y)$ are related to $p(x, y)$: they are the marginal distributions of the joint distribution $p(x, y)$.
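A small sketch of marginalization on a made-up joint table (the numbers are my own):

```python
import numpy as np

# A hypothetical 2x3 joint distribution p(x, y) with X in {0, 1}, Y in {0, 1, 2}.
joint = np.array([[0.10, 0.20, 0.10],
                  [0.25, 0.15, 0.20]])
assert np.isclose(joint.sum(), 1.0)   # joint probabilities sum to 1

# Marginals: sum out the other variable.
p_x = joint.sum(axis=1)   # p(x) = sum_y p(x, y)
p_y = joint.sum(axis=0)   # p(y) = sum_x p(x, y)
print(p_x)  # [0.4 0.6]
print(p_y)  # [0.35 0.35 0.3]
```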

We can easily extend this to more than two variables. For the discrete case:

$$p\left(x_{i}\right) \stackrel{\text{def}}{=} \sum_{x_{1} \in \mathcal{X}_{1}} \cdots \sum_{x_{i-1} \in \mathcal{X}_{i-1}} \sum_{x_{i+1} \in \mathcal{X}_{i+1}} \cdots \sum_{x_{d} \in \mathcal{X}_{d}} p\left(x_{1}, \ldots, x_{i-1}, x_{i}, x_{i+1}, \ldots, x_{d}\right)$$

For the continuous case:

$$p\left(x_{i}\right) \stackrel{\text{def}}{=} \int_{\mathcal{X}_{1}} \cdots \int_{\mathcal{X}_{i-1}} \int_{\mathcal{X}_{i+1}} \cdots \int_{\mathcal{X}_{d}} p\left(x_{1}, \ldots, x_{i-1}, x_{i}, x_{i+1}, \ldots, x_{d}\right) dx_{1} \ldots dx_{i-1}\, dx_{i+1} \ldots dx_{d}$$

Conditional distributions

To make things simple, let's say we are dealing with two random variables only. From the joint distribution $p(x, y)$ we can define the conditional distribution:

$$p(y \mid x) \stackrel{\text{def}}{=} \frac{p(x, y)}{p(x)}$$

As before for events, we must sum or integrate the distribution:

$$P(Y \in A \mid X=x)=\begin{cases} \sum_{y \in A} p(y \mid x) & Y \text{ discrete} \\ \int_{A} p(y \mid x)\, dy & Y \text{ continuous} \end{cases}$$

We get the following formula for free:

$$p(x, y)=p(x \mid y)\, p(y)=p(y \mid x)\, p(x)$$

which is known as the product rule. The generalization to multiple variables is straightforward:

$$p\left(x_{1}, \ldots, x_{d}\right)=p\left(x_{d} \mid x_{1}, \ldots, x_{d-1}\right) p\left(x_{1}, \ldots, x_{d-1}\right)$$

and recursively applying the product rule, we obtain:

$$\begin{aligned} p\left(x_{1}, \ldots, x_{d}\right) &=p\left(x_{d} \mid x_{1}, \ldots, x_{d-1}\right) p\left(x_{1}, \ldots, x_{d-1}\right) \\ &=p\left(x_{d} \mid x_{1}, \ldots, x_{d-1}\right) p\left(x_{d-1} \mid x_{1}, \ldots, x_{d-2}\right) p\left(x_{1}, \ldots, x_{d-2}\right) \\ &\;\;\vdots \\ &=p\left(x_{d} \mid x_{1}, \ldots, x_{d-1}\right) p\left(x_{d-1} \mid x_{1}, \ldots, x_{d-2}\right) \cdots p\left(x_{2} \mid x_{1}\right) p\left(x_{1}\right) \end{aligned}$$

which can be written in the more compact form

$$p\left(x_{1}, \ldots, x_{d}\right)=p\left(x_{1}\right) \prod_{i=2}^{d} p\left(x_{i} \mid x_{1}, \ldots, x_{i-1}\right)$$

known as the general product rule.

Now, using the definition of conditional probability:

$$p(x, y)=p(x \mid y)\, p(y)=p(y \mid x)\, p(x)$$

we obtain Bayes’ theorem for free:

$$p(x \mid y)=\frac{p(y \mid x)\, p(x)}{p(y)}$$
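A classic worked example of Bayes' theorem (the disease-test numbers below are hypothetical, chosen for illustration):

```python
# Hypothetical numbers: a test for a disease.
p_disease = 0.01              # p(x): prior probability of having the disease
p_pos_given_disease = 0.95    # p(y | x): test sensitivity
p_pos_given_healthy = 0.05    # false-positive rate

# p(y) by marginalizing the joint: p(y) = sum_x p(y | x) p(x)
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Bayes' theorem: p(x | y) = p(y | x) p(x) / p(y)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(p_disease_given_pos)  # ~0.161: a positive test is far from conclusive
```

Even with a sensitive test, the small prior keeps the posterior low, which is why the denominator $p(y)$ matters.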

Independent random variables

The variables are independent if the joint distribution can be written as the product of the marginal distributions:

$$p(x, y) = p(x)\, p(y)$$

The reason is that if

$$p(x \mid y)=p(x)$$

then the value of $Y$ does not affect the distribution of $X$, so the variables are independent.

For more than two variables, the distribution factors neatly into its components:

$$p\left(x_{1}, x_{2}, \ldots, x_{d}\right)=p\left(x_{1}\right) p\left(x_{2}\right) \cdots p\left(x_{d}\right)$$
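A quick sketch of checking the factorization numerically (the marginals here are my own choice; the joint is independent by construction):

```python
import numpy as np

# Build an independent joint by construction: p(x, y) = p(x) p(y).
p_x = np.array([0.3, 0.7])
p_y = np.array([0.5, 0.25, 0.25])
joint = np.outer(p_x, p_y)          # (i, j) entry is p_x[i] * p_y[j]

# The factorization test: marginalize, then compare the product of
# marginals against the joint.
marg_x = joint.sum(axis=1)
marg_y = joint.sum(axis=0)
independent = np.allclose(joint, np.outer(marg_x, marg_y))
print(independent)  # True
```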

There is one more form of independence. Two variables are conditionally independent if, given a third variable, they are independent:

$$p(x, y \mid z)=p(x \mid z)\, p(y \mid z)$$

The two forms of independence are unrelated: neither one implies the other.

Expected value

The expected value or mean of a random variable $X$ is the long-run average of repeated samples of $X$. It's defined as:

$$\mathbb{E}[X] \stackrel{\text{def}}{=} \begin{cases} \sum_{x \in \mathcal{X}} x\, p(x) & \text{if } X \text{ is discrete} \\ \int_{\mathcal{X}} x\, p(x)\, dx & \text{if } X \text{ is continuous} \end{cases}$$

The expected value of $f(X)$ is then:

$$\mathbb{E}[f(X)]=\begin{cases} \sum_{x \in \mathcal{X}} f(x)\, p(x) & \text{if } X \text{ is discrete} \\ \int_{\mathcal{X}} f(x)\, p(x)\, dx & \text{if } X \text{ is continuous} \end{cases}$$

Conditional expectations are very similar. Since $X$ is fixed, we have

$$\mathbb{E}[Y \mid X=x]=\begin{cases} \sum_{y \in \mathcal{Y}} y\, p(y \mid x) & Y \text{ discrete} \\ \int_{\mathcal{Y}} y\, p(y \mid x)\, dy & Y \text{ continuous} \end{cases}$$

For multivariate distributions:

$$\mathbb{E}[\boldsymbol{X}]=\begin{cases} \sum_{\boldsymbol{x} \in \mathcal{X}} \boldsymbol{x}\, p(\boldsymbol{x}) & \boldsymbol{X} \text{ discrete} \\ \int_{\mathcal{X}} \boldsymbol{x}\, p(\boldsymbol{x})\, d\boldsymbol{x} & \boldsymbol{X} \text{ continuous} \end{cases}$$

One useful application is the variance. If we pick $f(X) = (X - \mathbb{E}[X])^{2}$, then we have

$$\begin{aligned} \operatorname{Var}(X) &=\mathbb{E}\left[(X-\mathbb{E}[X])^{2}\right] \\ &=\mathbb{E}\left[X^{2}-2 X \mathbb{E}[X]+\mathbb{E}[X]^{2}\right] \\ &=\mathbb{E}\left[X^{2}\right]-2\, \mathbb{E}[X]\, \mathbb{E}[X]+\mathbb{E}[X]^{2} \\ &=\mathbb{E}\left[X^{2}\right]-\mathbb{E}[X]^{2} \end{aligned}$$
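This identity is easy to verify exactly on a small example (a fair die, my own choice):

```python
# Check Var(X) = E[X^2] - E[X]^2 on a fair six-sided die.
outcomes = [1, 2, 3, 4, 5, 6]
p = 1 / 6

E_X  = sum(x * p for x in outcomes)          # 3.5
E_X2 = sum(x ** 2 * p for x in outcomes)     # 91/6

var_def      = sum((x - E_X) ** 2 * p for x in outcomes)  # E[(X - E[X])^2]
var_shortcut = E_X2 - E_X ** 2                            # E[X^2] - E[X]^2
print(var_def, var_shortcut)  # both 35/12, about 2.9167
```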

Then, for multivariate distributions, we can define a multivariate variance called the covariance:

$$\begin{aligned} \operatorname{Cov}[X, Y] &=\mathbb{E}[(X-\mathbb{E}[X])(Y-\mathbb{E}[Y])] \\ &=\mathbb{E}[X Y]-\mathbb{E}[X]\, \mathbb{E}[Y] \end{aligned}$$

We can use this to measure how correlated two variables are:

$$\operatorname{Corr}[X, Y]=\frac{\operatorname{Cov}[X, Y]}{\sqrt{\mathrm{V}[X] \cdot \mathrm{V}[Y]}}$$

Lastly, covariance for more than two variables. If $\boldsymbol{X}=\left[X_{1}, \ldots, X_{d}\right]$,

$$\begin{aligned} \Sigma_{i j} &=\operatorname{Cov}\left[X_{i}, X_{j}\right] \\ &=\mathbb{E}\left[\left(X_{i}-\mathbb{E}\left[X_{i}\right]\right)\left(X_{j}-\mathbb{E}\left[X_{j}\right]\right)\right] \end{aligned}$$

Using matrix notation, this becomes

$$\begin{aligned} \boldsymbol{\Sigma} &=\operatorname{Cov}[\boldsymbol{X}, \boldsymbol{X}] \in \mathbb{R}^{d \times d} \\ &=\mathbb{E}\left[(\boldsymbol{X}-\mathbb{E}[\boldsymbol{X}])(\boldsymbol{X}-\mathbb{E}[\boldsymbol{X}])^{\top}\right] \\ &=\mathbb{E}\left[\boldsymbol{X} \boldsymbol{X}^{\top}\right]-\mathbb{E}[\boldsymbol{X}]\, \mathbb{E}[\boldsymbol{X}]^{\top} \end{aligned}$$

Note that here we are using the outer product. If the dot product, or inner product, is defined by:

$$\mathbf{x}^{\top}\mathbf{y}= \begin{bmatrix}x_1 & x_2 & \cdots & x_d \end{bmatrix}\begin{bmatrix}y_1 \\ y_2 \\ \vdots \\ y_d \end{bmatrix}=\sum_{i=1}^{d} x_{i} y_{i}$$

Then the outer product is

$$\mathbf{x}\mathbf{y}^{\top}=\begin{bmatrix} x_{1} y_{1} & x_{1} y_{2} & \cdots & x_{1} y_{d} \\ x_{2} y_{1} & x_{2} y_{2} & \cdots & x_{2} y_{d} \\ \vdots & \vdots & & \vdots \\ x_{d} y_{1} & x_{d} y_{2} & \cdots & x_{d} y_{d} \end{bmatrix}$$
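A minimal sketch of the two products in numpy (the vectors are arbitrary):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

inner = x @ y            # scalar: sum_i x_i * y_i
outer = np.outer(x, y)   # d x d matrix with (i, j) entry x_i * y_j

print(inner)         # 32.0
print(outer.shape)   # (3, 3)
```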

As an example, suppose we have a random vector with just two components, $\boldsymbol{X}=(X_{1}, X_{2})$. What is the covariance matrix?

$$\boldsymbol{\Sigma} = \operatorname{Cov}[\boldsymbol{X}, \boldsymbol{X}] = \mathbb{E}\left[\boldsymbol{X} \boldsymbol{X}^{\top}\right]-\mathbb{E}[\boldsymbol{X}]\, \mathbb{E}[\boldsymbol{X}]^{\top} = \mathbb{E}\begin{bmatrix} X_{1}^{2} & X_{1} X_{2} \\ X_{2} X_{1} & X_{2}^{2} \end{bmatrix}-\begin{bmatrix} \mathbb{E}\left[X_{1}\right]^{2} & \mathbb{E}\left[X_{1}\right] \mathbb{E}\left[X_{2}\right] \\ \mathbb{E}\left[X_{2}\right] \mathbb{E}\left[X_{1}\right] & \mathbb{E}\left[X_{2}\right]^{2} \end{bmatrix}$$

Finally, we conclude with some useful properties of expected values and variances. Note that $\mathrm{V}[\boldsymbol{X}]$ is short for $\operatorname{Cov}[\boldsymbol{X}, \boldsymbol{X}]$.

  1. $\mathbb{E}[c \boldsymbol{X}]=c\, \mathbb{E}[\boldsymbol{X}]$

  2. $\mathbb{E}[\boldsymbol{X}+\boldsymbol{Y}]=\mathbb{E}[\boldsymbol{X}]+\mathbb{E}[\boldsymbol{Y}]$

  3. $\mathrm{V}[c]=0$ (the variance of a constant is zero)

  4. $\mathrm{V}[\boldsymbol{X}] \succeq 0$ (the matrix is positive semi-definite)

  5. $\mathrm{V}[c \boldsymbol{X}]=c^{2}\, \mathrm{V}[\boldsymbol{X}]$

  6. $\operatorname{Cov}[\boldsymbol{X}, \boldsymbol{Y}]=\mathbb{E}\left[(\boldsymbol{X}-\mathbb{E}[\boldsymbol{X}])(\boldsymbol{Y}-\mathbb{E}[\boldsymbol{Y}])^{\top}\right]=\mathbb{E}\left[\boldsymbol{X} \boldsymbol{Y}^{\top}\right]-\mathbb{E}[\boldsymbol{X}]\, \mathbb{E}[\boldsymbol{Y}]^{\top}$

  7. $\mathrm{V}[\boldsymbol{X}+\boldsymbol{Y}]=\mathrm{V}[\boldsymbol{X}]+\mathrm{V}[\boldsymbol{Y}]+\operatorname{Cov}[\boldsymbol{X}, \boldsymbol{Y}]+\operatorname{Cov}[\boldsymbol{Y}, \boldsymbol{X}]$, which reduces to $\mathrm{V}[X]+\mathrm{V}[Y]+2 \operatorname{Cov}[X, Y]$ in the scalar case

Multivariate distributions

The most important multivariate distribution is the multivariate Gaussian distribution $\mathcal{N}(\boldsymbol{\omega} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma})$, sometimes written as $\mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, with $\Omega = \mathbb{R}^d$.

$$p(\boldsymbol{\omega}) \stackrel{\text{def}}{=} \frac{1}{\sqrt{(2 \pi)^{d}\,|\boldsymbol{\Sigma}|}} \exp \left(-\frac{1}{2}(\boldsymbol{\omega}-\boldsymbol{\mu})^{\top} \boldsymbol{\Sigma}^{-1}(\boldsymbol{\omega}-\boldsymbol{\mu})\right)$$

Here $|\boldsymbol{\Sigma}|$ is the determinant of the covariance matrix, which by a theorem in linear algebra equals the product of the eigenvalues of $\boldsymbol{\Sigma}$.
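A sketch of evaluating this density with numpy (the mean and covariance below are arbitrary; the snippet also checks the determinant-eigenvalue fact just mentioned):

```python
import numpy as np

def mvn_pdf(w, mu, Sigma):
    """Multivariate Gaussian density N(w | mu, Sigma); a numpy sketch."""
    d = len(mu)
    diff = w - mu
    quad = diff @ np.linalg.solve(Sigma, diff)   # (w - mu)^T Sigma^{-1} (w - mu)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

# |Sigma| equals the product of the eigenvalues.
det = np.linalg.det(Sigma)
assert np.isclose(det, np.prod(np.linalg.eigvalsh(Sigma)))

density_at_mean = mvn_pdf(mu, mu, Sigma)
print(density_at_mean)  # equals 1 / (2*pi*sqrt(|Sigma|)) for d = 2
```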


Copyright © 2019

Zhi Han