Expectation under the joint distribution equals expectation under the marginal distribution
$$E_{p(x_1,x_2)}(X_1) = \sum_{x_1,x_2} p(x_1,x_2)\,x_1 = \sum_{x_1}\sum_{x_2} p(x_1)\,p(x_2 \mid x_1)\,x_1 = \sum_{x_1} p(x_1)\,x_1 \sum_{x_2} p(x_2 \mid x_1) = \sum_{x_1} p(x_1)\,x_1 \cdot 1 = E_{p(x_1)}(X_1)$$
Hence:
$$E_{p(x_1,x_2)}(X_1) = E_{p(x_1)}(X_1)$$
Also:
$$E_{p(x_1,x_2)}(f(X_1)) = E_{p(x_1)}(f(X_1))$$
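As a quick numerical check (the joint distribution here is an arbitrary illustrative choice): let $X_1, X_2 \in \{0,1\}$ with $p(0,0)=0.1$, $p(0,1)=0.3$, $p(1,0)=0.2$, $p(1,1)=0.4$, so that $p(x_1{=}1)=0.6$. Then
$$E_{p(x_1,x_2)}(X_1) = 0.1\cdot 0 + 0.3\cdot 0 + 0.2\cdot 1 + 0.4\cdot 1 = 0.6 = 0.4\cdot 0 + 0.6\cdot 1 = E_{p(x_1)}(X_1)$$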
$$E(X_1+X_2) = \sum_{x_1,x_2} p(x_1,x_2)\,(x_1+x_2) = \sum_{x_1,x_2} p(x_1,x_2)\,x_1 + \sum_{x_1,x_2} p(x_1,x_2)\,x_2 = \sum_{x_1} p(x_1)\,x_1 + \sum_{x_2} p(x_2)\,x_2 = E(X_1) + E(X_2)$$
hence:
$$E(X_1+X_2) = E(X_1) + E(X_2)$$
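For instance (an illustrative example with two fair dice), if $X_1$ and $X_2$ are the outcomes of two rolls, then
$$E(X_1 + X_2) = E(X_1) + E(X_2) = 3.5 + 3.5 = 7$$
Note that no independence assumption is needed for this identity.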
$$E(\lambda X) = \sum_{x} p(x)\,\lambda x = \lambda \sum_{x} p(x)\,x = \lambda E(X)$$
hence:
$$E(\lambda X) = \lambda E(X)$$
Variance and standard deviation
$$V(X) \;\overset{\text{def}}{=}\; E\big([X - E(X)]^2\big)$$
$$\sigma(X) \;\overset{\text{def}}{=}\; \sqrt{V(X)}$$
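The definitions translate directly into code. A minimal sketch follows that computes $E(X)$, $V(X)$ and $\sigma(X)$ for a discrete distribution given as value/probability pairs (a fair die is used as an arbitrary example):

#include <cmath>
#include <iostream>
#include <utility>
#include <vector>

int main() {
    // Example distribution: a fair six-sided die as (value, probability) pairs.
    std::vector<std::pair<double, double>> dist = {
        {1, 1.0 / 6}, {2, 1.0 / 6}, {3, 1.0 / 6},
        {4, 1.0 / 6}, {5, 1.0 / 6}, {6, 1.0 / 6}};

    double mean = 0.0;
    for (const auto& vp : dist)
        mean += vp.second * vp.first;                       // E(X)

    double var = 0.0;
    for (const auto& vp : dist)
        var += vp.second * (vp.first - mean) * (vp.first - mean);  // E([X - E(X)]^2)

    std::cout << "E(X) = " << mean << ", V(X) = " << var
              << ", sigma(X) = " << std::sqrt(var) << "\n";
    return 0;
}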
The meaning of standard deviation
One way to look at standard deviation is as an approximation of the "expected drift" from the expectation. The "expected drift" could be defined as:
$$ED(X) \;\overset{\text{def?}}{=}\; E(|X - E(X)|)$$
This quantity, however, is not easy to manipulate algebraically, which is one reason the squared version used for the variance is preferred.
Suppose that $X$ can take only the two values $k$ and $-k$, and that $E(X) = 0$. Then:
$$V(X) \;\overset{\text{def}}{=}\; E\big([X - E(X)]^2\big) = E(X^2) = k^2$$
and
$$\sigma(X) = \sqrt{V(X)} = k$$
and
$$ED(X) = E(|X - E(X)|) = E(|X|) = E(k) = k = \sigma(X)$$
$V$, $\sigma$ and $ED$ do not change when a constant is added, so any random variable $X$ whose drifts all have the same absolute value $k$ satisfies $\sigma(X) = ED(X)$.
Whenever the drift values are not all the same, $\sigma$ gives larger weight to the larger drifts (the drifts are squared before averaging), while $ED$ keeps a plain, equally weighted average. In fact $ED(X) = E(|X - E(X)|) \le \sqrt{E\big([X - E(X)]^2\big)} = \sigma(X)$, with equality exactly when all drifts have the same absolute value.
Example: suppose you perform the following experiment: you flip a coin; if it comes up heads you move 5 meters to the left, and if it comes up tails you move 5 meters to the right. The variance in this case is 25 and the standard deviation is 5. The expected drift is also 5 (all the drift values are equal). This example is revisited below.
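Writing the step as a random variable $X$ with $P(X = -5) = P(X = 5) = \tfrac12$ makes these numbers explicit:
$$E(X) = \tfrac12(-5) + \tfrac12(5) = 0, \qquad V(X) = \tfrac12(-5)^2 + \tfrac12(5)^2 = 25, \qquad \sigma(X) = 5, \qquad ED(X) = \tfrac12|-5| + \tfrac12|5| = 5$$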
Alternative definition of variance
$$V(X) \;\overset{\text{def}}{=}\; E\big((X - E(X))^2\big) = E\big(X^2 + E^2(X) - 2XE(X)\big) = E(X^2) + E^2(X) - 2E^2(X) = E(X^2) - E^2(X)$$
hence:
$$V(X) = E(X^2) - E^2(X)$$
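As a worked check (using a fair six-sided die, an arbitrary example): $E(X) = 3.5$ and $E(X^2) = \tfrac{1+4+9+16+25+36}{6} = \tfrac{91}{6}$, so
$$V(X) = E(X^2) - E^2(X) = \tfrac{91}{6} - 3.5^2 = \tfrac{35}{12} \approx 2.92$$
which matches the value printed by the code sketch above.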
Variance (and SD) do not change by adding a constant
$$V(X+c) = E\big([X + c - E(X+c)]^2\big) = E\big([X + c - E(X) - E(c)]^2\big) = E\big([X - E(X)]^2\big) = V(X)$$
Variance of multiplication by a constant
$$V(\lambda X) = E\big((\lambda X)^2\big) - E^2(\lambda X) = \lambda^2 E(X^2) - \lambda^2 E^2(X) = \lambda^2\big(E(X^2) - E^2(X)\big) = \lambda^2 V(X)$$
hence:
$$V(\lambda X) = \lambda^2 V(X)$$
SD of multiplication by a constant
$$\sigma(\lambda X) = \sqrt{V(\lambda X)} = \sqrt{\lambda^2 V(X)} = |\lambda|\,\sqrt{V(X)} = |\lambda|\,\sigma(X)$$
hence:
$$\sigma(\lambda X) = |\lambda|\,\sigma(X)$$
Variance of sum of random variables
$$V(X_1+X_2) = E\big((X_1+X_2)^2\big) - E^2(X_1+X_2)$$
$$= E(X_1^2) + E(X_2^2) + 2E(X_1X_2) - \big(E^2(X_1) + E^2(X_2) + 2E(X_1)E(X_2)\big)$$
$$= V(X_1) + V(X_2) + 2\,\mathrm{Cov}(X_1,X_2)$$
hence:
$$V(X_1+X_2) = V(X_1) + V(X_2) + 2\,\mathrm{Cov}(X_1,X_2)$$
When $X_1$ and $X_2$ are independent, $\mathrm{Cov}(X_1,X_2) = 0$, and hence:
$$X_1, X_2 \text{ independent} \;\Rightarrow\; V(X_1+X_2) = V(X_1) + V(X_2)$$
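For instance, for two independent rolls of a fair die (each with variance $\tfrac{35}{12}$, as computed above):
$$V(X_1 + X_2) = \tfrac{35}{12} + \tfrac{35}{12} = \tfrac{35}{6} \approx 5.83$$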
When $X_1$ and $X_2$ are i.i.d. (independent and identically distributed), then:
$$X_1, X_2 \text{ i.i.d.} \;\Rightarrow\; V(X_1+X_2) = V(X_1) + V(X_2) = 2V(X_1)$$
Or more generally:
$$X_1, X_2, \ldots, X_n \text{ i.i.d.} \;\Rightarrow\; V\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} V(X_i) = nV(X_1)$$
hence:
$$X_1, X_2, \ldots, X_n \text{ i.i.d.} \;\Rightarrow\; \sigma\left(\sum_{i=1}^{n} X_i\right) = \sqrt{V\left(\sum_{i=1}^{n} X_i\right)} = \sqrt{nV(X_1)} = \sqrt{n}\cdot\sigma(X_1)$$
Note the difference from summing the variable with itself (identically distributed but not independent):
$$V(X_1 + X_1) = V(2X_1) = 4V(X_1)$$
and
$$\sigma(X_1 + X_1) = \sigma(2X_1) = 2\sigma(X_1)$$
More on the last result
We have shown that:
$$X_1, X_2, \ldots, X_n \text{ i.i.d.} \;\Rightarrow\; \sigma\left(\sum_{i=1}^{n} X_i\right) = \sqrt{n}\cdot\sigma(X_1)$$
Why is this important?
$\sigma$ is a measure of the expected drift. The last result shows that the drift of the sum grows only like the square root of the number of experiments (sublinearly), so the mean drift per experiment tends to zero:
$$\lim_{n\to\infty} \frac{\sigma\left(\sum_{i=1}^{n} X_i\right)}{n} = \lim_{n\to\infty} \frac{\sqrt{n}\cdot\sigma(X_1)}{n} = 0$$
Recall the example of the ±5 random walk. Now suppose you repeat the step $n$ times. What is the expected drift? The standard deviation, which can be taken as a measure of that drift, is:
$$\sqrt{n}\cdot 5$$
The mean drift is:
$$5\,\frac{\sqrt{n}}{n}$$
For example, for 10000 iterations, the mean drift is
$$5\,\frac{\sqrt{10000}}{10000} = 0.05$$
meters. Instead of 5 meters per step, the mean drift is only 5 centimeters per step; the expected total drift is about $5\sqrt{10000} = 500$ meters rather than the 50,000 meters that a systematic 5-meter drift per step would give.
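A minimal simulation sketch of this (the number of steps and trials are arbitrary choices), comparing the sample standard deviation of the final position with the predicted $5\sqrt{n}$:

#include <cmath>
#include <iostream>
#include <random>

int main() {
    const int n = 10000;      // steps per walk (+-5 meters each)
    const int trials = 2000;  // number of independent walks
    std::mt19937 gen(42);
    std::bernoulli_distribution coin(0.5);

    double sum = 0.0, sumsq = 0.0;
    for (int t = 0; t < trials; ++t) {
        double pos = 0.0;
        for (int i = 0; i < n; ++i) pos += coin(gen) ? 5.0 : -5.0;  // one +-5 step
        sum += pos;
        sumsq += pos * pos;
    }
    const double mean = sum / trials;
    const double sd = std::sqrt(sumsq / trials - mean * mean);  // sample sd of final position

    std::cout << "sample sd of final position: " << sd << "\n";
    std::cout << "predicted 5*sqrt(n):         " << 5.0 * std::sqrt(static_cast<double>(n)) << "\n";
    return 0;
}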
TODO: add a gnuplot picture of the ±5 random walk; discuss the relation to the law of large numbers (is the convergence of relative frequencies an assumption of probability theory or a result?).
$$V(X) = 0 \;\Leftrightarrow\; E\big([X - E(X)]^2\big) = 0 \;\Leftrightarrow\; x - E(X) = 0 \text{ for every } x \text{ with } p(x) > 0 \;\Leftrightarrow\; X \text{ is constant.}$$
hence:
$$V(X) = 0 \;\Leftrightarrow\; X \text{ is constant.}$$
Alternative definition of covariance
$$\mathrm{Cov}(X_1,X_2) = E\big((X_1 - E(X_1))(X_2 - E(X_2))\big) = E(X_1X_2) + E(X_1)E(X_2) - 2E(X_1)E(X_2) = E(X_1X_2) - E(X_1)E(X_2)$$
hence:
$$\mathrm{Cov}(X_1,X_2) = E(X_1X_2) - E(X_1)E(X_2)$$
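As a quick numerical check (an arbitrary small joint distribution): let $X_1, X_2 \in \{0,1\}$ with $p(0,0)=0.1$, $p(0,1)=0.3$, $p(1,0)=0.2$, $p(1,1)=0.4$. Then $E(X_1)=0.6$, $E(X_2)=0.7$ and $E(X_1X_2)=0.4$, so
$$\mathrm{Cov}(X_1,X_2) = 0.4 - 0.6\cdot 0.7 = -0.02$$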
A special case is the covariance of a random variable with itself:
$$\mathrm{Cov}(X,X) = E(XX) - E(X)E(X) = V(X)$$
Covariance of independent variables
Assume that $X_1$ and $X_2$ are independent:
$$E(X_1X_2) = \sum_{x_1,x_2} p(x_1,x_2)\,x_1x_2 = \sum_{x_1,x_2} p(x_1)\,p(x_2)\,x_1x_2 = \sum_{x_1} p(x_1)\,x_1 \sum_{x_2} p(x_2)\,x_2 = E(X_1)E(X_2)$$
And hence:
$$X_1, X_2 \text{ independent} \;\implies\; \mathrm{Cov}(X_1,X_2) = 0$$
The converse is not true, however. For example, let $X$ take the values $-1, 0, 1$ with equal probability and let $Y = X^2$. Then
$$\mathrm{Cov}(X,Y) = E(XY) - E(X)E(Y) = E(X^3) - E(X)E(X^2) = 0 - 0\cdot E(X^2) = 0$$
but $X$ and $Y$ are very much dependent, since $Y$ is a function of $X$.
Wiener process (also known as "Brownian motion")
Let $Z$ be a stochastic process with the following properties:
1. The change $\delta Z$ during a small period of time $\delta t$ is
$$\delta Z = \epsilon\cdot\sqrt{\delta t}$$
where $\epsilon \sim \phi(0,1)$ (a normal distribution with mean 0 and standard deviation 1).
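A minimal sketch of how such a path can be simulated, assuming the increments are drawn from a standard normal distribution as above (the time horizon and step count are arbitrary choices):

#include <cmath>
#include <iostream>
#include <random>

int main() {
    const double T = 1.0;    // total time
    const int steps = 1000;  // number of small intervals delta t
    const double dt = T / steps;

    std::mt19937 gen(7);
    std::normal_distribution<double> eps(0.0, 1.0);  // epsilon ~ phi(0,1)

    double z = 0.0;
    for (int i = 0; i < steps; ++i)
        z += eps(gen) * std::sqrt(dt);               // delta Z = epsilon * sqrt(delta t)

    std::cout << "Z(T) after " << steps << " steps: " << z << "\n";
    return 0;
}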
Summary
Expectation
$$E_{p(x_1,x_2)}(X_1) = E_{p(x_1)}(X_1)$$
$$E(X_1+X_2) = E(X_1) + E(X_2)$$
$$E(\lambda X) = \lambda E(X)$$
Variance and standard deviation
$$V(X) = E(X^2) - E^2(X)$$
$$V(\lambda X) = \lambda^2 V(X)$$
$$\sigma(\lambda X) = |\lambda|\,\sigma(X)$$
$$V(X_1+X_2) = V(X_1) + V(X_2) + 2\,\mathrm{Cov}(X_1,X_2)$$
$$X_1, X_2 \text{ independent} \;\Rightarrow\; V(X_1+X_2) = V(X_1) + V(X_2)$$
$$X_1, X_2, \ldots, X_n \text{ i.i.d.} \;\Rightarrow\; V\left(\sum_{i=1}^{n} X_i\right) = nV(X_1)$$
$$X_1, X_2, \ldots, X_n \text{ i.i.d.} \;\Rightarrow\; \sigma\left(\sum_{i=1}^{n} X_i\right) = \sqrt{n}\cdot\sigma(X_1)$$
$$\mathrm{Cov}(X_1,X_2) = E(X_1X_2) - E(X_1)E(X_2)$$
#include <iostream>

int main() {
    std::cout << "hello lord\n";
    return 0;
}
The determinant is the area of the parallelogram
Let $\mathbf{x_1} = (x_1, y_1)$ and $\mathbf{x_2} = (x_2, y_2)$ be two vectors in $\mathbb{R}^2$. We will show that the determinant $x_1\cdot y_2 - x_2\cdot y_1$ equals the area of the parallelogram they span.
Let $\mathbf{a}$ be a vector orthogonal to $\mathbf{x_1}$ and of norm equal to 1:
$$\mathbf{a} = \frac{(-y_1, x_1)}{\|\mathbf{x_1}\|}$$
(A word about left/right-handed systems: choosing $(y_1, -x_1)$ instead would only flip the sign of the result, so this choice fixes the orientation.)
Let $S$ be the area of the parallelogram:
$$S = \langle\mathbf{a}, \mathbf{x_2}\rangle \cdot \|\mathbf{x_1}\| = \langle \|\mathbf{x_1}\|\,\mathbf{a}, \mathbf{x_2}\rangle = \langle(-y_1, x_1), (x_2, y_2)\rangle = -y_1x_2 + x_1y_2$$
Alternatively, let $\mathbf{N}$ be the vector representing the height of the parallelogram (the component of $\mathbf{x_2}$ orthogonal to $\mathbf{x_1}$):
$$\mathbf{N} = \mathbf{x_2} - \frac{\langle\mathbf{x_1}, \mathbf{x_2}\rangle}{\|\mathbf{x_1}\|}\,\frac{\mathbf{x_1}}{\|\mathbf{x_1}\|} = \mathbf{x_2} - \frac{\langle\mathbf{x_1}, \mathbf{x_2}\rangle}{\|\mathbf{x_1}\|^2}\,\mathbf{x_1}$$
Then the area $S$ of the parallelogram is:
$$S = \|\mathbf{N}\|\cdot\|\mathbf{x_1}\| = \left\|\mathbf{x_2} - \frac{\langle\mathbf{x_1},\mathbf{x_2}\rangle}{\|\mathbf{x_1}\|^2}\,\mathbf{x_1}\right\|\cdot\|\mathbf{x_1}\| = \frac{\left\|\|\mathbf{x_1}\|^2\,\mathbf{x_2} - \langle\mathbf{x_1},\mathbf{x_2}\rangle\,\mathbf{x_1}\right\|}{\|\mathbf{x_1}\|} = \frac{\left\|\langle\mathbf{x_1},\mathbf{x_1}\rangle\,\mathbf{x_2} - \langle\mathbf{x_1},\mathbf{x_2}\rangle\,\mathbf{x_1}\right\|}{\|\mathbf{x_1}\|}$$
$$S^2 = \frac{\left\|\langle\mathbf{x_1},\mathbf{x_1}\rangle\,\mathbf{x_2} - \langle\mathbf{x_1},\mathbf{x_2}\rangle\,\mathbf{x_1}\right\|^2}{\|\mathbf{x_1}\|^2} = \frac{\left\langle \langle\mathbf{x_1},\mathbf{x_1}\rangle\,\mathbf{x_2} - \langle\mathbf{x_1},\mathbf{x_2}\rangle\,\mathbf{x_1},\; \langle\mathbf{x_1},\mathbf{x_1}\rangle\,\mathbf{x_2} - \langle\mathbf{x_1},\mathbf{x_2}\rangle\,\mathbf{x_1}\right\rangle}{\langle\mathbf{x_1},\mathbf{x_1}\rangle}$$
$$= \frac{\langle\mathbf{x_1},\mathbf{x_1}\rangle^2\langle\mathbf{x_2},\mathbf{x_2}\rangle + \langle\mathbf{x_1},\mathbf{x_2}\rangle^2\langle\mathbf{x_1},\mathbf{x_1}\rangle - 2\,\langle\mathbf{x_1},\mathbf{x_1}\rangle\langle\mathbf{x_1},\mathbf{x_2}\rangle^2}{\langle\mathbf{x_1},\mathbf{x_1}\rangle}$$
$$= \langle\mathbf{x_1},\mathbf{x_1}\rangle\langle\mathbf{x_2},\mathbf{x_2}\rangle + \langle\mathbf{x_1},\mathbf{x_2}\rangle^2 - 2\,\langle\mathbf{x_1},\mathbf{x_2}\rangle^2 = \langle\mathbf{x_1},\mathbf{x_1}\rangle\langle\mathbf{x_2},\mathbf{x_2}\rangle - \langle\mathbf{x_1},\mathbf{x_2}\rangle^2$$
$$= (x_1^2 + y_1^2)(x_2^2 + y_2^2) - (x_1x_2 + y_1y_2)^2 = x_1^2x_2^2 + y_1^2y_2^2 + x_1^2y_2^2 + y_1^2x_2^2 - \left(x_1^2x_2^2 + y_1^2y_2^2 + 2\,x_1x_2y_1y_2\right)$$
$$= x_1^2y_2^2 + y_1^2x_2^2 - 2\,x_1x_2y_1y_2 = (x_1y_2 - x_2y_1)^2$$
$$S = x_1y_2 - x_2y_1$$
(up to sign: the determinant is positive when $\mathbf{x_1}, \mathbf{x_2}$ are positively oriented, and its absolute value is the area in general).
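As a concrete check (arbitrary example vectors): take $\mathbf{x_1} = (2, 0)$ and $\mathbf{x_2} = (1, 3)$. The parallelogram has base 2 along the $x$-axis and height 3, so its area is 6, and indeed
$$x_1y_2 - x_2y_1 = 2\cdot 3 - 1\cdot 0 = 6$$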
Why (−1) · (−1) = 1
$$0 = 0\cdot(-1) = \big(1 + (-1)\big)\cdot(-1) = 1\cdot(-1) + (-1)\cdot(-1) = (-1) + (-1)\cdot(-1)$$
Therefore:
$$(-1)\cdot(-1) = 1$$
First step: 0 times any number is 0.
Second step: 0 is the sum of additive inverses, $0 = 1 + (-1)$.
Third step: distributivity.
Fourth step: 1 is the multiplicative identity.
Conclusion: $(-1)\cdot(-1)$ is the additive inverse of $-1$, which is 1.