Expectation under the joint distribution equals expectation under the marginal distribution
$$E_{p(x_1,x_2)}(X_1) = \sum_{x_1,x_2} p(x_1,x_2)\,x_1 = \sum_{x_1}\sum_{x_2} p(x_1)\,p(x_2 \mid x_1)\,x_1 = \sum_{x_1} p(x_1)\,x_1 \sum_{x_2} p(x_2 \mid x_1) = \sum_{x_1} p(x_1)\,x_1 \cdot 1 = E_{p(x_1)}(X_1)$$
Hence:
$$E_{p(x_1,x_2)}(X_1) = E_{p(x_1)}(X_1)$$
Also:
$$E_{p(x_1,x_2)}(f(X_1)) = E_{p(x_1)}(f(X_1))$$
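As a quick numerical check (the joint distribution here is an arbitrary illustrative choice): let $X_1, X_2 \in \{0,1\}$ with $p(0,0)=0.1$, $p(0,1)=0.3$, $p(1,0)=0.2$, $p(1,1)=0.4$, so that $p(x_1{=}1)=0.6$. Then
$$E_{p(x_1,x_2)}(X_1) = 0.1\cdot 0 + 0.3\cdot 0 + 0.2\cdot 1 + 0.4\cdot 1 = 0.6 = 0.4\cdot 0 + 0.6\cdot 1 = E_{p(x_1)}(X_1)$$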
$$E(X_1+X_2) = \sum_{x_1,x_2} p(x_1,x_2)\,(x_1+x_2) = \sum_{x_1,x_2} p(x_1,x_2)\,x_1 + \sum_{x_1,x_2} p(x_1,x_2)\,x_2 = \sum_{x_1} p(x_1)\,x_1 + \sum_{x_2} p(x_2)\,x_2 = E(X_1) + E(X_2)$$
hence:
$$E(X_1+X_2) = E(X_1) + E(X_2)$$
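For instance (an illustrative example with two fair dice), if $X_1$ and $X_2$ are the outcomes of two rolls, then
$$E(X_1 + X_2) = E(X_1) + E(X_2) = 3.5 + 3.5 = 7$$
Note that no independence assumption is needed for this identity.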
$$E(\lambda X) = \sum_{x} p(x)\,\lambda x = \lambda \sum_{x} p(x)\,x = \lambda E(X)$$
hence:
$$E(\lambda X) = \lambda E(X)$$
Variance and standard deviation
$$V(X) \;\overset{\text{def}}{=}\; E\big([X - E(X)]^2\big)$$
$$\sigma(X) \;\overset{\text{def}}{=}\; \sqrt{V(X)}$$
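The definitions translate directly into code. A minimal sketch follows that computes $E(X)$, $V(X)$ and $\sigma(X)$ for a discrete distribution given as value/probability pairs (a fair die is used as an arbitrary example):

#include <cmath>
#include <iostream>
#include <utility>
#include <vector>

int main() {
    // Example distribution: a fair six-sided die as (value, probability) pairs.
    std::vector<std::pair<double, double>> dist = {
        {1, 1.0 / 6}, {2, 1.0 / 6}, {3, 1.0 / 6},
        {4, 1.0 / 6}, {5, 1.0 / 6}, {6, 1.0 / 6}};

    double mean = 0.0;
    for (const auto& vp : dist)
        mean += vp.second * vp.first;                       // E(X)

    double var = 0.0;
    for (const auto& vp : dist)
        var += vp.second * (vp.first - mean) * (vp.first - mean);  // E([X - E(X)]^2)

    std::cout << "E(X) = " << mean << ", V(X) = " << var
              << ", sigma(X) = " << std::sqrt(var) << "\n";
    return 0;
}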
The meaning of standard deviation
One way to look at standard deviation is as an approximation of the "expected drift" from the expectation. The "expected drift" could be defined as:
$$ED(X) \;\overset{\text{def?}}{=}\; E(|X - E(X)|)$$
This quantity, however, is not easy to manipulate algebraically, which is one reason the squared version used for the variance is preferred.
Suppose that $X$ can take only the two values $k$ and $-k$, and that $E(X) = 0$. Then:
$$V(X) \;\overset{\text{def}}{=}\; E\big([X - E(X)]^2\big) = E(X^2) = k^2$$
and
$$\sigma(X) = \sqrt{V(X)} = k$$
and
$$ED(X) = E(|X - E(X)|) = E(|X|) = E(k) = k = \sigma(X)$$
$V$, $\sigma$ and $ED$ do not change when a constant is added, so any random variable $X$ whose drifts all have the same absolute value $k$ satisfies $\sigma(X) = ED(X)$.
Whenever the drift values are not all the same, $\sigma$ gives larger weight to the larger drifts (the drifts are squared before averaging), while $ED$ keeps a plain, equally weighted average. In fact $ED(X) = E(|X - E(X)|) \le \sqrt{E\big([X - E(X)]^2\big)} = \sigma(X)$, with equality exactly when all drifts have the same absolute value.
Example: suppose you perform the following experiment: you flip a coin; if it comes up heads you move 5 meters to the left, and if it comes up tails you move 5 meters to the right. The variance in this case is 25 and the standard deviation is 5. The expected drift is also 5 (all the drift values are equal). This example is revisited below.
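Writing the step as a random variable $X$ with $P(X = -5) = P(X = 5) = \tfrac12$ makes these numbers explicit:
$$E(X) = \tfrac12(-5) + \tfrac12(5) = 0, \qquad V(X) = \tfrac12(-5)^2 + \tfrac12(5)^2 = 25, \qquad \sigma(X) = 5, \qquad ED(X) = \tfrac12|-5| + \tfrac12|5| = 5$$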
Alternative definition of variance
$$V(X) \;\overset{\text{def}}{=}\; E\big((X - E(X))^2\big) = E\big(X^2 + E^2(X) - 2XE(X)\big) = E(X^2) + E^2(X) - 2E^2(X) = E(X^2) - E^2(X)$$
hence:
$$V(X) = E(X^2) - E^2(X)$$
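As a worked check (using a fair six-sided die, an arbitrary example): $E(X) = 3.5$ and $E(X^2) = \tfrac{1+4+9+16+25+36}{6} = \tfrac{91}{6}$, so
$$V(X) = E(X^2) - E^2(X) = \tfrac{91}{6} - 3.5^2 = \tfrac{35}{12} \approx 2.92$$
which matches the value printed by the code sketch above.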
Variance (and SD) do not change by adding a constant
$$V(X+c) = E\big([X + c - E(X+c)]^2\big) = E\big([X + c - E(X) - E(c)]^2\big) = E\big([X - E(X)]^2\big) = V(X)$$
Variance of multiplication by a constant
$$V(\lambda X) = E\big((\lambda X)^2\big) - E^2(\lambda X) = \lambda^2 E(X^2) - \lambda^2 E^2(X) = \lambda^2\big(E(X^2) - E^2(X)\big) = \lambda^2 V(X)$$
hence:
$$V(\lambda X) = \lambda^2 V(X)$$
SD of multiplication by a constant
$$\sigma(\lambda X) = \sqrt{V(\lambda X)} = \sqrt{\lambda^2 V(X)} = |\lambda|\,\sqrt{V(X)} = |\lambda|\,\sigma(X)$$
hence:
$$\sigma(\lambda X) = |\lambda|\,\sigma(X)$$
Variance of sum of random variables
$$V(X_1+X_2) = E\big((X_1+X_2)^2\big) - E^2(X_1+X_2)$$
$$= E(X_1^2) + E(X_2^2) + 2E(X_1X_2) - \big(E^2(X_1) + E^2(X_2) + 2E(X_1)E(X_2)\big)$$
$$= V(X_1) + V(X_2) + 2\,\mathrm{Cov}(X_1,X_2)$$
hence:
$$V(X_1+X_2) = V(X_1) + V(X_2) + 2\,\mathrm{Cov}(X_1,X_2)$$
When $X_1$ and $X_2$ are independent, $\mathrm{Cov}(X_1,X_2) = 0$, and hence:
$$X_1, X_2 \text{ independent} \;\Rightarrow\; V(X_1+X_2) = V(X_1) + V(X_2)$$
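For instance, for two independent rolls of a fair die (each with variance $\tfrac{35}{12}$, as computed above):
$$V(X_1 + X_2) = \tfrac{35}{12} + \tfrac{35}{12} = \tfrac{35}{6} \approx 5.83$$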
When $X_1$ and $X_2$ are i.i.d. (independent and identically distributed), then:
$$X_1, X_2 \text{ i.i.d.} \;\Rightarrow\; V(X_1+X_2) = V(X_1) + V(X_2) = 2V(X_1)$$
Or more generally:
$$X_1, X_2, \ldots, X_n \text{ i.i.d.} \;\Rightarrow\; V\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} V(X_i) = nV(X_1)$$
hence:
$$X_1, X_2, \ldots, X_n \text{ i.i.d.} \;\Rightarrow\; \sigma\left(\sum_{i=1}^{n} X_i\right) = \sqrt{V\left(\sum_{i=1}^{n} X_i\right)} = \sqrt{nV(X_1)} = \sqrt{n}\cdot\sigma(X_1)$$
Note the difference from summing the variable with itself (identically distributed but not independent):
$$V(X_1 + X_1) = V(2X_1) = 4V(X_1)$$
and
$$\sigma(X_1 + X_1) = \sigma(2X_1) = 2\sigma(X_1)$$
More on the last result
We have shown that:
$$X_1, X_2, \ldots, X_n \text{ i.i.d.} \;\Rightarrow\; \sigma\left(\sum_{i=1}^{n} X_i\right) = \sqrt{n}\cdot\sigma(X_1)$$
Why is this important?
$\sigma$ is a measure of the expected drift. The last result shows that the drift of the sum grows only like the square root of the number of experiments (sublinearly), so the mean drift per experiment tends to zero:
$$\lim_{n\to\infty} \frac{\sigma\left(\sum_{i=1}^{n} X_i\right)}{n} = \lim_{n\to\infty} \frac{\sqrt{n}\cdot\sigma(X_1)}{n} = 0$$
Recall the example of the ±5 random walk. Now suppose you repeat the step $n$ times. What is the expected drift? The standard deviation, which can be taken as a measure of that drift, is:
$$\sqrt{n}\cdot 5$$
The mean drift is:
$$5\,\frac{\sqrt{n}}{n}$$
For example, for 10000 iterations, the mean drift is
$$5\,\frac{\sqrt{10000}}{10000} = 0.05$$
meters. Instead of 5 meters per step, the mean drift is only 5 centimeters per step; the expected total drift is about $5\sqrt{10000} = 500$ meters rather than the 50,000 meters that a systematic 5-meter drift per step would give.
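A minimal simulation sketch of this (the number of steps and trials are arbitrary choices), comparing the sample standard deviation of the final position with the predicted $5\sqrt{n}$:

#include <cmath>
#include <iostream>
#include <random>

int main() {
    const int n = 10000;      // steps per walk (+-5 meters each)
    const int trials = 2000;  // number of independent walks
    std::mt19937 gen(42);
    std::bernoulli_distribution coin(0.5);

    double sum = 0.0, sumsq = 0.0;
    for (int t = 0; t < trials; ++t) {
        double pos = 0.0;
        for (int i = 0; i < n; ++i) pos += coin(gen) ? 5.0 : -5.0;  // one +-5 step
        sum += pos;
        sumsq += pos * pos;
    }
    const double mean = sum / trials;
    const double sd = std::sqrt(sumsq / trials - mean * mean);  // sample sd of final position

    std::cout << "sample sd of final position: " << sd << "\n";
    std::cout << "predicted 5*sqrt(n):         " << 5.0 * std::sqrt(static_cast<double>(n)) << "\n";
    return 0;
}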
TODO: add a gnuplot picture of the ±5 random walk; discuss the relation to the law of large numbers (is the convergence of relative frequencies an assumption of probability theory or a result?).
$$V(X) = 0 \;\Leftrightarrow\; E\big([X - E(X)]^2\big) = 0 \;\Leftrightarrow\; x - E(X) = 0 \text{ for every } x \text{ with } p(x) > 0 \;\Leftrightarrow\; X \text{ is constant.}$$
hence:
$$V(X) = 0 \;\Leftrightarrow\; X \text{ is constant.}$$
Alternative definition of covariance
$$\mathrm{Cov}(X_1,X_2) = E\big((X_1 - E(X_1))(X_2 - E(X_2))\big) = E(X_1X_2) + E(X_1)E(X_2) - 2E(X_1)E(X_2) = E(X_1X_2) - E(X_1)E(X_2)$$
hence:
$$\mathrm{Cov}(X_1,X_2) = E(X_1X_2) - E(X_1)E(X_2)$$
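As a quick numerical check (an arbitrary small joint distribution): let $X_1, X_2 \in \{0,1\}$ with $p(0,0)=0.1$, $p(0,1)=0.3$, $p(1,0)=0.2$, $p(1,1)=0.4$. Then $E(X_1)=0.6$, $E(X_2)=0.7$ and $E(X_1X_2)=0.4$, so
$$\mathrm{Cov}(X_1,X_2) = 0.4 - 0.6\cdot 0.7 = -0.02$$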
A special case is the covariance of a random variable with itself:
$$\mathrm{Cov}(X,X) = E(XX) - E(X)E(X) = V(X)$$
Covariance of independent variables
Assume that $X_1$ and $X_2$ are independent:
$$E(X_1X_2) = \sum_{x_1,x_2} p(x_1,x_2)\,x_1x_2 = \sum_{x_1,x_2} p(x_1)\,p(x_2)\,x_1x_2 = \sum_{x_1} p(x_1)\,x_1 \sum_{x_2} p(x_2)\,x_2 = E(X_1)E(X_2)$$
And hence:
$$X_1, X_2 \text{ independent} \;\implies\; \mathrm{Cov}(X_1,X_2) = 0$$
The converse is not true, however. For example, let $X$ take the values $-1, 0, 1$ with equal probability and let $Y = X^2$. Then
$$\mathrm{Cov}(X,Y) = E(XY) - E(X)E(Y) = E(X^3) - E(X)E(X^2) = 0 - 0\cdot E(X^2) = 0$$
but $X$ and $Y$ are very much dependent, since $Y$ is a function of $X$.
Wiener process (also known as "Brownian motion")
Let $Z$ be a stochastic process with the following properties:
1. The change $\delta Z$ during a small period of time $\delta t$ is
$$\delta Z = \epsilon\cdot\sqrt{\delta t}$$
where $\epsilon \sim \phi(0,1)$ (a normal distribution with mean 0 and standard deviation 1).
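A minimal sketch of how such a path can be simulated, assuming the increments are drawn from a standard normal distribution as above (the time horizon and step count are arbitrary choices):

#include <cmath>
#include <iostream>
#include <random>

int main() {
    const double T = 1.0;    // total time
    const int steps = 1000;  // number of small intervals delta t
    const double dt = T / steps;

    std::mt19937 gen(7);
    std::normal_distribution<double> eps(0.0, 1.0);  // epsilon ~ phi(0,1)

    double z = 0.0;
    for (int i = 0; i < steps; ++i)
        z += eps(gen) * std::sqrt(dt);               // delta Z = epsilon * sqrt(delta t)

    std::cout << "Z(T) after " << steps << " steps: " << z << "\n";
    return 0;
}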
Summary
Expectation
$$E_{p(x_1,x_2)}(X_1) = E_{p(x_1)}(X_1)$$
$$E(X_1+X_2) = E(X_1) + E(X_2)$$
$$E(\lambda X) = \lambda E(X)$$
Variance and standard deviation
$$V(X) = E(X^2) - E^2(X)$$
$$V(\lambda X) = \lambda^2 V(X)$$
$$\sigma(\lambda X) = |\lambda|\,\sigma(X)$$
$$V(X_1+X_2) = V(X_1) + V(X_2) + 2\,\mathrm{Cov}(X_1,X_2)$$
$$X_1, X_2 \text{ independent} \;\Rightarrow\; V(X_1+X_2) = V(X_1) + V(X_2)$$
$$X_1, X_2, \ldots, X_n \text{ i.i.d.} \;\Rightarrow\; V\left(\sum_{i=1}^{n} X_i\right) = nV(X_1)$$
$$X_1, X_2, \ldots, X_n \text{ i.i.d.} \;\Rightarrow\; \sigma\left(\sum_{i=1}^{n} X_i\right) = \sqrt{n}\cdot\sigma(X_1)$$
$$\mathrm{Cov}(X_1,X_2) = E(X_1X_2) - E(X_1)E(X_2)$$
#include <iostream>

int main() {
    std::cout << "hello lord\n";
    return 0;
}
The determinant is the area of the parallelogram
Let $\mathbf{x_1} = (x_1, y_1)$ and $\mathbf{x_2} = (x_2, y_2)$ be two vectors in $\mathbb{R}^2$. We will show that the determinant $x_1\cdot y_2 - x_2\cdot y_1$ equals the area of the parallelogram they span.
Let $\mathbf{a}$ be a vector orthogonal to $\mathbf{x_1}$ and of norm equal to 1:
$$\mathbf{a} = \frac{(-y_1, x_1)}{\|\mathbf{x_1}\|}$$
(A word about left/right-handed systems: choosing $(y_1, -x_1)$ instead would only flip the sign of the result, so this choice fixes the orientation.)
Let $S$ be the area of the parallelogram:
$$S = \langle\mathbf{a}, \mathbf{x_2}\rangle \cdot \|\mathbf{x_1}\| = \langle \|\mathbf{x_1}\|\,\mathbf{a}, \mathbf{x_2}\rangle = \langle(-y_1, x_1), (x_2, y_2)\rangle = -y_1x_2 + x_1y_2$$
Alternatively, let $\mathbf{N}$ be the vector representing the height of the parallelogram (the component of $\mathbf{x_2}$ orthogonal to $\mathbf{x_1}$):
$$\mathbf{N} = \mathbf{x_2} - \frac{\langle\mathbf{x_1}, \mathbf{x_2}\rangle}{\|\mathbf{x_1}\|}\,\frac{\mathbf{x_1}}{\|\mathbf{x_1}\|} = \mathbf{x_2} - \frac{\langle\mathbf{x_1}, \mathbf{x_2}\rangle}{\|\mathbf{x_1}\|^2}\,\mathbf{x_1}$$
Then the area $S$ of the parallelogram is:
$$S = \|\mathbf{N}\|\cdot\|\mathbf{x_1}\| = \left\|\mathbf{x_2} - \frac{\langle\mathbf{x_1},\mathbf{x_2}\rangle}{\|\mathbf{x_1}\|^2}\,\mathbf{x_1}\right\|\cdot\|\mathbf{x_1}\| = \frac{\left\|\|\mathbf{x_1}\|^2\,\mathbf{x_2} - \langle\mathbf{x_1},\mathbf{x_2}\rangle\,\mathbf{x_1}\right\|}{\|\mathbf{x_1}\|} = \frac{\left\|\langle\mathbf{x_1},\mathbf{x_1}\rangle\,\mathbf{x_2} - \langle\mathbf{x_1},\mathbf{x_2}\rangle\,\mathbf{x_1}\right\|}{\|\mathbf{x_1}\|}$$
$$S^2 = \frac{\left\|\langle\mathbf{x_1},\mathbf{x_1}\rangle\,\mathbf{x_2} - \langle\mathbf{x_1},\mathbf{x_2}\rangle\,\mathbf{x_1}\right\|^2}{\|\mathbf{x_1}\|^2} = \frac{\left\langle \langle\mathbf{x_1},\mathbf{x_1}\rangle\,\mathbf{x_2} - \langle\mathbf{x_1},\mathbf{x_2}\rangle\,\mathbf{x_1},\; \langle\mathbf{x_1},\mathbf{x_1}\rangle\,\mathbf{x_2} - \langle\mathbf{x_1},\mathbf{x_2}\rangle\,\mathbf{x_1}\right\rangle}{\langle\mathbf{x_1},\mathbf{x_1}\rangle}$$
$$= \frac{\langle\mathbf{x_1},\mathbf{x_1}\rangle^2\langle\mathbf{x_2},\mathbf{x_2}\rangle + \langle\mathbf{x_1},\mathbf{x_2}\rangle^2\langle\mathbf{x_1},\mathbf{x_1}\rangle - 2\,\langle\mathbf{x_1},\mathbf{x_1}\rangle\langle\mathbf{x_1},\mathbf{x_2}\rangle^2}{\langle\mathbf{x_1},\mathbf{x_1}\rangle}$$
$$= \langle\mathbf{x_1},\mathbf{x_1}\rangle\langle\mathbf{x_2},\mathbf{x_2}\rangle + \langle\mathbf{x_1},\mathbf{x_2}\rangle^2 - 2\,\langle\mathbf{x_1},\mathbf{x_2}\rangle^2 = \langle\mathbf{x_1},\mathbf{x_1}\rangle\langle\mathbf{x_2},\mathbf{x_2}\rangle - \langle\mathbf{x_1},\mathbf{x_2}\rangle^2$$
$$= (x_1^2 + y_1^2)(x_2^2 + y_2^2) - (x_1x_2 + y_1y_2)^2 = x_1^2x_2^2 + y_1^2y_2^2 + x_1^2y_2^2 + y_1^2x_2^2 - \left(x_1^2x_2^2 + y_1^2y_2^2 + 2\,x_1x_2y_1y_2\right)$$
$$= x_1^2y_2^2 + y_1^2x_2^2 - 2\,x_1x_2y_1y_2 = (x_1y_2 - x_2y_1)^2$$
$$S = x_1y_2 - x_2y_1$$
(up to sign: the determinant is positive when $\mathbf{x_1}, \mathbf{x_2}$ are positively oriented, and its absolute value is the area in general).
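As a concrete check (arbitrary example vectors): take $\mathbf{x_1} = (2, 0)$ and $\mathbf{x_2} = (1, 3)$. The parallelogram has base 2 along the $x$-axis and height 3, so its area is 6, and indeed
$$x_1y_2 - x_2y_1 = 2\cdot 3 - 1\cdot 0 = 6$$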
Why (−1) · (−1) = 1
$$0 = 0\cdot(-1) = \big(1 + (-1)\big)\cdot(-1) = 1\cdot(-1) + (-1)\cdot(-1) = (-1) + (-1)\cdot(-1)$$
Therefore:
$$(-1)\cdot(-1) = 1$$
First step: 0 times any number is 0.
Second step: 0 is the sum of additive inverses, $0 = 1 + (-1)$.
Third step: distributivity.
Fourth step: 1 is the multiplicative identity.
Conclusion: $(-1)\cdot(-1)$ is the additive inverse of $-1$, which is 1.