Mathematical Foundations: Probability Theory

Fundamentals

Definition of probability (axioms of probability):

  • Sample space $\Omega$: the set of all possible outcomes of a random experiment.
  • Set of events (or event space) $\mathcal{F}$: a set whose elements $A \in \mathcal{F}$ (called events) are subsets of $\Omega$ (i.e., each $A \subseteq \Omega$ is a collection of possible outcomes of the experiment).
  • Probability measure: a function $P : \mathcal{F} \to \mathbb{R}$ that satisfies the following properties:
    • $P(A) \geq 0$ for all $A \in \mathcal{F}$
    • $P(\Omega) = 1$
    • If $A_1, \ldots, A_k$ are disjoint events (i.e., $A_i \cap A_j = \emptyset$ whenever $i \neq j$), then $P(\bigcup_i A_i) = \sum_i P(A_i)$

Intuition: take rolling a die as an example. $\Omega$ is the set of all possible outcomes $\{1, 2, 3, 4, 5, 6\}$, and $\mathcal{F}$ contains the various events, e.g. $\{1, 2, 3, 4\}$, {the roll is odd}, {the roll is even}, and so on.
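To make the die example concrete, here is a minimal Python sketch (not part of the original notes) that models the fair die with the uniform probability measure $P(A) = |A| / |\Omega|$ and checks the axioms on a few events:

```python
from fractions import Fraction

# Sample space for a single roll of a fair die.
omega = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability measure for the fair die: P(A) = |A| / |Omega|."""
    return Fraction(len(event & omega), len(omega))

evens = {2, 4, 6}
odds = {1, 3, 5}

# Axiom checks on this toy example.
assert prob(evens) >= 0                                  # non-negativity
assert prob(omega) == 1                                  # P(Omega) = 1
assert prob(evens | odds) == prob(evens) + prob(odds)    # additivity for disjoint events

print(prob({1, 2, 3, 4}))  # 2/3
```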

Basic properties of probability:

  1. If $A \subseteq B$, then $P(A) \leq P(B)$.
  2. $P(A \cap B) \leq \min(P(A), P(B))$.
  3. $P(A \cup B) \leq P(A) + P(B)$.
  4. $P(\Omega \setminus A) = P(\bar{A}) = 1 - P(A)$.
  5. If $A_1, \ldots, A_k$ are disjoint events such that $\bigcup_{i=1}^{k} A_i = \Omega$, then $\sum_{i=1}^{k} P(A_i) = 1$.

Conditional probability:

$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$

$P(A \mid B)$ is the probability measure of the event $A$ after observing the occurrence of event $B$. Two events are independent if and only if $P(A \cap B) = P(A)P(B)$, or equivalently $P(A \mid B) = P(A)$.
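A small sketch extending the fair-die example above, checking the definition of conditional probability and the independence criterion numerically (the events $A$ and $B$ are arbitrary illustrative choices that happen to be independent):

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}

def prob(event):
    return Fraction(len(event & omega), len(omega))

def cond_prob(a, b):
    """P(A | B) = P(A ∩ B) / P(B)."""
    return prob(a & b) / prob(b)

A = {1, 2}       # roll is 1 or 2
B = {2, 4, 6}    # roll is even

print(cond_prob(A, B))                    # 1/3, which equals P(A)
print(prob(A & B) == prob(A) * prob(B))   # True: A and B are independent
```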

Random Variables

Consider flipping a coin 10 times. The sample space $\Omega$ consists of all ordered sequences of heads and tails, e.g. $\omega_0 = \langle H, H, T, H, T, H, H, T, T, T \rangle \in \Omega$. In practice, we usually do not care about the probability of one particular sequence; we care about real-valued functions of outcomes, such as the number of heads among the 10 flips, or the length of the longest run of tails. These functions are random variables. So a random variable is a function (see the sketch after the definitions below)!
A random variable $X$ is a function $X : \Omega \to \mathbb{R}$.

  • Discrete random variable: $P(X = k) := P(\{\omega : X(\omega) = k\})$

  • Continuous random variable: $P(a \leq X \leq b) := P(\{\omega : a \leq X(\omega) \leq b\})$
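To emphasize that a random variable is just a function of outcomes, the following illustrative sketch (assuming a fair coin) enumerates all $2^{10}$ outcomes of ten flips and computes $P(X = k)$ for $X$ = number of heads directly from the definition above:

```python
from itertools import product
from fractions import Fraction

# Sample space: all 2**10 ordered sequences of heads/tails.
omega = list(product("HT", repeat=10))

def X(outcome):
    """Random variable: number of heads in the sequence."""
    return outcome.count("H")

def pmf(k):
    """P(X = k) = P({omega : X(omega) = k}) under a fair coin."""
    favourable = [w for w in omega if X(w) == k]
    return Fraction(len(favourable), len(omega))

print(pmf(5))   # 252/1024 = 63/256
```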

CDFs, PDFs, PMFs

1. Cumulative distribution function (CDF) is a function $F_X : \mathbb{R} \to [0, 1]$ such that

$F_X(x) \triangleq P(X \leq x)$

Using this function, one can calculate the probability of any event in $\mathcal{F}$.
Properties:
  • $0 \leq F_X(x) \leq 1$.
  • $\lim_{x \to -\infty} F_X(x) = 0$.
  • $\lim_{x \to \infty} F_X(x) = 1$.
  • $x \leq y \Rightarrow F_X(x) \leq F_X(y)$.

2. Probability mass function (PMF) is a function $p_X : \Omega \to \mathbb{R}$ such that

$p_X(x) \triangleq P(X = x)$

Properties:
  • $0 \leq p_X(x) \leq 1$.
  • $\sum_{x \in \text{Val}(X)} p_X(x) = 1$, where $\text{Val}(X)$ is the set of all possible values that $X$ may assume.
  • $\sum_{x \in A} p_X(x) = P(X \in A)$.

3. Probability density function (PDF) is the derivative of the CDF:

$f_X(x) \triangleq \frac{dF_X(x)}{dx}$

The PDF of a continuous random variable may not always exist (i.e., if $F_X(x)$ is not differentiable everywhere). For very small $\Delta x$,
$P(x \leq X \leq x + \Delta x) \approx f_X(x)\,\Delta x$

The value of the PDF at any given point $x$ is not the probability of that event, i.e., $f_X(x) \neq P(X = x)$; in fact $f_X(x)$ can take on values larger than one.
Properties:
  • $f_X(x) \geq 0$.
  • $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$.
  • $\int_{x \in A} f_X(x)\,dx = P(X \in A)$.
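The sketch below uses scipy.stats (an assumption; any numerical library would do) to illustrate the relationships above: integrating a PDF up to $x$ recovers the CDF, a PDF value can exceed 1, and a PMF sums to 1 over $\text{Val}(X)$.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

# Continuous case: integrating the PDF of N(0, 1) up to x recovers the CDF.
x = 1.3
area, _ = quad(stats.norm.pdf, -np.inf, x)
print(np.isclose(area, stats.norm.cdf(x)))                     # True

# The PDF itself is not a probability and can exceed 1 (narrow Gaussian).
print(stats.norm(loc=0, scale=0.1).pdf(0.0))                   # ~3.99 > 1

# Discrete case: PMF values of Binomial(10, 0.5) sum to 1.
k = np.arange(0, 11)
print(np.isclose(stats.binom.pmf(k, n=10, p=0.5).sum(), 1.0))  # True
```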

Expectation

Suppose $X$ is a discrete random variable with PMF $p_X(x)$ and $g : \mathbb{R} \to \mathbb{R}$ is an arbitrary function. In this case, $g(X)$ can be considered a random variable, and we define the expectation or expected value of $g(X)$ as

$E[g(X)] \triangleq \sum_{x \in \text{Val}(X)} g(x)\, p_X(x)$

If $X$ is a continuous random variable with PDF $f_X(x)$, then the expected value of $g(X)$ is defined as
$E[g(X)] \triangleq \int_{-\infty}^{\infty} g(x)\, f_X(x)\,dx$

The expectation of $g(X)$ can be thought of as a "weighted average" of the values that $g(x)$ can take on for different values of $x$, where the weights are given by $p_X(x)$ or $f_X(x)$. As a special case, $E[X]$ is the mean of the random variable $X$.
Properties:
  • $E[a] = a$ for any constant $a \in \mathbb{R}$.
  • $E[a f(X)] = a E[f(X)]$ for any constant $a \in \mathbb{R}$.
  • $E[f(X) + g(X)] = E[f(X)] + E[g(X)]$.
  • For a discrete random variable $X$, $E[\mathbf{1}\{X = k\}] = P(X = k)$.
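As a quick illustration of the discrete definition of $E[g(X)]$ and the properties above, here is a sketch using the fair-die PMF from the earlier example (the helper name E is of course arbitrary):

```python
from fractions import Fraction

# PMF of a fair die: p_X(x) = 1/6 for x in {1,...,6}.
p = {x: Fraction(1, 6) for x in range(1, 7)}

def E(g):
    """E[g(X)] = sum over Val(X) of g(x) * p_X(x)."""
    return sum(g(x) * px for x, px in p.items())

mean = E(lambda x: x)            # E[X] = 7/2
print(mean)
print(E(lambda x: x * x))        # E[X^2] = 91/6
# Linearity: E[aX + b] = a E[X] + b
print(E(lambda x: 3 * x + 2) == 3 * mean + 2)   # True
```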

Variance

The variance of a random variable $X$ measures how concentrated the distribution of $X$ is around its mean:

$\text{Var}[X] \triangleq E[(X - E[X])^2]$

An alternate expression:
$E[(X - E[X])^2] = E[X^2] - E[X]^2$

Properties:
  • Var[a]=0 for any constant aR .
  • Var[af(X)]=a2Var[f(X)] for an constant aR .
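A sketch checking that the two expressions for the variance agree, and that scaling by a constant $a$ scales the variance by $a^2$, again using the fair-die PMF:

```python
from fractions import Fraction

p = {x: Fraction(1, 6) for x in range(1, 7)}   # fair die PMF

def E(g):
    return sum(g(x) * px for x, px in p.items())

mean = E(lambda x: x)
var_def = E(lambda x: (x - mean) ** 2)          # E[(X - E[X])^2]
var_alt = E(lambda x: x * x) - mean ** 2        # E[X^2] - E[X]^2
print(var_def, var_alt, var_def == var_alt)     # 35/12 both ways

# Var[aX] = a^2 Var[X]: scaling by 3 multiplies the variance by 9.
a = 3
var_scaled = E(lambda x: (a * x - a * mean) ** 2)
print(var_scaled == a ** 2 * var_def)            # True
```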

Some common random variables

Discrete random variables (using a coin whose probability of landing heads on a single flip is $p$ as the running example; a sampling sketch follows the lists below):

  • Bernoulli distribution, $X \sim \text{Bernoulli}(p)$ (where $0 \leq p \leq 1$): the outcome of a single coin flip,
    $p(x) = \begin{cases} p & \text{if } x = 1 \\ 1 - p & \text{if } x = 0 \end{cases}$
  • Binomial distribution, $X \sim \text{Binomial}(n, p)$ (where $0 \leq p \leq 1$): the number of heads in $n$ flips,
    $p(x) = \binom{n}{x} p^x (1 - p)^{n - x}$
  • Geometric distribution, $X \sim \text{Geometric}(p)$ (where $p > 0$): the number of flips until the first head appears,
    $p(x) = p (1 - p)^{x - 1}$
  • Poisson distribution, $X \sim \text{Poisson}(\lambda)$ (where $\lambda > 0$): a probability distribution over the nonnegative integers used for modeling the frequency of rare events,
    $p(x) = e^{-\lambda} \frac{\lambda^x}{x!}$

Continuous random variables:
  • Uniform distribution, $X \sim \text{Uniform}(a, b)$ (where $a < b$): equal probability density at every point between $a$ and $b$,
    $f(x) = \begin{cases} \frac{1}{b - a} & \text{if } a \leq x \leq b \\ 0 & \text{otherwise} \end{cases}$
  • Exponential distribution, $X \sim \text{Exponential}(\lambda)$ (where $\lambda > 0$): the density decays as $x$ increases,
    $f(x) = \begin{cases} \lambda e^{-\lambda x} & \text{if } x \geq 0 \\ 0 & \text{otherwise} \end{cases}$
  • Normal (Gaussian) distribution, $X \sim \text{Normal}(\mu, \sigma^2)$:
    $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{1}{2\sigma^2}(x - \mu)^2}$
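The sketch below samples from each distribution listed above using scipy.stats and compares sample means with the theoretical ones. Note that scipy's parameterizations differ slightly from the formulas above (e.g. expon takes scale = 1/λ and uniform takes loc = a, scale = b − a); the concrete parameter values are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200_000

# Draw samples from each distribution and compare sample means to theory.
cases = {
    "Bernoulli(0.3)":    (stats.bernoulli(p=0.3),        0.3),
    "Binomial(10, 0.3)": (stats.binom(n=10, p=0.3),      10 * 0.3),
    "Geometric(0.3)":    (stats.geom(p=0.3),             1 / 0.3),
    "Poisson(4)":        (stats.poisson(mu=4),           4.0),
    "Uniform(2, 5)":     (stats.uniform(loc=2, scale=3), (2 + 5) / 2),
    "Exponential(2)":    (stats.expon(scale=1 / 2),      1 / 2),
    "Normal(1, 4)":      (stats.norm(loc=1, scale=2),    1.0),
}

for name, (dist, mean_theory) in cases.items():
    sample_mean = dist.rvs(size=n, random_state=rng).mean()
    print(f"{name:20s} sample mean {sample_mean:7.4f}  theory {mean_theory:7.4f}")
```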

[Figure: PDF and CDF of a couple of random variables.]

[Table: summary of some of the properties of these distributions.]

Two Random Variables

Joint and marginal distributions

Joint cumulative distribution function of $X$ and $Y$:

$F_{XY}(x, y) = P(X \leq x, Y \leq y)$

Properties:
  • $0 \leq F_{XY}(x, y) \leq 1$
  • $\lim_{x, y \to \infty} F_{XY}(x, y) = 1$
  • $\lim_{x, y \to -\infty} F_{XY}(x, y) = 0$
  • $F_X(x) = \lim_{y \to \infty} F_{XY}(x, y)$
  • $F_Y(y) = \lim_{x \to \infty} F_{XY}(x, y)$

$F_X(x)$ and $F_Y(y)$ are the marginal cumulative distribution functions of $F_{XY}(x, y)$.

Joint and marginal probability mass functions

Joint probability mass function $p_{XY} : \mathbb{R} \times \mathbb{R} \to [0, 1]$:

$p_{XY}(x, y) = P(X = x, Y = y)$

Properties:
  • $0 \leq p_{XY}(x, y) \leq 1$
  • $\sum_{x \in \text{Val}(X)} \sum_{y \in \text{Val}(Y)} p_{XY}(x, y) = 1$
  • $p_X(x) = \sum_y p_{XY}(x, y)$ (marginalization)
  • $p_Y(y) = \sum_x p_{XY}(x, y)$

$p_X(x)$ is the marginal probability mass function of $X$.
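A small numerical sketch of marginalization: the joint PMF below is an arbitrary 2×3 table, and summing out one variable yields the marginal PMF of the other.

```python
import numpy as np

# A small joint PMF p_XY over Val(X) = {0, 1} (rows) and Val(Y) = {0, 1, 2} (columns).
p_xy = np.array([[0.10, 0.20, 0.10],
                 [0.25, 0.15, 0.20]])
assert np.isclose(p_xy.sum(), 1.0)       # entries sum to 1

# Marginalization: sum out the other variable.
p_x = p_xy.sum(axis=1)                   # p_X(x) = sum_y p_XY(x, y)
p_y = p_xy.sum(axis=0)                   # p_Y(y) = sum_x p_XY(x, y)
print(p_x)   # [0.4 0.6]
print(p_y)   # [0.35 0.35 0.3]
```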

Joint and marginal probability density functions

Joint probability density function:

$f_{XY}(x, y) = \frac{\partial^2 F_{XY}(x, y)}{\partial x\, \partial y}$

Properties:
  • $f_{XY}(x, y) \neq P(X = x, Y = y)$
  • $\iint_{(x, y) \in A} f_{XY}(x, y)\,dx\,dy = P((X, Y) \in A)$
  • $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{XY}(x, y)\,dx\,dy = 1$
  • $f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x, y)\,dy$
  • $f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x, y)\,dx$

$f_X(x)$ is the marginal probability density function of $X$.

Conditional distributions and Bayes’s rule

The conditional probability mass function of $Y$ given $X$, assuming that $p_X(x) \neq 0$:

$p_{Y|X}(y \mid x) = \frac{p_{XY}(x, y)}{p_X(x)}$

The conditional probability density of $Y$ given $X$, assuming that $f_X(x) \neq 0$:

$f_{Y|X}(y \mid x) = \frac{f_{XY}(x, y)}{f_X(x)}$

Bayes's rule derives an expression for the conditional probability of one variable given another. (See the post on Bayesian decision theory for details.)
For discrete random variables $X$ and $Y$:
$P_{Y|X}(y \mid x) = \frac{P_{XY}(x, y)}{P_X(x)} = \frac{P_{X|Y}(x \mid y)\, P_Y(y)}{\sum_{y' \in \text{Val}(Y)} P_{X|Y}(x \mid y')\, P_Y(y')}$

For continuous random variables $X$ and $Y$:
$f_{Y|X}(y \mid x) = \frac{f_{XY}(x, y)}{f_X(x)} = \frac{f_{X|Y}(x \mid y)\, f_Y(y)}{\int_{-\infty}^{\infty} f_{X|Y}(x \mid y')\, f_Y(y')\,dy'}$
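Using the same (arbitrary) joint PMF as in the marginalization sketch, the following checks that computing $p_{Y|X}(y \mid x)$ directly from the definition agrees with computing it via Bayes's rule:

```python
import numpy as np

# Same joint PMF as in the marginalization sketch: rows are x, columns are y.
p_xy = np.array([[0.10, 0.20, 0.10],
                 [0.25, 0.15, 0.20]])
p_x = p_xy.sum(axis=1)
p_y = p_xy.sum(axis=0)

x = 0
# Direct definition: p_{Y|X}(y | x) = p_XY(x, y) / p_X(x).
p_y_given_x = p_xy[x] / p_x[x]

# Bayes's rule: p_{Y|X}(y | x) = p_{X|Y}(x | y) p_Y(y) / sum_y' p_{X|Y}(x | y') p_Y(y').
p_x_given_y = p_xy[x] / p_y                 # p_{X|Y}(x | y) for each y
numerator = p_x_given_y * p_y
p_y_given_x_bayes = numerator / numerator.sum()

print(np.allclose(p_y_given_x, p_y_given_x_bayes))   # True
```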

Independence

Two random variables X and Y are independent:

  • If $F_{XY}(x, y) = F_X(x)\, F_Y(y)$ for all values of $x$ and $y$.
  • For discrete random variables:
    • $p_{XY}(x, y) = p_X(x)\, p_Y(y)$ for all $x \in \text{Val}(X)$, $y \in \text{Val}(Y)$
    • $p_{Y|X}(y \mid x) = p_Y(y)$ whenever $p_X(x) \neq 0$, for all $y \in \text{Val}(Y)$
  • For continuous random variables:
    • $f_{XY}(x, y) = f_X(x)\, f_Y(y)$ for all $x, y \in \mathbb{R}$
    • $f_{Y|X}(y \mid x) = f_Y(y)$ whenever $f_X(x) \neq 0$, for all $y \in \mathbb{R}$

If $X$ and $Y$ are independent, then for any subsets $A, B \subseteq \mathbb{R}$:

$P(X \in A, Y \in B) = P(X \in A)\, P(Y \in B)$

If $X$ is independent of $Y$, then any function of $X$ is independent of any function of $Y$.
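For discrete variables, independence means the joint PMF factors into the product of its marginals. The sketch below checks this factorization for an independent joint PMF (built as an outer product) and for the dependent joint PMF used in the earlier sketches:

```python
import numpy as np

# A joint PMF built as an outer product is independent by construction.
p_x = np.array([0.4, 0.6])
p_y = np.array([0.35, 0.35, 0.30])
p_xy_indep = np.outer(p_x, p_y)
print(np.allclose(p_xy_indep, np.outer(p_xy_indep.sum(axis=1),
                                       p_xy_indep.sum(axis=0))))        # True

# The joint PMF from the earlier sketches does NOT factor, so X and Y are dependent.
p_xy = np.array([[0.10, 0.20, 0.10],
                 [0.25, 0.15, 0.20]])
print(np.allclose(p_xy, np.outer(p_xy.sum(axis=1), p_xy.sum(axis=0))))  # False
```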

Expectation and covariance

Suppose $X$ and $Y$ are two discrete random variables and $g : \mathbb{R}^2 \to \mathbb{R}$ is a function of them. Then the expected value of $g$ is defined as

$E[g(X, Y)] \triangleq \sum_{x \in \text{Val}(X)} \sum_{y \in \text{Val}(Y)} g(x, y)\, p_{XY}(x, y)$

For continuous random variables X , Y , the analogous expression is
$E[g(X, Y)] \triangleq \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y)\, f_{XY}(x, y)\,dx\,dy$

To measure the relationship of two random variables with each other, the covariance of $X$ and $Y$ is defined as:
$\text{Cov}[X, Y] \triangleq E[(X - E[X])(Y - E[Y])]$

We can rewrite this as:
$\text{Cov}[X, Y] = E[(X - E[X])(Y - E[Y])]$
$= E[XY - X E[Y] - Y E[X] + E[X] E[Y]]$
$= E[XY] - E[X] E[Y] - E[Y] E[X] + E[X] E[Y]$
$= E[XY] - E[X] E[Y]$

Properties:
  • (Linearity of expectation) $E[f(X, Y) + g(X, Y)] = E[f(X, Y)] + E[g(X, Y)]$
  • $\text{Var}[X + Y] = \text{Var}[X] + \text{Var}[Y] + 2\,\text{Cov}[X, Y]$
  • If $X$ and $Y$ are independent, then $\text{Cov}[X, Y] = 0$.
  • If $X$ and $Y$ are independent, then $E[f(X)\, g(Y)] = E[f(X)]\, E[g(Y)]$.
  • If $\text{Cov}[X, Y] = 0$, we say that $X$ and $Y$ are uncorrelated. This is not the same thing as stating that $X$ and $Y$ are independent. For example, if $X \sim \text{Uniform}(-1, 1)$ and $Y = X^2$, then one can show that $X$ and $Y$ are uncorrelated, even though they are not independent.
  • In other words, uncorrelatedness and independence are not equivalent by definition. Independence is a sufficient but not necessary condition for uncorrelatedness: independence implies uncorrelatedness, but the converse does not hold. Independence of two random variables means $p(x, y) = p(x)\, p(y)$, whereas being uncorrelated simply means the covariance is 0 (see the Monte Carlo sketch below).
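A Monte Carlo sketch of the last two points: with $X \sim \text{Uniform}(-1, 1)$ and $Y = X^2$, the sample covariance is near zero (uncorrelated), yet conditioning on $X$ completely determines $Y$ (not independent). The sample size and thresholds here are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=500_000)
y = x ** 2

# Cov[X, Y] = E[XY] - E[X]E[Y]; for Y = X^2 this is E[X^3] = 0 by symmetry.
cov = np.mean(x * y) - np.mean(x) * np.mean(y)
print(f"sample covariance ~ {cov:.4f}")      # close to 0: uncorrelated

# Yet X and Y are clearly not independent: knowing X determines Y exactly,
# e.g. P(Y > 0.25 | |X| < 0.5) = 0 while P(Y > 0.25) ~ 0.5.
print(np.mean(y > 0.25))                     # ~ 0.5
print(np.mean(y[np.abs(x) < 0.5] > 0.25))    # 0.0
```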