Probability theory
- Probability is a mathematical framework for reasoning about uncertainty
- Probabilistic models
— sample space:
1.) “List” (set) of possible outcomes
2.) “List” must be: mutually exclusive and collectively exhaustive
3.) Art: to be at the “right” granularity
— Probability law - Axioms of probability
- Simple examples
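A minimal sketch of a probabilistic model, assuming one roll of a fair six-sided die as the experiment. The die, the uniform law, and the names `sample_space`, `prob`, `probability`, and `satisfies_axioms` are illustrative choices, not from the notes. It checks the three standard axioms: nonnegativity, normalization, and additivity for disjoint events.

```python
from fractions import Fraction

# Hypothetical experiment: one roll of a fair six-sided die.
# The outcomes are mutually exclusive and collectively exhaustive.
sample_space = {1, 2, 3, 4, 5, 6}

# Probability law (assumed uniform here): one probability per outcome.
prob = {outcome: Fraction(1, 6) for outcome in sample_space}


def probability(event):
    """Probability of an event, i.e. of a subset of the sample space."""
    return sum(prob[outcome] for outcome in event)


def satisfies_axioms():
    """Check nonnegativity, normalization, and additivity on one disjoint pair."""
    nonnegative = all(p >= 0 for p in prob.values())
    normalized = probability(sample_space) == 1
    even, odd = {2, 4, 6}, {1, 3, 5}  # disjoint events
    additive = probability(even | odd) == probability(even) + probability(odd)
    return nonnegative and normalized and additive


print(probability({2, 4, 6}))
print(satisfies_axioms())
```

Exact fractions are used instead of floats so that the normalization check is an exact equality.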
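Histograms
A histogram in particular gives a better view of the data
- Divide the range of the data (after inspecting it) into intervals, called "class intervals"
- Build a frequency table: a good summary, but it does not really show the shape of the distribution
Sorting the data into intervals is a better way to view it: the histogram (bar width, height, area)
Skewed distributions
Examples skewed to the right: house prices, income, body weight
Skewed to the left: the share of total spending that goes to food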
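The class-interval and frequency-table steps can be sketched as follows. The data and the interval edges are made up for illustration, and the bar heights assume the density convention (area of a bar = share of the data in that interval), which is one common reading of "width, height, area".

```python
# Hypothetical data: 12 made-up observations (not from the notes).
data = [2, 3, 5, 7, 8, 11, 12, 13, 14, 18, 21, 24]

# Class intervals chosen by inspection: [0,5), [5,10), [10,15), [15,20), [20,25)
edges = [0, 5, 10, 15, 20, 25]


def frequency_table(values, edges):
    """Count how many values fall in each half-open interval [lo, hi)."""
    table = []
    for lo, hi in zip(edges, edges[1:]):
        count = sum(lo <= v < hi for v in values)
        table.append((lo, hi, count))
    return table


def bar_heights(table, n):
    """Bar height such that width * height equals the share of the data."""
    return [count / n / (hi - lo) for lo, hi, count in table]


table = frequency_table(data, edges)
heights = bar_heights(table, len(data))
for (lo, hi, count), h in zip(table, heights):
    print(f"[{lo:>2}, {hi:>2})  count={count}  height={h:.3f}")
```

With this convention the bar areas add up to 1, so intervals of unequal width can still be compared fairly.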
- Percentiles
- Quartiles (25%)
- Median
From a histogram, these three values can only be estimated
Which value to use depends on what you most want to express
(it depends on what you mean by "best")
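When the raw list is available (rather than only a histogram), these values can be computed directly. A small sketch using Python's `statistics` module on made-up data; the `method="inclusive"` interpolation rule is an assumption, and other conventions give slightly different quartiles. The helper `percentile` is a hypothetical name.

```python
import statistics

# Hypothetical data (not from the notes).
data = [9, 1, 4, 7, 2, 8, 3, 6, 5]


def percentile(values, p):
    """p-th percentile for an integer p in 1..99, inclusive interpolation."""
    cut_points = statistics.quantiles(values, n=100, method="inclusive")
    return cut_points[p - 1]


# The three quartile cut points: 25%, 50%, 75%.
quartiles = statistics.quantiles(data, n=4, method="inclusive")
median = statistics.median(data)

print("quartiles:", quartiles)
print("median:", median)
print("90th percentile:", percentile(data, 90))
```

The median is the same number as the 50th percentile and the second quartile.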
- Mean: the best representation of a list of numbers, closest to every element of the list
Can loosely be thought of as the balance point
- Median: the value in the middle position
- Mode: the value that occurs most often
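A short sketch of the three measures on a made-up list. It also illustrates the "balance point" reading of the mean: the deviations from the mean add up to zero. The data and variable names are illustrative only.

```python
import statistics

# Hypothetical list (not from the notes).
data = [1, 2, 2, 3, 7]

mean = statistics.mean(data)      # balance point of the list
median = statistics.median(data)  # middle value once the list is sorted
mode = statistics.mode(data)      # value that occurs most often

# Balance point: deviations above and below the mean cancel out.
deviations = [x - mean for x in data]

print("mean:", mean, "median:", median, "mode:", mode)
print("sum of deviations:", sum(deviations))
```

Here the single large value 7 pulls the mean above the median, which is why the choice between them depends on what "best" is supposed to mean.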
Reduction to one number:
the ultimate reduction of a list of numbers to a single number.
- Range
The largest observation minus the smallest observation
- Interquartile range (IQR)
The 75th percentile minus the 25th percentile.
In effect, it describes the middle 50% of the data
- Standard deviation (SD)
The root mean square of the deviations of the list from the mean of the list
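For the range and the IQR described above, a minimal sketch on made-up data. The quartiles use the `method="inclusive"` rule of `statistics.quantiles`, which is an assumption; other quartile conventions can give a slightly different IQR.

```python
import statistics

# Hypothetical data (not from the notes).
data = [2, 4, 4, 5, 7, 9, 10, 12, 15]

# Range: largest observation minus smallest observation.
data_range = max(data) - min(data)

# IQR: 75th percentile minus 25th percentile.
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print("range:", data_range)
print("IQR:", iqr)
```

The range reacts to a single extreme value, while the IQR only looks at the middle 50% of the data.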
Def of RMS:
Root mean square: the square root of the mean of the squares (a way of removing the signs of the list elements)
Ex of RMS:
| data | -1 | -5 | 0 | 5 | 1 |
|---|---|---|---|---|---|
The RMS of the list is
sqrt(((-1)^2 + (-5)^2 + 0^2 + 5^2 + 1^2)/5) = sqrt(52/5) ≈ 3.22
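The RMS of this example list can be checked with a few lines of Python. The function name `rms` is an illustrative choice; the steps follow the definition literally: square, take the mean, take the square root.

```python
import math


def rms(values):
    """Root mean square: square root of the mean of the squares."""
    squares = [v * v for v in values]          # squaring removes the signs
    mean_of_squares = sum(squares) / len(squares)
    return math.sqrt(mean_of_squares)


data = [-1, -5, 0, 5, 1]  # the list from the RMS example
print(round(rms(data), 2))
```

Note that the plain mean of this list is 0, while the RMS is not, because squaring stops positive and negative entries from cancelling.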
Ex of SD:
| data | 6 | 3 | 0 | 2 | 4 | 1 |
|---|---|---|---|---|---|---|
1. The MEAN of the list is
(6+3+0+2+4+1)/6 = 16/6 ≈ 2.67
2. The list of DEVIATIONS from the MEAN is
{(6-2.67), (3-2.67), (0-2.67), (2-2.67), (4-2.67), (1-2.67)}
= {3.33, 0.33, -2.67, -0.67, 1.33, -1.67}
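The same two steps, followed by taking the RMS of the deviations, can be sketched in Python. This assumes the SD as defined in these notes (the RMS of the deviations, dividing by the number of entries), which matches `statistics.pstdev`; the sample SD that divides by n-1 would be slightly larger. The function name `sd` is illustrative.

```python
import math
import statistics


def sd(values):
    """SD as the root mean square of the deviations from the mean."""
    mean = sum(values) / len(values)
    deviations = [v - mean for v in values]
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))


data = [6, 3, 0, 2, 4, 1]  # the list from the SD example
mean = sum(data) / len(data)
deviations = [round(v - mean, 2) for v in data]

print("mean:", round(mean, 2))
print("deviations:", deviations)
print("SD:", round(sd(data), 2))
```

The unrounded mean is used inside `sd`, so rounding to 2.67 only affects what is printed, not the result.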