日萌社
Artificial Intelligence (AI): Keras, PyTorch, MXNet, TensorFlow, PaddlePaddle Deep Learning in Practice (updated irregularly)
Ensemble learning: Bagging, random forest, Boosting, GBDT
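5.1 Principles of the xgboost algorithm
XGBoost (Extreme Gradient Boosting, i.e. extreme gradient-boosted trees) is the ace among ensemble learning methods: many winning solutions in Kaggle data-mining competitions used XGBoost.
XGBoost performs very well on a wide range of regression and classification problems. This section introduces its algorithmic principles in some detail.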
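1 How to build an optimal model
As we already know, the general way to build an optimal model is to minimize the loss function on the training data.
Denoting the loss by L, this is:
$$\min_{f \in F} \frac{1}{N} \sum_{i=1}^{N} L\big(y_i, f(x_i)\big) \tag{1.1}$$
where F is the hypothesis space.
The hypothesis space is the exhaustive set of all hypotheses that could satisfy the target, given the known attributes and their possible values.
Formula (1.1) is called empirical risk minimization. Models trained this way tend to be highly complex, and when the training set is small they easily overfit.
Therefore, to reduce model complexity, the following formula is commonly used instead:
$$\min_{f \in F} \frac{1}{N} \sum_{i=1}^{N} L\big(y_i, f(x_i)\big) + \lambda J(f) \tag{2.1}$$
where J(f) is the complexity of the model and λ ≥ 0 weights it against the empirical risk.
Formula (2.1) is called structural risk minimization. Models obtained by structural risk minimization usually predict well on both the training data and unseen test data.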
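To make the difference between (1.1) and (2.1) concrete, here is a minimal sketch. It assumes squared error for L and the squared L2 norm of a linear model's weights for J(f); both choices and the toy numbers are illustrative assumptions, not definitions from this text.

```python
import numpy as np

def empirical_risk(y, y_pred):
    """Formula (1.1) with L = squared error: mean loss over the training set."""
    return float(np.mean((y - y_pred) ** 2))

def structural_risk(y, y_pred, weights, lam):
    """Formula (2.1): empirical risk plus lam * J(f), with J(f) assumed to be sum(w^2)."""
    return empirical_risk(y, y_pred) + lam * float(np.sum(weights ** 2))

X = np.array([[1.0], [2.0], [3.0]])
y = np.array([1.0, 2.0, 3.0])
w = np.array([1.0])      # linear model f(x) = x @ w, which fits the data exactly
y_pred = X @ w

print(empirical_risk(y, y_pred))            # 0.0
print(structural_risk(y, y_pred, w, 0.5))   # 0.5
```

The perfect fit has zero empirical risk, but the structural risk still charges it for its weights, which is what lets (2.1) prefer simpler models.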
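Applications:
- Growing and pruning a decision tree correspond to empirical risk minimization and structural risk minimization, respectively.
- Tree growing in XGBoost is the result of structural risk minimization; this is covered in detail later.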
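2 Deriving the XGBoost objective function
2.1 Defining the objective function
The objective function is the loss function: the optimal model is built by minimizing it.
As discussed above, the loss should include a regularization term that represents model complexity, and the XGBoost model consists of multiple CART trees, so the objective function of the model is:
$$Obj = \sum_{i=1}^{N} L(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k), \qquad \hat{y}_i = \sum_{k=1}^{K} f_k(x_i)$$
where f_k is the k-th of the K trees and Ω(f_k) is its complexity.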
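2.2 Introduction to CART trees
2.3 Defining tree complexity
2.3.1 Complexity of each tree
The XGBoost model consists of multiple CART trees; the complexity of each tree is defined as:
$$\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2$$
where T is the number of leaves, w_j is the score of leaf j, and γ and λ are regularization coefficients.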
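Assuming the standard definition from the XGBoost paper, Ω(f) = γT + ½λΣ w_j², where T is the number of leaves and w_j is the score of leaf j, a tree's complexity depends only on its leaf scores. The sketch below computes it for one tree and sums it over several trees, which gives the regularization part of the objective in 2.1. The leaf scores are assumed values.

```python
def tree_complexity(leaf_weights, gamma, lam):
    """Omega(f) = gamma * T + 0.5 * lam * sum of squared leaf scores."""
    T = len(leaf_weights)                      # number of leaves
    l2 = sum(w * w for w in leaf_weights)      # sum of w_j^2
    return gamma * T + 0.5 * lam * l2

def total_complexity(trees, gamma, lam):
    """Regularization part of Obj: sum of Omega(f_k) over all K trees."""
    return sum(tree_complexity(leaves, gamma, lam) for leaves in trees)

# Assumed leaf scores: a 3-leaf tree (2, 0.1, -1) and a 2-leaf tree (0.9, -0.9)
trees = [[2.0, 0.1, -1.0], [0.9, -0.9]]
print(tree_complexity(trees[0], gamma=1.0, lam=1.0))
print(total_complexity(trees, gamma=1.0, lam=1.0))
```

Raising γ penalizes every extra leaf, while raising λ penalizes large leaf scores; both push the learner toward simpler trees.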
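2.3.2 Tree complexity example
Suppose we want to predict how much each member of a family likes video games. Young people are more likely than older people to like video games, and males more than females, so we first split by age into children and adults, then split by sex into male and female, and give each person a score for how much they like video games, as shown in the figure below:
This trains two trees, tree1 and tree2. As with GBDT earlier, the final prediction is the sum of the two trees' outputs, so:
- The little boy's prediction is the sum of the scores of the leaves he falls into in the two trees: 2 + 0.9 = 2.9.
- The grandfather's prediction is computed the same way: -1 + (-0.9) = -1.9.
As shown in the figure below: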
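The additive prediction can be sketched in Python. The text only says tree1 splits on age and then on sex; the age threshold (15), the girls' leaf score (0.1) and tree2's rule (whether the person uses a computer every day) are hypothetical assumptions, chosen so the two worked examples reproduce 2.9 and -1.9.

```python
def tree1(person):
    # Split on age first, then on sex (the threshold 15 is an assumption)
    if person["age"] < 15:
        return 2.0 if person["is_male"] else 0.1   # 0.1 for girls is an assumed score
    return -1.0                                    # adults

def tree2(person):
    # Hypothetical rule: does the person use a computer every day?
    return 0.9 if person["uses_computer_daily"] else -0.9

def predict(person, trees=(tree1, tree2)):
    # Additive model: the final score is the sum of the leaf scores of all trees
    return sum(tree(person) for tree in trees)

boy = {"age": 8, "is_male": True, "uses_computer_daily": True}
grandpa = {"age": 70, "is_male": True, "uses_computer_daily": False}
print(predict(boy))      # 2.9
print(predict(grandpa))  # -1.9
```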
2.4 Deriving the objective function
3 How XGBoost builds regression trees
3.1 Computing node splits
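During training, when building the t-th tree, XGBoost splits tree nodes greedily.
Starting from tree depth 0:
- Try to split every leaf node of the tree;
- Each split turns one leaf into two child leaves, and the samples in the original leaf are distributed to the left and right children according to the node's decision rule;
- After each new split, check whether the split brings a gain to the objective. The gain is defined as:
$$Gain = \frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma$$
where G_L and G_R are the sums of the first derivatives of the loss over the samples in the left and right child, H_L and H_R are the corresponding sums of second derivatives, and λ and γ are the coefficients of Ω(f).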
If Gain > 0, i.e. the objective decreases after the leaf is split into two, we consider keeping this split.
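To make the gain concrete, here is a small sketch of the standard XGBoost formulas: Gain = ½[G_L²/(H_L+λ) + G_R²/(H_R+λ) − (G_L+G_R)²/(H_L+H_R+λ)] − γ, and the optimal leaf score w* = −G/(H+λ), where G and H are the sums of first and second derivatives of the loss over a node's samples. The leaf-score formula comes from the objective derivation (Section 2.4, not reproduced here), and the numbers are made up for illustration.

```python
def leaf_weight(G, H, lam):
    """Optimal leaf score w* = -G / (H + lam)."""
    return -G / (H + lam)

def split_gain(G_L, H_L, G_R, H_R, lam, gamma):
    """Objective reduction from splitting one leaf into a left and a right child."""
    def score(G, H):
        return G * G / (H + lam)
    parent = score(G_L + G_R, H_L + H_R)
    return 0.5 * (score(G_L, H_L) + score(G_R, H_R) - parent) - gamma

# Children pulling in opposite directions: splitting helps a lot
print(split_gain(-4.0, 2.0, 4.0, 2.0, lam=1.0, gamma=0.0))   # 16/3
# The same split with a large gamma is rejected (Gain < 0)
print(split_gain(-4.0, 2.0, 4.0, 2.0, lam=1.0, gamma=6.0))
```

Because γ is subtracted once per split, it is the same "minimum loss reduction" that the gamma (min_split_loss) parameter controls.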
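When does this splitting stop?
3.2 Stopping criteria
Case 1: The scoring function derived in the previous section measures how good a tree structure is, so it can be used to choose the best split point. First determine all candidate split points of the sample features and split at each candidate in turn; the quality of each split is measured by its Gain (Section 3.1). The candidate with the largest Gain is the best split point, and if no candidate has a positive Gain, the node is not split.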
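The split search in Case 1 can be sketched for a single feature as the "exact greedy" approach: sort the samples by the feature, accumulate G_L and H_L with prefix sums, and evaluate Gain at every cut between distinct values. This is a simplified sketch assuming squared-error loss (so g_i = ŷ_i − y_i and h_i = 1) and the Gain formula of Section 3.1; the real library also offers approximate and histogram-based split finding.

```python
import numpy as np

def best_split_one_feature(x, g, h, lam=1.0, gamma=0.0):
    """Return (threshold, gain) of the best cut on one feature, or (None, -inf)."""
    order = np.argsort(x)
    x_s, g_s, h_s = x[order], g[order], h[order]
    G, H = g_s.sum(), h_s.sum()
    G_L = np.cumsum(g_s)[:-1]          # left-child sums for each candidate cut
    H_L = np.cumsum(h_s)[:-1]
    G_R, H_R = G - G_L, H - H_L
    gains = 0.5 * (G_L**2 / (H_L + lam) + G_R**2 / (H_R + lam)
                   - G**2 / (H + lam)) - gamma
    valid = x_s[:-1] < x_s[1:]         # cannot cut between two equal values
    if not valid.any():
        return None, -np.inf
    gains = np.where(valid, gains, -np.inf)
    i = int(np.argmax(gains))
    return float((x_s[i] + x_s[i + 1]) / 2), float(gains[i])

# Squared-error loss with current prediction 0: g = y_hat - y, h = 1
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 0.0, 1.0, 1.0])
y_hat = np.zeros_like(y)
g, h = y_hat - y, np.ones_like(y)
print(best_split_one_feature(x, g, h))   # threshold 2.5
```

Repeating this over every feature and keeping the overall best cut gives the node's split; if that best gain is not positive, the node stays a leaf.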
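4 Differences between XGBoost and GBDT
- Difference 1:
  - XGBoost takes tree complexity into account when generating CART trees;
  - GBDT does not consider it while growing trees; it only accounts for tree complexity in the pruning step.
- Difference 2:
  - XGBoost fits a second-order expansion of the previous round's loss, while GBDT uses only the first-order expansion (the gradient). As a result, XGBoost is usually more accurate and needs fewer iterations to reach the same training effect.
- Difference 3:
  - Both XGBoost and GBDT improve the model iteratively, one tree at a time, but XGBoost can use multiple threads when searching for the best split point, which greatly speeds up training.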
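To illustrate Difference 2, the sketch below computes the first derivative g and second derivative h of the log loss with respect to the raw score, assuming a binary classification task. With all samples in one leaf, XGBoost's second-order leaf value is −G/(H+λ) (λ = 1 assumed), while a least-squares fit to the negative gradient alone, before any line search, would give its mean. The labels are illustrative.

```python
import numpy as np

def logistic_grad_hess(y, margin):
    """First and second derivatives of log loss w.r.t. the raw score (margin)."""
    p = 1.0 / (1.0 + np.exp(-margin))   # predicted probability
    grad = p - y                          # first-order information
    hess = p * (1.0 - p)                  # second-order information
    return grad, hess

y = np.array([1.0, 0.0, 1.0, 1.0])
margin = np.zeros_like(y)                 # initial raw score 0, i.e. p = 0.5
g, h = logistic_grad_hess(y, margin)

lam = 1.0
w_second_order = -g.sum() / (h.sum() + lam)   # Newton-style leaf value
w_first_order = -g.mean()                     # mean negative gradient
print(w_second_order, w_first_order)          # 0.5 0.25
```

The Hessian rescales the step by how curved the loss is at the current prediction, which is one reason the second-order method often needs fewer rounds.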
5 Summary
5.2 Introduction to the xgboost API
1 Installing xgboost:
Official site: https://xgboost.readthedocs.io/en/latest/
pip3 install xgboost
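2 xgboost parameters
Although xgboost is known as the "secret weapon" of Kaggle competitions, training a good model still requires passing appropriate values to its parameters.
xgboost wraps many parameters, which fall into three types: general parameters, booster parameters and learning task parameters (task parameters).
- General parameters: control the overall behavior;
- Booster parameters: depend on the chosen booster and control the booster (tree or linear regression) used at each step;
- Learning task parameters: control the training objective and how it is evaluated.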
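2.1 General parameters
- booster [default=gbtree]
  - Which booster to use: gbtree, gblinear or dart.
  - gbtree and dart use tree-based models (dart additionally applies dropout), while gblinear uses linear functions.
- silent [default=0]
  - 0 prints running messages; 1 enables silent mode and prints nothing. (Newer xgboost versions replace silent with verbosity, from 0 = silent to 3 = debug.)
- nthread [default=the maximum number of available threads]
  - Number of threads xgboost runs in parallel. The value should be no larger than the number of CPU cores; if it is not set, xgboost detects the cores and uses all of them.

The following two parameters do not need to be set; the defaults are fine:
- num_pbuffer [set automatically by xgboost; no need to set it]
  - Size of the prediction buffer, usually the number of training instances. The buffer stores the prediction results of the last boosting step.
- num_feature [set automatically by xgboost; no need to set it]
  - Feature dimension used in boosting, set to the maximum feature dimension.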
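2.2 Booster parameters
2.2.1 Parameters for Tree Booster
- eta [default=0.3, alias: learning_rate]
  - Shrinks the step size of each update to prevent overfitting.
  - After each boosting step the weights of the new features are available directly, and eta shrinks them, which makes the boosting process more conservative (robust).
  - Range: [0,1]
- gamma [default=0, alias: min_split_loss]
  - A node is split only if the split reduces the loss function.
  - gamma specifies the minimum loss reduction required for a split. The larger the value, the more conservative the algorithm. A suitable value depends on the loss function, so it needs tuning.
  - Range: [0,∞]
- max_depth [default=6]
  - Maximum depth of each tree, also used to avoid overfitting: the larger max_depth, the more specific and local the patterns the model learns. 0 means no limit.
  - Range: [0,∞]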
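- min_child_weight [default=1]
  - Minimum sum of instance weights (Hessian) required in a child node.
  - Larger values keep the model from learning locally peculiar samples, but values that are too large cause underfitting. Tune it with cross-validation (CV).
  - Range: [0,∞]
- subsample [default=1]
  - Fraction of the training samples randomly sampled for each tree.
  - Lowering it makes the algorithm more conservative and helps avoid overfitting, but a value that is too small may cause underfitting.
  - Typical values: 0.5-1. A value of 0.5 means half of the training data is randomly sampled before growing each tree, which helps prevent overfitting.
  - Range: (0,1]
- colsample_bytree [default=1]
  - Fraction of columns (each column is a feature) randomly sampled for each tree.
  - Typical values: 0.5-1
  - Range: (0,1]
- colsample_bylevel [default=1]
  - Fraction of columns sampled for the splits at each level of the tree.
  - I rarely use this parameter myself, because subsample and colsample_bytree serve a similar purpose, but it may be worth exploring if you are interested.
  - Range: (0,1]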
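- lambda [default=1, alias: reg_lambda]
  - L2 regularization term on weights (similar to Ridge regression).
  - Controls the regularization part of XGBoost. Although many data scientists rarely use it, it can still be useful for reducing overfitting.
- alpha [default=0, alias: reg_alpha]
  - L1 regularization term on weights (similar to Lasso regression). It can be used with very high-dimensional data to make the algorithm run faster.
- scale_pos_weight [default=1]
  - When the classes are highly imbalanced, setting this to a positive value can make the algorithm converge faster. A typical value is the number of negative samples divided by the number of positive samples.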
2.2.2 Parameters for Linear Booster
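The linear booster is rarely used.
- lambda [default=0, alias: reg_lambda]
  - L2 regularization coefficient; increasing it makes the model more conservative.
- alpha [default=0, alias: reg_alpha]
  - L1 regularization coefficient; increasing it makes the model more conservative.
- lambda_bias [default=0, alias: reg_lambda_bias]
  - L2 regularization on the bias term (there is no L1 term on the bias because it is not important).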
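2.3 Learning task parameters
- objective [default=reg:linear]
  - “reg:linear” – linear regression (renamed “reg:squarederror” in newer versions, where it is the default)
  - “reg:logistic” – logistic regression
  - “binary:logistic” – binary logistic regression, outputs probabilities
  - “multi:softmax” – multiclass classification with softmax, returns the predicted class (not probabilities). In this case you must also set num_class (the number of classes).
  - “multi:softprob” – same as multi:softmax, but returns the probability of each data point belonging to each class.
- eval_metric [default: chosen according to the objective]
  Available options include:
  - “rmse”: root mean squared error
  - “mae”: mean absolute error
  - “logloss”: negative log-likelihood
  - “error”: binary classification error rate.
    - Computed as the number of misclassified cases divided by the total number of cases. Predictions greater than 0.5 are treated as positive, the rest as negative.
  - “error@t”: the threshold can be set to a different value t instead of 0.5
  - “merror”: multiclass error rate, computed as (wrong cases)/(all cases)
  - “mlogloss”: multiclass log loss
  - “auc”: area under the (ROC) curve
- seed [default=0]
  - Random number seed
  - Setting it makes results reproducible; it is also useful when tuning parameters. (In the scikit-learn wrapper, e.g. XGBClassifier, the corresponding argument is random_state.)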
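5.3 xgboost case study
1 Background
This case is the same as the one used earlier for decision trees.
The sinking of the Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the Titanic sank after colliding with an iceberg, killing 1502 of the 2224 passengers and crew. This sensational tragedy shocked the international community and led to better safety regulations for ships. One reason the shipwreck caused such loss of life was that there were not enough lifeboats for the passengers and crew. Although there was some element of luck involved in surviving the sinking, some groups of people were more likely to survive than others, such as women, children and the upper class. In this case, we ask you to analyze what sorts of people were likely to survive. In particular, we ask you to apply machine learning tools to predict which passengers survived the tragedy.
The features in the extracted dataset include ticket, survival (survived), passenger class (pclass), age, port of embarkation (embarked), home.dest, room, boat and sex.
Data: http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic.txt
Observing the data, we find:
- 1 The passenger class (1st, 2nd, 3rd) is a proxy for socio-economic status.
- 2 The age column has missing values.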
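2 Steps
- 1. Get the data
- 2. Basic data processing
  - 2.1 Determine the features and the target
  - 2.2 Handle missing values
  - 2.3 Split the dataset
- 3. Feature engineering (dictionary feature extraction)
- 4. Machine learning (xgboost)
- 5. Model evaluation
3 Code implementation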
- Import the required modules
```python
import pandas as pd
import numpy as np
from sklearn.feature_extraction import DictVectorizer
from sklearn.model_selection import train_test_split
```
- 1. Get the data
```python
# 1. Get the data
titan = pd.read_csv("http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic.txt")
```
- 2. Basic data processing
  - 2.1 Determine the features and the target
```python
x = titan[["pclass", "age", "sex"]]
y = titan["survived"]
```
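  - 2.2 Handle missing values
```python
# Fill missing ages with the mean age; categorical features are handled later by dictionary feature extraction.
# assign() returns a new DataFrame, which avoids pandas' chained-assignment problem on the x slice.
x = x.assign(age=x["age"].fillna(x["age"].mean()))
```
  - 2.3 Split the dataset
```python
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=22)
```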
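- 3. Feature engineering (dictionary feature extraction)

The features contain categorical values, which need one-hot encoding (DictVectorizer). x.to_dict(orient="records") converts the feature table into a list of dictionaries.
```python
# Each row becomes a dict, e.g.
# [{"pclass": "1st", "age": 29.00, "sex": "female"}, {...}]
transfer = DictVectorizer(sparse=False)
x_train = transfer.fit_transform(x_train.to_dict(orient="records"))
# Use transform (not fit_transform) on the test set so it gets the same columns as the training set
x_test = transfer.transform(x_test.to_dict(orient="records"))
```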
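- 4. Train and evaluate the xgboost model
```python
# Initial training
from xgboost import XGBClassifier
xg = XGBClassifier()
xg.fit(x_train, y_train)
print(xg.score(x_test, y_test))

# Tune max_depth (start at 1: max_depth=0 means "no limit", not depth 0)
depth_range = range(1, 11)
score = []
for i in depth_range:
    # learning_rate is the scikit-learn name of eta
    xg = XGBClassifier(learning_rate=1, gamma=0, max_depth=i)
    xg.fit(x_train, y_train)
    s = xg.score(x_test, y_test)
    print(s)
    score.append(s)

# Visualize the results
import matplotlib.pyplot as plt
plt.plot(depth_range, score)
plt.xlabel("max_depth")
plt.ylabel("test accuracy")
plt.show()
```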
In [1]:
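```python
# 1. Get the data
# 2. Basic data processing
# 2.1 Determine the features and the target
# 2.2 Handle missing values
# 2.3 Split the dataset
# 3. Feature engineering (dictionary feature extraction)
# 4. Machine learning (xgboost)
# 5. Model evaluation
```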
In [2]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier, export_graphviz
In [3]:
```python
# 1. Get the data
titan = pd.read_csv("http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic.txt")
```
In [4]:
titan
Out[4]:
|  | row.names | pclass | survived | name | age | embarked | home.dest | room | ticket | boat | sex |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1st | 1 | Allen, Miss Elisabeth Walton | 29.0000 | Southampton | St Louis, MO | B-5 | 24160 L221 | 2 | female |
1 | 2 | 1st | 0 | Allison, Miss Helen Loraine | 2.0000 | Southampton | Montreal, PQ / Chesterville, ON | C26 | NaN | NaN | female |
2 | 3 | 1st | 0 | Allison, Mr Hudson Joshua Creighton | 30.0000 | Southampton | Montreal, PQ / Chesterville, ON | C26 | NaN | (135) | male |
3 | 4 | 1st | 0 | Allison, Mrs Hudson J.C. (Bessie Waldo Daniels) | 25.0000 | Southampton | Montreal, PQ / Chesterville, ON | C26 | NaN | NaN | female |
4 | 5 | 1st | 1 | Allison, Master Hudson Trevor | 0.9167 | Southampton | Montreal, PQ / Chesterville, ON | C22 | NaN | 11 | male |
5 | 6 | 1st | 1 | Anderson, Mr Harry | 47.0000 | Southampton | New York, NY | E-12 | NaN | 3 | male |
6 | 7 | 1st | 1 | Andrews, Miss Kornelia Theodosia | 63.0000 | Southampton | Hudson, NY | D-7 | 13502 L77 | 10 | female |
7 | 8 | 1st | 0 | Andrews, Mr Thomas, jr | 39.0000 | Southampton | Belfast, NI | A-36 | NaN | NaN | male |
8 | 9 | 1st | 1 | Appleton, Mrs Edward Dale (Charlotte Lamson) | 58.0000 | Southampton | Bayside, Queens, NY | C-101 | NaN | 2 | female |
9 | 10 | 1st | 0 | Artagaveytia, Mr Ramon | 71.0000 | Cherbourg | Montevideo, Uruguay | NaN | NaN | (22) | male |
10 | 11 | 1st | 0 | Astor, Colonel John Jacob | 47.0000 | Cherbourg | New York, NY | NaN | 17754 L224 10s 6d | (124) | male |
11 | 12 | 1st | 1 | Astor, Mrs John Jacob (Madeleine Talmadge Force) | 19.0000 | Cherbourg | New York, NY | NaN | 17754 L224 10s 6d | 4 | female |
12 | 13 | 1st | 1 | Aubert, Mrs Leontine Pauline | NaN | Cherbourg | Paris, France | B-35 | 17477 L69 6s | 9 | female |
13 | 14 | 1st | 1 | Barkworth, Mr Algernon H. | NaN | Southampton | Hessle, Yorks | A-23 | NaN | B | male |
14 | 15 | 1st | 0 | Baumann, Mr John D. | NaN | Southampton | New York, NY | NaN | NaN | NaN | male |
15 | 16 | 1st | 1 | Baxter, Mrs James (Helene DeLaudeniere Chaput) | 50.0000 | Cherbourg | Montreal, PQ | B-58/60 | NaN | 6 | female |
16 | 17 | 1st | 0 | Baxter, Mr Quigg Edmond | 24.0000 | Cherbourg | Montreal, PQ | B-58/60 | NaN | NaN | male |
17 | 18 | 1st | 0 | Beattie, Mr Thomson | 36.0000 | Cherbourg | Winnipeg, MN | C-6 | NaN | NaN | male |
18 | 19 | 1st | 1 | Beckwith, Mr Richard Leonard | 37.0000 | Southampton | New York, NY | D-35 | NaN | 5 | male |
19 | 20 | 1st | 1 | Beckwith, Mrs Richard Leonard (Sallie Monypeny) | 47.0000 | Southampton | New York, NY | D-35 | NaN | 5 | female |
20 | 21 | 1st | 1 | Behr, Mr Karl Howell | 26.0000 | Cherbourg | New York, NY | C-148 | NaN | 5 | male |
21 | 22 | 1st | 0 | Birnbaum, Mr Jakob | 25.0000 | Cherbourg | San Francisco, CA | NaN | NaN | (148) | male |
22 | 23 | 1st | 1 | Bishop, Mr Dickinson H. | 25.0000 | Cherbourg | Dowagiac, MI | B-49 | NaN | 7 | male |
23 | 24 | 1st | 1 | Bishop, Mrs Dickinson H. (Helen Walton) | 19.0000 | Cherbourg | Dowagiac, MI | B-49 | NaN | 7 | female |
24 | 25 | 1st | 1 | Bjornstrm-Steffansson, Mr Mauritz Hakan | 28.0000 | Southampton | Stockholm, Sweden / Washington, DC | NaN | NaN | D | male |
25 | 26 | 1st | 0 | Blackwell, Mr Stephen Weart | 45.0000 | Southampton | Trenton, NJ | NaN | NaN | (241) | male |
26 | 27 | 1st | 1 | Blank, Mr Henry | 39.0000 | Cherbourg | Glen Ridge, NJ | A-31 | NaN | 7 | male |
27 | 28 | 1st | 1 | Bonnell, Miss Caroline | 30.0000 | Southampton | Youngstown, OH | C-7 | NaN | 8 | female |
28 | 29 | 1st | 1 | Bonnell, Miss Elizabeth | 58.0000 | Southampton | Birkdale, England Cleveland, Ohio | C-103 | NaN | 8 | female |
29 | 30 | 1st | 0 | Borebank, Mr John James | NaN | Southampton | London / Winnipeg, MB | D-21/2 | NaN | NaN | male |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1283 | 1284 | 3rd | 0 | Vestrom, Miss Hulda Amanda Adolfina | NaN | NaN | NaN | NaN | NaN | NaN | female |
1284 | 1285 | 3rd | 0 | Vonk, Mr Jenko | NaN | NaN | NaN | NaN | NaN | NaN | male |
1285 | 1286 | 3rd | 0 | Ware, Mr Frederick | NaN | NaN | NaN | NaN | NaN | NaN | male |
1286 | 1287 | 3rd | 0 | Warren, Mr Charles William | NaN | NaN | NaN | NaN | NaN | NaN | male |
1287 | 1288 | 3rd | 0 | Wazli, Mr Yousif | NaN | NaN | NaN | NaN | NaN | NaN | male |
1288 | 1289 | 3rd | 0 | Webber, Mr James | NaN | NaN | NaN | NaN | NaN | NaN | male |
1289 | 1290 | 3rd | 1 | Wennerstrom, Mr August Edvard | NaN | NaN | NaN | NaN | NaN | NaN | male |
1290 | 1291 | 3rd | 0 | Wenzel, Mr Linhart | NaN | NaN | NaN | NaN | NaN | NaN | male |
1291 | 1292 | 3rd | 0 | Widegren, Mr Charles Peter | NaN | NaN | NaN | NaN | NaN | NaN | male |
1292 | 1293 | 3rd | 0 | Wiklund, Mr Jacob Alfred | NaN | NaN | NaN | NaN | NaN | NaN | male |
1293 | 1294 | 3rd | 1 | Wilkes, Mrs Ellen | NaN | NaN | NaN | NaN | NaN | NaN | female |
1294 | 1295 | 3rd | 0 | Willer, Mr Aaron | NaN | NaN | NaN | NaN | NaN | NaN | male |
1295 | 1296 | 3rd | 0 | Willey, Mr Edward | NaN | NaN | NaN | NaN | NaN | NaN | male |
1296 | 1297 | 3rd | 0 | Williams, Mr Howard Hugh | NaN | NaN | NaN | NaN | NaN | NaN | male |
1297 | 1298 | 3rd | 0 | Williams, Mr Leslie | NaN | NaN | NaN | NaN | NaN | NaN | male |
1298 | 1299 | 3rd | 0 | Windelov, Mr Einar | NaN | NaN | NaN | NaN | NaN | NaN | male |
1299 | 1300 | 3rd | 0 | Wirz, Mr Albert | NaN | NaN | NaN | NaN | NaN | NaN | male |
1300 | 1301 | 3rd | 0 | Wiseman, Mr Phillippe | NaN | NaN | NaN | NaN | NaN | NaN | male |
1301 | 1302 | 3rd | 0 | Wittevrongel, Mr Camiel | NaN | NaN | NaN | NaN | NaN | NaN | male |
1302 | 1303 | 3rd | 1 | Yalsevac, Mr Ivan | NaN | NaN | NaN | NaN | NaN | NaN | male |
1303 | 1304 | 3rd | 0 | Yasbeck, Mr Antoni | NaN | NaN | NaN | NaN | NaN | NaN | male |
1304 | 1305 | 3rd | 1 | Yasbeck, Mrs Antoni | NaN | NaN | NaN | NaN | NaN | NaN | female |
1305 | 1306 | 3rd | 0 | Youssef, Mr Gerios | NaN | NaN | NaN | NaN | NaN | NaN | male |
1306 | 1307 | 3rd | 0 | Zabour, Miss Hileni | NaN | NaN | NaN | NaN | NaN | NaN | female |
1307 | 1308 | 3rd | 0 | Zabour, Miss Tamini | NaN | NaN | NaN | NaN | NaN | NaN | female |
1308 | 1309 | 3rd | 0 | Zakarian, Mr Artun | NaN | NaN | NaN | NaN | NaN | NaN | male |
1309 | 1310 | 3rd | 0 | Zakarian, Mr Maprieder | NaN | NaN | NaN | NaN | NaN | NaN | male |
1310 | 1311 | 3rd | 0 | Zenn, Mr Philip | NaN | NaN | NaN | NaN | NaN | NaN | male |
1311 | 1312 | 3rd | 0 | Zievens, Rene | NaN | NaN | NaN | NaN | NaN | NaN | female |
1312 | 1313 | 3rd | 0 | Zimmerman, Leo | NaN | NaN | NaN | NaN | NaN | NaN | male |
1313 rows × 11 columns
In [5]:
titan.describe()
Out[5]:
|  | row.names | survived | age |
---|---|---|---|
count | 1313.000000 | 1313.000000 | 633.000000 |
mean | 657.000000 | 0.341965 | 31.194181 |
std | 379.174762 | 0.474549 | 14.747525 |
min | 1.000000 | 0.000000 | 0.166700 |
25% | 329.000000 | 0.000000 | 21.000000 |
50% | 657.000000 | 0.000000 | 30.000000 |
75% | 985.000000 | 1.000000 | 41.000000 |
max | 1313.000000 | 1.000000 | 71.000000 |
In [6]:
```python
# 2. Basic data processing
# 2.1 Determine the features and the target
x = titan[["pclass", "age", "sex"]]
y = titan["survived"]
```
In [7]:
x.head()
Out[7]:
|  | pclass | age | sex |
---|---|---|---|
0 | 1st | 29.0000 | female |
1 | 1st | 2.0000 | female |
2 | 1st | 30.0000 | male |
3 | 1st | 25.0000 | female |
4 | 1st | 0.9167 | male |
In [8]:
y.head()
Out[8]:
0 1
1 0
2 0
3 0
4 1
Name: survived, dtype: int64
In [9]:
```python
# 2.2 Handle missing values (assign() avoids modifying a slice of titan in place)
x = x.assign(age=x["age"].fillna(titan["age"].mean()))
```
In [10]:
x.head()
Out[10]:
|  | pclass | age | sex |
---|---|---|---|
0 | 1st | 29.0000 | female |
1 | 1st | 2.0000 | female |
2 | 1st | 30.0000 | male |
3 | 1st | 25.0000 | female |
4 | 1st | 0.9167 | male |
In [11]:
```python
# 2.3 Split the dataset
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=22, test_size=0.2)
```
In [12]:
```python
# 3. Feature engineering (dictionary feature extraction)
```
In [13]:
x_train.head()
Out[13]:
|  | pclass | age | sex |
---|---|---|---|
649 | 3rd | 45.000000 | female |
1078 | 3rd | 31.194181 | male |
59 | 1st | 31.194181 | female |
201 | 1st | 18.000000 | male |
61 | 1st | 31.194181 | female |
In [14]:
x_train = x_train.to_dict(orient="records")
x_test = x_test.to_dict(orient="records")
In [15]:
x_train
Out[15]:
[{'pclass': '3rd', 'age': 45.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 18.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 6.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 27.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 21.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 4.0, 'sex': 'female'},
{'pclass': '1st', 'age': 31.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 13.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 30.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 30.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 50.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 22.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 49.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 62.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 32.0, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 64.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 23.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 55.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 24.0, 'sex': 'male'},
{'pclass': '1st', 'age': 6.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 10.0, 'sex': 'female'},
{'pclass': '1st', 'age': 53.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 18.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 36.0, 'sex': 'female'},
{'pclass': '1st', 'age': 19.0, 'sex': 'male'},
{'pclass': '1st', 'age': 28.0, 'sex': 'male'},
{'pclass': '1st', 'age': 17.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 21.0, 'sex': 'female'},
{'pclass': '1st', 'age': 25.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 19.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
{'pclass': '1st', 'age': 21.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 48.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 27.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 46.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 29.0, 'sex': 'female'},
{'pclass': '1st', 'age': 35.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 38.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 32.0, 'sex': 'male'},
{'pclass': '1st', 'age': 16.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 30.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 18.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 16.0, 'sex': 'male'},
{'pclass': '1st', 'age': 33.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 17.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 18.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 36.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 30.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 20.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 33.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 52.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 35.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 25.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 45.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 50.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 52.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 20.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 32.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 34.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 33.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 21.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 45.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 43.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 59.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 47.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 38.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 51.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 36.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 23.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 6.0, 'sex': 'female'},
{'pclass': '1st', 'age': 58.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 4.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 20.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 35.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 12.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 19.0, 'sex': 'male'},
{'pclass': '1st', 'age': 64.0, 'sex': 'male'},
{'pclass': '1st', 'age': 27.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 34.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 18.0, 'sex': 'male'},
{'pclass': '1st', 'age': 48.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 50.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 18.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 34.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 21.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 44.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 19.0, 'sex': 'male'},
{'pclass': '1st', 'age': 39.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 42.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 69.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 2.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 22.0, 'sex': 'male'},
{'pclass': '1st', 'age': 47.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 22.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 42.0, 'sex': 'male'},
{'pclass': '1st', 'age': 21.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 48.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 45.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 45.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 39.0, 'sex': 'male'},
{'pclass': '1st', 'age': 14.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 30.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 32.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 54.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 36.0, 'sex': 'female'},
{'pclass': '1st', 'age': 47.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 0.8333, 'sex': 'male'},
{'pclass': '1st', 'age': 53.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 24.0, 'sex': 'female'},
{'pclass': '1st', 'age': 37.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 25.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 23.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 40.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 20.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 23.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 22.0, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 29.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 30.0, 'sex': 'male'},
{'pclass': '1st', 'age': 55.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 26.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 30.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 49.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 18.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 24.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 22.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 54.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 38.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 42.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 52.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 19.0, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 8.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 57.0, 'sex': 'male'},
{'pclass': '1st', 'age': 22.0, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 16.0, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 45.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 28.0, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 30.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 19.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 28.0, 'sex': 'male'},
{'pclass': '1st', 'age': 24.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 38.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 36.0, 'sex': 'female'},
{'pclass': '1st', 'age': 55.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 25.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 32.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 9.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 30.0, 'sex': 'male'},
{'pclass': '1st', 'age': 29.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 40.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 39.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 25.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 49.0, 'sex': 'male'},
{'pclass': '1st', 'age': 36.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 23.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 23.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 20.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 17.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 24.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 28.0, 'sex': 'male'},
{'pclass': '1st', 'age': 40.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 6.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 17.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 34.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 30.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 41.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 28.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 61.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 17.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 3.0, 'sex': 'male'},
{'pclass': '1st', 'age': 24.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 30.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 41.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 42.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 28.0, 'sex': 'male'},
{'pclass': '1st', 'age': 48.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 50.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 16.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 40.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 30.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 23.0, 'sex': 'female'},
{'pclass': '1st', 'age': 34.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 39.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 34.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 22.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 9.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 22.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 30.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 25.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 24.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 26.0, 'sex': 'female'},
{'pclass': '1st', 'age': 57.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 39.0, 'sex': 'male'},
{'pclass': '1st', 'age': 35.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 41.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 67.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 11.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 22.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 20.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 50.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 33.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 36.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 48.0, 'sex': 'female'},
{'pclass': '1st', 'age': 59.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 25.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 17.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 45.0, 'sex': 'female'},
{'pclass': '1st', 'age': 49.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 33.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 46.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 52.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 36.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 28.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 28.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 30.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 19.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 43.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 36.0, 'sex': 'male'},
{'pclass': '1st', 'age': 51.0, 'sex': 'male'},
{'pclass': '1st', 'age': 36.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 3.0, 'sex': 'male'},
{'pclass': '1st', 'age': 48.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 48.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 16.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 36.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 44.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 36.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 37.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 30.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 32.0, 'sex': 'male'},
{'pclass': '1st', 'age': 30.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 22.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 40.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 24.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 65.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 30.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 37.0, 'sex': 'female'},
{'pclass': '1st', 'age': 52.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 23.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 0.8333, 'sex': 'male'},
{'pclass': '2nd', 'age': 35.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 26.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 27.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 27.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 41.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 18.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 33.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 56.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 40.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 9.0, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 28.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 26.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 30.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 25.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 48.0, 'sex': 'female'},
{'pclass': '1st', 'age': 36.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 35.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 1.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 2.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 32.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 25.0, 'sex': 'male'},
{'pclass': '1st', 'age': 29.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 21.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 20.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 27.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 38.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 28.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 0.9167, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 39.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 9.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 45.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 20.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 36.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 14.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 30.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 22.0, 'sex': 'male'},
{'pclass': '1st', 'age': 60.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 20.0, 'sex': 'male'},
{'pclass': '1st', 'age': 48.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 28.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 23.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 32.0, 'sex': 'male'},
{'pclass': '1st', 'age': 30.0, 'sex': 'male'},
{'pclass': '1st', 'age': 46.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 32.0, 'sex': 'male'},
{'pclass': '1st', 'age': 27.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 61.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 39.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 0.1667, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 15.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 18.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 24.0, 'sex': 'male'},
{'pclass': '1st', 'age': 17.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 42.0, 'sex': 'female'},
{'pclass': '1st', 'age': 20.0, 'sex': 'female'},
{'pclass': '1st', 'age': 62.0, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 49.0, 'sex': 'male'},
{'pclass': '1st', 'age': 23.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 33.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 19.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 45.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 70.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 37.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 28.0, 'sex': 'male'},
{'pclass': '1st', 'age': 54.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 51.0, 'sex': 'female'},
{'pclass': '1st', 'age': 21.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 64.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 29.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 33.0, 'sex': 'male'},
{'pclass': '1st', 'age': 50.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 19.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 59.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 18.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 49.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 38.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 48.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 54.0, 'sex': 'female'},
{'pclass': '1st', 'age': 19.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 3.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 18.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 22.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 34.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 28.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 20.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 15.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 40.0, 'sex': 'female'},
{'pclass': '1st', 'age': 46.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 8.0, 'sex': 'female'},
{'pclass': '1st', 'age': 63.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 43.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 16.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 38.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 1.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 35.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.0, 'sex': 'male'},
{'pclass': '1st', 'age': 42.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 38.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 17.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 40.0, 'sex': 'male'},
{'pclass': '1st', 'age': 4.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 29.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 22.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 57.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 40.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 47.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 37.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 42.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 21.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 20.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 5.0, 'sex': 'female'},
{'pclass': '1st', 'age': 21.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 40.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 9.0, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 41.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 36.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 28.0, 'sex': 'male'},
{'pclass': '1st', 'age': 39.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 35.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 24.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 45.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 24.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 50.0, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 56.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 32.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 19.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 18.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 20.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 45.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 22.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 20.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 50.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 30.0, 'sex': 'male'},
{'pclass': '1st', 'age': 24.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 40.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 21.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 52.0, 'sex': 'male'},
{'pclass': '1st', 'age': 45.0, 'sex': 'male'},
{'pclass': '1st', 'age': 11.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 23.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 26.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 40.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 49.0, 'sex': 'male'},
{'pclass': '1st', 'age': 18.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 18.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 9.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 25.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 35.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 32.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 21.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 32.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 45.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 30.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 26.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 30.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 21.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 25.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 20.0, 'sex': 'male'},
{'pclass': '1st', 'age': 36.0, 'sex': 'male'},
{'pclass': '1st', 'age': 27.0, 'sex': 'male'},
{'pclass': '1st', 'age': 24.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 18.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 56.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 16.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 18.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 64.0, 'sex': 'male'},
{'pclass': '1st', 'age': 46.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 18.0, 'sex': 'male'},
{'pclass': '1st', 'age': 46.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 29.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 18.0, 'sex': 'female'},
{'pclass': '1st', 'age': 33.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 34.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 0.8333, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 58.0, 'sex': 'female'},
{'pclass': '1st', 'age': 60.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 23.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 44.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 71.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 13.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 58.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 4.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 16.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 33.0, 'sex': 'female'},
{'pclass': '1st', 'age': 33.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 48.0, 'sex': 'male'},
{'pclass': '1st', 'age': 28.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 55.0, 'sex': 'female'},
{'pclass': '1st', 'age': 54.0, 'sex': 'male'},
{'pclass': '1st', 'age': 71.0, 'sex': 'male'},
{'pclass': '1st', 'age': 47.0, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 21.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 21.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 24.0, 'sex': 'male'},
{'pclass': '1st', 'age': 23.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 18.0, 'sex': 'female'},
{'pclass': '1st', 'age': 54.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 17.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 6.0, 'sex': 'male'},
{'pclass': '1st', 'age': 45.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 36.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 55.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 65.0, 'sex': 'male'},
{'pclass': '1st', 'age': 27.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 22.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 7.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 30.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 39.0, 'sex': 'female'},
{'pclass': '1st', 'age': 19.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 19.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 20.0, 'sex': 'male'},
{'pclass': '1st', 'age': 56.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 38.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 23.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 42.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 23.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 25.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 16.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 42.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 2.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 36.0, 'sex': 'male'},
...]
In [16]:
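from sklearn.feature_extraction import DictVectorizer
# Fit the vectorizer on the training set only, then reuse its vocabulary on the test set
transfer = DictVectorizer()
x_train = transfer.fit_transform(x_train)
x_test = transfer.transform(x_test)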
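`DictVectorizer` learns its columns during `fit`: one per numeric key, and one per `key=value` pair for string values. The test set therefore has to go through `transform`, not a second `fit_transform`. A small sketch with made-up records in the same format as `x_train` above (illustrative values, not rows from the dataset) shows what goes wrong when a test set lacks some categories:

```python
from sklearn.feature_extraction import DictVectorizer

# Made-up records in the same format as x_train above (illustrative values only)
train_records = [
    {"pclass": "1st", "age": 29.0, "sex": "female"},
    {"pclass": "2nd", "age": 30.0, "sex": "male"},
    {"pclass": "3rd", "age": 21.0, "sex": "female"},
]
# A test set that happens to contain only 3rd-class males
test_records = [
    {"pclass": "3rd", "age": 25.0, "sex": "male"},
    {"pclass": "3rd", "age": 40.0, "sex": "male"},
]

vec = DictVectorizer(sparse=False)
X_tr = vec.fit_transform(train_records)  # vocabulary learned from the training data
X_te = vec.transform(test_records)       # same columns, same order as X_tr

# Refitting on the test set builds a different, smaller vocabulary
X_te_refit = DictVectorizer(sparse=False).fit_transform(test_records)

print(vec.feature_names_)
# ['age', 'pclass=1st', 'pclass=2nd', 'pclass=3rd', 'sex=female', 'sex=male']
print(X_tr.shape, X_te.shape, X_te_refit.shape)  # (3, 6) (2, 6) (2, 3)
```

A model trained on six columns cannot score the three-column matrix, and even when the column counts happen to match, the columns may mean different things.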
In [21]:
# 4. XGBoost model training
# 4.1 Initial model training
from xgboost import XGBClassifier
xg = XGBClassifier()
xg.fit(x_train, y_train)
Out[21]:
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, gamma=0,
learning_rate=0.1, max_delta_step=0, max_depth=3,
min_child_weight=1, missing=None, n_estimators=100, n_jobs=1,
nthread=None, objective='binary:logistic', random_state=0,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
silent=None, subsample=1, verbosity=1)
In [22]:
xg.score(x_test, y_test)
Out[22]:
0.7832699619771863
In [23]:
# 4.2 Tune max_depth
depth_range = range(10)
score = []
for i in depth_range:
    xg = XGBClassifier(eta=1, gamma=0, max_depth=i)
    xg.fit(x_train, y_train)
    s = xg.score(x_test, y_test)
    print(s)
    score.append(s)
0.6311787072243346
0.7908745247148289
0.7870722433460076
0.7832699619771863
0.7870722433460076
0.7908745247148289
0.7908745247148289
0.7946768060836502
0.7908745247148289
0.7946768060836502
In [25]:
# 4.3 Visualize the tuning results
import matplotlib.pyplot as plt
plt.plot(depth_range, score)
plt.show()
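5.4 Otto case study: Otto Group Product Classification Challenge (XGBoost implementation)
1 Background
The Otto Group is one of the world's largest e-commerce companies, with subsidiaries in more than 20 countries. It sells millions of products worldwide every day, so classifying those products sensibly according to their characteristics is very important.
In practice, however, staff found that many identical products had been assigned to different categories. The task in this case study is to classify Otto Group's products correctly, with the classification as accurate as possible.
Link: https://www.kaggle.com/c/otto-group-product-classification-challenge/overview
2 Approach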
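- 1. Data acquisition
- 2. Basic data processing
  - 2.1 Take a subset of the data
  - 2.2 Convert the label values to numbers
  - 2.3 Split the data (using StratifiedShuffleSplit)
  - 2.4 Standardize the data
  - 2.5 Reduce dimensionality with PCA
- 3. Model training
  - 3.1 Baseline model training
  - 3.2 Model tuning
    - 3.2.1 Parameters to tune:
      - n_estimators
      - max_depth
      - min_child_weight
      - subsample
      - colsample_bytree
      - eta
    - 3.2.2 Determine the final optimal parameters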
3 Partial code implementation
- 2. Basic data processing
- 2.1 Take a subset of the data
- 2.2 Convert the label values to numbers
- 2.3 Split the data (using StratifiedShuffleSplit)
# Split the dataset with StratifiedShuffleSplit
from sklearn.model_selection import StratifiedShuffleSplit
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
for train_index, test_index in sss.split(X_resampled.values, y_resampled):
    print(len(train_index))
    print(len(test_index))
    x_train = X_resampled.values[train_index]
    x_val = X_resampled.values[test_index]
    y_train = y_resampled[train_index]
    y_val = y_resampled[test_index]
# Visualize the class distribution of the validation split
import seaborn as sns
sns.countplot(x=y_val)
plt.show()
- 2.4 Standardize the data
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(x_train)
x_train_scaled = scaler.transform(x_train)
x_val_scaled = scaler.transform(x_val)
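- 2.5 Reduce dimensionality with PCA
print(x_train_scaled.shape)  # (13888, 93)
from sklearn.decomposition import PCA
# Keep as many components as needed to explain 90% of the variance
pca = PCA(n_components=0.9)
x_train_pca = pca.fit_transform(x_train_scaled)
x_val_pca = pca.transform(x_val_scaled)
print(x_train_pca.shape, x_val_pca.shape)
(13888, 65) (3473, 65)
The output shows that 65 principal components (out of the original 93 features) are enough to retain 90% of the variance in the features.
# Visualize the cumulative explained variance after dimensionality reduction
import numpy as np
plt.plot(np.cumsum(pca.explained_variance_ratio_))
plt.xlabel("Number of components")
plt.ylabel("Cumulative explained variance ratio")
plt.show()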
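With a float between 0 and 1, `PCA(n_components=0.9)` keeps the smallest number of components whose cumulative explained variance ratio exceeds that fraction; the plotted curve is exactly that cumulative ratio. A NumPy sketch of the selection rule on a tiny hand-built dataset (the helper `n_components_for` is illustrative, not a scikit-learn function):

```python
import numpy as np

def n_components_for(X, threshold=0.9):
    """Return the smallest k whose first k components explain more than `threshold` of the variance."""
    Xc = X - X.mean(axis=0)                    # PCA operates on centered data
    s = np.linalg.svd(Xc, compute_uv=False)    # singular values, in descending order
    ratio = s**2 / np.sum(s**2)                # explained variance ratio of each component
    cumulative = np.cumsum(ratio)
    # First position where the cumulative ratio exceeds the threshold, plus one
    k = int(np.searchsorted(cumulative, threshold, side="right") + 1)
    return k, cumulative

# Toy data whose three orthogonal directions carry variance in the ratio 9 : 4 : 1
X = np.array([[3, 0, 0], [-3, 0, 0],
              [0, 2, 0], [0, -2, 0],
              [0, 0, 1], [0, 0, -1]], dtype=float)
k, cumulative = n_components_for(X, 0.9)
print(k, np.round(cumulative, 3))  # cumulative ratios are 9/14, 13/14, 14/14
```

Here two components explain 13/14 ≈ 92.9% of the variance, so a 0.9 threshold keeps two; the same rule produced 65 components for the standardized Otto features.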
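- 3. Model training
- 3.1 Baseline model training
from xgboost import XGBClassifier
xgb = XGBClassifier()
xgb.fit(x_train_pca, y_train)
# Output class probabilities rather than hard labels: log loss is computed on probabilities, and hard labels would give a much higher log loss
y_pre_proba = xgb.predict_proba(x_val_pca)
# Evaluate the model with log loss (the eps argument is deprecated in recent scikit-learn releases, so it is omitted)
from sklearn.metrics import log_loss
log_loss(y_val, y_pre_proba)
xgb.get_params()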
- 3.2 Model tuning
- 3.2.1 Parameters to tune:
  - n_estimators
scores_ne = []
n_estimators = [100, 200, 400, 450, 500, 550, 600, 700]
for nes in n_estimators:
    print("n_estimators:", nes)
    xgb = XGBClassifier(max_depth=3,
                        learning_rate=0.1,
                        n_estimators=nes,
                        objective="multi:softprob",
                        n_jobs=-1,
                        nthread=4,
                        min_child_weight=1,
                        subsample=1,
                        colsample_bytree=1,
                        seed=42)
    xgb.fit(x_train_pca, y_train)
    y_pre = xgb.predict_proba(x_val_pca)
    score = log_loss(y_val, y_pre)
    scores_ne.append(score)
    print("Validation log loss: {}".format(score))
# Visualize how the log loss changes
plt.plot(n_estimators, scores_ne, "o-")
plt.ylabel("log_loss")
plt.xlabel("n_estimators")
plt.show()
print("Best n_estimators: {}".format(n_estimators[np.argmin(scores_ne)]))
  - max_depth
scores_md = []
max_depths = [1, 3, 5, 6, 7]
for md in max_depths:  # changed from the n_estimators loop
    print("max_depth:", md)
    xgb = XGBClassifier(max_depth=md,  # changed
                        learning_rate=0.1,
                        n_estimators=n_estimators[np.argmin(scores_ne)],  # changed: best value from the previous step
                        objective="multi:softprob",
                        n_jobs=-1,
                        nthread=4,
                        min_child_weight=1,
                        subsample=1,
                        colsample_bytree=1,
                        seed=42)
    xgb.fit(x_train_pca, y_train)
    y_pre = xgb.predict_proba(x_val_pca)
    score = log_loss(y_val, y_pre)
    scores_md.append(score)  # changed
    print("Validation log loss: {}".format(score))
# Visualize how the log loss changes
plt.plot(max_depths, scores_md, "o-")  # changed
plt.ylabel("log_loss")
plt.xlabel("max_depths")  # changed
plt.show()
print("Best max_depth: {}".format(max_depths[np.argmin(scores_md)]))  # changed
  - min_child_weight
    - Tune following the same pattern as above
  - subsample
  - colsample_bytree
  - eta
- 3.2.2 Determine the final optimal parameters
# Note: this final model is trained on the standardized features (x_train_scaled), not on the PCA-reduced features used during tuning
xgb = XGBClassifier(learning_rate=0.1, n_estimators=550, max_depth=3, min_child_weight=3, subsample=0.7, colsample_bytree=0.7, nthread=4, seed=42, objective='multi:softprob')
xgb.fit(x_train_scaled, y_train)
y_pre = xgb.predict_proba(x_val_scaled)
print("Validation log loss: {}".format(log_loss(y_val, y_pre)))
In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Data acquisition
In [2]:
data = pd.read_csv("./data/otto/train.csv")
In [3]:
data.head()
Out[3]:
| | id | feat_1 | feat_2 | feat_3 | feat_4 | feat_5 | feat_6 | feat_7 | feat_8 | feat_9 | ... | feat_85 | feat_86 | feat_87 | feat_88 | feat_89 | feat_90 | feat_91 | feat_92 | feat_93 | target |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Class_1 |
1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Class_1 |
2 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Class_1 |
3 | 4 | 1 | 0 | 0 | 1 | 6 | 1 | 5 | 0 | 0 | ... | 0 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | Class_1 |
4 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | Class_1 |
5 rows × 95 columns
In [4]:
data.shape
Out[4]:
(61878, 95)
In [5]:
data.describe()
Out[5]:
| | id | feat_1 | feat_2 | feat_3 | feat_4 | feat_5 | feat_6 | feat_7 | feat_8 | feat_9 | ... | feat_84 | feat_85 | feat_86 | feat_87 | feat_88 | feat_89 | feat_90 | feat_91 | feat_92 | feat_93 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 61878.000000 | 61878.00000 | 61878.000000 | 61878.000000 | 61878.000000 | 61878.000000 | 61878.000000 | 61878.000000 | 61878.000000 | 61878.000000 | ... | 61878.000000 | 61878.000000 | 61878.000000 | 61878.000000 | 61878.000000 | 61878.000000 | 61878.000000 | 61878.000000 | 61878.000000 | 61878.000000 |
mean | 30939.500000 | 0.38668 | 0.263066 | 0.901467 | 0.779081 | 0.071043 | 0.025696 | 0.193704 | 0.662433 | 1.011296 | ... | 0.070752 | 0.532306 | 1.128576 | 0.393549 | 0.874915 | 0.457772 | 0.812421 | 0.264941 | 0.380119 | 0.126135 |
std | 17862.784315 | 1.52533 | 1.252073 | 2.934818 | 2.788005 | 0.438902 | 0.215333 | 1.030102 | 2.255770 | 3.474822 | ... | 1.151460 | 1.900438 | 2.681554 | 1.575455 | 2.115466 | 1.527385 | 4.597804 | 2.045646 | 0.982385 | 1.201720 |
min | 1.000000 | 0.00000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
25% | 15470.250000 | 0.00000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
50% | 30939.500000 | 0.00000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
75% | 46408.750000 | 0.00000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
max | 61878.000000 | 61.00000 | 51.000000 | 64.000000 | 70.000000 | 19.000000 | 10.000000 | 38.000000 | 76.000000 | 43.000000 | ... | 76.000000 | 55.000000 | 65.000000 | 67.000000 | 30.000000 | 61.000000 | 130.000000 | 52.000000 | 19.000000 | 87.000000 |
8 rows × 94 columns
In [6]:
# Visualize the class distribution
import seaborn as sns
sns.countplot(x=data.target)
plt.show()
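The plot shows that the classes are imbalanced, which has to be handled later.
Basic data processing
The data has already been anonymized, so it needs no further special processing.
Take a subset of the data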
In [7]:
new1_data = data[:10000]
new1_data.shape
Out[7]:
(10000, 95)
In [8]:
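# Visualize the class distribution
import seaborn as sns
sns.countplot(x=new1_data.target)
plt.show()
Simply slicing off the first rows does not give a usable sample (the file appears to be ordered by class), so random undersampling is used to obtain the data instead.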
In [9]:
# Obtain the data by random undersampling
# First separate the features from the labels
y = data["target"]
x = data.drop(["id", "target"], axis=1)
In [10]:
x.head()
Out[10]:
| | feat_1 | feat_2 | feat_3 | feat_4 | feat_5 | feat_6 | feat_7 | feat_8 | feat_9 | feat_10 | ... | feat_84 | feat_85 | feat_86 | feat_87 | feat_88 | feat_89 | feat_90 | feat_91 | feat_92 | feat_93 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | 1 | 0 | 0 | 1 | 6 | 1 | 5 | 0 | 0 | 1 | ... | 22 | 0 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
5 rows × 93 columns
In [11]:
y.head()
Out[11]:
0 Class_1
1 Class_1
2 Class_1
3 Class_1
4 Class_1
Name: target, dtype: object
In [12]:
# Undersample so that every class has the same number of rows
from imblearn.under_sampling import RandomUnderSampler
rus = RandomUnderSampler(random_state=0)
X_resampled, y_resampled = rus.fit_resample(x, y)
In [13]:
x.shape, y.shape
Out[13]:
((61878, 93), (61878,))
In [14]:
X_resampled.shape, y_resampled.shape
Out[14]:
((17361, 93), (17361,))
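The shapes are consistent with `RandomUnderSampler`'s default behaviour of cutting every class down to the size of the smallest one: 17361 = 9 × 1929, i.e. nine classes of 1929 rows each. A minimal NumPy sketch of that idea (not imblearn's actual implementation; the function name `random_undersample` and the toy labels are assumptions for illustration):

```python
import numpy as np

def random_undersample(X, y, seed=0):
    """Keep, for every class, a random subset as large as the smallest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = []
    for c in classes:
        idx = np.flatnonzero(y == c)                         # row positions of class c
        keep.append(rng.choice(idx, size=n_min, replace=False))
    keep = np.concatenate(keep)                              # result is grouped by class
    return X[keep], y[keep]

# Toy imbalanced labels: 5 x Class_1, 2 x Class_2, 3 x Class_3
y = np.array(["Class_1"] * 5 + ["Class_2"] * 2 + ["Class_3"] * 3)
X = np.arange(len(y)).reshape(-1, 1)   # each row stores its own original index
X_res, y_res = random_undersample(X, y)
print(np.unique(y_res, return_counts=True))  # every class now has 2 rows
```

Undersampling balances the classes at the cost of discarding most rows of the larger classes.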
In [15]:
# Visualize the class distribution
import seaborn as sns
sns.countplot(x=y_resampled)
plt.show()
Convert the label values to numbers
In [16]:
y_resampled.head()
Out[16]:
0 Class_1
1 Class_1
2 Class_1
3 Class_1
4 Class_1
Name: target, dtype: object
In [17]:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y_resampled = le.fit_transform(y_resampled)
In [18]:
y_resampled
Out[18]:
array([0, 0, 0, ..., 8, 8, 8])
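`LabelEncoder` assigns integers to the sorted unique class names, so `Class_1` becomes 0 and `Class_9` becomes 8 (the sort is lexicographic, which is fine for `Class_1` to `Class_9`). The same mapping can be reproduced with `np.unique(..., return_inverse=True)`, shown here on a few illustrative labels; in the notebook, `le.classes_` and `le.inverse_transform` give the mapping back.

```python
import numpy as np

labels = np.array(["Class_3", "Class_1", "Class_9", "Class_1"])
classes, codes = np.unique(labels, return_inverse=True)  # classes sorted lexicographically
print(classes)  # ['Class_1' 'Class_3' 'Class_9']
print(codes)    # [1 0 2 0]

decoded = classes[codes]  # inverse mapping, like le.inverse_transform
```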
Split the data
In [19]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(X_resampled, y_resampled, test_size=0.2)
In [20]:
x_train.shape, y_train.shape
Out[20]:
((13888, 93), (13888,))
In [21]:
x_test.shape, y_test.shape
Out[21]:
((3473, 93), (3473,))
In [22]:
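# 1. Data acquisition
# 2. Basic data processing
# 2.1 Take a subset of the data
# 2.2 Convert the label values to numbers
# 2.3 Split the data (using StratifiedShuffleSplit)
# 2.4 Standardize the data
# 2.5 Reduce dimensionality with PCA
# 3. Model training
# 3.1 Baseline model training
# 3.2 Model tuning
# 3.2.1 Parameters to tune:
# n_estimators,
# max_depth,
# min_child_weight,
# subsample,
# colsample_bytree,
# eta
# 3.2.2 Determine the final optimal parameters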
In [23]:
# Visualize the class distribution of the test split
import seaborn as sns
sns.countplot(x=y_test)
plt.show()
In [28]:
# Split the data with StratifiedShuffleSplit
from sklearn.model_selection import StratifiedShuffleSplit
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
for train_index, test_index in sss.split(X_resampled.values, y_resampled):
    print(len(train_index))
    print(len(test_index))
    x_train = X_resampled.values[train_index]
    x_val = X_resampled.values[test_index]
    y_train = y_resampled[train_index]
    y_val = y_resampled[test_index]
13888
3473
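`StratifiedShuffleSplit` shuffles the rows and draws the validation set so that each class keeps approximately the same share in both parts. A simplified NumPy sketch of one such split (the helper `stratified_split` is hypothetical; it rounds each class's test count independently, whereas scikit-learn distributes the exact total test size across classes, so per-class counts can differ by a row):

```python
import numpy as np

def stratified_split(y, test_size=0.2, seed=0):
    """Return (train_idx, test_idx), splitting every class in roughly the same proportion."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))   # shuffle the rows of class c
        n_test = int(round(test_size * len(idx)))       # per-class number of test rows
        test_idx.append(idx[:n_test])
        train_idx.append(idx[n_test:])
    return np.concatenate(train_idx), np.concatenate(test_idx)

y = np.array([0] * 50 + [1] * 30 + [2] * 20)
train_idx, test_idx = stratified_split(y)
print(np.bincount(y[test_idx]))   # [10  6  4]: 20% of each class
```

Since every class has 1929 rows after undersampling, a stratified 80/20 split also leaves the validation set balanced.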
In [29]:
print(x_train.shape, x_val.shape)
(13888, 93) (3473, 93)
In [30]:
# Visualize the class distribution of the validation split
import seaborn as sns
sns.countplot(x=y_val)
plt.show()
Standardize the data
In [31]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(x_train)
x_train_scaled = scaler.transform(x_train)
x_val_scaled = scaler.transform(x_val)
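`StandardScaler` learns each feature's mean and standard deviation during `fit(x_train)` and applies those same training statistics to both sets, so nothing about the validation data leaks into preprocessing. The arithmetic is z = (x - mean_train) / std_train; a NumPy sketch on toy arrays (to our understanding, `StandardScaler` likewise uses the population standard deviation and leaves zero-variance columns unscaled):

```python
import numpy as np

# Toy data: the second column is constant
train = np.array([[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]])
val = np.array([[3.0, 10.0], [7.0, 10.0]])

mean = train.mean(axis=0)        # statistics are computed on the training set only
std = train.std(axis=0)          # population standard deviation (ddof=0)
std[std == 0] = 1.0              # leave constant columns unscaled instead of dividing by zero

train_scaled = (train - mean) / std
val_scaled = (val - mean) / std  # validation data reuses the training mean and std
print(train_scaled)
print(val_scaled)
```

Note that the validation values need not have mean 0 or standard deviation 1 after scaling; only the training set is guaranteed to.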
Reduce dimensionality with PCA
In [33]:
x_train_scaled.shape
Out[33]:
(13888, 93)
In [34]:
from sklearn.decomposition import PCA
pca = PCA(n_components=0.9)
x_train_pca = pca.fit_transform(x_train_scaled)
x_val_pca = pca.transform(x_val_scaled)
In [35]:
print(x_train_pca.shape, x_val_pca.shape)
(13888, 65) (3473, 65)
In [37]:
# Visualize how much variance is retained as components are added
plt.plot(np.cumsum(pca.explained_variance_ratio_))
plt.xlabel("Number of components")
plt.ylabel("Cumulative explained variance ratio")
plt.show()
Model training
Baseline model training
In [38]:
from xgboost import XGBClassifier
xgb = XGBClassifier()
xgb.fit(x_train_pca, y_train)
Out[38]:
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, gamma=0,
learning_rate=0.1, max_delta_step=0, max_depth=3,
min_child_weight=1, missing=None, n_estimators=100, n_jobs=1,
nthread=None, objective='multi:softprob', random_state=0,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
silent=None, subsample=1, verbosity=1)
In [39]:
# Predict class probabilities: log loss must be computed on probabilities, not on hard labels
y_pre_proba = xgb.predict_proba(x_val_pca)
In [40]:
y_pre_proba
Out[40]:
array([[0.4893983 , 0.00375719, 0.00225278, ..., 0.06179977, 0.17131925,
0.03980364],
[0.14336601, 0.01110009, 0.01018962, ..., 0.00691424, 0.02062171,
0.7525783 ],
[0.00834821, 0.14602502, 0.65013766, ..., 0.01385602, 0.00602207,
0.00240582],
...,
[0.09568001, 0.00293341, 0.00582061, ..., 0.1031019 , 0.7587154 ,
0.02730099],
[0.40236628, 0.12317444, 0.03567632, ..., 0.18818544, 0.13276173,
0.07105519],
[0.00473167, 0.01536749, 0.02546864, ..., 0.00882399, 0.88531935,
0.00384397]], dtype=float32)
In [42]:
# Evaluate with log loss (the eps argument is deprecated in recent scikit-learn releases, so it is omitted)
from sklearn.metrics import log_loss
log_loss(y_val, y_pre_proba)
Out[42]:
0.7845457684689274
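Multi-class log loss averages, over the N validation samples, the negative log of the probability the model assigned to each sample's true class: logloss = -(1/N) Σ_i log p(i, true class of i). A confident wrong prediction is punished heavily, which is why `predict_proba` is used instead of `predict`. A NumPy sketch with made-up probabilities (the clipping bound 1e-15 is an assumption chosen to avoid log(0)):

```python
import numpy as np

def multiclass_log_loss(y_true, proba, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    proba = np.clip(proba, eps, 1 - eps)                    # avoid log(0)
    true_class_proba = proba[np.arange(len(y_true)), y_true]
    return float(-np.mean(np.log(true_class_proba)))

y_true = np.array([0, 2, 1])
soft = np.array([[0.7, 0.2, 0.1],
                 [0.1, 0.3, 0.6],
                 [0.5, 0.4, 0.1]])          # third sample: true class gets only 0.4
hard = np.eye(3)[soft.argmax(axis=1)]       # one-hot version of the same predictions

soft_loss = multiclass_log_loss(y_true, soft)  # -(ln 0.7 + ln 0.6 + ln 0.4) / 3 ≈ 0.5946
hard_loss = multiclass_log_loss(y_true, hard)  # the one wrong hard label costs -ln(1e-15): ≈ 11.51
print(soft_loss, hard_loss)
```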
In [43]:
xgb.get_params
Out[43]:
<bound method XGBModel.get_params of XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, gamma=0,
learning_rate=0.1, max_delta_step=0, max_depth=3,
min_child_weight=1, missing=None, n_estimators=100, n_jobs=1,
nthread=None, objective='multi:softprob', random_state=0,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
silent=None, subsample=1, verbosity=1)>
Model tuning
Find the best n_estimators
In [44]:
scores_ne = []
n_estimators = [100, 200, 300, 400, 500, 550, 600, 700]
In [49]:
for nes in n_estimators:
    print("n_estimators:", nes)
    xgb = XGBClassifier(max_depth=3,
                        learning_rate=0.1,
                        n_estimators=nes,
                        objective="multi:softprob",
                        n_jobs=-1,
                        nthread=4,
                        min_child_weight=1,
                        subsample=1,
                        colsample_bytree=1,
                        seed=42)
    xgb.fit(x_train_pca, y_train)
    y_pre = xgb.predict_proba(x_val_pca)
    score = log_loss(y_val, y_pre)
    scores_ne.append(score)
    print("Validation log loss: {}".format(score))
n_estimators: 100
Validation log loss: 0.7845457684689274
n_estimators: 200
Validation log loss: 0.7163659085830947
n_estimators: 300
Validation log loss: 0.6933389946023942
n_estimators: 400
Validation log loss: 0.68119252278615
n_estimators: 500
Validation log loss: 0.67700775120196
n_estimators: 550
Validation log loss: 0.6756911007299885
n_estimators: 600
Validation log loss: 0.6757532660164814
n_estimators: 700
Validation log loss: 0.6778721089881976
In [50]:
# Plot the corresponding log loss values
plt.plot(n_estimators, scores_ne, "o-")
plt.xlabel("n_estimators")
plt.ylabel("log_loss")
plt.show()
print("Best n_estimators: {}".format(n_estimators[np.argmin(scores_ne)]))
Best n_estimators: 550
Find the best max_depth
In [63]:
scores_md = []
max_depths = [1,3,5,6,7]
In [64]:
for md in max_depths:
    print("max_depth:", md)
    xgb = XGBClassifier(max_depth=md,
                        learning_rate=0.1,
                        n_estimators=n_estimators[np.argmin(scores_ne)],
                        objective="multi:softprob",
                        n_jobs=-1,
                        nthread=4,
                        min_child_weight=1,
                        subsample=1,
                        colsample_bytree=1,
                        seed=42)
    xgb.fit(x_train_pca, y_train)
    y_pre = xgb.predict_proba(x_val_pca)
    score = log_loss(y_val, y_pre)
    scores_md.append(score)
    print("Validation log loss: {}".format(score))
max_depth: 1
Validation log loss: 0.8186777106711784
max_depth: 3
Validation log loss: 0.6756911007299885
max_depth: 5
Validation log loss: 0.730323661087053
max_depth: 6
Validation log loss: 0.7693314501840949
max_depth: 7
Validation log loss: 0.7889236364892144
In [67]:
# Plot the corresponding log loss values
plt.plot(max_depths, scores_md, "o-")
plt.xlabel("max_depths")
plt.ylabel("log_loss")
plt.show()
print("Best max_depth: {}".format(max_depths[np.argmin(scores_md)]))
Best max_depth: 3
Following the same pattern, tune the remaining parameters:
min_child_weight,
subsample,
colsample_bytree,
eta
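Each remaining parameter can be tuned with the same loop: vary one value, keep the others fixed, and carry the best value forward. A sketch of a reusable helper (the names `tune_param` and `toy_score` are hypothetical; `toy_score` only stands in for fitting an `XGBClassifier` and computing `log_loss` on the validation set):

```python
import numpy as np

def tune_param(name, values, base_params, fit_and_score):
    """Try each value of one parameter with the others fixed; return the best value and all scores."""
    scores = []
    for v in values:
        params = dict(base_params, **{name: v})  # copy, then override a single parameter
        scores.append(fit_and_score(params))
    best = values[int(np.argmin(scores))]         # lower log loss is better
    return best, scores

# Stand-in for "fit XGBClassifier(**params) and return the validation log loss"
def toy_score(params):
    return (params["subsample"] - 0.7) ** 2 + (params["min_child_weight"] - 3) ** 2

params = {"max_depth": 3, "n_estimators": 550, "min_child_weight": 1, "subsample": 1.0}
for name, values in [("min_child_weight", [1, 3, 5]), ("subsample", [0.5, 0.7, 0.9, 1.0])]:
    best, scores = tune_param(name, values, params, toy_score)
    params[name] = best                           # carry the best value into the next step
print(params)
```

With XGBoost, a scoring function in the spirit of the loops above might be `lambda p: log_loss(y_val, XGBClassifier(**p).fit(x_train_pca, y_train).predict_proba(x_val_pca))`. Tuning one parameter at a time is cheap but can miss interactions between parameters; a grid or randomized search with cross-validation is the more expensive alternative.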
In [69]:
# Note: the final model is trained on the standardized features (x_train_scaled), not on the PCA-reduced features used during tuning
xgb = XGBClassifier(learning_rate=0.1,
                    n_estimators=550,
                    max_depth=3,
                    min_child_weight=3,
                    subsample=0.7,
                    colsample_bytree=0.7,
                    nthread=4,
                    seed=42,
                    objective='multi:softprob')
xgb.fit(x_train_scaled, y_train)
y_pre = xgb.predict_proba(x_val_scaled)
print("Validation log loss: {}".format(log_loss(y_val, y_pre)))
Validation log loss: 0.5944022517380477