Ensemble Learning: XGBoost

日萌社


5.1 XGBoost Algorithm Principles

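XGBoost (Extreme Gradient Boosting), literally "extreme gradient boosted trees", is one of the strongest ensemble learning methods, and many winning solutions in Kaggle data-mining competitions have used it.

XGBoost performs very well on a wide range of regression and classification problems. This section introduces its algorithmic principles in some detail.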

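1 How to Build an Optimal Model

As we saw earlier, the general way to build an optimal model is to minimize the loss function on the training data.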

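We denote the loss by L:

$$\min_{f \in F} \frac{1}{N} \sum_{i=1}^{N} L\big(y_i, f(x_i)\big) \tag{1.1}$$

where F is the hypothesis space and (x_i, y_i) are the N training samples.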

Given the known attributes and their possible values, the hypothesis space is the exhaustive set of all hypotheses that could satisfy the target.

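Equation (1.1) is called empirical risk minimization. The model it produces tends to be complex, and when the training set is small the model easily overfits.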

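Therefore, to reduce model complexity, the following objective is commonly used instead:

$$\min_{f \in F} \frac{1}{N} \sum_{i=1}^{N} L\big(y_i, f(x_i)\big) + \lambda J(f) \tag{2.1}$$

where J(f) is the complexity of the model and λ ≥ 0 controls how strongly complexity is penalized.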

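Equation (2.1) is called structural risk minimization. Models obtained by structural risk minimization usually predict well on both the training data and unseen test data.

Applications:

  • Growing and pruning a decision tree correspond to empirical risk minimization and structural risk minimization, respectively;
  • Tree growing in XGBoost is itself the result of structural risk minimization, as detailed later.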
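To make formulas (1.1) and (2.1) concrete, here is a small sketch that scores two hypothetical models on the same three training points. It assumes squared loss for L and simply treats J(f) as a given number (for example, a parameter count); the model names, predictions, J values and λ = 0.1 are all made up for illustration. Empirical risk alone prefers the model that fits the training data perfectly, while the complexity penalty tips the choice toward the simpler model.

```python
def empirical_risk(y_true, y_pred):
    """(1/N) * sum of squared losses, i.e. the quantity minimized in (1.1)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def structural_risk(y_true, y_pred, complexity, lam):
    """Empirical risk plus the complexity penalty lam * J(f), as in (2.1)."""
    return empirical_risk(y_true, y_pred) + lam * complexity


y = [1.0, 2.0, 3.0]
# Hypothetical candidates: training-set predictions and an assumed complexity J(f)
candidates = {
    "complex": {"pred": [1.0, 2.0, 3.0], "J": 10},
    "simple": {"pred": [1.1, 2.0, 2.9], "J": 2},
}

erm_choice = min(candidates, key=lambda name: empirical_risk(y, candidates[name]["pred"]))
srm_choice = min(
    candidates,
    key=lambda name: structural_risk(y, candidates[name]["pred"], candidates[name]["J"], lam=0.1),
)
print("ERM picks:", erm_choice)  # complex
print("SRM picks:", srm_choice)  # simple
```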

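2 Deriving the XGBoost Objective

2.1 Defining the objective

The objective function, here the loss function, is minimized to build the optimal model.

As shown above, the loss should be augmented with a regularization term that represents model complexity. An XGBoost model consists of K CART trees whose outputs are added, $\hat{y}_i = \sum_{k=1}^{K} f_k(x_i)$, so the objective is:

$$Obj = \sum_{i=1}^{n} l\big(y_i, \hat{y}_i\big) + \sum_{k=1}^{K} \Omega(f_k)$$

where l is the loss on a single sample and Ω(f_k) is the complexity of the k-th tree.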

2.2 Introduction to CART trees

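2.3 Defining tree complexity

2.3.1 Complexity of each tree

An XGBoost model contains multiple CART trees. The complexity of each tree is defined as:

$$\Omega(f) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2$$

where T is the number of leaves and w_j is the score (weight) of leaf j; γ penalizes the number of leaves and λ penalizes large leaf scores.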
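A direct translation of the tree complexity Ω(f) = γT + ½λΣw_j² (T leaves with scores w_j, penalty coefficients γ and λ) into a function. The three leaf scores in the example call are assumed for illustration: 2 and -1 match the tree1 leaves used in the family example below, while 0.1 is made up; γ = λ = 1 are also illustrative.

```python
def tree_complexity(leaf_weights, gamma, lam):
    """Omega(f) = gamma * T + 0.5 * lam * sum(w_j ** 2), with T = number of leaves."""
    num_leaves = len(leaf_weights)
    return gamma * num_leaves + 0.5 * lam * sum(w * w for w in leaf_weights)


# Assumed tree with three leaves; gamma = lam = 1 are illustrative values
omega = tree_complexity([2.0, 0.1, -1.0], gamma=1.0, lam=1.0)
print(omega)  # 3 * 1 + 0.5 * (4 + 0.01 + 1), about 5.505
```

Raising γ makes every extra leaf more expensive, and raising λ shrinks the leaf scores, so both push the search toward simpler trees.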

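2.3.2 Example of tree complexity

Suppose we want to predict how much each member of a family likes video games. Young people are more likely than old people to enjoy video games, and males more likely than females, so the tree first separates children from adults by age, then separates males from females, and gives each person a score for how much they like video games:

[Figure: tree1 and tree2, with each family member's score in the leaves]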

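In this way we train two trees, tree1 and tree2. As with GBDT, the final prediction is the sum of the outputs of the two trees, so:

  • The little boy's predicted score is the sum of the scores of the leaves he falls into in the two trees: 2 + 0.9 = 2.9.
  • Grandpa's predicted score, likewise: -1 + (-0.9) = -1.9.

[Figure: each family member's final score, obtained by adding the leaf scores of tree1 and tree2]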
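The additive prediction can be written as two small functions, one per tree. The age threshold of 15, the 0.1 score for girls and the feature that tree2 splits on ("uses a computer daily") are hypothetical, since the text only gives the leaf scores 2, 0.9, -1 and -0.9.

```python
def tree1(age, is_male):
    # Split on age first, then on gender (threshold 15 and the 0.1 leaf are assumptions)
    if age < 15:
        return 2.0 if is_male else 0.1
    return -1.0


def tree2(uses_computer_daily):
    # Hypothetical split feature; leaf scores 0.9 / -0.9 match the text
    return 0.9 if uses_computer_daily else -0.9


def predict(age, is_male, uses_computer_daily):
    # Boosted prediction = sum of the leaf scores from every tree
    return tree1(age, is_male) + tree2(uses_computer_daily)


boy = predict(age=10, is_male=True, uses_computer_daily=True)
grandpa = predict(age=70, is_male=True, uses_computer_daily=False)
print(boy, grandpa)  # 2.9 and -1.9, as in the text
```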

2.4 Deriving the objective
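A sketch of the standard second-order derivation, written with the objective from 2.1 and the complexity Ω from 2.3.1. The notation is introduced here for the sketch: $\hat{y}_i^{(t)}$ is the prediction after t trees, $q(x)$ maps a sample to its leaf in tree $f_t$, $g_i$ and $h_i$ are the first and second derivatives of the loss with respect to the previous prediction, and $I_j$ is the set of samples that fall into leaf j.

```latex
% prediction after t rounds; tree t assigns score w_{q(x)} to sample x
\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i), \qquad f_t(x) = w_{q(x)}

% objective at round t and its second-order Taylor expansion around \hat{y}_i^{(t-1)}
Obj^{(t)} = \sum_{i=1}^{n} l\big(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\big) + \Omega(f_t) + \text{const}
\approx \sum_{i=1}^{n} \Big[ l\big(y_i, \hat{y}_i^{(t-1)}\big) + g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^2(x_i) \Big] + \Omega(f_t) + \text{const}

g_i = \partial_{\hat{y}^{(t-1)}} \, l\big(y_i, \hat{y}_i^{(t-1)}\big), \qquad
h_i = \partial^2_{\hat{y}^{(t-1)}} \, l\big(y_i, \hat{y}_i^{(t-1)}\big)

% drop constant terms, plug in \Omega(f_t) = \gamma T + \frac{1}{2}\lambda \sum_j w_j^2,
% and group samples by leaf: I_j = \{ i \mid q(x_i) = j \}
G_j = \sum_{i \in I_j} g_i, \qquad H_j = \sum_{i \in I_j} h_i
Obj^{(t)} \approx \sum_{j=1}^{T} \Big[ G_j w_j + \tfrac{1}{2} (H_j + \lambda) w_j^2 \Big] + \gamma T

% each w_j appears in its own quadratic; setting the derivative to zero gives
w_j^{*} = -\frac{G_j}{H_j + \lambda}, \qquad
Obj^{*} = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T
```

Obj* acts as the score of a tree structure (lower is better). The split gain in Section 3 is the decrease in this score when one leaf is replaced by a left and a right child.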

3 How XGBoost Builds Regression Trees

3.1 Computing split nodes

In practice, when building the t-th tree, XGBoost splits nodes greedily:

Starting from tree depth 0:

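  • Try to split every leaf node of the tree;

  • Each split turns one leaf into a left and a right child leaf, and the samples of the original leaf are distributed between the two children according to the node's decision rule;

  • After each new split, we check whether it reduces the loss. The gain is defined as:

    $$Gain = \frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma$$

    where G_L, H_L (G_R, H_R) are the sums of the first- and second-order derivatives of the loss over the samples in the left (right) child, and λ and γ are the coefficients from the tree complexity Ω(f).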

If Gain > 0, i.e. the objective decreases after the node is split into two leaves, we consider keeping this split.
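The gain formula can be sketched as a plain function of the gradient and hessian sums of the two children. For the example numbers it assumes squared loss l = ½(y - ŷ)², for which g_i = ŷ_i - y_i and h_i = 1; the four samples and the current prediction 0.5 are made up.

```python
def split_gain(G_L, H_L, G_R, H_R, lam=1.0, gamma=0.0):
    """Gain of splitting one leaf into a left and a right child."""

    def score(G, H):
        # G^2 / (H + lambda): -2 times a leaf's optimal objective, without the gamma term
        return G * G / (H + lam)

    return 0.5 * (score(G_L, H_L) + score(G_R, H_R) - score(G_L + G_R, H_L + H_R)) - gamma


# Squared loss, current prediction 0.5 for four samples with labels [0, 0, 1, 1]:
# g = prediction - label, h = 1. The left child gets the two 0-labels, the right the two 1-labels.
gain = split_gain(G_L=1.0, H_L=2.0, G_R=-1.0, H_R=2.0, lam=1.0, gamma=0.0)
print(gain)  # positive, so the split lowers the objective
```

With a larger γ the same split can end up with Gain ≤ 0 and be rejected (here any γ ≥ 1/3), which is how γ acts as a minimum required loss reduction.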

If we keep splitting like this, when does the process stop?

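3.2 Conditions for stopping splits

Case 1: the scoring function derived in the previous section measures how good a tree structure is, so it can be used to choose the best split point. First determine all candidate split points of the sample features, then split at each candidate; the quality of each split is measured by the Gain from Section 3.1, and a node is not split when no candidate achieves a positive Gain.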
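A minimal sketch of this enumeration for a single feature: sort the samples by feature value, sweep the candidate split points from left to right while accumulating G_L and H_L, and keep the split with the largest positive gain. It assumes squared loss (g = ŷ - y, h = 1) and uses the midpoint between adjacent distinct values as the threshold; the library itself adds many optimizations (column blocks, histogram-based splitting, missing-value handling) that are not shown here.

```python
def split_gain(G_L, H_L, G_R, H_R, lam, gamma):
    """Gain of splitting one leaf into two children (Section 3.1)."""

    def score(G, H):
        return G * G / (H + lam)

    return 0.5 * (score(G_L, H_L) + score(G_R, H_R) - score(G_L + G_R, H_L + H_R)) - gamma


def best_split(x, g, h, lam=1.0, gamma=0.0):
    """Exact greedy search on one feature; returns (threshold, gain) or (None, 0.0)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    G, H = sum(g), sum(h)
    G_L = H_L = 0.0
    best_threshold, best_gain = None, 0.0
    for pos in range(len(order) - 1):
        i, nxt = order[pos], order[pos + 1]
        G_L += g[i]
        H_L += h[i]
        if x[i] == x[nxt]:
            continue  # no threshold can separate equal feature values
        gain = split_gain(G_L, H_L, G - G_L, H - H_L, lam, gamma)
        if gain > best_gain:  # only splits with Gain > 0 are kept
            best_threshold, best_gain = (x[i] + x[nxt]) / 2, gain
    return best_threshold, best_gain


# Squared loss, current prediction 0.5 for every sample: g = 0.5 - y, h = 1
feature = [3.0, 1.0, 4.0, 2.0]
labels = [1, 0, 1, 0]
g = [0.5 - y for y in labels]
h = [1.0] * len(labels)
print(best_split(feature, g, h))  # threshold 2.5 separates the 0s from the 1s
```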

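4 Differences between XGBoost and GBDT

  • Difference 1:
    • XGBoost takes tree complexity into account when generating CART trees;
    • GBDT does not consider it while growing trees; it takes tree complexity into account only in the pruning step.
  • Difference 2:
    • XGBoost fits a second-order Taylor expansion of the previous round's loss, while GBDT fits a first-order one. As a result, XGBoost tends to be more accurate and needs fewer iterations to reach the same training performance.
  • Difference 3:
    • Both XGBoost and GBDT improve the model iteratively, one tree at a time, but XGBoost can use multiple threads when searching for the best split point, which greatly speeds up training.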
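To see what "first-order" versus "second-order" means for a single leaf, the sketch below uses logistic loss, where g = p - y and h = p(1 - p), with p the sigmoid of the current raw score. The second-order leaf value is the Newton step -G/(H + λ) from the XGBoost objective; the first-order value is a simplified stand-in that averages the negative gradients, as when a regression tree is fitted to pseudo-residuals with squared error. The three labels and the starting score 0 are made up, and λ = 0 so the two formulas are directly comparable.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logistic_grad_hess(label, raw_score):
    """First and second derivatives of the logistic loss w.r.t. the raw score."""
    p = sigmoid(raw_score)
    return p - label, p * (1.0 - p)


labels = [1, 1, 0]  # samples falling into one leaf (made-up)
raw_score = 0.0     # current prediction before this tree, so p = 0.5
pairs = [logistic_grad_hess(y, raw_score) for y in labels]
G = sum(g for g, _ in pairs)
H = sum(h for _, h in pairs)
lam = 0.0

second_order_leaf = -G / (H + lam)   # uses the curvature (hessian), XGBoost-style
first_order_leaf = -G / len(labels)  # mean negative gradient only
print(second_order_leaf, first_order_leaf)  # about 0.667 and 0.167
```

Dividing by the hessian scales the step to the curvature of the loss; this is one intuition for why second-order updates can reach a given training loss in fewer rounds.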

5 Summary


5.2 Introduction to the XGBoost API

1 Installing XGBoost:

Official documentation: https://xgboost.readthedocs.io/en/latest/

pip3 install xgboost

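2 XGBoost parameters

Although XGBoost is called a "secret weapon" for Kaggle competitions, training a good model still requires passing suitable values to its parameters.

XGBoost exposes many parameters, which fall into three types: general parameters, booster parameters and learning task parameters (task parameters):

  • General parameters: control the overall behavior;
  • Booster parameters: depend on the chosen booster type and control the booster (tree or linear model) used at each step;
  • Learning task parameters: control the training objective and how performance is evaluated.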

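2.1 General parameters

  1. booster [default=gbtree]

    • Which booster to use: gbtree, gblinear or dart.
    • gbtree and dart use tree-based models (dart additionally applies dropout), while gblinear uses linear functions.
  2. silent [default=0]

    • 0 prints running messages; 1 enables silent mode. Newer XGBoost versions replace this parameter with verbosity (0 = silent, 1 = warning, 2 = info, 3 = debug).
  3. nthread [default=maximum number of threads available]

    • Number of parallel threads used to run XGBoost. It should be no larger than the number of CPU cores; if it is not set, XGBoost detects the available cores and uses all of them.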

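The following two parameters are set automatically and can be left at their defaults:

  1. num_pbuffer [set automatically by XGBoost; no need to set it]

    • Size of the prediction buffer, normally the number of training instances. The buffer stores the prediction results of the last boosting step.
  2. num_feature [set automatically by XGBoost; no need to set it]

    • Feature dimension used in boosting, set to the maximum dimension of the features.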

2.2 Booster parameters

2.2.1 Parameters for Tree Booster

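  1. eta [default=0.3, alias: learning_rate]

    • Step-size shrinkage applied to each update to prevent overfitting.

    • After each boosting step the weights of the new tree are available directly, and eta shrinks them to make the boosting process more conservative and robust.

    • Range: [0,1]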
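  2. gamma [default=0, alias: min_split_loss] (minimum loss reduction for a split)

    • A node is split only if the split decreases the loss function.
    • Gamma specifies the minimum loss reduction required to make a split. The larger the value, the more conservative the algorithm. The appropriate value depends on the loss function, so it needs to be tuned.

    • Range: [0,∞]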

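  3. max_depth [default=6]

    • Maximum depth of a tree, also used to avoid overfitting: the larger max_depth, the more specific and local the patterns the model learns. 0 means no limit, although the exact tree method requires a non-zero value.
    • Range: [0,∞]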
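  4. min_child_weight [default=1]

    • Minimum sum of instance weights (in XGBoost, the sum of the hessians) required in a child node.
    • Larger values prevent the model from learning highly specific local samples, but too large a value leads to underfitting. Tune it with cross-validation (CV).
    • Range: [0,∞]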
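  5. subsample [default=1]

    • Fraction of the training samples randomly sampled for each tree.
    • Decreasing it makes the algorithm more conservative and helps avoid overfitting, but too small a value may cause underfitting.

    • Typical values: 0.5-1. A value of 0.5 means half of the training data is randomly sampled before growing each tree, which helps prevent overfitting.

    • Range: (0,1]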
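  6. colsample_bytree [default=1]

    • Fraction of columns (each column is a feature) randomly sampled for each tree.
    • Typical values: 0.5-1
    • Range: (0,1]
  7. colsample_bylevel [default=1]

    • Fraction of columns sampled for each level of the tree, used by the splits at that level.
    • I rarely use this parameter myself, because subsample and colsample_bytree can play a similar role, but it may be worth exploring if you are interested.
    • Range: (0,1]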
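  8. lambda [default=1, alias: reg_lambda]

    • L2 regularization term on weights (similar to Ridge regression).
    • This parameter controls the regularization part of XGBoost. Although many data scientists rarely use it, it can be useful for reducing overfitting.
  9. alpha [default=0, alias: reg_alpha]

    • L1 regularization term on weights (similar to Lasso regression). It can be used with very high-dimensional data and makes the algorithm faster.
  10. scale_pos_weight [default=1]

    • Controls the balance of positive and negative weights. When the classes are highly imbalanced, setting it to a positive value can help the algorithm converge faster. A typical choice is the number of negative samples divided by the number of positive samples.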
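A sketch of the tree-booster settings gathered into keyword arguments for the scikit-learn wrapper xgboost.XGBClassifier, which uses the aliases learning_rate, reg_lambda and reg_alpha. Apart from scale_pos_weight, which follows the negatives/positives rule of thumb above, the values are illustrative assumptions, and the 90/10 label list is made up.

```python
def scale_pos_weight_from_labels(labels):
    """Return (#negative samples) / (#positive samples) for 0/1 labels."""
    positives = sum(1 for v in labels if v == 1)
    negatives = sum(1 for v in labels if v == 0)
    if positives == 0:
        raise ValueError("no positive samples")
    return negatives / positives


labels = [0] * 90 + [1] * 10  # made-up imbalanced labels

tree_params = {
    "learning_rate": 0.1,     # eta
    "gamma": 0.0,             # min_split_loss
    "max_depth": 6,
    "min_child_weight": 1,
    "subsample": 0.8,
    "colsample_bytree": 0.8,
    "reg_lambda": 1.0,        # lambda (L2)
    "reg_alpha": 0.0,         # alpha (L1)
    "scale_pos_weight": scale_pos_weight_from_labels(labels),
}
print(tree_params["scale_pos_weight"])  # 9.0
```

The dictionary can then be passed as XGBClassifier(**tree_params).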

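2.2.2 Parameters for Linear Booster

The linear booster is rarely used.

  1. lambda [default=0, alias: reg_lambda]

    • L2 regularization coefficient; increasing it makes the model more conservative.
  2. alpha [default=0, alias: reg_alpha]

    • L1 regularization coefficient; increasing it makes the model more conservative.
  3. lambda_bias [default=0, alias: reg_lambda_bias]

    • L2 regularization on the bias (there is no L1 regularization on the bias because it is not important).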

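2.3 Learning task parameters

  1. objective [default=reg:linear]

    1. “reg:linear” – linear regression (renamed “reg:squarederror” in newer versions, where it is the default)
    2. “reg:logistic” – logistic regression
    3. “binary:logistic” – logistic regression for binary classification, outputs probabilities
    4. “multi:softmax” – multiclass classification with softmax, returns the predicted class (not probabilities). In this case you also need to set num_class (the number of classes).
    5. “multi:softprob” – same as multi:softmax, but returns the probability of each sample belonging to each class.
  2. eval_metric [default: chosen according to the objective]

    Available options include:

    1. “rmse”: root mean squared error
    2. “mae”: mean absolute error
    3. “logloss”: negative log-likelihood
    4. “error”: binary classification error rate.
      • Computed as the number of misclassified samples divided by the total number of samples. A prediction greater than 0.5 is regarded as the positive class, otherwise as the negative class.
    5. “error@t”: the same error rate with a different classification threshold, set via ‘t’
    6. “merror”: multiclass error rate, computed as (wrong cases)/(all cases)
    7. “mlogloss”: multiclass log loss
    8. “auc”: area under the curve
  3. seed [default=0]

    • Random number seed.
    • Setting it makes results that involve randomness reproducible; it is also useful when tuning parameters.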
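To make the metric definitions concrete, here is a pure-Python sketch of rmse, mae, logloss and error@t for binary labels. It is a reimplementation for illustration, not XGBoost's own code; the clipping constant eps is an assumption that avoids log(0), and the sample labels and probabilities are made up. It also shows that taking the arg-max of multi:softprob-style probabilities gives multi:softmax-style class labels.

```python
import math


def rmse(y, pred):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, pred)) / len(y))


def mae(y, pred):
    return sum(abs(a - b) for a, b in zip(y, pred)) / len(y)


def logloss(y, prob, eps=1e-15):
    """Negative log-likelihood for 0/1 labels; prob = predicted P(y = 1)."""
    total = 0.0
    for label, p in zip(y, prob):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total += label * math.log(p) + (1 - label) * math.log(1 - p)
    return -total / len(y)


def error_at(y, prob, t=0.5):
    """Binary error rate: a prediction above t counts as the positive class."""
    wrong = sum(1 for label, p in zip(y, prob) if int(p > t) != label)
    return wrong / len(y)


def softprob_to_class(prob_rows):
    """Pick the most probable class per row (softprob-style -> softmax-style output)."""
    return [max(range(len(row)), key=row.__getitem__) for row in prob_rows]


labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.4, 0.6]
print(error_at(labels, probs))         # "error": 2 of 4 wrong -> 0.5
print(error_at(labels, probs, t=0.3))  # "error@0.3": 1 of 4 wrong -> 0.25
print(logloss(labels, probs))
print(softprob_to_class([[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]))  # [1, 0]
```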

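5.3 XGBoost case study

1 Background

This case is the same as the one used earlier in the decision tree section.

The sinking of the Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the Titanic sank after colliding with an iceberg, killing 1502 of the 2224 passengers and crew. This sensational tragedy shocked the international community and led to better safety regulations for ships. One of the reasons the shipwreck led to such loss of life was that there were not enough lifeboats for the passengers and crew. Although there was some element of luck involved in surviving the sinking, some groups of people were more likely to survive than others, such as women, children and the upper class. In this case, we ask you to analyze what sorts of people were likely to survive. In particular, we ask you to apply machine learning tools to predict which passengers survived the tragedy.

Case: https://www.kaggle.com/c/titanic/overview

The features in the extracted dataset include the ticket, whether the passenger survived (survived), passenger class (pclass), age, port of embarkation (embarked), home.dest, room, boat and sex, among others.

Data: http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic.txt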

Observations from the data:

  • 1 pclass is the passenger class (1st, 2nd, 3rd), a proxy for socio-economic status.
  • 2 The age column has missing values.

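2 Step analysis

  • 1. Get the data
  • 2. Basic data processing
    • 2.1 Select the features and the target
    • 2.2 Handle missing values
    • 2.3 Split the dataset
  • 3. Feature engineering (dictionary feature extraction)
  • 4. Machine learning (XGBoost)
  • 5. Model evaluation

3 Code implementation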

  • Import the required modules
import pandas as pd
import numpy as np
from sklearn.feature_extraction import DictVectorizer
from sklearn.model_selection import train_test_split
  • 1. Get the data
# 1. Get the data
titan = pd.read_csv("http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic.txt")
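  • 2. Basic data processing

    • 2.1 Select the features and the target
    x = titan[["pclass", "age", "sex"]].copy()  # copy so later edits do not touch a view of titan
    y = titan["survived"]
    
    • 2.2 Handle missing values
    # age has missing values: fill them with the mean age
    x["age"] = x["age"].fillna(x["age"].mean())
    
    • 2.3 Split the dataset
    x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=22)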
    
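  • 3. Feature engineering (dictionary feature extraction)

The features contain categorical values, so they need one-hot encoding (DictVectorizer).

x.to_dict(orient="records") converts the tabular features into a list of dictionaries.

# Convert x into dictionary records with x.to_dict(orient="records"), e.g.
# [{"pclass": "1st", "age": 29.00, "sex": "female"}, {}]

transfer = DictVectorizer(sparse=False)

# Learn the one-hot vocabulary on the training set only
x_train = transfer.fit_transform(x_train.to_dict(orient="records"))
# Reuse that vocabulary on the test set (transform, not fit_transform) so the columns match
x_test = transfer.transform(x_test.to_dict(orient="records"))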

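  • 4. XGBoost model training and evaluation
# Initial model training
from xgboost import XGBClassifier
xg = XGBClassifier()

xg.fit(x_train, y_train)

print(xg.score(x_test, y_test))  # accuracy on the test set
# Tune max_depth (start at 1: max_depth=0 means "no limit" and is rejected by the exact tree method)
depth_range = range(1, 11)
score = []
for i in depth_range:
    # learning_rate is the scikit-learn name of eta
    xg = XGBClassifier(learning_rate=1, gamma=0, max_depth=i)
    xg.fit(x_train, y_train)
    s = xg.score(x_test, y_test)
    print(s)
    score.append(s)
# Visualize the results
import matplotlib.pyplot as plt

plt.plot(depth_range, score)
plt.xlabel("max_depth")
plt.ylabel("test accuracy")
plt.show()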


In [1]:

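# 1. Get the data
# 2. Basic data processing
# 2.1 Select the features and the target
# 2.2 Handle missing values
# 2.3 Split the dataset
# 3. Feature engineering (dictionary feature extraction)
# 4. Machine learning (XGBoost)
# 5. Model evaluation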

In [2]:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier, export_graphviz

In [3]:

# 1. Get the data
titan = pd.read_csv("http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic.txt")

In [4]:

titan

Out[4]:

  row.names pclass survived name age embarked home.dest room ticket boat sex
0 1 1st 1 Allen, Miss Elisabeth Walton 29.0000 Southampton St Louis, MO B-5 24160 L221 2 female
1 2 1st 0 Allison, Miss Helen Loraine 2.0000 Southampton Montreal, PQ / Chesterville, ON C26 NaN NaN female
2 3 1st 0 Allison, Mr Hudson Joshua Creighton 30.0000 Southampton Montreal, PQ / Chesterville, ON C26 NaN (135) male
3 4 1st 0 Allison, Mrs Hudson J.C. (Bessie Waldo Daniels) 25.0000 Southampton Montreal, PQ / Chesterville, ON C26 NaN NaN female
4 5 1st 1 Allison, Master Hudson Trevor 0.9167 Southampton Montreal, PQ / Chesterville, ON C22 NaN 11 male
5 6 1st 1 Anderson, Mr Harry 47.0000 Southampton New York, NY E-12 NaN 3 male
6 7 1st 1 Andrews, Miss Kornelia Theodosia 63.0000 Southampton Hudson, NY D-7 13502 L77 10 female
7 8 1st 0 Andrews, Mr Thomas, jr 39.0000 Southampton Belfast, NI A-36 NaN NaN male
8 9 1st 1 Appleton, Mrs Edward Dale (Charlotte Lamson) 58.0000 Southampton Bayside, Queens, NY C-101 NaN 2 female
9 10 1st 0 Artagaveytia, Mr Ramon 71.0000 Cherbourg Montevideo, Uruguay NaN NaN (22) male
10 11 1st 0 Astor, Colonel John Jacob 47.0000 Cherbourg New York, NY NaN 17754 L224 10s 6d (124) male
11 12 1st 1 Astor, Mrs John Jacob (Madeleine Talmadge Force) 19.0000 Cherbourg New York, NY NaN 17754 L224 10s 6d 4 female
12 13 1st 1 Aubert, Mrs Leontine Pauline NaN Cherbourg Paris, France B-35 17477 L69 6s 9 female
13 14 1st 1 Barkworth, Mr Algernon H. NaN Southampton Hessle, Yorks A-23 NaN B male
14 15 1st 0 Baumann, Mr John D. NaN Southampton New York, NY NaN NaN NaN male
15 16 1st 1 Baxter, Mrs James (Helene DeLaudeniere Chaput) 50.0000 Cherbourg Montreal, PQ B-58/60 NaN 6 female
16 17 1st 0 Baxter, Mr Quigg Edmond 24.0000 Cherbourg Montreal, PQ B-58/60 NaN NaN male
17 18 1st 0 Beattie, Mr Thomson 36.0000 Cherbourg Winnipeg, MN C-6 NaN NaN male
18 19 1st 1 Beckwith, Mr Richard Leonard 37.0000 Southampton New York, NY D-35 NaN 5 male
19 20 1st 1 Beckwith, Mrs Richard Leonard (Sallie Monypeny) 47.0000 Southampton New York, NY D-35 NaN 5 female
20 21 1st 1 Behr, Mr Karl Howell 26.0000 Cherbourg New York, NY C-148 NaN 5 male
21 22 1st 0 Birnbaum, Mr Jakob 25.0000 Cherbourg San Francisco, CA NaN NaN (148) male
22 23 1st 1 Bishop, Mr Dickinson H. 25.0000 Cherbourg Dowagiac, MI B-49 NaN 7 male
23 24 1st 1 Bishop, Mrs Dickinson H. (Helen Walton) 19.0000 Cherbourg Dowagiac, MI B-49 NaN 7 female
24 25 1st 1 Bjornstrm-Steffansson, Mr Mauritz Hakan 28.0000 Southampton Stockholm, Sweden / Washington, DC NaN   D male
25 26 1st 0 Blackwell, Mr Stephen Weart 45.0000 Southampton Trenton, NJ NaN NaN (241) male
26 27 1st 1 Blank, Mr Henry 39.0000 Cherbourg Glen Ridge, NJ A-31 NaN 7 male
27 28 1st 1 Bonnell, Miss Caroline 30.0000 Southampton Youngstown, OH C-7 NaN 8 female
28 29 1st 1 Bonnell, Miss Elizabeth 58.0000 Southampton Birkdale, England Cleveland, Ohio C-103 NaN 8 female
29 30 1st 0 Borebank, Mr John James NaN Southampton London / Winnipeg, MB D-21/2 NaN NaN male
... ... ... ... ... ... ... ... ... ... ... ...
1283 1284 3rd 0 Vestrom, Miss Hulda Amanda Adolfina NaN NaN NaN NaN NaN NaN female
1284 1285 3rd 0 Vonk, Mr Jenko NaN NaN NaN NaN NaN NaN male
1285 1286 3rd 0 Ware, Mr Frederick NaN NaN NaN NaN NaN NaN male
1286 1287 3rd 0 Warren, Mr Charles William NaN NaN NaN NaN NaN NaN male
1287 1288 3rd 0 Wazli, Mr Yousif NaN NaN NaN NaN NaN NaN male
1288 1289 3rd 0 Webber, Mr James NaN NaN NaN NaN NaN NaN male
1289 1290 3rd 1 Wennerstrom, Mr August Edvard NaN NaN NaN NaN NaN NaN male
1290 1291 3rd 0 Wenzel, Mr Linhart NaN NaN NaN NaN NaN NaN male
1291 1292 3rd 0 Widegren, Mr Charles Peter NaN NaN NaN NaN NaN NaN male
1292 1293 3rd 0 Wiklund, Mr Jacob Alfred NaN NaN NaN NaN NaN NaN male
1293 1294 3rd 1 Wilkes, Mrs Ellen NaN NaN NaN NaN NaN NaN female
1294 1295 3rd 0 Willer, Mr Aaron NaN NaN NaN NaN NaN NaN male
1295 1296 3rd 0 Willey, Mr Edward NaN NaN NaN NaN NaN NaN male
1296 1297 3rd 0 Williams, Mr Howard Hugh NaN NaN NaN NaN NaN NaN male
1297 1298 3rd 0 Williams, Mr Leslie NaN NaN NaN NaN NaN NaN male
1298 1299 3rd 0 Windelov, Mr Einar NaN NaN NaN NaN NaN NaN male
1299 1300 3rd 0 Wirz, Mr Albert NaN NaN NaN NaN NaN NaN male
1300 1301 3rd 0 Wiseman, Mr Phillippe NaN NaN NaN NaN NaN NaN male
1301 1302 3rd 0 Wittevrongel, Mr Camiel NaN NaN NaN NaN NaN NaN male
1302 1303 3rd 1 Yalsevac, Mr Ivan NaN NaN NaN NaN NaN NaN male
1303 1304 3rd 0 Yasbeck, Mr Antoni NaN NaN NaN NaN NaN NaN male
1304 1305 3rd 1 Yasbeck, Mrs Antoni NaN NaN NaN NaN NaN NaN female
1305 1306 3rd 0 Youssef, Mr Gerios NaN NaN NaN NaN NaN NaN male
1306 1307 3rd 0 Zabour, Miss Hileni NaN NaN NaN NaN NaN NaN female
1307 1308 3rd 0 Zabour, Miss Tamini NaN NaN NaN NaN NaN NaN female
1308 1309 3rd 0 Zakarian, Mr Artun NaN NaN NaN NaN NaN NaN male
1309 1310 3rd 0 Zakarian, Mr Maprieder NaN NaN NaN NaN NaN NaN male
1310 1311 3rd 0 Zenn, Mr Philip NaN NaN NaN NaN NaN NaN male
1311 1312 3rd 0 Zievens, Rene NaN NaN NaN NaN NaN NaN female
1312 1313 3rd 0 Zimmerman, Leo NaN NaN NaN NaN NaN NaN male

1313 rows × 11 columns

In [5]:

titan.describe()

Out[5]:

  row.names survived age
count 1313.000000 1313.000000 633.000000
mean 657.000000 0.341965 31.194181
std 379.174762 0.474549 14.747525
min 1.000000 0.000000 0.166700
25% 329.000000 0.000000 21.000000
50% 657.000000 0.000000 30.000000
75% 985.000000 1.000000 41.000000
max 1313.000000 1.000000 71.000000

In [6]:

# 2. Basic data processing
# 2.1 Select the features and the target
x = titan[["pclass", "age", "sex"]].copy()
y = titan["survived"]

In [7]:

x.head()

Out[7]:

  pclass age sex
0 1st 29.0000 female
1 1st 2.0000 female
2 1st 30.0000 male
3 1st 25.0000 female
4 1st 0.9167 male

In [8]:

y.head()

Out[8]:

0    1
1    0
2    0
3    0
4    1
Name: survived, dtype: int64

In [9]:

# 2.2 Handle missing values
x["age"] = x["age"].fillna(titan["age"].mean())

In [10]:

x.head()

Out[10]:

  pclass age sex
0 1st 29.0000 female
1 1st 2.0000 female
2 1st 30.0000 male
3 1st 25.0000 female
4 1st 0.9167 male

In [11]:

# 2.3 Split the dataset
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=22, test_size=0.2)

In [12]:

# 3. Feature engineering (dictionary feature extraction)

In [13]:

x_train.head()

Out[13]:

  pclass age sex
649 3rd 45.000000 female
1078 3rd 31.194181 male
59 1st 31.194181 female
201 1st 18.000000 male
61 1st 31.194181 female

In [14]:

x_train = x_train.to_dict(orient="records")
x_test = x_test.to_dict(orient="records")

In [15]:

x_train

Out[15]:

[{'pclass': '3rd', 'age': 45.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 18.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 6.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 27.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 21.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 4.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 13.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 30.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 30.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 50.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 22.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 49.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 62.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 32.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 64.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 23.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 55.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 24.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 6.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 10.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 53.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 18.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 36.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 19.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 28.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 17.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 21.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 25.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 19.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 21.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 48.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 27.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 46.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 29.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 35.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 38.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 32.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 16.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 30.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 18.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 16.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 33.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 17.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 18.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 36.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 30.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 20.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 33.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 52.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 35.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 25.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 45.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 50.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 52.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 20.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 32.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 34.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 33.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 21.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 45.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 43.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 59.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 47.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 38.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 51.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 36.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 23.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 6.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 58.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 4.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 20.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 35.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 12.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 19.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 64.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 27.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 34.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 18.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 48.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 50.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 18.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 34.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 21.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 44.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 19.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 39.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 42.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 69.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 2.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 22.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 47.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 22.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 42.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 21.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 48.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 45.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 45.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 39.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 14.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 30.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 32.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 54.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 36.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 47.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 0.8333, 'sex': 'male'},
 {'pclass': '1st', 'age': 53.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 24.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 37.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 25.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 23.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 40.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 20.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 23.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 22.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 29.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 30.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 55.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 26.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 30.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 49.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 18.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 24.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 22.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 54.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 38.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 42.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 52.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 19.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 8.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 57.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 22.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 16.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 45.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 28.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 30.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 19.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 28.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 24.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 38.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 36.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 55.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 25.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 32.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 9.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 30.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 29.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 40.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 39.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 25.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 49.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 36.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 23.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 23.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 20.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 17.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 24.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 28.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 40.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 6.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 17.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 34.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 30.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 41.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 28.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 61.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 17.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 3.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 24.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 30.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 41.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 42.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 28.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 48.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 50.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 16.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 40.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 30.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 23.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 34.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 39.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 34.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 22.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 9.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 22.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 30.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 25.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 24.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 26.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 57.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 39.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 35.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 41.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 67.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 11.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 22.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 20.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 50.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 33.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 36.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 48.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 59.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 25.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 17.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 45.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 49.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 33.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 46.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 52.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 36.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 28.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 28.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 30.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 19.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 43.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 36.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 51.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 36.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 3.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 48.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 48.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 16.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 36.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 44.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 36.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 37.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 30.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 32.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 30.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 22.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 40.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 24.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 65.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 30.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 37.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 52.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 23.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 0.8333, 'sex': 'male'},
 {'pclass': '2nd', 'age': 35.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 26.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 27.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 27.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 41.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 18.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 33.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 56.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 40.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 9.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 28.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 26.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 30.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 25.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 48.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 36.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 35.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 1.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 2.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 32.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 25.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 29.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 21.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 20.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 27.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 38.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 28.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 0.9167, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 39.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 9.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 45.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 20.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 36.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 14.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 30.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 22.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 60.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 20.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 48.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 28.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 23.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 32.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 30.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 46.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 32.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 27.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 61.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 39.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 0.1667, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 15.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 18.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 24.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 17.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 42.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 20.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 62.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 49.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 23.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 33.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 19.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 45.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 70.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 37.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 28.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 54.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 51.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 21.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 64.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 29.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 33.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 50.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 19.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 59.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 18.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 49.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 38.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 48.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 54.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 19.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 3.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 18.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 22.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 34.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 28.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 20.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 15.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 40.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 46.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 8.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 63.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 43.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 16.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 38.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 1.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 35.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 42.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 38.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 17.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 40.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 4.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 29.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 22.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 57.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 40.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 47.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 37.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 42.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 21.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 20.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 5.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 21.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 40.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 9.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 41.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 36.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 28.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 39.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 35.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 24.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 45.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 24.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 50.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 56.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 32.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 19.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 18.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 20.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 45.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 22.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 20.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 50.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 30.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 24.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 40.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 21.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 52.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 45.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 11.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 23.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 26.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 40.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 49.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 18.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 18.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 9.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 25.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 35.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 32.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 21.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 32.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 45.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 30.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 26.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 30.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 21.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 25.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 20.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 36.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 27.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 24.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 18.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 56.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 16.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 18.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 64.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 46.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 18.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 46.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 29.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 18.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 33.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 34.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 0.8333, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 58.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 60.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 23.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 44.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 71.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 13.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 58.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 4.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 16.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 33.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 33.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 48.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 28.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 55.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 54.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 71.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 47.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 21.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 21.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 24.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 23.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 18.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 54.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 17.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 6.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 45.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 36.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 55.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 65.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 27.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 22.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 7.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 30.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 39.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 19.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 19.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 20.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 56.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 38.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 23.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 42.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 23.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 25.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 16.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 42.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 2.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 36.0, 'sex': 'male'},
 ...]

In [16]:

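# Convert the lists of dicts into one-hot encoded feature matrices
transfer = DictVectorizer()

# Learn the feature vocabulary on the training set only
x_train = transfer.fit_transform(x_train)
# Reuse that vocabulary for the test set (refitting could produce different columns)
x_test = transfer.transform(x_test)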
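Here is a short illustration of why the test set should go through `transform` rather than `fit_transform`. `DictVectorizer` builds its column vocabulary while fitting, so refitting on the test records can give a different, incompatible set of columns. The records below are made up and only shaped like the Titanic dicts above.

```python
from sklearn.feature_extraction import DictVectorizer

# Hypothetical records shaped like the Titanic dicts above
train_records = [
    {"pclass": "1st", "age": 36.0, "sex": "male"},
    {"pclass": "3rd", "age": 24.0, "sex": "female"},
]
test_records = [
    {"pclass": "2nd", "age": 18.0, "sex": "female"},  # "2nd" never appears in training
]

# Fit the vocabulary on the training records only
dv = DictVectorizer(sparse=False)
X_train = dv.fit_transform(train_records)
# Reuse that vocabulary for the test records; unseen categories are ignored
X_test = dv.transform(test_records)

# Refitting on the test records instead builds a different column layout
dv_wrong = DictVectorizer(sparse=False)
X_test_wrong = dv_wrong.fit_transform(test_records)

print(dv.feature_names_)        # columns the model is trained on
print(X_test)
print(dv_wrong.feature_names_)  # columns a refit would produce
```

Any model trained on `X_train` expects exactly the columns in `dv.feature_names_`, and only `transform` guarantees that layout.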

In [21]:

# 4. Train the xgboost model
# 4.1 Initial model training
from xgboost import XGBClassifier

xg = XGBClassifier()

xg.fit(x_train, y_train)

Out[21]:

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0,
              learning_rate=0.1, max_delta_step=0, max_depth=3,
              min_child_weight=1, missing=None, n_estimators=100, n_jobs=1,
              nthread=None, objective='binary:logistic', random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
              silent=None, subsample=1, verbosity=1)

In [22]:

xg.score(x_test, y_test)

Out[22]:

0.7832699619771863

In [23]:

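# 4.2 Tune max_depth
# Note: range(10) starts at max_depth=0, which XGBoost treats as a special
# value rather than an ordinary depth; range(1, 10) is the usual sweep

depth_range = range(10)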
score = []

for i in depth_range:
    xg = XGBClassifier(eta=1, gamma=0, max_depth=i)
    xg.fit(x_train, y_train)
    
    s = xg.score(x_test, y_test)
    
    print(s)
    score.append(s)

0.6311787072243346
0.7908745247148289
0.7870722433460076
0.7832699619771863
0.7870722433460076
0.7908745247148289
0.7908745247148289
0.7946768060836502
0.7908745247148289
0.7946768060836502

In [25]:

# 4.3 Visualize the tuning results
import matplotlib.pyplot as plt

plt.plot(depth_range, score)
plt.xlabel("max_depth")
plt.ylabel("test accuracy")

plt.show()


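5.4 Otto case study: Otto Group Product Classification Challenge (XGBoost implementation)

1 Background

The Otto Group is one of the world's largest e-commerce companies, with subsidiaries in more than 20 countries. It sells millions of products around the world every day, so sorting its products into sensible categories based on their characteristics is very important.

In practice, however, staff found that many identical products had been put into different categories. This case asks you to classify Otto Group's products correctly, as accurately as possible.

Link: https://www.kaggle.com/c/otto-group-product-classification-challenge/overview

2 Approach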

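  • 1. Data acquisition

  • 2. Basic data processing

    • 2.1 Take a subset of the data
    • 2.2 Convert the label values to numbers
    • 2.3 Split the data (with StratifiedShuffleSplit)
    • 2.4 Standardize the data
    • 2.5 Reduce dimensionality with PCA
  • 3. Model training

    • 3.1 Baseline model training
    • 3.2 Model tuning
      • 3.2.1 Parameters to tune:
        • n_estimators,
        • max_depth,
        • min_child_weight,
        • subsample,
        • colsample_bytree,
        • eta
      • 3.2.2 Determine the final best parameters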

3 Code for selected steps

  • 2. Basic data processing

    • 2.1 Take a subset of the data

    • 2.2 Convert the label values to numbers

    • 2.3 Split the data (with StratifiedShuffleSplit)

      # Split the dataset with StratifiedShuffleSplit (keeps class proportions in both parts)
      from sklearn.model_selection import StratifiedShuffleSplit
      
      sss = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
      for train_index, test_index in sss.split(X_resampled.values, y_resampled):
          print(len(train_index))
          print(len(test_index))
      
          x_train = X_resampled.values[train_index]
          x_val = X_resampled.values[test_index]
      
          y_train = y_resampled[train_index]
          y_val = y_resampled[test_index]
      
      # Visualize the class distribution of the validation split
      import matplotlib.pyplot as plt
      import seaborn as sns
      
      sns.countplot(x=y_val)
      
      plt.show()
      
    • 2.4 Standardize the data

      from sklearn.preprocessing import StandardScaler
      
      # Fit the scaler on the training data only, then apply it to both sets
      scaler = StandardScaler()
      scaler.fit(x_train)
      
      x_train_scaled = scaler.transform(x_train)
      x_val_scaled = scaler.transform(x_val)
      
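    • 2.5 Reduce dimensionality with PCA

      print(x_train_scaled.shape)
      # (13888, 93)
      
      from sklearn.decomposition import PCA
      
      # Keep as many components as needed to explain 90% of the variance
      pca = PCA(n_components=0.9)
      x_train_pca = pca.fit_transform(x_train_scaled)
      x_val_pca = pca.transform(x_val_scaled)
      
      print(x_train_pca.shape, x_val_pca.shape)
      # (13888, 65) (3473, 65)
      

      The output above shows that 65 principal components are enough to retain 90% of the variance of the 93 original features.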

      # Visualize the cumulative explained variance
      import numpy as np
      
      plt.plot(np.cumsum(pca.explained_variance_ratio_))
      
      plt.xlabel("Number of components")
      plt.ylabel("Cumulative explained variance ratio")
      
      plt.show()

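  • 3. Model training

    • 3.1 Baseline model training

      from xgboost import XGBClassifier
      
      xgb = XGBClassifier()
      xgb.fit(x_train_pca, y_train)
      
      # Predict class probabilities rather than hard labels; log loss is computed on probabilities
      y_pre_proba = xgb.predict_proba(x_val_pca)
      
      # Evaluate the model with log loss
      from sklearn.metrics import log_loss
      log_loss(y_val, y_pre_proba, eps=1e-15, normalize=True)
      
      # Inspect the model's parameters (get_params is a method, so call it)
      xgb.get_params()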
      
  • 3.2 Model tuning

    • 3.2.1 Parameters to tune:

      • n_estimators,

        scores_ne = []
        n_estimators = [100,200,400,450,500,550,600,700]
        
        for nes in n_estimators:
            print("n_estimators:", nes)
            xgb = XGBClassifier(max_depth=3, 
                                learning_rate=0.1, 
                                n_estimators=nes, 
                                objective="multi:softprob", 
                                n_jobs=4,  # thread count; nthread is a deprecated alias of n_jobs
                                min_child_weight=1, 
                                subsample=1, 
                                colsample_bytree=1,
                                random_state=42)  # seed is a deprecated alias of random_state
        
            xgb.fit(x_train_pca, y_train)
            y_pre = xgb.predict_proba(x_val_pca)
            score = log_loss(y_val, y_pre)
            scores_ne.append(score)
            print("Validation log loss: {}".format(score))
        
        # Visualize how the score changes
        plt.plot(n_estimators, scores_ne, "o-")
        
        plt.ylabel("log_loss")
        plt.xlabel("n_estimators")
        plt.show()
        print("Best n_estimators: {}".format(n_estimators[np.argmin(scores_ne)]))
        

      • max_depth,

        scores_md = []
        max_depths = [1,3,5,6,7]
        
        # Lines marked "changed" differ from the n_estimators loop above
        for md in max_depths:  # changed
            xgb = XGBClassifier(max_depth=md,  # changed
                                learning_rate=0.1, 
                                n_estimators=n_estimators[np.argmin(scores_ne)],  # changed: best value from the previous step
                                objective="multi:softprob", 
                                n_jobs=4, 
                                min_child_weight=1, 
                                subsample=1, 
                                colsample_bytree=1,
                                random_state=42)
        
            xgb.fit(x_train_pca, y_train)
            y_pre = xgb.predict_proba(x_val_pca)
            score = log_loss(y_val, y_pre)
            scores_md.append(score)  # changed
            print("Validation log loss: {}".format(score))
        
        # Visualize how the score changes
        plt.plot(max_depths, scores_md, "o-")  # changed
        
        plt.ylabel("log_loss")
        plt.xlabel("max_depth")  # changed
        plt.show()
        print("Best max_depth: {}".format(max_depths[np.argmin(scores_md)]))  # changed
        
      • min_child_weight,

        • Tune it following the same pattern as above
      • subsample,

      • colsample_bytree,

      • eta

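    • 3.2.2 Determine the final best parameters

      # Note: this final model is fit on the standardized features (x_train_scaled),
      # not on the PCA-reduced features that were used during tuning
      xgb = XGBClassifier(learning_rate=0.1, 
                          n_estimators=550, 
                          max_depth=3, 
                          min_child_weight=3, 
                          subsample=0.7, 
                          colsample_bytree=0.7, 
                          n_jobs=4, 
                          random_state=42, 
                          objective='multi:softprob')
      xgb.fit(x_train_scaled, y_train)
      
      y_pre = xgb.predict_proba(x_val_scaled)
      
      print("Validation log loss: {}".format(log_loss(y_val, y_pre, eps=1e-15, normalize=True)))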
      

In [1]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Data acquisition

In [2]:

data = pd.read_csv("./data/otto/train.csv")

In [3]:

data.head()

Out[3]:

  id feat_1 feat_2 feat_3 feat_4 feat_5 feat_6 feat_7 feat_8 feat_9 ... feat_85 feat_86 feat_87 feat_88 feat_89 feat_90 feat_91 feat_92 feat_93 target
0 1 1 0 0 0 0 0 0 0 0 ... 1 0 0 0 0 0 0 0 0 Class_1
1 2 0 0 0 0 0 0 0 1 0 ... 0 0 0 0 0 0 0 0 0 Class_1
2 3 0 0 0 0 0 0 0 1 0 ... 0 0 0 0 0 0 0 0 0 Class_1
3 4 1 0 0 1 6 1 5 0 0 ... 0 1 2 0 0 0 0 0 0 Class_1
4 5 0 0 0 0 0 0 0 0 0 ... 1 0 0 0 0 1 0 0 0 Class_1

5 rows × 95 columns

In [4]:

data.shape

Out[4]:

(61878, 95)

In [5]:

data.describe()

Out[5]:

  id feat_1 feat_2 feat_3 feat_4 feat_5 feat_6 feat_7 feat_8 feat_9 ... feat_84 feat_85 feat_86 feat_87 feat_88 feat_89 feat_90 feat_91 feat_92 feat_93
count 61878.000000 61878.00000 61878.000000 61878.000000 61878.000000 61878.000000 61878.000000 61878.000000 61878.000000 61878.000000 ... 61878.000000 61878.000000 61878.000000 61878.000000 61878.000000 61878.000000 61878.000000 61878.000000 61878.000000 61878.000000
mean 30939.500000 0.38668 0.263066 0.901467 0.779081 0.071043 0.025696 0.193704 0.662433 1.011296 ... 0.070752 0.532306 1.128576 0.393549 0.874915 0.457772 0.812421 0.264941 0.380119 0.126135
std 17862.784315 1.52533 1.252073 2.934818 2.788005 0.438902 0.215333 1.030102 2.255770 3.474822 ... 1.151460 1.900438 2.681554 1.575455 2.115466 1.527385 4.597804 2.045646 0.982385 1.201720
min 1.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
25% 15470.250000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
50% 30939.500000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
75% 46408.750000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 0.000000 ... 0.000000 0.000000 1.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000
max 61878.000000 61.00000 51.000000 64.000000 70.000000 19.000000 10.000000 38.000000 76.000000 43.000000 ... 76.000000 55.000000 65.000000 67.000000 30.000000 61.000000 130.000000 52.000000 19.000000 87.000000

8 rows × 94 columns

In [6]:

# Visualize the class distribution
import seaborn as sns

sns.countplot(x=data.target)

plt.show()

The plot shows that the classes are imbalanced, so this has to be handled later.
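To put numbers on the imbalance instead of reading it off the plot, the class counts and shares can be computed with pandas. This is a sketch: the toy Series below only stands in for `data.target` so that the snippet runs on its own.

```python
import pandas as pd

# Toy stand-in for data.target; replace with the real column
target = pd.Series(["Class_2"] * 6 + ["Class_1"] * 3 + ["Class_3"] * 1)

counts = target.value_counts()                 # absolute count per class, largest first
shares = target.value_counts(normalize=True)   # fraction of rows per class
imbalance_ratio = counts.max() / counts.min()  # largest class size / smallest class size

print(counts)
print(shares.round(3))
print("imbalance ratio:", imbalance_ratio)
```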

Basic data processing

The data has already been anonymized, so no special cleaning is needed.

Take a subset of the data

In [7]:

new1_data = data[:10000]
new1_data.shape

Out[7]:

(10000, 95)

In [8]:

# Visualize the class distribution of the subset
import seaborn as sns

sns.countplot(x=new1_data.target)

plt.show()

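Taking the first rows this way does not give a usable sample (check its class distribution in the plot above), so random undersampling is used to obtain the data instead.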

In [9]:

# Obtain the data by random undersampling
# First separate the features from the labels

y = data["target"]
x = data.drop(["id", "target"], axis=1)

In [10]:

x.head()

Out[10]:

  feat_1 feat_2 feat_3 feat_4 feat_5 feat_6 feat_7 feat_8 feat_9 feat_10 ... feat_84 feat_85 feat_86 feat_87 feat_88 feat_89 feat_90 feat_91 feat_92 feat_93
0 1 0 0 0 0 0 0 0 0 0 ... 0 1 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 1 0 0 ... 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 1 0 0 ... 0 0 0 0 0 0 0 0 0 0
3 1 0 0 1 6 1 5 0 0 1 ... 22 0 1 2 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 ... 0 1 0 0 0 0 1 0 0 0

5 rows × 93 columns

In [11]:

y.head()

Out[11]:

0    Class_1
1    Class_1
2    Class_1
3    Class_1
4    Class_1
Name: target, dtype: object

In [12]:

# Undersample the data
from imblearn.under_sampling import RandomUnderSampler

rus = RandomUnderSampler(random_state=0)

X_resampled, y_resampled = rus.fit_resample(x, y)

In [13]:

x.shape, y.shape

Out[13]:

((61878, 93), (61878,))

In [14]:

X_resampled.shape, y_resampled.shape

Out[14]:

((17361, 93), (17361,))
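These shapes match what `RandomUnderSampler` does with its default sampling strategy: every class is cut down to the size of the smallest class. Since 17361 = 9 × 1929, each of the 9 classes keeps 1929 rows. The idea can be sketched in plain NumPy/pandas. This is an illustration only, not imbalanced-learn's implementation, and it will not select the same rows for a given `random_state`. The helper name `random_undersample` is hypothetical.

```python
import numpy as np
import pandas as pd


def random_undersample(X, y, random_state=0):
    """Hypothetical helper: keep a random subset of each class so that
    every class has as many rows as the smallest one."""
    rng = np.random.default_rng(random_state)
    X = pd.DataFrame(X).reset_index(drop=True)
    y = pd.Series(y).reset_index(drop=True)

    n_min = y.value_counts().min()  # size of the minority class
    keep = []
    for label in sorted(y.unique()):
        idx = np.flatnonzero(y.to_numpy() == label)              # row positions of this class
        keep.append(rng.choice(idx, size=n_min, replace=False))  # sample without replacement
    keep = np.concatenate(keep)
    return X.iloc[keep].reset_index(drop=True), y.iloc[keep].reset_index(drop=True)


# Toy example: 3 classes with 50, 20 and 5 rows
y_toy = pd.Series(["a"] * 50 + ["b"] * 20 + ["c"] * 5)
X_toy = pd.DataFrame({"feat": np.arange(len(y_toy))})  # feat = original row number
X_bal, y_bal = random_undersample(X_toy, y_toy)
print(y_bal.value_counts())
```

Undersampling throws data away; here that is accepted in exchange for a balanced, smaller training set.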

In [15]:

# Visualize the class distribution after undersampling
import seaborn as sns

sns.countplot(x=y_resampled)

plt.show()

Convert the label values to numbers

In [16]:

y_resampled.head()

Out[16]:

0    Class_1
1    Class_1
2    Class_1
3    Class_1
4    Class_1
Name: target, dtype: object

In [17]:

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
y_resampled = le.fit_transform(y_resampled)
 

In [18]:

y_resampled

Out[18]:

array([0, 0, 0, ..., 8, 8, 8])
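`LabelEncoder` assigns integer codes in the sorted order of the original labels, which is why Class_1 ... Class_9 become 0 ... 8, and `inverse_transform` maps predicted codes back to names. A small sketch with made-up labels. The sort is lexicographic on strings, so with ten or more classes "Class_10" would sort before "Class_2".

```python
from sklearn.preprocessing import LabelEncoder

labels = ["Class_3", "Class_1", "Class_9", "Class_1"]

le = LabelEncoder()
codes = le.fit_transform(labels)  # integer code for each label

print(list(le.classes_))             # sorted unique labels; position in this list = code
print(codes)
print(le.inverse_transform([0, 2]))  # map codes back to the original names

# String sorting is lexicographic, not numeric
lexi = list(LabelEncoder().fit(["Class_2", "Class_10"]).classes_)
print(lexi)
```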

Split the data

In [19]:

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(X_resampled, y_resampled, test_size=0.2)

In [20]:

 

x_train.shape, y_train.shape

Out[20]:

((13888, 93), (13888,))

In [21]:

x_test.shape, y_test.shape

Out[21]:

((3473, 93), (3473,))

In [22]:

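# 1. Data acquisition

# 2. Basic data processing

    # 2.1 Take a subset of the data
    # 2.2 Convert the label values to numbers
    # 2.3 Split the data (with StratifiedShuffleSplit)
    # 2.4 Standardize the data
    # 2.5 Reduce dimensionality with PCA

# 3. Model training
    # 3.1 Baseline model training
    # 3.2 Model tuning
        # 3.2.1 Parameters to tune:
            # n_estimators,
            # max_depth,
            # min_child_weight,
            # subsample,
            # colsample_bytree,
            # eta
        # 3.2.2 Determine the final best parameters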
    

In [23]:

# Visualize the class distribution of the test split
import seaborn as sns

sns.countplot(x=y_test)
plt.show()

In [28]:

# Split the data with StratifiedShuffleSplit

from sklearn.model_selection import StratifiedShuffleSplit

sss = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=0)

for train_index, test_index in sss.split(X_resampled.values, y_resampled):
    print(len(train_index))
    print(len(test_index))
    
    x_train = X_resampled.values[train_index]
    x_val = X_resampled.values[test_index]
    
    y_train = y_resampled[train_index]
    y_val = y_resampled[test_index]

13888
3473
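Unlike the earlier `train_test_split` call, which was not given a `stratify` argument, `StratifiedShuffleSplit` keeps each class's share the same in the training and validation parts. Here is a quick check on synthetic labels with an 80/20 class ratio. The labels are made up, and `next(sss.split(...))` takes the single split that `n_splits=1` produces.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Synthetic, imbalanced labels: 80 of class 0, 20 of class 1
y = np.array([0] * 80 + [1] * 20)
X = np.zeros((len(y), 1))  # the features do not affect how the split is drawn

sss = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(sss.split(X, y))

# Both parts keep the 80/20 class ratio
print(np.bincount(y[train_idx]))
print(np.bincount(y[test_idx]))
```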

In [29]:

print(x_train.shape, x_val.shape)

(13888, 93) (3473, 93)

In [30]:

# Visualize the class distribution of the validation split
import seaborn as sns

sns.countplot(x=y_val)
plt.show()

Standardize the data

In [31]:

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(x_train)

x_train_scaled = scaler.transform(x_train)
x_val_scaled = scaler.transform(x_val)
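`StandardScaler` stores the per-feature mean and standard deviation of the training set and applies z = (x - mean) / std to every set it transforms, so the validation data is scaled with the training statistics, not its own. The equivalence can be checked directly. The random count-like arrays are hypothetical stand-ins for `x_train`/`x_val`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
x_tr = rng.poisson(lam=2.0, size=(200, 3)).astype(float)  # count-like stand-in features
x_va = rng.poisson(lam=2.0, size=(50, 3)).astype(float)

scaler = StandardScaler().fit(x_tr)

# Manual version: training mean and population std (ddof=0), applied to the validation set
mean = x_tr.mean(axis=0)
std = x_tr.std(axis=0)
manual_va = (x_va - mean) / std

print(np.allclose(scaler.transform(x_va), manual_va))
print(scaler.transform(x_tr).mean(axis=0).round(6))  # close to 0 on the training set
```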

Reduce dimensionality with PCA

In [33]:

x_train_scaled.shape

Out[33]:

(13888, 93)

In [34]:

from sklearn.decomposition import PCA

pca = PCA(n_components=0.9)

x_train_pca = pca.fit_transform(x_train_scaled)
x_val_pca = pca.transform(x_val_scaled)

In [35]:

print(x_train_pca.shape, x_val_pca.shape)

(13888, 65) (3473, 65)

In [37]:

# Visualize how much variance is retained as components are added
plt.plot(np.cumsum(pca.explained_variance_ratio_))

plt.xlabel("Number of components")
plt.ylabel("Cumulative explained variance ratio")

plt.show()
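Passing a float between 0 and 1 as `n_components` tells `PCA` to keep the smallest number of components whose cumulative explained variance ratio reaches that fraction. On this data that rule gave 65 of the 93 standardized features. The sketch below checks the rule on synthetic correlated data. The data is made up; only the selection rule matters.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 5))   # 5 underlying factors
mixing = rng.normal(size=(5, 20))
X = latent @ mixing + 0.3 * rng.normal(size=(500, 20))  # 20 correlated features plus noise

pca = PCA(n_components=0.9).fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)

kept = pca.n_components_
# The kept components reach 90% of the variance...
reaches_target = cumulative[-1] >= 0.9
# ...and keeping one component fewer would fall short of it
one_fewer_falls_short = kept == 1 or cumulative[-2] < 0.9

print(kept, reaches_target, one_fewer_falls_short)
```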

Model training

Baseline model training

In [38]:

from xgboost import XGBClassifier

xgb = XGBClassifier()
xgb.fit(x_train_pca, y_train)

Out[38]:

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0,
              learning_rate=0.1, max_delta_step=0, max_depth=3,
              min_child_weight=1, missing=None, n_estimators=100, n_jobs=1,
              nthread=None, objective='multi:softprob', random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
              silent=None, subsample=1, verbosity=1)

In [39]:

# Predict class probabilities; log loss has to be computed from probabilities, not hard labels
y_pre_proba = xgb.predict_proba(x_val_pca)

In [40]:

y_pre_proba

Out[40]:

array([[0.4893983 , 0.00375719, 0.00225278, ..., 0.06179977, 0.17131925,
        0.03980364],
       [0.14336601, 0.01110009, 0.01018962, ..., 0.00691424, 0.02062171,
        0.7525783 ],
       [0.00834821, 0.14602502, 0.65013766, ..., 0.01385602, 0.00602207,
        0.00240582],
       ...,
       [0.09568001, 0.00293341, 0.00582061, ..., 0.1031019 , 0.7587154 ,
        0.02730099],
       [0.40236628, 0.12317444, 0.03567632, ..., 0.18818544, 0.13276173,
        0.07105519],
       [0.00473167, 0.01536749, 0.02546864, ..., 0.00882399, 0.88531935,
        0.00384397]], dtype=float32)

In [42]:

# Evaluate with log loss
from sklearn.metrics import log_loss

log_loss(y_val, y_pre_proba, eps=1e-15, normalize=True)

Out[42]:

0.7845457684689274
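For reference, multi-class log loss is the average negative log of the probability the model assigns to each sample's true class, L = -(1/N) Σ_i log p(i, y_i). This is why it must be computed from `predict_proba` output. A small check against `sklearn.metrics.log_loss`, using made-up probabilities:

```python
import numpy as np
from sklearn.metrics import log_loss

y_true = np.array([0, 2, 1])
# Made-up predicted probabilities; each row sums to 1
proba = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
])

# Probability the model gave to the correct class of each sample
p_true = proba[np.arange(len(y_true)), y_true]
manual = -np.mean(np.log(p_true))

print(manual, log_loss(y_true, proba))
```

A confident wrong prediction (a true-class probability near 0) makes log p very negative, so log loss punishes overconfidence much more than accuracy does.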

In [43]:

xgb.get_params

Out[43]:

<bound method XGBModel.get_params of XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0,
              learning_rate=0.1, max_delta_step=0, max_depth=3,
              min_child_weight=1, missing=None, n_estimators=100, n_jobs=1,
              nthread=None, objective='multi:softprob', random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
              silent=None, subsample=1, verbosity=1)>

Model tuning

Find the best n_estimators

In [44]:

scores_ne = []
n_estimators = [100, 200, 300, 400, 500, 550, 600, 700]

In [49]:

for nes in n_estimators:
    print("n_estimators:", nes)
    xgb = XGBClassifier(max_depth=3,
                        learning_rate=0.1, 
                        n_estimators=nes, 
                        objective="multi:softprob", 
                        n_jobs=4,  # thread count; nthread is a deprecated alias of n_jobs
                        min_child_weight=1,
                        subsample=1,
                        colsample_bytree=1,
                        random_state=42)  # seed is a deprecated alias of random_state
    
    xgb.fit(x_train_pca, y_train)
    y_pre = xgb.predict_proba(x_val_pca)
    score = log_loss(y_val, y_pre)
    scores_ne.append(score)
    
    print("Validation log loss: {}".format(score))

n_estimators: 100
Validation log loss: 0.7845457684689274
n_estimators: 200
Validation log loss: 0.7163659085830947
n_estimators: 300
Validation log loss: 0.6933389946023942
n_estimators: 400
Validation log loss: 0.68119252278615
n_estimators: 500
Validation log loss: 0.67700775120196
n_estimators: 550
Validation log loss: 0.6756911007299885
n_estimators: 600
Validation log loss: 0.6757532660164814
n_estimators: 700
Validation log loss: 0.6778721089881976

In [50]:

# Plot the log loss for each value
plt.plot(n_estimators, scores_ne, "o-")

plt.xlabel("n_estimators")
plt.ylabel("log_loss")
plt.show()

print("Best n_estimators: {}".format(n_estimators[np.argmin(scores_ne)]))

Best n_estimators: 550

Find the best max_depth

In [63]:

scores_md = []
max_depths = [1,3,5,6,7]

In [64]:

for md in max_depths:
    print("max_depth:", md)
    xgb = XGBClassifier(max_depth=md,
                        learning_rate=0.1, 
                        n_estimators=n_estimators[np.argmin(scores_ne)], 
                        objective="multi:softprob", 
                        n_jobs=4,  # thread count; nthread is a deprecated alias of n_jobs
                        min_child_weight=1,
                        subsample=1,
                        colsample_bytree=1,
                        random_state=42)  # seed is a deprecated alias of random_state
    
    xgb.fit(x_train_pca, y_train)
    y_pre = xgb.predict_proba(x_val_pca)
    score = log_loss(y_val, y_pre)
    scores_md.append(score)
    
    print("Validation log loss: {}".format(score))

max_depth: 1
Validation log loss: 0.8186777106711784
max_depth: 3
Validation log loss: 0.6756911007299885
max_depth: 5
Validation log loss: 0.730323661087053
max_depth: 6
Validation log loss: 0.7693314501840949
max_depth: 7
Validation log loss: 0.7889236364892144

In [67]:

# Plot the log loss for each value
plt.plot(max_depths, scores_md, "o-")

plt.xlabel("max_depths")
plt.ylabel("log_loss")
plt.show()

print("Best max_depth: {}".format(max_depths[np.argmin(scores_md)]))

Best max_depth: 3

Following the same pattern, tune the remaining parameters:

min_child_weight,

subsample,

colsample_bytree,

eta
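The "same pattern" can be factored into one helper. It refits a fresh copy of an estimator for each candidate value of a single parameter and keeps the value with the lowest validation log loss. Below is a sketch built on scikit-learn's `clone`/`set_params`. `sweep_param` is a hypothetical helper name, and the demo uses `LogisticRegression` on the iris data only so that it runs without xgboost. Since `XGBClassifier` follows the scikit-learn estimator interface, the notebook's variables could be passed in the same way, e.g. `sweep_param(xgb, "min_child_weight", [1, 3, 5], x_train_pca, y_train, x_val_pca, y_val)`.

```python
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split


def sweep_param(estimator, name, values, x_train, y_train, x_val, y_val):
    """Hypothetical helper: refit a fresh copy of `estimator` for each value
    of parameter `name`; return (best_value, list of validation log losses)."""
    scores = []
    for value in values:
        model = clone(estimator).set_params(**{name: value})  # untouched copy, one parameter changed
        model.fit(x_train, y_train)
        scores.append(log_loss(y_val, model.predict_proba(x_val)))
    best = values[scores.index(min(scores))]  # lowest validation log loss wins
    return best, scores


# Demo on iris (stand-in data and model)
X, y = load_iris(return_X_y=True)
x_tr, x_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
c_values = [0.01, 0.1, 1.0, 10.0]
best_c, c_scores = sweep_param(LogisticRegression(max_iter=1000), "C",
                               c_values, x_tr, y_tr, x_va, y_va)
print(best_c, [round(s, 4) for s in c_scores])
```

Tuning one parameter at a time, as this section does, is a greedy search. It is cheap, but it can miss combinations where parameters interact.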

In [69]:

# Final model with the chosen parameters.
# Note: it is fit on the standardized features (x_train_scaled), not on the
# PCA-reduced features that were used during tuning.
xgb = XGBClassifier(learning_rate=0.1, 
                    n_estimators=550, 
                    max_depth=3, 
                    min_child_weight=3, 
                    subsample=0.7, 
                    colsample_bytree=0.7, 
                    n_jobs=4, 
                    random_state=42, 
                    objective='multi:softprob')

xgb.fit(x_train_scaled, y_train)

y_pre = xgb.predict_proba(x_val_scaled)

print("Validation log loss: {}".format(log_loss(y_val, y_pre, eps=1e-15, normalize=True)))

Validation log loss: 0.5944022517380477

 
