<h1 style="text-align:center">Titanic Data Processing and Analysis</h1>
![](http://www.allengao.cn/wp-content/uploads/2018/06/Titanic.jpg)
```python
import pandas as pd
%matplotlib inline
```
#### Import the data
```python
titanic = pd.read_csv('K:/Code/jupyter-notebook/Python Study/train.csv')
```
#### Quick preview
```python
titanic.head()
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>PassengerId</th>
<th>Survived</th>
<th>Pclass</th>
<th>Name</th>
<th>Sex</th>
<th>Age</th>
<th>SibSp</th>
<th>Parch</th>
<th>Ticket</th>
<th>Fare</th>
<th>Cabin</th>
<th>Embarked</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>1</td>
<td>0</td>
<td>3</td>
<td>Braund, Mr. Owen Harris</td>
<td>male</td>
<td>22.0</td>
<td>1</td>
<td>0</td>
<td>A/5 21171</td>
<td>7.2500</td>
<td>NaN</td>
<td>S</td>
</tr>
<tr>
<th>1</th>
<td>2</td>
<td>1</td>
<td>1</td>
<td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>
<td>female</td>
<td>38.0</td>
<td>1</td>
<td>0</td>
<td>PC 17599</td>
<td>71.2833</td>
<td>C85</td>
<td>C</td>
</tr>
<tr>
<th>2</th>
<td>3</td>
<td>1</td>
<td>3</td>
<td>Heikkinen, Miss. Laina</td>
<td>female</td>
<td>26.0</td>
<td>0</td>
<td>0</td>
<td>STON/O2. 3101282</td>
<td>7.9250</td>
<td>NaN</td>
<td>S</td>
</tr>
<tr>
<th>3</th>
<td>4</td>
<td>1</td>
<td>1</td>
<td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>
<td>female</td>
<td>35.0</td>
<td>1</td>
<td>0</td>
<td>113803</td>
<td>53.1000</td>
<td>C123</td>
<td>S</td>
</tr>
<tr>
<th>4</th>
<td>5</td>
<td>0</td>
<td>3</td>
<td>Allen, Mr. William Henry</td>
<td>male</td>
<td>35.0</td>
<td>0</td>
<td>0</td>
<td>373450</td>
<td>8.0500</td>
<td>NaN</td>
<td>S</td>
</tr>
</tbody>
</table>
</div>
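|Column|Meaning|
|---|---|
|PassengerId|Passenger ID|
|Pclass|Ticket class, a proxy for social class (1 = upper, 2 = middle, 3 = lower)|
|Survived|Whether the passenger survived (1 = yes, 0 = no)|
|Name|Name|
|Sex|Sex|
|Age|Age in years|
|SibSp|Number of siblings and spouses aboard|
|Parch|Number of parents and children aboard|
|Ticket|Ticket number|
|Fare|Ticket price|
|Cabin|Cabin number|
|Embarked|Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)|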
```python
titanic.info()
```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
PassengerId 891 non-null int64
Survived 891 non-null int64
Pclass 891 non-null int64
Name 891 non-null object
Sex 891 non-null object
Age 714 non-null float64
SibSp 891 non-null int64
Parch 891 non-null int64
Ticket 891 non-null object
Fare 891 non-null float64
Cabin 204 non-null object
Embarked 889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.6+ KB
```python
# Summary statistics for every numeric column
titanic.describe()
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>PassengerId</th>
<th>Survived</th>
<th>Pclass</th>
<th>Age</th>
<th>SibSp</th>
<th>Parch</th>
<th>Fare</th>
</tr>
</thead>
<tbody>
<tr>
<th>count</th>
<td>891.000000</td>
<td>891.000000</td>
<td>891.000000</td>
<td>714.000000</td>
<td>891.000000</td>
<td>891.000000</td>
<td>891.000000</td>
</tr>
<tr>
<th>mean</th>
<td>446.000000</td>
<td>0.383838</td>
<td>2.308642</td>
<td>29.699118</td>
<td>0.523008</td>
<td>0.381594</td>
<td>32.204208</td>
</tr>
<tr>
<th>std</th>
<td>257.353842</td>
<td>0.486592</td>
<td>0.836071</td>
<td>14.526497</td>
<td>1.102743</td>
<td>0.806057</td>
<td>49.693429</td>
</tr>
<tr>
<th>min</th>
<td>1.000000</td>
<td>0.000000</td>
<td>1.000000</td>
<td>0.420000</td>
<td>0.000000</td>
<td>0.000000</td>
<td>0.000000</td>
</tr>
<tr>
<th>25%</th>
<td>223.500000</td>
<td>0.000000</td>
<td>2.000000</td>
<td>20.125000</td>
<td>0.000000</td>
<td>0.000000</td>
<td>7.910400</td>
</tr>
<tr>
<th>50%</th>
<td>446.000000</td>
<td>0.000000</td>
<td>3.000000</td>
<td>28.000000</td>
<td>0.000000</td>
<td>0.000000</td>
<td>14.454200</td>
</tr>
<tr>
<th>75%</th>
<td>668.500000</td>
<td>1.000000</td>
<td>3.000000</td>
<td>38.000000</td>
<td>1.000000</td>
<td>0.000000</td>
<td>31.000000</td>
</tr>
<tr>
<th>max</th>
<td>891.000000</td>
<td>1.000000</td>
<td>3.000000</td>
<td>80.000000</td>
<td>8.000000</td>
<td>6.000000</td>
<td>512.329200</td>
</tr>
</tbody>
</table>
</div>
```python
# isnull() marks missing values; sum() counts them per column
titanic.isnull().sum()
```
PassengerId 0
Survived 0
Pclass 0
Name 0
Sex 0
Age 177
SibSp 0
Parch 0
Ticket 0
Fare 0
Cabin 687
Embarked 2
dtype: int64
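Raw counts are easier to judge as proportions. A minimal sketch, using a small hypothetical DataFrame (`sample`) in place of `train.csv`: because `True` counts as 1, `isnull().mean()` gives the fraction of missing values in each column. On the full data, `Cabin` would come out at about 0.77 (687 of 891).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Titanic data: a few rows with gaps
sample = pd.DataFrame({
    'Age': [22.0, np.nan, 26.0, np.nan],
    'Cabin': [np.nan, 'C85', np.nan, np.nan],
    'Embarked': ['S', 'C', 'S', 'S'],
})

# Mean of a boolean mask = share of True values, i.e. the missing rate per column
missing_share = sample.isnull().mean()
print(missing_share)
```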
#### Handling missing values
```python
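# Fill every missing value in the whole DataFrame (uncomment to try; returns a new DataFrame)
# titanic.fillna(0)
# Fill a single column only
# titanic.Age.fillna(0)
# Median age
titanic.Age.median()
# Filling with the median age returns a new Series and leaves titanic unchanged
# titanic.Age.fillna(titanic.Age.median())
# Assign the result back so the DataFrame itself is updated
titanic['Age'] = titanic.Age.fillna(titanic.Age.median())
# Check the missing values again
titanic.isnull().sum()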
```
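`Embarked` still has 2 missing values after this step. One common choice, not taken in this notebook, is to fill a categorical column with its most frequent value. A minimal sketch on a hypothetical four-row DataFrame (`df`), which assigns each filled column back rather than relying on `inplace=True`:

```python
import numpy as np
import pandas as pd

# Hypothetical rows standing in for train.csv
df = pd.DataFrame({
    'Age': [22.0, np.nan, 26.0, 35.0],
    'Embarked': ['S', 'C', np.nan, 'S'],
})

# Numeric column: fill with the median and assign the result back
df['Age'] = df['Age'].fillna(df['Age'].median())

# Categorical column: fill with the most frequent value (mode()[0] is the first mode)
df['Embarked'] = df['Embarked'].fillna(df['Embarked'].mode()[0])

print(df.isnull().sum())
```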
### Analysis by sex
```python
# A quick frequency count; you will use this a lot
titanic.Sex.value_counts()
```
male 577
female 314
Name: Sex, dtype: int64
```python
# Number of males and females among the survivors
survived = titanic[titanic.Survived==1].Sex.value_counts()
```
```python
# Number of males and females among those who died
dead = titanic[titanic.Survived==0].Sex.value_counts()
```
```python
df = pd.DataFrame([survived,dead],index=['survived','dead'])
df.plot.bar()
```
![png](output_17_1.png)
```python
# The plot works, but the layout is not what we want
# Transpose the DataFrame so rows and columns swap places
df = df.T
df
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>survived</th>
<th>dead</th>
</tr>
</thead>
<tbody>
<tr>
<th>female</th>
<td>233</td>
<td>81</td>
</tr>
<tr>
<th>male</th>
<td>109</td>
<td>468</td>
</tr>
</tbody>
</table>
</div>
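The survived/dead table built above by hand can also be produced in one call with `pd.crosstab`. A sketch on a hypothetical six-row `sample` with the same column names as `train.csv`; the 0/1 outcome labels are renamed explicitly so the result does not depend on column order:

```python
import pandas as pd

# Hypothetical mini dataset with the same column names as train.csv
sample = pd.DataFrame({
    'Sex': ['male', 'female', 'female', 'male', 'male', 'female'],
    'Survived': [0, 1, 1, 0, 1, 0],
})

# Rows = sex, columns = outcome (0 = died, 1 = survived)
table = pd.crosstab(sample['Sex'], sample['Survived'])
table = table.rename(columns={0: 'dead', 1: 'survived'})
print(table)
```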
```python
df.plot.bar()  # equivalent to df.plot(kind='bar')
```
![png](output_19_1.png)
```python
# Still not what we want; stack the bars instead
df.plot(kind='bar', stacked=True)
```
![png](output_20_1.png)
```python
# Proportion of survivors and deaths within each sex
df['p_survived'] = df.survived / (df.survived + df.dead)
df['p_dead'] = df.dead / (df.survived + df.dead)
df[['p_survived','p_dead']].plot.bar(stacked=True)
```
![png](output_21_1.png)
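#### The charts above show that sex has a strong effect on survival: about 74% of women survived (233 of 314) versus about 19% of men (109 of 577)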
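Because `Survived` is coded 0/1, the survival rate of a group is simply the mean of that column within the group, which gives the same numbers as the `p_survived` column without building the counts first. A sketch on a hypothetical six-row `sample`:

```python
import pandas as pd

# Hypothetical mini dataset with the same column names as train.csv
sample = pd.DataFrame({
    'Sex': ['male', 'female', 'female', 'male', 'male', 'female'],
    'Survived': [0, 1, 1, 0, 1, 0],
})

# Survived is 0/1, so its mean within each group is that group's survival rate
rate = sample.groupby('Sex')['Survived'].mean()
print(rate)
```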
### Analysis by age
```python
# Simple frequency count
# titanic.Age.value_counts()
```
```python
survived = titanic[titanic.Survived==1].Age
dead = titanic[titanic.Survived==0].Age
df =pd.DataFrame([survived,dead],index=['survived','dead'])
df = df.T
df.plot.hist(stacked=True)
```
![png](output_25_1.png)
```python
# Use more bins in the histogram
df.plot.hist(stacked=True, bins=30)
# The very tall bar in the middle appears because all 177 missing ages were replaced with the median (28)
```
![png](output_26_1.png)
```python
# A density (KDE) plot is easier to read
df.plot.kde()
```
![png](output_27_1.png)
```python
# Check the age distribution to choose the x-axis range
titanic.Age.describe()
```
count 891.000000
mean 29.361582
std 13.019697
min 0.420000
25% 22.000000
50% 28.000000
75% 35.000000
max 80.000000
Name: Age, dtype: float64
```python
# Restrict the x-axis to the observed age range
df.plot.kde(xlim=(0,80))
```
![png](output_29_1.png)
```python
age = 16
young = titanic[titanic.Age<=age]['Survived'].value_counts()
old = titanic[titanic.Age>age]['Survived'].value_counts()
df = pd.DataFrame([young,old],index = ['young','old'])
df.columns = ['dead','survived']
df.plot.bar(stacked = True)
```
![png](output_30_1.png)
```python
# Survival proportions for passengers aged 16 or under vs. over 16
df['p_survived'] = df.survived / (df.survived + df.dead)
df['p_dead'] = df.dead / (df.survived + df.dead)
df[['p_survived','p_dead']].plot.bar(stacked=True)
```
![png](output_31_1.png)
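A single cut-off at 16 generalizes to several age bands with `pd.cut`. A sketch under assumptions: the bin edges (0, 16, 60, 80), the labels, and the six rows of `sample` are all hypothetical. Right edges are inclusive, so age 16 falls into `child`, matching the `<= 16` split above.

```python
import pandas as pd

# Hypothetical rows standing in for the Titanic data
sample = pd.DataFrame({
    'Age': [2.0, 15.0, 16.0, 17.0, 40.0, 70.0],
    'Survived': [1, 1, 0, 0, 1, 0],
})

# Hypothetical bin edges: (0, 16], (16, 60], (60, 80]
bins = [0, 16, 60, 80]
labels = ['child', 'adult', 'senior']
sample['age_group'] = pd.cut(sample['Age'], bins=bins, labels=labels)

# Survival rate per age group (observed=False keeps empty categories too)
rate = sample.groupby('age_group', observed=False)['Survived'].mean()
print(rate)
```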
### Analysis by fare
```python
# Fare is handled the same way as age
survived = titanic[titanic.Survived==1].Fare
dead = titanic[titanic.Survived==0].Fare
df = pd.DataFrame([survived,dead],index = ['survived','dead'])
df = df.T
df.plot.kde()
```
![png](output_33_1.png)
```python
# To set xlim, first check the range of fares
titanic.Fare.describe()
```
count 891.000000
mean 32.204208
std 49.693429
min 0.000000
25% 7.910400
50% 14.454200
75% 31.000000
max 512.329200
Name: Fare, dtype: float64
```python
df.plot(kind = 'kde',xlim = (0,513))
```
![png](output_35_1.png)
#### Passengers who paid low fares had a noticeably lower survival rate
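Because fares are heavily skewed (median 14.45, maximum 512.33), equal-count bins are often easier to compare than a density curve. A sketch using `pd.qcut` to split fares into quartiles; the eight rows of `sample` and the `Q1`..`Q4` labels are hypothetical:

```python
import pandas as pd

# Hypothetical rows standing in for the Titanic data
sample = pd.DataFrame({
    'Fare': [7.25, 7.9, 8.05, 13.0, 26.0, 53.1, 71.28, 263.0],
    'Survived': [0, 0, 1, 0, 1, 1, 1, 1],
})

# Split fares into 4 bins holding (roughly) equal numbers of passengers
sample['fare_band'] = pd.qcut(sample['Fare'], q=4, labels=['Q1', 'Q2', 'Q3', 'Q4'])

# Survival rate per fare quartile
rate = sample.groupby('fare_band', observed=False)['Survived'].mean()
print(rate)
```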
### Combining features
```python
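# For example, look at how age and fare together relate to survival
# (this first plot shows only the passengers who died)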
import matplotlib.pyplot as plt
plt.scatter(titanic[titanic.Survived==0].Age, titanic[titanic.Survived==0].Fare)
```
![png](output_38_1.png)
```python
# Hard to read; draw both groups with transparency and different edge colors
ax = plt.subplot()
# Passengers who died (gray edges)
age = titanic[titanic.Survived == 0].Age
fare = titanic[titanic.Survived == 0].Fare
plt.scatter(age, fare, s=20, alpha=0.3, linewidths=1, edgecolors='gray')
# Survivors (red edges)
age = titanic[titanic.Survived == 1].Age
fare = titanic[titanic.Survived == 1].Fare
plt.scatter(age, fare, s=20, alpha=0.3, linewidths=1, edgecolors='red')
ax.set_xlabel('age')
ax.set_ylabel('fare')
```
![png](output_39_1.png)
```python
# Survivors only
ax = plt.subplot()
age = titanic[titanic.Survived==1].Age
fare = titanic[titanic.Survived==1].Fare
plt.scatter(age, fare,s=20,alpha=0.5,linewidths=1,edgecolors='red')
ax.set_xlabel('age')
ax.set_ylabel('fare')
```
![png](output_40_1.png)
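The scatter plots compare two features visually; the same idea can be written as a table of survival rates over two coarsened features. A sketch under assumptions: the cut-offs (age 16, fare 30), the `age_group`/`fare_band` labels, and the six rows of `sample` are hypothetical. Empty combinations show up as `NaN`.

```python
import numpy as np
import pandas as pd

# Hypothetical rows standing in for the Titanic data
sample = pd.DataFrame({
    'Age':      [4.0, 30.0, 25.0, 60.0, 8.0, 45.0],
    'Fare':     [20.0, 7.9, 80.0, 8.05, 15.0, 120.0],
    'Survived': [1, 0, 1, 0, 1, 1],
})

# Hypothetical cut-offs: children vs. adults, cheap vs. expensive tickets
sample['age_group'] = np.where(sample['Age'] <= 16, 'child', 'adult')
sample['fare_band'] = np.where(sample['Fare'] > 30, 'expensive', 'cheap')

# Survival rate for each (age_group, fare_band) combination
table = sample.pivot_table(values='Survived', index='age_group',
                           columns='fare_band', aggfunc='mean')
print(table)
```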
### Hidden features
```python
# Extract the title (Mr, Mrs, Miss, ...) from each name
titanic.Name
```
0 Braund, Mr. Owen Harris
1 Cumings, Mrs. John Bradley (Florence Briggs Th...
2 Heikkinen, Miss. Laina
3 Futrelle, Mrs. Jacques Heath (Lily May Peel)
4 Allen, Mr. William Henry
5 Moran, Mr. James
6 McCarthy, Mr. Timothy J
7 Palsson, Master. Gosta Leonard
8 Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)
9 Nasser, Mrs. Nicholas (Adele Achem)
10 Sandstrom, Miss. Marguerite Rut
11 Bonnell, Miss. Elizabeth
12 Saundercock, Mr. William Henry
13 Andersson, Mr. Anders Johan
14 Vestrom, Miss. Hulda Amanda Adolfina
15 Hewlett, Mrs. (Mary D Kingcome)
16 Rice, Master. Eugene
17 Williams, Mr. Charles Eugene
18 Vander Planke, Mrs. Julius (Emelia Maria Vande...
19 Masselmani, Mrs. Fatima
20 Fynney, Mr. Joseph J
21 Beesley, Mr. Lawrence
22 McGowan, Miss. Anna "Annie"
23 Sloper, Mr. William Thompson
24 Palsson, Miss. Torborg Danira
25 Asplund, Mrs. Carl Oscar (Selma Augusta Emilia...
26 Emir, Mr. Farred Chehab
27 Fortune, Mr. Charles Alexander
28 O'Dwyer, Miss. Ellen "Nellie"
29 Todoroff, Mr. Lalio
...
861 Giles, Mr. Frederick Edward
862 Swift, Mrs. Frederick Joel (Margaret Welles Ba...
863 Sage, Miss. Dorothy Edith "Dolly"
864 Gill, Mr. John William
865 Bystrom, Mrs. (Karolina)
866 Duran y More, Miss. Asuncion
867 Roebling, Mr. Washington Augustus II
868 van Melkebeke, Mr. Philemon
869 Johnson, Master. Harold Theodor
870 Balkic, Mr. Cerin
871 Beckwith, Mrs. Richard Leonard (Sallie Monypeny)
872 Carlsson, Mr. Frans Olof
873 Vander Cruyssen, Mr. Victor
874 Abelson, Mrs. Samuel (Hannah Wizosky)
875 Najib, Miss. Adele Kiamie "Jane"
876 Gustafsson, Mr. Alfred Ossian
877 Petroff, Mr. Nedelio
878 Laleff, Mr. Kristo
879 Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)
880 Shelley, Mrs. William (Imanita Parrish Hall)
881 Markun, Mr. Johann
882 Dahlberg, Miss. Gerda Ulrika
883 Banfield, Mr. Frederick James
884 Sutehall, Mr. Henry Jr
885 Rice, Mrs. William (Margaret Norton)
886 Montvila, Rev. Juozas
887 Graham, Miss. Margaret Edith
888 Johnston, Miss. Catherine Helen "Carrie"
889 Behr, Mr. Karl Howell
890 Dooley, Mr. Patrick
Name: Name, Length: 891, dtype: object
```python
# Title = text between the comma and the first period, e.g. 'Braund, Mr. Owen Harris' -> 'Mr'
titanic['title'] = titanic.Name.apply(lambda name: name.split(',')[1].split('.')[0].strip())
```
```python
# Check the same split on a single example string
s = 'Williams, Mr.Howard Hugh "harry"'
s.split(',')[-1].split('.')[0].strip()
```
'Mr'
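The same extraction can be done without a lambda using the vectorized string method `str.extract` and a regular expression. A sketch; the regex is an assumption that captures everything between the comma and the next period, and the four names are illustrative strings in the same `Surname, Title. Given names` format:

```python
import pandas as pd

# Illustrative names in the dataset's "Surname, Title. Given names" format
names = pd.Series([
    'Braund, Mr. Owen Harris',
    'Heikkinen, Miss. Laina',
    'Palsson, Master. Gosta Leonard',
    'Rothes, the Countess. of (Lucy Noel Martha Dyer-Edwards)',
])

# Capture the text after ", " up to (not including) the first "."
titles = names.str.extract(r',\s*([^.]+)\.', expand=False).str.strip()
print(titles.tolist())
```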
```python
titanic.title.value_counts()
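# For example, if a passenger's title is Mr but the age is unknown, we could fill it with
# the average age of all Mr passengers instead of the overall median used earlier.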
```
Mr 517
Miss 182
Mrs 125
Master 40
Dr 7
Rev 6
Mlle 2
Major 2
Col 2
Capt 1
Ms 1
Mme 1
Jonkheer 1
the Countess 1
Don 1
Lady 1
Sir 1
Name: title, dtype: int64
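The idea in the comment above can be sketched with `groupby(...).transform`, which broadcasts each group's statistic back to every row of that group. This sketch uses the median per title (pass `'mean'` instead to follow the comment literally), and the six rows of `sample` are hypothetical. On the real data it would have to run before the global median fill, since after that step `Age` has no gaps left.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: titles with some missing ages
sample = pd.DataFrame({
    'title': ['Mr', 'Mr', 'Mr', 'Master', 'Master', 'Miss'],
    'Age':   [30.0, 40.0, np.nan, 4.0, np.nan, 22.0],
})

# Median age within each title, aligned to every row of that title
title_median = sample.groupby('title')['Age'].transform('median')

# Fill only the missing ages; known ages are left unchanged
sample['Age'] = sample['Age'].fillna(title_median)
print(sample)
```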
### GDP
```python
# Night-light maps: the brightness of night-time lights can serve as a simple proxy for GDP
```
```python
titanic.head()
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>PassengerId</th>
<th>Survived</th>
<th>Pclass</th>
<th>Name</th>
<th>Sex</th>
<th>Age</th>
<th>SibSp</th>
<th>Parch</th>
<th>Ticket</th>
<th>Fare</th>
<th>Cabin</th>
<th>Embarked</th>
<th>title</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>1</td>
<td>0</td>
<td>3</td>
<td>Braund, Mr. Owen Harris</td>
<td>male</td>
<td>22.0</td>
<td>1</td>
<td>0</td>
<td>A/5 21171</td>
<td>7.2500</td>
<td>NaN</td>
<td>S</td>
<td>Mr</td>
</tr>
<tr>
<th>1</th>
<td>2</td>
<td>1</td>
<td>1</td>
<td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>
<td>female</td>
<td>38.0</td>
<td>1</td>
<td>0</td>
<td>PC 17599</td>
<td>71.2833</td>
<td>C85</td>
<td>C</td>
<td>Mrs</td>
</tr>
<tr>
<th>2</th>
<td>3</td>
<td>1</td>
<td>3</td>
<td>Heikkinen, Miss. Laina</td>
<td>female</td>
<td>26.0</td>
<td>0</td>
<td>0</td>
<td>STON/O2. 3101282</td>
<td>7.9250</td>
<td>NaN</td>
<td>S</td>
<td>Miss</td>
</tr>
<tr>
<th>3</th>
<td>4</td>
<td>1</td>
<td>1</td>
<td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>
<td>female</td>
<td>35.0</td>
<td>1</td>
<td>0</td>
<td>113803</td>
<td>53.1000</td>
<td>C123</td>
<td>S</td>
<td>Mrs</td>
</tr>
<tr>
<th>4</th>
<td>5</td>
<td>0</td>
<td>3</td>
<td>Allen, Mr. William Henry</td>
<td>male</td>
<td>35.0</td>
<td>0</td>
<td>0</td>
<td>373450</td>
<td>8.0500</td>
<td>NaN</td>
<td>S</td>
<td>Mr</td>
</tr>
</tbody>
</table>
</div>
```python
# Family size = siblings/spouses + parents/children + the passenger themselves
titanic['family_size'] = titanic.SibSp + titanic.Parch + 1
```
```python
titanic
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>PassengerId</th>
<th>Survived</th>
<th>Pclass</th>
<th>Name</th>
<th>Sex</th>
<th>Age</th>
<th>SibSp</th>
<th>Parch</th>
<th>Ticket</th>
<th>Fare</th>
<th>Cabin</th>
<th>Embarked</th>
<th>title</th>
<th>family_size</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>1</td>
<td>0</td>
<td>3</td>
<td>Braund, Mr. Owen Harris</td>
<td>male</td>
<td>22.0</td>
<td>1</td>
<td>0</td>
<td>A/5 21171</td>
<td>7.2500</td>
<td>NaN</td>
<td>S</td>
<td>Mr</td>
<td>2</td>
</tr>
<tr>
<th>1</th>
<td>2</td>
<td>1</td>
<td>1</td>
<td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>
<td>female</td>
<td>38.0</td>
<td>1</td>
<td>0</td>
<td>PC 17599</td>
<td>71.2833</td>
<td>C85</td>
<td>C</td>
<td>Mrs</td>
<td>2</td>
</tr>
<tr>
<th>2</th>
<td>3</td>
<td>1</td>
<td>3</td>
<td>Heikkinen, Miss. Laina</td>
<td>female</td>
<td>26.0</td>
<td>0</td>
<td>0</td>
<td>STON/O2. 3101282</td>
<td>7.9250</td>
<td>NaN</td>
<td>S</td>
<td>Miss</td>
<td>1</td>
</tr>
<tr>
<th>3</th>
<td>4</td>
<td>1</td>
<td>1</td>
<td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>
<td>female</td>
<td>35.0</td>
<td>1</td>
<td>0</td>
<td>113803</td>
<td>53.1000</td>
<td>C123</td>
<td>S</td>
<td>Mrs</td>
<td>2</td>
</tr>
<tr>
<th>4</th>
<td>5</td>
<td>0</td>
<td>3</td>
<td>Allen, Mr. William Henry</td>
<td>male</td>
<td>35.0</td>
<td>0</td>
<td>0</td>
<td>373450</td>
<td>8.0500</td>
<td>NaN</td>
<td>S</td>
<td>Mr</td>
<td>1</td>
</tr>
<tr>
<th>5</th>
<td>6</td>
<td>0</td>
<td>3</td>
<td>Moran, Mr. James</td>
<td>male</td>
<td>28.0</td>
<td>0</td>
<td>0</td>
<td>330877</td>
<td>8.4583</td>
<td>NaN</td>
<td>Q</td>
<td>Mr</td>
<td>1</td>
</tr>
<tr>
<th>6</th>
<td>7</td>
<td>0</td>
<td>1</td>
<td>McCarthy, Mr. Timothy J</td>
<td>male</td>
<td>54.0</td>
<td>0</td>
<td>0</td>
<td>17463</td>
<td>51.8625</td>
<td>E46</td>
<td>S</td>
<td>Mr</td>
<td>1</td>
</tr>
<tr>
<th>7</th>
<td>8</td>
<td>0</td>
<td>3</td>
<td>Palsson, Master. Gosta Leonard</td>
<td>male</td>
<td>2.0</td>
<td>3</td>
<td>1</td>
<td>349909</td>
<td>21.0750</td>
<td>NaN</td>
<td>S</td>
<td>Master</td>
<td>5</td>
</tr>
<tr>
<th>8</th>
<td>9</td>
<td>1</td>
<td>3</td>
<td>Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)</td>
<td>female</td>
<td>27.0</td>
<td>0</td>
<td>2</td>
<td>347742</td>
<td>11.1333</td>
<td>NaN</td>
<td>S</td>
<td>Mrs</td>
<td>3</td>
</tr>
<tr>
<th>9</th>
<td>10</td>
<td>1</td>
<td>2</td>
<td>Nasser, Mrs. Nicholas (Adele Achem)</td>
<td>female</td>
<td>14.0</td>
<td>1</td>
<td>0</td>
<td>237736</td>
<td>30.0708</td>
<td>NaN</td>
<td>C</td>
<td>Mrs</td>
<td>2</td>
</tr>
<tr>
<th>10</th>
<td>11</td>
<td>1</td>
<td>3</td>
<td>Sandstrom, Miss. Marguerite Rut</td>
<td>female</td>
<td>4.0</td>
<td>1</td>
<td>1</td>
<td>PP 9549</td>
<td>16.7000</td>
<td>G6</td>
<td>S</td>
<td>Miss</td>
<td>3</td>
</tr>
<tr>
<th>11</th>
<td>12</td>
<td>1</td>
<td>1</td>
<td>Bonnell, Miss. Elizabeth</td>
<td>female</td>
<td>58.0</td>
<td>0</td>
<td>0</td>
<td>113783</td>
<td>26.5500</td>
<td>C103</td>
<td>S</td>
<td>Miss</td>
<td>1</td>
</tr>
<tr>
<th>12</th>
<td>13</td>
<td>0</td>
<td>3</td>
<td>Saundercock, Mr. William Henry</td>
<td>male</td>
<td>20.0</td>
<td>0</td>
<td>0</td>
<td>A/5. 2151</td>
<td>8.0500</td>
<td>NaN</td>
<td>S</td>
<td>Mr</td>
<td>1</td>
</tr>
<tr>
<th>13</th>
<td>14</td>
<td>0</td>
<td>3</td>
<td>Andersson, Mr. Anders Johan</td>
<td>male</td>
<td>39.0</td>
<td>1</td>
<td>5</td>
<td>347082</td>
<td>31.2750</td>
<td>NaN</td>
<td>S</td>
<td>Mr</td>
<td>7</td>
</tr>
<tr>
<th>14</th>
<td>15</td>
<td>0</td>
<td>3</td>
<td>Vestrom, Miss. Hulda Amanda Adolfina</td>
<td>female</td>
<td>14.0</td>
<td>0</td>
<td>0</td>
<td>350406</td>
<td>7.8542</td>
<td>NaN</td>
<td>S</td>
<td>Miss</td>
<td>1</td>
</tr>
<tr>
<th>15</th>
<td>16</td>
<td>1</td>
<td>2</td>
<td>Hewlett, Mrs. (Mary D Kingcome)</td>
<td>female</td>
<td>55.0</td>
<td>0</td>
<td>0</td>
<td>248706</td>
<td>16.0000</td>
<td>NaN</td>
<td>S</td>
<td>Mrs</td>
<td>1</td>
</tr>
<tr>
<th>16</th>
<td>17</td>
<td>0</td>
<td>3</td>
<td>Rice, Master. Eugene</td>
<td>male</td>
<td>2.0</td>
<td>4</td>
<td>1</td>
<td>382652</td>
<td>29.1250</td>
<td>NaN</td>
<td>Q</td>
<td>Master</td>
<td>6</td>
</tr>
<tr>
<th>17</th>
<td>18</td>
<td>1</td>
<td>2</td>
<td>Williams, Mr. Charles Eugene</td>
<td>male</td>
<td>28.0</td>
<td>0</td>
<td>0</td>
<td>244373</td>
<td>13.0000</td>
<td>NaN</td>
<td>S</td>
<td>Mr</td>
<td>1</td>
</tr>
<tr>
<th>18</th>
<td>19</td>
<td>0</td>
<td>3</td>
<td>Vander Planke, Mrs. Julius (Emelia Maria Vande...</td>
<td>female</td>
<td>31.0</td>
<td>1</td>
<td>0</td>
<td>345763</td>
<td>18.0000</td>
<td>NaN</td>
<td>S</td>
<td>Mrs</td>
<td>2</td>
</tr>
<tr>
<th>19</th>
<td>20</td>
<td>1</td>
<td>3</td>
<td>Masselmani, Mrs. Fatima</td>
<td>female</td>
<td>28.0</td>
<td>0</td>
<td>0</td>
<td>2649</td>
<td>7.2250</td>
<td>NaN</td>
<td>C</td>
<td>Mrs</td>
<td>1</td>
</tr>
<tr>
<th>20</th>
<td>21</td>
<td>0</td>
<td>2</td>
<td>Fynney, Mr. Joseph J</td>
<td>male</td>
<td>35.0</td>
<td>0</td>
<td>0</td>
<td>239865</td>
<td>26.0000</td>
<td>NaN</td>
<td>S</td>
<td>Mr</td>
<td>1</td>
</tr>
<tr>
<th>21</th>
<td>22</td>
<td>1</td>
<td>2</td>
<td>Beesley, Mr. Lawrence</td>
<td>male</td>
<td>34.0</td>
<td>0</td>
<td>0</td>
<td>248698</td>
<td>13.0000</td>
<td>D56</td>
<td>S</td>
<td>Mr</td>
<td>1</td>
</tr>
<tr>
<th>22</th>
<td>23</td>
<td>1</td>
<td>3</td>
<td>McGowan, Miss. Anna "Annie"</td>
<td>female</td>
<td>15.0</td>
<td>0</td>
<td>0</td>
<td>330923</td>
<td>8.0292</td>
<td>NaN</td>
<td>Q</td>
<td>Miss</td>
<td>1</td>
</tr>
<tr>
<th>23</th>
<td>24</td>
<td>1</td>
<td>1</td>
<td>Sloper, Mr. William Thompson</td>
<td>male</td>
<td>28.0</td>
<td>0</td>
<td>0</td>
<td>113788</td>
<td>35.5000</td>
<td>A6</td>
<td>S</td>
<td>Mr</td>
<td>1</td>
</tr>
<tr>
<th>24</th>
<td>25</td>
<td>0</td>
<td>3</td>
<td>Palsson, Miss. Torborg Danira</td>
<td>female</td>
<td>8.0</td>
<td>3</td>
<td>1</td>
<td>349909</td>
<td>21.0750</td>
<td>NaN</td>
<td>S</td>
<td>Miss</td>
<td>5</td>
</tr>
<tr>
<th>25</th>
<td>26</td>
<td>1</td>
<td>3</td>
<td>Asplund, Mrs. Carl Oscar (Selma Augusta Emilia...</td>
<td>female</td>
<td>38.0</td>
<td>1</td>
<td>5</td>
<td>347077</td>
<td>31.3875</td>
<td>NaN</td>
<td>S</td>
<td>Mrs</td>
<td>7</td>
</tr>
<tr>
<th>26</th>
<td>27</td>
<td>0</td>
<td>3</td>
<td>Emir, Mr. Farred Chehab</td>
<td>male</td>
<td>28.0</td>
<td>0</td>
<td>0</td>
<td>2631</td>
<td>7.2250</td>
<td>NaN</td>
<td>C</td>
<td>Mr</td>
<td>1</td>
</tr>
<tr>
<th>27</th>
<td>28</td>
<td>0</td>
<td>1</td>
<td>Fortune, Mr. Charles Alexander</td>
<td>male</td>
<td>19.0</td>
<td>3</td>
<td>2</td>
<td>19950</td>
<td>263.0000</td>
<td>C23 C25 C27</td>
<td>S</td>
<td>Mr</td>
<td>6</td>
</tr>
<tr>
<th>28</th>
<td>29</td>
<td>1</td>
<td>3</td>
<td>O'Dwyer, Miss. Ellen "Nellie"</td>
<td>female</td>
<td>28.0</td>
<td>0</td>
<td>0</td>
<td>330959</td>
<td>7.8792</td>
<td>NaN</td>
<td>Q</td>
<td>Miss</td>
<td>1</td>
</tr>
<tr>
<th>29</th>
<td>30</td>
<td>0</td>
<td>3</td>
<td>Todoroff, Mr. Lalio</td>
<td>male</td>
<td>28.0</td>
<td>0</td>
<td>0</td>
<td>349216</td>
<td>7.8958</td>
<td>NaN</td>
<td>S</td>
<td>Mr</td>
<td>1</td>
</tr>
<tr>
<th>...</th>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<th>861</th>
<td>862</td>
<td>0</td>
<td>2</td>
<td>Giles, Mr. Frederick Edward</td>
<td>male</td>
<td>21.0</td>
<td>1</td>
<td>0</td>
<td>28134</td>
<td>11.5000</td>
<td>NaN</td>
<td>S</td>
<td>Mr</td>
<td>2</td>
</tr>
<tr>
<th>862</th>
<td>863</td>
<td>1</td>
<td>1</td>
<td>Swift, Mrs. Frederick Joel (Margaret Welles Ba...</td>
<td>female</td>
<td>48.0</td>
<td>0</td>
<td>0</td>
<td>17466</td>
<td>25.9292</td>
<td>D17</td>
<td>S</td>
<td>Mrs</td>
<td>1</td>
</tr>
<tr>
<th>863</th>
<td>864</td>
<td>0</td>
<td>3</td>
<td>Sage, Miss. Dorothy Edith "Dolly"</td>
<td>female</td>
<td>28.0</td>
<td>8</td>
<td>2</td>
<td>CA. 2343</td>
<td>69.5500</td>
<td>NaN</td>
<td>S</td>
<td>Miss</td>
<td>11</td>
</tr>
<tr>
<th>864</th>
<td>865</td>
<td>0</td>
<td>2</td>
<td>Gill, Mr. John William</td>
<td>male</td>
<td>24.0</td>
<td>0</td>
<td>0</td>
<td>233866</td>
<td>13.0000</td>
<td>NaN</td>
<td>S</td>
<td>Mr</td>
<td>1</td>
</tr>
<tr>
<th>865</th>
<td>866</td>
<td>1</td>
<td>2</td>
<td>Bystrom, Mrs. (Karolina)</td>
<td>female</td>
<td>42.0</td>
<td>0</td>
<td>0</td>
<td>236852</td>
<td>13.0000</td>
<td>NaN</td>
<td>S</td>
<td>Mrs</td>
<td>1</td>
</tr>
<tr>
<th>866</th>
<td>867</td>
<td>1</td>
<td>2</td>
<td>Duran y More, Miss. Asuncion</td>
<td>female</td>
<td>27.0</td>
<td>1</td>
<td>0</td>
<td>SC/PARIS 2149</td>
<td>13.8583</td>
<td>NaN</td>
<td>C</td>
<td>Miss</td>
<td>2</td>
</tr>
<tr>
<th>867</th>
<td>868</td>
<td>0</td>
<td>1</td>
<td>Roebling, Mr. Washington Augustus II</td>
<td>male</td>
<td>31.0</td>
<td>0</td>
<td>0</td>
<td>PC 17590</td>
<td>50.4958</td>
<td>A24</td>
<td>S</td>
<td>Mr</td>
<td>1</td>
</tr>
<tr>
<th>868</th>
<td>869</td>
<td>0</td>
<td>3</td>
<td>van Melkebeke, Mr. Philemon</td>
<td>male</td>
<td>28.0</td>
<td>0</td>
<td>0</td>
<td>345777</td>
<td>9.5000</td>
<td>NaN</td>
<td>S</td>
<td>Mr</td>
<td>1</td>
</tr>
<tr>
<th>869</th>
<td>870</td>
<td>1</td>
<td>3</td>
<td>Johnson, Master. Harold Theodor</td>
<td>male</td>
<td>4.0</td>
<td>1</td>
<td>1</td>
<td>347742</td>
<td>11.1333</td>
<td>NaN</td>
<td>S</td>
<td>Master</td>
<td>3</td>
</tr>
<tr>
<th>870</th>
<td>871</td>
<td>0</td>
<td>3</td>
<td>Balkic, Mr. Cerin</td>
<td>male</td>
<td>26.0</td>
<td>0</td>
<td>0</td>
<td>349248</td>
<td>7.8958</td>
<td>NaN</td>
<td>S</td>
<td>Mr</td>
<td>1</td>
</tr>
<tr>
<th>871</th>
<td>872</td>
<td>1</td>
<td>1</td>
<td>Beckwith, Mrs. Richard Leonard (Sallie Monypeny)</td>
<td>female</td>
<td>47.0</td>
<td>1</td>
<td>1</td>
<td>11751</td>
<td>52.5542</td>
<td>D35</td>
<td>S</td>
<td>Mrs</td>
<td>3</td>
</tr>
<tr>
<th>872</th>
<td>873</td>
<td>0</td>
<td>1</td>
<td>Carlsson, Mr. Frans Olof</td>
<td>male</td>
<td>33.0</td>
<td>0</td>
<td>0</td>
<td>695</td>
<td>5.0000</td>
<td>B51 B53 B55</td>
<td>S</td>
<td>Mr</td>
<td>1</td>
</tr>
<tr>
<th>873</th>
<td>874</td>
<td>0</td>
<td>3</td>
<td>Vander Cruyssen, Mr. Victor</td>
<td>male</td>
<td>47.0</td>
<td>0</td>
<td>0</td>
<td>345765</td>
<td>9.0000</td>
<td>NaN</td>
<td>S</td>
<td>Mr</td>
<td>1</td>
</tr>
<tr>
<th>874</th>
<td>875</td>
<td>1</td>
<td>2</td>
<td>Abelson, Mrs. Samuel (Hannah Wizosky)</td>
<td>female</td>
<td>28.0</td>
<td>1</td>
<td>0</td>
<td>P/PP 3381</td>
<td>24.0000</td>
<td>NaN</td>
<td>C</td>
<td>Mrs</td>
<td>2</td>
</tr>
<tr>
<th>875</th>
<td>876</td>
<td>1</td>
<td>3</td>
<td>Najib, Miss. Adele Kiamie "Jane"</td>
<td>female</td>
<td>15.0</td>
<td>0</td>
<td>0</td>
<td>2667</td>
<td>7.2250</td>
<td>NaN</td>
<td>C</td>
<td>Miss</td>
<td>1</td>
</tr>
<tr>
<th>876</th>
<td>877</td>
<td>0</td>
<td>3</td>
<td>Gustafsson, Mr. Alfred Ossian</td>
<td>male</td>
<td>20.0</td>
<td>0</td>
<td>0</td>
<td>7534</td>
<td>9.8458</td>
<td>NaN</td>
<td>S</td>
<td>Mr</td>
<td>1</td>
</tr>
<tr>
<th>877</th>
<td>878</td>
<td>0</td>
<td>3</td>
<td>Petroff, Mr. Nedelio</td>
<td>male</td>
<td>19.0</td>
<td>0</td>
<td>0</td>
<td>349212</td>
<td>7.8958</td>
<td>NaN</td>
<td>S</td>
<td>Mr</td>
<td>1</td>
</tr>
<tr>
<th>878</th>
<td>879</td>
<td>0</td>
<td>3</td>
<td>Laleff, Mr. Kristo</td>
<td>male</td>
<td>28.0</td>
<td>0</td>
<td>0</td>
<td>349217</td>
<td>7.8958</td>
<td>NaN</td>
<td>S</td>
<td>Mr</td>
<td>1</td>
</tr>
<tr>
<th>879</th>
<td>880</td>
<td>1</td>
<td>1</td>
<td>Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)</td>
<td>female</td>
<td>56.0</td>
<td>0</td>
<td>1</td>
<td>11767</td>
<td>83.1583</td>
<td>C50</td>
<td>C</td>
<td>Mrs</td>
<td>2</td>
</tr>
<tr>
<th>880</th>
<td>881</td>
<td>1</td>
<td>2</td>
<td>Shelley, Mrs. William (Imanita Parrish Hall)</td>
<td>female</td>
<td>25.0</td>
<td>0</td>
<td>1</td>
<td>230433</td>
<td>26.0000</td>
<td>NaN</td>
<td>S</td>
<td>Mrs</td>
<td>2</td>
</tr>
<tr>
<th>881</th>
<td>882</td>
<td>0</td>
<td>3</td>
<td>Markun, Mr. Johann</td>
<td>male</td>
<td>33.0</td>
<td>0</td>
<td>0</td>
<td>349257</td>
<td>7.8958</td>
<td>NaN</td>
<td>S</td>
<td>Mr</td>
<td>1</td>
</tr>
<tr>
<th>882</th>
<td>883</td>
<td>0</td>
<td>3</td>
<td>Dahlberg, Miss. Gerda Ulrika</td>
<td>female</td>
<td>22.0</td>
<td>0</td>
<td>0</td>
<td>7552</td>
<td>10.5167</td>
<td>NaN</td>
<td>S</td>
<td>Miss</td>
<td>1</td>
</tr>
<tr>
<th>883</th>
<td>884</td>
<td>0</td>
<td>2</td>
<td>Banfield, Mr. Frederick James</td>
<td>male</td>
<td>28.0</td>
<td>0</td>
<td>0</td>
<td>C.A./SOTON 34068</td>
<td>10.5000</td>
<td>NaN</td>
<td>S</td>
<td>Mr</td>
<td>1</td>
</tr>
<tr>
<th>884</th>
<td>885</td>
<td>0</td>
<td>3</td>
<td>Sutehall, Mr. Henry Jr</td>
<td>male</td>
<td>25.0</td>
<td>0</td>
<td>0</td>
<td>SOTON/OQ 392076</td>
<td>7.0500</td>
<td>NaN</td>
<td>S</td>
<td>Mr</td>
<td>1</td>
</tr>
<tr>
<th>885</th>
<td>886</td>
<td>0</td>
<td>3</td>
<td>Rice, Mrs. William (Margaret Norton)</td>
<td>female</td>
<td>39.0</td>
<td>0</td>
<td>5</td>
<td>382652</td>
<td>29.1250</td>
<td>NaN</td>
<td>Q</td>
<td>Mrs</td>
<td>6</td>
</tr>
<tr>
<th>886</th>
<td>887</td>
<td>0</td>
<td>2</td>
<td>Montvila, Rev. Juozas</td>
<td>male</td>
<td>27.0</td>
<td>0</td>
<td>0</td>
<td>211536</td>
<td>13.0000</td>
<td>NaN</td>
<td>S</td>
<td>Rev</td>
<td>1</td>
</tr>
<tr>
<th>887</th>
<td>888</td>
<td>1</td>
<td>1</td>
<td>Graham, Miss. Margaret Edith</td>
<td>female</td>
<td>19.0</td>
<td>0</td>
<td>0</td>
<td>112053</td>
<td>30.0000</td>
<td>B42</td>
<td>S</td>
<td>Miss</td>
<td>1</td>
</tr>
<tr>
<th>888</th>
<td>889</td>
<td>0</td>
<td>3</td>
<td>Johnston, Miss. Catherine Helen "Carrie"</td>
<td>female</td>
<td>28.0</td>
<td>1</td>
<td>2</td>
<td>W./C. 6607</td>
<td>23.4500</td>
<td>NaN</td>
<td>S</td>
<td>Miss</td>
<td>4</td>
</tr>
<tr>
<th>889</th>
<td>890</td>
<td>1</td>
<td>1</td>
<td>Behr, Mr. Karl Howell</td>
<td>male</td>
<td>26.0</td>
<td>0</td>
<td>0</td>
<td>111369</td>
<td>30.0000</td>
<td>C148</td>
<td>C</td>
<td>Mr</td>
<td>1</td>
</tr>
<tr>
<th>890</th>
<td>891</td>
<td>0</td>
<td>3</td>
<td>Dooley, Mr. Patrick</td>
<td>male</td>
<td>32.0</td>
<td>0</td>
<td>0</td>
<td>370376</td>
<td>7.7500</td>
<td>NaN</td>
<td>Q</td>
<td>Mr</td>
<td>1</td>
</tr>
</tbody>
</table>
<p>891 rows × 14 columns</p>
</div>
```python
titanic.family_size.value_counts()
```
1 537
2 161
3 102
4 29
6 22
5 15
7 12
11 7
8 6
Name: family_size, dtype: int64
```python
# Bucket family size into three categories
def func(family_size):
    if family_size == 1:
        return 'Singleton'
    elif 2 <= family_size <= 4:
        return 'SmallFamily'
    else:  # family_size > 4
        return 'LargeFamily'

titanic['family_type'] = titanic.family_size.apply(func)
```
```python
titanic.family_type.value_counts()
```
Singleton 537
SmallFamily 292
LargeFamily 62
Name: family_type, dtype: int64
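The same bucketing can be expressed with `pd.cut` instead of a custom function, and the survival rate per family type then follows from a group mean. A sketch; the seven family sizes and outcomes below are hypothetical, and the bins `(0, 1]`, `(1, 4]`, `(4, inf]` are chosen to reproduce the three categories of `func`:

```python
import pandas as pd

# Hypothetical family sizes and outcomes
family_size = pd.Series([1, 2, 4, 5, 11, 1, 3])
survived = pd.Series([1, 1, 1, 0, 0, 0, 1])

# Same grouping as func(): 1 -> Singleton, 2-4 -> SmallFamily, >4 -> LargeFamily
family_type = pd.cut(family_size, bins=[0, 1, 4, float('inf')],
                     labels=['Singleton', 'SmallFamily', 'LargeFamily'])

# Survival rate per family type
rate = survived.groupby(family_type, observed=False).mean()
print(family_type.tolist())
print(rate)
```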