<h1 style="text-align:center">Titanic Data Processing and Analysis</h1>
![](http://www.allengao.cn/wp-content/uploads/2018/06/Titanic.jpg)
```python
import pandas as pd
%matplotlib inline
```
#### Import the data
```python
titanic = pd.read_csv('K:/Code/jupyter-notebook/Python Study/train.csv')
```
#### Quick preview
```python
titanic.head()
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>PassengerId</th>
<th>Survived</th>
<th>Pclass</th>
<th>Name</th>
<th>Sex</th>
<th>Age</th>
<th>SibSp</th>
<th>Parch</th>
<th>Ticket</th>
<th>Fare</th>
<th>Cabin</th>
<th>Embarked</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>1</td>
<td>0</td>
<td>3</td>
<td>Braund, Mr. Owen Harris</td>
<td>male</td>
<td>22.0</td>
<td>1</td>
<td>0</td>
<td>A/5 21171</td>
<td>7.2500</td>
<td>NaN</td>
<td>S</td>
</tr>
<tr>
<th>1</th>
<td>2</td>
<td>1</td>
<td>1</td>
<td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>
<td>female</td>
<td>38.0</td>
<td>1</td>
<td>0</td>
<td>PC 17599</td>
<td>71.2833</td>
<td>C85</td>
<td>C</td>
</tr>
<tr>
<th>2</th>
<td>3</td>
<td>1</td>
<td>3</td>
<td>Heikkinen, Miss. Laina</td>
<td>female</td>
<td>26.0</td>
<td>0</td>
<td>0</td>
<td>STON/O2. 3101282</td>
<td>7.9250</td>
<td>NaN</td>
<td>S</td>
</tr>
<tr>
<th>3</th>
<td>4</td>
<td>1</td>
<td>1</td>
<td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>
<td>female</td>
<td>35.0</td>
<td>1</td>
<td>0</td>
<td>113803</td>
<td>53.1000</td>
<td>C123</td>
<td>S</td>
</tr>
<tr>
<th>4</th>
<td>5</td>
<td>0</td>
<td>3</td>
<td>Allen, Mr. William Henry</td>
<td>male</td>
<td>35.0</td>
<td>0</td>
<td>0</td>
<td>373450</td>
<td>8.0500</td>
<td>NaN</td>
<td>S</td>
</tr>
</tbody>
</table>
</div>
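|Column|Meaning|
|---|---|
|Pclass|Ticket class, a proxy for social class (1 = upper, 2 = middle, 3 = lower)|
|Survived|Whether the passenger survived (0 = no, 1 = yes)|
|Name|Name|
|Sex|Sex|
|Age|Age|
|SibSp|Number of siblings and spouses aboard|
|Parch|Number of parents and children aboard|
|Ticket|Ticket number|
|Fare|Ticket fare|
|Cabin|Cabin|
|Embarked|Port of embarkation|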
```python
titanic.info()
```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
PassengerId 891 non-null int64
Survived 891 non-null int64
Pclass 891 non-null int64
Name 891 non-null object
Sex 891 non-null object
Age 714 non-null float64
SibSp 891 non-null int64
Parch 891 non-null int64
Ticket 891 non-null object
Fare 891 non-null float64
Cabin 204 non-null object
Embarked 889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.6+ KB
```python
# Summary statistics for all numeric columns
titanic.describe()
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>PassengerId</th>
<th>Survived</th>
<th>Pclass</th>
<th>Age</th>
<th>SibSp</th>
<th>Parch</th>
<th>Fare</th>
</tr>
</thead>
<tbody>
<tr>
<th>count</th>
<td>891.000000</td>
<td>891.000000</td>
<td>891.000000</td>
<td>714.000000</td>
<td>891.000000</td>
<td>891.000000</td>
<td>891.000000</td>
</tr>
<tr>
<th>mean</th>
<td>446.000000</td>
<td>0.383838</td>
<td>2.308642</td>
<td>29.699118</td>
<td>0.523008</td>
<td>0.381594</td>
<td>32.204208</td>
</tr>
<tr>
<th>std</th>
<td>257.353842</td>
<td>0.486592</td>
<td>0.836071</td>
<td>14.526497</td>
<td>1.102743</td>
<td>0.806057</td>
<td>49.693429</td>
</tr>
<tr>
<th>min</th>
<td>1.000000</td>
<td>0.000000</td>
<td>1.000000</td>
<td>0.420000</td>
<td>0.000000</td>
<td>0.000000</td>
<td>0.000000</td>
</tr>
<tr>
<th>25%</th>
<td>223.500000</td>
<td>0.000000</td>
<td>2.000000</td>
<td>20.125000</td>
<td>0.000000</td>
<td>0.000000</td>
<td>7.910400</td>
</tr>
<tr>
<th>50%</th>
<td>446.000000</td>
<td>0.000000</td>
<td>3.000000</td>
<td>28.000000</td>
<td>0.000000</td>
<td>0.000000</td>
<td>14.454200</td>
</tr>
<tr>
<th>75%</th>
<td>668.500000</td>
<td>1.000000</td>
<td>3.000000</td>
<td>38.000000</td>
<td>1.000000</td>
<td>0.000000</td>
<td>31.000000</td>
</tr>
<tr>
<th>max</th>
<td>891.000000</td>
<td>1.000000</td>
<td>3.000000</td>
<td>80.000000</td>
<td>8.000000</td>
<td>6.000000</td>
<td>512.329200</td>
</tr>
</tbody>
</table>
</div>
```python
# isnull() flags missing values; sum() counts them per column
titanic.isnull().sum()
```
PassengerId 0
Survived 0
Pclass 0
Name 0
Sex 0
Age 177
SibSp 0
Parch 0
Ticket 0
Fare 0
Cabin 687
Embarked 2
dtype: int64
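Raw counts are easier to judge as a share of the 891 rows. A minimal sketch, using only the three non-zero counts printed above (the names `missing` and `total_rows` are illustrative); on the real frame, `titanic.isnull().mean()` would give the same fractions directly.

```python
import pandas as pd

# Non-zero missing-value counts copied from the output above.
missing = pd.Series({"Age": 177, "Cabin": 687, "Embarked": 2})
total_rows = 891  # number of rows reported by titanic.info()

# Share of missing values per column, in percent.
share = (missing / total_rows * 100).round(1)
print(share)
```

Cabin is missing for most passengers, which is one reason the rest of the notebook fills in Age but leaves Cabin alone.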
#### Handling missing values
```python
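# Fill every missing value in the whole DataFrame (uncomment to try it)
#titanic.fillna(0)
# Fill a single column
#titanic.Age.fillna(0)
# Median age
titanic.Age.median()
# Fill with the median age; this returns a new Series and leaves titanic unchanged
# titanic.Age.fillna(titanic.Age.median())
# Assign the filled Series back to the column so the change persists
# (inplace=True on titanic.Age may not update the DataFrame in recent pandas versions)
titanic['Age'] = titanic.Age.fillna(titanic.Age.median())
# Check the missing-value counts again; Age should now be 0
titanic.isnull().sum()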
```
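The difference between getting a filled copy and persisting the fill can be checked on a tiny made-up frame. This is only a sketch: `toy` and its values are illustrative and do not come from `train.csv`.

```python
import pandas as pd

# Tiny made-up frame standing in for the Titanic data (values are illustrative).
toy = pd.DataFrame({"Age": [22.0, None, 38.0, None, 26.0]})

median_age = toy["Age"].median()        # missing values are skipped by default
filled = toy["Age"].fillna(median_age)  # a new Series; toy itself is unchanged

missing_before = int(toy["Age"].isnull().sum())

# Assign the result back to persist the change in the frame.
toy["Age"] = filled
missing_after = int(toy["Age"].isnull().sum())

print(median_age, missing_before, missing_after)
```

Assigning the result back to the column is the form that behaves the same way across pandas versions.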
### Analysis by sex
```python
# A simple frequency count; used very often
titanic.Sex.value_counts()
```
male 577
female 314
Name: Sex, dtype: int64
```python
# Number of men and women among the survivors
survived = titanic[titanic.Survived==1].Sex.value_counts()
```
```python
# Number of men and women among those who did not survive
dead = titanic[titanic.Survived==0].Sex.value_counts()
```
```python
df = pd.DataFrame([survived,dead],index=['survived','dead'])
df.plot.bar()
```
<matplotlib.axes._subplots.AxesSubplot at 0x1496afd27f0>
![png](output_17_1.png)
```python
# The plot works, but the bars are grouped by outcome rather than by sex
# Transpose the DataFrame so that rows and columns swap
df = df.T
df
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>survived</th>
<th>dead</th>
</tr>
</thead>
<tbody>
<tr>
<th>female</th>
<td>233</td>
<td>81</td>
</tr>
<tr>
<th>male</th>
<td>109</td>
<td>468</td>
</tr>
</tbody>
</table>
</div>
```python
df.plot.bar() # equivalent to df.plot(kind='bar')
```
<matplotlib.axes._subplots.AxesSubplot at 0x1496d1d7940>
![png](output_19_1.png)
```python
# Still not the result we want; stack the bars instead
df.plot(kind = 'bar',stacked = True)
```
<matplotlib.axes._subplots.AxesSubplot at 0x1496d22aef0>
![png](output_20_1.png)
```python
# Proportion of survivors within each sex
df['p_survived'] = df.survived / (df.survived + df.dead)
df['p_dead'] = df.dead / (df.survived + df.dead)
df[['p_survived','p_dead']].plot.bar(stacked=True)
```
<matplotlib.axes._subplots.AxesSubplot at 0x1496d2b7470>
![png](output_21_1.png)
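#### The chart above shows that sex strongly affects survival: about 74% of women survived (233 of 314), versus about 19% of men (109 of 577)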
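The two proportion columns can also be computed in one step by dividing each row by its row total. A minimal sketch, assuming only the four counts from the transposed table above; the names `counts` and `proportions` are illustrative.

```python
import pandas as pd

# Counts copied from the transposed table above (rows: sex, columns: outcome).
counts = pd.DataFrame(
    {"survived": [233, 109], "dead": [81, 468]},
    index=["female", "male"],
)

# Divide each row by its row total to get within-sex proportions.
proportions = counts.div(counts.sum(axis=1), axis=0)
print(proportions.round(3))
```

On the full data, `pd.crosstab(titanic.Sex, titanic.Survived, normalize='index')` is another way to get the same kind of table.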
### Analysis by age
```python
# A simple frequency count
# titanic.Age.value_counts()
```
```python
survived = titanic[titanic.Survived==1].Age
dead = titanic[titanic.Survived==0].Age
df =pd.DataFrame([survived,dead],index=['survived','dead'])
df = df.T
df.plot.hist(stacked=True)
```
<matplotlib.axes._subplots.AxesSubplot at 0x1496d3c4be0>
![png](output_25_1.png)
```python
# Use more bins for a finer histogram
df.plot.hist(stacked = True,bins = 30)
# The tall bar in the middle appears because every missing age was replaced with the median (28)
```
<matplotlib.axes._subplots.AxesSubplot at 0x1496e42f588>
![png](output_26_1.png)
```python
# A density plot is easier to read
df.plot.kde()
```
<matplotlib.axes._subplots.AxesSubplot at 0x1496e4c7dd8>
![png](output_27_1.png)
```python
# Check the age distribution to choose the x-axis range for the plot
titanic.Age.describe()
```
count 891.000000
mean 29.361582
std 13.019697
min 0.420000
25% 22.000000
50% 28.000000
75% 35.000000
max 80.000000
Name: Age, dtype: float64
```python
# Restrict the x-axis to the observed age range
df.plot.kde(xlim=(0,80))
```
<matplotlib.axes._subplots.AxesSubplot at 0x1496e511c18>
![png](output_29_1.png)
```python
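age = 16
young = titanic[titanic.Age<=age]['Survived'].value_counts()
old = titanic[titanic.Age>age]['Survived'].value_counts()
df = pd.DataFrame([young,old],index = ['young','old'])
# Rename by label rather than by position: value_counts() sorts by frequency,
# so the 0/1 order is not guaranteed to be the same in both groups
df = df.rename(columns={0: 'dead', 1: 'survived'})
df.plot.bar(stacked = True)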
```
<matplotlib.axes._subplots.AxesSubplot at 0x1496f3a3b70>
![png](output_30_1.png)
```python
# Proportion of survivors among passengers older than 16 and those aged 16 or younger
df['p_survived'] = df.survived / (df.survived + df.dead)
df['p_dead'] = df.dead / (df.survived + df.dead)
df[['p_survived','p_dead']].plot.bar(stacked=True)
```
<matplotlib.axes._subplots.AxesSubplot at 0x1496f407c50>
![png](output_31_1.png)
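A single cutoff at 16 can be generalized to several age bands with `pd.cut`. The sketch below uses made-up rows purely for illustration (they are not taken from `train.csv`), and the band edges and labels are assumptions, not values from the original notebook.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from train.csv.
toy = pd.DataFrame({
    "Age":      [4, 10, 16, 17, 30, 45, 60, 72],
    "Survived": [1,  1,  0,  0,  1,  0,  0,  1],
})

# right=True (the default) makes the intervals (0, 16], (16, 40], (40, 80].
toy["age_group"] = pd.cut(toy["Age"], bins=[0, 16, 40, 80],
                          labels=["0-16", "16-40", "40-80"])

# The mean of a 0/1 column is the share of ones, i.e. the survival rate.
rate_by_group = toy.groupby("age_group", observed=True)["Survived"].mean()
print(rate_by_group)
```

Applied to the real frame, the same two lines would replace the manual `young`/`old` split with any number of bands.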
### Fare analysis
```python
# Fare is a continuous feature like age, so it is analyzed the same way
survived = titanic[titanic.Survived==1].Fare
dead = titanic[titanic.Survived==0].Fare
df = pd.DataFrame([survived,dead],index = ['survived','dead'])
df = df.T
df.plot.kde()
```
<matplotlib.axes._subplots.AxesSubplot at 0x1496f47b978>
![png](output_33_1.png)
```python
# Check the fare range first, then set xlim
titanic.Fare.describe()
```
count 891.000000
mean 32.204208
std 49.693429
min 0.000000
25% 7.910400
50% 14.454200
75% 31.000000
max 512.329200
Name: Fare, dtype: float64
```python
df.plot(kind = 'kde',xlim = (0,513))
```
<matplotlib.axes._subplots.AxesSubplot at 0x1496f45bba8>
![png](output_35_1.png)
#### The plot shows that passengers who paid low fares had a lower survival rate
### Combined features
```python
# For example, look at the effect of age and fare on survival together
import matplotlib.pyplot as plt
plt.scatter(titanic[titanic.Survived==0].Age, titanic[titanic.Survived==0].Fare)
```
<matplotlib.collections.PathCollection at 0x1496f597a58>
![png](output_38_1.png)
```python
# The plot above is hard to read; restyle it
ax = plt.subplot()
# Non-survivors
age = titanic[titanic.Survived==0].Age
fare = titanic[titanic.Survived==0].Fare
plt.scatter(age, fare,s=20,alpha=0.3,linewidths=1,edgecolors='gray')
# Survivors
age = titanic[titanic.Survived==1].Age
fare = titanic[titanic.Survived==1].Fare
plt.scatter(age, fare,s=20,alpha=0.3,linewidths=1,edgecolors='red')
ax.set_xlabel('age')
ax.set_ylabel('fare')
```
Text(0,0.5,'fare')
![png](output_39_1.png)
```python
# Survivors only
ax = plt.subplot()
age = titanic[titanic.Survived==1].Age
fare = titanic[titanic.Survived==1].Fare
plt.scatter(age, fare,s=20,alpha=0.5,linewidths=1,edgecolors='red')
ax.set_xlabel('age')
ax.set_ylabel('fare')
```
Text(0,0.5,'fare')
![png](output_40_1.png)
### Hidden features
```python
# Extract the title (Mr, Mrs, Miss, ...) from the name
titanic.Name
```
0 Braund, Mr. Owen Harris
1 Cumings, Mrs. John Bradley (Florence Briggs Th...
2 Heikkinen, Miss. Laina
3 Futrelle, Mrs. Jacques Heath (Lily May Peel)
4 Allen, Mr. William Henry
5 Moran, Mr. James
6 McCarthy, Mr. Timothy J
7 Palsson, Master. Gosta Leonard
8 Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)
9 Nasser, Mrs. Nicholas (Adele Achem)
10 Sandstrom, Miss. Marguerite Rut
11 Bonnell, Miss. Elizabeth
12 Saundercock, Mr. William Henry
13 Andersson, Mr. Anders Johan
14 Vestrom, Miss. Hulda Amanda Adolfina
15 Hewlett, Mrs. (Mary D Kingcome)
16 Rice, Master. Eugene
17 Williams, Mr. Charles Eugene
18 Vander Planke, Mrs. Julius (Emelia Maria Vande...
19 Masselmani, Mrs. Fatima
20 Fynney, Mr. Joseph J
21 Beesley, Mr. Lawrence
22 McGowan, Miss. Anna "Annie"
23 Sloper, Mr. William Thompson
24 Palsson, Miss. Torborg Danira
25 Asplund, Mrs. Carl Oscar (Selma Augusta Emilia...
26 Emir, Mr. Farred Chehab
27 Fortune, Mr. Charles Alexander
28 O'Dwyer, Miss. Ellen "Nellie"
29 Todoroff, Mr. Lalio
...
861 Giles, Mr. Frederick Edward
862 Swift, Mrs. Frederick Joel (Margaret Welles Ba...
863 Sage, Miss. Dorothy Edith "Dolly"
864 Gill, Mr. John William
865 Bystrom, Mrs. (Karolina)
866 Duran y More, Miss. Asuncion
867 Roebling, Mr. Washington Augustus II
868 van Melkebeke, Mr. Philemon
869 Johnson, Master. Harold Theodor
870 Balkic, Mr. Cerin
871 Beckwith, Mrs. Richard Leonard (Sallie Monypeny)
872 Carlsson, Mr. Frans Olof
873 Vander Cruyssen, Mr. Victor
874 Abelson, Mrs. Samuel (Hannah Wizosky)
875 Najib, Miss. Adele Kiamie "Jane"
876 Gustafsson, Mr. Alfred Ossian
877 Petroff, Mr. Nedelio
878 Laleff, Mr. Kristo
879 Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)
880 Shelley, Mrs. William (Imanita Parrish Hall)
881 Markun, Mr. Johann
882 Dahlberg, Miss. Gerda Ulrika
883 Banfield, Mr. Frederick James
884 Sutehall, Mr. Henry Jr
885 Rice, Mrs. William (Margaret Norton)
886 Montvila, Rev. Juozas
887 Graham, Miss. Margaret Edith
888 Johnston, Miss. Catherine Helen "Carrie"
889 Behr, Mr. Karl Howell
890 Dooley, Mr. Patrick
Name: Name, Length: 891, dtype: object
```python
titanic['title'] = titanic.Name.apply(lambda name: name.split(',')[1].split('.')[0].strip())
```
```python
s= 'Williams, Mr.Howard Hugh "harry"'
s.split(',')[-1].split('.')[0].strip()
```
'Mr'
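The same extraction can be written as a vectorized regular expression instead of a per-row `apply`. This is an alternative sketch, not the notebook's method; the sample names are copied from the output above, and the pattern assumes the title always sits between the first comma and the next period.

```python
import pandas as pd

names = pd.Series([
    "Braund, Mr. Owen Harris",
    "Heikkinen, Miss. Laina",
    "Palsson, Master. Gosta Leonard",
    'Williams, Mr.Howard Hugh "harry"',
])

# Capture the text between the first comma and the next period.
titles = names.str.extract(r",\s*([^.]+)\.", expand=False).str.strip()
print(titles.tolist())
```

A name without a comma or period would yield a missing value here, whereas the `split` version would raise an `IndexError`, so the two differ on malformed input.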
```python
titanic.title.value_counts()
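# For example, if a passenger's title is Mr but the age is unknown, the mean age of all
# passengers titled Mr is a better fill value than the overall median used earlier.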
```
Mr 517
Miss 182
Mrs 125
Master 40
Dr 7
Rev 6
Mlle 2
Major 2
Col 2
Capt 1
Ms 1
Mme 1
Jonkheer 1
the Countess 1
Don 1
Lady 1
Sir 1
Name: title, dtype: int64
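The title-based fill described in the comment above can be sketched with `groupby(...).transform`. The rows below are made up for illustration. In this notebook the ages were already filled with the overall median earlier, so on the real data this step would have to run before that fill for it to have any effect.

```python
import pandas as pd

# Made-up rows for illustration; the real notebook would use the titanic frame.
toy = pd.DataFrame({
    "title": ["Mr", "Mr", "Mr", "Miss", "Miss", "Master"],
    "Age":   [30.0, 40.0, None, 20.0, None, 4.0],
})

# transform returns a Series aligned with toy, holding each row's group mean.
group_mean = toy.groupby("title")["Age"].transform("mean")

# Only rows where Age is missing take the group mean.
toy["Age"] = toy["Age"].fillna(group_mean)
print(toy)
```

Rare titles with no known ages at all would stay missing under this approach and need a fallback, such as the overall median.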
### GDP
```python
### Night-lights map: the brightness of the lights serves as a simple proxy for GDP
```
```python
titanic.head()
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>PassengerId</th>
<th>Survived</th>
<th>Pclass</th>
<th>Name</th>
<th>Sex</th>
<th>Age</th>
<th>SibSp</th>
<th>Parch</th>
<th>Ticket</th>
<th>Fare</th>
<th>Cabin</th>
<th>Embarked</th>
<th>title</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>1</td>
<td>0</td>
<td>3</td>
<td>Braund, Mr. Owen Harris</td>
<td>male</td>
<td>22.0</td>
<td>1</td>
<td>0</td>
<td>A/5 21171</td>
<td>7.2500</td>
<td>NaN</td>
<td>S</td>
<td>Mr</td>
</tr>
<tr>
<th>1</th>
<td>2</td>
<td>1</td>
<td>1</td>
<td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>
<td>female</td>
<td>38.0</td>
<td>1</td>
<td>0</td>
<td>PC 17599</td>
<td>71.2833</td>
<td>C85</td>
<td>C</td>
<td>Mrs</td>
</tr>
<tr>
<th>2</th>
<td>3</td>
<td>1</td>
<td>3</td>
<td>Heikkinen, Miss. Laina</td>
<td>female</td>
<td>26.0</td>
<td>0</td>
<td>0</td>
<td>STON/O2. 3101282</td>
<td>7.9250</td>
<td>NaN</td>
<td>S</td>
<td>Miss</td>
</tr>
<tr>
<th>3</th>
<td>4</td>
<td>1</td>
<td>1</td>
<td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>
<td>female</td>
<td>35.0</td>
<td>1</td>
<td>0</td>
<td>113803</td>
<td>53.1000</td>
<td>C123</td>
<td>S</td>
<td>Mrs</td>
</tr>
<tr>
<th>4</th>
<td>5</td>
<td>0</td>
<td>3</td>
<td>Allen, Mr. William Henry</td>
<td>male</td>
<td>35.0</td>
<td>0</td>
<td>0</td>
<td>373450</td>
<td>8.0500</td>
<td>NaN</td>
<td>S</td>
<td>Mr</td>
</tr>
</tbody>
</table>
</div>
```python
titanic['family_size'] = titanic.SibSp + titanic.Parch + 1
```
```python
titanic
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>PassengerId</th>
<th>Survived</th>
<th>Pclass</th>
<th>Name</th>
<th>Sex</th>
<th>Age</th>
<th>SibSp</th>
<th>Parch</th>
<th>Ticket</th>
<th>Fare</th>
<th>Cabin</th>
<th>Embarked</th>
<th>title</th>
<th>family_size</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>1</td>
<td>0</td>
<td>3</td>
<td>Braund, Mr. Owen Harris</td>
<td>male</td>
<td>22.0</td>
<td>1</td>
<td>0</td>
<td>A/5 21171</td>
<td>7.2500</td>
<td>NaN</td>
<td>S</td>
<td>Mr</td>
<td>2</td>
</tr>
<tr>
<th>1</th>
<td>2</td>
<td>1</td>
<td>1</td>
<td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>
<td>female</td>
<td>38.0</td>
<td>1</td>
<td>0</td>
<td>PC 17599</td>
<td>71.2833</td>
<td>C85</td>
<td>C</td>
<td>Mrs</td>
<td>2</td>
</tr>
<tr>
<th>2</th>
<td>3</td>
<td>1</td>
<td>3</td>
<td>Heikkinen, Miss. Laina</td>
<td>female</td>
<td>26.0</td>
<td>0</td>
<td>0</td>
<td>STON/O2. 3101282</td>
<td>7.9250</td>
<td>NaN</td>
<td>S</td>
<td>Miss</td>
<td>1</td>
</tr>
<tr>
<th>3</th>
<td>4</td>
<td>1</td>
<td>1</td>
<td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>
<td>female</td>
<td>35.0</td>
<td>1</td>
<td>0</td>
<td>113803</td>
<td>53.1000</td>
<td>C123</td>
<td>S</td>
<td>Mrs</td>
<td>2</td>
</tr>
<tr>
<th>4</th>
<td>5</td>
<td>0</td>
<td>3</td>
<td>Allen, Mr. William Henry</td>
<td>male</td>
<td>35.0</td>
<td>0</td>
<td>0</td>
<td>373450</td>
<td>8.0500</td>
<td>NaN</td>
<td>S</td>
<td>Mr</td>
<td>1</td>
</tr>
<tr>
<th>5</th>
<td>6</td>
<td>0</td>
<td>3</td>
<td>Moran, Mr. James</td>
<td>male</td>
<td>28.0</td>
<td>0</td>
<td>0</td>
<td>330877</td>
<td>8.4583</td>
<td>NaN</td>
<td>Q</td>
<td>Mr</td>
<td>1</td>
</tr>
<tr>
<th>6</th>
<td>7</td>
<td>0</td>
<td>1</td>
<td>McCarthy, Mr. Timothy J</td>
<td>male</td>
<td>54.0</td>
<td>0</td>
<td>0</td>
<td>17463</td>
<td>51.8625</td>
<td>E46</td>
<td>S</td>
<td>Mr</td>
<td>1</td>
</tr>
<tr>
<th>7</th>
<td>8</td>
<td>0</td>
<td>3</td>
<td>Palsson, Master. Gosta Leonard</td>
<td>male</td>
<td>2.0</td>
<td>3</td>
<td>1</td>
<td>349909</td>
<td>21.0750</td>
<td>NaN</td>
<td>S</td>
<td>Master</td>
<td>5</td>
</tr>
<tr>
<th>8</th>
<td>9</td>
<td>1</td>
<td>3</td>
<td>Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)</td>
<td>female</td>
<td>27.0</td>
<td>0</td>
<td>2</td>
<td>347742</td>
<td>11.1333</td>
<td>NaN</td>
<td>S</td>
<td>Mrs</td>
<td>3</td>
</tr>
<tr>
<th>9</th>
<td>10</td>
<td>1</td>
<td>2</td>
<td>Nasser, Mrs. Nicholas (Adele Achem)</td>
<td>female</td>
<td>14.0</td>
<td>1</td>
<td>0</td>
<td>237736</td>
<td>30.0708</td>
<td>NaN</td>
<td>C</td>
<td>Mrs</td>
<td>2</td>
</tr>
<tr>
<th>10</th>
<td>11</td>
<td>1</td>
<td>3</td>
<td>Sandstrom, Miss. Marguerite Rut</td>
<td>female</td>
<td>4.0</td>
<td>1</td>
<td>1</td>
<td>PP 9549</td>
<td>16.7000</td>
<td>G6</td>
<td>S</td>
<td>Miss</td>
<td>3</td>
</tr>
<tr>
<th>11</th>
<td>12</td>
<td>1</td>
<td>1</td>
<td>Bonnell, Miss. Elizabeth</td>
<td>female</td>
<td>58.0</td>
<td>0</td>
<td>0</td>
<td>113783</td>
<td>26.5500</td>
<td>C103</td>
<td>S</td>
<td>Miss</td>
<td>1</td>
</tr>
<tr>
<th>12</th>
<td>13</td>
<td>0</td>
<td>3</td>
<td>Saundercock, Mr. William Henry</td>
<td>male</td>
<td>20.0</td>
<td>0</td>
<td>0</td>
<td>A/5. 2151</td>
<td>8.0500</td>
<td>NaN</td>
<td>S</td>
<td>Mr</td>
<td>1</td>
</tr>
<tr>
<th>13</th>
<td>14</td>
<td>0</td>
<td>3</td>
<td>Andersson, Mr. Anders Johan</td>
<td>male</td>
<td>39.0</td>
<td>1</td>
<td>5</td>
<td>347082</td>
<td>31.2750</td>
<td>NaN</td>
<td>S</td>
<td>Mr</td>
<td>7</td>
</tr>
<tr>
<th>14</th>
<td>15</td>
<td>0</td>
<td>3</td>
<td>Vestrom, Miss. Hulda Amanda Adolfina</td>
<td>female</td>
<td>14.0</td>
<td>0</td>
<td>0</td>
<td>350406</td>
<td>7.8542</td>
<td>NaN</td>
<td>S</td>
<td>Miss</td>
<td>1</td>
</tr>
<tr>
<th>15</th>
<td>16</td>
<td>1</td>
<td>2</td>
<td>Hewlett, Mrs. (Mary D Kingcome)</td>
<td>female</td>
<td>55.0</td>
<td>0</td>
<td>0</td>
<td>248706</td>
<td>16.0000</td>
<td>NaN</td>
<td>S</td>
<td>Mrs</td>
<td>1</td>
</tr>
<tr>
<th>16</th>
<td>17</td>
<td>0</td>
<td>3</td>
<td>Rice, Master. Eugene</td>
<td>male</td>
<td>2.0</td>
<td>4</td>
<td>1</td>
<td>382652</td>
<td>29.1250</td>
<td>NaN</td>
<td>Q</td>
<td>Master</td>
<td>6</td>
</tr>
<tr>
<th>17</th>
<td>18</td>
<td>1</td>
<td>2</td>
<td>Williams, Mr. Charles Eugene</td>
<td>male</td>
<td>28.0</td>
<td>0</td>
<td>0</td>
<td>244373</td>
<td>13.0000</td>
<td>NaN</td>
<td>S</td>
<td>Mr</td>
<td>1</td>
</tr>
<tr>
<th>18</th>
<td>19</td>
<td>0</td>
<td>3</td>
<td>Vander Planke, Mrs. Julius (Emelia Maria Vande...</td>
<td>female</td>
<td>31.0</td>
<td>1</td>
<td>0</td>
<td>345763</td>
<td>18.0000</td>
<td>NaN</td>
<td>S</td>
<td>Mrs</td>
<td>2</td>
</tr>
<tr>
<th>19</th>
<td>20</td>
<td>1</td>
<td>3</td>
<td>Masselmani, Mrs. Fatima</td>
<td>female</td>
<td>28.0</td>
<td>0</td>
<td>0</td>
<td>2649</td>
<td>7.2250</td>
<td>NaN</td>
<td>C</td>
<td>Mrs</td>
<td>1</td>
</tr>
<tr>
<th>20</th>
<td>21</td>
<td>0</td>
<td>2</td>
<td>Fynney, Mr. Joseph J</td>
<td>male</td>
<td>35.0</td>
<td>0</td>
<td>0</td>
<td>239865</td>
<td>26.0000</td>
<td>NaN</td>
<td>S</td>
<td>Mr</td>
<td>1</td>
</tr>
<tr>
<th>21</th>
<td>22</td>
<td>1</td>
<td>2</td>
<td>Beesley, Mr. Lawrence</td>
<td>male</td>
<td>34.0</td>
<td>0</td>
<td>0</td>
<td>248698</td>
<td>13.0000</td>
<td>D56</td>
<td>S</td>
<td>Mr</td>
<td>1</td>
</tr>
<tr>
<th>22</th>
<td>23</td>
<td>1</td>
<td>3</td>
<td>McGowan, Miss. Anna "Annie"</td>
<td>female</td>
<td>15.0</td>
<td>0</td>
<td>0</td>
<td>330923</td>
<td>8.0292</td>
<td>NaN</td>
<td>Q</td>
<td>Miss</td>
<td>1</td>
</tr>
<tr>
<th>23</th>
<td>24</td>
<td>1</td>
<td>1</td>
<td>Sloper, Mr. William Thompson</td>
<td>male</td>
<td>28.0</td>
<td>0</td>
<td>0</td>
<td>113788</td>
<td>35.5000</td>
<td>A6</td>
<td>S</td>
<td>Mr</td>
<td>1</td>
</tr>
<tr>
<th>24</th>
<td>25</td>
<td>0</td>
<td>3</td>
<td>Palsson, Miss. Torborg Danira</td>
<td>female</td>
<td>8.0</td>
<td>3</td>
<td>1</td>
<td>349909</td>
<td>21.0750</td>
<td>NaN</td>
<td>S</td>
<td>Miss</td>
<td>5</td>
</tr>
<tr>
<th>25</th>
<td>26</td>
<td>1</td>
<td>3</td>
<td>Asplund, Mrs. Carl Oscar (Selma Augusta Emilia...</td>
<td>female</td>
<td>38.0</td>
<td>1</td>
<td>5</td>
<td>347077</td>
<td>31.3875</td>
<td>NaN</td>
<td>S</td>
<td>Mrs</td>
<td>7</td>
</tr>
<tr>
<th>26</th>
<td>27</td>
<td>0</td>
<td>3</td>
<td>Emir, Mr. Farred Chehab</td>
<td>male</td>
<td>28.0</td>
<td>0</td>
<td>0</td>
<td>2631</td>
<td>7.2250</td>
<td>NaN</td>
<td>C</td>
<td>Mr</td>
<td>1</td>
</tr>
<tr>
<th>27</th>
<td>28</td>
<td>0</td>
<td>1</td>
<td>Fortune, Mr. Charles Alexander</td>
<td>male</td>
<td>19.0</td>
<td>3</td>
<td>2</td>
<td>19950</td>
<td>263.0000</td>
<td>C23 C25 C27</td>
<td>S</td>
<td>Mr</td>
<td>6</td>
</tr>
<tr>
<th>28</th>
<td>29</td>
<td>1</td>
<td>3</td>
<td>O'Dwyer, Miss. Ellen "Nellie"</td>
<td>female</td>
<td>28.0</td>
<td>0</td>
<td>0</td>
<td>330959</td>
<td>7.8792</td>
<td>NaN</td>
<td>Q</td>
<td>Miss</td>
<td>1</td>
</tr>
<tr>
<th>29</th>
<td>30</td>
<td>0</td>
<td>3</td>
<td>Todoroff, Mr. Lalio</td>
<td>male</td>
<td>28.0</td>
<td>0</td>
<td>0</td>
<td>349216</td>
<td>7.8958</td>
<td>NaN</td>
<td>S</td>
<td>Mr</td>
<td>1</td>
</tr>
<tr>
<th>...</th>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<th>861</th>
<td>862</td>
<td>0</td>
<td>2</td>
<td>Giles, Mr. Frederick Edward</td>
<td>male</td>
<td>21.0</td>
<td>1</td>
<td>0</td>
<td>28134</td>
<td>11.5000</td>
<td>NaN</td>
<td>S</td>
<td>Mr</td>
<td>2</td>
</tr>
<tr>
<th>862</th>
<td>863</td>
<td>1</td>
<td>1</td>
<td>Swift, Mrs. Frederick Joel (Margaret Welles Ba...</td>
<td>female</td>
<td>48.0</td>
<td>0</td>
<td>0</td>
<td>17466</td>
<td>25.9292</td>
<td>D17</td>
<td>S</td>
<td>Mrs</td>
<td>1</td>
</tr>
<tr>
<th>863</th>
<td>864</td>
<td>0</td>
<td>3</td>
<td>Sage, Miss. Dorothy Edith "Dolly"</td>
<td>female</td>
<td>28.0</td>
<td>8</td>
<td>2</td>
<td>CA. 2343</td>
<td>69.5500</td>
<td>NaN</td>
<td>S</td>
<td>Miss</td>
<td>11</td>
</tr>
<tr>
<th>864</th>
<td>865</td>
<td>0</td>
<td>2</td>
<td>Gill, Mr. John William</td>
<td>male</td>
<td>24.0</td>
<td>0</td>
<td>0</td>
<td>233866</td>
<td>13.0000</td>
<td>NaN</td>
<td>S</td>
<td>Mr</td>
<td>1</td>
</tr>
<tr>
<th>865</th>
<td>866</td>
<td>1</td>
<td>2</td>
<td>Bystrom, Mrs. (Karolina)</td>
<td>female</td>
<td>42.0</td>
<td>0</td>
<td>0</td>
<td>236852</td>
<td>13.0000</td>
<td>NaN</td>
<td>S</td>
<td>Mrs</td>
<td>1</td>
</tr>
<tr>
<th>866</th>
<td>867</td>
<td>1</td>
<td>2</td>
<td>Duran y More, Miss. Asuncion</td>
<td>female</td>
<td>27.0</td>
<td>1</td>
<td>0</td>
<td>SC/PARIS 2149</td>
<td>13.8583</td>
<td>NaN</td>
<td>C</td>
<td>Miss</td>
<td>2</td>
</tr>
<tr>
<th>867</th>
<td>868</td>
<td>0</td>
<td>1</td>
<td>Roebling, Mr. Washington Augustus II</td>
<td>male</td>
<td>31.0</td>
<td>0</td>
<td>0</td>
<td>PC 17590</td>
<td>50.4958</td>
<td>A24</td>
<td>S</td>
<td>Mr</td>
<td>1</td>
</tr>
<tr>
<th>868</th>
<td>869</td>
<td>0</td>
<td>3</td>
<td>van Melkebeke, Mr. Philemon</td>
<td>male</td>
<td>28.0</td>
<td>0</td>
<td>0</td>
<td>345777</td>
<td>9.5000</td>
<td>NaN</td>
<td>S</td>
<td>Mr</td>
<td>1</td>
</tr>
<tr>
<th>869</th>
<td>870</td>
<td>1</td>
<td>3</td>
<td>Johnson, Master. Harold Theodor</td>
<td>male</td>
<td>4.0</td>
<td>1</td>
<td>1</td>
<td>347742</td>
<td>11.1333</td>
<td>NaN</td>
<td>S</td>
<td>Master</td>
<td>3</td>
</tr>
<tr>
<th>870</th>
<td>871</td>
<td>0</td>
<td>3</td>
<td>Balkic, Mr. Cerin</td>
<td>male</td>
<td>26.0</td>
<td>0</td>
<td>0</td>
<td>349248</td>
<td>7.8958</td>
<td>NaN</td>
<td>S</td>
<td>Mr</td>
<td>1</td>
</tr>
<tr>
<th>871</th>
<td>872</td>
<td>1</td>
<td>1</td>
<td>Beckwith, Mrs. Richard Leonard (Sallie Monypeny)</td>
<td>female</td>
<td>47.0</td>
<td>1</td>
<td>1</td>
<td>11751</td>
<td>52.5542</td>
<td>D35</td>
<td>S</td>
<td>Mrs</td>
<td>3</td>
</tr>
<tr>
<th>872</th>
<td>873</td>
<td>0</td>
<td>1</td>
<td>Carlsson, Mr. Frans Olof</td>
<td>male</td>
<td>33.0</td>
<td>0</td>
<td>0</td>
<td>695</td>
<td>5.0000</td>
<td>B51 B53 B55</td>
<td>S</td>
<td>Mr</td>
<td>1</td>
</tr>
<tr>
<th>873</th>
<td>874</td>
<td>0</td>
<td>3</td>
<td>Vander Cruyssen, Mr. Victor</td>
<td>male</td>
<td>47.0</td>
<td>0</td>
<td>0</td>
<td>345765</td>
<td>9.0000</td>
<td>NaN</td>
<td>S</td>
<td>Mr</td>
<td>1</td>
</tr>
<tr>
<th>874</th>
<td>875</td>
<td>1</td>
<td>2</td>
<td>Abelson, Mrs. Samuel (Hannah Wizosky)</td>
<td>female</td>
<td>28.0</td>
<td>1</td>
<td>0</td>
<td>P/PP 3381</td>
<td>24.0000</td>
<td>NaN</td>
<td>C</td>
<td>Mrs</td>
<td>2</td>
</tr>
<tr>
<th>875</th>
<td>876</td>
<td>1</td>
<td>3</td>
<td>Najib, Miss. Adele Kiamie "Jane"</td>
<td>female</td>
<td>15.0</td>
<td>0</td>
<td>0</td>
<td>2667</td>
<td>7.2250</td>
<td>NaN</td>
<td>C</td>
<td>Miss</td>
<td>1</td>
</tr>
<tr>
<th>876</th>
<td>877</td>
<td>0</td>
<td>3</td>
<td>Gustafsson, Mr. Alfred Ossian</td>
<td>male</td>
<td>20.0</td>
<td>0</td>
<td>0</td>
<td>7534</td>
<td>9.8458</td>
<td>NaN</td>
<td>S</td>
<td>Mr</td>
<td>1</td>
</tr>
<tr>
<th>877</th>
<td>878</td>
<td>0</td>
<td>3</td>
<td>Petroff, Mr. Nedelio</td>
<td>male</td>
<td>19.0</td>
<td>0</td>
<td>0</td>
<td>349212</td>
<td>7.8958</td>
<td>NaN</td>
<td>S</td>
<td>Mr</td>
<td>1</td>
</tr>
<tr>
<th>878</th>
<td>879</td>
<td>0</td>
<td>3</td>
<td>Laleff, Mr. Kristo</td>
<td>male</td>
<td>28.0</td>
<td>0</td>
<td>0</td>
<td>349217</td>
<td>7.8958</td>
<td>NaN</td>
<td>S</td>
<td>Mr</td>
<td>1</td>
</tr>
<tr>
<th>879</th>
<td>880</td>
<td>1</td>
<td>1</td>
<td>Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)</td>
<td>female</td>
<td>56.0</td>
<td>0</td>
<td>1</td>
<td>11767</td>
<td>83.1583</td>
<td>C50</td>
<td>C</td>
<td>Mrs</td>
<td>2</td>
</tr>
<tr>
<th>880</th>
<td>881</td>
<td>1</td>
<td>2</td>
<td>Shelley, Mrs. William (Imanita Parrish Hall)</td>
<td>female</td>
<td>25.0</td>
<td>0</td>
<td>1</td>
<td>230433</td>
<td>26.0000</td>
<td>NaN</td>
<td>S</td>
<td>Mrs</td>
<td>2</td>
</tr>
<tr>
<th>881</th>
<td>882</td>
<td>0</td>
<td>3</td>
<td>Markun, Mr. Johann</td>
<td>male</td>
<td>33.0</td>
<td>0</td>
<td>0</td>
<td>349257</td>
<td>7.8958</td>
<td>NaN</td>
<td>S</td>
<td>Mr</td>
<td>1</td>
</tr>
<tr>
<th>882</th>
<td>883</td>
<td>0</td>
<td>3</td>
<td>Dahlberg, Miss. Gerda Ulrika</td>
<td>female</td>
<td>22.0</td>
<td>0</td>
<td>0</td>
<td>7552</td>
<td>10.5167</td>
<td>NaN</td>
<td>S</td>
<td>Miss</td>
<td>1</td>
</tr>
<tr>
<th>883</th>
<td>884</td>
<td>0</td>
<td>2</td>
<td>Banfield, Mr. Frederick James</td>
<td>male</td>
<td>28.0</td>
<td>0</td>
<td>0</td>
<td>C.A./SOTON 34068</td>
<td>10.5000</td>
<td>NaN</td>
<td>S</td>
<td>Mr</td>
<td>1</td>
</tr>
<tr>
<th>884</th>
<td>885</td>
<td>0</td>
<td>3</td>
<td>Sutehall, Mr. Henry Jr</td>
<td>male</td>
<td>25.0</td>
<td>0</td>
<td>0</td>
<td>SOTON/OQ 392076</td>
<td>7.0500</td>
<td>NaN</td>
<td>S</td>
<td>Mr</td>
<td>1</td>
</tr>
<tr>
<th>885</th>
<td>886</td>
<td>0</td>
<td>3</td>
<td>Rice, Mrs. William (Margaret Norton)</td>
<td>female</td>
<td>39.0</td>
<td>0</td>
<td>5</td>
<td>382652</td>
<td>29.1250</td>
<td>NaN</td>
<td>Q</td>
<td>Mrs</td>
<td>6</td>
</tr>
<tr>
<th>886</th>
<td>887</td>
<td>0</td>
<td>2</td>
<td>Montvila, Rev. Juozas</td>
<td>male</td>
<td>27.0</td>
<td>0</td>
<td>0</td>
<td>211536</td>
<td>13.0000</td>
<td>NaN</td>
<td>S</td>
<td>Rev</td>
<td>1</td>
</tr>
<tr>
<th>887</th>
<td>888</td>
<td>1</td>
<td>1</td>
<td>Graham, Miss. Margaret Edith</td>
<td>female</td>
<td>19.0</td>
<td>0</td>
<td>0</td>
<td>112053</td>
<td>30.0000</td>
<td>B42</td>
<td>S</td>
<td>Miss</td>
<td>1</td>
</tr>
<tr>
<th>888</th>
<td>889</td>
<td>0</td>
<td>3</td>
<td>Johnston, Miss. Catherine Helen "Carrie"</td>
<td>female</td>
<td>28.0</td>
<td>1</td>
<td>2</td>
<td>W./C. 6607</td>
<td>23.4500</td>
<td>NaN</td>
<td>S</td>
<td>Miss</td>
<td>4</td>
</tr>
<tr>
<th>889</th>
<td>890</td>
<td>1</td>
<td>1</td>
<td>Behr, Mr. Karl Howell</td>
<td>male</td>
<td>26.0</td>
<td>0</td>
<td>0</td>
<td>111369</td>
<td>30.0000</td>
<td>C148</td>
<td>C</td>
<td>Mr</td>
<td>1</td>
</tr>
<tr>
<th>890</th>
<td>891</td>
<td>0</td>
<td>3</td>
<td>Dooley, Mr. Patrick</td>
<td>male</td>
<td>32.0</td>
<td>0</td>
<td>0</td>
<td>370376</td>
<td>7.7500</td>
<td>NaN</td>
<td>Q</td>
<td>Mr</td>
<td>1</td>
</tr>
</tbody>
</table>
<p>891 rows × 14 columns</p>
</div>
```python
titanic.family_size.value_counts()
```
1 537
2 161
3 102
4 29
6 22
5 15
7 12
11 7
8 6
Name: family_size, dtype: int64
```python
def func(family_size):
    if family_size == 1:
        return 'Singleton'
    if family_size <= 4 and family_size >= 2:
        return 'SmallFamily'
    if family_size > 4:
        return 'LargeFamily'
titanic['family_type'] = titanic.family_size.apply(func)
```
```python
titanic.family_type.value_counts()
```
Singleton 537
SmallFamily 292
LargeFamily 62
Name: family_type, dtype: int64
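The same three categories can be produced without a hand-written function by binning with `pd.cut`. This is an alternative sketch rather than the notebook's method; it rebuilds the `family_size` column from the value counts printed above, and the names `size_counts` and `sizes` are illustrative.

```python
import pandas as pd

# family_size value counts copied from the output above: {size: count}.
size_counts = {1: 537, 2: 161, 3: 102, 4: 29, 6: 22, 5: 15, 7: 12, 11: 7, 8: 6}
sizes = pd.Series([size for size, n in size_counts.items() for _ in range(n)])

# Intervals are (0, 1], (1, 4], (4, inf) with the default right=True.
family_type = pd.cut(
    sizes,
    bins=[0, 1, 4, float("inf")],
    labels=["Singleton", "SmallFamily", "LargeFamily"],
)
type_counts = family_type.value_counts()
print(type_counts)
```

The result is a categorical column, so the categories keep their size order when plotted or grouped.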