Data Analysis ---- pandas

Original

2018-12-25 18:21


Core data structures: Series & DataFrame

import pandas as pd
import numpy as np
from pandas import Series, DataFrame

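A Series is a fixed-length, ordered dictionary. It has two basic attributes, index and values. By default the index is 0, 1, 2, 3, ... in increasing order, but you can also specify your own labels, e.g. index=['a', 'b', 'c'].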
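A minimal sketch of the two attributes and the dictionary-like behaviour, assuming a reasonably recent pandas; the values and labels are illustrative only:

```python
import pandas as pd

# A Series with a custom index, as described above (illustrative values)
s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

labels = list(s.index)      # the index labels
values = s.values.tolist()  # the underlying data as a plain list

print(labels, values)  # ['a', 'b', 'c'] [1, 2, 3]

# Dictionary-like behaviour: look up by label, test membership against the index
print(s['b'])    # 2
print('c' in s)  # True (membership checks the index labels, like dict keys)
```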

Three ways to create a Series

x1 = Series([1,2,3,4])
x2 = Series(data=[1,2,3,4],index=['a','b','c','d'])
dic = {'a':1,'b':2,'c':3,'d':4}
x3 = Series(dic)
print(x1)
print(x2)
print(x3)

0    1
1    2
2    3
3    4
dtype: int64

a    1
b    2
c    3
d    4
dtype: int64

a    1
b    2
c    3
d    4
dtype: int64

A DataFrame is similar to a table in a database; you can think of it as a set of Series that all share the same index.
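A minimal sketch of that idea, using hypothetical scores keyed by names borrowed from the example below: each Series becomes one column, and pandas lines the rows up by index label. A label missing from one Series shows up as NaN in that column rather than shifting the data.

```python
import pandas as pd

# Hypothetical per-subject scores, each a Series keyed by student name
chinese = pd.Series([90, 80], index=['zhangfei', 'guanyu'])
math = pd.Series([70, 80], index=['zhangfei', 'guanyu'])
english = pd.Series([30], index=['zhangfei'])  # no entry for 'guanyu'

# Each dict entry becomes a column; rows are aligned on the union of the index labels
df = pd.DataFrame({'chinese': chinese, 'math': math, 'english': english})
print(df)

# Selecting one column gives back a Series that carries the same index
col = df['math']
print(type(col).__name__)  # Series
```

Because rows are matched by label, the order of the input Series does not matter; only the labels do.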

Several ways to create a DataFrame

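data = {'chinese': [90, 80, 70, 60, 50], 'math': [70, 80, 70, 90, 60], 'english': [30, 50, 70, 80, 60]}
df1 = DataFrame(data=data, index=['zhangfei', 'guanyu', 'zhaoyun', 'huangzhong', 'machao'])
# Older pandas releases sorted dict keys, hence chinese/english/math in the output below;
# current versions keep the dict's insertion order (chinese, math, english)
print(df1)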

            chinese  english  math    
zhangfei         90       30    70    
guanyu           80       50    80    
zhaoyun          70       70    70   
huangzhong       60       80    90   
machao           50       60    60
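The dictionary form above is one option. As a sketch reusing the first two rows of that table, the same frame can also be built from a list of rows with explicit column names, or from a list of per-row dictionaries:

```python
import pandas as pd

names = ['zhangfei', 'guanyu']

# From a list of rows (2-D data) with explicit column labels
rows = [[90, 70, 30], [80, 80, 50]]
df_rows = pd.DataFrame(rows, columns=['chinese', 'math', 'english'], index=names)

# From a list of dicts: one dict per row, keys become column names
records = [
    {'chinese': 90, 'math': 70, 'english': 30},
    {'chinese': 80, 'math': 80, 'english': 50},
]
df_records = pd.DataFrame(records, index=names)

print(df_rows)
print(df_rows.equals(df_records))  # True: both describe the same table
```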

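# read_excel needs an Excel engine: openpyxl for .xlsx in current pandas
# (xlrd 2.0+ only reads legacy .xls files); no explicit import is required
df2 = pd.read_excel('datas/grades.xlsx')  # already returns a DataFrame
df2 = df2.drop_duplicates()  # drop fully duplicated rows

print(df2)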

     姓名   高數  英語  C++
0   蔣廣佳   43  69   61
1    廖菲   80  64   62
2   沈秀玲   68  74   98
3    韋丹   48  53   64
4   張夢雅   72  73   96
5   趙雅欣   60  63   70
6   曹海廣   74  60   20
7   陳澤燦   38  21   92
8    鄧傑   88  67   84
...

Data cleaning

Removing unnecessary rows or columns

# Drop a column (the 姓名 "name" column)
df2 = df2.drop(columns=['姓名'])
print(df2)

          高數  英語  C++    
     0    43  69   61    
     1    80  64   62    
     2    68  74   98   
     3    48  53   64    
     4    72  73   96

# Drop a row (index label 27)
df2 = df2.drop(index=[27])
print(df2)

         高數  英語  C++    
    0    43  69   61    
    1    80  64   62    
    2    68  74   98    
    3    48  53   64    
    4    72  73   96
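For reference, a sketch of the same two calls on the first three rows of the grades table above. As far as the pandas API goes, `columns=` and `index=` are shorthand for passing labels with `axis=1` and `axis=0`, and `errors='ignore'` skips labels that are not present instead of raising a KeyError:

```python
import pandas as pd

# First three rows of the grades table above
df = pd.DataFrame({'姓名': ['蔣廣佳', '廖菲', '沈秀玲'],
                   '高數': [43, 80, 68],
                   '英語': [69, 64, 74]})

# Drop a column: columns=[...] is equivalent to labels + axis=1
no_name = df.drop(columns=['姓名'])
assert no_name.equals(df.drop(['姓名'], axis=1))

# Drop a row by index label: index=[...] is equivalent to labels + axis=0
no_row = df.drop(index=[2])
assert no_row.equals(df.drop([2], axis=0))

# Label 27 does not exist here; errors='ignore' returns the frame unchanged
same = df.drop(index=[27], errors='ignore')
print(no_name.columns.tolist(), no_row.index.tolist(), len(same))
```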

Renaming columns

df2 = df2.rename(columns={'高數':'math','英語':'english'})
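`rename` only touches the labels listed in the mapping and returns a new frame. A small sketch on the first two rows of the table above:

```python
import pandas as pd

df = pd.DataFrame({'高數': [43, 80], '英語': [69, 64], 'C++': [61, 62]})

# Keys are old names, values are new names; unmapped columns ('C++') stay as they are
renamed = df.rename(columns={'高數': 'math', '英語': 'english'})
print(renamed.columns.tolist())  # ['math', 'english', 'C++']

# The original frame is unchanged unless you reassign the result, as the code above does
print(df.columns.tolist())       # ['高數', '英語', 'C++']
```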

Removing duplicate rows

df2 = df2.drop_duplicates()
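By default `drop_duplicates` keeps the first of any rows that are identical in every column. Its `subset` and `keep` parameters change that; a sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical data: rows 0 and 2 are identical, rows 0, 2 and 3 share a name
df = pd.DataFrame({'name': ['x', 'y', 'x', 'x'],
                   'math': [43, 80, 43, 90]})

# Default: compare all columns, keep the first occurrence
print(df.drop_duplicates().index.tolist())  # [0, 1, 3]

# Compare only 'name' and keep the last occurrence of each name
print(df.drop_duplicates(subset=['name'], keep='last').index.tolist())  # [1, 3]

# keep=False drops every row that has an exact duplicate
print(df.drop_duplicates(keep=False).index.tolist())  # [1, 3]
```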

Changing data types

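df2['math'] = df2['math'].astype('str')  # convert to string so the string methods below apply
# df2['math'] = df2['math'].astype(np.int64)  # converts back to 64-bit integers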
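`astype` raises an error if any value cannot be converted. When a column read from a spreadsheet may contain stray text, `pd.to_numeric(..., errors='coerce')` is a gentler alternative that turns unparseable entries into NaN. A sketch with hypothetical values:

```python
import numpy as np
import pandas as pd

s = pd.Series([43, 80, 68])

as_text = s.astype('str')        # the same numbers stored as strings
back = as_text.astype(np.int64)  # round-trips to 64-bit integers
print(back.dtype)                # int64

# Hypothetical messy column: 'absent' is not a number
messy = pd.Series(['43', '80', 'absent'])
# messy.astype(np.int64) would raise a ValueError because of 'absent'
cleaned = pd.to_numeric(messy, errors='coerce')  # 'absent' becomes NaN, so the result is a float column
print(cleaned.tolist())
```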

Removing whitespace around values

df2['math'] = df2['math'].map(str.strip)   # remove whitespace on both sides
df2['math'] = df2['math'].map(str.lstrip)  # remove leading whitespace only (str.rstrip removes trailing)
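`map(str.strip)` calls the plain Python string method on every element, so it would raise a TypeError on a missing value. The `.str` accessor does the same job and passes missing values through, and `map` can skip them with `na_action='ignore'`. A sketch with hypothetical values:

```python
import pandas as pd

# Hypothetical column with padding and one missing value
s = pd.Series([' 43 ', '80  ', None])

both = s.str.strip()    # strip both ends; the missing value stays missing
left = s.str.lstrip()   # strip leading whitespace only
right = s.str.rstrip()  # strip trailing whitespace only

# The map() form works too if missing values are skipped explicitly
mapped = s.map(str.strip, na_action='ignore')

print(both.tolist()[:2], left.tolist()[:2], right.tolist()[:2])
```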

Removing specific characters

df2['math'] = df2['math'].str.strip('$')
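Like Python's own `str.strip`, `.str.strip('$')` only removes the character at the start and end of each value. To remove it everywhere, `.str.replace` with `regex=False` is one option; `regex=False` matters here because `$` is a regular-expression anchor. A sketch with hypothetical values:

```python
import pandas as pd

s = pd.Series(['$43', '80$', '$6$8$'])

ends_only = s.str.strip('$')                      # the inner '$' in '$6$8$' survives
everywhere = s.str.replace('$', '', regex=False)  # every '$' is removed

print(ends_only.tolist())   # ['43', '80', '6$8']
print(everywhere.tolist())  # ['43', '80', '68']
```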

Case conversion

df2.columns = df2.columns.str.upper()  # all uppercase (.str.lower() for all lowercase, .str.title() to capitalize each word)

df2

	MATH	ENGLISH	C++
0	43	69	61
1	80	64	62
2	68	74	98
3	48	53	64
4	72	73	96
5	60	63	70
6	74	60	20
7	38	21	92
8	88	67	84
9	86	74	96
10	84	60	90
11	64	69	96
12	60	33	70
13	76	56	84
14	68	54	94
15	68	63	98
16	39	44	56
17	90	63	90
18	64	63	78
19	74	60	76
20	52	48	94
21	60	69	74
22	70	49	76
23	91	67	86
24	78	73	88
25	100	60	98
26	80	63	100
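The `.str` accessor works on column labels (an Index) the same way it works on a column of values. A sketch using the column names from this example:

```python
import pandas as pd

cols = pd.Index(['math', 'english', 'C++'])

print(cols.str.upper().tolist())  # ['MATH', 'ENGLISH', 'C++']
print(cols.str.lower().tolist())  # ['math', 'english', 'c++']
print(cols.str.title().tolist())  # ['Math', 'English', 'C++']

# The same methods apply to string values inside a column
names = pd.Series(['zhangfei', 'guanyu'])
print(names.str.title().tolist())  # ['Zhangfei', 'Guanyu']
```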

Cleaning data with the apply function

# df2['MATH'] = df2['MATH'].apply(str.lower)  # apply runs a function on every value
df2['MATH'] = df2['MATH'].astype(np.int64)  # back from str to int64 so the scores can be summed

	MATH	ENGLISH	C++
0	43	69	61
1	80	64	62
2	68	74	98
3	48	53	64
4	72	73	96
5	60	63	70
6	74	60	20
7	38	21	92
8	88	67	84
9	86	74	96
10	84	60	90
11	64	69	96
12	60	33	70
13	76	56	84
14	68	54	94
15	68	63	98
16	39	44	56
17	90	63	90
18	64	63	78
19	74	60	76
20	52	48	94
21	60	69	74
22	70	49	76
23	91	67	86
24	78	73	88
25	100	60	98
26	80	63	100
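The commented-out line above uses `apply` element-wise on a single column. As a sketch of doing that with a custom cleaning function, where `clean_score` is a hypothetical helper and not part of pandas:

```python
import pandas as pd

def clean_score(value):
    """Hypothetical cleaner: strip whitespace and '$', then convert to int."""
    return int(str(value).strip().strip('$'))

# Hypothetical raw scores with padding and a stray '$'
scores = pd.Series([' 43', '$80', '68 '])
print(scores.apply(clean_score).tolist())  # [43, 80, 68]
```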

def plus(row):
    # With axis=1, apply passes each row in as a Series
    row['Total'] = row['MATH'] + row['ENGLISH'] + row['C++']
    return row

df2 = df2.apply(plus, axis=1)

print(df2)

   MATH  ENGLISH  C++  Total
0    43       69   61    173
1    80       64   62    206
2    68       74   98    240
3    48       53   64    165
4    72       73   96    241
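`apply(..., axis=1)` calls a Python function once per row, which is flexible but slow on large frames. For a plain sum, adding whole columns gives the same totals. A sketch on the first two rows of the table above, with `row_total` a hypothetical name:

```python
import pandas as pd

# First two rows of the table above
df = pd.DataFrame({'MATH': [43, 80], 'ENGLISH': [69, 64], 'C++': [61, 62]})

def row_total(row):
    # With axis=1, each call receives one row as a Series
    return row['MATH'] + row['ENGLISH'] + row['C++']

# Row by row: one Python function call per row
by_row = df.apply(row_total, axis=1)

# Vectorized: add whole columns at once (here the same as df.sum(axis=1))
vectorized = df['MATH'] + df['ENGLISH'] + df['C++']

print(by_row.tolist())      # [173, 206]
print(vectorized.tolist())  # [173, 206]
```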

Common statistical functions in pandas

print(df2.describe())

             MATH    ENGLISH         C++       Total
count   27.000000  27.000000   27.000000   27.000000
mean    69.444444  59.703704   81.148148  210.296296
std     16.113380  12.406000   17.933003   34.410212
min     38.000000  21.000000   20.000000  139.000000
25%     60.000000  55.000000   72.000000  193.500000
50%     70.000000  63.000000   86.000000  216.000000
75%     80.000000  68.000000   95.000000  239.500000
max    100.000000  74.000000  100.000000  258.000000
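`describe()` bundles several statistics, and each is also available as its own method. A sketch on the first five MATH scores from the table above:

```python
import pandas as pd

# First five MATH scores from the table above
scores = pd.Series([43, 80, 68, 48, 72], name='MATH')

summary = scores.describe()  # count, mean, std, min, 25%, 50%, 75%, max
print(summary['count'], summary['50%'])  # 5.0 68.0

# The same numbers via individual methods
print(scores.count(), scores.mean(), scores.median())  # 5 62.2 68.0
print(scores.min(), scores.max(), scores.idxmax())     # 43 80 1
```

`idxmax` returns the index label of the largest value, which is handy for finding which student holds the top score.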
