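Using Python Pandas: DataFrame

Pandas is a powerful toolkit for analyzing structured data. It is built on NumPy (which provides high-performance array operations) and is used for data mining and data analysis; it also provides data-cleaning features.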

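1. Installing Pandas

The officially recommended way to install Pandas is through Anaconda, but Anaconda is a very large distribution. If you only need Pandas, you can install it from PyPI instead.

pip install pandas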

2. The Pandas data structure: DataFrame

Import pandas before using it. Unless stated otherwise, pd is the conventional alias for pandas.

import pandas as pd

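2.1 Creating a DataFrame

Definition of a DataFrame
- A DataFrame is a two-dimensional labeled table. It can be thought of as a spreadsheet, a SQL table, or a dict-like container whose values are one-dimensional Series that share an index.
- It can also be viewed as a two-dimensional numpy.ndarray with row and column labels.

Creating from a dict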

d = {'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
     'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
df1 = pd.DataFrame(d, index=['d', 'b', 'a']) # index= selects the rows by label and fixes their order
df2 = pd.DataFrame(d, index=['d', 'b', 'a'], columns=['two', 'three']) # columns= selects the columns; 'three' is not a key of d, so it is filled with NaN

out:
    df
   one  two
a  1.0  1.0
b  2.0  2.0
c  3.0  3.0
d  NaN  4.0
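
The effect of the index and columns arguments can be checked directly. The following is a minimal sketch that rebuilds the same dict d and inspects df1 and df2; nothing here goes beyond the calls used above.

```python
import pandas as pd

d = {
    'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
    'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd']),
}

# index= picks rows by label, in the order given
df1 = pd.DataFrame(d, index=['d', 'b', 'a'])

# columns= picks columns; 'three' does not exist in d, so it becomes all NaN
df2 = pd.DataFrame(d, index=['d', 'b', 'a'], columns=['two', 'three'])

print(df1)
print(df2)
```

Series 'one' has no label 'd', so df1 holds NaN in that cell, while 'two' does have a value for 'd'.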

Creating from an ndarray

import numpy as np

df = pd.DataFrame(np.random.randn(2,3)) # the values are random, so your output will differ

out:
    df
          0         1         2
0  0.545203  0.645277 -1.464948
1 -0.529290 -0.537678  1.181774
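
When a DataFrame is built from an ndarray, the rows and columns get default integer labels, as in the output above. A small sketch of supplying labels explicitly follows; the names arr, df_labeled, and the labels themselves are made up for illustration, and the seed is only there to make the run reproducible.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)          # seeded generator (illustrative choice)
arr = rng.standard_normal((2, 3))       # 2 rows, 3 columns

# index= and columns= replace the default 0..n-1 labels
df_labeled = pd.DataFrame(arr, index=['r1', 'r2'], columns=['x', 'y', 'z'])
print(df_labeled)
```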

Creating from a list of dicts

data2 = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
df3 = pd.DataFrame(data2)
# pass index= to give the rows labels instead of 0, 1
df3_labeled = pd.DataFrame(data2, index=['first', 'second'])
df4 = pd.DataFrame.from_dict(dict([('A', [1, 2, 3]), ('B', [4, 5, 6])]))
# orient='index' turns the dict keys into row labels instead of column labels
df5 = pd.DataFrame.from_dict(dict([('A', [1, 2, 3]), ('B', [4, 5, 6])]), orient='index', columns=['one', 'two', 'three'])

out:
    df3
   a   b     c
0  1   2   NaN
1  5  10  20.0
    df4
   A  B
0  1  4
1  2  5
2  3  6
    df5
   one  two  three
A    1    2      3
B    4    5      6

Creating from a dict of tuples

# tuple keys produce a MultiIndex on both the columns (outer keys) and the rows (inner keys)
df6 = pd.DataFrame({('a', 'b'): {('A', 'B'): 1, ('A', 'C'): 2},
                    ('a', 'a'): {('A', 'C'): 3, ('A', 'B'): 4},
                    ('a', 'c'): {('A', 'B'): 5, ('A', 'C'): 6},
                    ('b', 'a'): {('A', 'C'): 7, ('A', 'B'): 8},
                    ('b', 'b'): {('A', 'D'): 9, ('A', 'B'): 10}})

out:
    df6
       a              b      
       b    a    c    a     b
A B  1.0  4.0  5.0  8.0  10.0
  C  2.0  3.0  6.0  7.0   NaN
  D  NaN  NaN  NaN  NaN   9.0
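
Selecting from a frame with multi-level labels works with the same tuples used to build it. The sketch below uses a reduced version of the dict above (three of the five columns) and is meant only as an illustration of how the two levels are addressed.

```python
import pandas as pd

df6 = pd.DataFrame({
    ('a', 'b'): {('A', 'B'): 1, ('A', 'C'): 2},
    ('a', 'a'): {('A', 'C'): 3, ('A', 'B'): 4},
    ('b', 'b'): {('A', 'D'): 9, ('A', 'B'): 10},
})

# An outer-level key returns every column under it, with that level removed
sub = df6['a']

# A full tuple returns one column as a Series, indexed by the row tuples
col = df6[('a', 'b')]

print(sub)
print(col.loc[('A', 'B')])
```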

2.2 Inserting into a DataFrame

The examples below start from df2 as created in 2.1:

out:
    df2
   two three
d  4.0   NaN
b  2.0   NaN
a  1.0   NaN

Inserting a column

data = [1., 2., 3.]
df2.insert(0, 'four', data)   # arguments: (position, column name, column data); modifies df2 in place

out:
    df2
   four  two three
d   1.0  4.0   NaN
b   2.0  2.0   NaN
a   3.0  1.0   NaN
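
Two details of insert() are easy to miss: it changes the DataFrame in place, and it refuses to insert a column name that already exists. A minimal sketch on a throwaway frame (the name frame and the column 'five' are illustrative):

```python
import pandas as pd

frame = pd.DataFrame({'two': [4., 2., 1.]}, index=['d', 'b', 'a'])

# insert() works in place and places the column at the given position
frame.insert(0, 'four', [1., 2., 3.])

# Inserting an existing name raises ValueError (unless allow_duplicates=True)
duplicate_error = None
try:
    frame.insert(0, 'four', [0., 0., 0.])
except ValueError as exc:
    duplicate_error = str(exc)

# Plain assignment adds a new column at the end instead
frame['five'] = frame['four'] + frame['two']
print(frame)
```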

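Inserting a row

Appending with pd.concat() (the replacement for append())

row = {'two':4,'four':5,'three':8}
# DataFrame.append() was removed in pandas 2.0; pd.concat() is its replacement.
# Like append(), it returns a new DataFrame and leaves df2 itself unchanged.
df2_appended = pd.concat([df2, pd.DataFrame([row])], ignore_index=True)
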
out:
    df2
   four  two three
d   1.0  4.0   NaN
b   2.0  2.0   NaN
a   3.0  1.0   NaN

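Adding a row by label with loc

df2.loc[2] = [9, 10, 11]   # label 2 does not exist yet, so a new row labeled 2 is added at the end

out:
    df2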
   four   two three
d   1.0   4.0   NaN
b   2.0   2.0   NaN
a   3.0   1.0   NaN
2   9.0  10.0    11
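
Assigning through loc behaves differently depending on whether the label already exists. The following sketch, on a small illustrative frame, shows both cases:

```python
import pandas as pd

frame = pd.DataFrame({'four': [1., 2.], 'two': [4., 2.]}, index=['d', 'b'])

# New label: a row is added at the end, in place
frame.loc['e'] = [9., 10.]

# Existing label: the row is overwritten, no new row is created
frame.loc['d'] = [0., 0.]

print(frame)
```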

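2.3 Getting data from a DataFrame

Getting column data

df2['two']
df = pd.DataFrame(d) # rebuild the dict-based df from 2.1 (df was reassigned in the ndarray example)
df['three'] = df['one'] * df['two'] # assign the element-wise product of columns 'one' and 'two' to 'three'
df['flag'] = df['one'] > 2 # add a new boolean column

With loc

df2.loc[:, 'two']  # the column labeled 'two'

With iloc

df2.iloc[:, 1]   # the second column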

Getting row data

By slicing

df2[1:3]   # rows at positions 1 and 2 (the end of the slice is exclusive)

out:
    df2[1:3]
   four  two three
b   2.0  2.0   NaN
a   3.0  1.0   NaN

By condition

df2[df2.four > 2]

out:
    df2[df2.four > 2]
   four   two three
a   3.0   1.0   NaN
2   9.0  10.0    11

With loc

df2.loc['a'] # df2.loc[['a', 'b']] selects several rows

out:
    df2.loc['a']
four       3
two        1
three    NaN
Name: a, dtype: object

With iloc

df2.iloc[1] # df2.iloc[1, 1] returns the value in the second row, second column

out:
    df2.iloc[1]
four       2
two        2
three    NaN
Name: b, dtype: object

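The difference between loc and iloc:

loc selects by label: loc['a'] returns the row labeled 'a', loc[:, 'two'] returns the column labeled 'two', and loc['a', 'two'] returns the value at row label 'a' and column label 'two'.

iloc selects by position: iloc[1] returns the second row, iloc[:, 1] returns the second column, and iloc[1, 1] returns the value in the second row, second column.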
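
The distinction matters most when a row label is itself an integer, as with the row labeled 2 added earlier. A minimal sketch, using an illustrative frame with the same labels:

```python
import pandas as pd

frame = pd.DataFrame(
    {'four': [1., 2., 3., 9.], 'two': [4., 2., 1., 10.]},
    index=['d', 'b', 'a', 2],
)

by_label = frame.loc[2]       # the row whose label is 2 (the last row)
by_position = frame.iloc[2]   # the row at position 2 (the third row, labeled 'a')

print(by_label['four'], by_position['four'])

# A list of labels selects several rows at once
several = frame.loc[['a', 'b']]
print(several)
```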

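2.4 Deleting data from a DataFrame

The drop function

df2.drop('a')  # drop the row labeled 'a'; equivalent to df2.drop('a', axis=0)
df2.drop('three', axis=1)   # drop the column labeled 'three'
df2.drop(df2.index[2])   # drop a row by position:
# df2.index is the array of row labels, so df2.index[2] looks up the label at position 2; the call is then the same kind of label-based drop as the first line

The axis parameter of drop takes two values, 0 and 1: 0 drops rows and 1 drops columns. The default is 0.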

The pop function

df2.pop('four')   # removes column 'four' from df2 and returns it as a Series

out:
    df2
    two three
d   4.0   NaN
b   2.0   NaN
a   1.0   NaN
2  10.0    11

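The differences between drop and pop:

drop does not modify the source DataFrame by default; it returns a new object (pass inplace=True to modify the source instead). pop modifies the source DataFrame directly and returns the removed column.

drop removes rows unless axis is specified; pop has no axis parameter and only ever removes a column.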
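
The two behaviors can be compared side by side. This is a small sketch on an illustrative frame; the variable names are not from the examples above.

```python
import pandas as pd

frame = pd.DataFrame({'four': [1., 2.], 'two': [4., 2.]}, index=['d', 'b'])

# drop returns a new DataFrame; frame still has both columns afterwards
dropped = frame.drop(columns='four')
still_there = 'four' in frame.columns

# pop removes the column from frame itself and hands it back as a Series
popped = frame.pop('four')

# drop can also work in place when asked to
frame.drop('d', inplace=True)

print(dropped)
print(popped)
print(frame)
```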

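Conditional deletion

# Conditional deletion does not modify the original DataFrame; it filters the rows to keep and returns a new object
# e.g. remove the rows whose value in column 'two' is greater than 9
df2[df2['two'] <= 9]

out:
    df2[df2['two'] <= 9]
   two three
d  4.0   NaN
b  2.0   NaN
a  1.0   NaN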
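
The same filter can be written from the other direction: describe the rows to delete, then negate the mask or pass their labels to drop. A sketch on an illustrative frame:

```python
import pandas as pd

frame = pd.DataFrame({'two': [4., 2., 1., 10.]}, index=['d', 'b', 'a', 'x'])

mask = frame['two'] > 9                  # True for the rows to delete
kept = frame[~mask]                      # ~ inverts the boolean mask
also_kept = frame.drop(frame[mask].index)  # same result through drop

print(kept)
```

Neither form changes frame; assign the result back (frame = frame[~mask]) to keep it.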

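2.5 DataFrame attributes

Attribute	Description
index	The row labels, as an Index object
columns	The column labels, as an Index object
axes	A list holding the row labels and the column labels, in that order
T	The transposed DataFrame (rows and columns swapped), as with a matrix transpose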
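
A short sketch of these attributes on an illustrative two-by-two frame:

```python
import pandas as pd

frame = pd.DataFrame({'one': [1., 2.], 'two': [3., 4.]}, index=['a', 'b'])

print(frame.index)     # row labels
print(frame.columns)   # column labels
print(frame.axes)      # [row labels, column labels]

transposed = frame.T   # rows become columns and vice versa
print(transposed)
```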

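2.6 DataFrame methods (for inspecting the data)

Method	Description
info()	Prints a summary of the DataFrame: the index range, column names and dtypes, non-null counts, and memory usage
head(i)	Returns the first i rows (5 by default)
tail(i)	Returns the last i rows (5 by default)
describe()	Returns summary statistics for the numeric columns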
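
A minimal sketch of these methods on an illustrative ten-row frame (the column names x and y are made up):

```python
import pandas as pd

frame = pd.DataFrame({'x': range(10), 'y': [v * 0.5 for v in range(10)]})

first = frame.head(3)     # first three rows
last = frame.tail(2)      # last two rows
stats = frame.describe()  # count, mean, std, min, quartiles, max per numeric column

frame.info()              # prints the summary; returns None
print(first)
print(last)
print(stats)
```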
