Pandas Explained, Part 2: The DataFrame Object

Conventions
import pandas as pd
from pandas import DataFrame
import numpy as np

DataFrame

A DataFrame is a tabular data structure with both a row index (stored in index) and a column index (stored in columns).

I. Common DataFrame attributes:

  • There are many ways to create a DataFrame (covered later). The most common is to pass a dictionary whose values are equal-length lists or NumPy arrays:
dict1={"Province":["Guangdong","Beijing","Qinghai","Fujiang"],
      "year":[2018]*4,
      "pop":[1.3,2.5,1.1,0.7]}
df1=DataFrame(dict1)
df1
Output:
    Province  pop  year
0  Guangdong  1.3  2018
1    Beijing  2.5  2018
2    Qinghai  1.1  2018
3    Fujiang  0.7  2018
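The same constructor accepts NumPy arrays as dictionary values. The sketch below assumes a recent pandas (0.23 or later on Python 3.7+), where the columns follow the dictionary's insertion order instead of being sorted alphabetically as in the output above. It also shows that arrays of unequal length are rejected:

```python
import numpy as np
import pandas as pd

# Each value is a NumPy array; all arrays must have the same length.
data = {
    "Province": np.array(["Guangdong", "Beijing", "Qinghai", "Fujiang"]),
    "year": np.full(4, 2018),
    "pop": np.array([1.3, 2.5, 1.1, 0.7]),
}
df = pd.DataFrame(data)

# In recent pandas the column order follows the dict's insertion order.
print(list(df.columns))  # ['Province', 'year', 'pop']
print(df.dtypes)

# Arrays (or lists) of different lengths raise a ValueError.
try:
    pd.DataFrame({"a": [1, 2], "b": [1]})
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
print(mismatch_rejected)  # True
```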
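  • As with Series, you can specify the column order (and the row index) when creating the DataFrame; a requested column that is not in the dictionary is filled with NaN: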
df2=DataFrame(dict1,columns=['year','Province','pop','debt'],index=['one','two','three','four'])
df2
Output:
       year   Province  pop  debt
one    2018  Guangdong  1.3   NaN
two    2018    Beijing  2.5   NaN
three  2018    Qinghai  1.1   NaN
four   2018    Fujiang  0.7   NaN
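The columns argument works both ways: a label that is missing from the dictionary becomes an all-NaN column, and a dictionary key that is not listed is simply left out. A small sketch reusing dict1 from above (df_sub is a hypothetical name for illustration):

```python
import pandas as pd

dict1 = {"Province": ["Guangdong", "Beijing", "Qinghai", "Fujiang"],
         "year": [2018] * 4,
         "pop": [1.3, 2.5, 1.1, 0.7]}

df2 = pd.DataFrame(dict1,
                   columns=["year", "Province", "pop", "debt"],
                   index=["one", "two", "three", "four"])

# 'debt' is not a key of dict1, so every value in that column is NaN.
print(df2["debt"].isna().all())  # True

# Keys of dict1 that are not listed in `columns` are dropped.
df_sub = pd.DataFrame(dict1, columns=["Province"])
print(list(df_sub.columns))  # ['Province']
```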
  • As with Series, a DataFrame's index and columns each have a name attribute:
df2
Output:
       year   Province  pop  debt
one    2018  Guangdong  1.3   NaN
two    2018    Beijing  2.5   NaN
three  2018    Qinghai  1.1   NaN
four   2018    Fujiang  0.7   NaN
df2.index.name='English'
df2.columns.name='Province'
df2
Output:
Province  year   Province  pop  debt
English
one       2018  Guangdong  1.3   NaN
two       2018    Beijing  2.5   NaN
three     2018    Qinghai  1.1   NaN
four      2018    Fujiang  0.7   NaN
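Assigning to index.name and columns.name modifies df2 in place. If you would rather leave the original untouched, rename_axis() returns a new DataFrame with the axis names set. This is a sketch assuming pandas 0.24 or later, where rename_axis accepts index= and columns= keywords; the small frame is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"year": [2018, 2018], "pop": [1.3, 2.5]},
                  index=["one", "two"])

# rename_axis returns a copy with the axis names set; df itself is unchanged.
named = df.rename_axis(index="English", columns="Province")

print(named.index.name, named.columns.name)  # English Province
print(df.index.name)                         # None
```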
  • The shape attribute gives the number of rows and columns:
df2.shape
Output:
(4, 4)
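Because shape is a plain (rows, columns) tuple, it can be unpacked directly. The sketch below also shows the related len(), size and ndim attributes on a small, made-up frame:

```python
import pandas as pd

df = pd.DataFrame({"year": [2018] * 4, "pop": [1.3, 2.5, 1.1, 0.7]})

n_rows, n_cols = df.shape  # unpack the (rows, columns) tuple
print(n_rows, n_cols)      # 4 2
print(len(df))             # number of rows: 4
print(df.size)             # total number of cells: 8
print(df.ndim)             # always 2 for a DataFrame
```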
  • The values attribute returns the DataFrame's data as a 2-D ndarray:
df2.values
Output:
array([[2018, 'Guangdong', 1.3, nan],
       [2018, 'Beijing', 2.5, nan],
       [2018, 'Qinghai', 1.1, nan],
       [2018, 'Fujiang', 0.7, nan]], dtype=object)
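Since pandas 0.24, the documentation recommends to_numpy() over values. The array's dtype is the common type of all columns, which explains the dtype=object above: as soon as a string column is involved, everything is stored as Python objects. A sketch with made-up data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"year": [2018, 2018],
                   "Province": ["Guangdong", "Beijing"],
                   "pop": [1.3, 2.5]})

mixed = df.to_numpy()                     # strings present -> dtype=object
numeric = df[["year", "pop"]].to_numpy()  # int + float -> common dtype float64

print(mixed.dtype, mixed.shape)  # object (2, 3)
print(numeric.dtype)             # float64
```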
  • Column labels are also exposed as attributes of the DataFrame (provided the label is a valid Python identifier):
df2.Province
Output:
English
one      Guangdong
two        Beijing
three      Qinghai
four       Fujiang
Name: Province, dtype: object
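Attribute access stops working when a label clashes with an existing DataFrame attribute or method, and the pop column in this very example is such a case: df2.pop is the DataFrame.pop() method, not the column. Labels containing spaces cannot be used as attributes at all. A small sketch (the "my col" column is hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"Province": ["Guangdong", "Beijing"],
                   "pop": [1.3, 2.5],
                   "my col": [1, 2]})

print(type(df.Province).__name__)  # Series: attribute access works here
print(callable(df.pop))            # True: this is the DataFrame.pop method
print(df["pop"].tolist())          # [1.3, 2.5]: bracket access always works
print(df["my col"].tolist())       # [1, 2]: labels with spaces need brackets
```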

II. Common ways to access, assign, and delete data in a DataFrame:

  • DataFrame_object[ ] selects columns by label: a single label returns a Series, and a list of labels returns a DataFrame:
df2['Province']
Output:
English
one      Guangdong
two        Beijing
three      Qinghai
four       Fujiang
Name: Province, dtype: object
df2[['Province','pop']]
Output:
Province   Province  pop
English
one       Guangdong  1.3
two         Beijing  2.5
three       Qinghai  1.1
four        Fujiang  0.7
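What decides the return type is whether you pass a list, not how many labels it holds: a list with a single label still yields a one-column DataFrame. A quick check on a made-up frame:

```python
import pandas as pd

df = pd.DataFrame({"Province": ["Guangdong", "Beijing"], "pop": [1.3, 2.5]})

s = df["Province"]      # single label -> Series
sub = df[["Province"]]  # list with one label -> still a DataFrame

print(type(s).__name__, type(sub).__name__)  # Series DataFrame
print(sub.shape)                             # (2, 1)
```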
  • DataFrame_object.loc[ ] selects rows by index label:
df2.loc['one']
Output:
Province
year             2018
Province    Guangdong
pop               1.3
debt              NaN
Name: one, dtype: object
df2.loc['one':'three']
Output:
Province  year   Province  pop  debt
English
one       2018  Guangdong  1.3   NaN
two       2018    Beijing  2.5   NaN
three     2018    Qinghai  1.1   NaN
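Note that 'three' appears in the result: slicing with loc by label includes the end label, whereas positional slicing with iloc follows the usual Python rule and excludes it. A sketch on a made-up frame:

```python
import pandas as pd

df = pd.DataFrame({"pop": [1.3, 2.5, 1.1, 0.7]},
                  index=["one", "two", "three", "four"])

by_label = df.loc["one":"three"]  # label slice: end label INCLUDED -> 3 rows
by_pos = df.iloc[0:2]             # position slice: end EXCLUDED -> 2 rows

print(by_label.index.tolist())  # ['one', 'two', 'three']
print(by_pos.index.tolist())    # ['one', 'two']
```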
  • loc can also fetch a single value:
df2.loc['one','Province']
Output:
'Guangdong'
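For reading or writing exactly one cell, pandas also provides the scalar-only accessors at (by label) and iat (by position). A minimal sketch on a made-up frame:

```python
import pandas as pd

df = pd.DataFrame({"Province": ["Guangdong", "Beijing"], "pop": [1.3, 2.5]},
                  index=["one", "two"])

print(df.loc["one", "Province"])  # Guangdong
print(df.at["one", "Province"])   # same cell via the label-based scalar accessor
print(df.iat[1, 1])               # by position: row 1, column 1 -> 2.5

df.at["two", "pop"] = 2.6         # .at can also set a single cell
```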
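  • A column can be modified by assignment, using either a single value (applied to every row) or a sequence of values whose length matches the number of rows: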
df2["debt"]=np.arange(2,3,0.25)
df2
Output:
Province  year   Province  pop  debt
English
one       2018  Guangdong  1.3  2.00
two       2018    Beijing  2.5  2.25
three     2018    Qinghai  1.1  2.50
four      2018    Fujiang  0.7  2.75
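Lists and arrays are assigned by position, so their length must equal the number of rows. A Series behaves differently: it is aligned on the index, and rows whose labels it lacks receive NaN. A sketch (the debt2 and bad columns are hypothetical):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"pop": [1.3, 2.5, 1.1, 0.7]},
                  index=["one", "two", "three", "four"])

df["year"] = 2018                   # scalar: broadcast to every row
df["debt"] = np.arange(2, 3, 0.25)  # array: length must equal len(df)

# A Series is aligned on the index; labels it lacks become NaN.
df["debt2"] = pd.Series([-1.2, -1.5], index=["two", "four"])

try:
    df["bad"] = [1, 2, 3]           # wrong length -> ValueError
except ValueError as exc:
    print("ValueError:", exc)
```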
  • Assigning to a column that does not exist creates a new column; del removes a column:
df2['eastern']=df2.Province=='Guangdong'
df2
Output:
Province  year   Province  pop  debt  eastern
English
one       2018  Guangdong  1.3  2.00     True
two       2018    Beijing  2.5  2.25    False
three     2018    Qinghai  1.1  2.50    False
four      2018    Fujiang  0.7  2.75    False
del df2['eastern']
df2.columns
Output:
Index(['year', 'Province', 'pop', 'debt'], dtype='object', name='Province')
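Besides del, two other standard ways to remove a column are drop(), which returns a new DataFrame and leaves the original alone, and DataFrame.pop(), which removes the column in place and hands it back as a Series. A sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"Province": ["Guangdong", "Beijing"],
                   "pop": [1.3, 2.5],
                   "debt": [2.0, 2.25]})
df["eastern"] = df["Province"] == "Guangdong"

# drop() returns a new DataFrame; df still has 'eastern' afterwards.
trimmed = df.drop(columns=["eastern"])

# pop() removes the column from df in place and returns it.
eastern = df.pop("eastern")

print(trimmed.columns.tolist())  # ['Province', 'pop', 'debt']
print(df.columns.tolist())       # ['Province', 'pop', 'debt']
print(eastern.tolist())          # [True, False]
```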
  • You can also transpose the DataFrame:
df2.T
Output:
English         one      two    three     four
Province
year           2018     2018     2018     2018
Province  Guangdong  Beijing  Qinghai  Fujiang
pop             1.3      2.5      1.1      0.7
debt              2     2.25      2.5     2.75
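Transposing swaps rows and columns. When the original columns have different types, each new column mixes ints, strings, and floats, so it ends up with the object dtype. A sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"year": [2018, 2018],
                   "Province": ["Guangdong", "Beijing"],
                   "pop": [1.3, 2.5]},
                  index=["one", "two"])

t = df.T  # same as df.transpose()

print(t.shape)              # (3, 2): rows and columns swapped
print(t.loc["pop", "two"])  # 2.5
print(t.dtypes.tolist())    # mixed types collapse to object in every column
```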

III. Different ways to create a DataFrame

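  • Calling DataFrame() converts data in many formats into a DataFrame object. Its three main parameters, data, index, and columns, supply the data, the row index, and the column index respectively. data can be: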

1) A 2-D array

df3=pd.DataFrame(np.random.randint(0,10,(4,4)),index=[1,2,3,4],columns=['A','B','C','D'])
df3
Output:
   A  B  C  D
1  9  8  4  6
2  5  7  7  4
3  6  3  0  2
4  4  6  9  8
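Because the array is random, the output above changes on every run. The sketch below assumes NumPy 1.17+ for np.random.default_rng and seeds the generator for reproducibility. It also omits index and columns for a second frame to show that pandas then falls back to default integer labels 0..n-1:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)     # reproducible random numbers
arr = rng.integers(0, 10, size=(4, 4))  # 4x4 ints in [0, 10)

df3 = pd.DataFrame(arr, index=[1, 2, 3, 4], columns=list("ABCD"))
plain = pd.DataFrame(arr)               # no labels given

print(df3.shape)               # (4, 4)
print(plain.columns.tolist())  # [0, 1, 2, 3]
print(plain.index.tolist())    # [0, 1, 2, 3]
```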

2) A dictionary

The row index is determined by index, and the column labels by the dictionary's keys.

dict1
Output:
{'Province': ['Guangdong', 'Beijing', 'Qinghai', 'Fujiang'],
 'pop': [1.3, 2.5, 1.1, 0.7],
 'year': [2018, 2018, 2018, 2018]}
df4=pd.DataFrame(dict1,index=[1,2,3,4])
df4
Output:
    Province  pop  year
1  Guangdong  1.3  2018
2    Beijing  2.5  2018
3    Qinghai  1.1  2018
4    Fujiang  0.7  2018
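The dictionary values can also be Series. In that case pandas builds the row index from the union of the Series' indexes and fills any gaps with NaN. A sketch with made-up labels and values:

```python
import pandas as pd

data = {
    "pop": pd.Series([1.3, 2.5], index=["Guangdong", "Beijing"]),
    "year": pd.Series([2018], index=["Guangdong"]),
}
df = pd.DataFrame(data)

print(df.index.tolist())          # union of the two Series' indexes
print(df.loc["Beijing", "year"])  # NaN: Beijing is missing from the 'year' Series
```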

3) A structured array

The column labels are taken from the structured array's field names.

arr=np.array([('item1',10),('item2',20),('item3',30),('item4',40)],dtype=[("name","S10"),("count",int)])
df5=pd.DataFrame(arr)
df5
Output:
       name  count
0  b'item1'     10
1  b'item2'     20
2  b'item3'     30
3  b'item4'     40
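The b'...' prefix appears because an "S" field stores raw bytes. Two standard ways to get ordinary strings are shown below. The first decodes the column afterwards with Series.str.decode (assuming the bytes are UTF-8). The second declares a Unicode "U10" field from the start:

```python
import numpy as np
import pandas as pd

# "S10" fields hold bytes, hence the b'...' values.
arr = np.array([("item1", 10), ("item2", 20)],
               dtype=[("name", "S10"), ("count", int)])
df5 = pd.DataFrame(arr)
df5["name"] = df5["name"].str.decode("utf-8")  # bytes -> str

# Alternatively, use a Unicode field ("U10") from the start.
arr_u = np.array([("item1", 10), ("item2", 20)],
                 dtype=[("name", "U10"), ("count", int)])
df_u = pd.DataFrame(arr_u)

print(df5["name"].tolist())   # ['item1', 'item2']
print(df_u["name"].tolist())  # ['item1', 'item2']
```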
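  • In addition, the class methods whose names start with from_ convert specific kinds of data into a DataFrame. For example, from_dict() has an orient parameter that specifies whether the dictionary keys become the columns or the row index; the default is "columns":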
dict2={"a":[1,2,3],"b":[4,5,6]}
df6=pd.DataFrame.from_dict(dict2)
df6
Output:
   a  b
0  1  4
1  2  5
2  3  6
df7=pd.DataFrame.from_dict(dict2,orient="index")
df7
Output:
   0  1  2
a  1  2  3
b  4  5  6
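With orient="index" the columns are numbered 0, 1, 2 by default. Assuming pandas 0.23 or later, from_dict also accepts a columns argument that names them. That argument is only allowed together with orient="index", and pandas raises a ValueError otherwise:

```python
import pandas as pd

dict2 = {"a": [1, 2, 3], "b": [4, 5, 6]}

# orient="index": keys become row labels; `columns` supplies the column labels.
df8 = pd.DataFrame.from_dict(dict2, orient="index", columns=["x", "y", "z"])
print(df8)

# Passing `columns` with the default orient="columns" is rejected.
try:
    pd.DataFrame.from_dict(dict2, columns=["x", "y"])
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```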

IV. Converting a DataFrame to other formats

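  • The to_dict() method converts a DataFrame to a dictionary. Its orient parameter determines the layout of the result; the default, "dict", maps each column label to a {row label: value} dictionary: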
df7.to_dict()
Output:
{0: {'a': 1, 'b': 4}, 1: {'a': 2, 'b': 5}, 2: {'a': 3, 'b': 6}}
df7.to_dict(orient="records")
Output:
[{0: 1, 1: 2, 2: 3}, {0: 4, 1: 5, 2: 6}]
df7.to_dict(orient="list")
Output:
{0: [1, 4], 1: [2, 5], 2: [3, 6]}
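Two other orient values that pandas supports are "index", which is keyed by row label, and "split", which returns the row labels, column labels, and data separately. A sketch rebuilding df7 from above:

```python
import pandas as pd

df7 = pd.DataFrame.from_dict({"a": [1, 2, 3], "b": [4, 5, 6]}, orient="index")

by_row = df7.to_dict(orient="index")  # {row label: {column label: value}}
split = df7.to_dict(orient="split")   # {'index': ..., 'columns': ..., 'data': ...}

print(by_row)
print(split)
```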
  • Similar methods include to_records(), to_csv(), and others.
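As a brief illustration of those two: to_records() produces a NumPy record array whose field names are the column labels. When to_csv() is called without a path, it returns the CSV text as a string instead of writing a file. A sketch using the df6 data from above:

```python
import pandas as pd

df6 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

rec = df6.to_records(index=False)   # NumPy record array; fields = column labels
csv_text = df6.to_csv(index=False)  # no path given -> CSV returned as a string

print(rec.dtype.names)        # ('a', 'b')
print(csv_text.splitlines())  # ['a,b', '1,4', '2,5', '3,6']
```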

Thanks for reading.
I hope my work helps you.
Let's keep learning together!
