Python pandas: Beyond the Basics

Original

2020-04-26 23:37

This article continues the introductory Python pandas tutorial. It covers filtering, sorting, grouping with aggregation, and selecting subsets.

Filtering

import pandas as pd
data = {
    'apples': [3, 2, 0, 1],
    'oranges': [0, 3, 7, 2]
}
# Build the DataFrame with the customer names as the row index.
purchases = pd.DataFrame(data, index=['June', 'Robert', 'Lily', 'David'])
# Keep only the rows where at least 2 apples were bought.
fd = purchases[purchases['apples'] >= 2]
print(fd)

Result:

        apples  oranges
June         3        0
Robert       2        3
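The filter above works because the comparison produces a boolean Series that is used as a row mask. As a minimal sketch (the second condition on oranges is an illustrative assumption, not part of the original example), several conditions can be combined with & and |, or written as a string for DataFrame.query:

```python
import pandas as pd

# Same sample data as in the article.
purchases = pd.DataFrame(
    {"apples": [3, 2, 0, 1], "oranges": [0, 3, 7, 2]},
    index=["June", "Robert", "Lily", "David"],
)

# A comparison on a column yields a boolean Series aligned with the index.
mask = purchases["apples"] >= 2

# Combine conditions with & (and) or | (or); each condition needs its own
# parentheses because & and | bind tighter than comparisons.
both = purchases[(purchases["apples"] >= 2) & (purchases["oranges"] > 0)]

# DataFrame.query expresses the same filter as a string expression.
same = purchases.query("apples >= 2 and oranges > 0")

print(mask.tolist())
print(both)
```

Only Robert satisfies both conditions here, so both and same each contain a single row.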

More complex filtering: custom functions and lambda

import pandas as pd
data = {
    'apples': [3, 2, 0, 1],
    'oranges': [0, 3, 7, 2]
}
def add(x, y):
    # Element-wise sum of two columns (Series).
    return x + y

purchases = pd.DataFrame(data, index=['June', 'Robert', 'Lily', 'David'])

# Filter with a custom function: rows whose total fruit count is at least 5.
fd2 = purchases[add(purchases['apples'], purchases['oranges']) >= 5]
# Filter with a lambda: the callable receives the DataFrame and returns a boolean mask.
df3 = purchases[lambda x: x['apples'] + x['oranges'] < 5]
print(fd2)

Output:

        apples  oranges
Robert       2        3
Lily         0        7

df3
Out[150]:
       apples  oranges
June        3        0
David       1        2
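The callable form is most useful inside a method chain, where the intermediate DataFrame has no name yet. The following is a sketch under that assumption; the helper name total and the extra total column are hypothetical and do not appear in the original example:

```python
import pandas as pd

purchases = pd.DataFrame(
    {"apples": [3, 2, 0, 1], "oranges": [0, 3, 7, 2]},
    index=["June", "Robert", "Lily", "David"],
)

def total(frame):
    """Hypothetical helper: total fruit bought per person."""
    return frame["apples"] + frame["oranges"]

# Explicit mask: compute the totals first, then filter.
heavy = purchases[total(purchases) >= 5]

# Callable in a chain: assign() and .loc[] both accept a callable that
# receives the DataFrame as it exists at that step of the chain.
light = (
    purchases
    .assign(total=total)
    .loc[lambda d: d["total"] < 5]
)

print(heavy)
print(light)
```

heavy holds Robert and Lily, while light holds June and David together with the computed total column.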

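Sorting

Continuing with the example data above (df below refers to the purchases DataFrame):

Sort by the number of apples; the default order is ascending.

df = purchases  # the DataFrame built in the filtering example
df.sort_values(by=['apples'], inplace=True)

Output: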

        apples oranges
Lily         0        7
David        1        2
Robert       2        3
June         3        0

How do you sort in descending order?

df.sort_values(by=['apples'], inplace=True, ascending=False)

Output:

        apples oranges
June         3        0
Robert       2        3
David        1        2
Lily         0        7

Sorting can also use multiple columns:

df.sort_values(by=['apples','oranges'], inplace=True)

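Rows are sorted by apples first, and ties are broken by oranges. The sample data is small and every apples value is distinct, so the result is the same as the ascending sort by apples shown earlier.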
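Since the article's data has no ties, the effect of the second sort key is not visible. The sketch below uses hypothetical data (the names and counts are invented for illustration) in which apples repeats, and also shows a per-column ascending list:

```python
import pandas as pd

# Hypothetical data with ties in 'apples' so the second key matters.
df = pd.DataFrame(
    {"apples": [2, 0, 2, 0], "oranges": [5, 7, 1, 3]},
    index=["Ann", "Bob", "Cid", "Dee"],
)

# Ascending on both keys: ties in 'apples' are ordered by 'oranges'.
by_both = df.sort_values(by=["apples", "oranges"])

# One direction per key: 'apples' descending, 'oranges' ascending.
mixed = df.sort_values(by=["apples", "oranges"], ascending=[False, True])

print(by_both)
print(mixed)
```

Without inplace=True, sort_values returns a new DataFrame and leaves df unchanged, which is usually easier to reason about than sorting in place.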

Grouping and aggregation

Reference: https://www.shanelynn.ie/summarising-aggregation-and-grouping-data-in-python-pandas/

# Group the data frame by month and item and extract a number of stats from each group
data.groupby(
    ['month', 'item']
).agg(
    {
        # Find the min, max, and sum of the duration column
        'duration': [min, max, sum],
        # find the number of network type entries
        'network_type': "count",
        # minimum, first, and number of unique dates
        'date': [min, 'first', 'nunique']
    }
)

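The reference illustrates the result with a figure, but when I entered the code as given, my output did not match it.

The following form of the code, however, is one I have tested.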

data[data['item'] == 'call'].groupby('month').agg(
    # Get max of the duration column for each group
    max_duration=('duration', max),
    # Get min of the duration column for each group
    min_duration=('duration', min),
    # Get sum of the duration column for each group
    total_duration=('duration', sum),
    # Apply a lambda to date column
    num_days=("date", lambda x: (max(x) - min(x)).days)
)
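The snippet above depends on the reference's dataset, which is not included here. As a self-contained sketch, the same named-aggregation pattern can be tried on a small made-up call log; the DataFrame name log and all of its values are assumptions for illustration, and the aggregations are passed as strings rather than built-in functions:

```python
import pandas as pd

# Hypothetical call log standing in for the reference's dataset.
log = pd.DataFrame({
    "month": ["2020-01", "2020-01", "2020-01", "2020-02", "2020-02"],
    "item": ["call", "sms", "call", "call", "call"],
    "duration": [30, 1, 50, 20, 40],
    "date": pd.to_datetime([
        "2020-01-03", "2020-01-05", "2020-01-10", "2020-02-01", "2020-02-15",
    ]),
})

# Keep only calls, then compute one named output column per aggregation.
calls = log[log["item"] == "call"]
summary = calls.groupby("month").agg(
    max_duration=("duration", "max"),
    min_duration=("duration", "min"),
    total_duration=("duration", "sum"),
    # Days between the first and last call in each month.
    num_days=("date", lambda s: (s.max() - s.min()).days),
)

print(summary)
```

Each keyword becomes a flat column name in the result, which avoids the two-level column header that the dictionary form of agg produces.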

A practical example:

import pandas as pd

ipl_data = {
    'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings', 'kings',
             'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
    'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
    'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
    'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]
}
df = pd.DataFrame(ipl_data)

# Split the rows into one group per distinct Year.
grouped = df.groupby('Year')

# Iterating over a GroupBy yields (group key, sub-DataFrame) pairs.
for name, group in grouped:
    print(name)
    print(group)

Output:

2014
     Team  Rank  Year  Points
0  Riders     1  2014     876
2  Devils     2  2014     863
4   Kings     3  2014     741
9  Royals     4  2014     701
2015
      Team  Rank  Year  Points
1   Riders     2  2015     789
3   Devils     3  2015     673
5    kings     4  2015     812
10  Royals     1  2015     804
2016
     Team  Rank  Year  Points
6   Kings     1  2016     756
8  Riders     2  2016     694
2017
      Team  Rank  Year  Points
7    Kings     1  2017     788
11  Riders     2  2017     690
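Iterating over the groups is mostly useful for inspection. A short sketch of what is typically done next with the same data: pulling out a single group, counting rows per group, and computing one statistic per group. It also groups by Team to show that the lowercase 'kings' entry in the sample data forms its own group:

```python
import pandas as pd

ipl_data = {
    "Team": ["Riders", "Riders", "Devils", "Devils", "Kings", "kings",
             "Kings", "Kings", "Riders", "Royals", "Royals", "Riders"],
    "Rank": [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
    "Year": [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
    "Points": [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690],
}
df = pd.DataFrame(ipl_data)
grouped = df.groupby("Year")

# Fetch one group by its key instead of looping over all of them.
g2014 = grouped.get_group(2014)

# Number of rows in each group.
sizes = grouped.size()

# One statistic per group: average Points for each Year.
mean_points = grouped["Points"].mean()

# Group keys are case-sensitive, so 'Kings' and 'kings' are separate groups.
team_sizes = df.groupby("Team").size()

print(sizes)
print(mean_points)
print(team_sizes)
```

If 'kings' is a typo rather than a distinct team, normalising the column first (for example with df["Team"].str.title()) would merge the two groups.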

Subsets

Figure source: https://www.shanelynn.ie/select-pandas-dataframe-rows-and-columns-using-iloc-loc-and-ix/

The operations below require running the groupby example above first, because they use its df.

df.iloc[[0,3,6], [0,2,3]]
Out[36]:
     Team  Year  Points
0  Riders  2014     876
3  Devils  2015     673
6   Kings  2016     756

df.iloc[:, [0,2,3]]
Out[37]:
      Team  Year  Points
0   Riders  2014     876
1   Riders  2015     789
2   Devils  2014     863
3   Devils  2015     673
4    Kings  2014     741
5    kings  2015     812
6    Kings  2016     756
7    Kings  2017     788
8   Riders  2016     694
9   Royals  2014     701
10  Royals  2015     804
11  Riders  2017     690

df.iloc[:, 0:2]
Out[38]:
      Team  Rank
0   Riders     1
1   Riders     2
2   Devils     2
3   Devils     3
4    Kings     3
5    kings     4
6    Kings     1
7    Kings     1
8   Riders     2
9   Royals     4
10  Royals     1
11  Riders     2

df.loc[:5,['Team','Rank','Year']]
Out[49]:
     Team  Rank  Year
0  Riders     1  2014
1  Riders     2  2015
2  Devils     2  2014
3  Devils     3  2015
4   Kings     3  2014
5   kings     4  2015

df.loc[:5]
Out[50]:
     Team  Rank  Year  Points
0  Riders     1  2014     876
1  Riders     2  2015     789
2  Devils     2  2014     863
3  Devils     3  2015     673
4   Kings     3  2014     741
5   kings     4  2015     812
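Note that df.loc[:5] above returns six rows (labels 0 through 5), whereas an iloc slice excludes its end position. A minimal sketch of the difference, using the first four rows of the sample data; the re-sorted frame named shuffled is an illustrative assumption that makes labels and positions diverge:

```python
import pandas as pd

df = pd.DataFrame({
    "Team": ["Riders", "Riders", "Devils", "Devils"],
    "Points": [876, 789, 863, 673],
})

# loc slices by label and INCLUDES the end label: rows 0, 1, 2.
by_label = df.loc[:2]

# iloc slices by position and EXCLUDES the end position: rows 0, 1.
by_pos = df.iloc[:2]

# After sorting, the index labels are no longer in positional order.
shuffled = df.sort_values("Points")

print(shuffled.iloc[:2])  # the first two rows as displayed
print(shuffled.loc[:2])   # every row up to and including label 2
```

With the default integer index the two look similar, so the inclusive end of loc is easy to overlook until the index is reordered or replaced with names.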

Miscellaneous

Printing output without the index

Answer link: https://stackoverflow.com/questions/24644656/how-to-print-pandas-dataframe-without-index

The answer is:

print(tmp.loc[:,['dateRep','cases','deaths']].to_string(index=False))

or

print(tmp.to_string(index=False))
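The variable tmp in the lines above is the author's own DataFrame and is not defined in this article. A runnable sketch with a hypothetical stand-in (the column names follow the snippet, the values are invented):

```python
import pandas as pd

# Hypothetical stand-in for tmp; the values are made up for illustration.
tmp = pd.DataFrame({
    "dateRep": ["2020-04-25", "2020-04-26"],
    "cases": [10, 12],
    "deaths": [1, 0],
})

# to_string returns the table as text; index=False omits the row labels.
text = tmp.to_string(index=False)
print(text)

# Selecting columns first limits what is printed.
print(tmp.loc[:, ["dateRep", "cases"]].to_string(index=False))
```

to_string only affects the printed text; the DataFrame itself keeps its index.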

Getting a pandas DataFrame's row index values as a list

Answer link: https://www.codenong.com/18358938/

Answer:

df.index.values.tolist()
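As a quick check of the expression above, here is a sketch using the purchases data from the filtering section; it also shows the shorter Index.tolist() form, which gives the same list:

```python
import pandas as pd

purchases = pd.DataFrame(
    {"apples": [3, 2, 0, 1], "oranges": [0, 3, 7, 2]},
    index=["June", "Robert", "Lily", "David"],
)

# The form from the linked answer: go through the underlying NumPy array.
via_values = purchases.index.values.tolist()

# Index objects also provide tolist() directly.
labels = purchases.index.tolist()

print(labels)
```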
