pandas學習筆記(第二彈)

注:本教程爲系列教程此章節接前面第一彈
跳轉到專題地址

本文嚮導

9.選取多個DataFrame

9.1 用列表選取多列

# 選取不同演員的FaceBook的粉絲數
movies[["actor_1_facebook_likes","actor_2_facebook_likes","actor_3_facebook_likes"]]

# 錯誤示例:
# movies["actor_1_facebook_likes","actor_2_facebook_likes","actor_3_facebook_likes"]
actor_1_facebook_likes actor_2_facebook_likes actor_3_facebook_likes
0 1000.0 936.0 855.0
1 40000.0 5000.0 1000.0
2 11000.0 393.0 161.0
3 27000.0 23000.0 23000.0
... ... ... ...
4912 841.0 593.0 319.0
4913 0.0 0.0 0.0
4914 946.0 719.0 489.0
4915 86.0 23.0 16.0

4916 rows × 3 columns

9.2 選取單列

movies[["actor_1_facebook_likes"]]
actor_1_facebook_likes
0 1000.0
1 40000.0
2 11000.0
3 27000.0
... ...
4912 841.0
4913 0.0
4914 946.0
4915 86.0

4916 rows × 1 columns

!!!注意在這裏要區別下面的這種取法!!!

print(type(movies["actor_1_facebook_likes"]))
print(type(movies[["actor_1_facebook_likes"]]))
# 一種取出的是DataFrame 一種取出的是Series
<class 'pandas.core.series.Series'>
<class 'pandas.core.frame.DataFrame'>

9.3 通過類型選擇DataFrame

# 這樣就會篩選出字段的類型爲整形和浮點型的列(“number”表示選取所有數值列)
movies.select_dtypes(include=["int","float"])
num_critic_for_reviews duration director_facebook_likes actor_3_facebook_likes ... actor_2_facebook_likes imdb_score aspect_ratio movie_facebook_likes
0 723.0 178.0 0.0 855.0 ... 936.0 7.9 1.78 33000
1 302.0 169.0 563.0 1000.0 ... 5000.0 7.1 2.35 0
2 602.0 148.0 0.0 161.0 ... 393.0 6.8 2.35 85000
3 813.0 164.0 22000.0 23000.0 ... 23000.0 8.5 2.35 164000
... ... ... ... ... ... ... ... ... ...
4912 43.0 43.0 NaN 319.0 ... 593.0 7.5 16.00 32000
4913 13.0 76.0 0.0 0.0 ... 0.0 6.3 NaN 16
4914 14.0 100.0 0.0 489.0 ... 719.0 6.3 2.35 660
4915 43.0 90.0 16.0 16.0 ... 23.0 6.6 1.85 456

4916 rows × 16 columns

9.4 通過過濾器選取DataFrame

9.4.1 like 過濾法

movies.filter(like="actor_1")
actor_1_facebook_likes actor_1_name
0 1000.0 CCH Pounder
1 40000.0 Johnny Depp
2 11000.0 Christoph Waltz
3 27000.0 Tom Hardy
... ... ...
4912 841.0 Natalie Zea
4913 0.0 Eva Boehnke
4914 946.0 Alan Ruck
4915 86.0 John August

4916 rows × 2 columns

9.4.2 items 選取

movies.filter(items=["actor_1_name","actor_1_facebook_likes"])
actor_1_name actor_1_facebook_likes
0 CCH Pounder 1000.0
1 Johnny Depp 40000.0
2 Christoph Waltz 11000.0
3 Tom Hardy 27000.0
... ... ...
4912 Natalie Zea 841.0
4913 Eva Boehnke 0.0
4914 Alan Ruck 946.0
4915 John August 86.0

4916 rows × 2 columns

9.4.3 正則選取

# 選取不是以actor開頭的字段
movies.filter(regex="^(?!actor)")
color director_name num_critic_for_reviews duration ... title_year imdb_score aspect_ratio movie_facebook_likes
0 Color James Cameron 723.0 178.0 ... 2009.0 7.9 1.78 33000
1 Color Gore Verbinski 302.0 169.0 ... 2007.0 7.1 2.35 0
2 Color Sam Mendes 602.0 148.0 ... 2015.0 6.8 2.35 85000
3 Color Christopher Nolan 813.0 164.0 ... 2012.0 8.5 2.35 164000
... ... ... ... ... ... ... ... ... ...
4912 Color NaN 43.0 43.0 ... NaN 7.5 16.00 32000
4913 Color Benjamin Roberds 13.0 76.0 ... 2013.0 6.3 NaN 16
4914 Color Daniel Hsia 14.0 100.0 ... 2012.0 6.3 2.35 660
4915 Color Jon Gunn 43.0 90.0 ... 2004.0 6.6 1.85 456

4916 rows × 22 columns

10.按照給定的列順序排序

disc_core = ['movie_title','title_year', 'content_rating','genres']
disc_people = ['director_name','actor_1_name', 'actor_2_name','actor_3_name']
disc_other = ['color','country','language','plot_keywords','movie_imdb_link']
cont_fb = ['director_facebook_likes','actor_1_facebook_likes','actor_2_facebook_likes','actor_3_facebook_likes', 'cast_total_facebook_likes', 'movie_facebook_likes']
cont_finance = ['budget','gross']
cont_num_reviews = ['num_voted_users','num_user_for_reviews', 'num_critic_for_reviews']
cont_other = ['imdb_score','duration', 'aspect_ratio','facenumber_in_poster']

new_col_order = disc_core + disc_people + disc_other + cont_fb + cont_finance + cont_num_reviews + cont_other

# 我們把所有的列名按照自己的順序進行排列,最後查看和原來的所有列是否一致
print(set(new_col_order) == set(movies.columns))

# 可以看到列顯示的順序已經按照我們的要求進行顯示了
movies[new_col_order]
True
movie_title title_year content_rating genres ... imdb_score duration aspect_ratio facenumber_in_poster
0 Avatar 2009.0 PG-13 Action|Adventure|Fantasy|Sci-Fi ... 7.9 178.0 1.78 0.0
1 Pirates of the Caribbean: At World's End 2007.0 PG-13 Action|Adventure|Fantasy ... 7.1 169.0 2.35 0.0
2 Spectre 2015.0 PG-13 Action|Adventure|Thriller ... 6.8 148.0 2.35 1.0
3 The Dark Knight Rises 2012.0 PG-13 Action|Thriller ... 8.5 164.0 2.35 0.0
... ... ... ... ... ... ... ... ... ...
4912 The Following NaN TV-14 Crime|Drama|Mystery|Thriller ... 7.5 43.0 16.00 1.0
4913 A Plague So Pleasant 2013.0 NaN Drama|Horror|Thriller ... 6.3 76.0 NaN 0.0
4914 Shanghai Calling 2012.0 PG-13 Comedy|Drama|Romance ... 6.3 100.0 2.35 5.0
4915 My Date with Drew 2004.0 PG Documentary ... 6.6 90.0 1.85 0.0

4916 rows × 28 columns

11 操作整個DataFrame

11.1 獲取DataFrame的行和列數

movies.shape
(4916, 28)

11.2 獲取DataFrame的個數

movies.size
137648

11.3 獲取DataFrame的長度

len(movies)
4916

11.4 獲取DataFrame的維度

movies.ndim
2

11.5 統計DataFrame中各種類型的個數

movies.count()
color                     4897
director_name             4814
num_critic_for_reviews    4867
duration                  4901
                          ... 
actor_2_facebook_likes    4903
imdb_score                4916
aspect_ratio              4590
movie_facebook_likes      4916
Length: 28, dtype: int64

11.6 取DataFrame中各列的最小值

# 都是對數值類型的列進行統計
movies.min()
num_critic_for_reviews     1.00
duration                   7.00
director_facebook_likes    0.00
actor_3_facebook_likes     0.00
                           ... 
actor_2_facebook_likes     0.00
imdb_score                 1.60
aspect_ratio               1.18
movie_facebook_likes       0.00
Length: 16, dtype: float64

11.7 取DataFrame中各列的最大值

movies.max()
num_critic_for_reviews        813.0
duration                      511.0
director_facebook_likes     23000.0
actor_3_facebook_likes      23000.0
                             ...   
actor_2_facebook_likes     137000.0
imdb_score                      9.5
aspect_ratio                   16.0
movie_facebook_likes       349000.0
Length: 16, dtype: float64

11.8 取DataFrame中各列的平均值

movies.mean()
num_critic_for_reviews      137.988905
duration                    107.090798
director_facebook_likes     691.014541
actor_3_facebook_likes      631.276313
                              ...     
actor_2_facebook_likes     1621.923516
imdb_score                    6.437429
aspect_ratio                  2.222349
movie_facebook_likes       7348.294142
Length: 16, dtype: float64

11.9 取DataFrame中各列的中位數

movies.median()
num_critic_for_reviews     108.00
duration                   103.00
director_facebook_likes     48.00
actor_3_facebook_likes     366.00
                            ...  
actor_2_facebook_likes     593.00
imdb_score                   6.60
aspect_ratio                 2.35
movie_facebook_likes       159.00
Length: 16, dtype: float64

11.10 取DataFrame中各列的分位數

movies.quantile([0.5,0.3])
num_critic_for_reviews duration director_facebook_likes actor_3_facebook_likes ... actor_2_facebook_likes imdb_score aspect_ratio movie_facebook_likes
0.5 108.0 103.0 48.0 366.0 ... 593.0 6.6 2.35 159.0
0.3 60.0 95.0 11.0 176.0 ... 345.0 6.0 1.85 0.0

2 rows × 16 columns

11.11 取DataFrame中各列的標準差

movies.std()
num_critic_for_reviews       120.239379
duration                      25.286015
director_facebook_likes     2832.954125
actor_3_facebook_likes      1625.874802
                               ...     
actor_2_facebook_likes      4011.299523
imdb_score                     1.127802
aspect_ratio                   1.402940
movie_facebook_likes       19206.016458
Length: 16, dtype: float64

11.12 取DataFrame中各列的和

movies.sum()
num_critic_for_reviews       671592.00
duration                     524852.00
director_facebook_likes     3326544.00
actor_3_facebook_likes      3088835.00
                              ...     
actor_2_facebook_likes      7952291.00
imdb_score                    31646.40
aspect_ratio                  10200.58
movie_facebook_likes       36124214.00
Length: 16, dtype: float64

補充說明:上面的統計函數可以傳入skip = bool 選擇是否跳過空值

11.13 獲取DataFrame中各列的統計信息

movies.describe()
num_critic_for_reviews duration director_facebook_likes actor_3_facebook_likes ... actor_2_facebook_likes imdb_score aspect_ratio movie_facebook_likes
count 4867.000000 4901.000000 4814.000000 4893.000000 ... 4903.000000 4916.000000 4590.000000 4916.000000
mean 137.988905 107.090798 691.014541 631.276313 ... 1621.923516 6.437429 2.222349 7348.294142
std 120.239379 25.286015 2832.954125 1625.874802 ... 4011.299523 1.127802 1.402940 19206.016458
min 1.000000 7.000000 0.000000 0.000000 ... 0.000000 1.600000 1.180000 0.000000
25% 49.000000 93.000000 7.000000 132.000000 ... 277.000000 5.800000 1.850000 0.000000
50% 108.000000 103.000000 48.000000 366.000000 ... 593.000000 6.600000 2.350000 159.000000
75% 191.000000 118.000000 189.750000 633.000000 ... 912.000000 7.200000 2.350000 2000.000000
max 813.000000 511.000000 23000.000000 23000.000000 ... 137000.000000 9.500000 16.000000 349000.000000

8 rows × 16 columns

11.14 判斷DataFrame中的空值

movies.isnull()
color director_name num_critic_for_reviews duration ... actor_2_facebook_likes imdb_score aspect_ratio movie_facebook_likes
0 False False False False ... False False False False
1 False False False False ... False False False False
2 False False False False ... False False False False
3 False False False False ... False False False False
... ... ... ... ... ... ... ... ... ...
4912 False True False False ... False False False False
4913 False False False False ... False False True False
4914 False False False False ... False False False False
4915 False False False False ... False False False False

4916 rows × 28 columns

11.15 判斷DataFrame中的非空值

movies.notnull()
color director_name num_critic_for_reviews duration ... actor_2_facebook_likes imdb_score aspect_ratio movie_facebook_likes
0 True True True True ... True True True True
1 True True True True ... True True True True
2 True True True True ... True True True True
3 True True True True ... True True True True
... ... ... ... ... ... ... ... ... ...
4912 True False True True ... True True True True
4913 True True True True ... True True False True
4914 True True True True ... True True True True
4915 True True True True ... True True True True

4916 rows × 28 columns

11.16 判斷DataFrame中的值是否全爲真

# 這樣便可以查看那些是沒有空值的列
movies.notnull().all()
color                     False
director_name             False
num_critic_for_reviews    False
duration                  False
                          ...  
actor_2_facebook_likes    False
imdb_score                 True
aspect_ratio              False
movie_facebook_likes       True
Length: 28, dtype: bool

11.17 在進行對有空值的列進行統計時,填充空值後才能統計

# 必須將空值進行補充後才能統計,不然返回會爲空
movies[["director_name"]].fillna("").max()
director_name    Étienne Faure
dtype: object

12.DataFrame算數運算

12.1準備數據

matrix = pd.DataFrame([np.random.randint(0,10,3) for i in range(3)])
matrix
0 1 2
0 2 0 5
1 8 0 6
2 6 5 9

12.2 加法運算

12.2.1 操作符方式

matrix + 5
0 1 2
0 7 5 10
1 13 5 11
2 11 10 14

12.2.2 函數方式

matrix.add(5)
0 1 2
0 7 5 10
1 13 5 11
2 11 10 14

12.3 減法運算

12.3.1 操作符方法

matrix - 5
0 1 2
0 -3 -5 0
1 3 -5 1
2 1 0 4

12.3.2 函數方法

matrix.sub(5)
0 1 2
0 -3 -5 0
1 3 -5 1
2 1 0 4

12.4 乘法運算

12.4.1 操作符方法

matrix * 5
0 1 2
0 10 0 25
1 40 0 30
2 30 25 45

12.4.2 函數方法

matrix.mul(5)
0 1 2
0 10 0 25
1 40 0 30
2 30 25 45

12.5 除法運算

12.5.1 操作符方法

matrix / 5
0 1 2
0 0.4 0.0 1.0
1 1.6 0.0 1.2
2 1.2 1.0 1.8

12.5.2 函數方法

matrix.div(5)
0 1 2
0 0.4 0.0 1.0
1 1.6 0.0 1.2
2 1.2 1.0 1.8

12.6 整除運算

12.6.1 操作符方法

matrix // 5
0 1 2
0 0 0 1
1 1 0 1
2 1 1 1

12.6.2 函數方法

matrix.floordiv(5)
0 1 2
0 0 0 1
1 1 0 1
2 1 1 1

12.7 取模運算

12.7.1 操作符方法

matrix % 5
0 1 2
0 2 0 0
1 3 0 1
2 1 0 4

12.7.2 函數方法

matrix.mod(5)
0 1 2
0 2 0 0
1 3 0 1
2 1 0 4

13.DataFrame比較運算

13.1 大於

13.1.1 算數方法

matrix > 5
0 1 2
0 False False False
1 True False True
2 True False True

13.1.2 函數方法

matrix.gt(5)
0 1 2
0 False False False
1 True False True
2 True False True

13.2 大於等於

13.2.1 算數方法

matrix >= 5
0 1 2
0 False False True
1 True False True
2 True True True

13.2.2 函數方法

matrix.ge(5)
0 1 2
0 False False True
1 True False True
2 True True True

13.3 小於

13.3.1 算數方法

matrix < 5
0 1 2
0 True True False
1 False True False
2 False False False

13.3.2 函數方法

matrix.lt(5)
0 1 2
0 True True False
1 False True False
2 False False False

13.4 小於等於

13.4.1 算數方法

matrix <= 5
0 1 2
0 True True True
1 False True False
2 False True False

13.4.2 函數方法

matrix.le(5)
0 1 2
0 True True True
1 False True False
2 False True False

13.5 等於

13.5.1 算數方法

matrix == 5
0 1 2
0 False False True
1 False False False
2 False True False

13.5.2 函數方法

matrix.eq(5)
0 1 2
0 False False True
1 False False False
2 False True False

13.6 不等於

13.6.1 算數方法

matrix != 5
0 1 2
0 True True False
1 True True True
2 True False True

13.6.2 函數方法

matrix.ne(5)
0 1 2
0 True True False
1 True True True
2 True False True
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章