pandas Study Notes (Part 3)

Note: this tutorial is part of a series, and this chapter continues from the earlier Part 1.

14 Selecting subsets of data

14.1 Selecting rows from a Series

14.1.1 Getting one column of a DataFrame as a Series

city = college_data["CITY"]
print(city)
print("<"+"="*75+">")
print("Type:", type(city))
INSTNM
Alabama A & M University                                            Normal
University of Alabama at Birmingham                             Birmingham
Amridge University                                              Montgomery
University of Alabama in Huntsville                             Huntsville
                                                                ...
Rasmussen College - Overland Park                            Overland Park
National Personal Training Institute of Cleveland         Highland Heights
Bay Area Medical Academy - San Jose Satellite Location            San Jose
Excel Learning Center-San Antonio South                        San Antonio
Name: CITY, Length: 7535, dtype: object
<===========================================================================>
Type: <class 'pandas.core.series.Series'>
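
The original dataset is not needed to try this out. The sketch below builds a tiny stand-in frame (the name `toy` and its three rows are illustrative assumptions, not the real `college_data`) and shows that a single column label returns a Series, while a list of labels returns a DataFrame.

```python
import pandas as pd

# Hypothetical stand-in for college_data: three rows, INSTNM as the index
toy = pd.DataFrame(
    {"CITY": ["Normal", "Birmingham", "Montgomery"],
     "STABBR": ["AL", "AL", "AL"]},
    index=pd.Index(
        ["Alabama A & M University",
         "University of Alabama at Birmingham",
         "Amridge University"],
        name="INSTNM"),
)

city_series = toy["CITY"]    # one column label -> Series
city_frame = toy[["CITY"]]   # list of labels -> one-column DataFrame

print(type(city_series).__name__)
print(type(city_frame).__name__)
print(city_series.name, city_series.index.name)
```

The Series keeps the column label as its `name` and shares the frame's index.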

14.1.2 Using iloc

14.1.2.1 Passing an integer position to select a single value

city.iloc[0]
'Normal'

14.1.2.2 Passing a list of integers to select a new Series

# When a list is passed in, the result is still a Series
city.iloc[[0,1,2,3]]
INSTNM
Alabama A & M University                   Normal
University of Alabama at Birmingham    Birmingham
Amridge University                     Montgomery
University of Alabama in Huntsville    Huntsville
Name: CITY, dtype: object

14.1.2.3 Slicing

# Take integer positions in [0, 10) with step 2; the result is still a Series
city.iloc[0:10:2]
INSTNM
Alabama A & M University                     Normal
Amridge University                       Montgomery
Alabama State University                 Montgomery
Central Alabama Community College    Alexander City
Auburn University at Montgomery          Montgomery
Name: CITY, dtype: object
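
As a compact recap of the three `iloc` forms on a Series, here is a minimal sketch on a made-up five-element Series (the single-letter labels are placeholders chosen for brevity, not values from the dataset).

```python
import pandas as pd

# Hypothetical Series; labels "A".."E" are placeholders
s = pd.Series(
    ["Normal", "Birmingham", "Montgomery", "Huntsville", "Montgomery"],
    index=["A", "B", "C", "D", "E"],
    name="CITY",
)

one = s.iloc[0]            # integer -> scalar value
several = s.iloc[[0, 2]]   # list of integers -> Series
sliced = s.iloc[0:4:2]     # slice -> Series; positions 0 and 2, stop excluded

print(one)
print(list(several.index))
print(list(sliced.index))
```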

14.1.3 Using loc

Passing an index label to select a single value

city.loc["Alabama A & M University"]
'Normal'

14.1.3.1 Selecting multiple rows with a list of labels

# Passing a list of index labels selects multiple rows; the result is still a Series
city.loc[["Alabama A & M University", "Amridge University"]]
INSTNM
Alabama A & M University        Normal
Amridge University          Montgomery
Name: CITY, dtype: object

14.1.3.2 Slicing

# Select rows with labels in [start_target, end_target], step 1; the result is a Series. Note that both endpoints are included.
city.loc["Alabama A & M University":"University of Alabama in Huntsville":1]
INSTNM
Alabama A & M University                   Normal
University of Alabama at Birmingham    Birmingham
Amridge University                     Montgomery
University of Alabama in Huntsville    Huntsville
Name: CITY, dtype: object
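
The difference in endpoint handling is easy to get wrong, so here is a small side-by-side sketch on a four-row stand-in Series (`city_toy` is a hypothetical name; the values are taken from the outputs above).

```python
import pandas as pd

# Hypothetical four-row stand-in for the city Series
city_toy = pd.Series(
    ["Normal", "Birmingham", "Montgomery", "Huntsville"],
    index=["Alabama A & M University",
           "University of Alabama at Birmingham",
           "Amridge University",
           "University of Alabama in Huntsville"],
    name="CITY",
)

# Label slice: the end label is included
by_label = city_toy.loc[
    "University of Alabama at Birmingham":"University of Alabama in Huntsville"
]

# Position slice over the same start: the stop position is excluded
by_position = city_toy.iloc[1:3]

print(len(by_label))
print(len(by_position))
```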

14.2 Selecting rows from a DataFrame

14.2.1 Using iloc

14.2.1.1 Passing a single integer position returns one row (as a Series)

college_data.iloc[0]
CITY                  Normal
STABBR                    AL
HBCU                       1
MENONLY                    0
                       ...  
PCTFLOAN              0.8284
UG25ABV               0.1049
MD_EARN_WNE_P10        30300
GRAD_DEBT_MDN_SUPP     33888
Name: Alabama A & M University, Length: 26, dtype: object

14.2.1.2 Passing a list of integer positions returns multiple rows (as a DataFrame)

college_data.iloc[[1,3,5,7,9]]
                                           CITY  STABBR  HBCU  MENONLY  ...  PCTFLOAN  UG25ABV  MD_EARN_WNE_P10  GRAD_DEBT_MDN_SUPP
INSTNM
University of Alabama at Birmingham  Birmingham      AL   0.0      0.0  ...    0.5214   0.2422            39700             21941.5
University of Alabama in Huntsville  Huntsville      AL   0.0      0.0  ...    0.4596   0.2640            45500               24097
The University of Alabama            Tuscaloosa      AL   0.0      0.0  ...    0.4010   0.0853            41900               23750
Athens State University                  Athens      AL   0.0      0.0  ...    0.6296   0.6410            39000               18595
Auburn University                        Auburn      AL   0.0      0.0  ...    0.3494   0.0415            45700               21831

5 rows × 26 columns

14.2.1.3 Slicing

# Get the rows at positions in [1, 10) with step 2; the result is a DataFrame
college_data.iloc[1:10:2]
                                           CITY  STABBR  HBCU  MENONLY  ...  PCTFLOAN  UG25ABV  MD_EARN_WNE_P10  GRAD_DEBT_MDN_SUPP
INSTNM
University of Alabama at Birmingham  Birmingham      AL   0.0      0.0  ...    0.5214   0.2422            39700             21941.5
University of Alabama in Huntsville  Huntsville      AL   0.0      0.0  ...    0.4596   0.2640            45500               24097
The University of Alabama            Tuscaloosa      AL   0.0      0.0  ...    0.4010   0.0853            41900               23750
Athens State University                  Athens      AL   0.0      0.0  ...    0.6296   0.6410            39000               18595
Auburn University                        Auburn      AL   0.0      0.0  ...    0.3494   0.0415            45700               21831

5 rows × 26 columns
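
The return type of `iloc` on a DataFrame depends on what is passed in. The sketch below checks this on a four-row miniature (`toy` is a hypothetical stand-in with values copied from the outputs above).

```python
import pandas as pd

# Hypothetical miniature of college_data
toy = pd.DataFrame(
    {"CITY": ["Normal", "Birmingham", "Montgomery", "Huntsville"],
     "STABBR": ["AL", "AL", "AL", "AL"],
     "HBCU": [1.0, 0.0, 0.0, 0.0]},
    index=pd.Index(
        ["Alabama A & M University",
         "University of Alabama at Birmingham",
         "Amridge University",
         "University of Alabama in Huntsville"],
        name="INSTNM"),
)

row_as_series = toy.iloc[1]     # integer -> Series named after the row label
row_as_frame = toy.iloc[[1]]    # one-element list -> DataFrame with one row
every_other = toy.iloc[1:4:2]   # positions 1 and 3; the stop (4) is excluded

print(row_as_series.name)
print(row_as_frame.shape)
print(list(every_other["CITY"]))
```

Wrapping a single position in a list is a simple way to keep the result two-dimensional.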

14.2.2 Using loc

14.2.2.1 Passing a single label to get one row

# Get the row for the given index label; the return type is Series
college_data.loc["University of Alabama at Birmingham"]
CITY                  Birmingham
STABBR                        AL
HBCU                           0
MENONLY                        0
                         ...    
PCTFLOAN                  0.5214
UG25ABV                   0.2422
MD_EARN_WNE_P10            39700
GRAD_DEBT_MDN_SUPP       21941.5
Name: University of Alabama at Birmingham, Length: 26, dtype: object

14.2.2.2 Passing a list of labels to get multiple rows

# Return the rows matching the given list of labels; the return type is DataFrame
college_data.loc[["University of Alabama at Birmingham","The University of Alabama"]]
                                           CITY  STABBR  HBCU  MENONLY  ...  PCTFLOAN  UG25ABV  MD_EARN_WNE_P10  GRAD_DEBT_MDN_SUPP
INSTNM
University of Alabama at Birmingham  Birmingham      AL   0.0      0.0  ...    0.5214   0.2422            39700             21941.5
The University of Alabama            Tuscaloosa      AL   0.0      0.0  ...    0.4010   0.0853            41900               23750

2 rows × 26 columns

14.2.2.3 Slicing

# Get the rows with labels in [start_target, end_target], step 1; the result is a DataFrame
college_data.loc["University of Alabama at Birmingham":"University of Alabama in Huntsville":1]
                                           CITY  STABBR  HBCU  MENONLY  ...  PCTFLOAN  UG25ABV  MD_EARN_WNE_P10  GRAD_DEBT_MDN_SUPP
INSTNM
University of Alabama at Birmingham  Birmingham      AL   0.0      0.0  ...    0.5214   0.2422            39700             21941.5
Amridge University                   Montgomery      AL   0.0      0.0  ...    0.7795   0.8540            40100               23370
University of Alabama in Huntsville  Huntsville      AL   0.0      0.0  ...    0.4596   0.2640            45500               24097

3 rows × 26 columns
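
One practical detail with label lists: in current pandas versions, `loc` raises a `KeyError` if any label in the list is missing from the index. The sketch below demonstrates this on a hypothetical three-row frame and shows one possible way (an assumption about what the caller wants, not the only option) to keep just the labels that exist.

```python
import pandas as pd

# Hypothetical three-row stand-in for college_data
toy = pd.DataFrame(
    {"CITY": ["Normal", "Birmingham", "Montgomery"],
     "STABBR": ["AL", "AL", "AL"]},
    index=["Alabama A & M University",
           "University of Alabama at Birmingham",
           "Amridge University"],
)

# The second label is deliberately absent from the index
wanted = ["Amridge University", "No Such College"]

try:
    toy.loc[wanted]
    raised = False
except KeyError:
    raised = True

# Boolean mask: keep only rows whose label appears in the wanted list
present = toy.loc[toy.index.isin(wanted)]

print(raised)
print(list(present.index))
```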

14.3 Selecting DataFrame rows and columns at the same time

14.3.1 Getting the first n rows and m columns

14.3.1.1 With iloc

# Get the first two rows and the first three columns
college_data.iloc[:2,:3]
                                           CITY  STABBR  HBCU
INSTNM
Alabama A & M University                 Normal      AL   1.0
University of Alabama at Birmingham  Birmingham      AL   0.0

14.3.1.2 With loc

# Select rows with index labels in [start_target, end_target] and columns with labels in [start, end]
college_data.loc[:"University of Alabama at Birmingham",:"HBCU"]
                                           CITY  STABBR  HBCU
INSTNM
Alabama A & M University                 Normal      AL   1.0
University of Alabama at Birmingham  Birmingham      AL   0.0

14.3.2 Getting the first n columns of all rows

14.3.2.1 With iloc

college_data.iloc[:,:2]
                                                                    CITY  STABBR
INSTNM
Alabama A & M University                                          Normal      AL
University of Alabama at Birmingham                           Birmingham      AL
Amridge University                                            Montgomery      AL
University of Alabama in Huntsville                           Huntsville      AL
...                                                                  ...     ...
Rasmussen College - Overland Park                          Overland Park      KS
National Personal Training Institute of Cleveland       Highland Heights      OH
Bay Area Medical Academy - San Jose Satellite Location          San Jose      CA
Excel Learning Center-San Antonio South                      San Antonio      TX

7535 rows × 2 columns

14.3.2.2 With loc

college_data.loc[:,:"STABBR"]
                                                                    CITY  STABBR
INSTNM
Alabama A & M University                                          Normal      AL
University of Alabama at Birmingham                           Birmingham      AL
Amridge University                                            Montgomery      AL
University of Alabama in Huntsville                           Huntsville      AL
...                                                                  ...     ...
Rasmussen College - Overland Park                          Overland Park      KS
National Personal Training Institute of Cleveland       Highland Heights      OH
Bay Area Medical Academy - San Jose Satellite Location          San Jose      CA
Excel Learning Center-San Antonio South                      San Antonio      TX

7535 rows × 2 columns

14.3.3 Selecting non-contiguous rows and columns

14.3.3.1 With iloc

college_data.iloc[[1,3,5,7],[2,4,6,8]]
                                     HBCU  WOMENONLY  SATVRMID  DISTANCEONLY
INSTNM
University of Alabama at Birmingham   0.0        0.0     570.0           0.0
University of Alabama in Huntsville   0.0        0.0     595.0           0.0
The University of Alabama             0.0        0.0     555.0           0.0
Athens State University               0.0        0.0       NaN           0.0

14.3.3.2 With loc

# Same selection as above, expressed with labels
college_data.loc[["University of Alabama at Birmingham","University of Alabama in Huntsville","The University of Alabama","Athens State University"],
                ["HBCU","WOMENONLY","SATVRMID","DISTANCEONLY"]]
                                     HBCU  WOMENONLY  SATVRMID  DISTANCEONLY
INSTNM
University of Alabama at Birmingham   0.0        0.0     570.0           0.0
University of Alabama in Huntsville   0.0        0.0     595.0           0.0
The University of Alabama             0.0        0.0     555.0           0.0
Athens State University               0.0        0.0       NaN           0.0
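
Typing long label lists by hand is error-prone. A minimal sketch of one alternative, on a hypothetical four-row frame: index the `index` and `columns` objects by position to obtain the labels, then pass those to `loc`. `get_loc` goes the other way, from a label to its position.

```python
import pandas as pd

# Hypothetical miniature of college_data
toy = pd.DataFrame(
    {"CITY": ["Normal", "Birmingham", "Montgomery", "Huntsville"],
     "STABBR": ["AL", "AL", "AL", "AL"],
     "HBCU": [1.0, 0.0, 0.0, 0.0],
     "MENONLY": [0.0, 0.0, 0.0, 0.0]},
    index=["Alabama A & M University",
           "University of Alabama at Birmingham",
           "Amridge University",
           "University of Alabama in Huntsville"],
)

row_pos, col_pos = [1, 3], [0, 2]

# Index objects accept a list of positions and return the matching labels
row_labels = toy.index[row_pos]
col_labels = toy.columns[col_pos]

by_position = toy.iloc[row_pos, col_pos]
by_label = toy.loc[row_labels, col_labels]

# The other direction: label -> integer position
hbcu_pos = toy.columns.get_loc("HBCU")

print(by_position.equals(by_label))
print(list(col_labels), hbcu_pos)
```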

14.3.4 Selecting a single scalar value

14.3.4.1 With iloc

# Select the value in the fourth row, fourth column
college_data.iloc[3,3]
0.0

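14.3.4.2 With loc

# Same selection as above: the fourth row is "University of Alabama in Huntsville" and the fourth column is "MENONLY"
college_data.loc["University of Alabama in Huntsville","MENONLY"]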
0.0

14.3.4.3 Fast scalar access with iat

%timeit college_data.iloc[1000,3]
# In this run iat takes roughly 40% less time than iloc
%timeit college_data.iat[1000,3]
7.95 µs ± 691 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
4.74 µs ± 21.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

14.3.4.4 Fast scalar access with at

%timeit college_data.loc["Rasmussen College - Overland Park","CITY"]
# Likewise, at is faster than loc
%timeit college_data.at["Rasmussen College - Overland Park","CITY"]
6.41 µs ± 58.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
4.16 µs ± 20.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
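
`%timeit` only works inside IPython or Jupyter. Outside of those, a rough equivalent can be sketched with the standard-library `timeit` module. The two-row frame and the repeat count below are arbitrary assumptions, and absolute timings will differ by machine and pandas version, so no expected numbers are shown.

```python
import timeit

import pandas as pd

# Hypothetical two-row stand-in for college_data
toy = pd.DataFrame(
    {"CITY": ["Normal", "Birmingham"], "HBCU": [1.0, 0.0]},
    index=["Alabama A & M University", "University of Alabama at Birmingham"],
)

label, column = "University of Alabama at Birmingham", "CITY"

# All four accessors return the same scalar
values = [
    toy.loc[label, column],
    toy.at[label, column],
    toy.iloc[1, 0],
    toy.iat[1, 0],
]

runs = 2000  # arbitrary repeat count
accessors = [
    ("loc", lambda: toy.loc[label, column]),
    ("at", lambda: toy.at[label, column]),
    ("iloc", lambda: toy.iloc[1, 0]),
    ("iat", lambda: toy.iat[1, 0]),
]
for name, fn in accessors:
    seconds = timeit.timeit(fn, number=runs)
    print(f"{name:>4}: {seconds / runs * 1e6:.2f} µs per call")
```

Note that `at` and `iat` accept only a single row and a single column; lists and slices still need `loc` or `iloc`.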

14.4 Supplementary notes

14.4.1 Lazy slicing (the bare [] operator, without iloc or loc)

# This also works on a Series
college_data[2:10:2]
                                             CITY  STABBR  HBCU  MENONLY  ...  PCTFLOAN  UG25ABV  MD_EARN_WNE_P10  GRAD_DEBT_MDN_SUPP
INSTNM
Amridge University                     Montgomery      AL   0.0      0.0  ...    0.7795   0.8540            40100               23370
Alabama State University               Montgomery      AL   1.0      0.0  ...    0.7554   0.1270            26600             33118.5
Central Alabama Community College  Alexander City      AL   0.0      0.0  ...    0.3977   0.3153            27500               16127
Auburn University at Montgomery        Montgomery      AL   0.0      0.0  ...    0.5803   0.2930            35000               21335

4 rows × 26 columns

# Slice by index label
college_data[:"Central Alabama Community College"]
                                               CITY  STABBR  HBCU  MENONLY  ...  PCTFLOAN  UG25ABV  MD_EARN_WNE_P10  GRAD_DEBT_MDN_SUPP
INSTNM
Alabama A & M University                     Normal      AL   1.0      0.0  ...    0.8284   0.1049            30300               33888
University of Alabama at Birmingham      Birmingham      AL   0.0      0.0  ...    0.5214   0.2422            39700             21941.5
Amridge University                       Montgomery      AL   0.0      0.0  ...    0.7795   0.8540            40100               23370
University of Alabama in Huntsville      Huntsville      AL   0.0      0.0  ...    0.4596   0.2640            45500               24097
Alabama State University                 Montgomery      AL   1.0      0.0  ...    0.7554   0.1270            26600             33118.5
The University of Alabama                Tuscaloosa      AL   0.0      0.0  ...    0.4010   0.0853            41900               23750
Central Alabama Community College    Alexander City      AL   0.0      0.0  ...    0.3977   0.3153            27500               16127

7 rows × 26 columns
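
With the bare `[]` operator on a DataFrame, a string selects a column while a slice selects rows, which can be surprising. The sketch below, on a hypothetical three-row frame, compares each form with its explicit `iloc` or `loc` equivalent.

```python
import pandas as pd

# Hypothetical three-row stand-in for college_data
toy = pd.DataFrame(
    {"CITY": ["Normal", "Birmingham", "Montgomery"],
     "STABBR": ["AL", "AL", "AL"]},
    index=["Alabama A & M University",
           "University of Alabama at Birmingham",
           "Amridge University"],
)

column = toy["CITY"]                       # string -> column (Series)
rows_by_pos = toy[1:3]                     # integer slice -> rows by position
rows_by_label = toy[:"Amridge University"] # label slice -> rows, end included

print(rows_by_pos.equals(toy.iloc[1:3]))
print(rows_by_label.equals(toy.loc[:"Amridge University"]))
```

Spelling out `iloc` or `loc` costs a few characters but makes the intent unambiguous to the reader.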

14.4.2 Slicing alphabetically

# The index labels must be sorted before slicing alphabetically
college_data.sort_index(ascending=True)["A":"E"]
                                                  CITY  STABBR  HBCU  MENONLY  ...  PCTFLOAN  UG25ABV    MD_EARN_WNE_P10  GRAD_DEBT_MDN_SUPP
INSTNM
A & W Healthcare Educators                 New Orleans      LA   0.0      0.0  ...    0.8596   0.6667                NaN             19022.5
A T Still University of Health Sciences     Kirksville      MO   0.0      0.0  ...       NaN      NaN             219800   PrivacySuppressed
ABC Beauty Academy                             Garland      TX   0.0      0.0  ...    0.0000   0.8286                NaN   PrivacySuppressed
ABC Beauty College Inc                     Arkadelphia      AR   0.0      0.0  ...    1.0000   0.4688  PrivacySuppressed               16500
...                                                ...     ...   ...      ...  ...       ...      ...                ...                 ...
Durham Technical Community College              Durham      NC   0.0      0.0  ...    0.1796   0.5961              27200             11069.5
Dutchess BOCES-Practical Nursing Program  Poughkeepsie      NY   0.0      0.0  ...    0.6275   0.5430              36500                9500
Dutchess Community College                Poughkeepsie      NY   0.0      0.0  ...    0.1936   0.1806              32500               10250
Dyersburg State Community College            Dyersburg      TN   0.0      0.0  ...    0.2493   0.3097              26800                7475

1900 rows × 26 columns

# Any other letter range works the same way, for example "E" through "F"
college_data.sort_index()["E":"F"]
                                                 CITY  STABBR  HBCU  MENONLY  ...  PCTFLOAN  UG25ABV    MD_EARN_WNE_P10  GRAD_DEBT_MDN_SUPP
INSTNM
E Q School of Hair Design              Council Bluffs      IA   0.0      0.0  ...    0.6737   0.1471              18100                7830
ECPI University                        Virginia Beach      VA   0.0      0.0  ...    0.5001   0.6633              37000               20000
ECPI University-Charleston           North Charleston      SC   NaN      NaN  ...       NaN      NaN                NaN               20000
ECPI University-Charlotte                   Charlotte      NC   NaN      NaN  ...       NaN      NaN                NaN               20000
...                                               ...     ...   ...      ...  ...       ...      ...                ...                 ...
Excelsior College                              Albany      NY   0.0      0.0  ...    0.0800   0.9337  PrivacySuppressed               11010
Expertise Cosmetology Institute             Las Vegas      NV   0.0      0.0  ...    1.0000   0.4828  PrivacySuppressed                8450
Exposito School of Hair Design               Amarillo      TX   0.0      0.0  ...    0.6267   0.3966              15100   PrivacySuppressed
Expression College for Digital Arts        Emeryville      CA   0.0      0.0  ...    0.7736   0.3955  PrivacySuppressed               35662

381 rows × 26 columns
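
Why sorting matters: "A" and "E" are not actual labels, so pandas has to locate where they would fall, which is only possible when the index is ordered. A minimal sketch on a made-up, unsorted five-label Series (the college names are placeholders, not rows from the dataset):

```python
import pandas as pd

# Hypothetical unsorted index
s = pd.Series(range(5), index=["Duke", "Auburn", "Fisk", "Boston", "Emory"])

try:
    s.loc["A":"E"]          # bounds are not labels and the index is unsorted
    unsorted_ok = True
except KeyError:
    unsorted_ok = False

ordered = s.sort_index()
a_to_e = ordered.loc["A":"E"]

print(unsorted_ok)
print(ordered.index.is_monotonic_increasing)
print(list(a_to_e.index))
```

"Emory" is left out because it sorts after the bare string "E". For the same reason, the "E":"F" slice above stops before any name that starts with "F" and has more characters.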

14.4.3 Changing the index

college_data.set_index("CITY")
                  STABBR  HBCU  MENONLY  WOMENONLY  ...  PCTFLOAN  UG25ABV  MD_EARN_WNE_P10  GRAD_DEBT_MDN_SUPP
CITY
Normal                AL   1.0      0.0        0.0  ...    0.8284   0.1049            30300               33888
Birmingham            AL   0.0      0.0        0.0  ...    0.5214   0.2422            39700             21941.5
Montgomery            AL   0.0      0.0        0.0  ...    0.7795   0.8540            40100               23370
Huntsville            AL   0.0      0.0        0.0  ...    0.4596   0.2640            45500               24097
...                  ...   ...      ...        ...  ...       ...      ...              ...                 ...
Overland Park         KS   NaN      NaN        NaN  ...       NaN      NaN              NaN               21163
Highland Heights      OH   NaN      NaN        NaN  ...       NaN      NaN              NaN                6333
San Jose              CA   NaN      NaN        NaN  ...       NaN      NaN              NaN   PrivacySuppressed
San Antonio           TX   NaN      NaN        NaN  ...       NaN      NaN              NaN               12125

7535 rows × 25 columns

14.4.4 Resetting the index

college_data.reset_index()
                                                 INSTNM              CITY  STABBR  HBCU  ...  PCTFLOAN  UG25ABV  MD_EARN_WNE_P10  GRAD_DEBT_MDN_SUPP
0                              Alabama A & M University            Normal      AL   1.0  ...    0.8284   0.1049            30300               33888
1                   University of Alabama at Birmingham        Birmingham      AL   0.0  ...    0.5214   0.2422            39700             21941.5
2                                    Amridge University        Montgomery      AL   0.0  ...    0.7795   0.8540            40100               23370
3                   University of Alabama in Huntsville        Huntsville      AL   0.0  ...    0.4596   0.2640            45500               24097
...                                                 ...               ...     ...   ...  ...       ...      ...              ...                 ...
7531                  Rasmussen College - Overland Park     Overland Park      KS   NaN  ...       NaN      NaN              NaN               21163
7532  National Personal Training Institute of Cleveland  Highland Heights      OH   NaN  ...       NaN      NaN              NaN                6333
7533  Bay Area Medical Academy - San Jose Satellite ...          San Jose      CA   NaN  ...       NaN      NaN              NaN   PrivacySuppressed
7534            Excel Learning Center-San Antonio South       San Antonio      TX   NaN  ...       NaN      NaN              NaN               12125

7535 rows × 27 columns
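
Both `set_index` and `reset_index` return a new DataFrame by default and leave the original untouched, which explains the column counts above (25 after `CITY` moves into the index, 27 after `INSTNM` moves back into the columns). A minimal sketch on a hypothetical two-row frame:

```python
import pandas as pd

# Hypothetical two-row stand-in for college_data
toy = pd.DataFrame(
    {"CITY": ["Normal", "Birmingham"], "STABBR": ["AL", "AL"]},
    index=pd.Index(
        ["Alabama A & M University", "University of Alabama at Birmingham"],
        name="INSTNM"),
)

by_city = toy.set_index("CITY")        # CITY leaves the columns, becomes the index
flat = toy.reset_index()               # INSTNM becomes an ordinary column
round_trip = flat.set_index("INSTNM")  # back to the original layout

print(list(by_city.columns))
print(list(flat.columns))
print(toy.index.name)                  # unchanged: neither call modified toy
```

To keep a changed index, assign the result back, for example `toy = toy.set_index("CITY")`.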
