06_pandas beginner tutorial: importing the package, doing something with DataFrame and Series

Original

2020-06-20 20:02

Importing the package

import pandas as pd

Representing a data table in pandas

I want to store data about passengers of the Titanic. For a number of passengers, I know the name, age, and sex:

import pandas as pd

df = pd.DataFrame({
    "Name":["Braund,Mr.Owen Harris",
            "Allen,Mr.William Henry",
            "Bonnell,Miss.Elizabeth"],
    "Age":[22,35,58],
    "Sex":["male","male","female"]})

print(df)

Output:

                     Name  Age     Sex
0   Braund,Mr.Owen Harris   22    male
1  Allen,Mr.William Henry   35    male
2  Bonnell,Miss.Elizabeth   58  female

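A DataFrame is a 2-dimensional data structure whose columns can hold different data types (including strings, integers, floating-point values, categorical data, and more). It is similar to a spreadsheet or a SQL table.

In spreadsheet software, the table representation of our data would look very similar.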
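To make the "different types per column" point concrete, here is a minimal sketch that reuses the same three-passenger table and inspects the type of each column. Nothing here goes beyond the DataFrame already defined above; the exact label printed for the text columns can differ between pandas versions, so no output is shown.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Braund,Mr.Owen Harris",
             "Allen,Mr.William Henry",
             "Bonnell,Miss.Elizabeth"],
    "Age": [22, 35, 58],
    "Sex": ["male", "male", "female"]})

# Each column carries its own data type: Age is an integer column,
# while Name and Sex hold text.
print(df.dtypes)

# shape is a (rows, columns) tuple, confirming the table is 2-dimensional.
print(df.shape)
```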

Each column in a DataFrame is a Series
To get the Age column of df:

import pandas as pd

df = pd.DataFrame({
    "Name":["Braund,Mr.Owen Harris",
            "Allen,Mr.William Henry",
            "Bonnell,Miss.Elizabeth"],
    "Age":[22,35,58],
    "Sex":["male","male","female"]})

print(df)
print("---------------------------")
print(df["Age"])

Output:

                     Name  Age     Sex
0   Braund,Mr.Owen Harris   22    male
1  Allen,Mr.William Henry   35    male
2  Bonnell,Miss.Elizabeth   58  female
---------------------------
0    22
1    35
2    58
Name: Age, dtype: int64

When you select a single column of a pandas DataFrame, the result is a pandas Series. To select the column, put its name, as a quoted string, inside square brackets [].
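A quick way to check this yourself is to look at the type of what comes back. The sketch below uses the same table; the variable name age_column is only chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Braund,Mr.Owen Harris",
             "Allen,Mr.William Henry",
             "Bonnell,Miss.Elizabeth"],
    "Age": [22, 35, 58],
    "Sex": ["male", "male", "female"]})

# Selecting one column with a string key returns a Series.
age_column = df["Age"]
print(type(age_column))

# A Series is one-dimensional, so its shape has a single entry.
print(age_column.shape)

# The Series remembers which column it came from.
print(age_column.name)
```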

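Note: if you are familiar with Python dictionaries, selecting a single column is very similar to looking up a dictionary value by its key.

You can also create a Series from scratch: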

import pandas as pd

ages = pd.Series([22,35,58],name="Age")
print(ages)

Output:

0    22
1    35
2    58
Name: Age, dtype: int64

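A pandas Series has no column labels, because it is just a single column of a DataFrame. A Series does have row labels (the 0, 1, 2 shown on the left of the output above).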
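The labels attached to the rows of a Series can be inspected through its index attribute. The following is a small sketch; the custom labels "a", "b", "c" are made up for illustration and do not come from the passenger data.

```python
import pandas as pd

ages = pd.Series([22, 35, 58], name="Age")

# With no index given, pandas numbers the rows 0, 1, 2.
print(ages.index)

# Hypothetical custom labels, chosen only for illustration.
labelled = pd.Series([22, 35, 58], index=["a", "b", "c"], name="Age")

# Look up the value stored under the label "b".
print(labelled["b"])
```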

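Doing something with a DataFrame and a Series

I want to know the maximum age of the passengers
We can do this on the DataFrame by selecting the Age column and calling max() on it: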

import pandas as pd

df = pd.DataFrame({
    "Name":["Braund,Mr.Owen Harris",
            "Allen,Mr.William Henry",
            "Bonnell,Miss.Elizabeth"],
    "Age":[22,35,58],
    "Sex":["male","male","female"]})

print(df)
print("---------------------------")
print(df["Age"].max())

Output:

                     Name  Age     Sex
0   Braund,Mr.Owen Harris   22    male
1  Allen,Mr.William Henry   35    male
2  Bonnell,Miss.Elizabeth   58  female
---------------------------
58
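max() returns the age itself, not the passenger. If you also want to know who that passenger is, one possible approach (a sketch, not the only way) is idxmax(), which returns the row label of the largest value, combined with loc to read the name on that row. The same max() call also works on a standalone Series.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Braund,Mr.Owen Harris",
             "Allen,Mr.William Henry",
             "Bonnell,Miss.Elizabeth"],
    "Age": [22, 35, 58],
    "Sex": ["male", "male", "female"]})

# Row label of the largest Age value.
oldest_label = df["Age"].idxmax()

# Read the Name on that row.
oldest_name = df.loc[oldest_label, "Name"]
print(oldest_name)

# max() works the same way on a Series created from scratch.
ages = pd.Series([22, 35, 58], name="Age")
print(ages.max())
```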

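Use describe() to get summary statistics for the numerical columns of the DataFrame: count, mean, standard deviation, minimum, quartiles, and maximum.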

import pandas as pd

df = pd.DataFrame({
    "Name":["Braund,Mr.Owen Harris",
            "Allen,Mr.William Henry",
            "Bonnell,Miss.Elizabeth"],
    "Age":[22,35,58],
    "Sex":["male","male","female"]})

print(df)
print("---------------------------")
print(df.describe())

Output:

                     Name  Age     Sex
0   Braund,Mr.Owen Harris   22    male
1  Allen,Mr.William Henry   35    male
2  Bonnell,Miss.Elizabeth   58  female
---------------------------
             Age
count   3.000000
mean   38.333333
std    18.230012
min    22.000000
25%    28.500000
50%    35.000000
75%    46.500000
max    58.000000
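Only Age appears in the result because describe() summarizes numerical columns by default. As a rough cross-check, the individual statistics can also be computed one at a time on the Age column, and passing include="all" asks describe() to cover the text columns as well. This is a sketch on the same three rows.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Braund,Mr.Owen Harris",
             "Allen,Mr.William Henry",
             "Bonnell,Miss.Elizabeth"],
    "Age": [22, 35, 58],
    "Sex": ["male", "male", "female"]})

# The same numbers describe() reports, computed individually.
age_mean = df["Age"].mean()
age_std = df["Age"].std()        # sample standard deviation
age_median = df["Age"].median()  # the 50% row in describe()
print(age_mean, age_std, age_median)

# Include the non-numerical columns in the summary too.
print(df.describe(include="all"))
```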

Another example, this time reading the data from a CSV file:

import pandas as pd

titanic = pd.read_csv("foo.csv")
print(titanic)

print("-------------first 8 rows----------------------")
print(titanic.head(8))

print("--------------last 10 rows--------------------")
print(titanic.tail(10))

print("--------------data type of each column----------------")
print(titanic.dtypes)

print("--------------select several columns----------------")
print(titanic[['A','B']])

print("--------------use shape to get the DataFrame dimensions--------")
print(titanic[["A","B"]].shape)

Output:

     Unnamed: 0         A         B         C         D
0    2013-01-01 -0.028544 -2.597953 -0.645116  0.403488
1    2013-01-02 -0.109636 -0.866292 -0.629185  1.072633
2    2013-01-03 -1.435202 -0.631815  1.208114 -1.647566
3    2013-01-04  0.368530 -1.073754 -0.712305 -0.513142
4    2013-01-05  0.813674 -0.081024 -1.153747  0.409363
..          ...       ...       ...       ...       ...
995  2015-09-23  0.783858 -0.330685  0.323741  1.767446
996  2015-09-24  0.017313 -1.792078  0.686136  0.122491
997  2015-09-25  0.100742 -1.802797  0.370563  1.297355
998  2015-09-26 -0.279896 -0.439861 -0.595908 -0.663100
999  2015-09-27 -0.519504 -1.476432 -0.877358  0.370039
[1000 rows x 5 columns]
-------------first 8 rows----------------------
   Unnamed: 0         A         B         C         D
0  2013-01-01 -0.028544 -2.597953 -0.645116  0.403488
1  2013-01-02 -0.109636 -0.866292 -0.629185  1.072633
2  2013-01-03 -1.435202 -0.631815  1.208114 -1.647566
3  2013-01-04  0.368530 -1.073754 -0.712305 -0.513142
4  2013-01-05  0.813674 -0.081024 -1.153747  0.409363
5  2013-01-06  0.218026  0.284554  0.850285 -0.025403
6  2013-01-07  0.024721  0.068005  0.008694 -1.196519
7  2013-01-08  0.640028  1.153457  1.212075  1.371237
--------------last 10 rows--------------------
     Unnamed: 0         A         B         C         D
990  2015-09-18 -1.009017 -0.478119  1.206474  0.788974
991  2015-09-19 -0.493941  0.243774 -0.766409 -0.661123
992  2015-09-20 -0.698089 -0.081359 -1.046874  0.125604
993  2015-09-21  0.249417  0.696110  0.961332  0.647499
994  2015-09-22 -0.410115 -0.554049  1.157751  1.499304
995  2015-09-23  0.783858 -0.330685  0.323741  1.767446
996  2015-09-24  0.017313 -1.792078  0.686136  0.122491
997  2015-09-25  0.100742 -1.802797  0.370563  1.297355
998  2015-09-26 -0.279896 -0.439861 -0.595908 -0.663100
999  2015-09-27 -0.519504 -1.476432 -0.877358  0.370039
--------------data type of each column----------------
Unnamed: 0     object
A             float64
B             float64
C             float64
D             float64
dtype: object
--------------select several columns----------------
            A         B
0   -0.028544 -2.597953
1   -0.109636 -0.866292
2   -1.435202 -0.631815
3    0.368530 -1.073754
4    0.813674 -0.081024
..        ...       ...
995  0.783858 -0.330685
996  0.017313 -1.792078
997  0.100742 -1.802797
998 -0.279896 -0.439861
999 -0.519504 -1.476432
[1000 rows x 2 columns]
--------------use shape to get the DataFrame dimensions--------
(1000, 2)
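Since foo.csv is not included with this post, here is a self-contained sketch that builds a small stand-in in memory and reads it back. The 12 rows and the values in A and B are invented for illustration. It also shows one likely reason for a column called "Unnamed: 0": the file was probably saved together with an unnamed index, which read_csv then loads as an ordinary column unless index_col is given.

```python
import io
import pandas as pd

# Hypothetical stand-in for foo.csv: 12 daily rows, columns A and B.
dates = pd.date_range("2013-01-01", periods=12, freq="D")
source = pd.DataFrame(
    {"A": [i * 0.5 for i in range(12)],
     "B": [i * -1.0 for i in range(12)]},
    index=dates)

# to_csv() without a path returns the CSV text instead of writing a file.
csv_text = source.to_csv()

# Read without index_col: the saved index becomes a column "Unnamed: 0".
raw = pd.read_csv(io.StringIO(csv_text))
print(raw.head(3))
print(raw.dtypes)
print(raw[["A", "B"]].shape)

# Read with index_col=0: the first column is used as the row index again.
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)
restored.index = pd.to_datetime(restored.index)  # turn the labels into dates
print(restored.tail(2))
```

Note that selecting with a list of names, as in raw[["A", "B"]], returns a DataFrame, which is why shape reports two dimensions.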
