Using Pandas

Author: xiaoyao

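NumPy provides convenient array processing, but it lacks the fast tools needed for data processing and analysis. pandas is built on top of NumPy and provides many high-level data processing features.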

import pandas as pd
import numpy as np
# pd.set_option("display.show_dimensions", False)
# pd.set_option("display.float_format", "{:4.2g}".format)

Pandas: a convenient data analysis library

import pandas as pd
pd.__version__

'0.25.3'

Data objects in Pandas

The `Series` object

s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])
print("Index:", s.index)
print("Values:", s.values)

Index: Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
Values: [1 2 3 4 5]

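Series is the most basic object in Pandas. It implements NumPy's array interface method __array__(), so NumPy's array functions can be applied to a Series directly. Elements of a Series can be accessed both by position and by index label.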
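A minimal sketch of both points, assuming a small illustrative Series named `prices` (not part of the original data): a NumPy ufunc applied to a Series returns a Series with the same index, `np.asarray()` yields a plain ndarray through the array interface, and `.iloc` / `.loc` make positional versus label access explicit.

```python
import numpy as np
import pandas as pd

# A small Series with a label index (values are illustrative only)
prices = pd.Series([1.0, 4.0, 9.0], index=["x", "y", "z"])

# NumPy ufuncs work element-wise and keep the index: the result is still a Series
roots = np.sqrt(prices)
print(type(roots).__name__)   # Series
print(roots)

# np.asarray() goes through the array interface and returns a plain ndarray
arr = np.asarray(prices)
print(type(arr).__name__)     # ndarray

# Access by position (iloc) and by label (loc) reach the same element here
print(prices.iloc[1], prices.loc["y"])
```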

a = pd.Series([2,3,4,5,6],['a','b','c','e','f'])
print(a)

a    2
b    3
c    4
e    5
f    6
dtype: int64

print(s)
print('Length:', len(s))

a    1
b    2
c    3
d    4
e    5
dtype: int64
Length: 5

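# s[2] is treated as a position, s['d'] as a label.
# Recent pandas versions warn that positional lookup through s[2] is deprecated; s.iloc[2] and s.loc['d'] are the explicit forms.
print("Positional s[2]:", s[2])
print("Label s['d']:", s['d'])

Positional s[2]: 3
Label s['d']: 4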

# Note that slices use a colon, not a comma.
# Slicing by position excludes the end point; slicing by label includes it. Keep this difference in mind.
print(s[1:3])
print(s['b':'d'])

b    2
c    3
dtype: int64
b    2
c    3
d    4
dtype: int64
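The same two slicing rules can be spelled out with the explicit indexers `.iloc` (positions) and `.loc` (labels), which avoids relying on plain `[]` to guess which kind of key it received. A minimal sketch rebuilding the same `s`:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

# Positional slicing follows Python rules: end position 3 is excluded
by_position = s.iloc[1:3]
# Label slicing includes both endpoints: "d" is part of the result
by_label = s.loc["b":"d"]

print(by_position.tolist())  # [2, 3]
print(by_label.tolist())     # [2, 3, 4]
```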

print(s[1:3],s['b':'d'])

b    2
c    3
dtype: int64 b    2
c    3
d    4
dtype: int64

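# Pass a list of positions or a list of labels to select several elements at once
# (as with s[2], recent pandas prefers s.iloc[[1, 3, 2]] for the positional form)
print(s[[1,3,2]],s[['b','d','c']])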

b    2
d    4
c    3
dtype: int64 b    2
d    4
c    3
dtype: int64

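# items() returns an iterator over the (index label, value) pairs; list() turns it into a list of tuples.
# The older name iteritems() used with pandas 0.25 was removed in pandas 2.0; items() works in both.
list(s.items())

[('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', 5)]

# Printing the iterator directly only shows a zip object
print(a.items())
# Convert it with list() to see the pairs
list(a.items())

<zip object at 0x00000263B36743C8>
[('a', 2), ('b', 3), ('c', 4), ('e', 5), ('f', 6)]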

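s2 = pd.Series([20,30,40,50,60], index=["b","c","d","e","f"])
# Arithmetic aligns on index labels; "a" and "f" appear in only one Series, so s+s2 is NaN there.
# The side-by-side view below was produced by a %C display magic that IPython does not provide;
# print(s, s2, s + s2, sep="\n") prints the same data one Series after another.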

     s                s2               s+s2     
------------     ------------     --------------
a    1           b    20          a    nan      
b    2           c    30          b     22      
c    3           d    40          c     33      
d    4           e    50          d     44      
e    5           f    60          e     55      
dtype: int64     dtype: int64     f    nan      
                                  dtype: float64
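When NaN for labels that are not shared is unwanted, the method form of the operator accepts a `fill_value`. A minimal sketch with the same `s` and `s2` as above; `fill_value=0` is an illustrative choice:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])
s2 = pd.Series([20, 30, 40, 50, 60], index=["b", "c", "d", "e", "f"])

# The + operator aligns on labels; "a" and "f" exist in only one Series
print((s + s2).isna().sum())   # 2

# add() with fill_value treats a label missing on one side as 0 before adding
total = s.add(s2, fill_value=0)
print(total)
```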

s2 = pd.Series([20,30,30,40,50], index=['a','b','c','d','e'])
print(list(s2.items()))
print('*'*50)  # print a separator line
print('s2-Series:{}.'.format(s2))
print('*'*50)  # print a separator line
print('s2+s-Series',s2+s)
print('*'*50)  # print a separator line
print('s-Series,s2_series,s2+s-Series{}{}{}.'.format(s,s2,s2+s))

[('a', 20), ('b', 30), ('c', 30), ('d', 40), ('e', 50)]
**************************************************
s2-Series:a    20
b    30
c    30
d    40
e    50
dtype: int64.
**************************************************
s2+s-Series a    21
b    32
c    33
d    44
e    55
dtype: int64
**************************************************
s-Series,s2_series,s2+s-Seriesa    1
b    2
c    3
d    4
e    5
dtype: int64a    20
b    30
c    30
d    40
e    50
dtype: int64a    21
b    32
c    33
d    44
e    55
dtype: int64.

The `DataFrame` object

Components of a `DataFrame`

The DataFrame (a data table) is the most commonly used data object in pandas.

%pwd  # IPython magic: show the current working directory

# index_col: int, str, sequence of int / str, or False, default None.
# Column(s) to use as the row labels of the DataFrame, given either as column names
# or as column positions. If a sequence of int / str is given, a MultiIndex is used.
# parse_dates=["Date"] parses the Date column as datetime64.
df_soil = pd.read_csv("./data/Soils-simple.csv", index_col=[0, 1], parse_dates=["Date"])
df_soil.columns.name = "Measures"

df_soil.info()

<class 'pandas.core.frame.DataFrame'>
MultiIndex: 6 entries, (0-10, Depression) to (10-30, Top)
Data columns (total 6 columns):
pH        6 non-null float64
Dens      6 non-null float64
Ca        6 non-null float64
Conduc    6 non-null float64
Date      6 non-null datetime64[ns]
Name      6 non-null object
dtypes: datetime64[ns](1), float64(4), object(1)
memory usage: 450.0+ bytes
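Since ./data/Soils-simple.csv may not be at hand, the sketch below rebuilds an equivalent file in memory with io.StringIO so the effect of `index_col` and `parse_dates` can be checked. Assumption: the header row and column order are guessed from the df_soil printout in this section; only the values are copied from it.

```python
import io
import pandas as pd

# Assumed layout: a header row with Depth and Contour as the first two columns.
# Values are taken from the df_soil printout; the real CSV may differ in layout.
csv_text = """Depth,Contour,pH,Dens,Ca,Conduc,Date,Name
0-10,Depression,5.3525,0.9775,10.6850,1.4725,2015-05-26,Lois
0-10,Slope,5.5075,1.0500,12.2475,2.0500,2015-04-30,Roy
0-10,Top,5.3325,1.0025,13.3850,1.3725,2015-05-21,Roy
10-30,Depression,4.8800,1.3575,7.5475,5.4800,2015-03-21,Lois
10-30,Slope,5.2825,1.3475,9.5150,4.9100,2015-02-06,Diana
10-30,Top,4.8500,1.3325,10.2375,3.5825,2015-04-11,Diana
"""

# index_col=[0, 1] builds a two-level MultiIndex from the first two columns;
# parse_dates=["Date"] converts the Date column to datetime64
df = pd.read_csv(io.StringIO(csv_text), index_col=[0, 1], parse_dates=["Date"])
df.columns.name = "Measures"

print(type(df.index).__name__)   # MultiIndex
print(list(df.index.names))      # ['Depth', 'Contour']
print(df.shape)                  # (6, 6)
```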

print(type(df_soil)) # shows the DataFrame type

<class 'pandas.core.frame.DataFrame'>

print(df_soil.dtypes)

Measures
pH               float64
Dens             float64
Ca               float64
Conduc           float64
Date      datetime64[ns]
Name              object
dtype: object

print(df_soil.shape)

(6, 6)

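# A DataFrame has both a row index and a column index, and its data can be accessed through these index labels

The index attribute holds the row index; the columns attribute holds the column index.

# Column index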
print (df_soil.columns)
print (df_soil.columns.name)

Index(['pH', 'Dens', 'Ca', 'Conduc', 'Date', 'Name'], dtype='object', name='Measures')
Measures

print(df_soil)

Measures              pH    Dens       Ca  Conduc       Date   Name
Depth Contour                                                      
0-10  Depression  5.3525  0.9775  10.6850  1.4725 2015-05-26   Lois
      Slope       5.5075  1.0500  12.2475  2.0500 2015-04-30    Roy
      Top         5.3325  1.0025  13.3850  1.3725 2015-05-21    Roy
10-30 Depression  4.8800  1.3575   7.5475  5.4800 2015-03-21   Lois
      Slope       5.2825  1.3475   9.5150  4.9100 2015-02-06  Diana
      Top         4.8500  1.3325  10.2375  3.5825 2015-04-11  Diana

# Row index
print (df_soil.index)
print (df_soil.index.names)

MultiIndex([( '0-10', 'Depression'),
            ( '0-10',      'Slope'),
            ( '0-10',        'Top'),
            ('10-30', 'Depression'),
            ('10-30',      'Slope'),
            ('10-30',        'Top')],
           names=['Depth', 'Contour'])
['Depth', 'Contour']

print(df_soil["pH"],"\n",df_soil[["Dens", "Ca"]])

Depth  Contour   
0-10   Depression    5.3525
       Slope         5.5075
       Top           5.3325
10-30  Depression    4.8800
       Slope         5.2825
       Top           4.8500
Name: pH, dtype: float64 
 Measures            Dens       Ca
Depth Contour                    
0-10  Depression  0.9775  10.6850
      Slope       1.0500  12.2475
      Top         1.0025  13.3850
10-30 Depression  1.3575   7.5475
      Slope       1.3475   9.5150
      Top         1.3325  10.2375

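# Like a 2-D array, a DataFrame has two axes: axis 0 runs vertically (down the rows) and axis 1 runs horizontally (across the columns).

# When a method or function takes an axis or orient parameter, it accepts 0 and 1, or "index" and "columns", for these two directions.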
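A minimal sketch of the axis convention on a small made-up DataFrame; the integer and string spellings are interchangeable:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]}, index=["x", "y", "z"])

# axis=0 / "index": reduce down the rows, giving one result per column
col_sums = df.sum(axis=0)
# axis=1 / "columns": reduce across the columns, giving one result per row
row_sums = df.sum(axis="columns")

print(col_sums.tolist())                      # [6, 60]
print(row_sums.tolist())                      # [11, 22, 33]
print(df.sum(axis="index").equals(col_sums))  # True
```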

# loc selects rows by row-index labels: a single matching row gives a Series, several matching rows give a DataFrame.
# For a MultiIndex, the tuple form df_soil.loc[("0-10", "Top")] is the unambiguous way to name one row.
print(df_soil.loc["0-10", "Top"], df_soil.loc["10-30"], sep="\n")
print('*'*50)
print(type(df_soil.loc["0-10","Top"]))
print(type(df_soil.loc["10-30"]))

Measures
pH                     5.3325
Dens                   1.0025
Ca                     13.385
Conduc                 1.3725
Date      2015-05-21 00:00:00
Name                      Roy
Name: (0-10, Top), dtype: object
Measures        pH    Dens       Ca  Conduc       Date   Name
Contour                                                      
Depression  4.8800  1.3575   7.5475  5.4800 2015-03-21   Lois
Slope       5.2825  1.3475   9.5150  4.9100 2015-02-06  Diana
Top         4.8500  1.3325  10.2375  3.5825 2015-04-11  Diana
**************************************************
<class 'pandas.core.series.Series'>
<class 'pandas.core.frame.DataFrame'>

print(df_soil.loc["0-10","Top"])

Measures
pH                     5.3325
Dens                   1.0025
Ca                     13.385
Conduc                 1.3725
Date      2015-05-21 00:00:00
Name                      Roy
Name: (0-10, Top), dtype: object

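# values converts the DataFrame to a NumPy array. Because the columns have different dtypes, the result is an array of dtype object (newer code usually calls df_soil.to_numpy(), available since pandas 0.24).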
df_soil.values.dtype
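A minimal sketch using a made-up frame with two numeric columns and one text column: `to_numpy()` falls back to object for mixed dtypes, while selecting only the numeric columns first gives a homogeneous float64 array.

```python
import pandas as pd

df = pd.DataFrame({
    "pH": [5.35, 5.51],
    "Dens": [0.98, 1.05],
    "Name": ["Lois", "Roy"],   # a non-numeric column
})

# Mixed column dtypes: the only common dtype is object
print(df.to_numpy().dtype)            # object

# Keeping only numeric columns gives a float64 array
numeric = df.select_dtypes(include="number")
print(numeric.to_numpy().dtype)       # float64
```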

dtype('O')

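Converting in-memory data to a `DataFrame`

Calling DataFrame() converts data in many formats into a DataFrame. It takes three main arguments: data, index and columns. data can be a 2-D array or a nested list that can be converted into one, a dictionary, or a structured array:

# 1 Create a 2-D array of shape (4, 2) with random integers from 0 to 9, and use the index and columns arguments to set the row and column labels.

# 2 Convert a dictionary to a DataFrame.

# 3 Convert a structured array to a DataFrame.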

df1 = pd.DataFrame(np.random.randint(0, 10, (4, 2)), #❶
                   index=["A", "B", "C", "D"], 
                   columns=["a", "b"])

df2 = pd.DataFrame({"a":[1, 2, 3, 4], "b":[5, 6, 7, 8]},  #❷
                   index=["A", "B", "C", "D"])

arr = np.array([("item1", 1), ("item2", 2), ("item3", 3), ("item4", 4)], 
               dtype=[("name", "S10"), ("count", int)])  # "S10": byte strings of up to 10 characters

df3 = pd.DataFrame(arr) #❸

print("type of df1: {}.".format(type(df1)))
print("type of df2: {}.".format(type(df2)))
print("type of df3: {}.".format(type(df3)))

print("*"*50)
print(df1)
print("*"*50)
print(df2)
print("*"*50)
print(df3)

type of df1: <class 'pandas.core.frame.DataFrame'>.
type of df2: <class 'pandas.core.frame.DataFrame'>.
type of df3: <class 'pandas.core.frame.DataFrame'>.
**************************************************
   a  b
A  8  1
B  3  5
C  3  1
D  9  8
**************************************************
   a  b
A  1  5
B  2  6
C  3  7
D  4  8
**************************************************
       name  count
0  b'item1'      1
1  b'item2'      2
2  b'item3'      3
3  b'item4'      4
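Two further variants, sketched with made-up data: a nested list passed as data (mentioned above but not shown), and a structured array whose text field uses the Unicode dtype "U10" instead of a byte-string ("S") field, so the values come out as str rather than with the b'...' prefix seen in df3.

```python
import numpy as np
import pandas as pd

# Nested list: each inner list becomes one row
rows = [[1, "x"], [2, "y"], [3, "z"]]
df_rows = pd.DataFrame(rows, index=["A", "B", "C"], columns=["count", "label"])
print(df_rows)

# Structured array with a Unicode ("U10") name field instead of bytes
arr_u = np.array([("item1", 1), ("item2", 2)], dtype=[("name", "U10"), ("count", int)])
df_u = pd.DataFrame(arr_u)
print(df_u)   # the name column holds text without the b'' prefix
```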

# Methods whose names start with from_ convert data in a specific format into a DataFrame. from_dict() converts a dictionary;
# its orient parameter sets the direction the keys map to. The default, "columns", turns the keys into the column index,
# so each value in the dictionary becomes one column. With orient="index", each value becomes one row.

dict1 = {"a":[1, 2, 3], "b":[4, 5, 6]}
dict2 = {"a":{"A":1, "B":2}, "b":{"A":3, "C":4}} # nested dictionary
df1 = pd.DataFrame.from_dict(dict1, orient="index")
df2 = pd.DataFrame.from_dict(dict1, orient="columns")

df3 = pd.DataFrame.from_dict(dict2, orient="index")  # missing entries become NaN; each value in the dict still maps to one row
df4 = pd.DataFrame.from_dict(dict2, orient="columns")

print(df1)
print("*"*50)
print(df2)
print("*"*50)
print(df3)
print("*"*50)
print(df4)

   0  1  2
a  1  2  3
b  4  5  6
**************************************************
   a  b
0  1  4
1  2  5
2  3  6
**************************************************
   A    B    C
a  1  2.0  NaN
b  3  NaN  4.0
**************************************************
     a    b
A  1.0  3.0
B  2.0  NaN
C  NaN  4.0

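# from_items() converts a sequence of (key, value) pairs into a DataFrame; each value can be a list, an array or a Series holding 1-D data.
# When orient is "index", the column index must be given through the columns argument.

Note: with the pandas version used here (0.25.3), calling from_items() directly produces the FutureWarning below; from_items() was deprecated in pandas 0.23 and removed in pandas 1.0.

As the warning suggests, use from_dict(dict(items), …) instead.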

D:\installation\anaconda3\lib\site-packages\ipykernel_launcher.py:3: FutureWarning: from_items is deprecated. Please use DataFrame.from_dict(dict(items), …) instead. DataFrame.from_dict(OrderedDict(items)) may be used to preserve the key order.
This is separate from the ipykernel package so we can avoid doing imports until
D:\installation\anaconda3\lib\site-packages\ipykernel_launcher.py:4: FutureWarning: from_items is deprecated. Please use DataFrame.from_dict(dict(items), …) instead. DataFrame.from_dict(OrderedDict(items)) may be used to preserve the key order.
after removing the cwd from sys.path.

# dict1 = {"a":[1, 2, 3], "b":[4, 5, 6]}
items = dict1.items()
df1 = pd.DataFrame.from_dict(dict(items), orient="index", columns=["A", "B", "C"])
df2 = pd.DataFrame.from_dict(dict(items), orient="columns")

print(df1)
print("*"*50)
print(df2)

   A  B  C
a  1  2  3
b  4  5  6
**************************************************
   a  b
0  1  4
1  2  5
2  3  6

Converting a `DataFrame` to other data formats

The to_dict() method converts a DataFrame into a dictionary; its orient parameter determines the form of the result:

print("df2 as a list of dicts:", df2.to_dict(orient="records"))  # orient="records": a list with one dict per row
print("*"*50)
print("df2 as a dict of lists:", df2.to_dict(orient="list"))     # orient="list": a dict mapping each column to a list of values
print("*"*50)
print("df2 as a nested dict:", df2.to_dict(orient="dict"))       # orient="dict" (the default): {column: {index: value}}

df2 as a list of dicts: [{'a': 1, 'b': 4}, {'a': 2, 'b': 5}, {'a': 3, 'b': 6}]
**************************************************
df2 as a dict of lists: {'a': [1, 2, 3], 'b': [4, 5, 6]}
**************************************************
df2 as a nested dict: {'a': {0: 1, 1: 2, 2: 3}, 'b': {0: 4, 1: 5, 2: 6}}
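These orient forms round-trip: a list of row dicts can be passed straight back to DataFrame(), and a dict of lists to from_dict(). A minimal sketch with the same contents as df2:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# "records": one dict per row; the DataFrame constructor rebuilds the rows
records = df.to_dict(orient="records")
back_from_records = pd.DataFrame(records)

# "list": one list per column; from_dict() with its default orient="columns" rebuilds the columns
columns = df.to_dict(orient="list")
back_from_lists = pd.DataFrame.from_dict(columns)

print(back_from_records.equals(df), back_from_lists.equals(df))  # True True
```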

# to_records() converts a DataFrame into a structured (record) array. If index is True (the default), the returned array also contains the row index as a field.

print (df2.to_records().dtype)
print (df2.to_records(index=False).dtype)

(numpy.record, [('index', '<i8'), ('a', '<i8'), ('b', '<i8')])
(numpy.record, [('a', '<i8'), ('b', '<i8')])

print(df2.to_records())

[(0, 1, 4) (1, 2, 5) (2, 3, 6)]
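Going the other way, DataFrame.from_records() can rebuild a DataFrame from such a structured array, and its index argument can name the field to use as the row index. A minimal sketch, assuming the default "index" field name shown in the dtype above:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# With the default index=True the row labels are stored in a field named "index"
rec = df.to_records()
print(rec.dtype.names)   # ('index', 'a', 'b')

# from_records() turns that field back into the row index
restored = pd.DataFrame.from_records(rec, index="index")
print(restored)
```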

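`Index` objects: Index objects are read-only.

An Index object stores the index labels and can quickly find the position that corresponds to a label; its values attribute returns the array that holds the labels.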
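Because an Index is read-only, item assignment fails; methods such as insert() and delete() return a new Index instead. A minimal sketch with illustrative labels (the "acidity" label is made up):

```python
import pandas as pd

idx = pd.Index(["pH", "Dens", "Ca"])

# Index objects are immutable: item assignment raises TypeError
raised = False
try:
    idx[0] = "acidity"
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# To "change" a label, build a new Index; the original stays untouched
renamed = idx.insert(0, "acidity").delete(1)
print(list(renamed))  # ['acidity', 'Dens', 'Ca']
print(list(idx))      # ['pH', 'Dens', 'Ca']
```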

index = df_soil.columns  # columns returns the column index of df_soil
print(index.values)
index.values

['pH' 'Dens' 'Ca' 'Conduc' 'Date' 'Name']
array(['pH', 'Dens', 'Ca', 'Conduc', 'Date', 'Name'], dtype=object)

# Print df_soil again for easy reference
print(df_soil)

Measures              pH    Dens       Ca  Conduc       Date   Name
Depth Contour                                                      
0-10  Depression  5.3525  0.9775  10.6850  1.4725 2015-05-26   Lois
      Slope       5.5075  1.0500  12.2475  2.0500 2015-04-30    Roy
      Top         5.3325  1.0025  13.3850  1.3725 2015-05-21    Roy
10-30 Depression  4.8800  1.3575   7.5475  5.4800 2015-03-21   Lois
      Slope       5.2825  1.3475   9.5150  4.9100 2015-02-06  Diana
      Top         4.8500  1.3325  10.2375  3.5825 2015-04-11  Diana

print(index)

Index(['pH', 'Dens', 'Ca', 'Conduc', 'Date', 'Name'], dtype='object', name='Measures')

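print (index[[1, 3]])       # the labels at positions 1 and 3 (counting starts at 0)
print (index[index > 'c'])  # labels that compare greater than 'c'; uppercase letters sort before lowercase ('C' is 32 below 'c' in ASCII), so only 'pH' qualifies
print (index[1::2])         # every second label, starting at position 1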

Index(['Dens', 'Conduc'], dtype='object', name='Measures')
Index(['pH'], dtype='object', name='Measures')
Index(['Dens', 'Conduc', 'Name'], dtype='object', name='Measures')

An Index object also works as a mapping from labels to positions:
– index.get_loc() returns the position of a single label
– index.get_indexer() returns the positions of a list of labels

print(type(index))

<class 'pandas.core.indexes.base.Index'>

print (index.get_loc('Ca'))
print (index.get_indexer(['Dens', 'Conduc', 'nothing'])) # a label that does not exist maps to -1

2
[ 1  3 -1]
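get_indexer() signals a missing label with -1, whereas get_loc() raises KeyError. A minimal sketch rebuilding the same column labels:

```python
import pandas as pd

index = pd.Index(["pH", "Dens", "Ca", "Conduc", "Date", "Name"])

print(index.get_loc("Ca"))                                        # 2
print(index.get_indexer(["Dens", "Conduc", "nothing"]).tolist())  # [1, 3, -1]

# get_loc() has no "missing" marker: an unknown label raises KeyError
missing_raised = False
try:
    index.get_loc("nothing")
except KeyError:
    missing_raised = True
    print("'nothing' is not in the index")
```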

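Index() can be called directly to create an Index object, which can then be passed as the index or columns argument of DataFrame(). Because Index objects are immutable, several data objects can safely share the same Index object.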

index = pd.Index(["A", "B", "C", "D", "E"], name="level")
s1 = pd.Series([1, 2, 3, 4, 5], index=index)
df1 = pd.DataFrame({"a":[1, 2, 3, 4, 5], "b":[6, 7, 8, 9, 10]}, index=index)
print (s1.index is df1.index)

True

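The `MultiIndex` object

MultiIndex represents a multi-level index. It inherits from Index, and its multi-level labels are represented as tuples. get_loc() and get_indexer() still return the position of one label or of several labels.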

mindex = df_soil.index
print (mindex[1])
print (mindex.get_loc(("0-10", "Slope")))
print (mindex.get_indexer([("10-30", "Top"), ("0-10", "Depression"), "nothing"]))

('0-10', 'Slope')
1
[ 5  0 -1]
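For a MultiIndex, get_loc() also accepts a partial key that names only the outer level. A minimal sketch that rebuilds the same labels with MultiIndex.from_product; on a sorted index like this one, the partial key returns a slice:

```python
import pandas as pd

# Same labels as df_soil.index, rebuilt so the sketch is self-contained
mindex = pd.MultiIndex.from_product(
    [["0-10", "10-30"], ["Depression", "Slope", "Top"]],
    names=["Depth", "Contour"],
)

# A full tuple gives a single integer position
print(mindex.get_loc(("10-30", "Slope")))   # 4
# An outer-level label alone matches several rows and returns a slice
print(mindex.get_loc("10-30"))              # slice(3, 6, None)
```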

print(mindex) # mindex is a multi-level index

MultiIndex([( '0-10', 'Depression'),
            ( '0-10',      'Slope'),
            ( '0-10',        'Top'),
            ('10-30', 'Depression'),
            ('10-30',      'Slope'),
            ('10-30',        'Top')],
           names=['Depth', 'Contour'])

# Internally, a MultiIndex does not store tuples directly; it keeps one Index object per level to hold that level's labels

print (mindex.levels[0])
print (mindex.levels[1])

Index(['0-10', '10-30'], dtype='object', name='Depth')
Index(['Depression', 'Slope', 'Top'], dtype='object', name='Contour')

# Integer arrays hold the position of each label within its level:

Accessing mindex.labels instead triggers the warning below: .labels was deprecated in pandas 0.24.0 in favor of .codes.

D:\installation\anaconda3\lib\site-packages\ipykernel_launcher.py:1: FutureWarning: .labels was deprecated in version 0.24.0. Use .codes instead.
"""Entry point for launching an IPython kernel.
D:\installation\anaconda3\lib\site-packages\ipykernel_launcher.py:2: FutureWarning: .labels was deprecated in version 0.24.0. Use .codes instead.

print (mindex.codes[0])
print (mindex.codes[1])

[0 0 0 1 1 1]
[0 1 2 0 1 2]

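level0, level1 = mindex.levels
label0, label1 = mindex.codes
# Indexing each level with its codes rebuilds the label tuples
list(zip(level0[label0], level1[label1]))

[('0-10', 'Depression'),
 ('0-10', 'Slope'),
 ('0-10', 'Top'),
 ('10-30', 'Depression'),
 ('10-30', 'Slope'),
 ('10-30', 'Top')]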
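The levels and codes can also be passed straight to the MultiIndex constructor, a direct way to see that together they fully describe the index. A minimal sketch using the level labels and codes printed above:

```python
import pandas as pd

levels = [["0-10", "10-30"], ["Depression", "Slope", "Top"]]
codes = [[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]]

# Each code is a position into the matching level, so these arrays describe six tuples
mindex = pd.MultiIndex(levels=levels, codes=codes, names=["Depth", "Contour"])
print(mindex.tolist())
```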

# Passing a list of tuples to Index() automatically creates a MultiIndex

mult = pd.Index([("A", "x"), ("A", "y"), ("B", "x"), ("B", "y")], name=["class1", "class2"])
mult

MultiIndex([('A', 'x'),
            ('A', 'y'),
            ('B', 'x'),
            ('B', 'y')],
           names=['class1', 'class2'])

print(mult[0])

('A', 'x')

print(mult.levels[0])

Index(['A', 'B'], dtype='object', name='class1')

mult.codes[0]

FrozenNDArray([0, 0, 1, 1], dtype='int8')

# The from_ methods create a MultiIndex from specific data structures

class1 = ["A", "A", "B", "B"]
class2 = ["x", "y", "x", "y"]
pd.MultiIndex.from_arrays([class1, class2], names=["class1", "class2"])

MultiIndex([('A', 'x'),
            ('A', 'y'),
            ('B', 'x'),
            ('B', 'y')],
           names=['class1', 'class2'])
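from_arrays() takes one array per level, MultiIndex.from_tuples() takes one tuple per entry, and MultiIndex.from_product() builds the Cartesian product of per-level values; for this data all three produce the same index. A minimal sketch:

```python
import pandas as pd

names = ["class1", "class2"]
by_tuples = pd.MultiIndex.from_tuples([("A", "x"), ("A", "y"), ("B", "x"), ("B", "y")], names=names)
by_arrays = pd.MultiIndex.from_arrays([["A", "A", "B", "B"], ["x", "y", "x", "y"]], names=names)
by_product = pd.MultiIndex.from_product([["A", "B"], ["x", "y"]], names=names)

print(by_tuples.equals(by_arrays), by_arrays.equals(by_product))  # True True
```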

# from_product() creates a MultiIndex from the Cartesian product of several iterables.

midx = pd.MultiIndex.from_product([["A", "B", "C"], ["x", "y"]], 
                           names=["class1", "class2"])
df1 = pd.DataFrame(np.random.randint(0, 10, (6, 6)), columns=midx, index=midx)

print(df1)

class1        A     B     C   
class2        x  y  x  y  x  y
class1 class2                 
A      x      5  6  5  5  3  1
       y      5  7  0  2  3  0
B      x      7  6  2  9  0  7
       y      7  4  2  1  6  2
C      x      4  8  0  2  4  8
       y      3  5  9  7  4  8
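A minimal sketch of selecting from a frame whose rows and columns both use such a MultiIndex. It uses deterministic values from np.arange instead of random ones, so the selected cell is predictable:

```python
import numpy as np
import pandas as pd

midx = pd.MultiIndex.from_product([["A", "B", "C"], ["x", "y"]], names=["class1", "class2"])
# Cell (i, j) holds i * 6 + j
df = pd.DataFrame(np.arange(36).reshape(6, 6), index=midx, columns=midx)

# An outer-level row label selects all rows under it; the inner level remains
print(df.loc["B"].shape)               # (2, 6)
# [] with an outer-level column label selects all columns under it
print(df["C"].columns.tolist())        # ['x', 'y']
# Two full tuples address one cell: row ("A", "y") is row 1, column ("B", "x") is column 2
print(df.loc[("A", "y"), ("B", "x")])  # 8
```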

Common function parameters

print(df_soil)

Measures              pH    Dens       Ca  Conduc       Date   Name
Depth Contour                                                      
0-10  Depression  5.3525  0.9775  10.6850  1.4725 2015-05-26   Lois
      Slope       5.5075  1.0500  12.2475  2.0500 2015-04-30    Roy
      Top         5.3325  1.0025  13.3850  1.3725 2015-05-21    Roy
10-30 Depression  4.8800  1.3575   7.5475  5.4800 2015-03-21   Lois
      Slope       5.2825  1.3475   9.5150  4.9100 2015-02-06  Diana
      Top         4.8500  1.3325  10.2375  3.5825 2015-04-11  Diana

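print(df_soil.mean())
print("*"*50)
print(df_soil.mean(axis=1))   # axis selects the axis the operation runs along
print("*"*50)
print(df_soil.mean(level=1))  # level (an integer or a level name) selects the index level to aggregate by
# In pandas 2.0 and later, mean() no longer skips non-numeric columns by default and the level argument was removed;
# the equivalents there are df_soil.mean(numeric_only=True) and df_soil.groupby(level=1).mean(numeric_only=True).

Measures
pH         5.200833
Dens       1.177917
Ca        10.602917
Conduc     3.144583
dtype: float64
**************************************************
Depth  Contour
0-10   Depression    4.621875
       Slope         5.213750
       Top           5.273125
10-30  Depression    4.816250
       Slope         5.263750
       Top           5.000625
dtype: float64
**************************************************
Measures         pH     Dens        Ca   Conduc
Contour                                        
Depression  5.11625  1.16750   9.11625  3.47625
Slope       5.39500  1.19875  10.88125  3.48000
Top         5.09125  1.16750  11.81125  2.47750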
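Because the level argument of reductions such as mean() was removed in pandas 2.0, grouping by the index level is the portable way to get the same per-level averages. A minimal sketch with two of the df_soil columns copied from the printout:

```python
import pandas as pd

# pH and Dens values copied from the df_soil printout, rebuilt so the sketch is self-contained
idx = pd.MultiIndex.from_product(
    [["0-10", "10-30"], ["Depression", "Slope", "Top"]], names=["Depth", "Contour"]
)
df = pd.DataFrame({"pH": [5.3525, 5.5075, 5.3325, 4.8800, 5.2825, 4.8500],
                   "Dens": [0.9775, 1.0500, 1.0025, 1.3575, 1.3475, 1.3325]}, index=idx)

# Same result as df.mean(level=1) in older pandas: group rows by the Contour level, then average
by_contour = df.groupby(level="Contour").mean()
print(by_contour)
```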

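Internal structure of a `DataFrame`

A DataFrame stores its data in NumPy arrays internally, so it is subject to the same shared-memory behavior as NumPy arrays: a DataFrame and objects obtained from it may share one storage area.

The commented-out cell below draws this internal structure with GraphvizDataFrame and the %dot magic from scpy2, a third-party helper package that is not part of pandas; it is optional.

#%fig=Internal structure of a DataFrame object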
# from scpy2.common import GraphvizDataFrame
# %dot GraphvizDataFrame.graphviz(df_soil)

type(df_soil)

pandas.core.frame.DataFrame

The columns attribute of a DataFrame is an Index object, while the index attribute of df_soil is a MultiIndex, a multi-level index.

type(df_soil.index)

pandas.core.indexes.multi.MultiIndex

type(df_soil.columns)

pandas.core.indexes.base.Index

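All the functionality of an Index object is provided by its _engine attribute, an ObjectEngine object, which maps labels to their integer positions through a PyObjectHashTable hash table.

# _engine is a private attribute; this call reflects the pandas version used here and may not work in other versions
df_soil.columns._engine.mapping.get_item("Date")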
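Conceptually, the engine's hash table behaves like a dict from label to position. The sketch below illustrates that idea with a plain Python dict (an illustration only, not how pandas implements it) and checks it against the public get_loc():

```python
import pandas as pd

columns = pd.Index(["pH", "Dens", "Ca", "Conduc", "Date", "Name"])

# Illustration only: a plain dict standing in for the engine's label -> position hash table
label_to_pos = {label: pos for pos, label in enumerate(columns)}

# For a unique Index, the dict lookup and the public get_loc() agree for every label
print(all(label_to_pos[label] == columns.get_loc(label) for label in columns))  # True
print(label_to_pos["Date"])   # 4
```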

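Selecting a single column of a DataFrame returns a Series that, in the pandas version used here, shares memory with the original DataFrame.

s = df_soil["Dens"]
# _data.blocks is private (later pandas versions expose the block manager as _mgr)
s.values.base is df_soil._data.blocks[0].values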

True

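Selecting columns with [] and a list of labels (even a one-element list, as below) copies the data, so the base attribute of the array that holds the new DataFrame's data is None.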

print (df_soil[["Dens"]]._data.blocks[0].values.base)

None

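If a DataFrame has only one data block, the array returned by its values attribute is the transpose of the block's array, so it shares memory with the DataFrame.

# All columns of df_float have the same dtype, so it has only one data block.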
df_float = df_soil[['pH', 'Dens', 'Ca', 'Conduc']]
df_float.values.base is df_float._data.blocks[0].values

True

# When a DataFrame has only one data block, a row selected from it is a Series that also shares memory with it.
df_float.loc["0-10", "Top"].values.base is df_float._data.blocks[0].values

True

df_soil.values.dtype

dtype('O')
