Machine Learning Tools: pandas, Built on NumPy

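Once you have collected a lot of data (images, music, text, IP addresses from security systems, logs, video, maps, seismic records, medical symptom data, or data in some other form), pandas provides many interfaces that convert it uniformly into objects. You then operate on those objects, for example by converting a table into a matrix or another form, before moving on to more complex operations.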
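As a minimal sketch of that workflow, the snippet below loads a few log-like records into a DataFrame and then converts the numeric columns into a NumPy matrix. The records, the column names, and the variable names `frame` and `matrix` are made up for illustration.

```python
import pandas as pd

# Hypothetical records; in practice these would come from a file or a database.
records = [
    {"ip": "10.0.0.1", "requests": 12, "errors": 1},
    {"ip": "10.0.0.2", "requests": 7, "errors": 0},
    {"ip": "10.0.0.3", "requests": 30, "errors": 4},
]

# Step 1: turn the raw data into a pandas object.
frame = pd.DataFrame(records)

# Step 2: convert the numeric part of the table into a NumPy matrix.
matrix = frame[["requests", "errors"]].to_numpy()

print(matrix.shape)
```

Keeping the text column out of the conversion leaves a purely numeric array, which is usually what later numerical steps expect.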
Import the libraries:

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

Creating a Series by passing a list of values, letting pandas create a default integer index:

s = pd.Series([1, 3, 5, np.nan, 6, 8])

s

0 1.0

1 3.0

2 5.0

3 NaN

4 6.0

5 8.0

dtype: float64

Creating a DataFrame by passing a NumPy array, with a datetime index and labeled columns:

dates = pd.date_range('20130101', periods=6)

print("dates:\n{}".format(dates))

df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))

df

dates:

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')

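You can then pass other arguments to construct and display different kinds of columns, for example a DataFrame built from a dict of objects: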

df2 = pd.DataFrame({'A': 1,
                    'B': pd.Timestamp('20130102'),
                    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                    'D': np.array([3] * 4, dtype='int32'),
                    'E': pd.Categorical(["test", "train", "test", "train"]),
                    'F': 'foo'})

df2
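A short sketch of what such a dict-based constructor produces: scalars are broadcast to every row, and each column keeps its own dtype. The frame below repeats only some of the columns above, and the name `frame` is chosen for illustration.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({
    "A": 1,
    "C": pd.Series(1, index=list(range(4)), dtype="float32"),
    "D": np.array([3] * 4, dtype="int32"),
    "E": pd.Categorical(["test", "train", "test", "train"]),
})

# The scalar in column A is broadcast to all four rows.
print(frame.shape)

# Each column keeps the dtype it was given.
print(frame.dtypes)
```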

Selecting data by label
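A minimal sketch of label-based selection with `.loc` and `.at`. It uses a small frame with deterministic values so the results are easy to check; the name `frame` and its contents are chosen for illustration only.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
# Deterministic values 0..23 instead of random numbers.
frame = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

# One row, selected by its index label.
row = frame.loc[dates[0]]

# A label slice on the rows (both endpoints are included) plus a list of column labels.
block = frame.loc["20130102":"20130104", ["A", "B"]]

# A single scalar, selected by row label and column label.
value = frame.at[dates[0], "A"]

print(block)
```

Unlike positional slicing, a label slice includes its end label, so `block` has three rows.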

Operating with objects that have different dimensionality and need alignment. In addition, pandas automatically broadcasts along the specified dimension.

s = pd.Series([1, 3, 5, np.nan, 6, 8], index=dates).shift(2)

s

2013-01-01 NaN

2013-01-02 NaN

2013-01-03 1.0

2013-01-04 3.0

2013-01-05 5.0

2013-01-06 NaN

Freq: D, dtype: float64

df.sub(s, axis='index')
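To make the alignment and broadcasting concrete, here is a small sketch with fixed values. The names `frame` and `offsets` are illustrative. The Series is matched to the rows by index label and subtracted from every column, and a missing value in the Series propagates to the whole row.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=3)
frame = pd.DataFrame(np.ones((3, 2)) * 10, index=dates, columns=["A", "B"])
offsets = pd.Series([1.0, 2.0, np.nan], index=dates)

# Subtract the Series from every column, matching on the row index.
result = frame.sub(offsets, axis="index")

print(result)
```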

Grouping

By "group by" we are referring to a process involving one or more of the following steps:

Splitting the data into groups based on some criteria

Applying a function to each group independently

Combining the results into a data structure

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three',
                         'two', 'two', 'one', 'three'],
                   'C': np.random.randn(8),
                   'D': np.random.randn(8)})

df
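The split, apply, and combine steps can be sketched on a smaller frame with fixed values, so the sums are easy to verify by hand. The names `frame`, `totals`, and `by_pair` are illustrative.

```python
import pandas as pd

frame = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar"],
    "B": ["one", "one", "two", "two"],
    "C": [1.0, 2.0, 3.0, 4.0],
})

# Split on column A, apply sum to column C in each group, combine into a Series.
totals = frame.groupby("A")["C"].sum()

# Grouping on several columns produces a hierarchical index.
by_pair = frame.groupby(["A", "B"])["C"].sum()

print(totals)
print(by_pair)
```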

Categoricals

Since version 0.15, pandas can include categorical data in a DataFrame.

df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6], "raw_grade": ['a', 'b', 'b', 'a', 'a', 'e']})

Convert the raw grades to a categorical data type.

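df["grade"] = df["raw_grade"].astype("category")

df["grade"]

Rename the categories to more meaningful names. Older pandas releases allowed assigning to Series.cat.categories in place; rename_categories returns a new Series and also works in current releases.

df["grade"] = df["grade"].cat.rename_categories(["very good", "good", "very bad"])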

Reorder the categories and simultaneously add the missing categories (methods under Series.cat return a new Series by default).

df["grade"] = df["grade"].cat.set_categories(["very bad", "bad", "medium", "good", "very good"])

df["grade"]

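Grouping by a categorical column also shows empty categories. Passing observed=False requests this explicitly, since the default is changing in newer pandas releases.

df.groupby("grade", observed=False).size()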

grade

very bad 1

bad 0

medium 0

good 2

very good 3

dtype: int64
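The categorical steps above can be combined into one self-contained sketch. It assumes a dict-based `rename_categories` call and an explicit `observed=False`; the names `grades`, `frame`, and `counts` are illustrative. It also shows that sorting follows the category order rather than alphabetical order.

```python
import pandas as pd

grades = pd.Series(["a", "b", "b", "a", "a", "e"]).astype("category")

# Map each raw category to a descriptive name.
grades = grades.cat.rename_categories({"a": "very good", "b": "good", "e": "very bad"})

# Fix the category order and add the categories that never occur in the data.
grades = grades.cat.set_categories(["very bad", "bad", "medium", "good", "very good"])

frame = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6], "grade": grades})

# observed=False keeps the empty categories in the result.
counts = frame.groupby("grade", observed=False).size()

# Sorting uses the category order, not the lexical order of the labels.
sorted_grades = grades.sort_values()

print(counts)
```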

Plotting

See the pandas Plotting docs for details.

ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))

ts = ts.cumsum()

plt.figure(); ts.plot()

plt.show()

On DataFrame, plot() is a convenience to plot all of the columns with labels:

df = pd.DataFrame(np.random.randn(1000, 4), index=ts.index,
                  columns=['A', 'B', 'C', 'D'])

df = df.cumsum()

plt.figure(); df.plot(); plt.legend(loc='best')

plt.show()
