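Once you have collected a lot of data (images, music, text, IP addresses from security systems, logs, video, maps, seismic records, medical data about symptoms, or data in any other form), pandas provides many interfaces that convert it uniformly into pandas objects. You then work on those objects, for example by converting a table into a matrix or some other form, before moving on to more complex operations.
Import the libraries: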
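As a minimal sketch of that workflow, the snippet below reads a small CSV text into a DataFrame and then converts it to a NumPy matrix. The CSV content and the names `csv_text`, `table`, and `matrix` are made up for illustration; a real project would read from a file or another source.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a real data file.
csv_text = "x,y\n1,2\n3,4\n5,6\n"

# read_csv accepts any file-like object, so StringIO is enough for a demo.
table = pd.read_csv(io.StringIO(csv_text))

# Convert the table to a plain NumPy matrix for further numeric work.
matrix = table.to_numpy()
print(table)
print(matrix.shape)
```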
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
Creating a Series by passing a list of values, letting pandas create a default integer index:
s = pd.Series([1, 3, 5, np.nan, 6, 8])
s
0 1.0
1 3.0
2 5.0
3 NaN
4 6.0
5 8.0
dtype: float64
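The output above hints at two details: the index defaults to consecutive integers, and the NaN entry forces the values to float64. A short sketch that checks both, assuming nothing beyond the same six values:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 5, np.nan, 6, 8])

# The default index is the integer range 0..5.
print(list(s.index))

# NaN is a float, so the whole Series is stored as float64.
print(s.dtype)

# isna() marks the missing entry; reductions such as mean() skip it by default.
print(s.isna().sum())
print(s.mean())
```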
Creating a DataFrame by passing a NumPy array, with a datetime index and labeled columns:
dates = pd.date_range('20130101', periods=6)
print("dates:\n{}".format(dates))
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
df
dates:
DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')
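To see what the resulting DataFrame looks like without random noise, here is a small sketch that swaps the random values for `np.arange` (a substitution made only so the numbers are reproducible) and inspects the shape, the column labels, and the index:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)

# np.arange replaces np.random.randn here so the values are predictable.
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list('ABCD'))

print(df.shape)                   # rows, columns
print(df.columns.tolist())        # column labels
print(df.index[0], df.index[-1])  # first and last date in the index
print(df.head(2))                 # first two rows
```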
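You can then pass other kinds of arguments to build something different, for example a dict whose values become columns of different types:
df2 = pd.DataFrame({'A': 1,
                    'B': pd.Timestamp('20130102'),
                    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                    'D': np.array([3] * 4, dtype='int32'),
                    'E': pd.Categorical(["test", "train", "test", "train"]),
                    'F': 'foo'})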
df2
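Each column of a DataFrame built this way keeps its own dtype, and scalar values are repeated to fill every row. A brief sketch that rebuilds the same frame and inspects it:

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'A': 1,
                    'B': pd.Timestamp('20130102'),
                    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                    'D': np.array([3] * 4, dtype='int32'),
                    'E': pd.Categorical(["test", "train", "test", "train"]),
                    'F': 'foo'})

# One dtype per column; the scalars in A, B and F are broadcast to all four rows.
print(df2.dtypes)
print(df2.shape)
```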
Selecting data by label
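A minimal sketch of label-based selection with `.loc`, assuming a date-indexed frame like the one built earlier (filled with `np.arange` instead of random numbers so the results are predictable):

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list('ABCD'))

# One row by its index label; the result is a Series keyed by column name.
row = df.loc[dates[0]]

# Label slices include both endpoints, unlike positional slices.
block = df.loc['20130102':'20130104', ['A', 'B']]

# A single cell by row label and column label.
cell = df.loc[dates[0], 'A']

print(row)
print(block)
print(cell)
```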
Operating with objects that have different dimensionality and need alignment. In addition, pandas automatically broadcasts along the specified dimension.
s = pd.Series([1, 3, 5, np.nan, 6, 8], index=dates).shift(2)
s
2013-01-01 NaN
2013-01-02 NaN
2013-01-03 1.0
2013-01-04 3.0
2013-01-05 5.0
2013-01-06 NaN
Freq: D, dtype: float64
df.sub(s, axis='index')
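To make the alignment and broadcasting concrete, the sketch below uses a frame of ones (an assumption chosen so the arithmetic is easy to follow) and subtracts the shifted Series from every column. Rows where the Series is NaN become NaN in the result.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)

# A frame of ones keeps the arithmetic easy to check by eye.
df = pd.DataFrame(np.ones((6, 2)), index=dates, columns=list('AB'))

# shift(2) moves the values down two rows and pads the top with NaN.
s = pd.Series([1, 3, 5, np.nan, 6, 8], index=dates).shift(2)

# Rows are matched by index label, and s is broadcast across columns A and B.
result = df.sub(s, axis='index')
print(result)
```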
Grouping
By "group by" we are referring to a process involving one or more of the following steps:
Splitting the data into groups based on some criteria
Applying a function to each group independently
Combining the results into a data structure
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three',
                         'two', 'two', 'one', 'three'],
                   'C': np.random.randn(8),
                   'D': np.random.randn(8)})
df
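The split, apply, and combine steps can be sketched with `groupby` followed by `sum`. Fixed integers replace the random columns here (an assumption for the demo) so the group totals can be verified by hand.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C': [1, 2, 3, 4, 5, 6, 7, 8],
                   'D': [10, 20, 30, 40, 50, 60, 70, 80]})

# Split by column A, sum the numeric columns in each group, combine into one table.
by_a = df.groupby('A')[['C', 'D']].sum()

# Grouping by two columns yields a result with a hierarchical (MultiIndex) index.
by_ab = df.groupby(['A', 'B'])[['C', 'D']].sum()

print(by_a)
print(by_ab)
```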
Categoricals
Since version 0.15, pandas can include categorical data in a DataFrame.
df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6], "raw_grade": ['a', 'b', 'b', 'a', 'a', 'e']})
Convert the raw grades to a categorical data type.
df["grade"] = df["raw_grade"].astype("category")
df["grade"]
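Rename the categories to more meaningful names. Older pandas versions allowed assigning to Series.cat.categories in place; recent releases no longer accept that assignment, so use rename_categories, which returns a new Series:
df["grade"] = df["grade"].cat.rename_categories(["very good", "good", "very bad"])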
Reorder the categories and simultaneously add the missing categories (methods under Series.cat return a new Series by default).
df["grade"] = df["grade"].cat.set_categories(["very bad", "bad", "medium", "good", "very good"])
df["grade"]
Grouping by a categorical column with observed=False also shows empty categories.
df.groupby("grade", observed=False).size()
grade
very bad 1
bad 0
medium 0
good 2
very good 3
dtype: int64
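The whole categorical walkthrough can be condensed into one runnable sketch. It also shows that sorting follows the category order rather than alphabetical order. The names `ordered` and `counts` are introduced only for this example.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6],
                   "raw_grade": ['a', 'b', 'b', 'a', 'a', 'e']})

# astype("category") creates the categories a, b, e in sorted order.
df["grade"] = df["raw_grade"].astype("category")

# Map a, b, e to readable names; the order of the new names matters.
df["grade"] = df["grade"].cat.rename_categories(["very good", "good", "very bad"])

# Set the final order and add categories that have no rows yet.
df["grade"] = df["grade"].cat.set_categories(
    ["very bad", "bad", "medium", "good", "very good"])

# Sorting uses the category order defined above, not lexical order.
ordered = df.sort_values(by="grade")

# observed=False keeps categories with zero rows in the result.
counts = df.groupby("grade", observed=False).size()

print(ordered)
print(counts)
```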
Plotting
See the pandas plotting docs for details.
ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
ts = ts.cumsum()
plt.figure(); ts.plot()
plt.show()
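The `cumsum` call is what turns independent random steps into a wandering curve worth plotting. A tiny sketch with four hand-picked steps (hypothetical values, chosen so the running total is easy to check):

```python
import pandas as pd

# Four fixed steps instead of 1000 random ones.
steps = pd.Series([1, -2, 3, 4], index=pd.date_range('1/1/2000', periods=4))

# Each entry of the result is the sum of all steps up to and including that date.
walk = steps.cumsum()
print(walk)
```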
On DataFrame, plot() is a convenience to plot all of the columns with labels:
df = pd.DataFrame(np.random.randn(1000, 4), index=ts.index,
                  columns=['A', 'B', 'C', 'D'])
df = df.cumsum()
plt.figure(); df.plot(); plt.legend(loc='best')
plt.show()