Data Analysis with pandas (4): Commonly Used Functions

I. Summary Statistics

1. pandas.DataFrame.describe

DataFrame.describe(percentiles=None, include=None, exclude=None)

Purpose:
Generates brief summary statistics, excluding NaN values.

Parameters:
percentiles : array-like, optional
The percentiles to include in the output. Should all be in the interval [0, 1]. By default percentiles is [.25, .5, .75], returning the 25th, 50th, and 75th percentiles.
include, exclude : list-like, 'all', or None (default)
Specify the form of the returned result. Either:
None to both (default). The result will include only numeric-typed columns or, if none are, only categorical columns.
A list of dtypes or strings to be included/excluded. To select all numeric types use numpy.number. To select categorical objects use type object. See also the select_dtypes documentation, e.g. df.describe(include=['O']).
If include is the string 'all', the output column-set will match the input one.
Returns:
summary: NDFrame of summary statistics
See also DataFrame.select_dtypes
Notes

The output DataFrame index depends on the requested dtypes:

For numeric dtypes, it will include: count, mean, std, min, max, and the lower, 50th, and upper percentiles (by default the 25th, 50th, and 75th).

For object dtypes (e.g. timestamps or strings), the index will include the count, unique, most common, and frequency of the most common. Timestamps also include the first and last items.

For mixed dtypes, the index will be the union of the corresponding output types. Non-applicable entries will be filled with NaN. Note that mixed-dtype outputs can only be returned from mixed-dtype inputs and appropriate use of the include/exclude arguments.

If multiple values have the highest count, then the count and most common pair will be arbitrarily chosen from among those with the highest count.

The include, exclude arguments are ignored for Series.
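The parameters above can be tried out in a minimal sketch. The DataFrame and its column names (age, city) are made up for illustration and are not from the original post; exact output formatting may differ between pandas versions.

```python
import pandas as pd

# Hypothetical sample data: one numeric column and one string column.
df = pd.DataFrame({
    "age": [22, 35, 58, 41],
    "city": ["Taipei", "Tainan", "Taipei", "Hsinchu"],
})

# Default call: only the numeric column is summarized.
numeric_summary = df.describe()

# Custom percentiles, each given as a fraction in [0, 1].
tail_summary = df.describe(percentiles=[0.1, 0.9])

# include='all': the output columns match the input columns,
# and entries that do not apply to a column are filled with NaN.
full_summary = df.describe(include="all")

print(numeric_summary)
print(tail_summary)
print(full_summary)
```

In full_summary, the string column contributes the rows unique, top, and freq, while the numeric rows (mean, std, and so on) are NaN for that column.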

III. Plotting

1. pandas.DataFrame.hist

Draws histograms of a DataFrame's columns using matplotlib. One subplot is drawn for each column.
DataFrame.hist(data, column=None, by=None, grid=True, xlabelsize=None, xrot=None, ylabelsize=None, yrot=None, ax=None, sharex=False, sharey=False, figsize=None, layout=None, bins=10, **kwds)

Parameters:
data : DataFrame
column : string or sequence. If passed, only the histograms of the specified columns are drawn.
by : object, optional
If passed, then used to form histograms for separate groups
grid : boolean, default True. Whether to show grid lines.
xlabelsize : int, default None
If specified changes the x-axis label size
xrot : float, default None
rotation of x axis labels
ylabelsize : int, default None
If specified changes the y-axis label size
yrot : float, default None
rotation of y axis labels
ax : matplotlib axes object, default None
sharex : boolean, default True if ax is None else False
In case subplots=True, share x axis and set some x axis labels to invisible; defaults to True if ax is None otherwise False if an ax is passed in; Be aware, that passing in both an ax and sharex=True will alter all x axis labels for all subplots in a figure!
sharey : boolean, default False
In case subplots=True, share y axis and set some y axis labels to invisible
figsize : tuple
The size of the figure to create in inches by default
layout: (optional) a tuple (rows, columns) for the layout of the histograms
bins: integer, default 10. The number of bins (bars) in each histogram.
kwds : other plotting keyword arguments
To be passed to hist function
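
As a rough illustration of the parameters listed above, the following sketch draws histograms for two columns side by side. The data and the column names (height, weight) are invented for the example, and the Agg backend is chosen only so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import pandas as pd

# Hypothetical sample data with two numeric columns.
df = pd.DataFrame({
    "height": [150, 160, 165, 170, 172, 180, 185],
    "weight": [45, 55, 60, 68, 70, 82, 90],
})

# One subplot per selected column, 5 bins each, laid out as 1 row x 2 columns.
axes = df.hist(
    column=["height", "weight"],
    bins=5,
    grid=True,
    layout=(1, 2),
    figsize=(8, 3),
)

# df.hist returns an array of matplotlib axes; take the first subplot.
first_ax = axes.ravel()[0]
print(first_ax.get_title())
```

In an interactive session, calling matplotlib.pyplot.show() afterwards (with an interactive backend instead of Agg) would display the figure.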
