Plotting with Pandas + Matplotlib

Original

2020-02-23 22:22

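pandas objects carry row labels, column labels, and grouping information. As a result, a complete chart that would otherwise take a lot of matplotlib code can often be produced with one or two concise statements. pandas provides many high-level plotting methods that use the way a DataFrame organizes its data to build standard charts (and the number of these methods keeps growing).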

1. Plotting with the Series plot method

How Series plotting works: the Series index is used as the x-axis and the Series values are used as the y-axis.

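from matplotlib import pyplot as plt
import numpy as np
from pandas import DataFrame, Series
import pandas as pd
# Use a font that contains Chinese glyphs so the Chinese labels below render correctly
plt.rcParams['font.sans-serif'] = ['SimHei']
# Make sure the minus sign renders correctly with that font
plt.rcParams['axes.unicode_minus'] = False
# 1. Build a random series
series = Series(np.random.randn(10).cumsum())  # cumulative sum of 10 random draws
series
# The default index of the series is the x-axis; its values are the y-axis
axes = series.plot(label="折线图", style='ko-')  # label text means "line chart"
axes.set_title("利用Series绘制折线图")  # title text means "Line chart drawn from a Series"
axes.legend()

Result: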
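To make the index-to-x-axis mapping concrete, the sketch below plots a small Series and reads the coordinates back from the line that pandas drew. The sample values and the names `s`, `line`, `x_from_plot`, and `y_from_plot` are illustrative and not from the original post; the `Agg` backend is an assumption so the sketch runs without a display.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove this line to view the plot interactively
import matplotlib.pyplot as plt

# Illustrative data: an explicit index 0, 10, ..., 40 with arbitrary values
s = pd.Series([1.0, 3.0, 2.0, 5.0, 4.0], index=np.arange(0, 50, 10))

ax = s.plot(style="ko-")   # Series.plot returns a matplotlib Axes
line = ax.get_lines()[0]   # the single line drawn for the Series

# The x coordinates come from the index, the y coordinates from the values
x_from_plot = list(line.get_xdata())
y_from_plot = list(line.get_ydata())
print(x_from_plot)
print(y_from_plot)
```

Reading the data back from the Axes is only a way to check the mapping; in normal use the call to `s.plot()` is all that is needed.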

# 2. Alternatively, give the series an explicit index to use as the x-axis
index = np.arange(0, 100, 10)
series1 = Series(np.random.randn(10).cumsum(), index=index)  # cumulative sum of 10 random draws
series1
# Output:
0     0.377990
10    1.637537
20    1.325206
30    0.376472
40    0.284113
50    1.102053
60    2.342911
70    3.119967
80    4.412816
90    4.505813
dtype: float64

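# The explicit index of the series is the x-axis; its values are the y-axis
axes = series1.plot(label="折线图")  # label text means "line chart"
axes.set_title("利用Series绘制折线图")  # title text means "Line chart drawn from a Series"
axes.legend()

Result: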

2. Plotting with the DataFrame plot method

df = DataFrame(np.random.randn(10,4),
               columns=list('ABCD'),
               index=np.arange(0,100,10))
# DataFrame plotting: each column is drawn as its own line on one shared Axes
df.plot()
plt.show()

Result:

# One panel per column
df.plot(subplots=True)

Result:

# One panel per column, with all panels sharing the same y-axis
df.plot(subplots=True, sharey=True)

Result:
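The effect of `subplots=True` and `sharey=True` can also be checked without looking at the figure. The following is a minimal sketch under a few assumptions: the seed is arbitrary, the names `rng`, `axes`, and `limits` are illustrative, and the `Agg` backend is used so that no display is required.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove this line to view the plot interactively
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
df = pd.DataFrame(rng.standard_normal((10, 4)),
                  columns=list("ABCD"),
                  index=np.arange(0, 100, 10))

# With subplots=True, plot() returns an array of Axes, one per column
axes = df.plot(subplots=True, sharey=True)
print(len(axes))

# With sharey=True, every panel reports the same y-limits
limits = {ax.get_ylim() for ax in axes}
print(len(limits))
```

Sharing the y-axis makes the columns directly comparable in magnitude, at the cost of flattening columns whose range is much smaller than the others.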

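# Create a figure and get its Axes objects (2 rows, 1 column)
fig, ax = plt.subplots(2, 1, figsize=(10, 10))
# fig.set_size_inches(10, 7)  # resize a figure that has already been created

# Create the data to plot
data = Series(np.random.randn(16),
            index=list('abcdefghijklmnop'))
# Draw a vertical bar chart on the first Axes; each entry of the series becomes one bar
data.plot(kind='bar', ax=ax[0], color='k', alpha=0.7)  # alpha sets the fill opacity (0 = transparent, 1 = opaque)

# Draw a horizontal bar chart on the second Axes
data.plot(kind='barh', ax=ax[1], color='g', alpha=0.5)

Result: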

3. Reading an Excel file with pandas

df = pd.read_excel("excel/pandas-matplotlib.xlsx", sheet_name="Sheet1")
df

Result:
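The workbook itself is not included with the post. To run the examples in this section without it, one option is a small stand-in DataFrame with the column names the later code uses (`Age`, `Gender`, `BMI`, `Sales`, `Income`). Every value below is invented purely for illustration, so the numbers will not match the outputs printed in the post.

```python
import pandas as pd

# Hypothetical stand-in for excel/pandas-matplotlib.xlsx.
# Column names follow the code in this section; all values are made up.
df = pd.DataFrame({
    "Age":    [26, 31, 33, 35, 38, 44],
    "Gender": ["F", "M", "F", "M", "M", "F"],
    "BMI":    ["Normal", "Overweight", "Normal", "Obesity", "Underweight", "Normal"],
    "Sales":  [120, 95, 140, 80, 160, 110],
    "Income": [300, 450, 380, 520, 610, 470],
})

print(df.dtypes)
print(df.groupby("Gender")["Sales"].sum())
```

With this frame in place of the `read_excel` call, the histogram, box plot, grouped bar, scatter, and pie examples that follow all have the columns they expect.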

# Create an empty figure
figure = plt.figure()
# Add a single Axes (plot area) to the figure
ax = figure.add_subplot(111)

# Draw a histogram of Age on that Axes
ax.hist(df['Age'], bins=7)
plt.title('Age distribution')
plt.xlabel('Age')
plt.ylabel('Employee')
plt.show()

Result:

# Draw a box plot
figure = plt.figure()

ax = figure.add_subplot(111)
# Box plot of the Age column
ax.boxplot(df.Age)
plt.show()

Result:

df.Age.describe()
# Output:
count    10.000000
mean     34.700000
std       5.121849
min      26.000000
25%      32.000000
50%      35.000000
75%      36.750000
max      44.000000
Name: Age, dtype: float64
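The summary above also explains what the box plot draws. The sketch below takes the quartiles from the `describe()` output and computes the whisker limits, assuming matplotlib's default `whis=1.5` and assuming its quartiles agree with pandas here (both use linear interpolation by default). The variable names are illustrative.

```python
# Values copied from the describe() output above
q1, q3 = 32.0, 36.75
data_min, data_max = 26.0, 44.0

iqr = q3 - q1                 # interquartile range: the height of the box
lower_fence = q1 - 1.5 * iqr  # points below this are drawn as outliers
upper_fence = q3 + 1.5 * iqr  # points above this are drawn as outliers

print(iqr, lower_fence, upper_fence)
print(data_max > upper_fence)  # True: the maximum age lies beyond the upper whisker limit
print(data_min < lower_fence)  # False: the minimum age lies within the lower whisker limit
```

Under these assumptions the oldest employee (44) falls just above the upper limit of 43.875, so the box plot would show that value as a separate outlier point rather than as the end of the whisker.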

# Group by gender and sum the sales
var = df.groupby('Gender').Sales.sum()
var
# Output:
Gender
F    506
M    782
Name: Sales, dtype: int64

fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.set_xlabel('Gender')
ax1.set_ylabel('Sum of Sales')
ax1.set_title('Gender wise Sum of Sales')
# Draw a bar chart on the Axes created above
var.plot(kind='bar', ax=ax1)

Result:

# Group by BMI category and sum the sales
var = df.groupby("BMI").Sales.sum()
print(var)
# Output:
BMI
Normal         517
Obesity        268
Overweight     114
Underweight    389
Name: Sales, dtype: int64

fig = plt.figure()
ax = fig.add_subplot(111)
ax.set_xlabel("BMI")
ax.set_ylabel("Sum of Sales")
ax.set_title('BMI wise Sum of Sales')
var.plot(kind='line')
plt.show()

Result:

# Group by BMI category and gender, then sum the sales
var = df.groupby(['BMI', 'Gender']).Sales.sum()
# Move the inner index level (Gender) into columns
result = var.unstack()
result

Result:
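The reshaping step is easier to follow on a tiny example. The sketch below uses a hypothetical five-row frame (the name `toy` and all values are invented) to show how a two-level grouped Series becomes a table with one row per BMI category and one column per gender.

```python
import pandas as pd

# Hypothetical rows, invented for illustration
toy = pd.DataFrame({
    "BMI":    ["Normal", "Normal", "Obesity", "Obesity", "Normal"],
    "Gender": ["F", "M", "F", "M", "F"],
    "Sales":  [10, 20, 30, 40, 5],
})

# A Series indexed by (BMI, Gender) pairs
grouped = toy.groupby(["BMI", "Gender"])["Sales"].sum()

# unstack() moves the inner index level (Gender) into the columns
wide = grouped.unstack()
print(wide)
```

This wide layout is what lets a single `plot(kind='bar')` call draw one group of bars per row and one bar color per column.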

# Draw a grouped bar chart: one group per BMI category, one bar per gender
result.plot(kind='bar')

Result:

# Draw a stacked bar chart
result.plot(kind='bar', stacked=True, color=['r', 'b'])
plt.show()

Result:

# Draw a scatter plot of Sales against Age
fig = plt.figure()
ax = fig.add_subplot(111)
ax.scatter(df['Age'], df['Sales'])
plt.show()

Result:

# Draw a bubble chart
fig = plt.figure()
ax = fig.add_subplot(111)
ax.scatter(df['Age'], df['Sales'], s=df['Income'])  # s sets each bubble's size from the Income column
plt.show()

Result:
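In matplotlib, `s` is the marker area in points squared, so passing a raw column can produce bubbles that are far too large or too small depending on its units. One possible approach, sketched here with hypothetical data, is to rescale the column linearly into a chosen range of areas. The arrays, the 20 to 400 range, and the variable names are all assumptions made for illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove this line to view the plot interactively
import matplotlib.pyplot as plt

# Hypothetical values, invented for illustration
age = np.array([26, 31, 35, 38, 44])
sales = np.array([120, 95, 80, 160, 110])
income = np.array([3000, 4500, 5200, 6100, 4700])

# Rescale income linearly to marker areas between 20 and 400 points^2
min_area, max_area = 20.0, 400.0
span = income.max() - income.min()
sizes = min_area + (income - income.min()) / span * (max_area - min_area)

fig, ax = plt.subplots()
ax.scatter(age, sales, s=sizes, alpha=0.5)  # alpha keeps overlapping bubbles visible
ax.set_xlabel("Age")
ax.set_ylabel("Sales")
```

A linear rescale keeps the ordering of the incomes visible while bounding the bubble sizes; other mappings (for example a square-root scale) are equally possible.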

# Draw a pie chart
var = df.groupby(['Gender']).sum(numeric_only=True)  # sum only the numeric columns
var

Result:

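x_list = var['Sales']
label_list = var.index
plt.axis('equal')  # equal aspect ratio so the pie is drawn as a circle
plt.pie(x_list, labels=label_list,
        startangle=90,
        shadow=True,  # draw a shadow beneath the pie
        explode=[0.05, 0],  # pull the first wedge slightly out of the pie
        autopct='%1.1f%%')  # label each wedge with its percentage, one decimal place
plt.title('expense')
plt.show()

Result: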
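The percentages that `autopct='%1.1f%%'` prints on the wedges can be reproduced by hand. The sketch below uses the per-gender sales totals printed earlier in this post (F 506, M 782); the dictionary and variable names are illustrative.

```python
# Sales totals by gender, copied from the groupby output earlier in the post
sales_by_gender = {"F": 506, "M": 782}

total = sum(sales_by_gender.values())

# Apply the same format string to each wedge's share of the total
labels = {g: "%1.1f%%" % (100.0 * v / total) for g, v in sales_by_gender.items()}
print(total)
print(labels)
```

For these totals the two wedges would be labelled 39.3% and 60.7%.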
