Python Code for 25 Commonly Used Matplotlib Plots: A Reference Worth Saving

Source: Python大本營

Hi everyone, I'm 小F.

Today I'm sharing a roundup of 25 Matplotlib plots that are among the most useful for data analysis and visualization.

    # !pip install brewer2mpl
    import numpy as np
    import pandas as pd
    import matplotlib as mpl
    import matplotlib.pyplot as plt
    import seaborn as sns
    import warnings; warnings.filterwarnings(action='once')

    large = 22; med = 16; small = 12
    params = {'legend.fontsize': med,
              'figure.figsize': (16, 10),
              'axes.labelsize': med,
              'axes.titlesize': med,
              'xtick.labelsize': med,
              'ytick.labelsize': med,
              'figure.titlesize': large}
    plt.rcParams.update(params)
    # On Matplotlib 3.6+ this style is named 'seaborn-v0_8-whitegrid'
    plt.style.use('seaborn-whitegrid')
    sns.set_style("white")
    # Jupyter magic; omit when running as a plain script
    %matplotlib inline

    # Version
    print(mpl.__version__)  #> 3.0.0
    print(sns.__version__)  #> 0.9.0

1. Scatter Plot

The scatter plot is the classic, fundamental plot for studying the relationship between two variables. If the data contains multiple groups, you may want to draw each group in a different color. In Matplotlib you can do this conveniently with plt.scatter().

    # Import dataset
    midwest = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/midwest_filter.csv")

    # Prepare Data
    # Create as many colors as there are unique midwest['category']
    categories = np.unique(midwest['category'])
    colors = [plt.cm.tab10(i/float(len(categories)-1)) for i in range(len(categories))]

    # Draw Plot for Each Category
    plt.figure(figsize=(16, 10), dpi= 80, facecolor='w', edgecolor='k')

    for i, category in enumerate(categories):
        plt.scatter('area', 'poptotal',
                    data=midwest.loc[midwest.category==category, :],
                    s=20, c=colors[i], label=str(category))

    # Decorations
    plt.gca().set(xlim=(0.0, 0.1), ylim=(0, 90000),
                  xlabel='Area', ylabel='Population')

    plt.xticks(fontsize=12); plt.yticks(fontsize=12)
    plt.title("Scatterplot of Midwest Area vs Population", fontsize=22)
    plt.legend(fontsize=12)
    plt.show()
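
The per-group coloring can be tried without downloading anything. The sketch below is only an illustration: the `demo` dataframe and its values are made up as a stand-in for the midwest data, and the Agg backend is selected so it runs without a display.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Synthetic stand-in for the midwest data (all values are made up)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "area": rng.uniform(0.0, 0.1, 60),
    "poptotal": rng.integers(1_000, 90_000, 60),
    "category": rng.choice(["AAR", "HLU", "LHR"], 60),
})

categories = np.unique(demo["category"])
# One colormap sample per category; max(..., 1) guards the single-category case
colors = [plt.cm.tab10(i / max(len(categories) - 1, 1)) for i in range(len(categories))]

fig, ax = plt.subplots(figsize=(8, 5))
for color, category in zip(colors, categories):
    subset = demo.loc[demo["category"] == category]
    # One scatter call per group, so each group gets its own legend entry
    ax.scatter(subset["area"], subset["poptotal"], s=20, color=color, label=str(category))

ax.set(xlabel="Area", ylabel="Population")
ax.legend()
```

Drawing one scatter call per group is what lets the legend list each category separately.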

2. Bubble Plot with Encircling

Sometimes you want to show a group of points inside a boundary to emphasize their importance. In this example, you select the records to be encircled from the dataframe and pass them to the encircle() function defined in the code below.

    from matplotlib import patches
    from scipy.spatial import ConvexHull
    import warnings; warnings.simplefilter('ignore')
    sns.set_style("white")

    # Step 1: Prepare Data
    midwest = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/midwest_filter.csv")

    # As many colors as there are unique midwest['category']
    categories = np.unique(midwest['category'])
    colors = [plt.cm.tab10(i/float(len(categories)-1)) for i in range(len(categories))]

    # Step 2: Draw Scatterplot with unique color for each category
    fig = plt.figure(figsize=(16, 10), dpi= 80, facecolor='w', edgecolor='k')

    for i, category in enumerate(categories):
        plt.scatter('area', 'poptotal', data=midwest.loc[midwest.category==category, :],
                    s='dot_size', c=colors[i], label=str(category), edgecolors='black', linewidths=.5)

    # Step 3: Encircling
    # https://stackoverflow.com/questions/44575681/how-do-i-encircle-different-data-sets-in-scatter-plot
    def encircle(x, y, ax=None, **kw):
        if ax is None: ax = plt.gca()
        p = np.c_[x, y]
        hull = ConvexHull(p)
        poly = plt.Polygon(p[hull.vertices, :], **kw)
        ax.add_patch(poly)

    # Select data to be encircled
    midwest_encircle_data = midwest.loc[midwest.state=='IN', :]

    # Draw polygon surrounding vertices
    encircle(midwest_encircle_data.area, midwest_encircle_data.poptotal, ec="k", fc="gold", alpha=0.1)
    encircle(midwest_encircle_data.area, midwest_encircle_data.poptotal, ec="firebrick", fc="none", linewidth=1.5)

    # Step 4: Decorations
    plt.gca().set(xlim=(0.0, 0.1), ylim=(0, 90000),
                  xlabel='Area', ylabel='Population')

    plt.xticks(fontsize=12); plt.yticks(fontsize=12)
    plt.title("Bubble Plot with Encircling", fontsize=22)
    plt.legend(fontsize=12)
    plt.show()
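
To see what the convex hull inside encircle() contributes, here is a minimal sketch on five made-up points (a unit square plus its center). It assumes SciPy is installed; the `points` and `outline` names are illustrative only.

```python
import numpy as np
from scipy.spatial import ConvexHull
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon

# Four corners of a unit square plus one interior point (made-up data)
points = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.5, 0.5]])

hull = ConvexHull(points)
# hull.vertices holds the indices of the boundary points; for 2-D input
# SciPy returns them in counterclockwise order, ready to use as a polygon
outline = points[hull.vertices]

fig, ax = plt.subplots()
ax.scatter(points[:, 0], points[:, 1])
ax.add_patch(Polygon(outline, ec="firebrick", fc="none", linewidth=1.5))
```

The interior point is left out of `hull.vertices`, which is why the polygon wraps the group rather than zigzagging through every point.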

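3. Scatter Plot with Linear Regression Line of Best Fit

If you want to understand how two variables change with respect to each other, a line of best fit is the way to go. The plot below shows how the best-fit line differs between groups in the data. To disable the grouping and draw a single best-fit line for the whole dataset, remove the hue="cyl" argument from the sns.lmplot() call below.

    # Import Data
    df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")
    df_select = df.loc[df.cyl.isin([4,8]), :]

    # Plot
    sns.set_style("white")
    # robust=True fits a robust regression and requires statsmodels
    gridobj = sns.lmplot(x="displ", y="hwy", hue="cyl", data=df_select,
                         height=7, aspect=1.6, robust=True, palette='tab10',
                         scatter_kws=dict(s=60, linewidths=.7, edgecolors='black'))

    # Decorations
    gridobj.set(xlim=(0.5, 7.5), ylim=(0, 50))
    plt.title("Scatterplot with line of best fit grouped by number of cylinders", fontsize=20)
    plt.show()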

Each regression line in its own column

Alternatively, you can show the best-fit line for each group in its own column. You can do this by setting the col argument (col="cyl" here) in sns.lmplot().

    # Import Data
    df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")
    df_select = df.loc[df.cyl.isin([4,8]), :]

    # Each line in its own column
    sns.set_style("white")
    gridobj = sns.lmplot(x="displ", y="hwy",
                         data=df_select,
                         height=7,
                         robust=True,
                         palette='Set1',
                         col="cyl",
                         scatter_kws=dict(s=60, linewidths=.7, edgecolors='black'))

    # Decorations
    gridobj.set(xlim=(0.5, 7.5), ylim=(0, 50))
    plt.show()
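
For intuition about what a line of best fit is, the sketch below fits an ordinary least-squares line with NumPy on made-up points. This is a simpler technique than the robust regression that lmplot runs when robust=True, so treat it as an illustration of the idea rather than a reproduction of seaborn's fit.

```python
import numpy as np

# Made-up points lying exactly on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

# A degree-1 polynomial fit is an ordinary least-squares line;
# np.polyfit returns the coefficients from highest degree to lowest
slope, intercept = np.polyfit(x, y, deg=1)
fitted = slope * x + intercept
```

Because the points are noise-free, the fit recovers the slope and intercept they were generated with; on real data the line minimizes the squared vertical distances instead.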

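4. Jittering with stripplot

Often multiple data points have exactly the same X and Y values. As a result, the points are drawn on top of each other and hide one another. To avoid this, jitter the points slightly so you can see them all. This is convenient to do with seaborn's stripplot().

    # Import Data
    df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")

    # Draw Stripplot
    fig, ax = plt.subplots(figsize=(16,10), dpi= 80)
    # Pass x and y by keyword; newer seaborn releases no longer accept them positionally
    sns.stripplot(x=df.cty, y=df.hwy, jitter=0.25, size=8, ax=ax, linewidth=.5)

    # Decorations
    plt.title('Use jittered plots to avoid overlapping of points', fontsize=22)
    plt.show()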
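
Conceptually, jittering just adds a small random offset to each point. The sketch below shows that idea with plain NumPy on made-up values; it is an assumption-level illustration, not seaborn's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up x positions with heavy overlap
x = np.array([1.0, 1.0, 1.0, 2.0, 2.0, 3.0])

jitter = 0.25
# Shift every point by a random amount drawn uniformly from [-jitter, +jitter)
x_jittered = x + rng.uniform(-jitter, jitter, size=x.shape)
```

Keeping the offset small relative to the spacing between values means points stay visually attached to their original position while no longer stacking exactly.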

5. Counts Plot

Another way to avoid overlapping points is to scale the size of each dot by the number of observations at that location. The larger the dot, the more points are concentrated there.

    # Import Data
    df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")
    df_counts = df.groupby(['hwy', 'cty']).size().reset_index(name='counts')

    # Draw Stripplot
    fig, ax = plt.subplots(figsize=(16,10), dpi= 80)
    # Pass x and y by keyword; newer seaborn releases no longer accept them positionally
    sns.stripplot(x=df_counts.cty, y=df_counts.hwy, size=df_counts.counts*2, ax=ax)

    # Decorations
    plt.title('Counts Plot - Size of circle is bigger as more points overlap', fontsize=22)
    plt.show()
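
The counting step is the heart of this plot, so here it is in isolation on a tiny made-up table (the `demo` values are invented for illustration):

```python
import pandas as pd

# Made-up mileage pairs: (18, 29) appears three times, (21, 30) twice
demo = pd.DataFrame({
    "cty": [18, 18, 18, 21, 21, 16],
    "hwy": [29, 29, 29, 30, 30, 25],
})

# One row per distinct (hwy, cty) pair, with the number of occurrences
counts = demo.groupby(["hwy", "cty"]).size().reset_index(name="counts")
```

Each row of `counts` becomes one dot, and its `counts` value is what gets scaled into the marker size.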

6. Marginal Histogram

A marginal histogram adds histograms of the X and Y variables along their axes. It is used to visualize the relationship between X and Y together with the univariate distribution of each. This plot is often used in exploratory data analysis (EDA).

    # Import Data
    df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")

    # Create Fig and gridspec
    fig = plt.figure(figsize=(16, 10), dpi= 80)
    grid = plt.GridSpec(4, 4, hspace=0.5, wspace=0.2)

    # Define the axes
    ax_main = fig.add_subplot(grid[:-1, :-1])
    ax_right = fig.add_subplot(grid[:-1, -1], xticklabels=[], yticklabels=[])
    ax_bottom = fig.add_subplot(grid[-1, 0:-1], xticklabels=[], yticklabels=[])

    # Scatterplot on main ax
    ax_main.scatter('displ', 'hwy', s=df.cty*4, c=df.manufacturer.astype('category').cat.codes,
                    alpha=.9, data=df, cmap="tab10", edgecolors='gray', linewidths=.5)

    # histogram in the bottom
    ax_bottom.hist(df.displ, 40, histtype='stepfilled', orientation='vertical', color='deeppink')
    ax_bottom.invert_yaxis()

    # histogram on the right
    ax_right.hist(df.hwy, 40, histtype='stepfilled', orientation='horizontal', color='deeppink')

    # Decorations
    ax_main.set(title='Scatterplot with Histograms \n displ vs hwy', xlabel='displ', ylabel='hwy')
    ax_main.title.set_fontsize(20)
    for item in ([ax_main.xaxis.label, ax_main.yaxis.label] + ax_main.get_xticklabels() + ax_main.get_yticklabels()):
        item.set_fontsize(14)

    xlabels = ax_main.get_xticks().tolist()
    ax_main.set_xticklabels(xlabels)
    plt.show()
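
The layout trick here is the 4x4 grid: the main axes take the top-left 3x3 cells, and the marginal axes take the remaining column and row. A minimal sketch of just that layout, with no data and a smaller figure size chosen arbitrarily:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

fig = plt.figure(figsize=(8, 5))
grid = GridSpec(4, 4, figure=fig, hspace=0.5, wspace=0.2)

ax_main = fig.add_subplot(grid[:-1, :-1])   # top-left 3x3 cells
ax_right = fig.add_subplot(grid[:-1, -1])   # last column, top 3 rows
ax_bottom = fig.add_subplot(grid[-1, :-1])  # last row, first 3 columns

# Bounding boxes in figure coordinates, useful for checking the arrangement
main_box = ax_main.get_position()
right_box = ax_right.get_position()
bottom_box = ax_bottom.get_position()
```

Because `ax_bottom` spans the same columns as `ax_main`, the two share the same horizontal extent, which is what lets the marginal plot line up with the scatter above it.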

7. Marginal Boxplot

A marginal boxplot serves a similar purpose to the marginal histogram. However, the boxplot helps pinpoint the median and the 25th and 75th percentiles of X and Y.

    # Import Data
    df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")

    # Create Fig and gridspec
    fig = plt.figure(figsize=(16, 10), dpi= 80)
    grid = plt.GridSpec(4, 4, hspace=0.5, wspace=0.2)

    # Define the axes
    ax_main = fig.add_subplot(grid[:-1, :-1])
    ax_right = fig.add_subplot(grid[:-1, -1], xticklabels=[], yticklabels=[])
    ax_bottom = fig.add_subplot(grid[-1, 0:-1], xticklabels=[], yticklabels=[])

    # Scatterplot on main ax
    ax_main.scatter('displ', 'hwy', s=df.cty*5, c=df.manufacturer.astype('category').cat.codes,
                    alpha=.9, data=df, cmap="Set1", edgecolors='black', linewidths=.5)

    # Add a graph in each part
    # Vertical box for hwy on the right, horizontal box for displ at the bottom
    sns.boxplot(y=df.hwy, ax=ax_right, orient="v")
    sns.boxplot(x=df.displ, ax=ax_bottom, orient="h")

    # Decorations ------------------
    # Remove x axis name for the boxplot
    ax_bottom.set(xlabel='')
    ax_right.set(ylabel='')

    # Main Title, Xlabel and YLabel
    ax_main.set(title='Scatterplot with Histograms \n displ vs hwy', xlabel='displ', ylabel='hwy')

    # Set font size of different components
    ax_main.title.set_fontsize(20)
    for item in ([ax_main.xaxis.label, ax_main.yaxis.label] + ax_main.get_xticklabels() + ax_main.get_yticklabels()):
        item.set_fontsize(14)

    plt.show()

8. Correlogram

A correlogram is used to visually inspect the correlation metric between all possible pairs of numeric variables in a given dataframe (or 2D array).

    # Import Dataset
    df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv")

    # Plot
    plt.figure(figsize=(12,10), dpi= 80)
    # Correlate numeric columns only; recent pandas raises on text columns such as cars
    corr = df.select_dtypes(include='number').corr()
    sns.heatmap(corr, xticklabels=corr.columns, yticklabels=corr.columns, cmap='RdYlGn', center=0, annot=True)

    # Decorations
    plt.title('Correlogram of mtcars', fontsize=22)
    plt.xticks(fontsize=12)
    plt.yticks(fontsize=12)
    plt.show()
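
The heatmap is only a rendering of a correlation matrix, so it helps to look at that matrix directly. The sketch below uses a made-up dataframe with a text column to show why restricting to numeric columns can matter; all names and values are illustrative.

```python
import pandas as pd

demo = pd.DataFrame({
    "name": ["w", "x", "y", "z"],  # text column, excluded from the correlation
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],             # perfectly correlated with a
    "c": [4, 3, 2, 1],             # perfectly anti-correlated with a
})

# Keep numeric columns only, then compute pairwise Pearson correlations
corr = demo.select_dtypes(include="number").corr()
```

The result is a square, symmetric table with ones on the diagonal, which is exactly what sns.heatmap colors cell by cell.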

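9. Pairwise Plot

The pairwise plot is a favorite in exploratory analysis for understanding the relationship between all possible pairs of numeric variables. It is a must-have tool for bivariate analysis.

    # Load Dataset
    df = sns.load_dataset('iris')

    # Plot
    plt.figure(figsize=(10,8), dpi= 80)
    sns.pairplot(df, kind="scatter", hue="species", plot_kws=dict(s=80, edgecolor="white", linewidth=2.5))
    plt.show()

The same plot with a regression fit in each panel (kind="reg"):

    # Load Dataset
    df = sns.load_dataset('iris')

    # Plot
    plt.figure(figsize=(10,8), dpi= 80)
    sns.pairplot(df, kind="reg", hue="species")
    plt.show()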

Deviation

10. Diverging Bars

If you want to see how items vary on a single metric and visualize the order and size of that variation, diverging bars are a great tool. They help you quickly tell apart the performance of groups in the data, are quite intuitive, and convey the point immediately.

    # Prepare Data
    df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv")
    x = df.loc[:, ['mpg']]
    df['mpg_z'] = (x - x.mean())/x.std()
    df['colors'] = ['red' if x < 0 else 'green' for x in df['mpg_z']]
    df.sort_values('mpg_z', inplace=True)
    df.reset_index(inplace=True)

    # Draw plot
    plt.figure(figsize=(14,10), dpi= 80)
    plt.hlines(y=df.index, xmin=0, xmax=df.mpg_z, color=df.colors, alpha=0.4, linewidth=5)

    # Decorations
    plt.gca().set(ylabel='$Model$', xlabel='$Mileage$')
    plt.yticks(df.index, df.cars, fontsize=12)
    plt.title('Diverging Bars of Car Mileage', fontdict={'size':20})
    plt.grid(linestyle='--', alpha=0.5)
    plt.show()
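
The bars diverge around zero because the mileage is standardized first. Here is that preparation step on three made-up values, small enough to check by hand:

```python
import pandas as pd

mpg = pd.Series([10.0, 20.0, 30.0])  # made-up mileage values

# z-score: subtract the mean, divide by the sample standard deviation
# (pandas' std() uses ddof=1 by default); here mean = 20 and std = 10
z = (mpg - mpg.mean()) / mpg.std()

# Below-average items in red, the rest in green
colors = ["red" if value < 0 else "green" for value in z]
```

A value equal to the mean gets a z-score of zero and therefore falls on the green side, matching the `x < 0` test in the code above.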

11. Diverging Texts

Diverging texts are similar to diverging bars, and are preferred if you want to show the value of each item within the chart in a clean, presentable way.

    # Prepare Data
    df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv")
    x = df.loc[:, ['mpg']]
    df['mpg_z'] = (x - x.mean())/x.std()
    df['colors'] = ['red' if x < 0 else 'green' for x in df['mpg_z']]
    df.sort_values('mpg_z', inplace=True)
    df.reset_index(inplace=True)

    # Draw plot
    plt.figure(figsize=(14,14), dpi= 80)
    plt.hlines(y=df.index, xmin=0, xmax=df.mpg_z)
    for x, y, tex in zip(df.mpg_z, df.index, df.mpg_z):
        t = plt.text(x, y, round(tex, 2), horizontalalignment='right' if x < 0 else 'left',
                     verticalalignment='center', fontdict={'color':'red' if x < 0 else 'green', 'size':14})

    # Decorations
    plt.yticks(df.index, df.cars, fontsize=12)
    plt.title('Diverging Text Bars of Car Mileage', fontdict={'size':20})
    plt.grid(linestyle='--', alpha=0.5)
    plt.xlim(-2.5, 2.5)
    plt.show()

12. Diverging Dot Plot

The diverging dot plot is also similar to diverging bars. However, compared with diverging bars, the absence of bars reduces the contrast and disparity between the groups.

    # Prepare Data
    df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv")
    x = df.loc[:, ['mpg']]
    df['mpg_z'] = (x - x.mean())/x.std()
    df['colors'] = ['red' if x < 0 else 'darkgreen' for x in df['mpg_z']]
    df.sort_values('mpg_z', inplace=True)
    df.reset_index(inplace=True)

    # Draw plot
    plt.figure(figsize=(14,16), dpi= 80)
    plt.scatter(df.mpg_z, df.index, s=450, alpha=.6, color=df.colors)
    for x, y, tex in zip(df.mpg_z, df.index, df.mpg_z):
        t = plt.text(x, y, round(tex, 1), horizontalalignment='center',
                     verticalalignment='center', fontdict={'color':'white'})

    # Decorations
    # Lighten borders
    plt.gca().spines["top"].set_alpha(.3)
    plt.gca().spines["bottom"].set_alpha(.3)
    plt.gca().spines["right"].set_alpha(.3)
    plt.gca().spines["left"].set_alpha(.3)

    plt.yticks(df.index, df.cars)
    plt.title('Diverging Dotplot of Car Mileage', fontdict={'size':20})
    plt.xlabel('$Mileage$')
    plt.grid(linestyle='--', alpha=0.5)
    plt.xlim(-2.5, 2.5)
    plt.show()

13. Diverging Lollipop Chart with Markers

A lollipop chart with markers offers a flexible way to visualize divergence: you can emphasize any significant data points you want to draw attention to and give the reasoning within the chart itself.

    # Prepare Data
    df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv")
    x = df.loc[:, ['mpg']]
    df['mpg_z'] = (x - x.mean())/x.std()
    df['colors'] = 'black'

    # color fiat differently
    df.loc[df.cars == 'Fiat X1-9', 'colors'] = 'darkorange'
    df.sort_values('mpg_z', inplace=True)
    df.reset_index(inplace=True)

    # Draw plot
    import matplotlib.patches as patches

    plt.figure(figsize=(14,16), dpi= 80)
    plt.hlines(y=df.index, xmin=0, xmax=df.mpg_z, color=df.colors, alpha=0.4, linewidth=1)
    plt.scatter(df.mpg_z, df.index, color=df.colors, s=[600 if x == 'Fiat X1-9' else 300 for x in df.cars], alpha=0.6)
    plt.yticks(df.index, df.cars)
    plt.xticks(fontsize=12)

    # Annotate
    plt.annotate('Mercedes Models', xy=(0.0, 11.0), xytext=(1.0, 11), xycoords='data',
                 fontsize=15, ha='center', va='center',
                 bbox=dict(boxstyle='square', fc='firebrick'),
                 arrowprops=dict(arrowstyle='-[, widthB=2.0, lengthB=1.5', lw=2.0, color='steelblue'), color='white')

    # Add Patches
    p1 = patches.Rectangle((-2.0, -1), width=.3, height=3, alpha=.2, facecolor='red')
    p2 = patches.Rectangle((1.5, 27), width=.8, height=5, alpha=.2, facecolor='green')
    plt.gca().add_patch(p1)
    plt.gca().add_patch(p2)

    # Decorate
    plt.title('Diverging Bars of Car Mileage', fontdict={'size':20})
    plt.grid(linestyle='--', alpha=0.5)
    plt.show()

14.面積圖

通過對軸和線之間的區域進行着色，區域圖不僅強調峯值和低谷，而且還強調高點和低點的持續時間。高點持續時間越長，線下面積越大。


    
    
    
 
     
     
     import numpy 
 
     
     
     as np
 
     
     
     

 
     
     
import pandas as pd

# Prepare Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/economics.csv", parse_dates=['date']).head(100)
x = np.arange(df.shape[0])
y_returns = (df.psavert.diff().fillna(0)/df.psavert.shift(1)).fillna(0) * 100

# Plot
plt.figure(figsize=(16,10), dpi= 80)
plt.fill_between(x[1:], y_returns[1:], 0, where=y_returns[1:] >= 0, facecolor='green', interpolate=True, alpha=0.7)
plt.fill_between(x[1:], y_returns[1:], 0, where=y_returns[1:] <= 0, facecolor='red', interpolate=True, alpha=0.7)

# Annotate
plt.annotate('Peak \n1975', xy=(94.0, 21.0), xytext=(88.0, 28),
             bbox=dict(boxstyle='square', fc='firebrick'),
             arrowprops=dict(facecolor='steelblue', shrink=0.05), fontsize=15, color='white')

# Decorations
xtickvals = [str(m)[:3].upper()+"-"+str(y) for y,m in zip(df.date.dt.year, df.date.dt.month_name())]
plt.gca().set_xticks(x[::6])
plt.gca().set_xticklabels(xtickvals[::6], rotation=90, fontdict={'horizontalalignment': 'center', 'verticalalignment': 'center_baseline'})
plt.ylim(-35,35)
plt.xlim(1,100)
plt.title("Month Economics Return %", fontsize=22)
plt.ylabel('Monthly returns %')
plt.grid(alpha=0.5)
plt.show()
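The `y_returns` series above is a month-over-month percentage change, and the two `fill_between` calls split it into non-negative and non-positive regions. A rough sketch of that arithmetic in plain Python follows. The helper name `pct_returns` and the sample values are made up for illustration and are not part of the original recipe; the sketch also assumes no reading is zero.

```python
def pct_returns(values):
    """Percentage change of each value relative to the previous one.

    The first element has no predecessor, so it is reported as 0.0,
    mirroring the fillna(0) calls in the pandas expression.
    Assumes every previous value is non-zero.
    """
    returns = [0.0]
    for prev, curr in zip(values, values[1:]):
        returns.append((curr - prev) / prev * 100)
    return returns


# Hypothetical savings-rate readings, not taken from economics.csv
sample = [10.0, 11.0, 9.9]
returns = pct_returns(sample)

# The two boolean masks that fill_between receives through `where=`
gains = [r >= 0 for r in returns]
losses = [r <= 0 for r in returns]
```

A value of exactly zero belongs to both masks, which is harmless because the filled area there has zero height.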

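Ranking

15. Ordered Bar Chart

An ordered bar chart conveys the rank order of the items effectively. Printing the value of the metric above each bar also lets the reader take exact figures from the chart itself.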


    
    
    
 
     
     
# Prepare Data
df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")
# Mean city mileage per manufacturer (only the numeric 'cty' column is averaged)
df = df_raw[['cty', 'manufacturer']].groupby('manufacturer').mean()
df.sort_values('cty', inplace=True)
df.reset_index(inplace=True)

# Draw plot
import matplotlib.patches as patches

fig, ax = plt.subplots(figsize=(16,10), facecolor='white', dpi= 80)
ax.vlines(x=df.index, ymin=0, ymax=df.cty, color='firebrick', alpha=0.7, linewidth=20)

# Annotate Text
for i, cty in enumerate(df.cty):
    ax.text(i, cty+0.5, round(cty, 1), horizontalalignment='center')

# Title, Label, Ticks and Ylim
ax.set_title('Bar Chart for City Mileage', fontdict={'size':22})
ax.set(ylabel='Miles Per Gallon', ylim=(0, 30))
plt.xticks(df.index, df.manufacturer.str.upper(), rotation=60, horizontalalignment='right', fontsize=12)

# Add patches to color the X axis labels
p1 = patches.Rectangle((.57, -0.005), width=.33, height=.13, alpha=.1, facecolor='green', transform=fig.transFigure)
p2 = patches.Rectangle((.124, -0.005), width=.446, height=.13, alpha=.1, facecolor='red', transform=fig.transFigure)
fig.add_artist(p1)
fig.add_artist(p2)
plt.show()
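The ranking charts in this section all start from the same preparation step: average a metric per group, then sort the groups by that average. The sketch below shows the step without pandas, using a handful of invented rows (the values are hypothetical and not taken from mpg_ggplot2.csv).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (manufacturer, city mpg) rows standing in for the CSV
rows = [
    ("audi", 18), ("audi", 21),
    ("honda", 28), ("honda", 24),
    ("jeep", 14), ("jeep", 13),
]

# Collect the mileage values of each manufacturer
by_make = defaultdict(list)
for make, cty in rows:
    by_make[make].append(cty)

# Average per group, then sort ascending so the bars grow from left to right
ranked = sorted(((make, mean(vals)) for make, vals in by_make.items()),
                key=lambda pair: pair[1])

# Text placed above each bar: the mean rounded to one decimal
labels = [round(avg, 1) for _, avg in ranked]
```

Sorting before plotting is what turns an ordinary bar chart into a ranking; the x positions are then simply 0, 1, 2, ... in sorted order.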

16. Lollipop Chart

A lollipop chart serves the same purpose as an ordered bar chart in a more visually pleasing way.


    
    
    
 
     
     
# Prepare Data
df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")
# Mean city mileage per manufacturer (only the numeric 'cty' column is averaged)
df = df_raw[['cty', 'manufacturer']].groupby('manufacturer').mean()
df.sort_values('cty', inplace=True)
df.reset_index(inplace=True)

# Draw plot
fig, ax = plt.subplots(figsize=(16,10), dpi= 80)
ax.vlines(x=df.index, ymin=0, ymax=df.cty, color='firebrick', alpha=0.7, linewidth=2)
ax.scatter(x=df.index, y=df.cty, s=75, color='firebrick', alpha=0.7)

# Title, Label, Ticks and Ylim
ax.set_title('Lollipop Chart for City Mileage', fontdict={'size':22})
ax.set_ylabel('Miles Per Gallon')
ax.set_xticks(df.index)
ax.set_xticklabels(df.manufacturer.str.upper(), rotation=60, fontdict={'horizontalalignment': 'right', 'size':12})
ax.set_ylim(0, 30)

# Annotate
for row in df.itertuples():
    ax.text(row.Index, row.cty+.5, s=round(row.cty, 2), horizontalalignment= 'center', verticalalignment='bottom', fontsize=14)
plt.show()

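17. Dot Plot

A dot plot conveys the rank order of the items. Because the points are aligned along the horizontal axis, it is easier to see how far apart they are.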


    
    
    
 
     
     
# Prepare Data
df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")
# Mean city mileage per manufacturer (only the numeric 'cty' column is averaged)
df = df_raw[['cty', 'manufacturer']].groupby('manufacturer').mean()
df.sort_values('cty', inplace=True)
df.reset_index(inplace=True)

# Draw plot
fig, ax = plt.subplots(figsize=(16,10), dpi= 80)
ax.hlines(y=df.index, xmin=11, xmax=26, color='gray', alpha=0.7, linewidth=1, linestyles='dashdot')
ax.scatter(y=df.index, x=df.cty, s=75, color='firebrick', alpha=0.7)

# Title, Label, Ticks and Ylim
ax.set_title('Dot Plot for City Mileage', fontdict={'size':22})
ax.set_xlabel('Miles Per Gallon')
ax.set_yticks(df.index)
ax.set_yticklabels(df.manufacturer.str.title(), fontdict={'horizontalalignment': 'right'})
ax.set_xlim(10, 27)
plt.show()

18. Slope Chart

A slope chart is best suited to comparing the "before" and "after" positions of a given person or item.


    
    
    
     
     
     
import matplotlib.lines as mlines

# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/gdppercap.csv")

left_label = [str(c) + ', '+ str(round(y)) for c, y in zip(df.continent, df['1952'])]
right_label = [str(c) + ', '+ str(round(y)) for c, y in zip(df.continent, df['1957'])]
klass = ['red' if (y1-y2) < 0 else 'green' for y1, y2 in zip(df['1952'], df['1957'])]

# draw line
# https://stackoverflow.com/questions/36470343/how-to-draw-a-line-with-matplotlib/36479941
def newline(p1, p2, color='black'):
    ax = plt.gca()
    l = mlines.Line2D([p1[0],p2[0]], [p1[1],p2[1]], color='red' if p1[1]-p2[1] > 0 else 'green', marker='o', markersize=6)
    ax.add_line(l)
    return l

fig, ax = plt.subplots(1,1,figsize=(14,14), dpi= 80)

# Vertical Lines
ax.vlines(x=1, ymin=500, ymax=13000, color='black', alpha=0.7, linewidth=1, linestyles='dotted')
ax.vlines(x=3, ymin=500, ymax=13000, color='black', alpha=0.7, linewidth=1, linestyles='dotted')

# Points
ax.scatter(y=df['1952'], x=np.repeat(1, df.shape[0]), s=10, color='black', alpha=0.7)
ax.scatter(y=df['1957'], x=np.repeat(3, df.shape[0]), s=10, color='black', alpha=0.7)

# Line Segments and Annotation
for p1, p2, c in zip(df['1952'], df['1957'], df['continent']):
    newline([1,p1], [3,p2])
    ax.text(1-0.05, p1, c + ', ' + str(round(p1)), horizontalalignment='right', verticalalignment='center', fontdict={'size':14})
    ax.text(3+0.05, p2, c + ', ' + str(round(p2)), horizontalalignment='left', verticalalignment='center', fontdict={'size':14})

# 'Before' and 'After' Annotations
ax.text(1-0.05, 13000, 'BEFORE', horizontalalignment='right', verticalalignment='center', fontdict={'size':18, 'weight':700})
ax.text(3+0.05, 13000, 'AFTER', horizontalalignment='left', verticalalignment='center', fontdict={'size':18, 'weight':700})

# Decoration
ax.set_title("Slopechart: Comparing GDP Per Capita between 1952 vs 1957", fontdict={'size':22})
ax.set(xlim=(0,4), ylim=(0,14000), ylabel='Mean GDP Per Capita')
ax.set_xticks([1,3])
ax.set_xticklabels(["1952", "1957"])
plt.yticks(np.arange(500, 13000, 2000), fontsize=12)

# Lighten borders
plt.gca().spines["top"].set_alpha(.0)
plt.gca().spines["bottom"].set_alpha(.0)
plt.gca().spines["right"].set_alpha(.0)
plt.gca().spines["left"].set_alpha(.0)
plt.show()
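The `newline` helper colours each segment by the direction of change: red when the value falls between the two years, green otherwise. A minimal sketch of that rule in isolation is shown below. The function name `slope_color` and the sample pairs are hypothetical and only serve to make the rule easy to check.

```python
def slope_color(before, after):
    """Colour of the connecting segment: red for a fall, green otherwise."""
    return "red" if before - after > 0 else "green"


# Hypothetical before/after values, not taken from gdppercap.csv
pairs = {"A": (1200, 1500), "B": (900, 700), "C": (400, 400)}
colors = {name: slope_color(b, a) for name, (b, a) in pairs.items()}
```

An unchanged value counts as green under this rule, because only a strictly positive drop selects red.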

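19. Dumbbell Plot

A dumbbell plot conveys the "before" and "after" positions of various items along with their rank order. It is very useful for visualizing the effect of a particular project or initiative on different objects.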


    
    
    
 
     
     
import matplotlib.lines as mlines

# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/health.csv")
df.sort_values('pct_2014', inplace=True)
df.reset_index(inplace=True)

# Func to draw line segment
def newline(p1, p2, color='black'):
    ax = plt.gca()
    l = mlines.Line2D([p1[0],p2[0]], [p1[1],p2[1]], color='skyblue')
    ax.add_line(l)
    return l

# Figure and Axes
fig, ax = plt.subplots(1,1,figsize=(14,14), facecolor='#f7f7f7', dpi= 80)

# Vertical Lines
ax.vlines(x=.05, ymin=0, ymax=26, color='black', alpha=1, linewidth=1, linestyles='dotted')
ax.vlines(x=.10, ymin=0, ymax=26, color='black', alpha=1, linewidth=1, linestyles='dotted')
ax.vlines(x=.15, ymin=0, ymax=26, color='black', alpha=1, linewidth=1, linestyles='dotted')
ax.vlines(x=.20, ymin=0, ymax=26, color='black', alpha=1, linewidth=1, linestyles='dotted')

# Points
ax.scatter(y=df['index'], x=df['pct_2013'], s=50, color='#0e668b', alpha=0.7)
ax.scatter(y=df['index'], x=df['pct_2014'], s=50, color='#a3c4dc', alpha=0.7)

# Line Segments
for i, p1, p2 in zip(df['index'], df['pct_2013'], df['pct_2014']):
    newline([p1, i], [p2, i])

# Decoration
ax.set_facecolor('#f7f7f7')
ax.set_title("Dumbbell Chart: Pct Change - 2013 vs 2014", fontdict={'size':22})
ax.set(xlim=(0,.25), ylim=(-1, 27), ylabel='Mean GDP Per Capita')
ax.set_xticks([.05, .1, .15, .20])
# Labels match the tick positions one to one
ax.set_xticklabels(['5%', '10%', '15%', '20%'])
plt.show()
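When ticks sit at fractional positions such as 0.05, the percent labels can be derived from the positions rather than typed by hand, which keeps the two lists in step. A small sketch using Python's built-in `%` format specifier (the variable names are illustrative):

```python
# Tick positions expressed as fractions, as on the dumbbell chart's x axis
ticks = [0.05, 0.10, 0.15, 0.20]

# The "%" format spec multiplies by 100 and appends the percent sign;
# ".0" asks for no decimal places
tick_labels = [f"{t:.0%}" for t in ticks]
```

The resulting lists can be passed to `ax.set_xticks(ticks)` and `ax.set_xticklabels(tick_labels)`.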

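Distribution

20. Histogram for Continuous Variable

A histogram shows the frequency distribution of a given variable. The version below groups the frequency bars by a categorical variable, giving more insight into the continuous variable and the categorical variable in tandem.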


    
    
    
 
     
     
# Import Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")

# Prepare data
x_var = 'displ'
groupby_var = 'class'
df_agg = df.loc[:, [x_var, groupby_var]].groupby(groupby_var)
vals = [df[x_var].values.tolist() for i, df in df_agg]

# Draw
plt.figure(figsize=(16,9), dpi= 80)
colors = [plt.cm.Spectral(i/float(len(vals)-1)) for i in range(len(vals))]
n, bins, patches = plt.hist(vals, 30, stacked=True, density=False, color=colors[:len(vals)])

# Decoration
plt.legend({group:col for group, col in zip(np.unique(df[groupby_var]).tolist(), colors[:len(vals)])})
plt.title(f"Stacked Histogram of ${x_var}$ colored by ${groupby_var}$", fontsize=22)
plt.xlabel(x_var)
plt.ylabel("Frequency")
plt.ylim(0, 25)
plt.xticks(ticks=bins[::3], labels=[round(b,1) for b in bins[::3]])
plt.show()
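A stacked histogram counts every group against the same bin edges and then piles the per-group counts on top of each other. The sketch below reproduces that bookkeeping in plain Python. The helper `bin_counts`, the bin edges and the sample values are assumptions made for illustration; the bins are half-open except the last one, which also includes its right edge.

```python
def bin_counts(values, edges):
    """Count values per bin; bins are [lo, hi) except the last, which is [lo, hi]."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (last and v == edges[-1]):
                counts[i] += 1
                break
    return counts


# Hypothetical engine displacements per vehicle class
groups = {"compact": [1.8, 2.0, 2.4], "suv": [4.0, 4.7, 5.3, 2.4]}
edges = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # shared by every group

per_group = {name: bin_counts(vals, edges) for name, vals in groups.items()}

# Height of each stacked bar = sum of the group counts in that bin
stacked = [sum(col) for col in zip(*per_group.values())]
```

Sharing the edges across groups is what makes the bars comparable; counting each group against its own range would misalign the stacks.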

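21. Histogram for Categorical Variable

The histogram of a categorical variable shows the frequency distribution of that variable. By coloring the bars, you can relate the distribution to another categorical variable represented by the colors.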


    
    
    
 
     
     
# Import Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")

# Prepare data
x_var = 'manufacturer'
groupby_var = 'class'
df_agg = df.loc[:, [x_var, groupby_var]].groupby(groupby_var)
vals = [df[x_var].values.tolist() for i, df in df_agg]

# Draw
plt.figure(figsize=(16,9), dpi= 80)
colors = [plt.cm.Spectral(i/float(len(vals)-1)) for i in range(len(vals))]
n, bins, patches = plt.hist(vals, df[x_var].nunique(), stacked=True, density=False, color=colors[:len(vals)])

# Decoration
plt.legend({group:col for group, col in zip(np.unique(df[groupby_var]).tolist(), colors[:len(vals)])})
plt.title(f"Stacked Histogram of ${x_var}$ colored by ${groupby_var}$", fontsize=22)
plt.xlabel(x_var)
plt.ylabel("Frequency")
plt.ylim(0, 40)
# bins holds one more edge than there are categories, so the last edge is dropped
plt.xticks(ticks=bins[:-1], labels=np.unique(df[x_var]).tolist(), rotation=90, horizontalalignment='left')
plt.show()
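For a categorical variable the "histogram" is really a frequency count per category, split by a second category for the colours. A rough sketch of that cross-tabulation with `collections.Counter` follows; the sample pairs are invented and do not come from mpg_ggplot2.csv.

```python
from collections import Counter

# Hypothetical (manufacturer, class) pairs standing in for the CSV
cars = [
    ("audi", "compact"), ("audi", "compact"), ("audi", "midsize"),
    ("ford", "suv"), ("ford", "pickup"), ("ford", "suv"),
]

# One counter per class: how often each manufacturer appears in it
per_class = {}
for make, klass in cars:
    per_class.setdefault(klass, Counter())[make] += 1

# Total bar height per manufacturer across all classes
totals = Counter(make for make, _ in cars)
```

Each coloured segment of a bar corresponds to one entry of `per_class`, and the segments of a manufacturer add up to its entry in `totals`.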

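22. Density Plot

Density plots are a commonly used tool for visualizing the distribution of a continuous variable. Grouping them by the "response" variable lets you inspect the relationship between X and Y. The case below is for illustration only: it shows how the distribution of city mileage varies with the number of cylinders.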


    
    
    
 
     
     
# Import Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")

# Draw Plot
plt.figure(figsize=(16,10), dpi= 80)
sns.kdeplot(df.loc[df['cyl'] == 4, "cty"], shade=True, color="g", label="Cyl=4", alpha=.7)
sns.kdeplot(df.loc[df['cyl'] == 5, "cty"], shade=True, color="deeppink", label="Cyl=5", alpha=.7)
sns.kdeplot(df.loc[df['cyl'] == 6, "cty"], shade=True, color="dodgerblue", label="Cyl=6", alpha=.7)
sns.kdeplot(df.loc[df['cyl'] == 8, "cty"], shade=True, color="orange", label="Cyl=8", alpha=.7)

# Decoration
plt.title('Density Plot of City Mileage by n_Cylinders', fontsize=22)
plt.legend()
plt.show()

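23. Density Curves with Histogram

A density curve with a histogram brings together the information conveyed by the two plots, so you can have both in a single figure instead of two.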


    
    
    
 
     
     
# Import Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")

# Draw Plot
plt.figure(figsize=(13,10), dpi= 80)
sns.distplot(df.loc[df['class'] == 'compact', "cty"], color="dodgerblue", label="Compact", hist_kws={'alpha':.7}, kde_kws={'linewidth':3})
sns.distplot(df.loc[df['class'] == 'suv', "cty"], color="orange", label="SUV", hist_kws={'alpha':.7}, kde_kws={'linewidth':3})
sns.distplot(df.loc[df['class'] == 'minivan', "cty"], color="g", label="minivan", hist_kws={'alpha':.7}, kde_kws={'linewidth':3})
plt.ylim(0, 0.35)

# Decoration
plt.title('Density Plot of City Mileage by Vehicle Type', fontsize=22)
plt.legend()
plt.show()
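The smooth curves in the density plots above are kernel density estimates: every observation contributes a small bell curve, and the contributions are averaged. The sketch below implements a Gaussian version by hand to show the idea. It is not how seaborn is implemented internally; the function name, the sample readings and the hand-picked bandwidth are all assumptions, whereas seaborn chooses a bandwidth automatically.

```python
import math


def gaussian_kde(data, bandwidth):
    """Return a function estimating the density of `data` at a point x."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data)

    return density


# Hypothetical city-mileage readings; the bandwidth is picked by hand
mileage = [18, 21, 20, 19, 24, 22]
density = gaussian_kde(mileage, bandwidth=1.5)

# Riemann sum over a wide grid (5 to 35); a density should integrate to about 1
step = 0.05
grid = [5 + i * step for i in range(601)]
area = sum(density(x) for x in grid) * step
```

A larger bandwidth gives a smoother, flatter curve, while a smaller one follows the individual readings more closely.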

24. Joy Plot

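A joy plot allows the density curves of different groups to overlap, which is a good way to visualize the distributions of a large number of groups relative to each other. It is pleasing to the eye and conveys the right information clearly. It can be built easily with the joypy package, which is based on matplotlib.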


    
    
    
 
     
     
# !pip install joypy
import joypy

# Import Data
mpg = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")

# Draw Plot
plt.figure(figsize=(16,10), dpi= 80)
fig, axes = joypy.joyplot(mpg, column=['hwy', 'cty'], by="class", ylim='own', figsize=(14,10))

# Decoration
plt.title('Joy Plot of City and Highway Mileage by Class', fontsize=22)
plt.show()

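25. Distributed Dot Plot

A distributed dot plot shows the univariate distribution of points segmented by group. The darker the points, the higher the concentration of data points in that region. Coloring the median differently makes the real positioning of the groups apparent immediately.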


    
    
    
 
     
     
import matplotlib.patches as mpatches

# Prepare Data
df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")
cyl_colors = {4:'tab:red', 5:'tab:green', 6:'tab:blue', 8:'tab:orange'}
df_raw['cyl_color'] = df_raw.cyl.map(cyl_colors)

# Mean and Median city mileage by make
df = df_raw[['cty', 'manufacturer']].groupby('manufacturer').mean()
df.sort_values('cty', ascending=False, inplace=True)
df.reset_index(inplace=True)
df_median = df_raw[['cty', 'manufacturer']].groupby('manufacturer').median()

# Draw horizontal lines
fig, ax = plt.subplots(figsize=(16,10), dpi= 80)
ax.hlines(y=df.index, xmin=0, xmax=40, color='gray', alpha=0.5, linewidth=.5, linestyles='dashdot')

# Draw the Dots
for i, make in enumerate(df.manufacturer):
    df_make = df_raw.loc[df_raw.manufacturer==make, :]
    ax.scatter(y=np.repeat(i, df_make.shape[0]), x='cty', data=df_make, s=75, edgecolors='gray', c='w', alpha=0.5)
    ax.scatter(y=i, x='cty', data=df_median.loc[df_median.index==make, :], s=75, c='firebrick')

# Annotate (raw string so the mathtext spacing commands keep their backslashes)
ax.text(33, 13, r"$red \; dots \; are \; the \: median$", fontdict={'size':12}, color='firebrick')

# Decorations
red_patch = plt.plot([],[], marker="o", ms=10, ls="", mec=None, color='firebrick', label="Median")
plt.legend(handles=red_patch)
ax.set_title('Distribution of City Mileage by Make', fontdict={'size':22})
ax.set_xlabel('Miles Per Gallon (City)', alpha=0.7)
ax.set_yticks(df.index)
ax.set_yticklabels(df.manufacturer.str.title(), fontdict={'horizontalalignment': 'right'}, alpha=0.7)
ax.set_xlim(1, 40)
plt.xticks(alpha=0.7)
plt.gca().spines["top"].set_visible(False)
plt.gca().spines["bottom"].set_visible(False)
plt.gca().spines["right"].set_visible(False)
plt.gca().spines["left"].set_visible(False)
plt.grid(axis='both', alpha=.4, linewidth=.1)
plt.show()
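The chart above orders the makes by their mean mileage but highlights the median, and the two can differ noticeably when a group is skewed. A small sketch with the standard `statistics` module shows both summaries side by side; the readings are hypothetical and not taken from mpg_ggplot2.csv.

```python
from statistics import mean, median

# Hypothetical city-mileage readings per make
mileage = {
    "honda": [24, 25, 28, 33],
    "jeep": [9, 13, 14, 15, 17],
}

summary = {make: {"mean": mean(v), "median": median(v)} for make, v in mileage.items()}

# Makes sorted by mean, highest first, matching the descending sort in the chart code
order = sorted(summary, key=lambda make: summary[make]["mean"], reverse=True)
```

In the honda sample the single high reading of 33 pulls the mean above the median, which is the kind of gap the red median dots make visible.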

This article is shared from the WeChat official account 趣學程序 (quxuecx).