目錄
繪製折線圖 橫軸代表特徵,縱軸代表對應的取值
繪製散點圖 橫縱軸都是特徵
2. 讀取數據和簡單的數據探索 以鳶尾花數據集爲例,查看不同特徵對於類別的區分度
1. matplotlib基礎
繪製折線圖
(1) 繪製基本曲線
常用的是matplotlib.pyplot模塊
每次執行plt.plot(橫座標, 縱座標, color=" ", linestyle="--")都是添加一條曲線
知道執行plt.show(), 繪製最終的曲線
>>> import matplotlib as mpl
>>> import matplotlib.pyplot as plt
>>> x = np.linspace(0, 10, 100)
>>> x
array([ 0. , 0.1010101 , 0.2020202 , 0.3030303 , 0.4040404 ,
0.50505051, 0.60606061, 0.70707071, 0.80808081, 0.90909091,
1.01010101, 1.11111111, 1.21212121, 1.31313131, 1.41414141,
1.51515152, 1.61616162, 1.71717172, 1.81818182, 1.91919192,
2.02020202, 2.12121212, 2.22222222, 2.32323232, 2.42424242,
2.52525253, 2.62626263, 2.72727273, 2.82828283, 2.92929293,
3.03030303, 3.13131313, 3.23232323, 3.33333333, 3.43434343,
3.53535354, 3.63636364, 3.73737374, 3.83838384, 3.93939394,
4.04040404, 4.14141414, 4.24242424, 4.34343434, 4.44444444,
4.54545455, 4.64646465, 4.74747475, 4.84848485, 4.94949495,
5.05050505, 5.15151515, 5.25252525, 5.35353535, 5.45454545,
5.55555556, 5.65656566, 5.75757576, 5.85858586, 5.95959596,
6.06060606, 6.16161616, 6.26262626, 6.36363636, 6.46464646,
6.56565657, 6.66666667, 6.76767677, 6.86868687, 6.96969697,
7.07070707, 7.17171717, 7.27272727, 7.37373737, 7.47474747,
7.57575758, 7.67676768, 7.77777778, 7.87878788, 7.97979798,
8.08080808, 8.18181818, 8.28282828, 8.38383838, 8.48484848,
8.58585859, 8.68686869, 8.78787879, 8.88888889, 8.98989899,
9.09090909, 9.19191919, 9.29292929, 9.39393939, 9.49494949,
9.5959596 , 9.6969697 , 9.7979798 , 9.8989899 , 10. ])
>>> y = np.sin(x)
>>> y
array([ 0. , 0.10083842, 0.20064886, 0.2984138 , 0.39313661,
0.48385164, 0.56963411, 0.64960951, 0.72296256, 0.78894546,
0.84688556, 0.8961922 , 0.93636273, 0.96698762, 0.98775469,
0.99845223, 0.99897117, 0.98930624, 0.96955595, 0.93992165,
0.90070545, 0.85230712, 0.79522006, 0.73002623, 0.65739025,
0.57805259, 0.49282204, 0.40256749, 0.30820902, 0.21070855,
0.11106004, 0.01027934, -0.09060615, -0.19056796, -0.28858706,
-0.38366419, -0.47483011, -0.56115544, -0.64176014, -0.7158225 ,
-0.7825875 , -0.84137452, -0.89158426, -0.93270486, -0.96431712,
-0.98609877, -0.99782778, -0.99938456, -0.99075324, -0.97202182,
-0.94338126, -0.90512352, -0.85763861, -0.80141062, -0.73701276,
-0.66510151, -0.58640998, -0.50174037, -0.41195583, -0.31797166,
-0.22074597, -0.12126992, -0.0205576 , 0.0803643 , 0.18046693,
0.27872982, 0.37415123, 0.46575841, 0.55261747, 0.63384295,
0.7086068 , 0.77614685, 0.83577457, 0.8868821 , 0.92894843,
0.96154471, 0.98433866, 0.99709789, 0.99969234, 0.99209556,
0.97438499, 0.94674118, 0.90944594, 0.86287948, 0.8075165 ,
0.74392141, 0.6727425 , 0.59470541, 0.51060568, 0.42130064,
0.32770071, 0.23076008, 0.13146699, 0.03083368, -0.07011396,
-0.17034683, -0.26884313, -0.36459873, -0.45663749, -0.54402111])
>>> plt.plot(x, y)
[<matplotlib.lines.Line2D object at 0x7f6f2e0df128>]
>>> plt.show()
######繪製多條曲線
>>> siny = y.copy()
>>> cosy = np.cos(x)
>>> cosy
array([ 1. , 0.99490282, 0.97966323, 0.95443659, 0.91948007,
0.87515004, 0.8218984 , 0.76026803, 0.69088721, 0.61446323,
0.53177518, 0.44366602, 0.35103397, 0.25482335, 0.15601496,
0.0556161 , -0.04534973, -0.14585325, -0.24486989, -0.34139023,
-0.43443032, -0.52304166, -0.60632092, -0.68341913, -0.75355031,
-0.81599952, -0.87013012, -0.91539031, -0.95131866, -0.97754893,
-0.9938137 , -0.99994717, -0.9958868 , -0.981674 , -0.95745366,
-0.92347268, -0.88007748, -0.82771044, -0.76690542, -0.69828229,
-0.6225406 , -0.54045251, -0.45285485, -0.36064061, -0.26474988,
-0.16616018, -0.06587659, 0.03507857, 0.13567613, 0.23489055,
0.33171042, 0.4251487 , 0.51425287, 0.59811455, 0.67587883,
0.74675295, 0.8100144 , 0.86501827, 0.91120382, 0.94810022,
0.97533134, 0.99261957, 0.99978867, 0.99676556, 0.98358105,
0.96036956, 0.9273677 , 0.88491192, 0.83343502, 0.77346177,
0.70560358, 0.63055219, 0.54907273, 0.46199582, 0.37020915,
0.27464844, 0.17628785, 0.07613012, -0.0248037 , -0.12548467,
-0.2248864 , -0.32199555, -0.41582217, -0.50540974, -0.58984498,
-0.66826712, -0.7398767 , -0.8039437 , -0.859815 , -0.90692104,
-0.94478159, -0.97301068, -0.99132055, -0.99952453, -0.99753899,
-0.98538417, -0.96318398, -0.93116473, -0.88965286, -0.83907153])
>>> plt.plot(x, siny)
[<matplotlib.lines.Line2D object at 0x7f6f2e89b0b8>]
########指定曲線顏色,樣式
>>> plt.plot(x, siny)
[<matplotlib.lines.Line2D object at 0x7f6f2e7e3518>]
>>> plt.plot(x, cosy, color="red", linestyle="--")
[<matplotlib.lines.Line2D object at 0x7f6f2e7e3898>]
>>> plt.show()
>>>
(2) 添加更多描述信息
可以調整橫縱軸的範圍:
plt.xlim(-5, 15)
plt.ylim(-2, 2)
也可以同時調節(前兩個是x, 後兩個是y軸):
ply.axis([-1, 11, -2, 2])
對於座標軸的label設置:
plt.xlabel("x axis")
plt.ylabel("y value")
對於曲線的說明:
plt.plot(x, siny, label="sin(x)")
plt.legend()#必須在show之前加上,才能顯示出label
>>> plt.plot(x, siny, label="sin(x)")
[<matplotlib.lines.Line2D object at 0x7f6f2d7cbcf8>]
>>> plt.plot(x, cosy, label="cos(x)")
[<matplotlib.lines.Line2D object at 0x7f6f2d7cbe48>]
>>> plt.xlabel("x axis")
Text(0.5, 0, 'x axis')
>>> plt.ylabel("y axis")
Text(0, 0.5, 'y axis')
####必須加上下面一句, label才能顯示出來!!!
>>> plt.legend()
<matplotlib.legend.Legend object at 0x7f6f2d7d34a8>
>>>plt.title("welcome!")#爲整張圖加上標題
>>> plt.show()
>>>
繪製散點圖
折線圖和散點圖的使用語法基本相同,只是變成plt.scatter()
>>> plt.scatter(x, siny)
<matplotlib.collections.PathCollection object at 0x7f6f2d7f80f0>
>>> plt.scatter(x, cosy, color="red")
<matplotlib.collections.PathCollection object at 0x7f6f2e73dd68>
>>> plt.show()
但是折線圖和散點圖運用的場景不同:
折線圖的橫軸是特徵,縱軸是取值
而散點圖的橫縱軸都是特徵,所以散點圖一般這麼用:
其中alpha參數代表繪製的點的透明度
>>> x = np.random.normal(0, 1, 1000)
>>> y = np.random.normal(0, 1, 1000)
>>> plt.scatter(x, y, alpha="0.3")
<matplotlib.collections.PathCollection object at 0x7f6f2e0df4e0>
>>> plt.show()
2. 讀取數據和簡單的數據探索
以鳶尾花數據集爲例, 在datasets中是以字典的形式進行存儲
通過keys()可以查看其都有哪些字段
>>> from sklearn import datasets
>>> iris = datasets.load_iris()
>>> iris.keys()
dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename'])
可以打印出鳶尾花數據集的介紹:
共有150個樣本
每個樣本有4個特徵
共有三類
>>> print(iris.DESCR)
.. _iris_dataset:
Iris plants dataset
--------------------
**Data Set Characteristics:**
:Number of Instances: 150 (50 in each of three classes)
:Number of Attributes: 4 numeric, predictive attributes and the class
:Attribute Information:
- sepal length in cm
- sepal width in cm
- petal length in cm
- petal width in cm
- class:
- Iris-Setosa
- Iris-Versicolour
- Iris-Virginica
相關的數據查看(每一類不同樣式顯示)
#查看樣本數據的shape,一共150個樣本,每個樣本那包含4個屬性
>>> iris.data.shape
(150, 4)
#查看樣本數據的標籤
>>> iris.target_names
array(['setosa', 'versicolor', 'virginica'], dtype='<U10')
>>> iris.target.shape
(150,)
>>> iris.target
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
>>>
數據的處理與繪圖
只基於前兩個特徵,觀察類別分佈情況:
>>> y = iris.target
>>> y.shape
(150,)
>>> X = iris.data
>>> X.shape
(150, 4)
>>> X_be = X[:,:2]
>>> X_be.shape
(150, 2)
>>> plt.scatter(X_be[y==0, 0], X_be[y==0, 1], color="red", marker="o")
<matplotlib.collections.PathCollection object at 0x7f6efc3e5a58>
>>> plt.scatter(X_be[y==1, 0], X_be[y==1, 1], color="blue", marker="x")
<matplotlib.collections.PathCollection object at 0x7f6efc3e5d68>
>>> plt.scatter(X_be[y==2, 0], X_be[y==2, 1], color="green", marker="+")
<matplotlib.collections.PathCollection object at 0x7f6efc3e5eb8>
>>> plt.show()
>>>
可見,前兩個特徵可以將第一類和第二類很好的分隔開, 但是第二類與第三類不能很好的分隔開
下面, 只基於後兩個特徵觀察類別的分佈情況:
>>> X_af = X[:, 2:]
>>> X_af.shape
(150, 2)
>>> plt.scatter(X_af[y==0, 0], X_af[y==0, 1], color="red", marker="o")
<matplotlib.collections.PathCollection object at 0x7f6efc3e5470>
>>> plt.scatter(X_af[y==1, 0], X_af[y==1, 1], color="blue", marker="x")
<matplotlib.collections.PathCollection object at 0x7f6efc3e5d68>
>>> plt.scatter(X_af[y==2, 0], X_af[y==2, 1], color="green", marker="+")
<matplotlib.collections.PathCollection object at 0x7f6efc38ce80>
>>> plt.show()
可見對於鳶尾花數據集,後兩個特徵對於分類更有區分度