Commonly used NumPy functions in machine learning: a summary

1. Reshaping a matrix:

numpy.reshape(a, newshape, order='C')

Reshapes the matrix a into a new shape without changing its data. For example:

>>> a = np.arange(6).reshape((3, 2))
>>> a
array([[0, 1],
       [2, 3],
       [4, 5]])

>>> np.reshape(a, (2, 3)) # C-like index ordering
array([[0, 1, 2],
       [3, 4, 5]])

>>> np.reshape(a, (3,-1))       # the unspecified value is inferred to be 2
array([[0, 1],
       [2, 3],
       [4, 5]])
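The order argument in the signature above controls how elements are read and written during the reshape. The following is a minimal sketch comparing the default C order with Fortran order, and using -1 to flatten; the variable names are chosen only for illustration.

```python
import numpy as np

a = np.arange(6).reshape((3, 2))

# Default C order: elements are read and written row by row.
c_order = np.reshape(a, (2, 3))

# Fortran order: elements are read and written column by column.
f_order = np.reshape(a, (2, 3), order='F')

# A single -1 asks NumPy to infer that dimension from the total size,
# so this flattens the array into one dimension.
flat = np.reshape(a, -1)

print(c_order)
print(f_order)
print(flat)
```

The total number of elements must stay the same, so reshaping these 6 elements to, say, (4, 2) raises a ValueError.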

2. Matrix multiplication

numpy.matmul(a, b, out=None)

Multiplies the two matrices a and b, i.e. the matrix product from linear algebra (not the element-wise product).

For 2-D arrays it is the matrix product:

>>> a = [[1, 0], [0, 1]]
>>> b = [[4, 1], [2, 2]]
>>> np.matmul(a, b)
array([[4, 1],
       [2, 2]])


For 2-D mixed with 1-D, the result is the usual matrix-vector product:

>>> a = [[1, 0], [0, 1]]
>>> b = [1, 2]
>>> np.matmul(a, b)
array([1, 2])
>>> np.matmul(b, a)
array([1, 2])
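The identity matrix used above hides how the shapes combine. As a rough sketch with non-square operands (the arrays here are made up for illustration), an (n, k) matrix times a (k, m) matrix gives an (n, m) result, the @ operator is equivalent to np.matmul, and mismatched inner dimensions raise a ValueError:

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])   # shape (3, 2)
b = np.array([[1, 0, 2], [0, 1, 3]])     # shape (2, 3)

product = np.matmul(a, b)                # shape (3, 3)
same = a @ b                             # the @ operator calls matmul

# A 1-D operand is treated as a vector; the result is 1-D as well.
v = np.array([1, 1])
av = a @ v                               # shape (3,)

# Inner dimensions must agree: (3, 2) times (3, 2) is not defined.
try:
    np.matmul(a, a)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(product)
print(av)
print(mismatch_raised)
```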

3. Generating data from the standard normal distribution (mean 0, variance 1)

np.random.randn(d0, d1, ..., dn)

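1) With no arguments, the function returns a single float.
2) With one argument n, it returns a rank-1 array of shape (n,), which is neither a row vector nor a column vector, and not a matrix.
3) With two or more arguments, it returns an array with the corresponding dimensions, which can represent a vector or a matrix (for example, shape (n, 1) for a column vector).
4) np.random.standard_normal() is similar to np.random.randn(), but it takes the shape as a single tuple.
5) The arguments to np.random.randn() should be integers. Older NumPy releases truncated float arguments to integers; recent releases raise a TypeError instead.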

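>>> import numpy as np
>>> np.random.randn()        # a single float
>>> np.random.randn(1)       # array of shape (1,)
>>> np.random.randn(2)       # array of shape (2,)
>>> np.random.randn(3,3)     # array of shape (3, 3)
>>> np.random.randn(5,2)     # array of shape (5, 2)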
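The shape rules in the list above can be checked directly. This is a small sketch; the seed value is arbitrary and only there to make the run repeatable.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, chosen only for repeatability

scalar = np.random.randn()        # no arguments: a plain Python float
rank1 = np.random.randn(3)        # one argument: rank-1 array, shape (3,)
column = np.random.randn(3, 1)    # two arguments: a (3, 1) column vector
matrix = np.random.randn(2, 4)    # two arguments: a (2, 4) matrix

# Transposing a rank-1 array does nothing, which is why it behaves
# like neither a row vector nor a column vector.
rank1_t_shape = rank1.T.shape     # still (3,)
column_t_shape = column.T.shape   # (1, 3)

# standard_normal takes the same shape, but as a single tuple.
same_shape = np.random.standard_normal((2, 4))

print(type(scalar), rank1.shape, column.shape, same_shape.shape)
```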

4. Generating samples from a multivariate normal distribution:

np.random.multivariate_normal(mean, cov, size=None, check_valid='warn', tol=1e-8)

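mean and cov are required arguments; size, check_valid, and tol are optional.

mean: the mean of the multivariate distribution, a 1-D array of length N.
cov: the covariance matrix, of shape (N, N). If you construct it yourself, note that it must be symmetric and positive semi-definite.

size: the shape of the batch of samples to draw. Each sample is N-dimensional, so the output has shape size + (N,); with size=None, a single sample of shape (N,) is returned.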
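To make the roles of mean, cov, and size concrete, here is a minimal two-dimensional sketch. The particular mean vector, covariance matrix, and seed are illustrative assumptions, not values from the text.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, chosen only for repeatability

# Illustrative parameters: a 2-D Gaussian with correlated components.
mean = np.array([0.0, 5.0])
cov = np.array([[1.0, 0.5],
                [0.5, 2.0]])   # symmetric and positive semi-definite

# size=1000 draws 1000 samples, each of dimension 2: shape (1000, 2).
samples = np.random.multivariate_normal(mean, cov, size=1000)

# A tuple size gives a batch of that shape, with the sample
# dimension appended at the end: shape (3, 4, 2).
grid = np.random.multivariate_normal(mean, cov, size=(3, 4))

# The empirical statistics should be close to the parameters.
sample_mean = samples.mean(axis=0)
sample_cov = np.cov(samples, rowvar=False)

print(samples.shape, grid.shape)
print(sample_mean)
print(sample_cov)
```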

5. Generating evenly spaced sample points

numpy.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)

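Returns num evenly spaced numbers over the specified interval. These are evenly spaced values, not random draws from a uniform distribution.

start: the first value    stop: the last value (included by default, since endpoint=True)    num: the number of samples to generate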
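A short sketch of how num, endpoint, and retstep from the signature above affect the result; the interval [0, 1] and num=5 are picked only for illustration.

```python
import numpy as np

# Five points from 0 to 1, both ends included (endpoint=True by default).
closed = np.linspace(0.0, 1.0, num=5)

# With endpoint=False the stop value is excluded, so the spacing shrinks.
half_open = np.linspace(0.0, 1.0, num=5, endpoint=False)

# retstep=True additionally returns the spacing between samples.
points, step = np.linspace(0.0, 1.0, num=5, retstep=True)

print(closed)      # 0, 0.25, 0.5, 0.75, 1
print(half_open)   # 0, 0.2, 0.4, 0.6, 0.8
print(step)        # 0.25
```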

Putting it together: generate sample data yourself, then fit it with logistic regression:

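import numpy as np

num_ob = 100
np.random.seed(12)

# Generate samples from Gaussian distributions: 100 samples per class, each with 20 dimensions (i.e. 20 features).
# The mean is therefore a 20-dimensional vector (one mean per feature), and the covariances between features form a 20x20 covariance matrix.
x = np.random.rand(20,20)
# Setting cov = x^T x guarantees that the matrix is symmetric and positive semi-definite
cov = np.matmul(x.T, x)
# Draw the samples: the first argument is the mean vector, the second the covariance matrix, the third the number of samples
X1 = np.random.multivariate_normal(np.random.rand(20),cov, num_ob)
X2 = np.random.multivariate_normal(np.random.rand(20)+5,cov, num_ob)
# Stack the two classes: vstack stacks vertically (row-wise), hstack stacks horizontally
X = np.vstack((X1,X2)).astype(np.float32)
# Labels: 0 for the samples in X1, 1 for the samples in X2
y = np.hstack((np.zeros(num_ob), np.ones(num_ob)))

from sklearn.linear_model import LogisticRegression
# L1-regularized logistic regression; a smaller C means stronger regularization
clf = LogisticRegression(fit_intercept=True, C = 0.1, penalty='l1',solver='liblinear')
clf.fit(X,y)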
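The script builds its covariance matrix as the product of x transposed and x so that it is positive semi-definite. One way to check that property numerically is sketched below using only NumPy; the tolerance of 1e-8 is an assumption chosen to absorb floating-point rounding.

```python
import numpy as np

np.random.seed(12)
x = np.random.rand(20, 20)
cov = np.matmul(x.T, x)

# A valid covariance matrix must be symmetric...
is_symmetric = np.allclose(cov, cov.T)

# ...and all of its eigenvalues must be non-negative.
# eigvalsh is the eigenvalue routine for symmetric matrices.
eigenvalues = np.linalg.eigvalsh(cov)
min_eig = eigenvalues.min()

# Allow a small negative value caused by rounding (assumed tolerance).
is_psd = min_eig >= -1e-8

print(is_symmetric, is_psd)
```

The same check can be applied to any hand-built matrix before passing it to np.random.multivariate_normal.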
