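I have been learning data analysis on and off for a while, and numpy.random has always felt a bit fuzzy to me, so today I am filling in the gaps.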

Random sampling (numpy.random)

Simple random data

rand(d0, d1, ..., dn)	Random values in the given shape, drawn uniformly from [0, 1).
>>> np.random.rand(3, 2)
array([[ 0.14022471,  0.96360618],  # random
       [ 0.37601032,  0.25528411],  # random
       [ 0.49313049,  0.94909878]]) # random
randn(d0, d1, ..., dn)	Returns a sample (or samples) from the standard normal distribution.
>>> np.random.randn()
2.1923875335537315  # random
Two-by-four array of samples from N(3, 6.25):
>>> 2.5 * np.random.randn(2, 4) + 3
array([[-4.49401501,  4.00950034, -1.81814867,  7.29718677],  # random
       [ 0.39924804,  4.68456316,  4.99394529,  4.84057254]]) # random
randint(low[, high, size])	Returns random integers from the half-open interval [low, high). If high is omitted, the interval is [0, low).
>>> np.random.randint(2, size=10)
array([1, 0, 0, 0, 1, 1, 0, 0, 1, 0])  # random
>>> np.random.randint(1, size=10)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
Generate a 2 x 4 array of ints between 0 and 4, inclusive:
>>> np.random.randint(5, size=(2, 4))
array([[4, 0, 2, 1],  # random
       [3, 2, 2, 0]])
random_integers(low[, high, size])	Returns random integers from the closed interval [low, high]. Deprecated since NumPy 1.11; use randint(low, high + 1) instead.
>>> np.random.random_integers(5)
4  # random
>>> np.random.random_integers(5, size=(3, 2))
array([[5, 4],  # random
       [3, 3],
       [4, 5]])
Choose five random numbers from the set of five evenly spaced numbers between 0 and 2.5, inclusive:
>>> 2.5 * (np.random.random_integers(5, size=(5,)) - 1) / 4.
array([ 0.625,  1.25 ,  0.625,  0.625,  2.5  ])  # random
Roll two six-sided dice 1000 times and sum the results:
>>> d1 = np.random.random_integers(1, 6, 1000)
>>> d2 = np.random.random_integers(1, 6, 1000)
>>> dsums = d1 + d2
Display the results as a histogram:
>>> import matplotlib.pyplot as plt
>>> count, bins, ignored = plt.hist(dsums, 11, density=True)
>>> plt.show()
random_sample([size])	Returns random floats in the half-open interval [0.0, 1.0).
>>> np.random.random_sample()
0.47108547995356098  # random
>>> type(np.random.random_sample())
<class 'float'>
>>> np.random.random_sample((5,))
array([ 0.30220482,  0.86820401,  0.1654503 ,  0.11659149,  0.54323428])  # random
Three-by-two array of random numbers from [-5, 0):
>>> 5 * np.random.random_sample((3, 2)) - 5
array([[-3.99149989, -0.52338984],  # random
       [-2.99091858, -0.79479508],
       [-1.23204345, -1.75224494]])
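random([size])	Returns random floats in the half-open interval [0.0, 1.0). (The official examples are identical to those for random_sample.)
ranf([size])	Returns random floats in the half-open interval [0.0, 1.0). (The official examples are identical to those for random_sample.)
sample([size])	Returns random floats in the half-open interval [0.0, 1.0). (The official examples are identical to those for random_sample.)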
choice(a[, size, replace, p])	Generates a random sample from a given 1-D array (or from np.arange(a) when a is an int).
Generate a uniform random sample from np.arange(5) of size 3:
>>> np.random.choice(5, 3)
array([0, 3, 4])  # random
>>> # This is equivalent to np.random.randint(0, 5, 3)
Generate a non-uniform random sample from np.arange(5) of size 3:
>>> np.random.choice(5, 3, p=[0.1, 0, 0.3, 0.6, 0])
array([3, 3, 0])  # random
Generate a uniform random sample from np.arange(5) of size 3 without replacement:
>>> np.random.choice(5, 3, replace=False)
array([3, 1, 0])  # random
>>> # This is equivalent to np.random.permutation(np.arange(5))[:3]
Generate a non-uniform random sample from np.arange(5) of size 3 without replacement:
>>> np.random.choice(5, 3, replace=False, p=[0.1, 0, 0.3, 0.6, 0])
array([2, 3, 0])  # random
Any of the above can be repeated with an arbitrary array-like instead of just integers. For instance:
>>> aa_milne_arr = ['pooh', 'rabbit', 'piglet', 'Christopher']
>>> np.random.choice(aa_milne_arr, 5, p=[0.5, 0.1, 0.1, 0.3])
array(['pooh', 'pooh', 'pooh', 'Christopher', 'piglet'],  # random
      dtype='<U11')
bytes(length)	Returns random bytes.
>>> np.random.bytes(10)
b' eh\x85\x022SZ\xbf\xa4'  # random
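The differences between these helpers are mostly about how the shape is passed and whether the upper bound is included. The sketch below is only an illustration under assumed values (the seed, the shapes, and the weights are arbitrary choices, not from the original post):

```python
import numpy as np

# Fix the global seed so repeated runs give the same draws (seed value is arbitrary).
np.random.seed(0)

# rand takes the dimensions as separate positional arguments ...
a = np.random.rand(3, 2)
# ... while random_sample takes a single shape tuple.
b = np.random.random_sample((3, 2))
print(a.shape, b.shape)  # (3, 2) (3, 2)

# randint excludes the upper bound, so a six-sided die needs high=7.
rolls = np.random.randint(1, 7, size=1000)
print(rolls.min(), rolls.max())

# choice draws from an array-like; p gives one weight per element and must sum to 1.
picks = np.random.choice(["pooh", "rabbit", "piglet"], size=5, p=[0.5, 0.25, 0.25])
print(picks)
```

Because randint uses a half-open interval, randint(1, 7) covers the same values that the deprecated random_integers(1, 6) did.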

Permutations

shuffle(x)

Modifies a sequence in place by shuffling its contents (like shuffling a deck of cards). It returns None.

>>> arr = np.arange(10)
>>> np.random.shuffle(arr)
>>> arr
array([1, 7, 5, 2, 9, 4, 3, 6, 0, 8])  # random

This function only shuffles the array along the first index of a multi-dimensional array:

>>> arr = np.arange(9).reshape((3, 3))
>>> np.random.shuffle(arr)
>>> arr
array([[3, 4, 5],
       [6, 7, 8],
       [0, 1, 2]])

permutation(x)

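Returns a randomly permuted copy of a sequence, or a permuted range if x is an integer. The input itself is left unchanged.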

>>> np.random.permutation(10)
array([1, 7, 4, 3, 0, 9, 2, 5, 8, 6])

>>> np.random.permutation([1, 4, 9, 12, 15])
array([15,  1,  9,  4, 12])

>>> arr = np.arange(9).reshape((3, 3))
>>> np.random.permutation(arr)
array([[6, 7, 8],
       [0, 1, 2],
       [3, 4, 5]])
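The practical difference between the two functions is that shuffle mutates its argument while permutation returns a new array. A minimal sketch, assuming an arbitrary seed and small illustrative arrays:

```python
import numpy as np

np.random.seed(42)  # arbitrary seed, only so the run is repeatable

arr = np.arange(10)

# permutation returns a shuffled copy; the original stays in order.
shuffled_copy = np.random.permutation(arr)
print(arr)            # still 0..9 in order
print(shuffled_copy)

# shuffle works in place and returns None.
result = np.random.shuffle(arr)
print(result)         # None
print(arr)            # now reordered

# On a 2-D array only the rows are reordered; each row keeps its contents.
grid = np.arange(9).reshape((3, 3))
np.random.shuffle(grid)
print(grid)
```

A common mistake is writing arr = np.random.shuffle(arr), which replaces arr with None.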

Distributions

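beta(a, b[, size])	Samples from a Beta distribution, over [0, 1].
binomial(n, p[, size])	Samples from a binomial distribution.
chisquare(df[, size])	Samples from a chi-square distribution.
dirichlet(alpha[, size])	Samples from a Dirichlet distribution.
exponential([scale, size])	Samples from an exponential distribution.
f(dfnum, dfden[, size])	Samples from an F distribution.
gamma(shape[, scale, size])	Samples from a Gamma distribution.
geometric(p[, size])	Samples from a geometric distribution.
gumbel([loc, scale, size])	Samples from a Gumbel distribution.
hypergeometric(ngood, nbad, nsample[, size])	Samples from a hypergeometric distribution.
laplace([loc, scale, size])	Samples from a Laplace (double exponential) distribution.
logistic([loc, scale, size])	Samples from a logistic distribution.
lognormal([mean, sigma, size])	Samples from a log-normal distribution.
logseries(p[, size])	Samples from a logarithmic series distribution.
multinomial(n, pvals[, size])	Samples from a multinomial distribution.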
multivariate_normal(mean, cov[, size])	Samples from a multivariate normal distribution.
>>> mean = [0, 0]
>>> cov = [[1, 0], [0, 100]]  # diagonal covariance, points lie on x or y-axis
>>> import matplotlib.pyplot as plt
>>> x, y = np.random.multivariate_normal(mean, cov, 5000).T
>>> plt.plot(x, y, 'x'); plt.axis('equal'); plt.show()
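negative_binomial(n, p[, size])	Samples from a negative binomial distribution.
noncentral_chisquare(df, nonc[, size])	Samples from a noncentral chi-square distribution.
noncentral_f(dfnum, dfden, nonc[, size])	Samples from a noncentral F distribution.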
normal([loc, scale, size])	Samples from a normal (Gaussian) distribution.
Notes
The probability density for the Gaussian distribution is $p(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-\frac{(x - \mu)^2}{2 \sigma^2}}$, where $\mu$ is the mean and $\sigma$ the standard deviation. The square of the standard deviation, $\sigma^2$, is called the variance. The function has its peak at the mean, and its "spread" increases with the standard deviation (the function reaches 0.607 times its maximum at $\mu + \sigma$ and $\mu - \sigma$ [R217]).
Examples
Draw samples from the distribution:
>>> mu, sigma = 0, 0.1  # mean and standard deviation
>>> s = np.random.normal(mu, sigma, 1000)
Verify the mean and the standard deviation:
>>> abs(mu - np.mean(s)) < 0.01
True
>>> abs(sigma - np.std(s, ddof=1)) < 0.01
True
Display the histogram of the samples, along with the probability density function:
>>> import matplotlib.pyplot as plt
>>> count, bins, ignored = plt.hist(s, 30, density=True)
>>> plt.plot(bins, 1/(sigma * np.sqrt(2 * np.pi)) *
...          np.exp(-(bins - mu)**2 / (2 * sigma**2)),
...          linewidth=2, color='r')
>>> plt.show()
pareto(a[, size])	Samples from a Pareto II (Lomax) distribution.
poisson([lam, size])	Samples from a Poisson distribution.
power(a[, size])	Draws samples in [0, 1] from a power distribution with positive exponent a - 1.
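rayleigh([scale, size])	Samples from a Rayleigh distribution.
standard_cauchy([size])	Samples from a standard Cauchy distribution.
standard_exponential([size])	Samples from the standard exponential distribution.
standard_gamma(shape[, size])	Samples from a standard Gamma distribution.
standard_normal([size])	Samples from the standard normal distribution (mean=0, stdev=1).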
standard_t(df[, size])	Standard Student’s t distribution with df degrees of freedom.
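triangular(left, mode, right[, size])	Samples from a triangular distribution.
uniform([low, high, size])	Samples from a uniform distribution over [low, high).
vonmises(mu, kappa[, size])	Samples from a von Mises distribution.
wald(mean, scale[, size])	Samples from a Wald (inverse Gaussian) distribution.
weibull(a[, size])	Samples from a Weibull distribution.
zipf(a[, size])	Samples from a Zipf distribution.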
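All of the distribution functions share the same calling pattern: distribution parameters first, then an optional size. As a rough sanity check, the sample mean of a large draw should land near the theoretical mean. The parameters, seed, and sample size below are assumptions chosen for illustration:

```python
import numpy as np

np.random.seed(1)  # arbitrary seed for repeatability
n = 100_000        # large enough that sample statistics are stable

# loc/scale are the mean and standard deviation of the normal distribution.
normal = np.random.normal(loc=3.0, scale=2.5, size=n)
# uniform draws from the half-open interval [low, high).
uniform = np.random.uniform(low=-5.0, high=0.0, size=n)
# binomial: number of successes in 10 trials with success probability 0.3.
binom = np.random.binomial(n=10, p=0.3, size=n)
# poisson: event counts with expected rate lam.
poisson = np.random.poisson(lam=4.0, size=n)

print(normal.mean(), normal.std(ddof=1))  # close to 3.0 and 2.5
print(uniform.min(), uniform.max())       # inside [-5, 0)
print(binom.mean())                       # close to n * p = 3.0
print(poisson.mean())                     # close to lam = 4.0
```

Note that normal takes the standard deviation as scale, not the variance, so N(3, 6.25) corresponds to scale=2.5.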

Random number generators

RandomState	Container for the Mersenne Twister pseudo-random number generator.
seed([seed])	Seed the generator.
get_state()	Return a tuple representing the internal state of the generator.
set_state(state)	Set the internal state of the generator from a tuple.

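Of these, RandomState(seed=0) is the one used most often in data analysis: it produces a reproducible sequence of random numbers, and you can pick whatever seed value you like.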
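To make the reproducibility point concrete, here is a small sketch. The variable names and the seed values are illustrative assumptions; the calls themselves (RandomState, get_state, set_state) are the ones listed in the table above:

```python
import numpy as np

# Two generators built from the same seed produce the same sequence.
rng_a = np.random.RandomState(seed=0)
rng_b = np.random.RandomState(seed=0)
draws_a = rng_a.rand(5)
draws_b = rng_b.rand(5)
print(np.array_equal(draws_a, draws_b))  # True

# A different seed gives a different sequence.
rng_c = np.random.RandomState(seed=1)
draws_c = rng_c.rand(5)
print(np.array_equal(draws_a, draws_c))  # False

# get_state / set_state let you rewind a generator to an earlier point.
state = rng_a.get_state()
first = rng_a.randint(0, 100, size=3)
rng_a.set_state(state)
second = rng_a.randint(0, 100, size=3)
print(np.array_equal(first, second))     # True
```

Using a dedicated RandomState object instead of np.random.seed keeps the seeded stream local, so other code that calls np.random does not disturb it.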

Finally, thanks to the author who compiled the original material: click here
