Notes on numpy.random

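I have been studying data analysis on and off for a while, but numpy.random has always felt a bit fuzzy to me, so today I am filling in the gaps.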

 

Random sampling (numpy.random)

Simple random data

rand(d0, d1, ..., dn)

Returns random values of the given shape, drawn uniformly from [0, 1).

>>> import numpy as np
>>> np.random.rand(3,2)
array([[ 0.14022471,  0.96360618],  #random
       [ 0.37601032,  0.25528411],  #random
       [ 0.49313049,  0.94909878]]) #random

randn(d0, d1, ..., dn)

Returns a sample (or samples) from the standard normal distribution.

 

Examples

>>> np.random.randn()
2.1923875335537315 #random

Two-by-four array of samples from N(3, 6.25):

>>> 2.5 * np.random.randn(2, 4) + 3
array([[-4.49401501,  4.00950034, -1.81814867,  7.29718677],  #random
       [ 0.39924804,  4.68456316,  4.99394529,  4.84057254]]) #random
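
The N(3, 6.25) example relies on the fact that sigma * randn(...) + mu shifts and scales standard normal samples. A minimal sketch that checks this empirically; the seed and the sample size are arbitrary choices of mine, not part of the original example:

```python
import numpy as np

# Fixed seed so the sketch is repeatable; the value itself is arbitrary.
np.random.seed(0)

mu, sigma = 3.0, 2.5   # target mean and standard deviation (variance 6.25)
samples = sigma * np.random.randn(100_000) + mu

sample_mean = samples.mean()
sample_var = samples.var()
print(sample_mean, sample_var)  # should land close to 3 and 6.25
```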

randint(low[, high, size])

Returns random integers from the half-open interval [low, high). If high is omitted, the integers are drawn from [0, low).

>>> np.random.randint(2, size=10)
array([1, 0, 0, 0, 1, 1, 0, 0, 1, 0])
>>> np.random.randint(1, size=10)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

Generate a 2 x 4 array of ints between 0 and 4, inclusive:

>>> np.random.randint(5, size=(2, 4))
array([[4, 0, 2, 1],
       [3, 2, 2, 0]])
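
Since the upper bound is excluded, it is easy to be off by one. A small sketch of the bounds, assuming an arbitrary seed and sample size; to include the upper bound, pass high + 1:

```python
import numpy as np

np.random.seed(1)  # arbitrary seed, only for repeatability

# One argument: values come from [0, 5), i.e. 0..4.
draws = np.random.randint(5, size=10_000)
smallest, largest = draws.min(), draws.max()

# To get 1..6 inclusive, the exclusive upper bound has to be 7.
inclusive = np.random.randint(1, 6 + 1, size=10_000)
print(smallest, largest, inclusive.min(), inclusive.max())
```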

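random_integers(low[, high, size])

Returns random integers from the closed interval [low, high]. If high is omitted, the integers are drawn from [1, low]. This function has been deprecated since NumPy 1.11; randint(low, high + 1) is the suggested replacement.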

>>> np.random.random_integers(5)
4
>>> type(np.random.random_integers(5))
<type 'int'>
>>> np.random.random_integers(5, size=(3, 2))
array([[5, 4],
       [3, 3],
       [4, 5]])

Choose five random numbers from the set of five evenly-spaced numbers between 0 and 2.5, inclusive:

>>> 2.5 * (np.random.random_integers(5, size=(5,)) - 1) / 4.
array([ 0.625,  1.25 ,  0.625,  0.625,  2.5  ])

Roll two six sided dice 1000 times and sum the results:

>>> d1 = np.random.random_integers(1, 6, 1000)
>>> d2 = np.random.random_integers(1, 6, 1000)
>>> dsums = d1 + d2

Display results as a histogram:

>>> import matplotlib.pyplot as plt
>>> count, bins, ignored = plt.hist(dsums, 11, density=True)
>>> plt.show()
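
Because random_integers is deprecated in NumPy 1.11 and later, the same dice experiment can be sketched with randint, whose upper bound is exclusive. The seed and the use of np.bincount to tally the sums are my own choices here:

```python
import numpy as np

np.random.seed(2)  # arbitrary seed

# randint(1, 7) yields 1..6, matching random_integers(1, 6).
d1 = np.random.randint(1, 7, 1000)
d2 = np.random.randint(1, 7, 1000)
dsums = d1 + d2

# counts[k] is the number of rolls whose sum is k; indices 0 and 1 stay empty.
counts = np.bincount(dsums, minlength=13)
print(counts[2:])
```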

 

random_sample([size])

Returns random floats in the half-open interval [0.0, 1.0).

 

Examples

>>> np.random.random_sample()
0.47108547995356098
>>> type(np.random.random_sample())
<class 'float'>
>>> np.random.random_sample((5,))
array([ 0.30220482,  0.86820401,  0.1654503 ,  0.11659149,  0.54323428])

Three-by-two array of random numbers from [-5, 0):

>>> 5 * np.random.random_sample((3, 2)) - 5
array([[-3.99149989, -0.52338984],
       [-2.99091858, -0.79479508],
       [-1.23204345, -1.75224494]])
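
The [-5, 0) example is one case of the general recipe (b - a) * random_sample() + a for floats in [a, b). A minimal sketch; uniform_floats is a hypothetical helper name of mine, not a NumPy function:

```python
import numpy as np

def uniform_floats(a, b, size):
    """Hypothetical helper: floats in [a, b) built from random_sample."""
    return (b - a) * np.random.random_sample(size) + a

np.random.seed(3)  # arbitrary seed
x = uniform_floats(-5.0, 0.0, (3, 2))
print(x)
```

np.random.uniform(low, high, size) covers the same use case directly.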

 

random([size])

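Returns random floats in the half-open interval [0.0, 1.0).

(The example on the official site is identical to the one for random_sample.)

ranf([size])

Returns random floats in the half-open interval [0.0, 1.0).

(The example on the official site is identical to the one for random_sample.)

sample([size])

Returns random floats in the half-open interval [0.0, 1.0).

(The example on the official site is identical to the one for random_sample.)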
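
One way to convince yourself that these names behave the same is to reseed and compare. The sketch below only compares random_sample with random, and the seed is arbitrary:

```python
import numpy as np

np.random.seed(42)
a = np.random.random_sample(5)

np.random.seed(42)          # reset to the same starting point
b = np.random.random(5)

same = np.array_equal(a, b)
print(same)
```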

choice(a[, size, replace, p])

Generates a random sample from a given 1-D array.

Examples

Generate a uniform random sample from np.arange(5) of size 3:

>>> np.random.choice(5, 3)
array([0, 3, 4])
>>> #This is equivalent to np.random.randint(0,5,3)

Generate a non-uniform random sample from np.arange(5) of size 3:

>>> np.random.choice(5, 3, p=[0.1, 0, 0.3, 0.6, 0])
array([3, 3, 0])

Generate a uniform random sample from np.arange(5) of size 3 without replacement:

>>> np.random.choice(5, 3, replace=False)
array([3, 1, 0])
>>> #This is equivalent to np.random.permutation(np.arange(5))[:3]

Generate a non-uniform random sample from np.arange(5) of size 3 without replacement:

>>> np.random.choice(5, 3, replace=False, p=[0.1, 0, 0.3, 0.6, 0])
array([2, 3, 0])

Any of the above can be repeated with an arbitrary array-like instead of just integers. For instance:

>>> aa_milne_arr = ['pooh', 'rabbit', 'piglet', 'Christopher']
>>> np.random.choice(aa_milne_arr, 5, p=[0.5, 0.1, 0.1, 0.3])
array(['pooh', 'pooh', 'pooh', 'Christopher', 'piglet'],
      dtype='<U11')
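
To see what p and replace actually do, one can draw many samples and compare the observed frequencies with p. A rough sketch; the seed, sample size, and tolerance are my own choices:

```python
import numpy as np

np.random.seed(4)  # arbitrary seed
p = [0.1, 0.0, 0.3, 0.6, 0.0]

# With many draws, the observed frequencies should approach p.
draws = np.random.choice(5, size=100_000, p=p)
freq = np.bincount(draws, minlength=5) / draws.size
print(freq)

# replace=False never returns the same element twice.
no_repeat = np.random.choice(5, 3, replace=False)
print(no_repeat)
```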

 

bytes(length)

Returns random bytes.

>>> np.random.bytes(10)
b' eh\x85\x022SZ\xbf\xa4' #random

 

Permutations

shuffle(x)

Shuffles the sequence in place, modifying its contents (like shuffling a deck of cards).

>>> arr = np.arange(10)
>>> np.random.shuffle(arr)
>>> arr
array([1, 7, 5, 2, 9, 4, 3, 6, 0, 8])

 

This function only shuffles the array along the first index of a multi-dimensional array:

>>> arr = np.arange(9).reshape((3, 3))
>>> np.random.shuffle(arr)
>>> arr
array([[3, 4, 5],
       [6, 7, 8],
       [0, 1, 2]])
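
Two details are easy to miss here: shuffle returns None, and for a 2-D array it only reorders whole rows. A small sketch that checks both, with an arbitrary seed:

```python
import numpy as np

np.random.seed(5)  # arbitrary seed
arr = np.arange(9).reshape((3, 3))
original = arr.copy()

result = np.random.shuffle(arr)   # works in place; the return value is None

# Each row is still intact; only the order of the rows may have changed.
rows_before = sorted(original.tolist())
rows_after = sorted(arr.tolist())
print(result, rows_before == rows_after)
```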

 

permutation(x)

Returns a random permutation. Unlike shuffle, it returns a new array and leaves the input unchanged.

>>> np.random.permutation(10)
array([1, 7, 4, 3, 0, 9, 2, 5, 8, 6])
>>> np.random.permutation([1, 4, 9, 12, 15])
array([15,  1,  9,  4, 12])
>>> arr = np.arange(9).reshape((3, 3))
>>> np.random.permutation(arr)
array([[6, 7, 8],
       [0, 1, 2],
       [3, 4, 5]])
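
A common use of permutation in data analysis is to shuffle two arrays with the same random index so that their rows stay aligned. A sketch with made-up data; features and labels are hypothetical names:

```python
import numpy as np

np.random.seed(6)  # arbitrary seed
features = np.arange(10).reshape((5, 2))   # hypothetical data, 5 rows
labels = np.array([0, 1, 2, 3, 4])         # hypothetical labels, one per row

idx = np.random.permutation(len(labels))   # shuffled copy of 0..4
shuffled_features = features[idx]
shuffled_labels = labels[idx]

# The originals are untouched, and rows still line up with their labels.
print(idx, labels)
```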

 

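Distributions

beta(a, b[, size])

Samples from a Beta distribution, over [0, 1].

binomial(n, p[, size])

Samples from a binomial distribution.

chisquare(df[, size])

Samples from a chi-square distribution.

dirichlet(alpha[, size])

Samples from a Dirichlet distribution.

exponential([scale, size])

Exponential distribution

f(dfnum, dfden[, size])

Samples from an F distribution.

gamma(shape[, scale, size])

Gamma distribution

geometric(p[, size])

Geometric distribution

gumbel([loc, scale, size])

Gumbel distribution.

hypergeometric(ngood, nbad, nsample[, size])

Samples from a hypergeometric distribution.

laplace([loc, scale, size])

Samples from a Laplace (double exponential) distribution

logistic([loc, scale, size])

Samples from a logistic distribution

lognormal([mean, sigma, size])

Log-normal distribution

logseries(p[, size])

Logarithmic series distribution.

multinomial(n, pvals[, size])

Multinomial distribution

multivariate_normal(mean, cov[, size])

Multivariate normal distribution.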

>>> mean = [0,0]
>>> cov = [[1,0],[0,100]] # diagonal covariance, points lie on x or y-axis
>>> import matplotlib.pyplot as plt
>>> x, y = np.random.multivariate_normal(mean, cov, 5000).T
>>> plt.plot(x, y, 'x'); plt.axis('equal'); plt.show()
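
Without plotting, the same mean and cov can be checked numerically by estimating them back from the samples. A rough sketch; the seed and sample size are arbitrary:

```python
import numpy as np

np.random.seed(7)  # arbitrary seed
mean = [0, 0]
cov = [[1, 0], [0, 100]]

samples = np.random.multivariate_normal(mean, cov, 50_000)  # shape (50000, 2)

sample_mean = samples.mean(axis=0)
sample_cov = np.cov(samples, rowvar=False)   # rows are observations
print(sample_mean)
print(sample_cov)
```

The estimated covariance should be close to diag(1, 100), with off-diagonal entries near zero.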

 

negative_binomial(n, p[, size])

Negative binomial distribution

noncentral_chisquare(df, nonc[, size])

Noncentral chi-square distribution

noncentral_f(dfnum, dfden, nonc[, size])

Noncentral F distribution

normal([loc, scale, size])

Normal (Gaussian) distribution

Notes

The probability density for the Gaussian distribution is

$$p(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-\frac{(x - \mu)^2}{2 \sigma^2}}$$

where $\mu$ is the mean and $\sigma$ the standard deviation. The square of the standard deviation, $\sigma^2$, is called the variance.

The function has its peak at the mean, and its “spread” increases with the standard deviation (the function reaches 0.607 times its maximum at $\mu + \sigma$ and $\mu - \sigma$ [R217]).

 

Examples

Draw samples from the distribution:

>>> mu, sigma = 0, 0.1 # mean and standard deviation
>>> s = np.random.normal(mu, sigma, 1000)

Verify the mean and the variance:

>>> abs(mu - np.mean(s)) < 0.01
True
>>> abs(sigma - np.std(s, ddof=1)) < 0.01
True

Display the histogram of the samples, along with the probability density function:

>>> import matplotlib.pyplot as plt
>>> count, bins, ignored = plt.hist(s, 30, density=True)
>>> plt.plot(bins, 1/(sigma * np.sqrt(2 * np.pi)) *
...                np.exp( - (bins - mu)**2 / (2 * sigma**2) ),
...          linewidth=2, color='r')
>>> plt.show()
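
The 0.607 figure quoted in the notes can be checked directly: one standard deviation away from the mean, the density drops to exp(-1/2) of its peak. A minimal sketch; normal_pdf is a hypothetical helper that writes out the standard Gaussian density:

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Hypothetical helper: the Gaussian density evaluated at x."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

mu, sigma = 0.0, 0.1
peak = normal_pdf(mu, mu, sigma)                       # maximum, at the mean
ratio_plus = normal_pdf(mu + sigma, mu, sigma) / peak
ratio_minus = normal_pdf(mu - sigma, mu, sigma) / peak
print(ratio_plus, ratio_minus)  # both equal exp(-0.5), roughly 0.607
```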

 

pareto(a[, size])

Pareto II (Lomax) distribution

poisson([lam, size])

Poisson distribution

power(a[, size])

Draws samples in [0, 1] from a power distribution with positive exponent a - 1.

rayleigh([scale, size])

Rayleigh distribution

standard_cauchy([size])

Standard Cauchy distribution

standard_exponential([size])

Standard exponential distribution

standard_gamma(shape[, size])

Standard gamma distribution

standard_normal([size])

Standard normal distribution (mean=0, stdev=1).

standard_t(df[, size])

Standard Student’s t distribution with df degrees of freedom.

triangular(left, mode, right[, size])

Triangular distribution

uniform([low, high, size])

Uniform distribution

vonmises(mu, kappa[, size])

von Mises distribution

wald(mean, scale[, size])

Wald (inverse Gaussian) distribution

weibull(a[, size])

Weibull distribution

zipf(a[, size])

Zipf distribution
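
A quick sanity check for any of these samplers is to compare the sample mean with the theoretical mean. The sketch below does this for four of them; the parameter values, seed, and tolerance are arbitrary choices of mine:

```python
import numpy as np

np.random.seed(8)  # arbitrary seed
n = 200_000

# Each entry pairs a batch of samples with the distribution's theoretical mean.
checks = {
    "binomial":    (np.random.binomial(10, 0.3, n),  10 * 0.3),          # n * p
    "poisson":     (np.random.poisson(4.0, n),       4.0),               # lam
    "exponential": (np.random.exponential(2.0, n),   2.0),               # scale
    "uniform":     (np.random.uniform(-1.0, 3.0, n), (-1.0 + 3.0) / 2),  # midpoint
}

errors = {name: abs(s.mean() - expected) for name, (s, expected) in checks.items()}
for name, err in errors.items():
    print(name, err)
```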

Random number generator

RandomState

Container for the Mersenne Twister pseudo-random number generator. 

seed([seed])

Seed the generator.

get_state()

Return a tuple representing the internal state of the generator.

set_state(state)

Set the internal state of the generator from a tuple.
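
get_state and set_state make it possible to rewind a generator and replay the same draws. A minimal sketch on a separate RandomState instance, with an arbitrary seed:

```python
import numpy as np

rs = np.random.RandomState(123)   # arbitrary seed

state = rs.get_state()            # snapshot of the generator's internal state
first = rs.random_sample(3)

rs.set_state(state)               # rewind to the snapshot
replay = rs.random_sample(3)

print(np.array_equal(first, replay))
```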

 

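In data analysis, the most commonly used of these is RandomState(seed=0), which produces a reproducible sequence of random numbers; you can choose the seed value yourself.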
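
A small sketch of what "reproducible" means here: two generators created with the same seed produce the same sequence, while a different seed gives a different one. The seed values 0 and 1 are just examples:

```python
import numpy as np

rs_a = np.random.RandomState(0)
rs_b = np.random.RandomState(0)   # same seed as rs_a
rs_c = np.random.RandomState(1)   # different seed

a = rs_a.randint(0, 100, size=5)
b = rs_b.randint(0, 100, size=5)
c = rs_c.randint(0, 100, size=5)

print(np.array_equal(a, b), np.array_equal(a, c))
```

Each RandomState instance carries its own state, so seeding one does not affect the global np.random functions.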

Finally, my thanks to the author who compiled this material.
