The Sampling Problem: Drawing a Random Sample

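Requirement: from a survey with n entries, randomly select m samples (m <= n), with no duplicates among the m samples. Write a program that does this.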

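Solution 1:

From probability theory: suppose n = 5 and m = 2. The probability of selecting the number 0 is then 2/5, which can be implemented with the following statement: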

if((rand() % 5) < 2)

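If 0 was selected, the probability of then selecting 1 is 1/4; if 0 was not selected, it is 2/4. The two cases can be implemented with the statements below, where n and m still hold their values from before 0 was considered.

// 0 was selected: one fewer sample to pick (m-1) out of one fewer candidate (n-1)
if (rand() % (n-1) < m-1)
// 0 was not selected: still m samples to pick, out of one fewer candidate (n-1)
if (rand() % (n-1) < m)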
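
As a quick check of the two cases above, the law of total probability gives the overall chance that 1 ends up in the sample (same example, n = 5 and m = 2):

```latex
\[
P(\text{select } 1)
  = \underbrace{\tfrac{2}{5}\cdot\tfrac{1}{4}}_{\text{0 selected}}
  + \underbrace{\tfrac{3}{5}\cdot\tfrac{2}{4}}_{\text{0 not selected}}
  = \tfrac{2}{20} + \tfrac{6}{20}
  = \tfrac{2}{5}
\]
```

So 1 is selected with the same probability m/n = 2/5 as 0, which is what a uniform sample requires.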

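So we only need to walk through the n elements and apply this probability test to each one in turn to obtain m random numbers. The implementation is as follows: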

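#include <iostream>
#include <cstdlib>
#include <ctime>

using namespace std;

int main()
{
    srand((unsigned)time(NULL));
    int m = 5;   // number of samples still to be selected
    int n = 20;  // size of the population 0..n-1

    for (int i = 0; i < n; i++)
    {
        // n-i candidates remain; select i with probability m/(n-i)
        if (rand() % (n - i) < m)
        {
            cout << i << endl;
            m--;
        }
    }
    return 0;
}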

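The program prints m distinct integers in increasing order, one per line; the values differ from run to run.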

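Analysis: the algorithm has to walk through all n elements of the population, so its time complexity is O(n). The loop itself needs only O(1) extra space because it prints each sample immediately; storing the samples instead would take O(m) space.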
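
The same loop can be wrapped in a function that returns the samples instead of printing them. The sketch below is an illustration, not the post's code: the name `sample_sequential` is hypothetical, and it swaps `rand() % k` for `std::mt19937` with `std::uniform_int_distribution` to avoid the slight bias of the modulo trick.

```cpp
#include <random>
#include <vector>

// Hypothetical helper (not in the original post): selection sampling over 0..n-1.
// Returns m distinct values in increasing order; assumes 0 <= m <= n.
std::vector<int> sample_sequential(int n, int m, std::mt19937& rng)
{
    std::vector<int> out;
    out.reserve(m);
    int remaining = m;  // samples still to be selected
    for (int i = 0; i < n && remaining > 0; ++i)
    {
        // n-i candidates are left; pick i with probability remaining/(n-i)
        std::uniform_int_distribution<int> dist(0, n - i - 1);
        if (dist(rng) < remaining)
        {
            out.push_back(i);
            --remaining;
        }
    }
    return out;
}
```

Calling `sample_sequential(20, 5, rng)` with a seeded `std::mt19937 rng` yields five sorted, distinct values in [0, 19]. The loop also stops early once all m samples have been found.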

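Solution 2:

We can use the set container from the C++ standard library, which never stores duplicate elements. The idea is straightforward; the implementation is as follows: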

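#include <iostream>
#include <cstdlib>
#include <ctime>
#include <set>

using namespace std;

int main()
{
    srand((unsigned)time(NULL));
    int m = 5;   // number of samples wanted
    int n = 20;  // size of the population 0..n-1

    set<int> S;
    // keep drawing until the set holds m distinct values;
    // inserting a value that is already present has no effect
    while (S.size() < (size_t)m)
        S.insert(rand() % n);

    // a set iterates in increasing order
    set<int>::iterator it;
    for (it = S.begin(); it != S.end(); ++it)
        cout << *it << " ";

    cout << endl;
    return 0;
}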

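The program prints m distinct integers in increasing order on a single line; the values differ from run to run.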

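Analysis: the algorithm uses the STL associative container set, which is typically implemented as a red-black tree. A red-black tree guarantees that each insertion takes O(log m) time even in the worst case, and traversing the set takes O(m), so the algorithm needs O(m log m) time as long as m is small relative to n and few draws are rejected as duplicates. The data structure itself, however, carries a fairly large overhead.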
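
When m is close to n, the loop above rejects more and more duplicate draws. A related technique, Floyd's algorithm (a different method from the one in this post, and discussed in Programming Pearls as well), also uses a set but needs exactly m random draws. The sketch below is illustrative only; the function name `sample_floyd` is hypothetical.

```cpp
#include <random>
#include <set>

// Hypothetical helper: Floyd's sampling algorithm over 0..n-1.
// Makes exactly m random draws, with no retry loop; assumes 0 <= m <= n.
std::set<int> sample_floyd(int n, int m, std::mt19937& rng)
{
    std::set<int> s;
    for (int j = n - m; j < n; ++j)
    {
        std::uniform_int_distribution<int> dist(0, j);
        int t = dist(rng);
        // If t is already chosen, take j instead; j cannot be in the set yet
        // because every earlier value was at most j-1.
        if (!s.insert(t).second)
            s.insert(j);
    }
    return s;
}
```

Each pass of the loop adds exactly one new element, so the set ends with m distinct values even when m equals n.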

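Solution 3:

Idea: since only m (m <= n) elements are needed, it is enough to shuffle the first m positions of the array (for each position, generate a random index at or after it and swap the two elements), then sort the first m elements and output them.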

#include <iostream>
#include <cstdlib>
#include <ctime>
#include <algorithm>

using namespace std;

const int m = 5;
const int n = 20;

int main()
{
    srand((unsigned)time(NULL));

    int *x = new int[n];
    for (int i = 0; i < n; ++i) // initialize the n element values
        x[i] = i;

    int ran;
    int tmp;
    for (int i = 0; i < m; ++i)
    {
        // random index in [i, n-1], including i itself;
        // n-i (not n-1-i) so the last element can be chosen and m == n is safe
        ran = i + rand() % (n - i);
        tmp = x[i];
        x[i] = x[ran];
        x[ran] = tmp;
    }

    sort(x, x + m);

    for (int i = 0; i < m; i++)
        cout << x[i] << " ";
    cout << endl;

    delete[] x; // release the array allocated with new[]
    return 0;
}

Program output (one run): 3 8 12 14 17

Analysis: initializing the n elements takes O(n) time and space, and sorting adds O(m log m) time, so this method is less efficient than the first one.
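
The shuffle-then-sort idea can also be sketched with standard library facilities. This is an illustrative version under stated assumptions, not the post's code: the name `sample_shuffle` is hypothetical, and the swap index is drawn from the closed range [i, n-1] so that every remaining element, including the last, can be chosen.

```cpp
#include <algorithm>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: partial Fisher-Yates shuffle over 0..n-1.
// Returns m distinct values in increasing order; assumes 0 <= m <= n.
std::vector<int> sample_shuffle(int n, int m, std::mt19937& rng)
{
    std::vector<int> x(n);
    std::iota(x.begin(), x.end(), 0);  // x = 0, 1, ..., n-1
    for (int i = 0; i < m; ++i)
    {
        // Swap position i with a uniformly chosen position in [i, n-1]
        std::uniform_int_distribution<int> dist(i, n - 1);
        std::swap(x[i], x[dist(rng)]);
    }
    x.resize(m);                       // keep only the shuffled prefix
    std::sort(x.begin(), x.end());
    return x;
}
```

Using `std::vector` instead of `new int[n]` means the buffer is released automatically when the function returns.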

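Summary of the three methods:

Solution 1 has to walk through all n elements, so it suits the case where m is large, i.e. m > n/2.

The cost of Solution 2 depends mainly on m, so it suits the case where m is small.

Solution 3 works in both cases. For example, with n = 1,000,000 and m = n - 10, we only need to generate the 10 numbers to leave out and then output the remaining numbers in sorted order.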
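
The "generate only the few numbers to leave out" case can be sketched as follows. The function name `sample_by_exclusion` is hypothetical, and holding the excluded values in a set is a choice made here for brevity rather than something the post prescribes.

```cpp
#include <random>
#include <set>
#include <vector>

// Hypothetical helper: when m is close to n, pick the n-m values to leave
// out, then emit everything else in increasing order.
// Assumes n >= 1 and 0 <= m <= n.
std::vector<int> sample_by_exclusion(int n, int m, std::mt19937& rng)
{
    std::set<int> excluded;
    std::uniform_int_distribution<int> dist(0, n - 1);
    while ((int)excluded.size() < n - m)
        excluded.insert(dist(rng));  // duplicates are ignored by the set

    std::vector<int> out;
    out.reserve(m);
    for (int i = 0; i < n; ++i)
        if (excluded.count(i) == 0)
            out.push_back(i);
    return out;
}
```

Because every (n-m)-element subset is equally likely to be excluded, the remaining m values also form a uniform sample, and they come out already sorted.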

Reference: Programming Pearls, Chapter 12, on the sampling problem.
