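This blog post shows two ways to compute and plot the CDF (cumulative distribution function) of a large data set in MATLAB. I wrote the first one myself in 2012 and later found a flaw while using it. In 2014, while writing another paper, I came across a second method that is both simpler and more efficient. Below I give a use case for each, including the data and scripts used for plotting and the resulting figures. For your reference.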
============================================================================================
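Section A. The first method
Today (2012-10-17) I had some data to process. After finally extracting it from the trace files, I wrote a function that computes the CDF of the data and controls how it is plotted. The data files used in this first example are too large to post here. If you want to try the procedure yourself, see the complete use case under the second method below.
% ----------------------- Self-implemented CDF function: funcCDF.m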
% para@1: CNT_pnts, the number of points used to draw the CDF;
% para@2: Range_low, the lower bound of the variable;
% para@3: Range_up, the upper bound of the variable;
% para@4: arr_Vals, array of the values to be processed.
function [x, CDF_Vals] = funcCDF(CNT_pnts, Range_low, Range_up, arr_Vals)
    data = sort( arr_Vals(:) )';   % sorted row vector, whatever the orientation of the input
    N = length(data);
    stepLen = (Range_up-Range_low)/CNT_pnts;
    Counter = zeros(1,CNT_pnts);
    for i = 1:1:N
        for j = 1:1:CNT_pnts
            % count sample i at every grid point that is >= its value
            if ( data(1,i) <= (Range_low + j*stepLen) )
                Counter(1,j) = Counter(1,j) + 1;
            end
        end
    end
    CDF = Counter(1,:)./N;          % fraction of samples <= each grid point
    CDF_Vals = CDF(1,:)';           % return as a column vector
    x = (Range_low+stepLen):stepLen:Range_up;
% ---- end of func.
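The double loop in funcCDF counts, for each grid point, how many samples are less than or equal to it. As a rough sketch of the same computation without explicit loops, the snippet below uses a small made-up sample in place of the trace data (the values in arr_Vals and the grid settings are assumptions chosen only for illustration, not the original data).

```matlab
% Hypothetical sample standing in for arr_Vals (the original trace files are not posted).
arr_Vals  = [12; 3; 7; 7; 18; 1; 9; 15];
CNT_pnts  = 4;
Range_low = 0;
Range_up  = 20;

stepLen = (Range_up - Range_low) / CNT_pnts;

% Same grid as in funcCDF, written with an index so that it has CNT_pnts entries.
x = Range_low + (1:CNT_pnts) * stepLen;            % 5 10 15 20

% Fraction of samples less than or equal to each grid point, as a column vector.
CDF_Vals = arrayfun(@(t) mean(arr_Vals(:) <= t), x)';
% CDF_Vals is [0.25; 0.625; 0.875; 1]
```

The result can be drawn the same way as in the use cases, for example with plot(x, CDF_Vals, 'ob').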
% --------------------- 2 use cases:
CNT_pnts = 100;
deadline_N500r1 = 550;
deadline_N500r3 = 270;
deadline_N500r5 = 240;
PntVal_N500Tau100r1 = textread('N500Tau100r1.tr','%*s %*s %*s %*s %*s %*s %*s %*s %*s %*s %*s %.2f');
[x_r1,cdf_r1] = funcCDF(CNT_pnts, 0, deadline_N500r1, PntVal_N500Tau100r1);
plot(x_r1, cdf_r1, 'ob')
hold on
PntVal_N500Tau100r3 = textread('N500Tau100r3.tr','%*s %*s %*s %*s %*s %*s %*s %*s %*s %*s %*s %.2f');
[x_r3,cdf_r3] = funcCDF(CNT_pnts, 0, deadline_N500r3, PntVal_N500Tau100r3);
plot(x_r3, cdf_r3, 'or')
hold on
PntVal_N500Tau100r5 = textread('N500Tau100r5.tr','%*s %*s %*s %*s %*s %*s %*s %*s %*s %*s %*s %.2f');
[x_r5,cdf_r5] = funcCDF(CNT_pnts, 0, deadline_N500r5, PntVal_N500Tau100r5);
plot(x_r5, cdf_r5, 'oc')
grid
% --------------------- 3 Resulting figures:
Fig.1 CDF_N200r3--Tau-60-80-100-100Pnts
Fig.2 CDF_N500Tau100--r-1-3-5-100Pnts
After changing the parameter CNT_pnts = 100; to CNT_pnts = 50; the number of points shown in the figure is halved, as follows:
Fig.3 CDF_N500Tau100--r-1-3-5-50Pnts
Davy_H (2012-10-17)
============================================================================================
Section B. The second method
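Today (2014-10-15) I came back to this post. The figures above are ugly, and the first method is in fact imperfect: when there is little data, the curve sometimes does not start from the origin. I later found a better way to plot a CDF, and to do justice to the 2000+ visits this post has received, I decided to spend some time today sharing a better example.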
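One plausible reading of that defect, sketched here as an assumption rather than a confirmed diagnosis: the grid returned by the first method begins at Range_low + stepLen, so the first plotted point is never at Range_low itself. With few samples or a coarse grid, that first point can sit well above zero. The sample values below are made up, and prepending a point at Range_low is only one possible workaround.

```matlab
% Hypothetical small sample; all values here are assumptions for illustration.
data      = [2 3 3 8];
Range_low = 0;
Range_up  = 10;
CNT_pnts  = 5;
stepLen   = (Range_up - Range_low) / CNT_pnts;

% Grid as produced by the first method: it starts at Range_low + stepLen.
x_old   = Range_low + (1:CNT_pnts) * stepLen;          % 2 4 6 8 10
cdf_old = arrayfun(@(t) mean(data <= t), x_old);       % 0.25 0.75 0.75 1 1

% Workaround sketch: add the point at Range_low so the curve starts there.
x_new   = [Range_low, x_old];                          % 0 2 4 6 8 10
cdf_new = [mean(data <= Range_low), cdf_old];          % 0 0.25 0.75 0.75 1 1
```

Plotting cdf_old against x_old starts the curve at (2, 0.25), while cdf_new against x_new starts it at (0, 0).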
Without further ado: 1) the resulting figure; 2) part of the data file; 3) the plotting script.
1) ------------------
3) ------------------ Codes:
clear;
% --------------- A. Read the Data.
X_ = textread('_Trace_file.tr','%*s%*s %*s%*s %*s%*s %*s%*s %*s%*s %*s%*s %*s%*s %*s%*s %*s%f');
CNT_resolve_times = textread('_Trace_file.tr','%*s%*s %*s%*s %*s%*s %*s%*s %*s%*s %*s%*s %*s%*s %*s%d %*s%*s' );
% --------------- B. Count the Costs.
% --------- X_items is the "-Threshold"
X_items =[0.0,0.01,0.05,0.1,0.2,0.3];
CNT_X = length(X_items);
% --------- Define the range_x of the x_coordinate in the figure.
step = 1;
range_end = 50;
range_x = 0:step:range_end;
figure
% ---------- Format of figure:
TextFontSize=18;
LegendFontSize = 16;
set(0,'DefaultAxesFontName','Times',...
    'DefaultLineLineWidth',2,...
    'DefaultLineMarkerSize',8);
set(gca,'FontName','Times New Roman','FontSize',TextFontSize);
set(gcf,'Units','inches','Position',[0 0 6.0 4.0]);
% ---------- Format of figure:~
% ------ Plot lines
for i = 1:1:CNT_X
    Val_item = X_items(i);
    idx_it_Lazy = find( X_ == Val_item );
    % --- 1 CNT_STimes
    CNT_Re_times_its = [];
    CNT_Re_times_its = CNT_resolve_times( idx_it_Lazy );
    % --- 2 Plot CDF of CNT_Resolving_times, i.e., the "CNT_STimes" in the trace file.
    % Note: i==2 sets no step of its own, so it reuses step = 5 from i==1.
    if (i==1) linePoint_type = '-sk'; step = 5; range_x = 0:step:range_end;
    elseif (i==2) linePoint_type = '-^r';
    elseif (i==3) linePoint_type = '-+b'; step = 1; range_x = 0:step:range_end;
    elseif (i==4) linePoint_type = '-c'; step = 1; range_x = 0:step:range_end;
    elseif (i==5) linePoint_type = '--g'; step = 1; range_x = 0:step:range_end;
    elseif (i==6) linePoint_type = '-.m'; step = 1; range_x = 0:step:range_end;
    end
    %%% ====== Critical Code of CDF-Plotting :
    h_rtl = hist( CNT_Re_times_its, range_x );
    pr_approx_cdf = cumsum(h_rtl) / ( sum(h_rtl) );
    %%% ====== Critical Code of CDF-Plotting :~
    handler = plot( range_x, pr_approx_cdf, linePoint_type );
    if (i==4) h4 = handler;
    elseif (i==5) h5 = handler;
    elseif (i==6) h6 = handler;
    end
    hold on
end
% --------- Set the other formats of the figure :
grid off
axis([0 range_end 0 1.0])
ylabel('CDF')
xlabel('Resolving times')
% --------- Plot the multi-legends :
hg1=legend('{\it \chi_0}=0', '{\it \chi_0}=0.01', '{\it \chi_0}=0.05', 0);
set(hg1,'FontSize',LegendFontSize);
ah1 = axes('position',get(gca,'position'), 'visible','off');
hg2 = legend(ah1, [h4,h5,h6], '{\it \chi_0}=0.10','{\it \chi_0}=0.20','{\it \chi_0}=0.30', 0);
set(hg2,'FontSize',LegendFontSize);
% --------- Plot the multi-legends :~
% --------- Set the other formats of the figure :~
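I have already commented the key parts of the code. To emphasize them once more:
1. The two key lines for plotting the CDF. Please look up the three functions they use (hist, cumsum, sum) yourself.
%%% ====== Critical operation of CDF-Plotting :
h_rtl = hist( CNT_Re_times_its, range_x );
pr_approx_cdf = cumsum(h_rtl) / ( sum(h_rtl) );
%%% ====== Critical operation of CDF-Plotting :~
2. The if/elseif block inside the for loop sets the line and marker style (linePoint_type) of each curve and the density of sample points along each curve.
3. In addition, this script also shows a way to plot multiple legends in one figure.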
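To make the two key lines in point 1 concrete, here is a small worked sketch on made-up data (the samples vector is an assumption, not taken from the trace file). When its second argument is a vector, hist treats the entries as bin centers, so each count covers the values closest to that center. cumsum then accumulates the counts, and dividing by the total turns them into fractions.

```matlab
% Hypothetical data; in the script above this role is played by CNT_Re_times_its.
samples = [1 2 2 3 5];
range_x = 0:1:5;                     % bin centers, also used as the x-axis

h_rtl = hist(samples, range_x);      % counts per bin center: 0 1 2 1 0 1
pr_approx_cdf = cumsum(h_rtl) / sum(h_rtl);
% pr_approx_cdf is 0 0.2 0.6 0.8 0.8 1.0
```

Because the counts are grouped around bin centers, a coarser range_x (for example a step of 5) gives fewer plotted points and a rougher approximation of the CDF, which is why the result is called approximate in the script.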
y = evrnd(0,3,100,1);
cdfplot(y);
hold on;
x = -20:0.1:10;
f = evcdf(x,0,3);
plot(x,f,'m');
legend('Empirical','Theoretical','Location','NW')