Reading a txt file in MATLAB and processing the data

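      On Sunday I worked on a small information theory project. It took most of the night plus Monday morning, but I finally got MATLAB to read the English words from a txt file and do some simple data processing on them, so here is a short write-up.
      Baidu Jingyan, how to read a txt file in MATLAB: https://jingyan.baidu.com/article/b87fe19e6b478852183568e1.html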
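As a quick illustration of the reading step, here is a minimal sketch that writes a tiny sample file and reads it back as a list of words. The temporary file name (from tempname) and the sample sentence are made up for this example and are not part of the project; only fileread, strrep, lower and strsplit are the calls used in the post.

```matlab
% Minimal sketch: write a tiny sample file, then read it back as words.
% The file name and the sample text are assumptions made for illustration.
sampleFile = [tempname, '.txt'];
fid = fopen(sampleFile, 'w');
fprintf(fid, 'The cat, the dog\nand the bird');
fclose(fid);

txt = fileread(sampleFile);          % whole file as one char row vector
txt = lower(strrep(txt, ',', ' '));  % replace commas with spaces, fold to lowercase
words = strsplit(strtrim(txt));      % split on whitespace -> 1-by-N cell array
words = words(:);                    % one word per row
disp(words)
delete(sampleFile);                  % clean up the temporary file
```

strtrim is added here only so that leading or trailing whitespace in a file does not produce empty entries.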
      
 Code:
function [] = work3()
clc
close all
%% read data
ch = fileread('harry1.txt');            % read the whole file as one char vector
ch = strrep(ch,',',' ');                % replace commas with spaces
ch = lower(ch);                         % convert to lowercase
ch = reshape(strsplit(ch),[],1);        % split on whitespace into a column cell array, one word per cell
%% 1-gram
gram1 = ch;
[~,~,idx] = unique(char(gram1),'rows'); % char() pads the words into a char matrix, one word per row
numOccurrences = accumarray(idx,1);     % number of occurrences of each unique word
numOccurrences = sort(numOccurrences);
[err1,H1] = errH(numOccurrences);
%% 2-gram (non-overlapping pairs of consecutive words)
clear idx numOccurrences
gram2 = char(ch);
[gram2_row,gram2_col] = size(gram2);
n2 = floor(gram2_row/2);                % a leftover word at the end is dropped
gram2_reshap = repmat(' ',n2,2*gram2_col);   % preallocate
for i = 1:n2
    gram2_reshap(i,:) = [gram2(2*i-1,:) gram2(2*i,:)];
end
[~,~,idx] = unique(gram2_reshap,'rows');
numOccurrences = accumarray(idx,1);
numOccurrences = sort(numOccurrences);
[err2,H2] = errH(numOccurrences);
%% 3-gram (non-overlapping triples of consecutive words)
clear idx numOccurrences
gram3 = char(ch);
[gram3_row,gram3_col] = size(gram3);
n3 = floor(gram3_row/3);                % leftover words at the end are dropped
gram3_reshap = repmat(' ',n3,3*gram3_col);   % preallocate
for i = 1:n3
    gram3_reshap(i,:) = [gram3(3*i-2,:) gram3(3*i-1,:) gram3(3*i,:)];
end
[~,~,idx] = unique(gram3_reshap,'rows');
numOccurrences = accumarray(idx,1);
numOccurrences = sort(numOccurrences);
[err3,H3] = errH(numOccurrences);






%% plot
figure
stairs(err1,H1/H1(1),'r')
titleName = ['N=',num2str(H1(1))];
hold on
stairs(err2,H2/H2(1),'b')
stairs(err3,H3/H3(1),'k')
title(titleName,'fontsize',16,'fontweight','bold');
xlabel('Error','fontsize',16,'fontweight','bold');
ylabel('H/N','fontsize',16,'fontweight','bold');
legend('1-gram','2-gram','3-gram');
end

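%% error function
% Walks through the sorted counts, removing one occurrence at a time, and
% returns the accumulated error probability (err) together with log2 of the
% number of occurrences that remain (H).
function [err,H] = errH(numOccurrences)
total = sum(numOccurrences);
x_remain = total;
p_num = numOccurrences/total;
num = 1;
err = zeros(1,total+1);     % preallocate; err(1) = 0
H = zeros(1,total+1);
H(1) = log2(x_remain);
for r = 1:length(numOccurrences)
%     hwait=waitbar(num/sum(numOccurrences),'Please wait>>>>>>>>');
    for n = 1:numOccurrences(r)
        num = num+1;
        x_remain = x_remain-1;
        err(num) = err(num-1)+p_num(r)/numOccurrences(r);   % each occurrence adds 1/total
        H(num) = log2(x_remain);    % the last entry is log2(0) = -Inf
    end
end
end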
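To sanity-check the counting step on its own, the sketch below runs it on a toy word list. The list is invented for illustration, and accumarray is used here as one straightforward way to tally the index vector returned by unique.

```matlab
% Toy check of the counting step; the word list is invented for illustration.
toy = {'the'; 'cat'; 'the'; 'dog'; 'the'; 'cat'};
[uniqWords, ~, idx] = unique(toy);   % sorted unique words and an index map into them
counts = accumarray(idx, 1);         % occurrences of each unique word
for k = 1:numel(uniqWords)
    fprintf('%s: %d\n', uniqWords{k}, counts(k));
end
```

Here unique sorts the words alphabetically, so the counts come out in the order cat, dog, the.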
