Counting How Many Times Each Distinct Word Appears in a Text

Original

柚子的power

2020-06-27 09:05

Problem: Write a program that reads several lines of text and prints a table showing how many times each distinct word occurs in the text.

Algorithm steps:

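1. Preprocessing. Open the text file, read each line into the string tmp, and use append to add tmp to the string s. getline discards the newline character, so a space is appended after each line; without it, the last word of one line would fuse with the first word of the next.

ifstream input("file.txt");
if (!input)
{
	cerr << "Cannot open file.txt." << endl;
	exit(1); // exit is declared in <cstdlib>
}
string s, tmp;

// Read the whole file into s, one line at a time.
while (getline(input, tmp))
{
	s.append(tmp);
	s.push_back(' '); // getline drops '\n'; keep words on adjacent lines apart
}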
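To see why the separator between lines matters, the reading step can be wrapped in a small function and tried on an in-memory stream. This is only a sketch: the name readAllLines is a hypothetical helper, not something from the original program, and it takes any std::istream so that it can be fed a std::istringstream instead of a file.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not part of the original program): read every line
// from `in` and join the lines with single spaces.
std::string readAllLines(std::istream& in)
{
    std::string text, line;
    while (std::getline(in, line))
    {
        text += line;
        text += ' ';  // getline discards '\n', so put a separator back
    }
    return text;
}
```

Given a std::istringstream holding "hello\nworld", the function returns "hello world " with the two words still separate; dropping the `text += ' '` line would yield "helloworld" instead.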

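2. Remove punctuation. Replace every punctuation character in s with a space ' '. Key point: use ispunct() (declared in <cctype>) to test whether a character is punctuation. Cast the character to unsigned char first, because passing a negative char value to ispunct is undefined behavior.

// Replace every punctuation character in s with a space.
for (size_t i = 0; i < s.length(); i++)
{
	if (ispunct(static_cast<unsigned char>(s[i]))) // is this character punctuation?
		s[i] = ' '; // each element of a string is a char, so assign ' ' rather than " "
}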
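The same replacement can also be written with a standard algorithm instead of an index loop. The sketch below uses std::replace_if; the function name stripPunctuation is an assumption made for illustration. Declaring the lambda parameter as unsigned char keeps the argument passed to std::ispunct in the range the function accepts.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: return a copy of `text` in which every punctuation
// character has been replaced by a space.
std::string stripPunctuation(std::string text)
{
    std::replace_if(text.begin(), text.end(),
                    [](unsigned char c) { return std::ispunct(c) != 0; },
                    ' ');
    return text;
}
```

One consequence of treating all punctuation as a separator is that an apostrophe splits a word: stripPunctuation("don't") gives "don t", which is later counted as two words.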

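3. Count the words. Extract each word from the cleaned string s as follows: create a stringstream object ss initialized with s, then use >> to read one whitespace-separated word at a time from ss into the string word. Create an unordered_map and check whether word is already a key in it: if it is, add 1 to the value stored under that key; otherwise, insert a new key-value pair with a count of 1. P.S. For stringstream usage, see https://blog.csdn.net/nwpu_yike/article/details/22100615.

// Count the words.
string word;
stringstream ss(s); // initialize the stream ss with the contents of s
unordered_map<string, int> strMap;
unordered_map<string, int>::iterator it;
while (ss >> word)
{
	it = strMap.find(word);
	if (it == strMap.end())
	{
		strMap.insert(make_pair(word, 1)); // first occurrence of this word
	}
	else
		it->second++; // reuse the iterator instead of looking the key up again
}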
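The find-then-insert logic can be shortened, because operator[] on an unordered_map inserts a value-initialized entry (0 for int) when the key is missing. A minimal sketch under that observation follows; countWords is a hypothetical function name, and the input is assumed to be already stripped of punctuation.

```cpp
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical helper: count whitespace-separated words in `text`.
std::unordered_map<std::string, int> countWords(const std::string& text)
{
    std::unordered_map<std::string, int> counts;
    std::istringstream stream(text);
    std::string word;
    while (stream >> word)
        ++counts[word];  // a missing key is created with value 0, then incremented
    return counts;
}
```

For example, countWords("the cat and the hat") produces four entries, with "the" mapped to 2. The comparison is case-sensitive, so "The" and "the" are counted separately, just as in the code above.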

Complete code:

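#include <cctype>
#include <cstdlib>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>
#include <unordered_map>
using namespace std;

int main()
{
	ifstream input("file.txt");
	if (!input)
	{
		cerr << "Cannot open file.txt." << endl;
		exit(1);
	}
	string s, tmp;

	// Read the whole file into s, one line at a time.
	while (getline(input, tmp))
	{
		s.append(tmp);
		s.push_back(' '); // getline drops '\n'; keep words on adjacent lines apart
	}

	// Replace every punctuation character in s with a space.
	for (size_t i = 0; i < s.length(); i++)
	{
		// Cast to unsigned char: a negative char passed to ispunct is undefined behavior.
		if (ispunct(static_cast<unsigned char>(s[i])))
			s[i] = ' '; // each element of a string is a char, so assign ' ' rather than " "
	}

	// Count the words.
	string word;
	stringstream ss(s); // initialize the stream ss with the contents of s
	unordered_map<string, int> strMap;
	unordered_map<string, int>::iterator it;
	while (ss >> word)
	{
		it = strMap.find(word);
		if (it == strMap.end())
		{
			strMap.insert(make_pair(word, 1)); // first occurrence of this word
		}
		else
			it->second++; // reuse the iterator instead of looking the key up again
	}

	// Print the table; an unordered_map iterates in no particular order.
	cout << setw(10) << "Words" << setw(10) << "number" << endl;
	for (const auto& item : strMap) // const reference avoids copying each pair
	{
		cout << setw(10) << item.first << setw(10) << item.second << endl;
	}
	return 0;
}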
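Because an unordered_map has no defined iteration order, the rows of the printed table can come out in any order. If a stable order is wanted, one option is to copy the entries into a vector and sort it before printing. The sketch below is an illustration under that assumption, not part of the original solution; sortedCounts is a hypothetical name, and the ordering (highest count first, ties broken alphabetically) is an arbitrary choice.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper: copy the counts into a vector sorted by descending
// count, with ties broken by ascending word.
std::vector<std::pair<std::string, int>>
sortedCounts(const std::unordered_map<std::string, int>& counts)
{
    std::vector<std::pair<std::string, int>> rows(counts.begin(), counts.end());
    std::sort(rows.begin(), rows.end(),
              [](const std::pair<std::string, int>& a,
                 const std::pair<std::string, int>& b)
              {
                  if (a.second != b.second)
                      return a.second > b.second;  // larger counts first
                  return a.first < b.first;        // then alphabetical
              });
    return rows;
}
```

The printing loop could then iterate over sortedCounts(strMap) instead of strMap itself, leaving the setw formatting unchanged.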

Reference post: https://blog.csdn.net/Gorgeous_mj/article/details/90317704
