Preface
Kumo is an open-source Java word cloud library that makes it quick to generate word cloud images.
Source code
- GitHub: https://github.com/kennycason/kumo
- Gitee: https://gitee.com/lyy289065406/kumo
Maven
<dependency>
    <groupId>com.kennycason</groupId>
    <artifactId>kumo-core</artifactId>
    <version>1.13</version>
</dependency>
<dependency>
    <groupId>com.kennycason</groupId>
    <artifactId>kumo-tokenizers</artifactId>
    <version>1.12</version>
</dependency>
Usage
First, create a FrequencyAnalyzer object. It computes the word frequencies from which the word cloud is built. Its main methods are:
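Method | Parameter | Description |
---|---|---|
load | InputStream | Load the corpus from a stream |
load | File | Load the corpus from a file |
load | String | Open the file at the given path and load the corpus from it |
load | URL | Load the corpus from a URL |
load | List | Load the corpus from a list of strings |
setWordTokenizer | WordTokenizer | Set the tokenizer; the main choices are ChineseWordTokenizer and EnglishWordTokenizer |
setWordFrequenciesToReturn | int | Set the maximum number of word frequencies to return |
setMinWordLength | int | Minimum word length |
setMaxWordLength | int | Maximum word length |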
public static String getWordCloud(List<String> words) {
    // Create the FrequencyAnalyzer
    FrequencyAnalyzer frequencyAnalyzer = new FrequencyAnalyzer();
    // Number of word frequencies to return (the 600 most frequent words)
    frequencyAnalyzer.setWordFrequenciesToReturn(600);
    // Minimum word length
    frequencyAnalyzer.setMinWordLength(2);
    // Use the Chinese tokenizer
    frequencyAnalyzer.setWordTokenizer(new ChineseWordTokenizer());
    // Compute the word frequencies
    final List<WordFrequency> wordFrequencyList = frequencyAnalyzer.load(words);
    // ... the remaining steps described below go here, ending with the return statement
}
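To make the settings above concrete, here is a minimal plain-Java sketch of what a frequency analysis of this kind does: split the texts into words, drop words shorter than a minimum length, count occurrences, and keep only the most frequent entries. It is an illustration only and not Kumo's implementation. The class `SimpleFrequencyCounter`, its method `topWords`, and the whitespace splitting are assumptions made for this example; whitespace splitting would not segment Chinese text, which is why the snippet above plugs in ChineseWordTokenizer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of Kumo.
public class SimpleFrequencyCounter {

    /**
     * Counts whitespace-separated words in the given texts and returns at most
     * `limit` entries with at least `minLength` characters, most frequent first.
     */
    public static List<Map.Entry<String, Integer>> topWords(List<String> texts, int minLength, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        for (String text : texts) {
            for (String word : text.trim().split("\\s+")) {
                if (word.length() >= minLength) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Highest count first; ties are broken alphabetically for a stable order.
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(entries.subList(0, Math.min(limit, entries.size())));
    }

    public static void main(String[] args) {
        List<String> texts = Arrays.asList("java word cloud", "word cloud in java", "a word");
        // Minimum length 2 drops "a"; limit 2 keeps only the two most frequent words.
        for (Map.Entry<String, Integer> entry : topWords(texts, 2, 2)) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
    }
}
```

With the sample input, "word" occurs three times and "cloud" and "java" twice each, so the two entries kept are "word" and, by the alphabetical tie-break, "cloud".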
Create a Dimension object to set the size of the generated image in pixels.
// Image size: 500 x 500 pixels
Dimension dimension = new Dimension(500, 500);
Create the WordCloud object and configure it.
WordCloud wordCloud = new WordCloud(dimension, CollisionMode.PIXEL_PERFECT);
// Set a font explicitly; without this call, Chinese words come out garbled
java.awt.Font font = new java.awt.Font("STSong-Light", java.awt.Font.ITALIC, 18);
wordCloud.setKumoFont(new KumoFont(font));
// Padding around each word
wordCloud.setPadding(2);
// Color palette: the more frequent a word, the earlier the color it gets
wordCloud.setColorPalette(new ColorPalette(new Color(0xed1941), new Color(0xf26522), new Color(0x845538)));
// Shape of the cloud: a circle here, the argument is its radius
wordCloud.setBackground(new CircleBackground(200));
// Font size range
wordCloud.setFontScalar(new SqrtFontScalar(10, 40));
// Background color (white)
wordCloud.setBackgroundColor(new Color(255, 255, 255));
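The font size range is easier to picture with numbers. The sketch below shows one plausible way a square-root scalar can map a word's frequency onto the range 10 to 40: taking the square root compresses large frequency gaps, so the most frequent words do not dwarf the rest. It is written from the idea suggested by the class name and is not copied from Kumo's source; `SqrtScaleSketch` and `scale` are hypothetical names.

```java
// Hypothetical illustration of square-root font scaling; not Kumo's source code.
public class SqrtScaleSketch {

    /**
     * Maps a frequency in [minFreq, maxFreq] to a font size in [minFont, maxFont]
     * by interpolating linearly between the square roots of the frequencies.
     */
    public static float scale(int freq, int minFreq, int maxFreq, int minFont, int maxFont) {
        double span = Math.sqrt(maxFreq) - Math.sqrt(minFreq);
        if (span == 0) {
            // All words share the same frequency; use the largest font.
            return maxFont;
        }
        double position = (Math.sqrt(freq) - Math.sqrt(minFreq)) / span;
        return (float) (minFont + position * (maxFont - minFont));
    }

    public static void main(String[] args) {
        // Frequencies between 1 and 100 mapped onto the 10..40 range used above.
        for (int freq : new int[] {1, 25, 100}) {
            System.out.println(freq + " -> " + scale(freq, 1, 100, 10, 40));
        }
    }
}
```

Under these assumptions, a word with frequency 25 out of a maximum of 100 lands 4/9 of the way up the range (about 23.3), whereas a purely linear mapping would put it only about a quarter of the way up.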
Call the WordCloud object's build method, passing in the word frequencies computed earlier.
// Generate the word cloud
wordCloud.build(wordFrequencyList);
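Finally, the generated image can be processed further. Here it has to be returned to the front end, so it is written to an in-memory stream as PNG and Base64-encoded.
// Declaring the variable as ByteArrayOutputStream avoids a cast later
ByteArrayOutputStream output = new ByteArrayOutputStream();
wordCloud.writeToStream("png", output);
byte[] outputByte = output.toByteArray();
return org.apache.commons.codec.binary.Base64.encodeBase64String(outputByte);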
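If pulling in commons-codec only for this step is undesirable, the JDK's `java.util.Base64` (Java 8 and later) can produce the same kind of string. The sketch below also prefixes the result as a data URI, which a front end can assign directly to the `src` attribute of an `<img>` element. The class `PngDataUri` and the tiny blank image that stands in for the word cloud bytes are assumptions made for this example.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Base64;
import javax.imageio.ImageIO;

// Hypothetical helper for illustration; not part of Kumo.
public class PngDataUri {

    /** Encodes raw PNG bytes as a data URI usable in an <img> src attribute. */
    public static String toDataUri(byte[] pngBytes) {
        return "data:image/png;base64," + Base64.getEncoder().encodeToString(pngBytes);
    }

    /** Renders a small blank image to PNG bytes, standing in for the word cloud output. */
    public static byte[] placeholderPng() {
        BufferedImage image = new BufferedImage(4, 4, BufferedImage.TYPE_INT_ARGB);
        ByteArrayOutputStream output = new ByteArrayOutputStream();
        try {
            ImageIO.write(image, "png", output);
        } catch (IOException e) {
            // Writing to an in-memory stream should not fail; rethrow unchecked if it does.
            throw new UncheckedIOException(e);
        }
        return output.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(toDataUri(placeholderPng()));
    }
}
```

In the method above, `outputByte` would take the place of `placeholderPng()`. Whether to return the bare Base64 string or the full data URI depends on what the front end expects.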
The complete code is as follows.
// Besides the Kumo classes used below, this needs java.awt.Color, java.awt.Dimension,
// java.io.ByteArrayOutputStream and java.util.List on the import list.
public static String getWordCloud(List<String> words) {
    FrequencyAnalyzer frequencyAnalyzer = new FrequencyAnalyzer();
    frequencyAnalyzer.setWordFrequenciesToReturn(600);
    frequencyAnalyzer.setMinWordLength(2);
    // Use the Chinese tokenizer
    frequencyAnalyzer.setWordTokenizer(new ChineseWordTokenizer());
    final List<WordFrequency> wordFrequencyList = frequencyAnalyzer.load(words);
    // Image size: 500 x 500 pixels
    Dimension dimension = new Dimension(500, 500);
    // Create the word cloud; the built-in CollisionMode constant is sufficient here
    WordCloud wordCloud = new WordCloud(dimension, CollisionMode.PIXEL_PERFECT);
    // Set a font explicitly; without this call, Chinese words come out garbled
    java.awt.Font font = new java.awt.Font("STSong-Light", java.awt.Font.ITALIC, 18);
    wordCloud.setKumoFont(new KumoFont(font));
    wordCloud.setPadding(2);
    wordCloud.setColorPalette(new ColorPalette(
            new Color(0xed1941), new Color(0xf26522), new Color(0x845538),
            new Color(0x8a5d19), new Color(0x7f7522), new Color(0x5c7a29),
            new Color(0x1d953f), new Color(0x007d65), new Color(0x65c294)));
    wordCloud.setBackground(new CircleBackground(200));
    wordCloud.setFontScalar(new SqrtFontScalar(10, 40));
    wordCloud.setBackgroundColor(new Color(255, 255, 255));
    // Generate the word cloud
    wordCloud.build(wordFrequencyList);
    // Write the image to memory as PNG and return it Base64-encoded
    ByteArrayOutputStream output = new ByteArrayOutputStream();
    wordCloud.writeToStream("png", output);
    byte[] outputByte = output.toByteArray();
    return org.apache.commons.codec.binary.Base64.encodeBase64String(outputByte);
}
The generated result looks like this: