Chinese Word Segmentation: Forward Maximum Matching

Original

2018-09-01 20:07

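Chinese word segmentation is used in a lot of places, and there are plenty of open-source projects for it online. Here I mainly want to walk through a simple implementation of one segmentation algorithm, forward maximum matching. Without further ado, here is the code: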


package com;

import java.util.ArrayList;
import java.util.List;

public class Segmentation1 {

    // Word list used for matching.
    private List<String> dictionary = new ArrayList<String>();

    // Sentence to segment.
    private String request = "北京大學生前來應聘";

    public void setDictionary() {
        dictionary.add("北京");
        dictionary.add("北京大學");
        dictionary.add("大學");
        dictionary.add("大學生");
        dictionary.add("生前");
        dictionary.add("前來");
        dictionary.add("應聘");
    }

    // Forward maximum matching: at each position, emit the longest dictionary
    // word that starts there, or a single character if none does.
    public String leftMax() {
        StringBuilder response = new StringBuilder();
        int start = 0;
        while (start < request.length()) {
            // Fall back to one character when nothing in the dictionary matches.
            int end = start + 1;
            for (int i = start + 1; i <= request.length(); i++) {
                String s = request.substring(start, i);
                // Stop extending once s is no longer a prefix of any entry.
                if (aheadCount(s, dictionary) == 0) break;
                // Remember the longest complete word seen so far, so a failed
                // extension does not swallow the characters after it.
                if (isIn(s, dictionary)) end = i;
            }
            response.append(request, start, end).append("/");
            start = end;
        }
        return response.toString();
    }

    // True if s is exactly one of the dictionary entries.
    private boolean isIn(String s, List<String> list) {
        return list.contains(s);
    }

    // Number of dictionary entries that start with s (including s itself).
    private int aheadCount(String s, List<String> list) {
        int count = 0;
        for (String word : list) {
            if (word.startsWith(s)) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Segmentation1 seg = new Segmentation1();
        seg.setDictionary();
        String response1 = seg.leftMax();
        System.out.println(response1);
    }
}

Running it prints: 北京大學/生前/來/應聘/
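To see where that result comes from, it can help to list, for each starting position, every dictionary word that matches there and then keep the longest one. The sketch below does that for the same sentence and dictionary. The class name `MatchTrace` and the method `trace` are made up for illustration and are not part of the code in this post.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, for illustration only.
class MatchTrace {

    // Returns one line per step: start position, matching words, chosen token.
    static List<String> trace(String text, List<String> dictionary) {
        List<String> steps = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            List<String> candidates = new ArrayList<>();
            // Fall back to a single character when no dictionary word matches.
            String best = text.substring(pos, pos + 1);
            for (String word : dictionary) {
                if (text.startsWith(word, pos)) {
                    candidates.add(word);
                    if (word.length() > best.length()) {
                        best = word;
                    }
                }
            }
            steps.add(pos + ": " + candidates + " -> " + best);
            pos += best.length();
        }
        return steps;
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList(
                "北京", "北京大學", "大學", "大學生", "生前", "前來", "應聘");
        // Prints:
        // 0: [北京, 北京大學] -> 北京大學
        // 4: [生前] -> 生前
        // 6: [] -> 來
        // 7: [應聘] -> 應聘
        for (String step : trace("北京大學生前來應聘", dictionary)) {
            System.out.println(step);
        }
    }
}
```

At position 0 both 北京 and 北京大學 match and the longer one wins, so 大學生 and 前來 are never considered, even though 北京/大學生/前來/應聘 is arguably the more natural reading. Greedy longest-match has no way to look ahead and reconsider.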

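The core of the algorithm is to scan from front to back and, at each position, pick the longest word that can be found in the dictionary.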
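A common way to write this idea down is with a shrinking window: at each position, take the longest substring that could possibly be a word, check whether it is in the dictionary, and drop one character from the right until it matches or only a single character is left. The following is a minimal sketch under that assumption, not the code from this post. `ForwardMaxMatch`, `segment`, and `maxWordLength` are names chosen here for illustration, and a `HashSet` stands in for the list so that a lookup does not scan the whole dictionary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch; the class and method names are assumptions.
class ForwardMaxMatch {

    static List<String> segment(String text, Set<String> dictionary) {
        // The window never needs to be longer than the longest dictionary word.
        int maxWordLength = 1;
        for (String word : dictionary) {
            maxWordLength = Math.max(maxWordLength, word.length());
        }

        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int end = Math.min(pos + maxWordLength, text.length());
            // Shrink the window from the right until it is a dictionary word
            // or only one character remains.
            while (end > pos + 1 && !dictionary.contains(text.substring(pos, end))) {
                end--;
            }
            tokens.add(text.substring(pos, end));
            pos = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> dictionary = new HashSet<>(Arrays.asList(
                "北京", "北京大學", "大學", "大學生", "生前", "前來", "應聘"));
        // Prints 北京大學/生前/來/應聘
        System.out.println(String.join("/", segment("北京大學生前來應聘", dictionary)));
    }
}
```

Because the window starts at the longest candidate and only shrinks, the first hit is automatically the longest match at that position, and a character with no dictionary entry simply falls out as a one-character token.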
