Using Full-Text Indexes (FULLTEXT index) in MySQL

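One very useful feature of MySQL is the ability to search text using a full-text index (FULLTEXT index). At present this works only with MyISAM tables (MyISAM is the default table type, so if you don't know which type of table you are using, it is most likely MyISAM). A full-text index can be created on a TEXT, CHAR or VARCHAR column, or on a combination of such columns. We will create a simple table to illustrate the various features.
The simple usage (the MATCH() function) is available from version 3.23.23 onwards, while the more complex usage (the IN BOOLEAN MODE modifier) is available from version 4 onwards. The first part of this article concentrates on the simple usage, and the second part covers the complex usage.
A simple table
We will use the following table throughout.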
CREATE TABLE fulltext_sample(copy TEXT,FULLTEXT(copy)) TYPE=MyISAM;
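If you have not changed the default table type to something other than MyISAM, TYPE=MyISAM can be omitted. After creating the table, populate it with some data, for example: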
INSERT INTO fulltext_sample VALUES
('It appears good from here'),
('The here and the past'),
('Why are we hear'),
('An all-out alert'),
('All you need is love'),
('A good alert');

If you have already created a table, you can add a full-text index with an ALTER TABLE statement (or, similarly, a CREATE INDEX statement), for example:
ALTER TABLE fulltext_sample ADD FULLTEXT(copy);
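As a rough sketch of the CREATE INDEX route mentioned above, the statements below add a full-text index to a table that was created without one. The table name fulltext_sample2 and the index name ft_copy are made up for illustration, and ENGINE=MyISAM is the spelling that newer MySQL releases use in place of TYPE=MyISAM; adjust it to your server version.

```sql
-- Hypothetical table, created without any full-text index.
-- ENGINE=MyISAM replaces TYPE=MyISAM on newer MySQL releases.
CREATE TABLE fulltext_sample2 (copy TEXT) ENGINE=MyISAM;

-- Add the full-text index afterwards; ft_copy is an illustrative name.
CREATE FULLTEXT INDEX ft_copy ON fulltext_sample2 (copy);

-- List the table's indexes to confirm the new one is present.
SHOW INDEX FROM fulltext_sample2;
```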
Searching for text
The syntax for a full-text search is simple: you MATCH the column AGAINST the text you are looking for, for example:
mysql> SELECT * FROM fulltext_sample WHERE MATCH(copy) AGAINST('love');
+----------------------+
| copy                 |
+----------------------+
| All you need is love |
+----------------------+

Searches on a full-text index are not case-sensitive, so the following statement works just as well:
mysql> SELECT * FROM fulltext_sample WHERE MATCH(copy) AGAINST('LOVE');
+----------------------+
| copy                 |
+----------------------+
| All you need is love |
+----------------------+

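Full-text indexes are usually used to search natural-language text such as newspaper articles or web page content, so MySQL has built in a number of features for this kind of search. MySQL does not index any word of 3 characters or fewer, nor any word that appears in 50% or more of the records. This means that if your table has 2 or fewer records, a search on the full-text index will never return anything. In future MySQL will make this behaviour more flexible, but for now it should suit most natural-language uses. If most of the records in your database contain the word "music", you probably do not want all of them returned. If you do need to get around the 50% threshold, you can use the IN BOOLEAN MODE modifier, covered in the second part of this article.
Results are returned in order of relevance, from highest to lowest.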
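To get a feel for whether a word is likely to be dropped by these two rules, something like the following can help. It is only a sketch: ft_min_word_len is assumed to be available as a server variable (it holds the minimum indexed word length for MyISAM in later MySQL releases), and the LIKE count is a crude substring approximation of how many records contain a word, not the exact figure MySQL computes.

```sql
-- Minimum word length used for MyISAM full-text indexes
-- (assumes a release that exposes this variable).
SHOW VARIABLES LIKE 'ft_min_word_len';

-- Rough check against the 50% rule: compare the number of records
-- containing the word with the total number of records.
SELECT
  SUM(copy LIKE '%alert%') AS records_with_word,
  COUNT(*) AS total_records
FROM fulltext_sample;
```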
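Main features
The main features of a standard full-text search are as follows:
1. Partial words are excluded
2. Words shorter than 4 characters are excluded
3. Words that appear in at least half of the records are excluded (which means you need at least 3 records)
4. Hyphenated words are treated as two words
5. Results are returned in descending order of relevance
6. Words in the stopword list are also excluded from search results. The stopword list is based on common English words, so if your data serves a different purpose you may want to change it. Unfortunately, doing so is not easy: you need to edit the file myisam/ft_static.c, recompile MySQL, and rebuild your indexes! Here is one stopword list. Note that it has changed between versions.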
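For readers on a later MySQL release there may be a less painful route than recompiling. The sketch below assumes a release that provides the ft_stopword_file server option, which points at a custom stopword file. The option is set in the server configuration rather than through SQL, and existing full-text indexes need rebuilding after it changes.

```sql
-- Shows which stopword file the server uses
-- (assumes a release that provides ft_stopword_file).
SHOW VARIABLES LIKE 'ft_stopword_file';

-- After changing the stopword list and restarting the server,
-- rebuild the MyISAM full-text index so the change takes effect.
REPAIR TABLE fulltext_sample QUICK;
```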
Stopword list
"a", "a's", "able", "about", "above", "according", "accordingly", "across", "actually", "after", "afterwards", "again", "against", "ain't", "all", "allow", "allows", "almost", "alone", "along", "already", "also", "although", "always", "am", "among", "amongst", "an", "and", "another", "any", "anybody", "anyhow", "anyone", "anything", "anyway", "anyways", "anywhere", "apart", "appear", "appreciate", "appropriate", "are", "aren't", "around", "as", "aside", "ask", "asking", "associated", "at", "available", "away", "awfully", "b", "be", "became", "because", "become", "becomes", "becoming", "been", "before", "beforehand", "behind", "being", "believe", "below", "beside", "besides", "best", "better", "between", "beyond", "both", "brief", "but", "by", "c", "c'mon", "c's", "came", "can", "can't", "cannot", "cant", "cause", "causes", "certain", "certainly", "changes", "clearly", "co", "com", "come", "comes", "concerning", "consequently", "consider", "considering", "contain", "containing", "contains", "corresponding", "could", "couldn't", "course", "currently", "d", "definitely", "described", "despite", "did", "didn't", "different", "do", "does", "doesn't", "doing", "don't", "done", "down", "downwards", "during", "e", "each", "edu", "eg", "eight", "either", "else", "elsewhere", "enough", "entirely", "especially", "et", "etc", "even", "ever", "every", "everybody", "everyone", "everything", "everywhere", "ex", "exactly", "example", "except", "f", "far", "few", "fifth", "first", "five", "followed", "following", "follows", "for", "former", "formerly", "forth", "four", "from", "further", "furthermore", "g", "get", "gets", "getting", "given", "gives", "go", "goes", "going", "gone", "got", "gotten", "greetings", "h", "had", "hadn't", "happens", "hardly", "has", "hasn't", "have", "haven't", "having", "he", "he's", "hello", "help", "hence", "her", "here", "here's", "hereafter", "hereby", "herein", "hereupon", "hers", "herself", "hi", "him", "himself", "his", "hither", "hopefully", 
"how", "howbeit", "however", "i", "i'd", "i'll", "i'm", "i've", "ie", "if", "ignored", "immediate", "in", "inasmuch", "inc", "indeed", "indicate", "indicated", "indicates", "inner", "insofar", "instead", "into", "inward", "is", "isn't", "it", "it'd", "it'll", "it's", "its", "itself", "j", "just", "k", "keep", "keeps", "kept", "know", "knows", "known", "l", "last", "lately", "later", "latter", "latterly", "least", "less", "lest", "let", "let's", "like", "liked", "likely", "little", "look", "looking", "looks", "ltd", "m", "mainly", "many", "may", "maybe", "me", "mean", "meanwhile", "merely", "might", "more", "moreover", "most", "mostly", "much", "must", "my", "myself", "n", "name", "namely", "nd", "near", "nearly", "necessary", "need", "needs", "neither", "never", "nevertheless", "new", "next", "nine", "no", "nobody", "non", "none", "noone", "nor", "normally", "not", "nothing", "novel", "now", "nowhere", "o", "obviously", "of", "off", "often", "oh", "ok", "okay", "old", "on", "once", "one", "ones", "only", "onto", "or", "other", "others", "otherwise", "ought", "our", "ours", "ourselves", "out", "outside", "over", "overall", "own", "p", "particular", "particularly", "per", "perhaps", "placed", "please", "plus", "possible", "presumably", "probably", "provides", "q", "que", "quite", "qv", "r", "rather", "rd", "re", "really", "reasonably", "regarding", "regardless", "regards", "relatively", "respectively", "right", "s", "said", "same", "saw", "say", "saying", "says", "second", "secondly", "see", "seeing", "seem", "seemed", "seeming", "seems", "seen", "self", "selves", "sensible", "sent", "serious", "seriously", "seven", "several", "shall", "she", "should", "shouldn't", "since", "six", "so", "some", "somebody", "somehow", "someone", "something", "sometime", "sometimes", "somewhat", "somewhere", "soon", "sorry", "specified", "specify", "specifying", "still", "sub", "such", "sup", "sure", "t", "t's", "take", "taken", "tell", "tends", "th", "than", "thank", "thanks", 
"thanx", "that", "that's", "thats", "the", "their", "theirs", "them", "themselves", "then", "thence", "there", "there's", "thereafter", "thereby", "therefore", "therein", "theres", "thereupon", "these", "they", "they'd", "they'll", "they're", "they've", "think", "third", "this", "thorough", "thoroughly", "those", "though", "three", "through", "throughout", "thru", "thus", "to", "together", "too", "took", "toward", "towards", "tried", "tries", "truly", "try", "trying", "twice", "two", "u", "un", "under", "unfortunately", "unless", "unlikely", "until", "unto", "up", "upon", "us", "use", "used", "useful", "uses", "using", "usually", "v", "value", "various", "very", "via", "viz", "vs", "w", "want", "wants", "was", "wasn't", "way", "we", "we'd", "we'll", "we're", "we've", "welcome", "well", "went", "were", "weren't", "what", "what's", "whatever", "when", "whence", "whenever", "where", "where's", "whereafter", "whereas", "whereby", "wherein", "whereupon", "wherever", "whether", "which", "while", "whither", "who", "who's", "whoever", "whole", "whom", "whose", "why", "will", "willing", "wish", "with", "within", "without", "won't", "wonder", "would", "would", "wouldn't", "x", "y", "yes", "yet", "you", "you'd", "you'll", "you're", "you've", "your", "yours", "yourself", "yourselves", "z", "zero",
Let's see some of these features in action. If you are lazy and try to search for "love" by typing only part of the word, like this:
mysql> SELECT * FROM fulltext_sample WHERE MATCH(copy) AGAINST('lov');
Empty set (0.00 sec)

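Nothing is returned, because the full-text index contains only complete words, not partial words. To get a result you must spell the word out in full, as in the first example.
As mentioned, hyphenated words are also excluded from the full-text index (their parts are indexed as separate words), so the following statement returns nothing: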
mysql> SELECT * FROM fulltext_sample WHERE MATCH(copy) AGAINST('all-out');
Empty set (0.00 sec)

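Unfortunately, both words are shorter than 4 characters, so they are not found when searched for individually either, and they never show up in an ordinary search. The BOOLEAN MODE searches in the second part of this article can find partial words and hyphenated words.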
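As a small preview, and assuming MySQL 4 or later where IN BOOLEAN MODE is available, a trailing * acts as a wildcard so that a partial word can match. For terms too short to be indexed at all, a plain LIKE scan is a fallback that does not use the full-text index.

```sql
-- Prefix search: the * operator matches words beginning with 'lov'.
SELECT * FROM fulltext_sample
WHERE MATCH(copy) AGAINST('lov*' IN BOOLEAN MODE);

-- Fallback for short or hyphenated terms: a substring scan,
-- which bypasses the full-text index entirely.
SELECT * FROM fulltext_sample WHERE copy LIKE '%all-out%';
```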
You can also search for several words at once, separated by commas. The following example looks for records containing "here" and "appears":
mysql> SELECT * FROM fulltext_sample WHERE MATCH(copy) AGAINST('here','appears');
Empty set (0.01 sec)

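Surprisingly, this statement returns nothing. But look closely at the stopword list: the word "here" appears in it, and is therefore excluded from the index. The stopword list is probably the most common reason people complain that MySQL's full-text index is not working. If your query did return a result, then the stopword list in your version of MySQL does not contain the word "here".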
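A hedged sketch of the same search, on the assumption that AGAINST() takes a single search string (the form used by the 'good,alert' query in the relevance section of this article): the words go into one quoted string, and each word can also be tried on its own to see which one the stopword list is swallowing. What each query returns depends on your MySQL version and its stopword list.

```sql
-- Both words in one search string (assumed form; see the note above).
SELECT * FROM fulltext_sample WHERE MATCH(copy) AGAINST('here,appears');

-- Each word separately, to see which term is being ignored.
SELECT * FROM fulltext_sample WHERE MATCH(copy) AGAINST('here');
SELECT * FROM fulltext_sample WHERE MATCH(copy) AGAINST('appears');
```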
Relevance
The following example shows the order of priority in which records are returned:
mysql> SELECT * FROM fulltext_sample WHERE MATCH(copy) AGAINST('good,alert');
+---------------------------+
| copy                      |
+---------------------------+
| A good alert              |
| An all-out alert          |
| It appears good from here |
+---------------------------+

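The record "A good alert" comes first because it contains both of the words searched for. You don't have to take my word for it: just have MySQL display the relevance alongside the results, by repeating the MATCH() function in the column list, for example: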
mysql> SELECT copy,MATCH(copy) AGAINST('good,alert') AS relevance
FROM fulltext_sample WHERE MATCH(copy) AGAINST('good,alert');
+---------------------------+------------------+
| copy                      | relevance        |
+---------------------------+------------------+
| A good alert              |  1.3551264824316 |
| An all-out alert          | 0.68526663197496 |
| It appears good from here | 0.67003110026735 |
+---------------------------+------------------+

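The relevance calculation is quite complex. It is based on the number of words in the index, the number of unique words in the record, the total number of words in both the index and the result, and the weight of the word. The figures may differ in your version of MySQL, as MySQL occasionally refines the calculation.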
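Since the relevance value is an ordinary column in the result, it can be sorted and limited like any other. A minimal sketch, assuming only the sample table used in this article; the cutoff of 2 records is arbitrary:

```sql
-- Keep only the two most relevant records.
SELECT copy, MATCH(copy) AGAINST('good,alert') AS relevance
FROM fulltext_sample
WHERE MATCH(copy) AGAINST('good,alert')
ORDER BY relevance DESC
LIMIT 2;
```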
For most applications the standard full-text search is useful and sufficient, and MySQL 4 makes it even more powerful.

Original article: http://www.databasejournal.com/features/mysql/article.php/1578331
