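Preface
By default, the Xapian full-text search used by the redmine_dmsf plugin does not index files such as .c, .cpp, .rb, and .reg.
Scope
The method described here should work for any other file type that is plain text.
It does not apply to formats such as .doc and .pdf, which need a converter program.
Environment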
- Ubuntu 20.04.2 LTS server
- Ruby 2.7.2p137, Rails 5.2.4.2, gem 3.1.4, rake 13.0.3
- Redmine 4.1.1
- redmine_dmsf 2.4.5
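Reference
This post follows the official article "Add support for a new format to Omega".
The article describes three approaches. This post uses the third one (modifying the source code directly); I have not yet figured out the other two.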
1. Modify ~/xapian/xapian-omega-1.4.18/index_file.cc
The original code around line 755 of ~/xapian/xapian-omega-1.4.18/index_file.cc (the leading numbers are source line numbers):
753         created = p.created;
754         md5_string(text, md5);
755     } else if (mimetype == "text/plain") {
756         // Currently we assume that text files are UTF-8 unless they have a
757         // byte-order mark.
758         dump = d.file_to_string();
759         md5_string(dump, md5);
760
761         // Look for Byte-Order Mark (BOM).
762         if (startswith(dump, "\xfe\xff") || startswith(dump, "\xff\xfe")) {
763             // UTF-16 in big-endian/little-endian order - we just convert
764             // it as "UTF-16" and let the conversion handle the BOM as that
765             // way we avoid the copying overhead of erasing 2 bytes from
766             // the start of dump.
767             convert_to_utf8(dump, "UTF-16");
768         } else if (startswith(dump, "\xef\xbb\xbf")) {
769             // UTF-8 with stupid Windows not-the-byte-order mark.
770             dump.erase(0, 3);
771         } else {
772             // FIXME: What charset is the file? Look at contents?
773         }
774     } else if (mimetype == "application/pdf") {
Line 755, } else if (mimetype == "text/plain") {
is changed to:
} else if (mimetype == "text/plain" || mimetype == "text/x-c" || mimetype == "text/x-c++"
|| mimetype == "text/x-ms-regedit" || mimetype == "text/x-ruby"
) {
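If more text-like types need to be added later, the chain of || comparisons gets long. One possible alternative, shown here only as a sketch, is a small lookup helper. The name is_plain_text_like is hypothetical and does not exist in Omega; the MIME types in the set are exactly the ones used in the modified condition above.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical helper (not part of Omega): returns true for MIME types
// that should be handled by the existing text/plain branch.
bool is_plain_text_like(const std::string& mimetype) {
    static const std::unordered_set<std::string> text_types = {
        "text/plain",
        "text/x-c",
        "text/x-c++",
        "text/x-ms-regedit",
        "text/x-ruby",
    };
    return text_types.count(mimetype) != 0;
}
```

With such a helper defined in index_file.cc, the branch condition could read } else if (is_plain_text_like(mimetype)) { and supporting another text type would mean adding one line to the set. This variant has not been tested against the Omega build; the change actually applied in this post is the || chain.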
Diff:
# samxiao @ rm411 in ~/xapian/xapian-omega-1.4.18 [11:10:35]
$ diff index_file.cc.orig index_file.cc
755c755,757
< } else if (mimetype == "text/plain") {
---
> } else if (mimetype == "text/plain" || mimetype == "text/x-c" || mimetype == "text/x-c++"
> || mimetype == "text/x-ms-regedit" || mimetype == "text/x-ruby"
> ) {
2. Rebuild
# samxiao @ rm411 in ~/xapian/xapian-omega-1.4.18 [11:13:21] C:1
$ ./configure --prefix=/opt/xapian XAPIAN_CONFIG=/opt/xapian/bin/xapian-config  ## This command only needs to be run once.
$ make
$ sudo make install
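3. Rebuild the index and confirm that the change has taken effect
Note: the -R option in the command ruby plugins/redmine_dmsf/extra/xapian_indexer.rb -v -R must not be omitted. It tells the indexer to retry files that failed earlier; without it, those files are not re-indexed.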
# samxiao @ rm411 in ~/redmine-4.1.1 [11:03:06] C:100
$ ruby plugins/redmine_dmsf/extra/xapian_indexer.rb -v -R
Trying to load Redmine environment <</home/samxiao/redmine-4.1.1/config/environment.rb>>...
Redmine environment [RAILS_ENV=production] correctly loaded ...
/opt/xapian/bin/omindex -s english --db /home/samxiao/redmine-4.1.1/dmsf_index/english /home/samxiao/redmine-4.1.1/files/dmsf --url / --depth-limit=0 -v --retry-failed
[Entering directory ""]
[Entering directory "2021/"]
[Entering directory "2021/03/"]
Indexing "2021/03/210313014821_28_CODE_OF_CONDUCT.md" as text/markdown ... already indexed
Indexing "2021/03/210313014821_13_redmine_logs-0.2.0.zip" as application/zip ... Skipping - unknown MIME type 'application/zip'
Indexing "2021/03/210312143519_2_test.idcard-sz.txt" as text/plain ... already indexed
Indexing "2021/03/210313014821_21_README.md" as text/markdown ... already indexed
Indexing "2021/03/210312143519_1_test.idcard-sz" as text/plain ... already indexed
Indexing "2021/03/210312143519_3_test.idcard-sz-ASCII.txt" as text/plain ... already indexed
Indexing "2021/03/210313014654_24_1248.cpp" as text/x-c++ ... added
Indexing "2021/03/210313014654_23_1213.cpp" as text/x-c++ ... added
Indexing "2021/03/210312143622_6_p9x9.cpp" as text/x-c++ ... added
Indexing "2021/03/210313014654_26_1256.cpp" as text/x-c++ ... added
Indexing "2021/03/210312143810_8_crc16.c" as text/x-c ... added
Indexing "2021/03/210312143519_5_____COM___________.reg" as text/x-ms-regedit ... added
Indexing "2021/03/210313014654_25_1308.cpp" as text/x-c++ ... added
Indexing "2021/03/210313014543_18_after_init.rb" as text/x-ruby ... added
Indexing "2021/03/210313014543_22_1327-V2.cpp" as text/x-c++ ... added
Indexing "2021/03/210313014821_29_redmine_impersonate-2.0.0.zip" as application/zip ... Skipping - unknown MIME type 'application/zip'
Indexing "2021/03/210313014543_13_redmine_logs-0.2.0.zip" as application/zip ... Skipping - unknown MIME type 'application/zip'
Indexing "2021/03/210312143810_11_JS__________.docx" as application/vnd.openxmlformats-officedocument.wordprocessingml.document ... already indexed
Indexing "2021/03/210313014821_27_CHANGELOG.md" as text/markdown ... already indexed
Indexing "2021/03/210313014543_15_scp22.bat" as text/plain ... already indexed
Indexing "2021/03/210313014543_16_scp135.bat" as text/plain ... already indexed
Indexing "2021/03/210312143810_9_SQLServer2000_____V2.0_-171127.docx" as application/vnd.openxmlformats-officedocument.wordprocessingml.document ... already indexed
Indexing "2021/03/210313014543_20_LICENSE.md" as text/markdown ... already indexed
Indexing "2021/03/210313014543_17_xapian-test.py" as text/x-python ... Skipping - unknown MIME type 'text/x-python'
Indexing "2021/03/210313014543_14_History.json" as application/json ... Skipping - unknown MIME type 'application/json'
Indexing "2021/03/210312143837_12_LRS-150F-spec-cn_1_.pdf" as application/pdf ... already indexed
Indexing "2021/03/210313014543_19_init.rb" as text/plain ... already indexed
Indexing "2021/03/210313014543_21_README.md" as text/markdown ... already indexed
Indexing "2021/03/210312143810_7_users.csv" as text/csv ... already indexed
Indexing "2021/03/210312143810_10_noi______.doc" as application/msword ... already indexed
Indexing "2021/03/210312143519_4_test.idsz-ASCII" as text/plain ... already indexed
Redmine DMS documents indexed
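The output above still reports some types as skipped (application/zip, text/x-python, application/json). To see at a glance which MIME types remain unsupported, the "unknown MIME type" messages can be tallied. The following is only a sketch: count_skipped_mime_types is a hypothetical helper name, and it assumes the message format is exactly as it appears in the output above.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: counts how often each MIME type appears in
// "unknown MIME type '<type>'" messages within omindex -v output.
std::map<std::string, int> count_skipped_mime_types(const std::string& log) {
    static const std::string marker = "unknown MIME type '";
    std::map<std::string, int> counts;
    std::size_t pos = 0;
    while ((pos = log.find(marker, pos)) != std::string::npos) {
        pos += marker.size();                   // start of the MIME type
        std::size_t end = log.find('\'', pos);  // closing quote
        if (end == std::string::npos) break;    // truncated message, stop
        ++counts[log.substr(pos, end - pos)];
        pos = end + 1;
    }
    return counts;
}
```

Text types that show up in the result, such as text/x-python, are candidates for the same change to index_file.cc. Types like application/zip are not plain text and are outside the scope of this method.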
Search test
.reg
.rb
.cpp .c .md