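Preface
By default, the Xapian full-text search used by the redmine_dmsf plugin does not index files such as .c, .cpp, .rb, and .reg.
Scope
The method described here should work for any other file type that is plain text.
It does not apply to formats such as .doc and .pdf, which need a conversion program.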
Environment
- Ubuntu 20.04.2 LTS server
- Ruby 2.7.2p137, Rails 5.2.4.2, gem 3.1.4, rake 13.0.3
- Redmine 4.1.1
- redmine_dmsf 2.4.5
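References
This follows the official article "Add support for a new format to Omega".
The article describes three approaches; this post uses the third (modifying the source code directly). I have not yet worked out the other two.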
1. Modify the file ~/xapian/xapian-omega-1.4.18/index_file.cc
753 created = p.created;
754 md5_string(text, md5);
755 } else if (mimetype == "text/plain") {
756 // Currently we assume that text files are UTF-8 unless they have a
757 // byte-order mark.
758 dump = d.file_to_string();
759 md5_string(dump, md5);
760
761 // Look for Byte-Order Mark (BOM).
762 if (startswith(dump, "\xfe\xff") || startswith(dump, "\xff\xfe")) {
763 // UTF-16 in big-endian/little-endian order - we just convert
764 // it as "UTF-16" and let the conversion handle the BOM as that
765 // way we avoid the copying overhead of erasing 2 bytes from
766 // the start of dump.
767 convert_to_utf8(dump, "UTF-16");
768 } else if (startswith(dump, "\xef\xbb\xbf")) {
769 // UTF-8 with stupid Windows not-the-byte-order mark.
770 dump.erase(0, 3);
771 } else {
772 // FIXME: What charset is the file? Look at contents?
773 }
774 } else if (mimetype == "application/pdf") {
Change line 755:
} else if (mimetype == "text/plain") {
to:
} else if (mimetype == "text/plain" || mimetype == "text/x-c" || mimetype == "text/x-c++"
|| mimetype == "text/x-ms-regedit" || mimetype == "text/x-ruby"
) {
Diff:
# samxiao @ rm411 in ~/xapian/xapian-omega-1.4.18 [11:10:35]
$ diff index_file.cc.orig index_file.cc
755c755,757
< } else if (mimetype == "text/plain") {
---
> } else if (mimetype == "text/plain" || mimetype == "text/x-c" || mimetype == "text/x-c++"
> || mimetype == "text/x-ms-regedit" || mimetype == "text/x-ruby"
> ) {
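For anyone who would rather script the edit than make it by hand, here is a minimal sketch. The function name patch_text_types is hypothetical and not part of Xapian. It assumes the condition appears in the file exactly in the form shown on line 755, and it writes the new condition on a single line, so the resulting diff reads 755c755 rather than 755c755,757. Check the result with diff before compiling.

```shell
#!/bin/sh
# Hypothetical helper (not part of Xapian): widen the text/plain branch
# of index_file.cc so that the extra text MIME types take the same path.
# Assumption: the condition appears exactly as (mimetype == "text/plain").
patch_text_types() {
    src=$1
    old='(mimetype == "text/plain")'
    new='(mimetype == "text/plain" || mimetype == "text/x-c" || mimetype == "text/x-c++" || mimetype == "text/x-ms-regedit" || mimetype == "text/x-ruby")'

    # Refuse to overwrite an existing backup from an earlier run.
    if [ -e "$src.orig" ]; then
        echo "backup $src.orig already exists, not patching" >&2
        return 1
    fi

    # Keep the untouched file as index_file.cc.orig for later diffing.
    cp -p "$src" "$src.orig" || return 1

    # '#' is the sed delimiter because neither string contains it.
    sed "s#$old#$new#" "$src.orig" > "$src"
}

# Example call (path taken from this post):
#   patch_text_types ~/xapian/xapian-omega-1.4.18/index_file.cc
#   diff ~/xapian/xapian-omega-1.4.18/index_file.cc.orig ~/xapian/xapian-omega-1.4.18/index_file.cc
```

To support further text types later, restore the .orig file and extend the new string; the function deliberately stops if a backup already exists.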
2. Recompile
# samxiao @ rm411 in ~/xapian/xapian-omega-1.4.18 [11:13:21] C:1
$ ./configure --prefix=/opt/xapian XAPIAN_CONFIG=/opt/xapian/bin/xapian-config ## This command only needs to be run once.
$ make
$ sudo make install
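3. Rebuild the index and confirm that the change has taken effect
Note: the -R option in ruby plugins/redmine_dmsf/extra/xapian_indexer.rb -v -R must not be omitted. It tells the indexer to retry files that failed previously; without it, those files are not reindexed.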
# samxiao @ rm411 in ~/redmine-4.1.1 [11:03:06] C:100
$ ruby plugins/redmine_dmsf/extra/xapian_indexer.rb -v -R
Trying to load Redmine environment <</home/samxiao/redmine-4.1.1/config/environment.rb>>...
Redmine environment [RAILS_ENV=production] correctly loaded ...
/opt/xapian/bin/omindex -s english --db /home/samxiao/redmine-4.1.1/dmsf_index/english /home/samxiao/redmine-4.1.1/files/dmsf --url / --depth-limit=0 -v --retry-failed
[Entering directory ""]
[Entering directory "2021/"]
[Entering directory "2021/03/"]
Indexing "2021/03/210313014821_28_CODE_OF_CONDUCT.md" as text/markdown ... already indexed
Indexing "2021/03/210313014821_13_redmine_logs-0.2.0.zip" as application/zip ... Skipping - unknown MIME type 'application/zip'
Indexing "2021/03/210312143519_2_test.idcard-sz.txt" as text/plain ... already indexed
Indexing "2021/03/210313014821_21_README.md" as text/markdown ... already indexed
Indexing "2021/03/210312143519_1_test.idcard-sz" as text/plain ... already indexed
Indexing "2021/03/210312143519_3_test.idcard-sz-ASCII.txt" as text/plain ... already indexed
Indexing "2021/03/210313014654_24_1248.cpp" as text/x-c++ ... added
Indexing "2021/03/210313014654_23_1213.cpp" as text/x-c++ ... added
Indexing "2021/03/210312143622_6_p9x9.cpp" as text/x-c++ ... added
Indexing "2021/03/210313014654_26_1256.cpp" as text/x-c++ ... added
Indexing "2021/03/210312143810_8_crc16.c" as text/x-c ... added
Indexing "2021/03/210312143519_5_____COM___________.reg" as text/x-ms-regedit ... added
Indexing "2021/03/210313014654_25_1308.cpp" as text/x-c++ ... added
Indexing "2021/03/210313014543_18_after_init.rb" as text/x-ruby ... added
Indexing "2021/03/210313014543_22_1327-V2.cpp" as text/x-c++ ... added
Indexing "2021/03/210313014821_29_redmine_impersonate-2.0.0.zip" as application/zip ... Skipping - unknown MIME type 'application/zip'
Indexing "2021/03/210313014543_13_redmine_logs-0.2.0.zip" as application/zip ... Skipping - unknown MIME type 'application/zip'
Indexing "2021/03/210312143810_11_JS__________.docx" as application/vnd.openxmlformats-officedocument.wordprocessingml.document ... already indexed
Indexing "2021/03/210313014821_27_CHANGELOG.md" as text/markdown ... already indexed
Indexing "2021/03/210313014543_15_scp22.bat" as text/plain ... already indexed
Indexing "2021/03/210313014543_16_scp135.bat" as text/plain ... already indexed
Indexing "2021/03/210312143810_9_SQLServer2000_____V2.0_-171127.docx" as application/vnd.openxmlformats-officedocument.wordprocessingml.document ... already indexed
Indexing "2021/03/210313014543_20_LICENSE.md" as text/markdown ... already indexed
Indexing "2021/03/210313014543_17_xapian-test.py" as text/x-python ... Skipping - unknown MIME type 'text/x-python'
Indexing "2021/03/210313014543_14_History.json" as application/json ... Skipping - unknown MIME type 'application/json'
Indexing "2021/03/210312143837_12_LRS-150F-spec-cn_1_.pdf" as application/pdf ... already indexed
Indexing "2021/03/210313014543_19_init.rb" as text/plain ... already indexed
Indexing "2021/03/210313014543_21_README.md" as text/markdown ... already indexed
Indexing "2021/03/210312143810_7_users.csv" as text/csv ... already indexed
Indexing "2021/03/210312143810_10_noi______.doc" as application/msword ... already indexed
Indexing "2021/03/210312143519_4_test.idsz-ASCII" as text/plain ... already indexed
Redmine DMS documents indexed
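The log above still skips several types (application/zip, text/x-python, application/json). To see at a glance which MIME types remain unhandled, and so which text types might be worth adding to the condition next, the output can be summarized with standard tools. This is only a sketch: the function name skipped_mime_types is hypothetical, and it assumes each "Skipping - unknown MIME type" message sits on its own output line, as omindex -v normally prints it.

```shell
#!/bin/sh
# Hypothetical helper: read omindex -v output on stdin and print each
# skipped MIME type, preceded by the number of files that had that type.
skipped_mime_types() {
    # Keep only lines with the "unknown MIME type" message and reduce
    # each one to the quoted type name, then count the duplicates.
    sed -n "s/.*Skipping - unknown MIME type '\([^']*\)'.*/\1/p" |
        sort | uniq -c | sort -rn
}

# Example call, run from the Redmine directory. The 2>&1 is there because
# it is an assumption that all indexer output arrives on stdout:
#   ruby plugins/redmine_dmsf/extra/xapian_indexer.rb -v -R 2>&1 | skipped_mime_types
```

Only text-based types in that list are candidates for the text/plain branch; something like application/zip would need a real converter, which is outside the scope of this post.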
Search test
.reg
.rb
.cpp .c .md