Redmine 4.1.1 / redmine_dmsf plugin: how to make Xapian full-text search support .c, .cpp, .rb, .reg and similar file types

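Preface

By default, the Xapian full-text search used by the redmine_dmsf plugin does not index .c, .cpp, .rb, .reg and similar files.

Scope

The method in this article should work for any other file type whose content is plain text.

It does not apply to formats such as .doc and .pdf, which need a conversion program.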

Environment

  • Ubuntu 20.04.2 LTS Server
  • Ruby 2.7.2p137, Rails 5.2.4.2, gem 3.1.4, rake 13.0.3
  • Redmine 4.1.1
  • redmine_dmsf 2.4.5

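References

This is based on the official article "Add support for a new format to Omega".

That article describes three methods; this post uses the third one (modifying the source code directly). I have not yet figured out the other two.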

1. Modify the file ~/xapian/xapian-omega-1.4.18/index_file.cc

~/xapian/xapian-omega-1.4.18/index_file.cc 

 753         created = p.created;
 754         md5_string(text, md5);
 755     } else if (mimetype == "text/plain") {
 756         // Currently we assume that text files are UTF-8 unless they have a
 757         // byte-order mark.
 758         dump = d.file_to_string();
 759         md5_string(dump, md5);
 760
 761         // Look for Byte-Order Mark (BOM).
 762         if (startswith(dump, "\xfe\xff") || startswith(dump, "\xff\xfe")) {
 763             // UTF-16 in big-endian/little-endian order - we just convert
 764             // it as "UTF-16" and let the conversion handle the BOM as that
 765             // way we avoid the copying overhead of erasing 2 bytes from
 766             // the start of dump.
 767             convert_to_utf8(dump, "UTF-16");
 768         } else if (startswith(dump, "\xef\xbb\xbf")) {
 769             // UTF-8 with stupid Windows not-the-byte-order mark.
 770             dump.erase(0, 3);
 771         } else {
 772             // FIXME: What charset is the file?  Look at contents?
 773         }
 774     } else if (mimetype == "application/pdf") {

Change line 755, } else if (mimetype == "text/plain") {, to:

       } else if (mimetype == "text/plain" || mimetype == "text/x-c" || mimetype == "text/x-c++"
                || mimetype == "text/x-ms-regedit" || mimetype == "text/x-ruby"
                        ) {
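As more types are added, the chained comparison gets harder to read. One possible alternative, sketched here under the assumption of a hypothetical helper named is_plain_text_mimetype (it is not part of Omega, and both the name and the lookup table are illustrative only), keeps the accepted types in one place:

```cpp
#include <string>
#include <unordered_set>

// Hypothetical helper, NOT part of Xapian Omega: returns true for MIME types
// that can be handled by the existing "text/plain" branch.
// The list mirrors the types added in this article; extend it as needed.
bool is_plain_text_mimetype(const std::string& mimetype) {
    static const std::unordered_set<std::string> text_types = {
        "text/plain",
        "text/x-c",
        "text/x-c++",
        "text/x-ms-regedit",
        "text/x-ruby",
    };
    return text_types.count(mimetype) != 0;
}
```

With such a helper defined earlier in index_file.cc, line 755 could read } else if (is_plain_text_mimetype(mimetype)) {. This variant has not been tested against the Omega sources.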

Diff:

# samxiao @ rm411 in ~/xapian/xapian-omega-1.4.18 [11:10:35]
$ diff index_file.cc.orig index_file.cc
755c755,757
<       } else if (mimetype == "text/plain") {
---
>       } else if (mimetype == "text/plain" || mimetype == "text/x-c" || mimetype == "text/x-c++"
>                || mimetype == "text/x-ms-regedit" || mimetype == "text/x-ruby"
>                        ) {

2. Rebuild

# samxiao @ rm411 in ~/xapian/xapian-omega-1.4.18 [11:13:21] C:1 
$ ./configure --prefix=/opt/xapian XAPIAN_CONFIG=/opt/xapian/bin/xapian-config  ## This command only needs to be run once.
$ make 
$ sudo make install

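3. Rebuild the index and confirm that the change has taken effect

Note: the -R option in the command ruby plugins/redmine_dmsf/extra/xapian_indexer.rb -v -R must not be omitted. It asks the indexer to retry files that previously failed to index (it shows up as --retry-failed in the omindex command line in the output below); without it, those files are not reprocessed.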

# samxiao @ rm411 in ~/redmine-4.1.1 [11:03:06] C:100
$ ruby plugins/redmine_dmsf/extra/xapian_indexer.rb -v -R
Trying to load Redmine environment <</home/samxiao/redmine-4.1.1/config/environment.rb>>...
Redmine environment [RAILS_ENV=production] correctly loaded ...
/opt/xapian/bin/omindex -s english --db /home/samxiao/redmine-4.1.1/dmsf_index/english /home/samxiao/redmine-4.1.1/files/dmsf --url / --depth-limit=0 -v --retry-failed
[Entering directory ""]
[Entering directory "2021/"]
[Entering directory "2021/03/"]
Indexing "2021/03/210313014821_28_CODE_OF_CONDUCT.md" as text/markdown ... already indexed
Indexing "2021/03/210313014821_13_redmine_logs-0.2.0.zip" as application/zip ... Skipping - unknown MIME type 'application/zip'
Indexing "2021/03/210312143519_2_test.idcard-sz.txt" as text/plain ... already indexed
Indexing "2021/03/210313014821_21_README.md" as text/markdown ... already indexed
Indexing "2021/03/210312143519_1_test.idcard-sz" as text/plain ... already indexed
Indexing "2021/03/210312143519_3_test.idcard-sz-ASCII.txt" as text/plain ... already indexed
Indexing "2021/03/210313014654_24_1248.cpp" as text/x-c++ ... added
Indexing "2021/03/210313014654_23_1213.cpp" as text/x-c++ ... added
Indexing "2021/03/210312143622_6_p9x9.cpp" as text/x-c++ ... added
Indexing "2021/03/210313014654_26_1256.cpp" as text/x-c++ ... added
Indexing "2021/03/210312143810_8_crc16.c" as text/x-c ... added
Indexing "2021/03/210312143519_5_____COM___________.reg" as text/x-ms-regedit ... added
Indexing "2021/03/210313014654_25_1308.cpp" as text/x-c++ ... added
Indexing "2021/03/210313014543_18_after_init.rb" as text/x-ruby ... added
Indexing "2021/03/210313014543_22_1327-V2.cpp" as text/x-c++ ... added
Indexing "2021/03/210313014821_29_redmine_impersonate-2.0.0.zip" as application/zip ... Skipping - unknown MIME type 'application/zip'
Indexing "2021/03/210313014543_13_redmine_logs-0.2.0.zip" as application/zip ... Skipping - unknown MIME type 'application/zip'
Indexing "2021/03/210312143810_11_JS__________.docx" as application/vnd.openxmlformats-officedocument.wordprocessingml.document ... already indexed
Indexing "2021/03/210313014821_27_CHANGELOG.md" as text/markdown ... already indexed
Indexing "2021/03/210313014543_15_scp22.bat" as text/plain ... already indexed
Indexing "2021/03/210313014543_16_scp135.bat" as text/plain ... already indexed
Indexing "2021/03/210312143810_9_SQLServer2000_____V2.0_-171127.docx" as application/vnd.openxmlformats-officedocument.wordprocessingml.document ... already indexed
Indexing "2021/03/210313014543_20_LICENSE.md" as text/markdown ... already indexed
Indexing "2021/03/210313014543_17_xapian-test.py" as text/x-python ... Skipping - unknown MIME type 'text/x-python'
Indexing "2021/03/210313014543_14_History.json" as application/json ... Skipping - unknown MIME type 'application/json'
Indexing "2021/03/210312143837_12_LRS-150F-spec-cn_1_.pdf" as application/pdf ... already indexed
Indexing "2021/03/210313014543_19_init.rb" as text/plain ... already indexed
Indexing "2021/03/210313014543_21_README.md" as text/markdown ... already indexed
Indexing "2021/03/210312143810_7_users.csv" as text/csv ... already indexed
Indexing "2021/03/210312143810_10_noi______.doc" as application/msword ... already indexed
Indexing "2021/03/210312143519_4_test.idsz-ASCII" as text/plain ... already indexed
Redmine DMS documents indexed

 

Search tests

.reg

.rb

.cpp .c .md
