Redmine 4.1.1: making Xapian full-text search in the redmine_dmsf plugin support .c, .cpp, .rb, .reg and similar file types

Introduction

By default, Xapian full-text search in the redmine_dmsf plugin does not index .c, .cpp, .rb, .reg and similar files.

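Scope

The method described here should also work for any other file type whose content is plain text.

It does not apply to formats such as .doc and .pdf, which need a conversion program.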

Environment

  • Ubuntu 20.04.2 LTS server
  • Ruby 2.7.2p137, Rails 5.2.4.2, gem 3.1.4, rake 13.0.3
  • Redmine 4.1.1
  • redmine_dmsf 2.4.5

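References

This post follows the official Xapian article "Add support for a new format to Omega".

That article describes three methods. This post uses the third one (modifying the source code directly); I have not yet worked out the other two.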

1. Modify ~/xapian/xapian-omega-1.4.18/index_file.cc

~/xapian/xapian-omega-1.4.18/index_file.cc 

 753         created = p.created;
 754         md5_string(text, md5);
 755     } else if (mimetype == "text/plain") {
 756         // Currently we assume that text files are UTF-8 unless they have a
 757         // byte-order mark.
 758         dump = d.file_to_string();
 759         md5_string(dump, md5);
 760
 761         // Look for Byte-Order Mark (BOM).
 762         if (startswith(dump, "\xfe\xff") || startswith(dump, "\xff\xfe")) {
 763         // UTF-16 in big-endian/little-endian order - we just convert
 764         // it as "UTF-16" and let the conversion handle the BOM as that
 765         // way we avoid the copying overhead of erasing 2 bytes from
 766         // the start of dump.
 767         convert_to_utf8(dump, "UTF-16");
 768         } else if (startswith(dump, "\xef\xbb\xbf")) {
 769         // UTF-8 with stupid Windows not-the-byte-order mark.
 770         dump.erase(0, 3);
 771         } else {
 772         // FIXME: What charset is the file?  Look at contents?
 773         }
 774     } else if (mimetype == "application/pdf") {
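
The byte-order mark check in the excerpt above can be tried out in isolation. The following is a minimal standalone sketch, not Omega's own code: the names Encoding and detect_bom are hypothetical, and starts_with is a local stand-in for Omega's startswith.

```cpp
#include <string>

// Hypothetical classification of what the leading bytes of a text file indicate.
enum class Encoding { Utf16, Utf8WithBom, Unknown };

// Local stand-in for Omega's startswith(): true if s begins with prefix.
static bool starts_with(const std::string& s, const std::string& prefix) {
    return s.compare(0, prefix.size(), prefix) == 0;
}

// Mirrors the branch order of the excerpt: UTF-16 BOM first (either byte
// order), then the UTF-8 "BOM", otherwise the charset is unknown.
Encoding detect_bom(const std::string& dump) {
    if (starts_with(dump, "\xfe\xff") || starts_with(dump, "\xff\xfe"))
        return Encoding::Utf16;
    if (starts_with(dump, "\xef\xbb\xbf"))
        return Encoding::Utf8WithBom;
    return Encoding::Unknown;
}
```

In the real code the UTF-16 case is passed to convert_to_utf8, the UTF-8 case has its first three bytes erased, and the unknown case is left as it is, which is why source files are best stored as UTF-8.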

Change line 755 from } else if (mimetype == "text/plain") { to:

       } else if (mimetype == "text/plain" || mimetype == "text/x-c" || mimetype == "text/x-c++"
                || mimetype == "text/x-ms-regedit" || mimetype == "text/x-ruby"
                        ) {
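
As more MIME types are added, the chained comparison gets harder to read. One alternative, shown here only as a sketch and not what this post actually applies to index_file.cc, is to keep the types in a set. The helper name is_plain_text_mimetype is hypothetical and is not part of Omega.

```cpp
#include <set>
#include <string>

// Hypothetical helper: true if files of this MIME type can be indexed
// directly as plain text, with no external converter.
bool is_plain_text_mimetype(const std::string& mimetype) {
    static const std::set<std::string> plain_text_types = {
        "text/plain",
        "text/x-c",
        "text/x-c++",
        "text/x-ms-regedit",
        "text/x-ruby",
    };
    return plain_text_types.count(mimetype) != 0;
}
```

With such a helper, the condition on line 755 would read } else if (is_plain_text_mimetype(mimetype)) {, and supporting another text-based type would mean adding one string to the set.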

Diff:

# samxiao @ rm411 in ~/xapian/xapian-omega-1.4.18 [11:10:35]
$ diff index_file.cc.orig index_file.cc
755c755,757
<       } else if (mimetype == "text/plain") {
---
>       } else if (mimetype == "text/plain" || mimetype == "text/x-c" || mimetype == "text/x-c++"
>                || mimetype == "text/x-ms-regedit" || mimetype == "text/x-ruby"
>                        ) {

2. Rebuild

# samxiao @ rm411 in ~/xapian/xapian-omega-1.4.18 [11:13:21] C:1 
$ ./configure --prefix=/opt/xapian XAPIAN_CONFIG=/opt/xapian/bin/xapian-config  ## This command only needs to be run once.
$ make 
$ sudo make install

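3. Rebuild the index and confirm the change took effect

Note: the -R option in ruby plugins/redmine_dmsf/extra/xapian_indexer.rb -v -R must not be omitted. It tells the indexer to retry files that previously failed to index (it shows up as --retry-failed on the omindex command line in the output below); without it, those files are not reprocessed.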

# samxiao @ rm411 in ~/redmine-4.1.1 [11:03:06] C:100
$ ruby plugins/redmine_dmsf/extra/xapian_indexer.rb -v -R
Trying to load Redmine environment <</home/samxiao/redmine-4.1.1/config/environment.rb>>...
Redmine environment [RAILS_ENV=production] correctly loaded ...
/opt/xapian/bin/omindex -s english --db /home/samxiao/redmine-4.1.1/dmsf_index/english /home/samxiao/redmine-4.1.1/files/dmsf --url / --depth-limit=0 -v --retry-failed
[Entering directory ""]
[Entering directory "2021/"]
[Entering directory "2021/03/"]
Indexing "2021/03/210313014821_28_CODE_OF_CONDUCT.md" as text/markdown ... already indexed
Indexing "2021/03/210313014821_13_redmine_logs-0.2.0.zip" as application/zip ... Skipping - unknown MIME type 'application/zip'
Indexing "2021/03/210312143519_2_test.idcard-sz.txt" as text/plain ... already indexed
Indexing "2021/03/210313014821_21_README.md" as text/markdown ... already indexed
Indexing "2021/03/210312143519_1_test.idcard-sz" as text/plain ... already indexed
Indexing "2021/03/210312143519_3_test.idcard-sz-ASCII.txt" as text/plain ... already indexed
Indexing "2021/03/210313014654_24_1248.cpp" as text/x-c++ ... added
Indexing "2021/03/210313014654_23_1213.cpp" as text/x-c++ ... added
Indexing "2021/03/210312143622_6_p9x9.cpp" as text/x-c++ ... added
Indexing "2021/03/210313014654_26_1256.cpp" as text/x-c++ ... added
Indexing "2021/03/210312143810_8_crc16.c" as text/x-c ... added
Indexing "2021/03/210312143519_5_____COM___________.reg" as text/x-ms-regedit ... added
Indexing "2021/03/210313014654_25_1308.cpp" as text/x-c++ ... added
Indexing "2021/03/210313014543_18_after_init.rb" as text/x-ruby ... added
Indexing "2021/03/210313014543_22_1327-V2.cpp" as text/x-c++ ... added
Indexing "2021/03/210313014821_29_redmine_impersonate-2.0.0.zip" as application/zip ... Skipping - unknown MIME type 'application/zip'
Indexing "2021/03/210313014543_13_redmine_logs-0.2.0.zip" as application/zip ... Skipping - unknown MIME type 'application/zip'
Indexing "2021/03/210312143810_11_JS__________.docx" as application/vnd.openxmlformats-officedocument.wordprocessingml.document ... already indexed
Indexing "2021/03/210313014821_27_CHANGELOG.md" as text/markdown ... already indexed
Indexing "2021/03/210313014543_15_scp22.bat" as text/plain ... already indexed
Indexing "2021/03/210313014543_16_scp135.bat" as text/plain ... already indexed
Indexing "2021/03/210312143810_9_SQLServer2000_____V2.0_-171127.docx" as application/vnd.openxmlformats-officedocument.wordprocessingml.document ... already indexed
Indexing "2021/03/210313014543_20_LICENSE.md" as text/markdown ... already indexed
Indexing "2021/03/210313014543_17_xapian-test.py" as text/x-python ... Skipping - unknown MIME type 'text/x-python'
Indexing "2021/03/210313014543_14_History.json" as application/json ... Skipping - unknown MIME type 'application/json'
Indexing "2021/03/210312143837_12_LRS-150F-spec-cn_1_.pdf" as application/pdf ... already indexed
Indexing "2021/03/210313014543_19_init.rb" as text/plain ... already indexed
Indexing "2021/03/210313014543_21_README.md" as text/markdown ... already indexed
Indexing "2021/03/210312143810_7_users.csv" as text/csv ... already indexed
Indexing "2021/03/210312143810_10_noi______.doc" as application/msword ... already indexed
Indexing "2021/03/210312143519_4_test.idsz-ASCII" as text/plain ... already indexed
Redmine DMS documents indexed

 

Search tests

Searches were tested against .reg, .rb, and .cpp/.c/.md files (screenshots not included in this copy).
