本例仅仅对於单表增量的进行了测试。solr增量的采集原理是依赖系统的时间进行了增量采集,所以我们必须保证数据库的系统时间与solr所在jvm时间一致。
开始之前下面的提醒很重要:
DataImportHandler is a data import tool for Solr which makes importing data from Databases, XML files and
HTTP data sources quick and easy.
Important Note
--------------
Although Solr strives to be agnostic of the Locale where the server is
running, some code paths in DataImportHandler are known to depend on the
System default Locale, Timezone, or Charset. It is recommended that when
running Solr you set the following system properties:
-Duser.language=xx -Duser.country=YY -Duser.timezone=ZZZ
where xx, YY, and ZZZ are consistent with any database server's configuration.
- 所以我们需要设置时区,/etc/default/solr.in.sh
# By default the start script uses UTC; override the timezone if needed
SOLR_TIMEZONE="UTC+8"
- 准备数据库表
ALTER TABLE RMS_RESOURCEINFO
ADD (SOLR_LAST_DATE TIMESTAMP );
CREATE INDEX RMS_RESOURCEINFO_INDEX_SOLR ON RMS_RESOURCEINFO (SOLR_LAST_DATE ASC);
alter trigger "PLS"."RMS_RESOURCEINFO_SOLR" disable;
alter trigger "PLS"."RMS_RESOURCEINFO_UPDATE" disable;
UPDATE RMS_RESOURCEINFO SET SOLR_LAST_DATE=CURRENT_TIMESTAMP;
COMMIT;
alter trigger "PLS"."RMS_RESOURCEINFO_SOLR" enable;
alter trigger "PLS"."RMS_RESOURCEINFO_UPDATE" enable;
- 配置data-config.xml
<dataConfig>
<propertyWriter dateFormat="yyyy-MM-dd HH:mm:ss" type="SimplePropertiesWriter" filename="my_dih.properties" locale="zh-CN" />
<dataSource type="JdbcDataSource"
driver="oracle.jdbc.driver.OracleDriver"
url="jdbc:oracle:thin:@//"
user=""
password="U2FsdGVkX1/PqBuNUFBIcmLKTb+y41YB6J7b6tAm8Xw="
encryptKeyFile="/var/solr/data/dih-encryptionkey"
/>
<document>
<!-- <entity name="id"
query="select id,name,section,subject from CLASS_TYPE">
<field column="ID" name="id"/>
<field column="NAME" name="solr_name"/>
<field column="SECTION" name="solr_section"/>
<field column="SUBJECT" name="subject_s"/>
</entity>-->
<entity name="info" transformer="DateFormatTransformer" query="select R_CODE,R_TITLE,R_KS_ID,R_DESC,R_TYPECODE,R_FORMAT,SOLR_LAST_DATE FROM RMS_RESOURCEINFO" deltaImportQuery="select R_CODE,R_TITLE,R_KS_ID,R_DESC,R_TYPECODE,R_FORMAT,SOLR_LAST_DATE FROM RMS_RESOURCEINFO where R_CODE='${dataimporter.delta.R_CODE}'" deltaQuery="SELECT R_CODE FROM RMS_RESOURCEINFO WHERE SOLR_LAST_DATE>TO_DATE('${dataimporter.last_index_time}','yyyy-mm-dd hh24:mi:ss')">
<field column="R_CODE" name="id"/>
<field column="R_TITLE" name="rtitle_txt_cjk"/>
<field column="R_KS_ID" name="ksid_s"/>
<field column="R_DESC" name="rdesc_txt_cjk"/>
<field column="R_TYPECODE" name="rtypecode_s"/>
<field column="R_FORMAT" name="rformat_s"/>
<field column="SOLR_LAST_DATE" dateTimeFormat="yyyy-MM-dd HH:mm:ss" name="lastDate_dt"/>
</entity>
</document>
</dataConfig>
- 执行delta-import,full-import命令,注意my_dih.properties时间的变化,这里我自定义了文件,手动需要创建my_dih.properties,并chown solr:solr my_dih.properties