HBase Secondary Indexing with HBase + HBase-indexer + Solr (CDH)

Original article: https://www.jianshu.com/p/a7e2079ded33

Recently my work has involved SQL querying and visualization on top of HBase. As a column-oriented store whose cell values are plain byte arrays, HBase is inherently poor at SQL queries. Hive can complement it for SQL and storage, but we needed low-latency SQL and complex queries (finding rows by value), which calls for a secondary index on HBase. The approach described here is HBase + HBase-indexer + Solr; Phoenix is another option.

 

In this architecture HBase is the underlying store, and HBase-indexer maps selected HBase columns into Solr as an index, so the data can be queried directly from the Solr collection. The downside is that every new HBase table needs index configuration added in both HBase-indexer and Solr, which is tedious. Also, HBase-indexer is no longer actively developed, so the CDH builds of all the components should be used.

 

1. Installing the packages

The packages are available here:

https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_vd_cdh_package_tarball_516.html

Base environment:
OS: CentOS7.x-x86_64
JDK: jdk1.8
hadoop-2.6.0+cdh5.16.2
hbase-solr-1.5+cdh5.16.2
solr-4.10.3-cdh5.16.2
zookeeper-3.4.5-cdh5.16.2
hbase-1.0.0-cdh5.16.2

As long as every component uses the same CDH release, you are fine.

Node layout used below: node1 runs the HBase master and Solr, node2 and node3 run the HRegionServers and hbase-indexer, and ZooKeeper runs on all three nodes.

Unpack the hbase-solr-1.5+cdh5.16.2 tarball; under hbase-solr-1.5-cdh5.16.2/hbase-indexer-dist/target you will find the hbase-indexer-1.5-cdh5.16.2.tar.gz package, which we will deploy next.

 

2. Deploying hbase-indexer

Install hbase-indexer on the nodes running HBase's HRegionServers; it is the component that syncs the data to Solr.

Edit the hbase-indexer configuration to point it at ZooKeeper:

vim hbase-indexer-1.5-cdh5.16.2/conf/hbase-indexer-site.xml

<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbaseindexer.zookeeper.connectstring</name>
    <!-- Adjust to match your ZooKeeper cluster -->
    <value>node1:2181,node2:2181,node3:2181</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <!-- Adjust to match your ZooKeeper cluster -->
    <value>node1,node2,node3</value>
  </property>
</configuration>
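
Once saved, you can sanity-check the ZooKeeper connection from the hbase-indexer directory; listing indexers should return an empty list rather than a connection error (paths as used above):

./hbase-indexer-1.5-cdh5.16.2/bin/hbase-indexer list-indexers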

Configure hbase-indexer-env.sh to point at the JDK:

vim hbase-indexer-1.5-cdh5.16.2/conf/hbase-indexer-env.sh

# Set environment variables here.

# This script sets variables multiple times over the course of starting an hbase-indexer process,
# so try to keep things idempotent unless you want to take an even deeper look
# into the startup scripts (bin/hbase-indexer, etc.)

# The java implementation to use.
export JAVA_HOME=/usr/java/jdk1.8.0/
# Adjust to match your environment

 

3. HBase configuration notes
Edit hbase-site.xml and add the replication settings below. hbase-indexer receives updates through HBase's replication mechanism (SEP), so replication must be enabled:

  <property>
    <name>hbase.replication</name>
    <value>true</value>
    <description>SEP is basically replication, so enable it</description>
  </property>
  <property>
    <name>replication.source.ratio</name>
    <value>1.0</value>
    <description>Source ratio of 100% makes sure that each SEP consumer is actually used (otherwise, some can sit idle, especially with small clusters)</description>
  </property>
  <property>
    <name>replication.source.nb.capacity</name>
    <value>1000</value>
    <description>Maximum number of hlog entries to replicate in one go. If this is large, and a consumer takes a while to process the events, the HBase rpc call will time out.</description>
  </property>
  <property>
    <name>replication.replicationsource.implementation</name>
    <value>com.ngdata.sep.impl.SepReplicationSource</value>
    <description>A custom replication source that fixes a few things and adds some functionality (doesn't interfere with normal replication usage).</description>
  </property>
 <property>
    <name>hbase.zookeeper.quorum</name>
    <value>node1,node2,node3</value>
    <description>The directory shared by RegionServers</description>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <!-- This is the data directory of the ZooKeeper cluster; see dataDir in zoo.cfg -->
    <value>/home/HBasetest/zookeeperdata</value>
    <description>Property from ZooKeeper's config zoo.cfg.
      The directory where the snapshot is stored.
    </description>
  </property>
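
After restarting HBase you can confirm replication is working from the HBase shell; once an indexer is registered (section 5), it shows up as a replication peer. A quick check, using standard HBase shell commands:

./bin/hbase shell
list_peers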

Edit hbase-env.sh and set JAVA_HOME and HBASE_HOME:

export JAVA_HOME=/opt/jdk1.8.0_79
export HBASE_HOME=/home/HBasetest/hbase-1.0.0-cdh5.16.2

Copy these four jars from the hbase-indexer lib directory into the hbase lib directory:

hbase-sep-api-1.5-cdh5.16.2.jar
hbase-sep-impl-1.5-hbase1.0-cdh5.16.2.jar
hbase-sep-impl-common-1.5-cdh5.16.2.jar
hbase-sep-tools-1.5-cdh5.16.2.jar
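
For example, with both packages unpacked side by side (paths as used in this article):

cp hbase-indexer-1.5-cdh5.16.2/lib/hbase-sep-api-1.5-cdh5.16.2.jar \
   hbase-indexer-1.5-cdh5.16.2/lib/hbase-sep-impl-1.5-hbase1.0-cdh5.16.2.jar \
   hbase-indexer-1.5-cdh5.16.2/lib/hbase-sep-impl-common-1.5-cdh5.16.2.jar \
   hbase-indexer-1.5-cdh5.16.2/lib/hbase-sep-tools-1.5-cdh5.16.2.jar \
   hbase-1.0.0-cdh5.16.2/lib/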

Configure the regionservers file:

node2
node3

 

4. Starting the services

1. Start HBase

On node1, run:

./hbase-1.0.0-cdh5.16.2/bin/start-hbase.sh

2. Start HBase-indexer

On node2 and node3, run:

./hbase-indexer-1.5-cdh5.16.2/bin/hbase-indexer server

To keep it running in the background, use screen or nohup.

3. Start Solr

On node1, go into the example subdirectory of the Solr installation and run:

java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -DzkHost=node1:2181,node2:2181,node3:2181/solr -jar start.jar

The Solr admin UI is then available at http://node1:8983/solr/#/
Again, if you want to run it in the background, use screen or nohup.
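
For example, with nohup:

nohup java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -DzkHost=node1:2181,node2:2181,node3:2181/solr -jar start.jar > solr.log 2>&1 &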

 

5. Testing data indexing

With the Hadoop cluster, HBase, HBase-Indexer, and Solr all running, first create a table in HBase. Note that REPLICATION_SCOPE => '1' is required: only replicated column families are visible to the indexer.
From the HBase installation directory on any node, run:

./bin/hbase shell
create 'indexdemo-user', { NAME => 'info', REPLICATION_SCOPE => '1' }
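
If the table already exists without replication enabled, the column family can be altered instead (standard HBase shell commands):

disable 'indexdemo-user'
alter 'indexdemo-user', { NAME => 'info', REPLICATION_SCOPE => '1' }
enable 'indexdemo-user'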

On a node where HBase-Indexer is deployed, go into the HBase-Indexer installation directory. Indexers are managed with the subcommands below, shown here with the sample configuration file from the demo directory:

Create an indexer:
./bin/hbase-indexer add-indexer -n myindexer -c demo/user_indexer.xml -cp solr.zk=node1:2181,node2:2181,node3:2181/solr -cp solr.collection=collection1
List indexers:
./bin/hbase-indexer list-indexers -dump
Delete an indexer:
./bin/hbase-indexer delete-indexer --name 'indexer_vip'

Edit the field-mapping file under hbase-indexer-1.5-cdh5.16.2/demo/:

<?xml version="1.0"?>
<indexer table="indexdemo-user">
  <field name="firstname_s" value="info:firstname"/>
  <field name="lastname_s" value="info:lastname"/>
  <field name="age_i" value="info:age" type="int"/>
</indexer>

Save it as indexdemo-indexer.xml.

The same fields must also be mapped in Solr's schema. The _s and _i suffixes follow Solr's usual naming convention for string and int fields; add the matching field definitions and restart Solr (or reload the core) afterwards:

vim solr-4.10/example/solr/collection1/conf/schema.xml


   <field name="firstname_s" type="string" indexed="true" stored="true" required="false" multiValued="false" />
   <field name="lastname_s" type="string" indexed="true" stored="true" required="false" multiValued="false" />
   <field name="age_i" type="int" indexed="true" stored="true" required="false" multiValued="false" />
   

Register the indexer instance. From the hbase-indexer-1.5-cdh5.16.2 installation directory, run:

./bin/hbase-indexer add-indexer -n myindexer -c demo/indexdemo-indexer.xml -cp \
solr.zk=node1:2181,node2:2181,node3:2181/solr -cp solr.collection=collection1 -z node1,node2,node3
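
To verify the whole pipeline, insert a few cells and watch them show up in Solr. A quick test, using the demo table and the field mapping defined above (values are arbitrary):

./bin/hbase shell
put 'indexdemo-user', 'row1', 'info:firstname', 'John'
put 'indexdemo-user', 'row1', 'info:lastname', 'Smith'
put 'indexdemo-user', 'row1', 'info:age', '27'

Shortly afterwards the row should be searchable, e.g.:

http://node1:8983/solr/collection1/select?q=firstname_s:John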

6. Java API

Dependency (the solrj version should match the Solr server):

<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-solrj</artifactId>
  <version>4.10.3</version>
</dependency>

The example below scans an HBase table and pushes selected columns into Solr as documents:

package com.ultrapower.hbase.solrhbase;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class SolrIndexer {

    /**
     * @param args
     * @throws IOException
     * @throws SolrServerException
     */
    public static void main(String[] args) throws IOException,
            SolrServerException {
        final Configuration conf;
        HttpSolrServer solrServer = new HttpSolrServer(
                "http://192.168.1.10:8983/solr"); // Solr runs in its bundled Jetty container; the default port is 8983

        conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "hb_app_xxxxxx"); // the HBase table to index
        Scan scan = new Scan();
        scan.addFamily(Bytes.toBytes("d")); // the column family to scan
        scan.setCaching(500);
        scan.setCacheBlocks(false);
        ResultScanner ss = table.getScanner(scan);

        System.out.println("start ...");
        int i = 0;
        try {
            for (Result r : ss) {
                SolrInputDocument solrDoc = new SolrInputDocument();
                solrDoc.addField("rowkey", new String(r.getRow()));
                for (KeyValue kv : r.raw()) {
                    String fieldName = new String(kv.getQualifier());
                    String fieldValue = new String(kv.getValue());
                    if (fieldName.equalsIgnoreCase("time")
                            || fieldName.equalsIgnoreCase("tebid")
                            || fieldName.equalsIgnoreCase("tetid")
                            || fieldName.equalsIgnoreCase("puid")
                            || fieldName.equalsIgnoreCase("mgcvid")
                            || fieldName.equalsIgnoreCase("mtcvid")
                            || fieldName.equalsIgnoreCase("smaid")
                            || fieldName.equalsIgnoreCase("mtlkid")) {
                        solrDoc.addField(fieldName, fieldValue);
                    }
                }
                solrServer.add(solrDoc);
                i = i + 1;
                System.out.println("indexed " + i + " rows");
            }
            // Commit once at the end; committing after every document is very slow.
            solrServer.commit(true, true, true);
            System.out.println("done!");
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // Close the scanner and the table exactly once, success or failure.
            ss.close();
            table.close();
        }
    }

}
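
Reading the index back goes through the same solrj client. Below is a minimal query sketch; the endpoint and field names follow the examples above, and the puid value is hypothetical. Solr returns the stored rowkeys, which can then be used for direct HBase lookups:

package com.ultrapower.hbase.solrhbase;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class SolrSearcher {

    public static void main(String[] args) throws SolrServerException {
        // Query the collection directly; same Jetty endpoint as the indexer above.
        HttpSolrServer solrServer = new HttpSolrServer(
                "http://192.168.1.10:8983/solr/collection1");

        SolrQuery query = new SolrQuery();
        query.setQuery("puid:12345"); // hypothetical value; any indexed field works
        query.setStart(0);
        query.setRows(10);

        QueryResponse response = solrServer.query(query);
        for (SolrDocument doc : response.getResults()) {
            // The stored rowkey points back to the full record in HBase.
            System.out.println("rowkey = " + doc.getFieldValue("rowkey"));
        }
    }
}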

 

 

 
