RuoYi + Hadoop + HBase for Big-Data Storage and Query

There is a real-world requirement: the data volume may reach roughly 10 billion rows. The current database is SQL Server, and as collected data keeps pouring in, queries get slower and slower (a query against the existing SQL Server already takes tens of seconds), so we looked for a way to optimize.

We considered SQL Server indexes, partitioned tables, and sharding, but the data grows so fast that these would soon hit a ceiling again, so a more scalable technology was needed. Among the many NoSQL and big-data options, two were shortlisted for this scenario:

1. MongoDB: a JSON document database that can scale out via clustering, but it is better suited to fast queries over documents with fairly complex column structures.

2. Hadoop: the Swiss Army knife of the big-data world, with a rich ecosystem of companion tools and strong room for later expansion.

Since this requirement is simply looking up the corresponding roll number by its code, Hadoop (with HBase on top) was chosen in the end.
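To make the "simple lookup by code" claim concrete, here is a hypothetical sketch of the access pattern only (class and method names are illustrative, not from the project): every query is a single exact-key lookup from EPC code to roll number, with no joins, ranges, or secondary indexes. That is exactly the shape of an HBase `Get` by rowkey, which is why a key-value store fits better than a relational schema here.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: models the workload as a pure key -> value lookup.
// In the real system the Map is an HBase table, put() is a Put, and
// query() is a Get by rowkey.
public class RollLookupSketch {
    private final Map<String, String> rows = new HashMap<>();

    public void put(String epc, String rollName) {
        rows.put(epc, rollName);
    }

    public String query(String epc) {
        return rows.getOrDefault(epc, "");
    }

    public static void main(String[] args) {
        RollLookupSketch sketch = new RollLookupSketch();
        sketch.put("E280117020000333BF040B34", "CQ-230309002");
        System.out.println(sketch.query("E280117020000333BF040B34"));
    }
}
```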

I. Deploying Hadoop

Download it directly from the official site: https://hadoop.apache.org/.

Pay attention to version compatibility; mismatched versions cause a lot of trouble. I chose Hadoop 3.3.4 here.

Steps:

1. Get the winutils.exe and hadoop.dll files matching your Hadoop version

Copy the winutils.exe and hadoop.dll built for Hadoop 3.3.4 into Hadoop's bin folder, and also copy the same two files into C:\Windows\System32.

These two files can be found on GitHub. Make sure they match your Hadoop version exactly, otherwise startup will fail.

2. File configuration (all of the files below live in the hadoop-3.3.4/etc/hadoop folder)

a). hadoop-env.cmd:

set JAVA_HOME=C:\Users\Administrator\.jdks\corretto-11.0.21

Note: JAVA_HOME here points to an OpenJDK build; the Oracle JDK did not work for me, so an OpenJDK distribution must be installed.

b). core-site.xml

<configuration>
    <property> 
        <name>fs.defaultFS</name> 
        <value>hdfs://localhost:9000</value> 
    </property>
</configuration>

c). hdfs-site.xml

<configuration>
    <property> 
        <name>dfs.replication</name> 
        <value>1</value> 
    </property> 
    <property> 
        <name>dfs.namenode.name.dir</name> 
        <value>/hadoop-3.3.4/data/namenode</value> 
    </property> 
    <property> 
        <name>dfs.datanode.data.dir</name> 
        <value>/hadoop-3.3.4/data/datanode</value> 
    </property> 
</configuration>

d). yarn-site.xml

<configuration>
    <property> 
        <name>yarn.nodemanager.aux-services</name> 
        <value>mapreduce_shuffle</value> 
    </property> 
    <property> 
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name> 
        <value>org.apache.hadoop.mapred.ShuffleHandler</value> 
    </property> 
</configuration>

3. Configure environment variables

Create a HADOOP_HOME variable pointing to the Hadoop install directory, then append %HADOOP_HOME%\bin to Path.

Run hadoop version in a console to verify that the installation and configuration are correct.

Finally, run start-all.cmd in a console to start Hadoop. (On the very first run, format the NameNode with hdfs namenode -format before starting.) If no error messages appear, Hadoop started successfully.

II. Deploying HBase

Download HBase from the official site: https://hbase.apache.org/.

Again, watch the versions closely: since I chose Hadoop 3.3.4 above, the matching HBase version is 2.5.5.

Steps:

1. Copy the previously downloaded winutils.exe and hadoop.dll into HBase's bin directory, e.g. E:\hbase-2.5.5\bin in my case.

2. File configuration

In HBase's conf directory, open hbase-site.xml and add the following:

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///E:/hbase-2.5.5/root</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>false</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>127.0.0.1</value>
  </property>
  <property>
    <name>hbase.tmp.dir</name>
    <value>./tmp</value>
  </property>
  <property>
    <name>hbase.unsafe.stream.capability.enforce</name>
    <value>false</value>
  </property>
</configuration>

As the configuration above indicates, create root and tmp folders under the HBase directory.

3. Configure environment variables (omitted here; it works the same way as the Hadoop setup above)

Find start-hbase.cmd in HBase's bin directory and double-click it to start HBase.

(Screenshot of the console after HBase has started omitted.)

III. Secondary Development Based on RuoYi

Take the ruoyi project directly and add the feature inside it. Of course, first import the required jar packages (they all ship with Hadoop and HBase, so you can reference them directly).

There are more referenced jars beyond these; the screenshots are omitted here, for reference only.
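If the project uses Maven instead of loose jars, the same dependencies can be declared as below. This is a sketch: the artifact versions are assumed to match the Hadoop 3.3.4 / HBase 2.5.5 installs above, so adjust them to your own versions.

```xml
<!-- Assumed versions, matching the Hadoop/HBase installs used in this article -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>3.3.4</version>
</dependency>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>2.5.5</version>
</dependency>
```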

The project is based on the Spring Boot framework and implements basic functionality on top of HDFS and HBase.

The controller code is as follows:

package com.ruoyi.web.controller.roll;

import com.ruoyi.common.core.controller.BaseController;
import com.ruoyi.common.core.domain.R;
import com.ruoyi.common.core.domain.entity.SysRole;
import com.ruoyi.common.core.page.TableDataInfo;
import com.ruoyi.common.roll.RollEntity;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.client.coprocessor.AggregationClient;
import org.apache.hadoop.hbase.client.coprocessor.LongColumnInterpreter;
import org.apache.hadoop.hbase.filter.*;
import org.apache.shiro.authz.annotation.RequiresPermissions;
import org.springframework.stereotype.Controller;
import org.springframework.util.StopWatch;
import org.springframework.web.bind.annotation.*;

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.CompareOperator;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.MasterNotRunningException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.ZooKeeperConnectionException;
import org.apache.hadoop.hbase.exceptions.DeserializationException;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.mapreduce.Job;

@Controller
@RequestMapping("/roll")
public class RollController extends BaseController {
    private String prefix = "/roll";

    /**
     * "Add" page
     */
    @GetMapping("/add")
    public String add() {
//        long count = rowCountByCoprocessor("mytb");
//        System.out.println("Total row count ->>> " + count);
        return prefix + "/add";
    }

    @PostMapping("/list")
    @ResponseBody
    public TableDataInfo list(String inputEPC) {
//        startPage();
//        List<SysRole> list = roleService.selectRoleList(role);

        //String epc = "E280117020000333BF040B34";
        //String epc = "E280119120006618A51D032D"; // the EPC to query
        String epc = inputEPC;
        String tableName = "mytb";
        String columnFamily = "mycf";

//        create(tableName, columnFamily);
//        insert(tableName,columnFamily);

        long startTime = System.currentTimeMillis();
        //E280119120006BEEA4E5032
        String reVal = query(tableName, columnFamily, epc);
        long endTime = System.currentTimeMillis();
        System.out.println("Roll-number query took " + (endTime - startTime) + "ms");
        RollEntity model = new RollEntity();
        model.epc = epc;
        model.rollName = reVal;
        model.searchTime = (endTime - startTime) + "ms";
        List<RollEntity> list = new ArrayList<>();
        list.add(model);
        return getDataTable(list);
    }

    // Create a table
    public static void create(String tableName, String columnFamily) {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.rootdir", "hdfs://localhost:9000/hbase");
        conf.set("hbase.zookeeper.quorum", "localhost");
        try (Connection conn = ConnectionFactory.createConnection(conf)) {
            if (conn.getAdmin().tableExists(TableName.valueOf(tableName))) {
                System.err.println("Table exists!");
            } else {
                HTableDescriptor tableDesc = new HTableDescriptor(TableName.valueOf(tableName));
                tableDesc.addFamily(new HColumnDescriptor(columnFamily));
                conn.getAdmin().createTable(tableDesc);
                System.err.println("Create Table SUCCESS!");
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Insert data
    public static void insert(String tableName, String columnFamily) {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.rootdir", "hdfs://localhost:9000/hbase");
        conf.set("hbase.zookeeper.quorum", "localhost");
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf(tableName))) {

//            // Bulk-insert test data:
//            for (int i = 17742000; i <= 100000000; i++) {
//                Put put = new Put(Bytes.toBytes("row" + i));
//                put.addColumn(Bytes.toBytes(columnFamily), Bytes.toBytes("code"),
//                        Bytes.toBytes("E280119120006BEEA4E5032" + i));
//                table.put(put);
//            }

//            Put put = new Put(Bytes.toBytes("E280119120006618A51D032D"));
//            put.addColumn(Bytes.toBytes(columnFamily), Bytes.toBytes("code"),
//                    Bytes.toBytes("CQ-230308009"));
//            table.put(put);

            Put put = new Put(Bytes.toBytes("E280117020000333BF040B34"));
            put.addColumn(Bytes.toBytes(columnFamily), Bytes.toBytes("code"),
                    Bytes.toBytes("CQ-230309002"));
            table.put(put);

            // conn and table are closed automatically by try-with-resources
            System.err.println("record insert SUCCESS!");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Query a single row by rowkey
    public static String query(String tableName, String columnFamily, String rowName) {

        String reVal = "";

        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.rootdir", "hdfs://localhost:9000/hbase");
        conf.set("hbase.zookeeper.quorum", "localhost");
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf(tableName))) {
            Get get = new Get(Bytes.toBytes(rowName));
            Result r = table.get(get);
            for (Cell cell : r.rawCells()) {
                String family = new String(CellUtil.cloneFamily(cell));
                String qualifier = new String(CellUtil.cloneQualifier(cell));
                String value = new String(CellUtil.cloneValue(cell));
                System.out.println("Column: " + family + ":" + qualifier + " Value: " + value);
                reVal = value;
                break;
            }
        } catch (IOException e) {
            e.printStackTrace();
        }

        return reVal;
    }

    // Filtered full-table scan by value
    public static void queryFilter(String tableName, String columnFamily, String rowName, String value) {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.rootdir", "hdfs://localhost:9000/hbase");
        conf.set("hbase.zookeeper.quorum", "localhost");

        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf(tableName))) {
            Scan scan = new Scan();
            Filter filter = new ValueFilter(CompareOperator.EQUAL, new BinaryComparator(Bytes.toBytes(value)));
            scan.setFilter(filter);
            try (ResultScanner rs = table.getScanner(scan)) {
                for (Result res : rs) {
                    System.out.println(res);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Read a file from HDFS
    private static void readHDFSFileContents() {
        InputStream is = null;
        try {
            // Opening an hdfs:// URL this way requires
            // URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory())
            // to have been registered once per JVM.
            is = new URL("hdfs://127.0.0.1:9000/myHadoop/1.txt").openStream();

            byte[] contents = new byte[1024];
            int bytesRead;
            StringBuilder strFileContents = new StringBuilder();
            while ((bytesRead = is.read(contents)) != -1) {
                strFileContents.append(new String(contents, 0, bytesRead));
            }
            System.out.println(strFileContents);
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            IOUtils.closeStream(is);
        }
    }

    // Create an HDFS directory
    private static void createHDFSDirectory() {
        try {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://127.0.0.1:9000");
            FileSystem fs = FileSystem.get(conf);
            boolean result = fs.mkdirs(new Path("/myHadoop"));
            System.out.println(result);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Count the rows in an HBase table via the Aggregation coprocessor
    public long rowCountByCoprocessor(String tablename){
        long count = 0;
        try {
            Configuration conf = HBaseConfiguration.create();
            conf.set("hbase.rootdir", "hdfs://localhost:9000/hbase");
            conf.set("hbase.zookeeper.quorum", "localhost");

            Connection connection = ConnectionFactory.createConnection(conf);
            // Create the conf and connection up front
            Admin admin = connection.getAdmin();
            //admin.enableTable(TableName.valueOf("mytb"));
            TableName name = TableName.valueOf(tablename);
            // Disable the table first, add the coprocessor, then re-enable it
            //admin.disableTable(name);
            HTableDescriptor descriptor = new HTableDescriptor(name); //admin.getTableDescriptor(name);
            //descriptor.setReadOnly(false);
            String coprocessorClass = "org.apache.hadoop.hbase.coprocessor.AggregateImplementation";
            if (! descriptor.hasCoprocessor(coprocessorClass)) {
                descriptor.addCoprocessor(coprocessorClass);
            }
            //admin.modifyTable(name, descriptor);
            //admin.enableTable(name);

            // Timing
            StopWatch stopWatch = new StopWatch();
            stopWatch.start();

            Scan scan = new Scan();
            AggregationClient aggregationClient = new AggregationClient(conf);

            //System.out.println("RowCount: " + aggregationClient.rowCount(name, new LongColumnInterpreter(), scan));

            count = aggregationClient.rowCount(name, new LongColumnInterpreter(), scan);
            stopWatch.stop();
            System.out.println("Row count took " + stopWatch.getTotalTimeMillis() + "ms");
            connection.close();
        } catch (Throwable e) {
            e.printStackTrace();
        }
        return count;
    }
}
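Note that rowCountByCoprocessor above only returns a count if the AggregateImplementation coprocessor is actually loaded: the in-code addCoprocessor call modifies a local descriptor, and the modifyTable/enableTable lines are commented out. One standard alternative (a sketch, not tested in this setup) is to load the coprocessor globally in hbase-site.xml and restart HBase:

```xml
<property>
  <name>hbase.coprocessor.region.classes</name>
  <value>org.apache.hadoop.hbase.coprocessor.AggregateImplementation</value>
</property>
```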

Final result: (screenshot omitted)

For other environment configuration issues, refer to:

1. https://blog.csdn.net/meiLin_Ya/article/details/86145895

2. https://blog.csdn.net/qq_45923034/article/details/127850470
