There is a real-world requirement here: roughly 10 billion rows of data. The existing database is SQL Server, and as collected data keeps streaming in, queries keep getting slower (a lookup on the current SQL Server already takes tens of seconds), so an optimization plan is needed.
Adding indexes, partitioned tables, and splitting databases/tables in SQL Server were all considered, but the data is growing so fast that these would hit a bottleneck again soon, so a more scalable technology is required. Among the many NoSQL and big-data options, two candidates were considered for this scenario:
1. MongoDB: a JSON document database that can scale out through clustering, but it is better suited to fast queries over documents with relatively complex column structures.
2. Hadoop: the Swiss Army knife of the big-data field, with a rich ecosystem of companion tools and good room for later expansion.
Because this requirement is nothing more than looking up the roll number that corresponds to a code (a plain key-to-value lookup), Hadoop was chosen in the end: with HBase on top of HDFS, the code can serve as the row key, so a single Get retrieves the roll number no matter how large the table grows.
I. Deploying Hadoop
Download it directly from the official site: https://hadoop.apache.org/.
Pay attention to the version: mismatched versions cause a lot of trouble. I chose Hadoop 3.3.4 here.
Steps:
1. Get the winutils.exe and hadoop.dll files that match your Hadoop version
Copy the winutils.exe and hadoop.dll built for Hadoop 3.3.4 into Hadoop's bin folder, and copy the same two files into C:\Windows\System32 as well.
Both files can be found on GitHub. Make sure they match your Hadoop version exactly, otherwise startup will fail.
2. Configuration files (all of the files below live in the hadoop-3.3.4/etc/hadoop folder)
a). hadoop-env.cmd:
set JAVA_HOME=C:\Users\Administrator\.jdks\corretto-11.0.21
Note: JAVA_HOME here points to an OpenJDK (open-source) build; the Oracle JDK would not work in this setup, so an OpenJDK build (Corretto 11 in my case) must be installed.
b). core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
c). hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/hadoop-3.3.4/data/namenode</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/hadoop-3.3.4/data/datanode</value>
    </property>
</configuration>
d). yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
</configuration>
3. Configure environment variables
Create a HADOOP_HOME variable pointing to the Hadoop installation directory, then append %HADOOP_HOME%\bin to Path.
You can then run hadoop version in a console window to verify that the installation and configuration are correct.
If this is the very first start, the NameNode normally has to be formatted once with hdfs namenode -format. Finally, run start-all.cmd in the console to start Hadoop; if no error messages appear, Hadoop has started successfully.
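To double-check that HDFS is actually reachable on the address configured in core-site.xml, a quick client-side test can be run from Java. This is only a minimal sketch (the class name and the assumption that the Hadoop client jars are on the classpath are mine, not from the original post):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical smoke test: lists the HDFS root directory to confirm the NameNode answers.
public class HdfsSmokeTest {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000"); // must match core-site.xml
        try (FileSystem fs = FileSystem.get(conf)) {
            for (FileStatus status : fs.listStatus(new Path("/"))) {
                System.out.println(status.getPath());
            }
        }
    }
}

If this prints the root listing without exceptions, the HDFS side of the setup is working.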
II. Deploying HBase
HBase can be downloaded from the official site: https://hbase.apache.org/.
Again, pay close attention to the version: since I chose Hadoop 3.3.4 above, the matching HBase version is 2.5.5.
Steps:
1. Copy the winutils.exe and hadoop.dll files downloaded earlier into HBase's bin directory, e.g. E:\hbase-2.5.5\bin in my case.
2. Configuration files
In HBase's conf directory, open hbase-site.xml and add the following:
<configuration>
    <property>
        <name>hbase.rootdir</name>
        <value>file:///E:/hbase-2.5.5/root</value>
    </property>
    <property>
        <name>hbase.cluster.distributed</name>
        <value>false</value>
    </property>
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>127.0.0.1</value>
    </property>
    <property>
        <name>hbase.tmp.dir</name>
        <value>./tmp</value>
    </property>
    <property>
        <name>hbase.unsafe.stream.capability.enforce</name>
        <value>false</value>
    </property>
</configuration>
To match the configuration above, create the root and tmp folders under the HBase directory.
3. Configure environment variables (set HBASE_HOME to the HBase installation directory and add %HBASE_HOME%\bin to Path, the same way as for Hadoop above).
Then find start-hbase.cmd in HBase's bin directory and double-click it to start HBase.
The console after HBase finishes starting (screenshot omitted).
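Before wiring HBase into the application, it is also worth confirming that the standalone instance answers client requests. A minimal sketch (the class name is mine; the settings mirror hbase-site.xml above):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

// Hypothetical smoke test: lists the existing tables to confirm HBase is reachable.
public class HBaseSmokeTest {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "127.0.0.1"); // matches hbase-site.xml
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            for (TableName name : admin.listTableNames()) {
                System.out.println(name.getNameAsString());
            }
        }
    }
}

An empty list is fine on a fresh install; an exception usually means ZooKeeper or the HMaster is not up yet.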
III. Secondary development based on RuoYi
The features are added directly inside the ruoyi project. The required jar packages have to be imported first; they all ship with Hadoop and HBase, so they can be referenced straight from those installations (the full list of referenced jars is not reproduced here).
The project is based on the Spring Boot framework and implements basic HDFS and HBase functionality on top of it.
The controller code is as follows:
package com.ruoyi.web.controller.roll;

import com.ruoyi.common.core.controller.BaseController;
import com.ruoyi.common.core.domain.R;
import com.ruoyi.common.core.domain.entity.SysRole;
import com.ruoyi.common.core.page.TableDataInfo;
import com.ruoyi.common.roll.RollEntity;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.client.coprocessor.AggregationClient;
import org.apache.hadoop.hbase.client.coprocessor.LongColumnInterpreter;
import org.apache.hadoop.hbase.filter.*;
import org.apache.shiro.authz.annotation.RequiresPermissions;
import org.springframework.stereotype.Controller;
import org.springframework.util.StopWatch;
import org.springframework.web.bind.annotation.*;
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.CompareOperator;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.MasterNotRunningException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.ZooKeeperConnectionException;
import org.apache.hadoop.hbase.exceptions.DeserializationException;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.mapreduce.Job;

@Controller
@RequestMapping("/roll")
public class RollController extends BaseController {
    private String prefix = "/roll";

    /**
     * Add page
     */
    @GetMapping("/add")
    public String add() {
        // long count = rowCountByCoprocessor("mytb");
        // System.out.println("Total row count ->>> " + count);
        return prefix + "/add";
    }

    @PostMapping("/list")
    @ResponseBody
    public TableDataInfo list(String inputEPC) {
        // startPage();
        // List<SysRole> list = roleService.selectRoleList(role);
        // String epc = "E280117020000333BF040B34";
        // String epc = "E280119120006618A51D032D";
        // the EPC to query
        String epc = inputEPC;
        String tableName = "mytb";
        String columnFamily = "mycf";
        // create(tableName, columnFamily);
        // insert(tableName, columnFamily);
        long startTime = System.currentTimeMillis();
        // E280119120006BEEA4E5032
        String reVal = query(tableName, columnFamily, epc);
        long endTime = System.currentTimeMillis();
        System.out.println("Roll number lookup took: " + (endTime - startTime) + "ms");

        RollEntity model = new RollEntity();
        model.epc = epc;
        model.rollName = reVal;
        model.searchTime = (endTime - startTime) + "ms";
        List<RollEntity> list = new ArrayList<>();
        list.add(model);
        return getDataTable(list);
    }

    // Create the table
    public static void create(String tableName, String columnFamily) {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.rootdir", "hdfs://localhost:9000/hbase");
        conf.set("hbase.zookeeper.quorum", "localhost");
        try {
            Connection conn = ConnectionFactory.createConnection(conf);
            if (conn.getAdmin().tableExists(TableName.valueOf(tableName))) {
                System.err.println("Table exists!");
            } else {
                HTableDescriptor tableDesc = new HTableDescriptor(TableName.valueOf(tableName));
                try {
                    tableDesc.addFamily(new HColumnDescriptor(columnFamily));
                    conn.getAdmin().createTable(tableDesc);
                    System.err.println("Create Table SUCCESS!");
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Insert data
    public static void insert(String tableName, String columnFamily) {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.rootdir", "hdfs://localhost:9000/hbase");
        conf.set("hbase.zookeeper.quorum", "localhost");
        try {
            Connection conn = ConnectionFactory.createConnection(conf);
            TableName tn = TableName.valueOf(tableName);
            Table table = conn.getTable(tn);
            try {
                // for (int i = 17742000; i <= 100000000; i++) {
                //     Put put = new Put(Bytes.toBytes("row" + i));
                //     put.addColumn(Bytes.toBytes(columnFamily), Bytes.toBytes("code"),
                //             Bytes.toBytes("E280119120006BEEA4E5032" + i));
                //     table.put(put);
                // }
                // Put put = new Put(Bytes.toBytes("E280119120006618A51D032D"));
                // put.addColumn(Bytes.toBytes(columnFamily), Bytes.toBytes("code"),
                //         Bytes.toBytes("CQ-230308009"));
                // table.put(put);
                Put put = new Put(Bytes.toBytes("E280117020000333BF040B34"));
                put.addColumn(Bytes.toBytes(columnFamily), Bytes.toBytes("code"),
                        Bytes.toBytes("CQ-230309002"));
                table.put(put);
                table.close(); // release resources
                System.err.println("record insert SUCCESS!");
            } catch (Exception e) {
                e.printStackTrace();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Query a single row by row key
    public static String query(String tableName, String columnFamily, String rowName) {
        String reVal = "";
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.rootdir", "hdfs://localhost:9000/hbase");
        conf.set("hbase.zookeeper.quorum", "localhost");
        try {
            Connection conn = ConnectionFactory.createConnection(conf);
            TableName tn = TableName.valueOf(tableName);
            Table table = conn.getTable(tn);
            try {
                Get get = new Get(rowName.getBytes());
                Result r = table.get(get);
                for (Cell cell : r.rawCells()) {
                    String family = new String(CellUtil.cloneFamily(cell));
                    String qualifier = new String(CellUtil.cloneQualifier(cell));
                    String value = new String(CellUtil.cloneValue(cell));
                    System.out.println("Column: " + family + ":" + qualifier + " value: " + value);
                    reVal = value;
                    break;
                }
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                conn.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return reVal;
    }

    // Query with a value filter
    public static void queryFilter(String tableName, String columnFamily, String rowName, String value) {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.rootdir", "hdfs://localhost:9000/hbase");
        conf.set("hbase.zookeeper.quorum", "localhost");
        try {
            Connection conn = ConnectionFactory.createConnection(conf);
            TableName tn = TableName.valueOf(tableName);
            Table table = conn.getTable(tn);
            try {
                Scan scan = new Scan();
                Filter filter = new ValueFilter(CompareOperator.EQUAL, new BinaryComparator(Bytes.toBytes(value)));
                scan.setFilter(filter);
                ResultScanner rs = table.getScanner(scan);
                for (Result res : rs) {
                    System.out.println(res);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Read the contents of an HDFS file
    // Note: opening an hdfs:// URL requires the FsUrlStreamHandlerFactory to be registered once
    // per JVM, e.g. URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
    private static void readHDFSFileContents() {
        InputStream is = null;
        OutputStream os = null;
        BufferedInputStream bufferInput = null;
        BufferedOutputStream bufferOutput = null;
        try {
            is = new URL("hdfs://127.0.0.1:9000/myHadoop/1.txt").openStream();
            bufferInput = new BufferedInputStream(is);
            // IOUtils.copyBytes(is, os, 4096, false);
            byte[] contents = new byte[1024];
            int bytesRead = 0;
            String strFileContents = "";
            while ((bytesRead = is.read(contents)) != -1) {
                strFileContents += new String(contents, 0, bytesRead);
            }
            System.out.println(strFileContents);
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // IOUtils.closeStream(is);
        }
    }

    // Create an HDFS directory
    private static void createHDFSDirectory() {
        try {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://127.0.0.1:9000");
            FileSystem fs = FileSystem.get(conf);
            boolean result = fs.mkdirs(new Path("/myHadoop"));
            System.out.println(result);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Count the rows in an HBase table via the aggregation coprocessor
    public long rowCountByCoprocessor(String tablename) {
        long count = 0;
        try {
            Configuration conf = HBaseConfiguration.create();
            conf.set("hbase.rootdir", "hdfs://localhost:9000/hbase");
            conf.set("hbase.zookeeper.quorum", "localhost");
            // create the Connection and Configuration up front
            Connection connection = ConnectionFactory.createConnection(conf);
            Admin admin = connection.getAdmin();
            // admin.enableTable(TableName.valueOf("mytb"));
            TableName name = TableName.valueOf(tablename);
            // disable the table first, add the coprocessor, then enable it again
            // admin.disableTable(name);
            HTableDescriptor descriptor = new HTableDescriptor(name);
            // admin.getTableDescriptor(name);
            // descriptor.setReadOnly(false);
            String coprocessorClass = "org.apache.hadoop.hbase.coprocessor.AggregateImplementation";
            if (!descriptor.hasCoprocessor(coprocessorClass)) {
                descriptor.addCoprocessor(coprocessorClass);
            }
            // admin.modifyTable(name, descriptor);
            // admin.enableTable(name);

            // timing
            StopWatch stopWatch = new StopWatch();
            stopWatch.start();

            Scan scan = new Scan();
            AggregationClient aggregationClient = new AggregationClient(conf);
            // System.out.println("RowCount: " + aggregationClient.rowCount(name, new LongColumnInterpreter(), scan));
            count = aggregationClient.rowCount(name, new LongColumnInterpreter(), scan);
            stopWatch.stop();
            System.out.println("Row count took: " + stopWatch.getTotalTimeMillis());
            connection.close();
        } catch (Throwable e) {
            e.printStackTrace();
        }
        return count;
    }
}
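The RollEntity class imported from com.ruoyi.common.roll is not shown in the original post. Judging from how the controller assigns its fields, a minimal sketch could look like the following (the field names come from the controller code above; everything else is an assumption):

package com.ruoyi.common.roll;

// Minimal sketch of the result object handed back to the RuoYi table view.
// Only the three fields the controller assigns are included; a real project
// would likely use private fields with getters/setters or Lombok.
public class RollEntity {
    public String epc;        // the EPC code that was queried
    public String rollName;   // the roll number looked up from HBase
    public String searchTime; // formatted query duration, e.g. "12ms"
}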
Final result: the page returns the matching roll number for the entered EPC, together with the query time (screenshot omitted).
For other environment configuration issues, see:
1. https://blog.csdn.net/meiLin_Ya/article/details/86145895
2. https://blog.csdn.net/qq_45923034/article/details/127850470