If anything here is wrong or unclear, feel free to leave a comment at any time. Thanks!
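Impala consists of three components: statestored, catalogd, and impalad. statestored and catalogd each run as a single instance and do not strictly need high availability, because both are stateless and persist no data themselves: the catalogd data is stored in a third-party database (MySQL, for example), and the statestore data is held entirely in memory. If high availability is wanted, a simple master/slave setup is enough, as mentioned at the end of this article. Normally only the master serves requests; the slave is running but accepts none, and it is promoted to master when the master fails.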
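Every impalad node, by contrast, can serve JDBC, Thrift, and other client connections, and it acts as the coordinator for each query submitted to it, which costs a certain amount of memory and CPU. To keep the load even across nodes, the impalad instances need to be load balanced. Load balancing comes in two kinds: layer 4, which works at the transport layer, and layer 7, which works at the application layer. A layer-4 balancer does not need to understand the application protocol and only forwards the traffic it receives at the transport layer, whereas a layer-7 balancer must understand the application protocol. For a SQL server such as impalad, that would require a proxy that speaks the SQL protocol, so layer-7 proxying is not really practical for impalad.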
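This setup relies mainly on haproxy's layer-4 forwarding: every request sent to the haproxy host and port is forwarded to the corresponding backend host:port.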
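To make the idea of connection-level balancing concrete, here is a minimal Python sketch of the selection rule behind the `balance leastconn` directive used in the configuration below. It is an illustration only, not HAProxy's implementation: the `Backend` class and the `pick_leastconn` and `open_connection` helpers are hypothetical names, and ties are broken by list order for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str          # label from the "server" line, e.g. "impala2"
    address: str       # host:port the proxy forwards to
    active: int = 0    # number of currently open client connections


def pick_leastconn(backends):
    """Return the backend with the fewest active connections.

    Simplification: on a tie, the first backend in the list wins.
    """
    return min(backends, key=lambda b: b.active)


def open_connection(backends):
    """Assign a new client connection to the least-loaded backend."""
    chosen = pick_leastconn(backends)
    chosen.active += 1
    return chosen


def close_connection(backend):
    """Record that a client connection to this backend has ended."""
    backend.active = max(0, backend.active - 1)
```

Because the choice depends only on connection counts and never on the bytes being exchanged, the proxy does not need to understand the Impala client protocol, which is why layer-4 (`mode tcp`) forwarding is sufficient here.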
1. Install haproxy
yum install haproxy
2. Configuration file
vim /etc/haproxy/haproxy.cfg
File contents:
global
# To have these messages end up in /var/log/haproxy.log you will
# need to:
#
# 1) configure syslog to accept network log events. This is done
# by adding the '-r' option to the SYSLOGD_OPTIONS in
# /etc/sysconfig/syslog
#
# 2) configure local2 events to go to the /var/log/haproxy.log
# file. A line like the following can be added to
# /etc/sysconfig/syslog
#
# local2.* /var/log/haproxy.log
#
log 127.0.0.1 local0
log 127.0.0.1 local1 notice
chroot /var/lib/haproxy
pidfile /var/run/haproxy.pid
maxconn 4000
user haproxy
group haproxy
daemon
# turn on stats unix socket
#stats socket /var/lib/haproxy/stats
#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#
# You might need to adjust timing values to prevent timeouts.
#---------------------------------------------------------------------
defaults
mode http
log global
option httplog
option dontlognull
option http-server-close
option forwardfor except 127.0.0.0/8
option redispatch
retries 3
maxconn 3000
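# The timeouts need to be raised: if they are too short, the connection can be dropped while a query is still running, and the result can then no longer be fetched.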
timeout connect 3600000ms
timeout client 3600000ms
timeout server 3600000ms
#
# This sets up the admin page for HA Proxy at port 25002.
#
listen stats :25002
balance
mode http
stats enable
stats auth admin:admin
# Change this section to match your cluster
listen impala-shell :25003
mode tcp
option tcplog
balance leastconn
server impala2 impala-host02:21000
server impala3 impala-host03:21000
server impala4 impala-host04:21000
server impala5 impala-host05:21000
# This is the setup for Impala. Impala clients connect to load_balancer_host:25003.
# HAProxy balances connections among the servers listed above.
# The impalad daemons listen on port 21000 for beeswax (impala-shell) or the original ODBC driver.
# For JDBC or the ODBC version 2.x driver, use port 21050 instead of 21000.
# Change this section to match your cluster
listen impala-jdbc :25001
# Impala needs layer-4 load balancing, so the mode is tcp
mode tcp
option tcplog
balance leastconn
# Server list
# impala-host02 and the other impala-host entries are the impalad hosts
# JDBC requests are being forwarded, so the port is 21050
server impala2 impala-host02:21050
server impala3 impala-host03:21050
server impala4 impala-host04:21050
server impala5 impala-host05:21050
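Before starting haproxy, it can help to double-check which backends each listen section will forward to. The following is a minimal Python sketch for that purpose, not a tool that ships with haproxy: `parse_backends` is a hypothetical helper, and it only understands the simple `listen <name>` and `server <name> <host>:<port>` layout used in the configuration above.

```python
SECTION_KEYWORDS = {"global", "defaults", "frontend", "backend", "listen"}


def parse_backends(cfg_text):
    """Map each 'listen' section name to a list of (server, host, port)."""
    sections = {}
    current = None  # name of the listen section being read, if any
    for raw in cfg_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        words = line.split()
        if words[0] in SECTION_KEYWORDS:
            # A new section starts; only 'listen' sections are tracked.
            if words[0] == "listen" and len(words) > 1:
                current = words[1]
                sections[current] = []
            else:
                current = None
        elif words[0] == "server" and current is not None and len(words) >= 3:
            host, _, port = words[2].rpartition(":")
            if host and port.isdigit():
                sections[current].append((words[1], host, int(port)))
    return sections
```

Calling `parse_backends(open("/etc/haproxy/haproxy.cfg").read())` on the file above should list the four impalad servers under each of the two Impala sections and an empty list for `stats`, which makes a mistyped host or port easy to spot.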
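3. Start
/usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg
4. Connect through the load balancer
haproxy-host-ip is the IP address of the haproxy host (a hostname also works). Use these endpoints when connecting:
impala-shell: haproxy-host-ip:25003
impala-jdbc: haproxy-host-ip:25001
On the haproxy host, connect with impala-shell (localhost can be replaced with haproxy-host-ip):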
mqq@VM-106-20-ubuntu:~$ impala-shell -i localhost:25003
Starting Impala Shell without Kerberos authentication
Connected to localhost:25003
Server version: impalad version 2.9.0-cdh5.12.0 RELEASE (build 03c6ddbdcec39238be4f5b14a300d5c4f576097e)
***********************************************************************************
Welcome to the Impala shell.
(Impala Shell v2.11.0-cdh5.14.2 (ed85dce) built on Tue Mar 27 13:39:48 PDT 2018)
You can run a single query from the command line using the '-q' option.
***********************************************************************************
[localhost:25003] > select * from t_sec_hurst_bs_dim limit 1;
Query: select * from t_sec_hurst_bs_dim limit 1
Query submitted at: 2018-07-25 20:16:07 (Coordinator: http://bigdata04:25000)
ERROR: AnalysisException: Could not resolve table reference: 't_sec_hurst_bs_dim'
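Besides an interactive impala-shell session, a short script can confirm that both proxy ports accept TCP connections. This is a minimal sketch using only the Python standard library; `port_is_open` and `check_proxy` are hypothetical helper names, and `haproxy-host-ip` in the usage note is the same placeholder used above.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_proxy(host, ports, timeout=3.0):
    """Check several labelled ports on one host; returns {label: bool}."""
    return {label: port_is_open(host, port, timeout) for label, port in ports.items()}
```

For example, `check_proxy("haproxy-host-ip", {"impala-shell": 25003, "impala-jdbc": 25001})` reports whether each frontend is reachable. A successful connect only shows that haproxy is listening on that port; it does not prove that an impalad behind it is healthy, so running a real query as in the session above is still the more complete test.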