Source: http://www.zhangdongshengtech.com/article-detials/267
1. pyhive
First, install pyhive. The PyHive package depends on three other packages, sasl, thrift, and thrift-sasl, so install everything in the order shown below:
pip install sasl
pip install thrift
pip install thrift-sasl
pip install PyHive
If pip reports an error while installing sasl, install the system build dependencies first:
Debian/Ubuntu:
sudo apt-get install python-dev libsasl2-dev gcc
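pip install pyhs2  # alternative to installing the four packages above; note that pyhs2 is a separate client library and does not provide the pyhive module used in the rest of this post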
CentOS/RHEL:
sudo yum install gcc-c++ python-devel.x86_64 cyrus-sasl-devel.x86_64
pip install pyhs2
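Once the installation has finished, a quick importability check can save a confusing traceback later. The sketch below uses only the standard library. The helper name `missing_modules` is made up for illustration, and the import names in the list (`sasl`, `thrift`, `thrift_sasl`, `pyhive`) are assumed to be the module names that the four packages above expose.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names assumed for the four packages installed above.
    required = ["sasl", "thrift", "thrift_sasl", "pyhive"]
    missing = missing_modules(required)
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

If anything is listed as not importable, rerun the corresponding pip command after installing the system packages for your distribution.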
2. Connecting to the database
Connect to the database. If a password is required, you must also specify the auth parameter:
from pyhive import hive
conn = hive.connect(host="server_ip",port=10000, auth="CUSTOM", database="...",username="...",password="...")
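The rule that a password needs an explicit auth mode can be checked before any network call is made. The following is a minimal sketch, not part of pyhive itself: `build_connect_kwargs` and `open_connection` are hypothetical helper names, the default port and database are assumptions, and the accepted auth values depend on how your HiveServer2 is configured.

```python
def build_connect_kwargs(host, port=10000, database="default",
                         username=None, password=None, auth=None):
    """Collect keyword arguments for hive.connect, rejecting a password without auth."""
    kwargs = {"host": host, "port": port, "database": database}
    if username is not None:
        kwargs["username"] = username
    if password is not None:
        if auth is None:
            # Mirrors the rule stated above: a password requires an auth mode.
            raise ValueError("a password requires an explicit auth mode, e.g. 'CUSTOM'")
        kwargs["password"] = password
    if auth is not None:
        kwargs["auth"] = auth
    return kwargs


def open_connection(**settings):
    """Open a connection; pyhive is imported lazily so the helper above works without it."""
    from pyhive import hive
    return hive.connect(**build_connect_kwargs(**settings))


# Only builds the argument dictionary; no server is contacted here.
print(build_connect_kwargs("server_ip", username="user", password="secret", auth="CUSTOM"))
```

Keeping the validation separate from the actual connect call makes the configuration easy to test without a running Hive server.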
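3. Querying data
query_sql = "select * from users"
cursor = conn.cursor()
cursor.execute(query_sql)
# Column metadata: one tuple per column, with the column name first
columns = cursor.description
# Fetch all rows; each result is a tuple
for result in cursor.fetchall():
    print(result)
cursor.close()
Query results come back as tuples, and the matching column information is stored in cursor.description. To end up with the data as dictionaries, you have to assemble them yourself from description and result.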
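For large result sets, fetching everything at once may not be desirable. As a variation on the approach described above, the sketch below pairs column names with row values in batches, using the DB-API fetchmany method. The names `iter_dicts` and `FakeCursor` are hypothetical; the fake cursor exists only so the example runs without a Hive server, and its column metadata is invented.

```python
def iter_dicts(cursor, batch_size=1000):
    """Yield each row of an already executed cursor as a dict keyed by column name."""
    # description holds one tuple per column; the first element is the column name.
    names = [col[0] for col in cursor.description]
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        for row in rows:
            yield dict(zip(names, row))


class FakeCursor:
    """Hypothetical stand-in for a DB-API cursor, used only to demonstrate the helper."""

    def __init__(self, description, rows):
        self.description = description
        self._rows = list(rows)

    def fetchmany(self, size):
        # Hand out the next `size` rows and keep the remainder for later calls.
        batch, self._rows = self._rows[:size], self._rows[size:]
        return batch


demo = FakeCursor(
    description=[("id", "INT_TYPE"), ("name", "STRING_TYPE")],
    rows=[(1, "alice"), (2, "bob"), (3, "carol")],
)
demo_rows = list(iter_dicts(demo, batch_size=2))
print(demo_rows)
```

With a real connection, the same helper would be called as `iter_dicts(cursor)` right after `cursor.execute(query_sql)`.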
4. HiveClient
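Below is a HiveClient class that implements only a query method. It returns the data as a list of dictionaries, and it reconnects and retries if the connection has dropped.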
from functools import wraps
from pyhive import hive


class Retry(object):
    """Decorator that reconnects and retries a query method when it raises."""

    def __init__(self, retry=3):
        self.retry = retry

    def __call__(self, func):
        @wraps(func)
        def wrapped_func(client, query_sql):
            # client is the HiveClient instance (the decorated method's self)
            retry_count = 0
            while retry_count < self.retry:
                try:
                    return func(client, query_sql)
                except Exception as e:
                    print(str(e))
                    # Rebuild the connection before the next attempt
                    client.init_connection()
                    retry_count += 1
            raise Exception("Still failing after multiple retries, sql statement: " + query_sql)
        return wrapped_func


class HiveClient(object):
    def __init__(self, host, port, username, password, auth='CUSTOM'):
        self.host = host
        self.port = port
        self.username = username
        self.password = password
        self.auth = auth
        self.init_connection()

    def init_connection(self):
        self.conn = hive.Connection(host=self.host, port=self.port,
                                    username=self.username,
                                    password=self.password, auth=self.auth)

    @Retry()
    def query(self, query_sql):
        datas = []
        cursor = self.conn.cursor()
        cursor.execute(query_sql)
        columns = cursor.description
        for result in cursor.fetchall():
            item = {}
            # zip (rather than zip_longest) stops at the shorter sequence,
            # so a length mismatch can never produce a None key
            for key, value in zip(columns, result):
                item[key[0]] = value
            datas.append(item)
        cursor.close()
        return datas


hc = HiveClient('ip', 15000, 'username', 'password')
sql = "select * from user"
data = hc.query(sql)
print(data)
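The reconnect-and-retry behaviour can be tried out without a Hive server. The sketch below is a function-style variant of the retry decorator, not the class from the post. `retry_with_reconnect` and `FlakyClient` are hypothetical names, and the client simply simulates a dropped connection a fixed number of times.

```python
from functools import wraps


def retry_with_reconnect(retries=3):
    """Retry a method, calling self.init_connection() after each failed attempt."""
    def decorator(func):
        @wraps(func)
        def wrapper(self, query_sql):
            last_error = None
            for _ in range(retries):
                try:
                    return func(self, query_sql)
                except Exception as exc:
                    last_error = exc
                    self.init_connection()
            message = "query failed after %d attempts: %s" % (retries, query_sql)
            raise RuntimeError(message) from last_error
        return wrapper
    return decorator


class FlakyClient:
    """Hypothetical client whose first `failures` queries raise, for demonstration only."""

    def __init__(self, failures):
        self.failures = failures
        self.reconnects = 0

    def init_connection(self):
        # A real client would rebuild its connection here.
        self.reconnects += 1

    @retry_with_reconnect(retries=3)
    def query(self, query_sql):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("simulated dropped connection")
        return [{"sql": query_sql}]


client = FlakyClient(failures=2)
result = client.query("select 1")
print(result, client.reconnects)
```

Chaining the final error to the last underlying exception keeps the original cause visible in the traceback, which plain printing of each failure does not.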