Environment
System: CentOS
Before installing the Hive client packages
Upgrade pip and switch to a mirror index first, so that packages download much faster.
pip install pip -U
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
Installing PyHive
Running pip install thrift-sasl without a version number raises an error; pinning the version, as in the commands that follow, fixes it.
pip install sasl
pip install thrift
pip install thrift-sasl==0.3.0
pip install PyHive
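
Before connecting, it can help to confirm that the four packages are importable. The following is a minimal sketch, not part of the original steps: the helper name find_missing and the list REQUIRED_MODULES are hypothetical, and the module names are the import names these pip packages are assumed to provide.

```python
import importlib.util

# Import names assumed to be provided by the pip packages installed above
# (sasl, thrift, thrift-sasl, PyHive).
REQUIRED_MODULES = ["sasl", "thrift", "thrift_sasl", "pyhive"]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    missing = []
    for name in modules:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for broken or half-initialised modules
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    print("missing modules:", missing or "none")
```

If the list is not empty, rerun the matching pip install command before moving on.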
Using Hive
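import pandas as pd
from pyhive import hive

# **** marks values masked in the original; replace them with your own settings
conn = hive.Connection(host='****', port=****, username='****', database='****')
cursor = conn.cursor()

# "table" is a placeholder; use a real table name
sql_hive = """
select *
from table
"""
cursor.execute(sql_hive)

# fetchall() returns a list of row tuples
data = cursor.fetchall()
results = pd.DataFrame(data)
print(results.shape)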
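
Note that pd.DataFrame(data) on a list of row tuples produces numbered columns (0, 1, 2, ...). The sketch below shows one way to recover the column names from the cursor's DB-API description attribute. The helpers column_names and rows_as_dicts are hypothetical names introduced here, and stripping a "table." prefix assumes Hive reports names in that form, which depends on the server configuration.

```python
def column_names(description):
    """Extract column names from a DB-API cursor.description.

    Each entry is a sequence whose first item is the column name.
    Assumption: names may arrive as "table.column", so only the part
    after the last dot is kept.
    """
    return [col[0].split(".")[-1] for col in description]


def rows_as_dicts(description, rows):
    """Pair every row tuple with the column names."""
    names = column_names(description)
    return [dict(zip(names, row)) for row in rows]
```

With the cursor from the Hive example, the DataFrame could then be built as pd.DataFrame(data, columns=column_names(cursor.description)).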
Installing the Impala client
This builds on the packages installed above.
pip install impyla
Using Impala
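from impala.dbapi import connect as impala_connect
from impala.util import as_pandas

def impala_db(sql):
    # **** marks values masked in the original; replace them with your own settings
    conn = impala_connect(host='****', port=****)
    cur = conn.cursor()
    cur.execute(sql)
    # as_pandas reads the cursor into a DataFrame, including column names
    results = as_pandas(cur)
    print(results.shape)
    return results

# "table" is a placeholder; use a real table name
sql = """
SELECT *
FROM table
"""
impala_db(sql)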
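
Neither example closes its cursor or connection. As a hedged sketch of one way to handle that, the helper below (fetch_all, a hypothetical name) wraps any DB-API connection in contextlib.closing so both are closed even if the query fails. It is demonstrated with the standard-library sqlite3 module purely so that it runs without a cluster; it assumes the Hive or Impala connection and cursor objects expose close() in the usual DB-API way.

```python
import sqlite3
from contextlib import closing


def fetch_all(connect, sql):
    """Run sql on a fresh connection and return (column_names, rows).

    connect is any zero-argument callable that returns a DB-API
    connection. The cursor and the connection are closed on exit,
    whether or not the query succeeds.
    """
    with closing(connect()) as conn:
        with closing(conn.cursor()) as cur:
            cur.execute(sql)
            names = [col[0] for col in cur.description]
            return names, cur.fetchall()


if __name__ == "__main__":
    # sqlite3 is only a stand-in here so the sketch can run anywhere
    names, rows = fetch_all(lambda: sqlite3.connect(":memory:"), "SELECT 1 AS x, 'a' AS y")
    print(names, rows)
```

For Impala, the same helper could be called with a lambda that wraps impala_connect and your own host and port.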