Presto setup
1. Download the Presto server package:
https://prestosql.io/download.html
2.cd presto-server-xxx/etc
3.mkdir catalog
4. Make sure the following files exist; create any that are missing
(replace 192.168.201.31 with the IP of your own coordinator/master node)
FILE: jvm.config
-server
-Xmx20G
-XX:+UseConcMarkSweepGC
-XX:+ExplicitGCInvokesConcurrent
-XX:+CMSClassUnloadingEnabled
-XX:+AggressiveOpts
-XX:+HeapDumpOnOutOfMemoryError
-XX:OnOutOfMemoryError=kill -9 %p
-XX:ReservedCodeCacheSize=150M
-XX:CMSInitiatingOccupancyFraction=70
FILE: log.properties
com.facebook.presto=INFO
FILE: config.properties
Coordinator (master) node:
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=30GB
query.max-memory-per-node=2GB
discovery-server.enabled=true
discovery.uri=http://192.168.201.31:8080
Worker node:
coordinator=false
http-server.http.port=8080
query.max-memory=30GB
query.max-memory-per-node=2GB
discovery.uri=http://192.168.201.31:8080
FILE: node.properties
node.environment=production
# node.id must be unique for every node in the cluster
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
# make sure this directory exists
node.data-dir=/root/soft/presto
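Since every node needs its own distinct node.id, one quick way to generate a value per machine (a sketch using Python's standard uuid module):

```python
# Generate a unique node.id line to paste into each machine's node.properties.
import uuid

node_id = str(uuid.uuid4())
print(f"node.id={node_id}")
```

Run it once per machine and paste the output into that machine's node.properties.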
FILE: catalog/jmx.properties
connector.name=jmx
FILE: catalog/mongodb.properties
connector.name=mongodb
mongodb.seeds=192.168.201.40:27017
mongodb.schema-collection=ods
# credentials in the form username:password@database
mongodb.credentials=username:password@admin
5.~/soft/presto-server-0.149/bin/launcher start
For finer-grained parameters, see the documentation and tune as needed.
Jupyter installation
pip3 install --upgrade jupyter matplotlib numpy pandas scipy scikit-learn jupyter_contrib_nbextensions
jupyter notebook
Presto client component (with Jupyter on the master node)
1.pip3 install presto-python-client
2. Use this class for debugging; for details, see the docs: https://github.com/prestodb/presto-python-client
import prestodb

class Presto:
    conn = prestodb.dbapi.connect(
        host='127.0.0.1',
        port=8080,
        user='root',
        catalog='mongodb',
        schema='ods',  # the database name in your MongoDB
    )
    statistics_conn = prestodb.dbapi.connect(
        host='127.0.0.1',
        port=8080,
        user='root',
        catalog='mongodb',
        schema='app',
    )
    eng = None

    @classmethod
    def query(cls, sql):
        print(sql)
        cur = cls.conn.cursor()
        cur.execute(sql)
        return cur.fetchall()
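Note that fetchall() loads the entire result set into memory; for large results you can stream in fixed-size batches via the standard DB-API fetchmany instead. A sketch (iter_rows and batch_rows are illustrative names, not part of presto-python-client; the same pattern works on any DB-API cursor, demonstrated here with a tiny fake cursor):

```python
def iter_rows(cursor, batch_rows=10_000):
    """Yield rows from a DB-API cursor, fetching batch_rows at a time."""
    while True:
        batch = cursor.fetchmany(batch_rows)
        if not batch:
            return
        for row in batch:
            yield row

# Minimal stand-in for a DB-API cursor, for demonstration only.
class FakeCursor:
    def __init__(self, rows):
        self._rows = list(rows)

    def fetchmany(self, n):
        out, self._rows = self._rows[:n], self._rows[n:]
        return out

rows = list(iter_rows(FakeCursor(range(7)), batch_rows=3))
print(rows)  # all 7 rows, fetched 3 at a time
```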
Addendum: speeding up large data pulls from MongoDB through Presto
It is slow because pulling a large volume of data is bottlenecked by intranet transfer, so compress the MongoDB collection to shrink the data size:
db.createCollection( "email", { storageEngine: {
wiredTiger: { configString: 'block_compressor=zlib' }}})
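The same compressed collection can also be created from Python. A sketch: the storageEngine options dict below mirrors the mongo-shell command above; the pymongo call is shown as a comment since it needs a live server (host/database names are taken from this document's setup):

```python
# Storage-engine options equivalent to the mongo-shell command above.
storage_engine = {
    "wiredTiger": {"configString": "block_compressor=zlib"}
}

# With pymongo this would be (not executed here):
#   from pymongo import MongoClient
#   db = MongoClient("192.168.201.40", 27017)["ods"]
#   db.create_collection("email", storageEngine=storage_engine)
print(storage_engine)
```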
In testing, Presto + MongoDB handled hundreds of millions of rows comfortably, even faster than Hive; for substantially larger datasets, Hive is probably still the better fit.