Installing and configuring SequoiaSQL and Spark is not covered here.
The Thrift server is best used together with a Hive metastore, so this article explains how to configure and start the Thrift server; in the end, Spark SQL can be used cleanly and conveniently from the beeline command line. The configuration steps follow.
1. PostgreSQL JDBC driver
Download the PostgreSQL JDBC driver: https://jdbc.postgresql.org/download.html
Note that the driver version depends not only on the PostgreSQL version but also on the JDK version.
After downloading, upload the PostgreSQL JDBC driver jar to the same directory on every server in the Spark cluster:
For example, upload postgresql-9.3-1104.jdbc41.jar to /opt/spark-2.1.1-bin-hadoop2.7/jars/
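A minimal sketch of distributing the jar with scp (the host names node1, node2, node3 are hypothetical; adjust to your cluster):
for host in node1 node2 node3; do
    scp postgresql-9.3-1104.jdbc41.jar sdbadmin@$host:/opt/spark-2.1.1-bin-hadoop2.7/jars/
done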
2. Configure spark-defaults.conf or spark-env.sh
In conf/spark-defaults.conf:
spark.executor.extraClassPath /opt/spark-2.1.1-bin-hadoop2.7/jars/sequoiadb.jar:/opt/spark-2.1.1-bin-hadoop2.7/jars/spark-sequoiadb_2.11-2.6.0.jar:/opt/spark-2.1.1-bin-hadoop2.7/jars/postgresql-9.3-1104.jdbc41.jar
Or in conf/spark-env.sh:
SPARK_CLASSPATH=/opt/spark-2.1.1-bin-hadoop2.7/jars/sequoiadb.jar:/opt/spark-2.1.1-bin-hadoop2.7/jars/spark-sequoiadb_2.11-2.6.0.jar:/opt/spark-2.1.1-bin-hadoop2.7/jars/postgresql-9.3-1104.jdbc41.jar
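Since the Thrift server runs inside the driver process, it can also help to put the same jars on the driver classpath. This is an optional addition and may be redundant here, because Spark 2.x automatically loads everything in $SPARK_HOME/jars/:
spark.driver.extraClassPath /opt/spark-2.1.1-bin-hadoop2.7/jars/sequoiadb.jar:/opt/spark-2.1.1-bin-hadoop2.7/jars/spark-sequoiadb_2.11-2.6.0.jar:/opt/spark-2.1.1-bin-hadoop2.7/jars/postgresql-9.3-1104.jdbc41.jar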
3. Download and upload the Hive metastore schema scripts
Download the Hive source package:
http://mirror.bit.edu.cn/apache/hive/hive-1.2.2/apache-hive-1.2.2-src.tar.gz
As for which Hive version to download, pay attention to the Hive version that Spark is built against. Spark 2.1.1 bundles Hive 1.2.1, but Hive 1.2.2 works just as well.
After extracting apache-hive-1.2.2-src.tar.gz, locate the two SQL scripts under metastore/scripts/upgrade/postgres/:
hive-schema-1.2.0.postgres.sql
hive-txn-schema-0.13.0.postgres.sql
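A minimal sketch of the download, extract, and locate steps (using the mirror URL above; the extracted directory name may vary):
wget http://mirror.bit.edu.cn/apache/hive/hive-1.2.2/apache-hive-1.2.2-src.tar.gz
tar -zxf apache-hive-1.2.2-src.tar.gz
ls apache-hive-1.2.2-src/metastore/scripts/upgrade/postgres/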
Upload these two SQL scripts to a directory on the server for later use.
4. Create the SequoiaSQL database and user
First create the database and user:
./bin/psql -p 5432 postgres
postgres=# CREATE DATABASE hive_metastore TEMPLATE=template0 ENCODING='UTF8';
postgres=# CREATE USER hiveuser WITH PASSWORD 'mypassword';
postgres=# \c hive_metastore
You are now connected to database "hive_metastore".
Then run the Hive SQL script. It creates the tables Hive needs in order to store its metadata.
postgres=# \i hive-schema-1.2.0.postgres.sql
Then grant privileges. Running the following SQL statement generates the GRANT statements:
postgres=# SELECT 'GRANT SELECT,INSERT,UPDATE,DELETE ON "' || schemaname || '"."' || tablename || '" TO hiveuser ;' FROM pg_tables WHERE tableowner = CURRENT_USER and schemaname = 'public';
Run each of the GRANT statements generated in the previous step. (The hive-schema-1.2.0.postgres.sql script should have created 54 tables.)
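A couple of the generated statements look like this (DBS and TBLS are among the metastore tables the schema script creates; the full list runs to 54):
GRANT SELECT,INSERT,UPDATE,DELETE ON "public"."DBS" TO hiveuser ;
GRANT SELECT,INSERT,UPDATE,DELETE ON "public"."TBLS" TO hiveuser ;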
Then, for convenience, grant all privileges on the database to hiveuser:
GRANT ALL ON DATABASE hive_metastore to hiveuser;
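As a quick sanity check that the schema script ran to completion (assuming nothing else lives in the public schema), the table count should come back as 54:
hive_metastore=# SELECT count(*) FROM pg_tables WHERE schemaname = 'public';
 count
-------
    54
(1 row)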
5. Configure Hive's database connection file hive-site.xml
Create a new configuration file conf/hive-site.xml under the Spark directory with the following content:
<configuration>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:postgresql://IP-or-hostname:port/database-name</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>org.postgresql.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>username</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>password</value>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>false</value>
<description>Creates the necessary schema on startup if one doesn't exist. Set this to false after creating it once.</description>
</property>
</configuration>
Notes on a few of the settings:
javax.jdo.option.ConnectionURL: the PostgreSQL server address, port, and database
javax.jdo.option.ConnectionUserName: the PostgreSQL username
javax.jdo.option.ConnectionPassword: the user's password
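For the environment in this walkthrough, the connection URL would look like the following (this assumes SequoiaSQL is listening on 10.131.9.62:5432; substitute your actual host and port):
jdbc:postgresql://10.131.9.62:5432/hive_metastore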
6. Restart Spark and start the Thrift server
Start Spark: sbin/start-all.sh
Start the Thrift server: sbin/start-thriftserver.sh --master spark://10.131.9.62:7077 --total-executor-cores 12 --executor-memory 1g
With the jps command you can see the SparkSubmit and CoarseGrainedExecutorBackend processes, which appear only after the Thrift server has started. For example:
[sdbadmin@pop-s-invquery-a ~]$ jps
2252 BeeLine
1336 Worker
717 Master
1630 CoarseGrainedExecutorBackend
3588 Jps
1634 CoarseGrainedExecutorBackend
903 Worker
1194 Worker
1650 CoarseGrainedExecutorBackend
1633 CoarseGrainedExecutorBackend
1484 SparkSubmit
1047 Worker
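Another quick check is to confirm that something is listening on the Thrift server's port (10000 is the HiveServer2 default, matching the beeline URL used in step 8):
netstat -anp | grep 10000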
7. In PG, check whether the Thrift server has connected to the metastore
select * from pg_stat_activity where datname='hive_metastore';
Sample output:
hive_metastore=# select * from pg_stat_activity where datname='hive_metastore';
(output abridged: less relevant columns are omitted so the rows fit on screen)
    datname     | pid  | usename  | client_addr | client_port |         backend_start         | state  |                             query
----------------+------+----------+-------------+-------------+-------------------------------+--------+----------------------------------------------------------------
 hive_metastore | 3171 | hiveuser | 10.131.9.63 |       42538 | 2017-06-30 18:31:34.694261+08 | idle   | SET extra_float_digits = 3
 hive_metastore | 3172 | hiveuser | 10.131.9.63 |       42539 | 2017-06-30 18:31:34.700751+08 | idle   | SET extra_float_digits = 3
 hive_metastore | 3173 | hiveuser | 10.131.9.63 |       42540 | 2017-06-30 18:31:35.111469+08 | idle   | SET extra_float_digits = 3
 hive_metastore | 3174 | hiveuser | 10.131.9.63 |       42541 | 2017-06-30 18:31:35.11745+08  | idle   | SET extra_float_digits = 3
 hive_metastore | 3461 | hiveuser | 10.131.9.62 |       37933 | 2017-06-30 18:49:13.029259+08 | idle   | SET extra_float_digits = 3
 hive_metastore | 3462 | hiveuser | 10.131.9.62 |       37934 | 2017-06-30 18:49:13.0348+08   | idle   | SET extra_float_digits = 3
 hive_metastore | 3489 | hiveuser | 10.131.9.62 |       37935 | 2017-06-30 18:50:42.775977+08 | idle   | SET extra_float_digits = 3
 hive_metastore | 3490 | hiveuser | 10.131.9.62 |       37936 | 2017-06-30 18:50:42.781191+08 | idle   | SET extra_float_digits = 3
 hive_metastore | 3636 | sdbadmin |             |          -1 | 2017-06-30 18:52:03.679178+08 | active | select * from pg_stat_activity where datname='hive_metastore';
(9 rows)
The first eight rows (connections from 10.131.9.63 and 10.131.9.62 as hiveuser) are the Thrift server's metastore connections; the last row is the local psql session running this query.
8. Use beeline
/opt/spark-2.1.1-bin-hadoop2.7/bin/beeline -u jdbc:hive2://10.131.9.62:10000 -n hiveuser -p hiveuser
Quick beeline command tips:
Connect to a database: !connect <url> <username> <password> [driver]
Quit: !quit
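Once connected, ordinary Spark SQL can be run from the beeline prompt. A small illustrative session (the exact output formatting and timing vary by version):
0: jdbc:hive2://10.131.9.62:10000> show databases;
+---------------+--+
| databaseName  |
+---------------+--+
| default       |
+---------------+--+
1 row selected (0.1 seconds)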