Environment

Hadoop 2.10.0
Spark 2.4.5
Hive 3.1.2
Preparing test data
> beeline -u jdbc:hive2://localhost:10000 -n root
create table default.member_phone as select '00001' as member_srl, '136****3896' as phone_num;
Dependency setup

compile 'org.apache.spark:spark-sql_2.12:2.4.5'
compile 'org.apache.spark:spark-hive_2.12:2.4.5'
Depending on whether Hive's metastore service is running, there are two different ways to connect.
Option 1: the metastore service is already running
If the metastore service is already running, Spark SQL simply acts as a Thrift client of that service to access the metastore data.
.config("hive.metastore.uris", "thrift://localhost:9083")
Just add the metastore server address above when building the SparkSession.
Test code:
import org.apache.spark.sql.SparkSession;

public class SparkHive {
    public static void main(String[] args) {
        SparkSession session = null;
        try {
            session = SparkSession.builder()
                    .appName("spark hive")
                    .master("local[*]")
                    .config("hive.metastore.uris", "thrift://localhost:9083")
                    .enableHiveSupport()
                    .getOrCreate();
            session.sql("select * from default.member_phone").show();
        } finally {
            if (session != null) {
                session.close();
            }
        }
    }
}
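Instead of hard-coding the URI in the builder, the same setting can be supplied outside the code. A minimal sketch, assuming the standard `spark.hadoop.*` pass-through prefix (the URI itself is the same one used above):

```properties
# spark-defaults.conf — equivalent to the .config(...) call in the code above;
# the spark.hadoop. prefix forwards the key to the underlying Hadoop/Hive configuration
spark.hadoop.hive.metastore.uris  thrift://localhost:9083
```

The same key can also be passed on the command line with `spark-submit --conf`, or by dropping a `hive-site.xml` containing `hive.metastore.uris` into Spark's conf directory.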
Option 2: the metastore service is not running
If the metastore service runs in embedded mode as part of the Hive server, it cannot serve as a standalone metastore service — there is no Thrift metastore service listening on port 9083. In that case, Spark SQL has to create its own metastore from the metastore database configuration and access Hive's metadata directly. The metastore database here should be running in remote mode.
Configuration file resources/hive-site.xml:
<configuration>
    <!-- database configuration -->
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://localhost:3306/metastore?useSSL=false</value>
        <description>
            JDBC connect string for a JDBC metastore.
            To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
            For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
        </description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
        <description>Username to use against metastore database</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>abc123</value>
        <description>password to use against metastore database</description>
    </property>
</configuration>
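Depending on the environment, Spark may also need to be told where the Hive warehouse directory lives. A hedged sketch of the commonly added property (the path is an assumption for this setup — it defaults to HDFS; note that in Spark 2.x the Spark-side `spark.sql.warehouse.dir` setting takes precedence over this Hive property):

```xml
<!-- optional: warehouse location, added inside the <configuration> block above;
     /user/hive/warehouse is the conventional default, adjust to your cluster -->
<property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
</property>
```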
Add the MySQL driver jar:
compile 'mysql:mysql-connector-java:5.1.41'
Java test code:
import org.apache.spark.sql.SparkSession;

public class SparkHive {
    public static void main(String[] args) {
        SparkSession session = null;
        try {
            session = SparkSession.builder()
                    .appName("spark hive")
                    .master("local[*]")
                    .enableHiveSupport()
                    .getOrCreate();
            session.sql("select * from default.member_phone").show();
        } finally {
            if (session != null) {
                session.close();
            }
        }
    }
}
If the table contents are printed, the connection succeeded:
+----------+-----------+
|member_srl|  phone_num|
+----------+-----------+
|     00001|136****3896|
+----------+-----------+
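Besides `session.sql(...)`, the same Hive table can be read through Spark's DataFrame API. A minimal sketch, assuming the same running Spark/Hive setup as above (the class name is illustrative; `table`, `select`, `show`, and `count` are standard Spark SQL calls):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkHiveTableApi {
    public static void main(String[] args) {
        SparkSession session = SparkSession.builder()
                .appName("spark hive table api")
                .master("local[*]")
                .enableHiveSupport()
                .getOrCreate();
        try {
            // table() exposes the Hive table as a Dataset<Row>
            Dataset<Row> members = session.table("default.member_phone");
            members.select("member_srl", "phone_num").show();
            System.out.println("row count: " + members.count());
        } finally {
            session.close();
        }
    }
}
```

This avoids embedding SQL strings and lets later transformations (filter, join, write) be chained on the `Dataset`.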
For the different Hive startup modes, see: https://blog.csdn.net/adorechen/article/details/104934057
Common errors
Unable to instantiate SparkSession with Hive support because Hive classes are not found.
The spark-hive classes are missing from the classpath; add the spark-hive dependency:
compile 'org.apache.spark:spark-hive_2.12:2.4.5'
References:
http://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html