Environment
Hadoop version: 2.10.0
Spark version: 2.4.5
Hive version: 3.1.2
Preparing test data
> beeline -u jdbc:hive2://localhost:10000 -n root
create table default.member_phone as select '00001' as member_srl, '136****3896' as phone_num;
Preparing dependencies
compile 'org.apache.spark:spark-sql_2.12:2.4.5'
compile 'org.apache.spark:spark-hive_2.12:2.4.5'
There are two different ways to connect, depending on whether Hive's metastore service has been started.
Option 1: the metastore service is running
If the metastore service is already running, Spark SQL accesses the metastore data directly as a thrift client of that service.
.config("hive.metastore.uris", "thrift://localhost:9083")
Simply add the metastore server address above to the SparkSession builder.
The test code is as follows:
import org.apache.spark.sql.SparkSession;

public class SparkHive {
    public static void main(String[] args) {
        SparkSession session = null;
        try {
            session = SparkSession.builder()
                    .appName("spark hive")
                    .master("local[*]")
                    .config("hive.metastore.uris", "thrift://localhost:9083")
                    .enableHiveSupport()
                    .getOrCreate();
            session.sql("select * from default.member_phone")
                    .show();
        } finally {
            if (session != null) {
                session.close();
            }
        }
    }
}
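Equivalently, the thrift address can be supplied through a hive-site.xml on Spark's classpath instead of the builder call. A minimal sketch, assuming the same host and port as above:

```xml
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://localhost:9083</value>
    <description>Thrift URI of the remote metastore service</description>
  </property>
</configuration>
```

Either form works; the builder call is convenient for local testing, while hive-site.xml keeps connection details out of the code.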
Option 2: the metastore service is not running
If the metastore service runs in embedded mode as part of the Hive server, it cannot serve as a standalone metastore service, so there is no thrift metastore service on port 9083. In that case Spark SQL must create its own metastore access from the metastore database configuration and read Hive's metadata directly. The metastore database here should be running in remote mode.
Configuration file: resources/hive-site.xml
<configuration>
  <!-- database configuration -->
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/metastore?useSSL=false</value>
    <description>
      JDBC connect string for a JDBC metastore.
      To use SSL to encrypt/authenticate the connection, provide database-specific
      SSL flag in the connection URL.
      For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
    </description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>Username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>abc123</value>
    <description>password to use against metastore database</description>
  </property>
</configuration>
Add the MySQL driver jar:
compile 'mysql:mysql-connector-java:5.1.41'
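As an alternative configuration sketch (not from the original setup, and depending on the Spark version the keys may need a spark.hadoop. prefix), the same JDO properties can be passed programmatically to the SparkSession builder instead of shipping a hive-site.xml:

```java
import org.apache.spark.sql.SparkSession;

public class SparkHiveJdo {
    public static void main(String[] args) {
        // Sketch: these properties mirror resources/hive-site.xml above;
        // host, database, user and password are the example values.
        SparkSession session = SparkSession.builder()
                .appName("spark hive")
                .master("local[*]")
                .config("javax.jdo.option.ConnectionDriverName", "com.mysql.jdbc.Driver")
                .config("javax.jdo.option.ConnectionURL",
                        "jdbc:mysql://localhost:3306/metastore?useSSL=false")
                .config("javax.jdo.option.ConnectionUserName", "root")
                .config("javax.jdo.option.ConnectionPassword", "abc123")
                .enableHiveSupport()
                .getOrCreate();
        try {
            session.sql("select * from default.member_phone").show();
        } finally {
            session.close();
        }
    }
}
```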
Java test code
import org.apache.spark.sql.SparkSession;

public class SparkHive {
    public static void main(String[] args) {
        SparkSession session = null;
        try {
            session = SparkSession.builder()
                    .appName("spark hive")
                    .master("local[*]")
                    .enableHiveSupport()
                    .getOrCreate();
            session.sql("select * from default.member_phone")
                    .show();
        } finally {
            if (session != null) {
                session.close();
            }
        }
    }
}
If the rows from the table are printed, the connection succeeded.
+----------+-----------+
|member_srl|  phone_num|
+----------+-----------+
|     00001|136****3896|
+----------+-----------+
For Hive's different startup modes, see: https://blog.csdn.net/adorechen/article/details/104934057
Common errors
Unable to instantiate SparkSession with Hive support because Hive classes are not found.
The spark-hive classes are missing from the classpath; add the spark-hive dependency:
compile 'org.apache.spark:spark-hive_2.12:2.4.5'
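To avoid this class of error, the dependencies used in this article can be declared together. A sketch of the Gradle dependencies block, with the versions from above:

```groovy
dependencies {
    compile 'org.apache.spark:spark-sql_2.12:2.4.5'
    compile 'org.apache.spark:spark-hive_2.12:2.4.5'
    // only needed when Spark SQL connects to the metastore database directly
    compile 'mysql:mysql-connector-java:5.1.41'
}
```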
References:
http://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html