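Out of boredom I played around with the Spark shell to test reading and writing Parquet files, and found that running personRDD.toDF failed with the following error: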
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1236)
at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174)
at org.apache.hadoop.hive.ql.metadata.Hive.<clinit>(Hive.java:166)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:171)
......
It turned out to be caused by the following error:
Caused by: java.net.URISyntaxException: Relative path in absolute URI: file:C:/Users/Administrator/spark-warehouse
at java.net.URI.checkPath(URI.java:1823)
at java.net.URI.<init>(URI.java:745)
at org.apache.hadoop.fs.Path.initialize(Path.java:203)
... 96 more
I took a look at the source of org.apache.hadoop.fs.Path.initialize:
private void initialize(String scheme, String authority, String path,
    String fragment) {
  try {
    this.uri = new URI(scheme, authority, normalizePath(scheme, path), null, fragment)
      .normalize();
  } catch (URISyntaxException e) {
    throw new IllegalArgumentException(e);
  }
}
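To see why this constructor rejects the warehouse path, here is a minimal standalone sketch that calls the same five-argument java.net.URI constructor outside of Hadoop. The class name RelativePathDemo and the helper tryBuild are made up for illustration, and the leading-slash variant is only shown for contrast; it is not the fix used in this post.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class RelativePathDemo {
    // Builds a "file" URI from the given path with the five-argument
    // java.net.URI constructor and returns either the normalized URI text
    // or the message of the URISyntaxException that was thrown.
    static String tryBuild(String path) {
        try {
            return new URI("file", null, path, null, null).normalize().toString();
        } catch (URISyntaxException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // A scheme is present but the path does not start with '/',
        // so java.net.URI treats it as a relative path and rejects it.
        System.out.println(tryBuild("C:/Users/Administrator/spark-warehouse"));
        // With a leading '/', the path is absolute and the URI is accepted.
        System.out.println(tryBuild("/C:/Users/Administrator/spark-warehouse"));
    }
}
```

The first call reproduces the "Relative path in absolute URI" message from the stack trace, which suggests the warehouse location reached Path.initialize as file:C:/... without a leading slash.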
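After fiddling with it for a few minutes, I found that the problem lies in how ${system:java.io.tmpdir} and ${system:user.name} are substituted when hive-site.xml is read. The fix is to configure absolute paths for them in hive-site.xml (and to create the directories beforehand):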
<property>
  <name>system:java.io.tmpdir</name>
  <value>C:/Users</value>
  <description/>
</property>
<property>
  <name>system:user.name</name>
  <value>Administrator</value>
  <description/>
</property>
After re-running personRDD.toDF, the problem was solved :)
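As a rough aid for the "create the directories beforehand" step, the sketch below prints the JVM's own java.io.tmpdir and user.name values and creates a directory as an absolute path. It assumes the ${system:...} placeholders correspond to the JVM system properties of the same names. The class WarehouseDirCheck and the helper ensureDir are hypothetical, and which directory Hive or Spark actually needs depends on your hive-site.xml.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class WarehouseDirCheck {
    // Resolves <baseDir>/<userName> to an absolute, normalized path and
    // creates it (including missing parents) if it does not exist yet.
    static Path ensureDir(String baseDir, String userName) throws IOException {
        Path dir = Paths.get(baseDir, userName).toAbsolutePath().normalize();
        return Files.createDirectories(dir);
    }

    public static void main(String[] args) throws IOException {
        // Values the JVM reports for the two properties named in hive-site.xml.
        String tmpDir = System.getProperty("java.io.tmpdir");
        String userName = System.getProperty("user.name");
        System.out.println("java.io.tmpdir = " + tmpDir);
        System.out.println("user.name      = " + userName);

        // Illustration only: create <tmpdir>/<user> and confirm it is absolute.
        Path dir = ensureDir(tmpDir, userName);
        System.out.println(dir + " absolute? " + dir.isAbsolute());
    }
}
```

For the configuration in this post, the equivalent call would be ensureDir("C:/Users", "Administrator").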
Here are the exact commands I used for the test:
scala> case class Person(firstName: String, lastName: String, age:Int)
scala> val personRDD = sc.textFile("hdfs://localhost:9000/person").map(line => line.split(",")).map(p => Person(p(0),p(1),p(2).toInt))
scala> val personDF = personRDD.toDF
scala> personDF.registerTempTable("person")
scala> val people = sql("select * from person")
scala> people.collect.foreach(println)
The test file in my HDFS is person.txt, and its contents are quite simple:
C:\Users\Administrator>hdfs dfs -cat /person/person.txt
Barack,Obama,53
George,Bush,68
Bill,Clinton,68
The output of people.collect.foreach(println) is:
scala> people.collect.foreach(println)
[Stage 2:> (0 + 0) / 2]
[Barack,Obama,53]
[George,Bush,68]
[Bill,Clinton,68]
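For readers less familiar with the Scala one-liner, the two map steps in the transcript (split each line on commas, then build a Person with the third field converted to an integer) can be sketched in plain Java without Spark. The class PersonParseSketch and its parse method are illustrative names, not part of the original session.

```java
import java.util.Arrays;
import java.util.List;

public class PersonParseSketch {
    // Plain-Java stand-in for the Scala case class Person.
    static final class Person {
        final String firstName;
        final String lastName;
        final int age;

        Person(String firstName, String lastName, int age) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.age = age;
        }

        @Override
        public String toString() {
            return "Person(" + firstName + "," + lastName + "," + age + ")";
        }
    }

    // Mirrors line.split(",") followed by Person(p(0), p(1), p(2).toInt).
    static Person parse(String line) {
        String[] p = line.split(",");
        return new Person(p[0], p[1], Integer.parseInt(p[2]));
    }

    public static void main(String[] args) {
        // The three lines of person.txt shown above.
        List<String> lines = Arrays.asList("Barack,Obama,53", "George,Bush,68", "Bill,Clinton,68");
        for (String line : lines) {
            System.out.println(parse(line));
        }
    }
}
```

Like the Scala version, this sketch fails on a line with fewer than three fields or a non-numeric age, so real input may need validation first.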