Spark shell: RDD-to-DataFrame conversion fails with "Relative path in absolute URI" on Win7

While idly playing with the Spark shell, testing reads and writes of Parquet-format data, I found that running personRDD.toDF threw the following error:

org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
        at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1236)
        at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174)
        at org.apache.hadoop.hive.ql.metadata.Hive.<clinit>(Hive.java:166)
        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
        at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:171)
        ......

Digging further, it turned out to be caused by the following underlying error:

Caused by: java.net.URISyntaxException: Relative path in absolute URI: file:C:/Users/Administrator/spark-warehouse
  at java.net.URI.checkPath(URI.java:1823)
  at java.net.URI.<init>(URI.java:745)
  at org.apache.hadoop.fs.Path.initialize(Path.java:203)
  ... 96 more

A look at the source code (org.apache.hadoop.fs.Path.initialize):

  private void initialize(String scheme, String authority, String path,
      String fragment) {
    try {
      this.uri = new URI(scheme, authority, normalizePath(scheme, path), null, fragment)
        .normalize();
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException(e);
    }
  }
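The failure can be reproduced in isolation, without Hadoop or Spark: java.net.URI rejects a scheme-qualified ("absolute") URI whose path component does not begin with "/", which is exactly what a bare Windows path like C:/Users/... looks like.

```scala
import java.net.{URI, URISyntaxException}

// Reproduce Hadoop Path's failure in isolation: with a scheme ("file")
// present, java.net.URI requires the path to start with "/", so the
// Windows-style path "C:/Users/..." is rejected as a relative path.
val failed =
  try {
    new URI("file", null, "C:/Users/Administrator/spark-warehouse", null, null)
    false
  } catch {
    case _: URISyntaxException => true
  }

println(s"URISyntaxException thrown: $failed")  // prints "URISyntaxException thrown: true"
```

This is why the exception message reads "Relative path in absolute URI: file:C:/Users/Administrator/spark-warehouse" — the URI is "absolute" because it has a scheme, but its path is "relative" because it lacks a leading slash.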

After a few minutes of fiddling, I found the cause: when hive-site.xml is read, the substitution of ${system:java.io.tmpdir} and ${system:user.name} goes wrong, so absolute paths need to be configured explicitly in hive-site.xml (with the directories created beforehand):

  <property>
    <name>system:java.io.tmpdir</name>
    <value>C:/Users</value>
    <description/>
  </property>
  <property>
    <name>system:user.name</name>
    <value>Administrator</value>
    <description/>
  </property>
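An alternative workaround I did not test here, but which targets the same malformed warehouse URI directly, is to hand Spark a well-formed file: URI for the warehouse directory when launching the shell (the path below is only an example):

```shell
# Assumed workaround: supply spark.sql.warehouse.dir as a proper file:/// URI
# up front, so Hadoop's Path never sees the malformed "file:C:/..." form.
spark-shell --conf spark.sql.warehouse.dir=file:///C:/tmp/spark-warehouse
```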

Running personRDD.toDF again, problem solved :)


Here is the full test session in the shell:

scala> case class Person(firstName: String, lastName: String, age:Int)
scala> val personRDD = sc.textFile("hdfs://localhost:9000/person").map(line => line.split(",")).map(p => Person(p(0),p(1),p(2).toInt))
scala> val personDF = personRDD.toDF
scala> personDF.registerTempTable("person")
scala> val people = sql("select * from person")
scala> people.collect.foreach(println)
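The per-line parsing inside the map above can be sanity-checked in plain Scala, independent of Spark (the helper name parseLine is just for illustration):

```scala
// Mirror the RDD's per-line parsing: split each comma-separated line
// and build a Person — no Spark needed to check the logic itself.
case class Person(firstName: String, lastName: String, age: Int)

def parseLine(line: String): Person = {
  val p = line.split(",")
  Person(p(0), p(1), p(2).toInt)
}

println(parseLine("Barack,Obama,53"))  // prints "Person(Barack,Obama,53)"
```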


The test file in my HDFS is person.txt; its contents are simple:

C:\Users\Administrator>hdfs dfs -cat /person/person.txt
Barack,Obama,53
George,Bush,68
Bill,Clinton,68

And the output of scala> people.collect.foreach(println) is:

scala> people.collect.foreach(println)
[Stage 2:> (0 + 0) / 2]

[Barack,Obama,53]
[George,Bush,68]
[Bill,Clinton,68]

