Resolving "Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient"

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 175 in stage 4.0 failed 8 times, most recent failure: Lost task 175.7 in stage 4.0 (TID 421, bsa100): java.lang.RuntimeException: java.lang.RuntimeException: **Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient**
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
	at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:204)
	at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:238)
	at org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:218)
	at org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:208)
	at org.apache.spark.sql.hive.HiveContext.hiveconf$lzycompute(HiveContext.scala:552)
	at org.apache.spark.sql.hive.HiveContext.hiveconf(HiveContext.scala:551)
	at org.apache.spark.sql.hive.HiveContext.parseSql(HiveContext.scala:331)
	at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817)
	at com.xxx.main.ParserDataToHive$.Log2HiveWithDate(ParserDataToHive.scala:149)
	at com.xxx.main.ParserDataToHive$$anonfun$main$5.apply(ParserDataToHive.scala:116)
	at com.nsfocus.bsa.iot.main.ParserDataToHive$$anonfun$main$5.apply(ParserDataToHive.scala:116)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$32.apply(RDD.scala:912)
	at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$32.apply(RDD.scala:912)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1909)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1909)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
	at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
	at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005)
	at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024)
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
	... 23 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521)
	... 29 more
Caused by: javax.jdo.JDOFatalUserException: Class org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found.
NestedThrowables:
java.lang.ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory
	at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1175)
	at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
	at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
	at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:365)
	at org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:394)
	at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:291)
	at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:258)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:57)
	at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:66)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:593)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:571)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:624)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72)
	at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:199)
	at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
	... 34 more
Caused by: java.lang.ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at javax.jdo.JDOHelper$18.run(JDOHelper.java:2018)
	at javax.jdo.JDOHelper$18.run(JDOHelper.java:2016)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.jdo.JDOHelper.forName(JDOHelper.java:2015)
	at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1162)
	... 53 more

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
	at scala.Option.foreach(Option.scala:236)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1883)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1896)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1909)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1980)
	at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:912)
	at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:910)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
	at org.apache.spark.rdd.RDD.foreach(RDD.scala:910)
	at com.xxx.main.ParserDataToHive$.main(ParserDataToHive.scala:116)
	at com.xxx.main.ParserDataToHive.main(ParserDataToHive.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
	at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:204)
	at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:238)
	at org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:218)
	at org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:208)
	at org.apache.spark.sql.hive.HiveContext.hiveconf$lzycompute(HiveContext.scala:552)
	at org.apache.spark.sql.hive.HiveContext.hiveconf(HiveContext.scala:551)
	at org.apache.spark.sql.hive.HiveContext.parseSql(HiveContext.scala:331)
	at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817)
	at com.xxx.main.ParserDataToHive$.Log2HiveWithDate(ParserDataToHive.scala:149)
	at com.xxx.main.ParserDataToHive$$anonfun$main$5.apply(ParserDataToHive.scala:116)
	at com.xxx.main.ParserDataToHive$$anonfun$main$5.apply(ParserDataToHive.scala:116)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at 

Source code:

    val reportDateAndNums = hiveContext.sql("select report_date, COUNT(*) as nums from table2 GROUP BY report_date")
      .map(r => Tuple2(r.getInt(0), r.getLong(1)))
    reportDateAndNums.foreach(println)

    reportDateAndNums.foreach(tup => {
      val detectDay: String = tup._1.toString
      val detectMonth: String = detectDay.substring(0, 6)
      println(detectDay, detectMonth)

      // Hive query issued from inside RDD.foreach
      val numsInHive = hiveContext.sql(s"select count(*) from ${databaseName}.${deviceFlowTableName} " +
        s"where day_id=${detectDay}").collect()(0).getLong(0)

      //...
    })

Attempt 1:

The code inside foreach runs queries through hiveContext, and every error was Hive-related, so I first commented out the Hive code inside foreach.

    val reportDateAndNums = hiveContext.sql("select report_date, COUNT(*) as nums from table2 GROUP BY report_date")
      .map(r => Tuple2(r.getInt(0), r.getLong(1)))
    reportDateAndNums.foreach(println)

    reportDateAndNums.foreach(tup => {
      val detectDay: String = tup._1.toString
      val detectMonth: String = detectDay.substring(0, 6)
      println(detectDay, detectMonth)

      // Hive query temporarily disabled
      //val numsInHive = hiveContext.sql(s"select count(*) from ${databaseName}.${deviceFlowTableName} " +
      //  s"where day_id=${detectDay}").collect()(0).getLong(0)

      //...
    })

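Result: the log contained none of the output expected from the two foreach calls, so I suspected that the following two printing statements were the problem.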

    reportDateAndNums.foreach(println)
    reportDateAndNums.foreach(tup => {
      //.....
      println(detectDay, detectMonth)
    })

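reportDateAndNums turns out to be an RDD. The function passed to RDD.foreach runs on the executors, so on a cluster its println output goes to the executors' stdout rather than to the driver log. How, then, do you correctly iterate over an RDD and print its contents?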

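  • If the data set is small, collect it to the driver first:
myRDD.collect().foreach(println)
  • If the data set runs to a billion rows or more, collecting everything is impractical; print only a small sample instead:
myRDD.take(n).foreach(println)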

Reference:
https://stackoverflow.com/questions/23173488/how-to-print-the-contents-of-rdd
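
The two options above can be tried locally with a small sketch like the following. The object name `PrintRddSketch`, the helper `collectAndTake`, the `local[2]` master, and the sample (report_date, nums) pairs are all made up for illustration and are not part of the original job.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PrintRddSketch {
  // Returns (all elements, first two elements), both materialized on the driver.
  def collectAndTake(sc: SparkContext): (Array[(Int, Long)], Array[(Int, Long)]) = {
    // Hypothetical stand-in for reportDateAndNums.
    val rdd = sc.parallelize(Seq((20180101, 10L), (20180102, 20L), (20180103, 30L)))
    val all = rdd.collect()     // every element; only safe for small RDDs
    val firstTwo = rdd.take(2)  // at most two elements; fine for large RDDs
    (all, firstTwo)
  }

  def main(args: Array[String]): Unit = {
    // local[2] is an assumption so the sketch can run without a cluster.
    val conf = new SparkConf().setAppName("print-rdd-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val (all, firstTwo) = collectAndTake(sc)
      // Both arrays are plain Scala arrays, so println runs on the driver.
      all.foreach(println)
      firstTwo.foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Because `collect()` and `take(n)` return ordinary arrays, the `foreach(println)` that follows is a plain Scala loop on the driver, and its output appears in the driver log.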

My reportDateAndNums comes from a GROUP BY query and is not large, so I simply use collect().foreach().

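This led me to suspect that hiveContext.sql kept failing inside foreach because the data had not been collected to the driver first. The stack trace is consistent with this: the failing HiveContext.sql call sits beneath Executor$TaskRunner.run, so the closure passed to RDD.foreach was running inside an executor task, where a Hive client was being initialized and the DataNucleus class could not be found.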
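
One way to check where a closure passed to an RDD operation actually runs is to record the thread name inside it and compare it with the driver's thread. The sketch below does this in local mode; the object name `WhereDoesItRunSketch`, the helper `threadNames`, and the `local[2]` master are assumptions for illustration. On a real cluster the task threads live in separate executor JVMs, which is where the Hive client in the stack trace was being created.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WhereDoesItRunSketch {
  // Returns (driver thread name, thread names observed inside the RDD closure).
  def threadNames(sc: SparkContext): (String, Set[String]) = {
    val driverThread = Thread.currentThread().getName
    val taskThreads = sc.parallelize(1 to 4, 2)
      .map(_ => Thread.currentThread().getName) // evaluated inside a Spark task
      .collect()                                // results brought back to the driver
      .toSet
    (driverThread, taskThreads)
  }

  def main(args: Array[String]): Unit = {
    // local[2] is an assumption so the sketch can run without a cluster.
    val conf = new SparkConf().setAppName("where-does-it-run").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val (driver, tasks) = threadNames(sc)
      println(s"driver thread: $driver")
      println(s"task threads:  ${tasks.mkString(", ")}")
    } finally {
      sc.stop()
    }
  }
}
```

The same reasoning applies to RDD.foreach: its body is shipped to tasks, so anything it touches, including a HiveContext, has to work on the executor side.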

Attempt 2:

    val reportDateAndNums = hiveContext.sql("select report_date, COUNT(*) as nums from table2 GROUP BY report_date")
      .map(r => Tuple2(r.getInt(0), r.getLong(1)))
    reportDateAndNums.collect().foreach(println)

    // collect() returns a local array, so this loop runs on the driver
    reportDateAndNums.collect().foreach(tup => {
      val detectDay: String = tup._1.toString
      val detectMonth: String = detectDay.substring(0, 6)
      println(detectDay, detectMonth)

      val numsInHive = hiveContext.sql(s"select count(*) from ${databaseName}.${deviceFlowTableName} " +
        s"where day_id=${detectDay}").collect()(0).getLong(0)

      //...
    })

**Result:** the job runs normally.

References:
https://stackoverflow.com/questions/23173488/how-to-print-the-contents-of-rdd
https://stackoverflow.com/questions/28804647/why-does-foreach-not-bring-anything-to-the-driver-program
https://stackoverflow.com/questions/41487346/scala-spark-dataframe-convert-rows-to-map-variable
