Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 175 in stage 4.0 failed 8 times, most recent failure: Lost task 175.7 in stage 4.0 (TID 421, bsa100): java.lang.RuntimeException: java.lang.RuntimeException: **Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient**
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:204)
at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:238)
at org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:218)
at org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:208)
at org.apache.spark.sql.hive.HiveContext.hiveconf$lzycompute(HiveContext.scala:552)
at org.apache.spark.sql.hive.HiveContext.hiveconf(HiveContext.scala:551)
at org.apache.spark.sql.hive.HiveContext.parseSql(HiveContext.scala:331)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817)
at com.xxx.main.ParserDataToHive$.Log2HiveWithDate(ParserDataToHive.scala:149)
at com.xxx.main.ParserDataToHive$$anonfun$main$5.apply(ParserDataToHive.scala:116)
at com.xxx.main.ParserDataToHive$$anonfun$main$5.apply(ParserDataToHive.scala:116)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$32.apply(RDD.scala:912)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$32.apply(RDD.scala:912)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1909)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1909)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
... 23 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521)
... 29 more
Caused by: javax.jdo.JDOFatalUserException: Class org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found.
NestedThrowables:
java.lang.ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory
at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1175)
at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:365)
at org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:394)
at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:291)
at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:258)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:57)
at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:66)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:593)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:571)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:624)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72)
at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:199)
at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
... 34 more
Caused by: java.lang.ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at javax.jdo.JDOHelper$18.run(JDOHelper.java:2018)
at javax.jdo.JDOHelper$18.run(JDOHelper.java:2016)
at java.security.AccessController.doPrivileged(Native Method)
at javax.jdo.JDOHelper.forName(JDOHelper.java:2015)
at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1162)
... 53 more
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1883)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1896)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1909)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1980)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:912)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:910)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.foreach(RDD.scala:910)
at com.xxx.main.ParserDataToHive$.main(ParserDataToHive.scala:116)
at com.xxx.main.ParserDataToHive.main(ParserDataToHive.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:204)
at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:238)
at org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:218)
at org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:208)
at org.apache.spark.sql.hive.HiveContext.hiveconf$lzycompute(HiveContext.scala:552)
at org.apache.spark.sql.hive.HiveContext.hiveconf(HiveContext.scala:551)
at org.apache.spark.sql.hive.HiveContext.parseSql(HiveContext.scala:331)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817)
at com.xxx.main.ParserDataToHive$.Log2HiveWithDate(ParserDataToHive.scala:149)
at com.xxx.main.ParserDataToHive$$anonfun$main$5.apply(ParserDataToHive.scala:116)
at com.xxx.main.ParserDataToHive$$anonfun$main$5.apply(ParserDataToHive.scala:116)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at
Source code:
// Each element is (report_date, number of rows for that date).
val reportDateAndNums = hiveContext.sql("select report_date, COUNT(*) as nums from table2 GROUP BY report_date")
  .map(r => Tuple2(r.getInt(0), r.getLong(1)))
reportDateAndNums.foreach(println)
reportDateAndNums.foreach(tup => {
  val detectDay: String = tup._1.toString
  val detectMonth: String = detectDay.substring(0, 6)
  println(detectDay, detectMonth)
  // hiveContext is called from inside the RDD closure here.
  val numsInHive = hiveContext.sql(s"select count(*) from ${databaseName}.${deviceFlowTableName} " +
    s"where day_id=${detectDay}").collect()(0).getLong(0)
  //...
})
Attempt 1:
Since hiveContext is called to run a query inside foreach, and the errors are all Hive-related, I first commented out the Hive-related code inside foreach.
val reportDateAndNums = hiveContext.sql("select report_date, COUNT(*) as nums from table2 GROUP BY report_date")
  .map(r => Tuple2(r.getInt(0), r.getLong(1)))
reportDateAndNums.foreach(println)
reportDateAndNums.foreach(tup => {
  val detectDay: String = tup._1.toString
  val detectMonth: String = detectDay.substring(0, 6)
  println(detectDay, detectMonth)
  // Hive query commented out for this attempt:
  //val numsInHive = hiveContext.sql(s"select count(*) from ${databaseName}.${deviceFlowTableName} " +
  //  s"where day_id=${detectDay}").collect()(0).getLong(0)
  //...
})
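Result: the log contained none of the output expected from the two foreach calls, so I suspected that the following two output statements were the problem.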
reportDateAndNums.foreach(println)
reportDateAndNums.foreach(tup => {
  //.....
  println(detectDay, detectMonth)
})
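reportDateAndNums turns out to be an RDD, so how do you correctly iterate over it and print its contents?
- If the data set is small
myRDD.collect().foreach(println)
- If the data runs to billions of rows, collect() is a poor choice because it pulls everything to the driver; print only a few elements instead
myRDD.take(n).foreach(println)
Reference: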
https://stackoverflow.com/questions/23173488/how-to-print-the-contents-of-rdd
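The two options above can be tried out with a minimal local sketch. It assumes the Spark 1.6-era API that appears in the stack trace; the object name `PrintRddSketch`, the helpers `allOnDriver` and `sampleOnDriver`, and the sample data are made up for illustration and are not part of the original job.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object PrintRddSketch {

  // Hypothetical helper: bring every element to the driver.
  // Only safe when the RDD is small.
  def allOnDriver[T](rdd: RDD[T]): Array[T] = rdd.collect()

  // Hypothetical helper: bring at most n elements to the driver.
  // Suitable for large RDDs.
  def sampleOnDriver[T](rdd: RDD[T], n: Int): Array[T] = rdd.take(n)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("print-rdd-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Made-up (report_date, nums) pairs standing in for the grouped query result.
      val myRDD = sc.parallelize(Seq((20180101, 3L), (20180102, 5L), (20180103, 8L)))

      // The closure runs inside executor tasks. On a cluster this prints to
      // the executors' stdout, not to the driver log.
      myRDD.foreach(println)

      // Both of these iterate over a local Array, so they print on the driver.
      allOnDriver(myRDD).foreach(println)
      sampleOnDriver(myRDD, 2).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

In local mode all three calls print to the same console, because the executor threads share the driver's JVM; the difference only becomes visible on a real cluster.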
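My reportDateAndNums here comes from a grouped query and is not large, so I simply use collect().foreach().
This led me to guess that hiveContext.sql kept failing inside foreach because the data had not first been fetched with collect(). RDD.foreach runs its closure on the executors, which matches the stack trace: the Hive client is being initialized under Executor$TaskRunner.run, where the DataNucleus classes are not found. After collect(), the loop runs over a local array on the driver instead.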
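A self-contained sketch of that driver-side pattern follows. To keep it runnable without a Hive metastore, a plain `SQLContext` with temp tables stands in for `hiveContext`; the table names, the sample rows, and the helper `countsPerDay` are assumptions made for illustration, and the code targets the Spark 1.6-era API seen in the stack trace (`registerTempTable` is deprecated in later versions).

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object DriverSideQuerySketch {

  // Hypothetical helper: returns (day, rows in summary table, rows in detail table).
  def countsPerDay(sqlContext: SQLContext,
                   summaryTable: String,
                   detailTable: String): Seq[(Int, Long, Long)] = {
    val reportDateAndNums = sqlContext
      .sql(s"select report_date, COUNT(*) as nums from $summaryTable GROUP BY report_date")
      .rdd
      .map(r => (r.getInt(0), r.getLong(1)))

    // collect() returns a plain local Array, so the loop below runs on the
    // driver, where the SQLContext lives. No context is shipped to executors.
    reportDateAndNums.collect().toSeq.map { case (detectDay, nums) =>
      val numsInDetail = sqlContext
        .sql(s"select count(*) from $detailTable where day_id=$detectDay")
        .collect()(0).getLong(0)
      (detectDay, nums, numsInDetail)
    }
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("driver-side-query-sketch").setMaster("local[2]"))
    try {
      val sqlContext = new SQLContext(sc)
      import sqlContext.implicits._

      // Made-up rows registered as temp tables, standing in for the Hive tables.
      sc.parallelize(Seq((20180101, "a"), (20180101, "b"), (20180102, "c")))
        .toDF("report_date", "payload").registerTempTable("table2")
      sc.parallelize(Seq((20180101, "x"), (20180102, "y"), (20180102, "z")))
        .toDF("day_id", "payload").registerTempTable("device_flow")

      countsPerDay(sqlContext, "table2", "device_flow").foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

This only works because the grouped result is small enough to collect; it issues one query per collected row, so it would not scale to a large number of days.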
Attempt 2:
val reportDateAndNums = hiveContext.sql("select report_date, COUNT(*) as nums from table2 GROUP BY report_date")
  .map(r => Tuple2(r.getInt(0), r.getLong(1)))
reportDateAndNums.collect().foreach(println)
// collect() brings the rows to the driver, so the loop body runs there.
reportDateAndNums.collect().foreach(tup => {
  val detectDay: String = tup._1.toString
  val detectMonth: String = detectDay.substring(0, 6)
  println(detectDay, detectMonth)
  val numsInHive = hiveContext.sql(s"select count(*) from ${databaseName}.${deviceFlowTableName} " +
    s"where day_id=${detectDay}").collect()(0).getLong(0)
  //...
})
**Result:** runs normally.
References:
https://stackoverflow.com/questions/23173488/how-to-print-the-contents-of-rdd
https://stackoverflow.com/questions/28804647/why-does-foreach-not-bring-anything-to-the-driver-program
https://stackoverflow.com/questions/41487346/scala-spark-dataframe-convert-rows-to-map-variable