Apache Sedona: handling polygons whose rings are not closed

Data source, data/geo.csv:
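
A hypothetical illustration of what such a file could look like (the id column is assumed; the polygon column holds quoted WKT, and the ring in the last row is deliberately left unclosed):

id,polygon
1,"POLYGON ((1 1, 2 2, 2 1, 1 1))"
2,"POLYGON ((0 0, 0 2, 2 2, 2 0))"

In row 2 the ring never returns to its starting point (0 0), which is the "not closed" case the code comments below refer to.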



Code:

import org.apache.sedona.sql.utils.SedonaSQLRegistrator;
import org.apache.spark.SparkConf;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

		// Create the SparkConf object
		SparkConf conf = new SparkConf().setAppName("GeoSparkExample").setMaster("local[*]");

		// Create the SparkSession
		SparkSession spark = SparkSession.builder().config(conf).getOrCreate();

		// Register the SedonaSQL functions
		SedonaSQLRegistrator.registerAll(spark);

		Dataset<Row> rawDf = spark.read().format("csv").option("header", "true") // use the first row as column names
				.option("inferSchema", "true") // infer each column's data type
				.option("delimiter", ",") // column delimiter; comma is the default
				// the polygon in the last row is not closed
				.load("data/geo.csv"); // file location

		rawDf.createOrReplaceTempView("rawdf");
		rawDf.show();
		System.out.println("======================================");

		spark.sql("DESCRIBE rawdf").show();

		spark.sql("DESCRIBE rawdf polygon").show();

		// Convert the string column to Sedona's Geometry type
		Dataset<Row> result = spark.sql("SELECT ST_GeomFromWKT(regexp_replace(polygon, '\"', '')) AS geo FROM rawdf");

		result.createOrReplaceTempView("test");
		// Show the query result
		result.show();
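
For reference, the regexp_replace in the query above only strips stray double quotes around the WKT before it reaches ST_GeomFromWKT; a minimal illustration with a literal (hypothetical) value:

		// '"POLYGON ((1 1, 2 2, 2 1, 1 1))"'  ->  'POLYGON ((1 1, 2 2, 2 1, 1 1))'
		spark.sql("SELECT regexp_replace('\"POLYGON ((1 1, 2 2, 2 1, 1 1))\"', '\"', '') AS wkt").show(false);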
		
		


Running result.show() fails with the following error:

23/07/21 15:57:02 ERROR FormatUtils: [Sedona] Points of LinearRing do not form a closed linestring
23/07/21 15:57:02 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 4)
java.lang.NullPointerException
	at org.apache.sedona.sql.utils.GeometrySerializer$.getDimension(GeometrySerializer.scala:53)
	at org.apache.sedona.sql.utils.GeometrySerializer$.serialize(GeometrySerializer.scala:37)
	at org.apache.spark.sql.sedona_sql.expressions.implicits$GeometryEnhancer.toGenericArrayData(implicits.scala:79)
	at org.apache.spark.sql.sedona_sql.expressions.ST_GeomFromWKT.eval(Constructors.scala:182)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
	at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:256)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:858)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:858)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:411)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)		
	
	
Based on my analysis: the exception is caused by the ring of the source polygon not being closed (its first and last points are not the same). The stack trace bears this out: Sedona's WKT reader logs "Points of LinearRing do not form a closed linestring", the geometry it hands back is null, and serializing that null is what throws the NullPointerException. So how can rows like this be filtered out, for example with ST_IsValid, so that the program does not fail?
	
Answer: filter on ST_IsClosed, e.g. ST_IsClosed(ST_GeomFromText(t.pg1)):
https://sedona.apache.org/1.3.1-incubating/api/flink/Function/#st_isclosed
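
One caveat before the sketch: as the stack trace shows, ST_GeomFromText / ST_GeomFromWKT already fails while parsing a POLYGON whose ring is unclosed, so the ST_IsClosed test has to run on a geometry that actually parses. What follows is therefore only a minimal sketch under two assumptions: the ring is available as LINESTRING WKT (in a hypothetical ring_wkt column), and the Sedona build in use exposes ST_IsClosed in SedonaSQL as well (the link above is the Flink API page).

		// Demonstrate ST_IsClosed on literal rings (values are illustrative):
		spark.sql("SELECT ST_IsClosed(ST_GeomFromText('LINESTRING (0 0, 1 1, 1 0, 0 0)')) AS closed_ring, "
				+ "ST_IsClosed(ST_GeomFromText('LINESTRING (0 0, 1 1, 1 0)')) AS open_ring").show();
		// closed_ring = true, open_ring = false

		// Keep only closed rings before doing anything downstream;
		// ring_wkt is a hypothetical column holding the ring as LINESTRING WKT:
		Dataset<Row> valid = spark.sql("SELECT t.geo FROM ("
				+ "SELECT ST_GeomFromText(regexp_replace(ring_wkt, '\"', '')) AS geo FROM rawdf"
				+ ") t WHERE ST_IsClosed(t.geo)");
		valid.show();

ST_IsValid, which the question asks about, checks the OGC validity of an already-constructed geometry (self-intersections and the like), so it cannot rescue a row whose WKT never parses in the first place.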

 
