Problem:
Spark Streaming processes real-time data and writes the aggregated results to MongoDB. With the mongo-java API, every write needs an extra round trip: query by a given dimension, update the metrics if a matching record exists, otherwise insert the dimension and metric fields. This check-then-write pattern is slow and inefficient.
Switching to the mongo-scala (Casbah) API, the insert-or-update can be done in a single call via its upsert flag; the fields used in the query should have an index in MongoDB.
/**
 * Performs an update operation.
 * @param q search query for old object to update
 * @param o object with which to update `q`
 */
def update[A, B](q: A, o: B, upsert: Boolean = false, multi: Boolean = false,
                 concern: com.mongodb.WriteConcern = this.writeConcern,
                 bypassDocumentValidation: Option[Boolean] = None)
                (implicit queryView: A => DBObject, objView: B => DBObject,
                 encoder: DBEncoder = customEncoderFactory.map(_.create).orNull): WriteResult = {
  bypassDocumentValidation match {
    case None => underlying.update(queryView(q), objView(o), upsert, multi, concern, encoder)
    case Some(bypassValidation) => underlying.update(queryView(q), objView(o), upsert, multi, concern, bypassValidation, encoder)
  }
}
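A minimal sketch of calling this API from a Spark Streaming job. The host, database, collection, and field names (stats_db, metrics, dim, count) are hypothetical placeholders, and a running mongod is assumed; adjust them to your deployment. The index on the query field is what keeps each upsert a point lookup rather than a collection scan.

```scala
import com.mongodb.casbah.Imports._

object MongoUpsertSink {
  // Hypothetical connection details; replace with your own host/db/collection.
  lazy val collection: MongoCollection =
    MongoClient("localhost", 27017)("stats_db")("metrics")

  // Index the dimension field so each upsert query is a point lookup.
  collection.createIndex(MongoDBObject("dim" -> 1))

  def upsertCount(dim: String, delta: Long): Unit = {
    val query = MongoDBObject("dim" -> dim)
    // $inc bumps the metric when the dimension already exists;
    // with upsert = true a new document is inserted otherwise.
    collection.update(query, $inc("count" -> delta), upsert = true, multi = false)
  }
}

// Typical use inside the streaming job, per partition to reuse the connection:
// counts.foreachRDD { rdd =>
//   rdd.foreachPartition { iter =>
//     iter.foreach { case (dim, cnt) => MongoUpsertSink.upsertCount(dim, cnt) }
//   }
// }
```

Using `$inc` (rather than replacing the whole document) also makes the write idempotent per batch delta and avoids the read-modify-write race the mongo-java approach had.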
Add dependencies:
<dependency>
    <groupId>org.mongodb</groupId>
    <artifactId>casbah_2.11</artifactId>
    <version>3.1.1</version>
    <type>pom</type>
</dependency>
<dependency>
    <groupId>org.mongodb.spark</groupId>
    <artifactId>mongo-spark-connector_2.11</artifactId>
    <version>${spark.version}</version>
</dependency>
Note: remove any direct mongo-java-driver dependency from the pom, otherwise it conflicts with the driver version Casbah pulls in.
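If some other dependency pulls in mongo-java-driver transitively, it can be excluded at that dependency instead of removed outright. The groupId/artifactId of the offending dependency below are illustrative; find the real culprit with `mvn dependency:tree`:

```xml
<dependency>
    <!-- illustrative placeholder: the dependency that drags in the Java driver -->
    <groupId>some.group</groupId>
    <artifactId>some-artifact</artifactId>
    <version>1.0</version>
    <exclusions>
        <exclusion>
            <groupId>org.mongodb</groupId>
            <artifactId>mongo-java-driver</artifactId>
        </exclusion>
    </exclusions>
</dependency>
```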