Step 1: Download the MySQL connector JAR from Maven.
Step 2: Start the shell with the JAR on the classpath: spark2-shell --jars mysql-connector-java-8.0.15.jar
Step 3:
// Scala version
val df = spark.read.format("jdbc")
  .option("url", "jdbc:mysql://rr-bp1d22ltxgwa09g44720.mysql.rds.aliyuncs.com/" + dbname + "?useUnicode=true&characterEncoding=UTF-8")
  .option("driver", "com.mysql.jdbc.Driver")
  .option("fetchsize", 1000)       // rows fetched per round trip
  .option("numPartitions", 2)      // parallel read partitions
  .option("dbtable", "(select * from " + tablename + ") as t")
  .option("user", "username")
  .option("password", "password")
  .load()
df.write.mode("overwrite").saveAsTable("hive_table_name")  // name of the Hive table to write to
If you have many tables to sync, just wrap the code above in a function and loop over the table names!
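The wrap-and-loop idea can be sketched like this. Note the host, database, user, and table names below are placeholders, not the real ones; the actual Spark calls are shown in comments since they only run inside spark2-shell:

```scala
// A minimal sketch of wrapping the JDBC options into a reusable function.
// The host and table names here are hypothetical placeholders.
def jdbcOptions(dbname: String, tablename: String,
                user: String, password: String): Map[String, String] = Map(
  "url"           -> s"jdbc:mysql://your-rds-host.mysql.rds.aliyuncs.com/$dbname?useUnicode=true&characterEncoding=UTF-8",
  "driver"        -> "com.mysql.jdbc.Driver",
  "fetchsize"     -> "1000",
  "numPartitions" -> "2",
  "dbtable"       -> s"(select * from $tablename) as t",
  "user"          -> user,
  "password"      -> password
)

// In spark2-shell, each loop iteration would do:
//   spark.read.format("jdbc").options(jdbcOptions(db, t, user, pass)).load()
//     .write.mode("overwrite").saveAsTable(t)
val tables = Seq("loan_detail", "repay_plan")  // hypothetical table names
tables.foreach { t =>
  val opts = jdbcOptions("mydb", t, "user", "pass")
  println(opts("dbtable"))
}
```

One function, one loop; adding a table to the sync is then a one-line change to the list.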
Hive 1.1 does not support the Date data type, so when you run into it, convert the Date columns to String first. I used the bluntest approach: building the SQL to do the conversion.
// Scala version
var columns = df.columns.toBuffer
val dateTypeColumns = Array("last_biz_date", "final_repayment_day", "principal_settled_day", "value_date")
columns --= dateTypeColumns  // drop the Date columns; they get CAST below
val temp = "CAST(last_biz_date AS STRING), CAST(final_repayment_day AS STRING), CAST(principal_settled_day AS STRING), CAST(value_date AS STRING)"
val temp2 = temp + ',' + columns.mkString(",")

// Register the DataFrame as a temp view so spark.sql can query it as "df"
df.createOrReplaceTempView("df")

def get_columns(x: String) = spark.sql(s"select $x from df")

get_columns(temp2).write.mode("overwrite").saveAsTable("hive_table_name")
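Instead of hard-coding the CAST list, the expressions can be generated from the column list; a small sketch (the helper name is mine, the date columns are the ones from the snippet above):

```scala
// Build selectExpr-style expressions: Date columns get CAST ... AS STRING,
// everything else passes through unchanged.
def castDatesToString(cols: Seq[String], dateCols: Set[String]): Seq[String] =
  cols.map { c =>
    if (dateCols.contains(c)) s"CAST($c AS STRING) AS $c" else c
  }

val dateCols = Set("last_biz_date", "final_repayment_day",
                   "principal_settled_day", "value_date")

// In spark2-shell you would then write:
//   df.selectExpr(castDatesToString(df.columns, dateCols): _*)
//     .write.mode("overwrite").saveAsTable("hive_table_name")
```

This keeps the column order intact and avoids rebuilding the SQL string whenever the table schema changes.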