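I ran a quick local test of Spark's LogisticRegressionWithSGD algorithm on a small data set, and the result was disappointing.
A sample of the data is shown below. The 0 or 1 before the vertical bar is the class label, and the two comma-separated values after it are the features. A record is labelled 1 if at least one of its two features is greater than or equal to 0.6; otherwise it is labelled 0.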
1|0.3,0.6
0|0.2,0.1
1|0.5,0.6
1|0.8,0.3
0|0.4,0.3
0|0.3,0.4
0|0.3,0.1
0|0.3,0.2
0|0.1,0.4
1|0.3,0.7
1|0.8,0.2
1|0.9,0.1
0|0.2,0.1
0|0.25,0.11
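The record format and the labelling rule can be checked without Spark. The following is only a minimal sketch in plain Scala: the names `SampleData`, `Sample`, `parse` and `expectedLabel` are made up for illustration and do not come from the original program.

```scala
object SampleData {
  // One labelled record: the class label (0.0 or 1.0) and its feature values.
  final case class Sample(label: Double, features: Vector[Double])

  // The 14 records from the sample above, one "label|f1,f2" string each.
  val lines: Vector[String] = Vector(
    "1|0.3,0.6", "0|0.2,0.1", "1|0.5,0.6", "1|0.8,0.3", "0|0.4,0.3",
    "0|0.3,0.4", "0|0.3,0.1", "0|0.3,0.2", "0|0.1,0.4", "1|0.3,0.7",
    "1|0.8,0.2", "1|0.9,0.1", "0|0.2,0.1", "0|0.25,0.11"
  )

  // Split on the vertical bar first, then split the features on commas.
  def parse(line: String): Sample = {
    val Array(label, feats) = line.trim.split("\\|")
    Sample(label.toDouble, feats.split(",").map(_.toDouble).toVector)
  }

  // The rule stated in the text: class 1 if any feature is >= 0.6, else class 0.
  def expectedLabel(features: Seq[Double]): Double =
    if (features.exists(_ >= 0.6)) 1.0 else 0.0
}
```

Running `SampleData.lines.map(SampleData.parse)` and comparing each label with `expectedLabel` is a cheap way to confirm that the sample really follows the stated rule before blaming the algorithm.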
The code is as follows:
    import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.{SparkConf, SparkContext}

    object TestLogisticsAlgorithm {
      def main(args: Array[String]): Unit = {
        val sparkConf = new SparkConf()
          .setMaster("local")
          .setAppName("test")
          .set("spark.testing.memory", "2147480000")
        val sparkContext = new SparkContext(sparkConf)

        // Each input line has the form "label|feature1,feature2".
        val trainData = sparkContext.textFile("file:///D:\\var\\11.txt")
        val modelData = trainData.map { line =>
          println(line)
          val tmpData = line.split("\\|")
          // Label before the bar, dense feature vector after it.
          LabeledPoint(tmpData(0).toDouble, Vectors.dense(tmpData(1).split(",").map(_.toDouble)))
        }.cache()

        // Train for 200 iterations.
        val model = LogisticRegressionWithSGD.train(modelData, 200)

        // Both features are well below 0.6, so the expected class is 0.
        val predictData = Vectors.dense(0.01, 0.1)
        val result = model.predict(predictData)
        println(result)
      }
    }
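The program prints 1 for the test point (0.01, 0.1), whereas the expected result is 0 (see the screenshot of the output).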
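One possible explanation, which the original post does not confirm: to my knowledge, `LogisticRegressionWithSGD.train` fits a model without an intercept term unless one is requested. Such a model predicts 1 exactly when the weighted sum of the features is positive, so its decision boundary must pass through the origin. Since every feature in this sample is positive, that may simply be too restrictive. The sketch below is plain Scala with no Spark involved and does no training. It only checks whether any through-origin boundary (360 weight directions are tried) can classify all 14 records, and compares that with a hand-picked boundary that has an intercept. The names `InterceptCheck`, `predict` and `correct`, and the weights `(1.0, 1.0, -0.8)`, are assumptions chosen for illustration.

```scala
object InterceptCheck {
  // (label, x1, x2) for the 14 records in the sample data.
  val samples: Vector[(Double, Double, Double)] = Vector(
    (1.0, 0.3, 0.6), (0.0, 0.2, 0.1), (1.0, 0.5, 0.6), (1.0, 0.8, 0.3),
    (0.0, 0.4, 0.3), (0.0, 0.3, 0.4), (0.0, 0.3, 0.1), (0.0, 0.3, 0.2),
    (0.0, 0.1, 0.4), (1.0, 0.3, 0.7), (1.0, 0.8, 0.2), (1.0, 0.9, 0.1),
    (0.0, 0.2, 0.1), (0.0, 0.25, 0.11)
  )

  // Thresholding the logistic output at 0.5 is equivalent to
  // testing the sign of the linear score w1*x1 + w2*x2 + b.
  def predict(w1: Double, w2: Double, b: Double, x1: Double, x2: Double): Double =
    if (w1 * x1 + w2 * x2 + b > 0.0) 1.0 else 0.0

  // Number of sample records classified correctly by the given weights.
  def correct(w1: Double, w2: Double, b: Double): Int =
    samples.count { case (y, x1, x2) => predict(w1, w2, b, x1, x2) == y }

  // With b = 0 only the direction of (w1, w2) matters, so scan 360 directions
  // and keep the best count of correctly classified records.
  val bestWithoutIntercept: Int =
    (0 until 360).map { deg =>
      val t = math.toRadians(deg.toDouble)
      correct(math.cos(t), math.sin(t), 0.0)
    }.max

  // A hand-picked (not trained) boundary x1 + x2 = 0.8 that uses an intercept.
  val withIntercept: Int = correct(1.0, 1.0, -0.8)
}
```

In this sketch no through-origin direction gets all 14 records right, while the boundary with an intercept does, and it also puts (0.01, 0.1) in class 0. If the Spark version in use exposes the public constructor, training with `new LogisticRegressionWithSGD().setIntercept(true)` instead of the static `train` method might therefore be worth trying. This is a guess at the cause, not a verified fix, and a trained model will not necessarily find the hand-picked boundary used here.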