Running a Spark ML program with spark-submit

Create a file named test1.py with the following contents:

from pyspark.ml.linalg import Vectors
from pyspark.ml.classification import LogisticRegression
from pyspark import SparkContext, SparkConf, SQLContext
# NOTE: on Spark 2.0+ a SparkSession could serve as the entry point instead, e.g.
# spark = SparkSession.builder.appName("Spark Hive Example").master("local[*]").config("spark.sql.warehouse.dir", "file:${system:user.dir}/spark-warehouse").enableHiveSupport().getOrCreate()

# The contexts below must be created explicitly when the script is run
# through spark-submit (the pyspark shell creates them for you).
conf = SparkConf().setAppName("MyFirstApp").set('spark.port.maxRetries', '100')
sc = SparkContext(conf=conf)
spark = SQLContext(sc)

# Prepare training data from a list of (label, features) tuples.
training = spark.createDataFrame([
    (1.0, Vectors.dense([0.0, 1.1, 0.1])),
    (0.0, Vectors.dense([2.0, 1.0, -1.0])),
    (0.0, Vectors.dense([2.0, 1.3, 1.0])),
    (1.0, Vectors.dense([0.0, 1.2, -0.5]))], ["label", "features"])

# Create a LogisticRegression instance. This instance is an Estimator.  
lr = LogisticRegression(maxIter=10, regParam=0.01)
# Print out the parameters, documentation, and any default values.  
print "LogisticRegression parameters:\n" + lr.explainParams() + "\n"
model1 = lr.fit(training)
print "Model 1 was fit using parameters: "
print model1.extractParamMap()

# We may alternatively specify parameters using a Python dictionary as a paramMap  
paramMap = {lr.maxIter: 20}
paramMap[lr.maxIter] = 30  # Specify 1 Param, overwriting the original maxIter.  
paramMap.update({lr.regParam: 0.1, lr.threshold: 0.55})  # Specify multiple Params.  

# You can combine paramMaps, which are python dictionaries.  
paramMap2 = {lr.probabilityCol: "myProbability"}  # Change output column name  
paramMapCombined = paramMap.copy()
paramMapCombined.update(paramMap2)

# Now learn a new model using the paramMapCombined parameters.  
# paramMapCombined overrides all parameters set earlier via lr.set* methods.  
model2 = lr.fit(training, paramMapCombined)
print "Model 2 was fit using parameters: "
print model2.extractParamMap()

# Prepare test data  
test = spark.createDataFrame([
    (1.0, Vectors.dense([-1.0, 1.5, 1.3])),
    (0.0, Vectors.dense([3.0, 2.0, -0.1])),
    (1.0, Vectors.dense([0.0, 2.2, -1.5]))], ["label", "features"])

# Make predictions on test data using the Transformer.transform() method.  
# LogisticRegression.transform will only use the 'features' column.  
# Note that model2.transform() outputs a "myProbability" column instead of the usual  
# 'probability' column since we renamed the lr.probabilityCol parameter previously.  
prediction = model2.transform(test)
selected = prediction.select("features", "label", "myProbability", "prediction")
for row in selected.collect():
    print(row)

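Newer Spark versions (2.0 and later) expose a single SparkSession entry point that replaces the SparkContext/SQLContext pair used above, as the commented-out line in the script hints. A minimal sketch of that alternative setup (the app name and config value are simply carried over from the example):

from pyspark.sql import SparkSession
from pyspark.ml.linalg import Vectors
from pyspark.ml.classification import LogisticRegression

# SparkSession is the unified entry point in Spark 2.0+; getOrCreate()
# reuses an existing session if one is already running.
spark = SparkSession.builder \
    .appName("MyFirstApp") \
    .config("spark.port.maxRetries", "100") \
    .getOrCreate()

# DataFrames are created directly from the session.
training = spark.createDataFrame([
    (1.0, Vectors.dense([0.0, 1.1, 0.1])),
    (0.0, Vectors.dense([2.0, 1.0, -1.0]))], ["label", "features"])
model = LogisticRegression(maxIter=10, regParam=0.01).fit(training)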
Submit it with the following command:

     spark-submit --master spark://master:7077 test1.py

After submitting, the output looks like this:
[Screenshot: output of the spark-submit run]
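
On a real cluster you will usually pass resource options to spark-submit as well. The exact values depend on your cluster; the memory and core numbers below are only placeholders:

     spark-submit \
       --master spark://master:7077 \
       --deploy-mode client \
       --executor-memory 2g \
       --total-executor-cores 4 \
       test1.py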
