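I had configured an Alibaba Cloud HDFS address locally,
but Python (PySpark) could not connect to it
and raised an error.
To find the correct address and port, query the cluster configuration: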
hdfs getconf -confKey fs.default.name
# hdfs getconf -confKey fs.default.name
2020-06-17 14:59:51,762 INFO Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
dfs://****.cn-beijing.dfs.aliyuncs.com:10290
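The output above mixes a log line with the actual value. As a rough sketch, the scheme, host and port can be pulled out of that text in Python. The function names `parse_default_fs` and `read_default_fs` are made up for illustration, and the sketch assumes the `hdfs` command is on the PATH and that the value is the only line containing `://`.

```python
import subprocess
from typing import Tuple
from urllib.parse import urlsplit


def parse_default_fs(getconf_output: str) -> Tuple[str, str, int]:
    """Return (scheme, host, port) from the text printed by `hdfs getconf`."""
    # Log lines (INFO/WARN) may precede the value; keep only lines that look like a URI.
    uri_lines = [line.strip() for line in getconf_output.splitlines() if "://" in line]
    if not uri_lines:
        raise ValueError("no filesystem URI found in getconf output")
    parts = urlsplit(uri_lines[-1])
    if parts.port is None:
        raise ValueError("URI has no explicit port: " + uri_lines[-1])
    return parts.scheme, parts.hostname, parts.port


def read_default_fs() -> Tuple[str, str, int]:
    """Run `hdfs getconf -confKey fs.defaultFS` and parse what it prints."""
    # fs.defaultFS is the non-deprecated name of fs.default.name.
    result = subprocess.run(
        ["hdfs", "getconf", "-confKey", "fs.defaultFS"],
        capture_output=True, text=True, check=True,
    )
    return parse_default_fs(result.stdout)
```

Calling `read_default_fs()` on a machine with the cluster configuration installed should give the same address and port that the command prints, ready to be compared with what the PySpark code uses.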
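In the end, the root cause was simply
that the port I had been using was wrong. With the correct port, the read succeeds: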
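from pyspark.sql import SparkSession
# Reuse the active session (e.g. in a notebook) or create a new one
spark = SparkSession.builder.getOrCreate()
# Full path built from the address and port reported by hdfs getconf
dirpath = 'dfs://******.cn-beijing.dfs.aliyuncs.com:10290/user/root/test/test.txt'
sc = spark.sparkContext
textFile = sc.textFile(dirpath)
textFile.first()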
Out[16]:
'What is Big Data?'
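Since the problem here was a wrong port, a quick TCP check before starting Spark can save some time. The following is only a sketch: `port_is_reachable` is a hypothetical helper, and a successful connection only shows that something is listening on that port, not that it is the right file system service.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        # create_connection resolves the host name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For example, `port_is_reachable(host, 10290)` with the host taken from `hdfs getconf` should return True from a machine that is allowed to reach the cluster, while a mistyped port would typically return False.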