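Distributed Cluster Setup
Download the required installation packages, create the corresponding directories, and copy each package into its directory.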
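Install the JDK
1. Extract the package
2. cd into the extracted directory and run pwd to get the JAVA_HOME path:
/home/pangying/java/jdk1.8.0_151
3. Configure the environment variables
4. Apply the configuration
5. Check that the configuration succeeded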
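Steps 3 to 5 can be sketched in shell as below. This is a minimal sketch that assumes the variables go into ~/.bashrc (the article does not say which file it used; /etc/profile is another common choice), and `setup_java_env` is a hypothetical name. It appends the exports only if the file does not already export JAVA_HOME, so re-running it on a node is harmless.

```bash
# Hypothetical helper: append JAVA_HOME and PATH exports to a shell rc file.
# Usage: setup_java_env JDK_DIR [RC_FILE]   (RC_FILE defaults to ~/.bashrc)
setup_java_env() {
  local jdk_dir="$1"
  local rc_file="${2:-$HOME/.bashrc}"
  touch "$rc_file" || return 1
  # Skip if JAVA_HOME is already exported there, so re-runs stay idempotent.
  if grep -q '^export JAVA_HOME=' "$rc_file"; then
    return 0
  fi
  {
    printf 'export JAVA_HOME=%s\n' "$jdk_dir"
    # Single quotes keep $JAVA_HOME and $PATH literal; they expand at login.
    printf '%s\n' 'export PATH=$JAVA_HOME/bin:$PATH'
  } >> "$rc_file"
}
```

For example, after extracting the archive (the file name jdk-8u151-linux-x64.tar.gz is an assumption) and defining the function, run `setup_java_env /home/pangying/java/jdk1.8.0_151`, then `source ~/.bashrc` (step 4) and `java -version` (step 5), which should report version 1.8.0_151.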
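Install Hadoop
1. Extract the package
2. cd into the extracted directory and run pwd to get the Hadoop path
3. Configure the Hadoop environment variables
4. Apply the configuration
5. Edit Hadoop's configuration files (under etc/hadoop in the Hadoop directory)
6. In hadoop-env.sh, replace the original JAVA_HOME setting with the JDK path obtained above
7. Edit core-site.xml
8. Edit hdfs-site.xml
9. Edit mapred-site.xml: first copy the mapred-site.xml.template template and rename the copy to mapred-site.xml
10. Configure yarn-site.xml
11. Configure the cluster's DataNodes (the slaves file):
12. Format the file system: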
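Steps 7 to 11 boil down to writing a few small files under etc/hadoop. The sketch below writes them in one go; the function name, the hostnames, and every property value (NameNode on port 9000, replication factor 2, YARN as the MapReduce framework) are illustrative assumptions, since the article does not show its own settings. mapred-site.xml is written directly instead of being copied from the template, and any existing files are overwritten.

```bash
# Hypothetical helper: write minimal Hadoop 2.x config files into CONF_DIR.
# Usage: write_hadoop_conf CONF_DIR MASTER_HOST WORKER_HOST...
# Every property value here is an illustrative assumption.
write_hadoop_conf() {
  local conf_dir="$1" master="$2"
  shift 2
  mkdir -p "$conf_dir" || return 1

  # core-site.xml: NameNode URI used by HDFS clients (port 9000 assumed).
  cat > "$conf_dir/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://$master:9000</value>
  </property>
</configuration>
EOF

  # hdfs-site.xml: keep two copies of each block (assumed value).
  cat > "$conf_dir/hdfs-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
EOF

  # mapred-site.xml: run MapReduce jobs on YARN.
  cat > "$conf_dir/mapred-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF

  # yarn-site.xml: ResourceManager host plus the shuffle service MapReduce needs.
  cat > "$conf_dir/yarn-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>$master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF

  # slaves: one worker (DataNode/NodeManager) hostname per line.
  printf '%s\n' "$@" > "$conf_dir/slaves"
}
```

For example: `write_hadoop_conf "$HADOOP_HOME/etc/hadoop" master slave1 slave2`. Step 6 can be done with GNU sed, e.g. `sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/home/pangying/java/jdk1.8.0_151|' "$HADOOP_HOME/etc/hadoop/hadoop-env.sh"`. Step 12 is `hdfs namenode -format`, which initializes the NameNode's metadata and therefore matters on the master; `start-dfs.sh` and `start-yarn.sh`, run on the master, then start the daemons.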
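A simple but not entirely reliable way to check the installation is to open ports 8088 (YARN ResourceManager) and 50070 (HDFS NameNode) on the master's IP in a browser and confirm that the number of live nodes matches the number of nodes you configured.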
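A somewhat more direct check is to ask the cluster itself: `hdfs dfsadmin -report` lists the live DataNodes and `yarn node -list` lists the running NodeManagers. The sketch below parses the "Live datanodes (N):" line that Hadoop 2.x prints; the function names are hypothetical, and the report wording may differ in other versions.

```bash
# Print N from the "Live datanodes (N):" line of `hdfs dfsadmin -report`
# (Hadoop 2.x wording), reading the report on stdin.
count_live_datanodes() {
  sed -n 's/^Live datanodes (\([0-9][0-9]*\)):.*/\1/p'
}

# Hypothetical check: succeed only if the report on stdin shows exactly
# EXPECTED live DataNodes.
expect_live_datanodes() {
  local expected="$1" live
  live="$(count_live_datanodes)"
  if [ "$live" = "$expected" ]; then
    echo "OK: $live live DataNodes"
  else
    echo "Expected $expected live DataNodes, report shows ${live:-none}" >&2
    return 1
  fi
}
```

For example, with two workers: `hdfs dfsadmin -report | expect_live_datanodes 2`. Run it as the user that started HDFS, since the DataNode report may require HDFS superuser rights.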
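Install Spark
1. Extract the Scala package
2. Enter the extracted directory and run pwd to get its installation path:
/home/pangying/spark/scala-2.12.4
3. Extract the Spark package and get its installation path the same way:
/home/pangying/spark/spark-2.1.0-bin-hadoop2.7
4. Configure the environment variables:
5. Apply the configuration
6. Go to Spark's conf directory and configure spark-env.sh
7. Copy the template and rename it (the files in conf ship with a .template suffix)
8. Configure spark-defaults.conf
9. Configure slaves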
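Steps 6 to 9 can likewise be scripted. The sketch below writes spark-env.sh, spark-defaults.conf and slaves into Spark's conf directory. The JDK and Scala paths come from the article; the function name, hostnames, master port 7077 (Spark's default) and web UI port 8085 are assumptions. SPARK_MASTER_HOST is the Spark 2.x name for the master address.

```bash
# Hypothetical helper: write minimal Spark 2.x standalone config into CONF_DIR.
# Usage: write_spark_conf CONF_DIR MASTER_HOST WORKER_HOST...
write_spark_conf() {
  local conf_dir="$1" master="$2"
  shift 2
  mkdir -p "$conf_dir" || return 1

  # spark-env.sh; \$HADOOP_HOME stays literal and expands when sourced.
  cat > "$conf_dir/spark-env.sh" <<EOF
export JAVA_HOME=/home/pangying/java/jdk1.8.0_151
export SCALA_HOME=/home/pangying/spark/scala-2.12.4
# Let Spark find core-site.xml and hdfs-site.xml.
export HADOOP_CONF_DIR=\$HADOOP_HOME/etc/hadoop
export SPARK_MASTER_HOST=$master
# Master web UI port; Spark's default is 8080.
export SPARK_MASTER_WEBUI_PORT=8085
EOF

  cat > "$conf_dir/spark-defaults.conf" <<EOF
spark.master            spark://$master:7077
# Event logging disabled, matching the fix for Problem 2 below.
spark.eventLog.enabled  false
EOF

  # slaves: one Worker hostname per line.
  printf '%s\n' "$@" > "$conf_dir/slaves"
}
```

For example: `write_spark_conf /home/pangying/spark/spark-2.1.0-bin-hadoop2.7/conf master slave1 slave2`. For step 4, add `export SCALA_HOME=...`, `export SPARK_HOME=...` and their bin directories on PATH to the same file that holds JAVA_HOME. The cluster is then started from the master with `$SPARK_HOME/sbin/start-all.sh` (the full path avoids clashing with Hadoop's script of the same name).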
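Likewise, the simplest but unreliable way to check the installation is to open port 8085 on the master node's IP in a browser (if you set that port in spark-env.sh; otherwise the master web UI defaults to 8080).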
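Note: all of the installation and configuration steps above are the same on every node!
Remark: this article skips the passwordless SSH login setup and the hostname configuration. If you need them, see the article "[Big Data] - Fully distributed setup of Hadoop 2.8 and Spark 2.1", which is very detailed; it is what I followed originally.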
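Instead of repeating every step by hand, one option (an alternative, not the author's method) is to configure the master once and copy the directories to the other nodes with rsync. This sketch assumes passwordless SSH (not covered here), rsync on every node, and the same user and directory layout everywhere; the function name and the DRY_RUN switch are hypothetical.

```bash
# Hypothetical helper: copy SRC into the same parent directory on each host.
# Set DRY_RUN=1 to print the rsync commands instead of running them.
sync_to_workers() {
  local src="$1" host
  shift
  for host in "$@"; do
    # When DRY_RUN is empty or unset, the prefix expands to nothing.
    ${DRY_RUN:+echo} rsync -az "$src" "$host:$(dirname "$src")/" || return 1
  done
}
```

For example, `sync_to_workers /home/pangying/spark/spark-2.1.0-bin-hadoop2.7 slave1 slave2`, repeated for the JDK, Scala and Hadoop directories and ~/.bashrc. Run it before formatting HDFS and starting the daemons, so that no NameNode or DataNode data directories are copied between nodes.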
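Problems and Solutions
Problem 1: Unable to load native-hadoop library for your platform... using builtin-java
Solution:
Many solutions online say that the Hadoop build does not match the Java build or the OS architecture (32-bit vs 64-bit), and that you should compile a 64-bit Hadoop native library yourself. My instinct said otherwise, and when I checked, my versions were fine: everything was 64-bit. I solved the problem this way instead:
Add the following to hadoop-env.sh: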
export HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib/native"
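Since every step is applied on each node, it helps if this edit is safe to repeat. A minimal sketch, with `ensure_line` as a hypothetical helper name: it appends a line only when an identical line is not already in the file.

```bash
# Hypothetical helper: append LINE to FILE unless an identical line exists.
ensure_line() {
  local file="$1" line="$2"
  touch "$file" || return 1
  # -x: whole-line match, -F: fixed string (no regex), -q: quiet.
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

Applied to hadoop-env.sh: `ensure_line "$HADOOP_HOME/etc/hadoop/hadoop-env.sh" 'export HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib/native"'`. The single quotes keep `${HADOOP_HOME}` literal so it is expanded when hadoop-env.sh is sourced. Afterwards, `hadoop checknative -a` reports whether libhadoop was actually loaded.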
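Problem 2: submitting the Spark example job (SparkPi) fails with "Connection refused" to master:8020
...inator stopped!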
18/01/03 17:47:13 INFO spark.SparkContext: Successfully stopped SparkContext
Exception in thread "main" java.net.ConnectException: Call From master/192.168.217.128 to master:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
at org.apache.hadoop.ipc.Client.call(Client.java:1479)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy13.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2108)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:93)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:531)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2313)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:868)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:860)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:860)
at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31)
at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
at org.apache.hadoop.ipc.Client.call(Client.java:1451)
... 34 more
Solution:
Set spark.eventLog.enabled to false in the spark-defaults.conf file.
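The stack trace shows the failure inside EventLoggingListener.start, that is, while Spark tries to reach HDFS at master:8020 to write its event log, which is why turning event logging off lets the job run. A possible alternative (an assumption, not what was done here) is to keep event logging and point spark.eventLog.dir at the address the NameNode actually listens on, since 8020 is only Hadoop's default NameNode port; the connection is also refused if HDFS is simply not running, which `jps` on the master would show. The hypothetical helper below sets one key in spark-defaults.conf, replacing an existing entry or appending a new one.

```bash
# Hypothetical helper: set KEY to VALUE in a spark-defaults.conf-style file
# ("key value" per line), replacing an existing entry or appending one.
set_spark_conf() {
  local file="$1" key="$2" value="$3" tmp
  touch "$file" || return 1
  tmp="$(mktemp)" || return 1
  awk -v k="$key" -v v="$value" '
    $1 == k { print k " " v; found = 1; next }   # replace existing entry
    { print }                                    # keep every other line
    END { if (!found) print k " " v }            # append if missing
  ' "$file" > "$tmp" && mv "$tmp" "$file"
}
```

The fix above then reads `set_spark_conf "$SPARK_HOME/conf/spark-defaults.conf" spark.eventLog.enabled false`. For the alternative, `hdfs getconf -confKey fs.defaultFS` prints the NameNode URI, so something like `set_spark_conf "$SPARK_HOME/conf/spark-defaults.conf" spark.eventLog.dir "$(hdfs getconf -confKey fs.defaultFS)/spark-logs"` plus `hdfs dfs -mkdir -p /spark-logs` may do, where /spark-logs is a hypothetical directory.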
Problem 3: at my wits' end, still working on it...
References: "Submitting a Spark sample job fails"