Building Spark 2.4.2 (on macOS)

Original post

夜下探戈

2019-05-03 08:58

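Before you build:

First, read the official build documentation, Building Apache Spark, as thoroughly as you can.
Download the source with git clone or wget.
Make sure your network connection is stable before you start.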

Software to download (mind the versions)

· Spark-2.4.2.tgz
· Hadoop-2.7.6
· Scala-2.11.12
· jdk1.8.0_191
· apache-maven-3.6.x
· git
Note: Spark is the source package; everything else is a ready-to-run binary distribution.
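
Before going further it can help to confirm that the tools are actually on the PATH. The following is only a minimal sketch: check_tools is a hypothetical helper name, and the choice of java, mvn, scala and git as the commands to probe is an assumption based on the list above.

```shell
#!/bin/sh
# check_tools is a hypothetical helper, not part of Spark or Maven.
# It prints one line per command that is missing from PATH and
# returns non-zero if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Probe the commands assumed to correspond to the list above.
check_tools java mvn scala git || echo "install the missing tools before building"
```

The versions themselves can then be checked by hand with java -version, mvn -v and git --version.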

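Unpack, install, and set the environment variables (steps omitted)

Once everything is configured, test that each tool works. For Maven, configure a local repository and point the mirror at the Aliyun repository.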

# Create the local repository directory
mkdir ~/maven_repo
# Edit settings.xml
vim $MAVEN_HOME/conf/settings.xml

Relevant excerpt:

<!-- localRepository
   | The path to the local repository maven will use to store artifacts.
   |
   | Default: ${user.home}/.m2/repository
  <localRepository>/path/to/local/repo</localRepository>
  -->
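<!-- Use the absolute path of the directory created above. On macOS, home
     directories normally live under /Users, so adjust /home/max to match. -->
<localRepository>/home/max/maven_repo</localRepository>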

<mirrors>
  <mirror>
    <id>nexus-aliyun</id>
    <mirrorOf>*,!cloudera</mirrorOf>
    <name>Nexus aliyun</name>
    <url>http://maven.aliyun.com/nexus/content/groups/public</url>
  </mirror>
</mirrors>
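
A quick way to confirm that the edit took effect is to read the value back out of settings.xml and check that the directory exists. This is a rough sketch only: local_repo_from_settings is a hypothetical helper, and it assumes the active element appears after the sample inside the stock comment, which is why it keeps the last match.

```shell
#!/bin/sh
# local_repo_from_settings is a hypothetical helper.
# It prints the last <localRepository> value found in the given file.
# Assumption: the active element comes after the commented-out sample.
local_repo_from_settings() {
    sed -n 's:.*<localRepository>\(.*\)</localRepository>.*:\1:p' "$1" | tail -n 1
}

# Only runs when MAVEN_HOME points at a real installation.
settings="${MAVEN_HOME:-/nonexistent}/conf/settings.xml"
if [ -f "$settings" ]; then
    repo=$(local_repo_from_settings "$settings")
    if [ -d "$repo" ]; then
        echo "local repository exists: $repo"
    else
        echo "local repository missing: $repo"
    fi
fi
```

Text matching like this does not understand XML comments, so for a definitive answer the merged configuration printed by mvn help:effective-settings is the better source.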

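Modify make-distribution.sh

Instead of invoking mvn directly, build with the make-distribution.sh script. The script needs a small change first.

# From the spark-2.4.2 directory
vim ./dev/make-distribution.sh

# Comment out the lines below. Each lookup starts Maven once, so hard-coding the versions instead shortens the build.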
#VERSION=$("$MVN" help:evaluate -Dexpression=project.version $@ 2>/dev/null\
#    | grep -v "INFO"\
#    | grep -v "WARNING"\
#    | tail -n 1)
#SCALA_VERSION=$("$MVN" help:evaluate -Dexpression=scala.binary.version $@ 2>/dev/null\
#    | grep -v "INFO"\
#    | grep -v "WARNING"\
#    | tail -n 1)
#SPARK_HADOOP_VERSION=$("$MVN" help:evaluate -Dexpression=hadoop.version $@ 2>/dev/null\
#    | grep -v "INFO"\
#    | grep -v "WARNING"\
#    | tail -n 1)
#SPARK_HIVE=$("$MVN" help:evaluate -Dexpression=project.activeProfiles -pl sql/hive $@ 2>/dev/null\
#    | grep -v "INFO"\
#    | grep -v "WARNING"\
#    | fgrep --count "<id>hive</id>";\
#    # Reset exit status to 0, otherwise the script stops here if the last grep finds nothing\
#    # because we use "set -o pipefail"
#    echo -n)

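## Add the following assignments; make sure the versions match your target production environment
VERSION=2.4.2
SCALA_VERSION=2.11
# The replaced lookup evaluates hadoop.version, so use the bare version
# (no "hadoop-" prefix), matching -Dhadoop.version in the build command.
SPARK_HADOOP_VERSION=2.6.0-cdh5.14.0
SPARK_HIVE=1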
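
Because the edit is made by hand, it is easy to leave one of the original Maven lookups uncommented. The sketch below checks for that; check_pinned is a hypothetical helper, and it relies only on the four variable names shown above.

```shell
#!/bin/sh
# check_pinned is a hypothetical helper.
# For each of the four variables it looks at the last uncommented assignment
# in the given script and reports whether it is a fixed value or still a
# $(...) command substitution that would call Maven.
check_pinned() {
    script=$1
    status=0
    for var in VERSION SCALA_VERSION SPARK_HADOOP_VERSION SPARK_HIVE; do
        line=$(grep -E "^${var}=" "$script" | tail -n 1)
        case $line in
            '')     echo "$var: not set"; status=1 ;;
            *'$('*) echo "$var: still computed via Maven"; status=1 ;;
            *)      echo "$line" ;;
        esac
    done
    return "$status"
}

# Usage, from the spark-2.4.2 directory:
#   check_pinned ./dev/make-distribution.sh
```

Keeping a backup copy of the untouched script before editing makes it easy to undo the change later.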

Modify pom.xml in the spark-2.4.2 source tree

<repositories>
    <!--<repositories>
     This should be at top, it makes maven try the central repo first and then others
and hence faster dep resolution
    <repository>
        <id>central</id>
        <name>Maven Repository</name>
        <url>https://repo.maven.apache.org/maven2</url>
        <releases>
            <enabled>true</enabled>
        </releases>
        <snapshots>
            <enabled>false</enabled>
        </snapshots>
    </repository>
-->
    <repository>
        <id>central</id>
        <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
        <releases>
            <enabled>true</enabled>
        </releases>
        <snapshots>
            <enabled>true</enabled>
            <updatePolicy>always</updatePolicy>
            <checksumPolicy>fail</checksumPolicy>
        </snapshots>
    </repository>
    <repository>
        <id>cloudera</id>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
</repositories>
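
The mirror in settings.xml uses <mirrorOf>*,!cloudera</mirrorOf>, which sends every repository through the Aliyun mirror except the one whose id is exactly cloudera. The exclusion therefore only works if the id in pom.xml matches. A minimal sketch of that check follows; has_repo_id is a hypothetical helper and uses plain text matching, so it cannot tell a repository id from any other <id> element in the file.

```shell
#!/bin/sh
# has_repo_id is a hypothetical helper.
# It succeeds if the file ($1) contains an <id> element whose text is exactly $2.
has_repo_id() {
    grep -q "<id>$2</id>" "$1"
}

# Usage, from the spark-2.4.2 directory:
#   has_repo_id pom.xml cloudera && echo "cloudera repository declared"
```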

Start the build

./dev/make-distribution.sh \
--name hadoop-2.6.0-cdh5.14.0  \
--tgz \
-Phadoop-2.6 \
-Dhadoop.version=2.6.0-cdh5.14.0 \
-Phive -Phive-thriftserver  \
-Pyarn \
-Pkubernetes

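The build takes at least half an hour, so be patient. If it fails, the output will usually contain the word "error".
Output like the following means the build has finished:
[screenshot of the build output, not preserved]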
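
With a build this long, it may be easier to save the output and inspect it afterwards than to watch the terminal. The sketch below assumes the output was captured to a file, for example by appending 2>&1 | tee build.log to the build command; build.log and summarize_build_log are hypothetical names. It relies on Maven tagging error lines with [ERROR] and printing BUILD SUCCESS or BUILD FAILURE at the end of its run.

```shell
#!/bin/sh
# summarize_build_log is a hypothetical helper.
# It counts lines starting with Maven's [ERROR] tag in a saved log and
# reports Maven's final status line.
summarize_build_log() {
    log=$1
    # grep -c exits non-zero when the count is 0, so neutralise its status.
    errors=$(grep -c '^\[ERROR\]' "$log" || true)
    if grep -q 'BUILD SUCCESS' "$log"; then
        echo "Maven reported BUILD SUCCESS with $errors [ERROR] line(s)"
    elif grep -q 'BUILD FAILURE' "$log"; then
        echo "Maven reported BUILD FAILURE with $errors [ERROR] line(s)"
        return 1
    else
        echo "no Maven build status found; $errors [ERROR] line(s)"
        return 2
    fi
}

# Usage, after a build whose output was saved to build.log:
#   summarize_build_log build.log
```

Note that BUILD SUCCESS covers only the Maven phase. The script still has to copy files and create the archive afterwards, so the presence of the final package remains the real confirmation.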

The built package ends up in the root of the spark-2.4.2 source directory:
[screenshot of the directory listing, not preserved]
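
As a rough guide to what to look for: with --tgz, make-distribution.sh names the archive after the version and the value passed to --name. The sketch below derives the expected file name from the values used in this post and checks for it in the current directory. Treat the result as an assumption to verify against your own build rather than a guarantee.

```shell
#!/bin/sh
# Values taken from the edited script and the build command in this post.
VERSION=2.4.2
NAME=hadoop-2.6.0-cdh5.14.0

# Expected archive name, assuming the script's spark-<version>-bin-<name>.tgz pattern.
TARBALL="spark-$VERSION-bin-$NAME.tgz"
echo "expected package: $TARBALL"

# Run from the spark-2.4.2 directory to see whether the archive is there.
if [ -f "$TARBALL" ]; then
    ls -lh "$TARBALL"
else
    echo "not found in $(pwd)"
fi
```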

That completes the build!
