10: WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set — a resolved case

1. Symptom:

When submitting a job on Spark on YARN, the following output appears:

WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.

INFO yarn.Client: Uploading resource file:/tmp/spark-27a2d9ca-106c-4f4a-baff-c96ef5081c51/__spark_libs__808575299793112451.zip -> hdfs://weizhonggui/user/hadoop/.sparkStaging/application_1543886353459_0001/__spark_libs__808575299793112451.zip
18/12/24 23:32:36 INFO yarn.Client: Uploading resource file:/tmp/spark-27a2d9ca-106c-4f4a-baff-c96ef5081c51/__spark_conf__4031622468796062240.zip -> hdfs://weizhonggui/user/hadoop/.sparkStaging/application_1543886353459_0001/__spark_conf__4031622468796062240.zip


2. Root-cause analysis:

The Running on YARN documentation (https://spark.apache.org/docs/latest/running-on-yarn.html#spark-properties) explains:

To make Spark runtime jars accessible from YARN side, you can specify spark.yarn.archive or spark.yarn.jars. For details please refer to Spark Properties. If neither spark.yarn.archive nor spark.yarn.jars is specified, Spark will create a zip file with all jars under $SPARK_HOME/jars and upload it to the distributed cache.

Looking further at the specific Spark Properties:

spark.yarn.jars (default: none): List of libraries containing Spark code to distribute to YARN containers. By default, Spark on YARN will use Spark jars installed locally, but the Spark jars can also be in a world-readable location on HDFS. This allows YARN to cache it on nodes so that it doesn’t need to be distributed each time an application runs. To point to jars on HDFS, for example, set this configuration to hdfs:///some/path. Globs are allowed.

spark.yarn.archive (default: none): An archive containing needed Spark jars for distribution to the YARN cache. If set, this configuration replaces spark.yarn.jars and the archive is used in all the application’s containers. The archive should contain jar files in its root directory. Like with the previous option, the archive can also be hosted on HDFS to speed up file distribution.

In other words, by default Spark on YARN uses the locally installed Spark jars (under the Spark installation directory), but those jars can also live in a world-readable location on HDFS. YARN can then cache them on each node, so they do not have to be uploaded every time an application runs.
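As a hedged illustration of the spark.yarn.jars alternative (the HDFS path below is hypothetical; globs are allowed per the docs quoted above):

```
# spark-defaults.conf — hypothetical HDFS location for the Spark jars
spark.yarn.jars  hdfs:///some/path/jars/*.jar
```

Only one of spark.yarn.jars and spark.yarn.archive needs to be set; if both are set, spark.yarn.archive takes precedence.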

3. Resolution:

3.1. Create the archive: jar cv0f spark-libs.jar -C $SPARK_HOME/jars/ .
3.2. Create the target directory on HDFS: hdfs dfs -mkdir -p /system/SparkJars/jar
then upload the jar: hdfs dfs -put spark-libs.jar /system/SparkJars/jar
3.3. In spark-defaults.conf, set spark.yarn.archive=hdfs:///system/SparkJars/jar/spark-libs.jar
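The steps above can be sketched as one shell session (paths follow this article; run on a node with Hadoop and Spark installed, so this is a cluster-dependent sketch rather than a locally runnable script):

```shell
# 3.1 Pack everything under $SPARK_HOME/jars into one uncompressed archive
#     (cv0f = verbose, no compression, output file spark-libs.jar)
jar cv0f spark-libs.jar -C "$SPARK_HOME/jars/" .

# 3.2 Create the HDFS directory first, then upload the archive into it
hdfs dfs -mkdir -p /system/SparkJars/jar
hdfs dfs -put spark-libs.jar /system/SparkJars/jar

# 3.3 Point Spark at the cached archive (append to conf/spark-defaults.conf)
echo "spark.yarn.archive hdfs:///system/SparkJars/jar/spark-libs.jar" \
  >> "$SPARK_HOME/conf/spark-defaults.conf"
```

Note that the mkdir must come before the put, otherwise the file lands at /system/SparkJars/jar itself rather than inside it.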

4. Summary:

This is a Spark on YARN tuning technique: it saves the time each application would otherwise spend uploading the Spark jars to HDFS on every submission. Whether it takes effect can be checked in the client log of the next submission.
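One hedged way to verify: resubmit a job and filter the client log. With spark.yarn.archive set, the WARN should no longer appear and yarn.Client should no longer upload a __spark_libs__*.zip (exact log wording varies by Spark version; the example application below is the stock SparkPi, with a version-specific jar name you would adjust):

```shell
# Hypothetical verification run — replace the examples jar with your own build's
spark-submit --master yarn --class org.apache.spark.examples.SparkPi \
  "$SPARK_HOME/examples/jars/spark-examples_2.11-2.4.0.jar" 10 2>&1 |
  grep -E "Neither spark.yarn.jars|__spark_libs__|spark-libs.jar"
```

If the fix worked, the grep shows a reference to the HDFS-hosted spark-libs.jar instead of the warning and the libs upload.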
