Spark - Using yarn-client mode

SparkConf

If you write it like this:

new SparkConf().setMaster("yarn-client")

debugging in IDEA reports an error:

Exception in thread "main" java.lang.IllegalStateException: Library directory '....../data-platform-task/assembly/target/scala-2.11/jars' does not exist; make sure Spark is built.
	at org.apache.spark.launcher.CommandBuilderUtils.checkState(CommandBuilderUtils.java:248)

According to the official Spark documentation, you need to set spark.yarn.jars or spark.yarn.archive:

  • spark.yarn.jars: a list of Spark jars; supports local paths as well as HDFS paths.
  • spark.yarn.archive: an archive (e.g. a zip) containing the needed Spark jars.
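
For reference, spark.yarn.jars would be used like this (a minimal sketch; the HDFS path is illustrative, not this project's actual layout):

new SparkConf().setMaster("yarn-client")
	.set("spark.yarn.jars", "hdfs://localhost:9000/user/spark/spark-libs/*.jar")

I went with spark.yarn.archive instead, since a single archive is easier to manage.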

Modify the program:

new SparkConf().setMaster("yarn-client")
	.set("spark.yarn.archive", getProperty(HDFS_SPARK_ARCHIVE))

Debugging in IDEA again, a new error appears:

Caused by: java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.x$334 of type org.apache.spark.api.java.function.PairFunction in instance of org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1
	at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2287)

This is because the classes the task depends on cannot be found on the executors: spark.yarn.archive ships only the Spark runtime jars, not the application's own classes. Reading further in the documentation:

  • spark.yarn.dist.jars: a comma-separated list of jars to distribute to the YARN containers.

Modify the program again:

new SparkConf().setMaster("yarn-client")
	.set("spark.yarn.archive", getProperty(HDFS_SPARK_ARCHIVE))
	.set("spark.yarn.dist.jars", getProperty(TASK_JARS))

Debugging again, the job now runs:

19/06/27 12:53:12 INFO yarn.YarnAllocator: Will request 2 executor container(s), each with 1 core(s) and 1408 MB memory (including 384 MB of overhead)
19/06/27 12:53:12 INFO yarn.YarnAllocator: Submitted 2 unlocalized container requests.
19/06/27 12:53:12 INFO yarn.ApplicationMaster: Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
19/06/27 12:53:12 INFO impl.AMRMClientImpl: Received new token for : leishu-OptiPlex-7060:39105
19/06/27 12:53:12 INFO yarn.YarnAllocator: Launching container container_1561543784696_0031_01_000002 on host leishu-OptiPlex-7060 for executor with ID 1
19/06/27 12:53:13 INFO yarn.YarnAllocator: Received 1 containers from YARN, launching executors on 1 of them.
19/06/27 12:53:13 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
19/06/27 12:53:13 INFO impl.ContainerManagementProtocolProxy: Opening proxy : leishu-OptiPlex-7060:39105
19/06/27 12:53:13 INFO yarn.YarnAllocator: Launching container container_1561543784696_0031_01_000003 on host leishu-OptiPlex-7060 for executor with ID 2
19/06/27 12:53:13 INFO yarn.YarnAllocator: Received 1 containers from YARN, launching executors on 1 of them.
19/06/27 12:53:13 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
19/06/27 12:53:13 INFO impl.ContainerManagementProtocolProxy: Opening proxy : leishu-OptiPlex-7060:39105
19/06/27 12:53:16 INFO yarn.YarnAllocator: Received 1 containers from YARN, launching executors on 0 of them.
19/06/27 12:53:18 INFO yarn.YarnAllocator: Driver requested a total number of 0 executor(s).
19/06/27 12:53:18 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down. 172.16.209.105:33251
19/06/27 12:53:18 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down. 172.16.209.105:33251
19/06/27 12:53:18 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0
19/06/27 12:53:18 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED
19/06/27 12:53:18 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
19/06/27 12:53:18 INFO yarn.ApplicationMaster: Deleting staging directory file:/home/.../.sparkStaging/application_1561543784696_0031
19/06/27 12:53:18 INFO util.ShutdownHookManager: Shutdown hook called
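
Incidentally, the "yarn-client" master string has been deprecated since Spark 2.0. The equivalent modern form sets the master to "yarn" and the deploy mode explicitly (a sketch using the same properties as above):

new SparkConf().setMaster("yarn")
	.set("spark.submit.deployMode", "client")
	.set("spark.yarn.archive", getProperty(HDFS_SPARK_ARCHIVE))
	.set("spark.yarn.dist.jars", getProperty(TASK_JARS))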

Uploading files to HDFS

For the spark.yarn.archive parameter, I compressed the required jars (all files under the /spark-2.4.3-bin-hadoop2.7/jars directory) into a single zip file and uploaded it to HDFS.
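
Building the zip can be scripted too; a minimal sketch (the local paths are illustrative, not this project's layout):

import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipSparkJars {

    public static void main(String[] args) throws IOException {
        // Illustrative paths; adjust to the local Spark installation.
        Path jarsDir = Paths.get("/opt/spark-2.4.3-bin-hadoop2.7/jars");
        Path zipFile = Paths.get("/tmp/spark-2.4.3-hadoop2.7.7.zip");

        try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(zipFile));
             DirectoryStream<Path> jars = Files.newDirectoryStream(jarsDir, "*.jar")) {
            for (Path jar : jars) {
                // Store each jar at the root of the archive, as spark.yarn.archive expects.
                zos.putNextEntry(new ZipEntry(jar.getFileName().toString()));
                Files.copy(jar, zos);
                zos.closeEntry();
            }
        }
    }
}
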
The upload to HDFS is implemented with the following code:

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SparkJar2Hdfs {

    public static void main(String[] args) throws Exception {

        // Local path of the file to upload.
        Path src = new Path(getProperty(SPARK_JARS_ZIP));

        // Target directory on HDFS.
        Path dst = new Path(getProperty(HDFS_SPARK_JARS_PATH));

        removeDir(dst);

        if (createDir(dst) && uploadPath(src, dst)) {
            listStatus(dst);
        }
    }

    private static FileSystem getCorSys() {
        Configuration conf = new Configuration();
        try {
            return FileSystem.get(URI.create(getProperty(HDFS_SPARK_ROOT)), conf);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }


    // Create the directory if it does not exist yet.
    private static boolean createDir(Path path) {
        try (FileSystem coreSys = getCorSys()) {
            if (coreSys.exists(path)) {
                return true;
            } else {
                return coreSys.mkdirs(path);
            }
        } catch (IOException e) {
            e.printStackTrace();
            return false;
        }
    }

    // Recursively delete the directory if it exists.
    private static boolean removeDir(Path path) {
        try (FileSystem coreSys = getCorSys()) {
            if (coreSys.exists(path)) {
                return coreSys.delete(path, true);
            } else {
                return true;
            }
        } catch (IOException e) {
            e.printStackTrace();
            return false;
        }
    }

    // Upload the local file into the HDFS directory.
    private static boolean uploadPath(Path srcPath, Path desPath) {
        try (FileSystem coreSys = getCorSys()) {
            if (coreSys.isDirectory(desPath)) {
                coreSys.copyFromLocalFile(srcPath, desPath);
                return true;
            } else {
                throw new IOException("desPath does not exist or is not a directory");
            }
        } catch (IOException e) {
            e.printStackTrace();
            return false;
        }
    }

    // List the files now present in the target directory.
    private static void listStatus(Path desPath) {
        try (FileSystem coreSys = getCorSys()) {
            FileStatus[] files = coreSys.listStatus(desPath);
            for (FileStatus file : files) {
                System.out.println(file.getPath());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Running it prints the uploaded file's URL:

Connected to the target VM, address: '127.0.0.1:39539', transport: 'socket'
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
hdfs://localhost:9000/user/.../spark-libs/spark-2.4.3-hadoop2.7.7.zip
Disconnected from the target VM, address: '127.0.0.1:39539', transport: 'socket'

Process finished with exit code 0

maven-shade

For the spark.yarn.dist.jars parameter, the task jar can be built with the maven-shade-plugin:

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>3.2.1</version>
                <configuration>
                    <shadedArtifactAttached>false</shadedArtifactAttached>
                    <outputFile>${project.build.directory}/shaded/data-platform-task-${project.version}-shaded.jar
                    </outputFile>
                    <artifactSet>
                        <includes>
                            <include>com.alibaba:druid</include>
                            <include>com.aliyun:emr-core</include>
                            <include>com.google.inject:guice</include>
                            <include>log4j:log4j</include>
                            <include>org.postgresql:postgresql</include>
                            <include>org.slf4j:slf4j-api</include>
                            <include>org.slf4j:slf4j-log4j12</include>
                            <include>org.projectlombok:lombok</include>
                            <include>org.springframework:spring-jdbc</include>
                        </includes>
                    </artifactSet>
                </configuration>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>

This way, the project's own classes and all required third-party dependencies are packed into a single jar.
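
The TASK_JARS property can then point at the shaded artifact. A sketch (the path and version are placeholders; a local file: URI works because spark.yarn.dist.jars uploads local files to the staging directory):

new SparkConf().setMaster("yarn-client")
	.set("spark.yarn.archive", getProperty(HDFS_SPARK_ARCHIVE))
	.set("spark.yarn.dist.jars",
		"file:///path/to/target/shaded/data-platform-task-<version>-shaded.jar")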
