How to Read and Write OSS with Apache Hadoop 2.7

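Background

Apache Hadoop 3.0.0 was officially released on December 13, 2017, with built-in support for the Alibaba Cloud OSS object storage system as a Hadoop-compatible file system. Later releases from the Hadoop 2.9.x line onward also support OSS. Older Apache Hadoop versions, however, have no official OSS support. This article describes how to use a support package so that Hadoop 2.7.2 can read and write OSS.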

How to use

The following steps must be performed on every Hadoop node.

Download the support package

http://gosspublic.alicdn.com/hadoop-spark/hadoop-oss-2.7.2.tar.gz

Extract the support package. It contains the following file:

[root@apache hadoop-oss-2.7.2]# ls -lh
total 3.1M
-rw-r--r-- 1 root root 3.1M Feb 28 17:01 hadoop-aliyun-2.7.2.jar

This support package was built from Hadoop 2.7.2 with the Apache Hadoop OSS support patch applied. Packages for other minor versions will be provided later.
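
The download and extraction steps above can be scripted roughly as follows. This is only a sketch: the function names fetch_support_package and extract_support_package are hypothetical, and it assumes curl and tar are available on each node.

```shell
# Hypothetical helpers; the names are illustrative and not part of Hadoop.
PACKAGE_URL="http://gosspublic.alicdn.com/hadoop-spark/hadoop-oss-2.7.2.tar.gz"

# Download the support package to the given local path (assumes curl is installed).
fetch_support_package() {
  curl -fSL -o "$1" "$PACKAGE_URL"
}

# Extract a tarball into a target directory and list the jars it contains.
extract_support_package() {
  tarball=$1
  target_dir=$2
  mkdir -p "$target_dir" || return 1
  tar -xzf "$tarball" -C "$target_dir" || return 1
  find "$target_dir" -name '*.jar'
}
```

A typical call would be fetch_support_package /tmp/hadoop-oss-2.7.2.tar.gz followed by extract_support_package /tmp/hadoop-oss-2.7.2.tar.gz /tmp, which should list hadoop-aliyun-2.7.2.jar.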

Deployment

First, copy hadoop-aliyun-2.7.2.jar to the $HADOOP_HOME/share/hadoop/tools/lib/ directory.
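
The copy step can be wrapped in a small function that checks its inputs first. This is a sketch; install_oss_jar is a hypothetical name, and the Hadoop home directory is passed in explicitly rather than read from the environment.

```shell
# Hypothetical helper: copy the support jar into Hadoop's tools lib directory.
# Arguments: path to the jar, Hadoop home directory.
install_oss_jar() {
  jar_path=$1
  hadoop_home=$2
  lib_dir="$hadoop_home/share/hadoop/tools/lib"
  [ -f "$jar_path" ] || { echo "jar not found: $jar_path" >&2; return 1; }
  [ -d "$lib_dir" ] || { echo "missing directory: $lib_dir" >&2; return 1; }
  cp "$jar_path" "$lib_dir/"
}
```

For example: install_oss_jar hadoop-oss-2.7.2/hadoop-aliyun-2.7.2.jar "$HADOOP_HOME". Because the step has to run on every node, it would be repeated per host.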

Edit the $HADOOP_HOME/libexec/hadoop-config.sh file and add the following line at line 327:

CLASSPATH=$CLASSPATH:$TOOL_PATH

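The purpose of this change is to put $HADOOP_HOME/share/hadoop/tools/lib/ on Hadoop's CLASSPATH. For reference, here is the diff of the file before and after the change (hadoop-config.sh.bak is the file before modification):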

[root@apache hadoop-2.7.2]# diff -C 3 libexec/hadoop-config.sh.bak libexec/hadoop-config.sh
*** libexec/hadoop-config.sh.bak    2019-03-01 10:35:59.629136885 +0800
--- libexec/hadoop-config.sh    2019-02-28 16:33:39.661707800 +0800
***************
*** 325,330 ****
--- 325,332 ----
  CLASSPATH=${CLASSPATH}:$HADOOP_MAPRED_HOME/$MAPRED_DIR'/*'
fi

+ CLASSPATH=$CLASSPATH:$TOOL_PATH
+
# Add the user-specified CLASSPATH via HADOOP_CLASSPATH
# Add it first or last depending on if user has
# set env-var HADOOP_USER_CLASSPATH_FIRST
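
The edit shown in the diff can also be applied by script. The sketch below assumes that the comment line "# Add the user-specified CLASSPATH via HADOOP_CLASSPATH" is present in hadoop-config.sh, as in the diff, and inserts the new line just before it instead of relying on line 327, which may differ between builds. The function name patch_hadoop_config is hypothetical.

```shell
# Hypothetical helper: insert the CLASSPATH line before the HADOOP_CLASSPATH
# comment block, keeping a .bak copy. Running it twice does not add a second line.
patch_hadoop_config() {
  config_file=$1
  marker='# Add the user-specified CLASSPATH via HADOOP_CLASSPATH'
  new_line='CLASSPATH=$CLASSPATH:$TOOL_PATH'

  # Already patched: nothing to do.
  if grep -qxF "$new_line" "$config_file"; then
    return 0
  fi
  # Refuse to guess if the anchor comment is missing.
  grep -qF "$marker" "$config_file" || {
    echo "marker not found in $config_file" >&2
    return 1
  }
  cp "$config_file" "$config_file.bak" || return 1
  awk -v marker="$marker" -v new_line="$new_line" '
    index($0, marker) == 1 && !done { print new_line; print ""; done = 1 }
    { print }
  ' "$config_file.bak" > "$config_file"
}
```

After running patch_hadoop_config "$HADOOP_HOME/libexec/hadoop-config.sh", the output of hadoop classpath should include the tools/lib directory.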

Add the OSS configuration

Edit the core-site.xml file and add the following configuration properties:

Property                          Value                                                Description
fs.oss.endpoint                   e.g. oss-cn-zhangjiakou-internal.aliyuncs.com        Endpoint to connect to
fs.oss.accessKeyId                (your own)                                           Access key ID
fs.oss.accessKeySecret            (your own)                                           Access key secret
fs.oss.impl                       org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem  Hadoop OSS file system implementation class; currently fixed to this value
fs.oss.buffer.dir                 /tmp/oss                                             Temporary file directory
fs.oss.connection.secure.enabled  false                                                Whether to enable HTTPS; set as needed. Enabling HTTPS affects performance
fs.oss.connection.maximum         2048                                                 Number of connections to OSS; set as needed

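Explanations of these parameters can be found in the hadoop-aliyun documentation (see the reference links at the end of this article).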
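
To show how the properties above look inside core-site.xml, the following sketch prints them as an XML fragment that can be pasted between the <configuration> tags. The function name print_oss_properties is hypothetical; the endpoint and keys are passed as arguments, and the other values are taken from the table.

```shell
# Hypothetical helper: print the OSS properties as a core-site.xml fragment.
# Arguments: endpoint, access key ID, access key secret.
# Values are inserted verbatim; no XML escaping is performed.
print_oss_properties() {
  cat <<EOF
  <property>
    <name>fs.oss.endpoint</name>
    <value>$1</value>
  </property>
  <property>
    <name>fs.oss.accessKeyId</name>
    <value>$2</value>
  </property>
  <property>
    <name>fs.oss.accessKeySecret</name>
    <value>$3</value>
  </property>
  <property>
    <name>fs.oss.impl</name>
    <value>org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem</value>
  </property>
  <property>
    <name>fs.oss.buffer.dir</name>
    <value>/tmp/oss</value>
  </property>
  <property>
    <name>fs.oss.connection.secure.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>fs.oss.connection.maximum</name>
    <value>2048</value>
  </property>
EOF
}
```

For example: print_oss_properties oss-cn-zhangjiakou-internal.aliyuncs.com "$KEY_ID" "$KEY_SECRET", where KEY_ID and KEY_SECRET are placeholder variable names for your own credentials.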

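Restart the cluster and verify reading and writing OSS

After adding the configuration, restart the cluster as prompted by CM. Once it has restarted, you can run a test: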

# Test write
hadoop fs -mkdir oss://{your-bucket-name}/hadoop-test
# Test read
hadoop fs -ls oss://{your-bucket-name}/
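
A slightly stronger check than mkdir and ls is a round trip: write a small file to OSS, read it back, and compare the contents. The sketch below uses only the standard hadoop fs subcommands -mkdir, -put, and -cat; the function name oss_roundtrip_check and the file name roundtrip.txt are hypothetical.

```shell
# Hypothetical helper: write a small file to OSS, read it back, and compare.
# Argument: bucket name. Returns 0 only if the content read equals the content written.
oss_roundtrip_check() {
  bucket=$1
  remote_dir="oss://${bucket}/hadoop-test"
  remote_path="${remote_dir}/roundtrip.txt"
  payload="hello oss"
  read_back=""

  local_file=$(mktemp) || return 1
  printf '%s\n' "$payload" > "$local_file"

  # Each step runs only if the previous one succeeded.
  hadoop fs -mkdir -p "$remote_dir" &&
    hadoop fs -put -f "$local_file" "$remote_path" &&
    read_back=$(hadoop fs -cat "$remote_path")
  status=$?
  rm -f "$local_file"

  [ "$status" -eq 0 ] && [ "$read_back" = "$payload" ]
}
```

Usage: oss_roundtrip_check your-bucket-name && echo "OSS round trip OK".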

Run teragen

[root@apache hadoop-2.7.2]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar teragen -Dmapred.map.tasks=100 10995116 oss://{your-bucket-name}/1G-input
19/02/28 16:38:59 INFO client.RMProxy: Connecting to ResourceManager at apache/192.168.0.176:8032
19/02/28 16:39:01 INFO terasort.TeraSort: Generating 10995116 using 100
19/02/28 16:39:01 INFO mapreduce.JobSubmitter: number of splits:100
19/02/28 16:39:01 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/02/28 16:39:01 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1551343125387_0001
19/02/28 16:39:02 INFO impl.YarnClientImpl: Submitted application application_1551343125387_0001
19/02/28 16:39:02 INFO mapreduce.Job: The url to track the job: http://apache:8088/proxy/application_1551343125387_0001/
19/02/28 16:39:02 INFO mapreduce.Job: Running job: job_1551343125387_0001
19/02/28 16:39:09 INFO mapreduce.Job: Job job_1551343125387_0001 running in uber mode : false
19/02/28 16:39:09 INFO mapreduce.Job:  map 0% reduce 0%
19/02/28 16:39:18 INFO mapreduce.Job:  map 1% reduce 0%
19/02/28 16:39:19 INFO mapreduce.Job:  map 2% reduce 0%
19/02/28 16:39:21 INFO mapreduce.Job:  map 4% reduce 0%
19/02/28 16:39:25 INFO mapreduce.Job:  map 5% reduce 0%
19/02/28 16:39:28 INFO mapreduce.Job:  map 6% reduce 0%
19/02/28 16:39:29 INFO mapreduce.Job:  map 7% reduce 0%
19/02/28 16:39:31 INFO mapreduce.Job:  map 8% reduce 0%
19/02/28 16:39:33 INFO mapreduce.Job:  map 9% reduce 0%
19/02/28 16:39:36 INFO mapreduce.Job:  map 10% reduce 0%
19/02/28 16:39:38 INFO mapreduce.Job:  map 11% reduce 0%
19/02/28 16:39:42 INFO mapreduce.Job:  map 13% reduce 0%
19/02/28 16:39:45 INFO mapreduce.Job:  map 14% reduce 0%
19/02/28 16:39:48 INFO mapreduce.Job:  map 15% reduce 0%
19/02/28 16:39:49 INFO mapreduce.Job:  map 16% reduce 0%
19/02/28 16:39:50 INFO mapreduce.Job:  map 17% reduce 0%
19/02/28 16:39:54 INFO mapreduce.Job:  map 18% reduce 0%
19/02/28 16:39:57 INFO mapreduce.Job:  map 19% reduce 0%
19/02/28 16:39:58 INFO mapreduce.Job:  map 20% reduce 0%
19/02/28 16:40:00 INFO mapreduce.Job:  map 21% reduce 0%
19/02/28 16:40:01 INFO mapreduce.Job:  map 22% reduce 0%
19/02/28 16:40:04 INFO mapreduce.Job:  map 23% reduce 0%
19/02/28 16:40:06 INFO mapreduce.Job:  map 24% reduce 0%
19/02/28 16:40:08 INFO mapreduce.Job:  map 25% reduce 0%
19/02/28 16:40:10 INFO mapreduce.Job:  map 26% reduce 0%
19/02/28 16:40:13 INFO mapreduce.Job:  map 27% reduce 0%
19/02/28 16:40:15 INFO mapreduce.Job:  map 28% reduce 0%
19/02/28 16:40:17 INFO mapreduce.Job:  map 29% reduce 0%
19/02/28 16:40:19 INFO mapreduce.Job:  map 30% reduce 0%
19/02/28 16:40:21 INFO mapreduce.Job:  map 31% reduce 0%
19/02/28 16:40:23 INFO mapreduce.Job:  map 32% reduce 0%
19/02/28 16:40:27 INFO mapreduce.Job:  map 33% reduce 0%
19/02/28 16:40:28 INFO mapreduce.Job:  map 34% reduce 0%
19/02/28 16:40:30 INFO mapreduce.Job:  map 35% reduce 0%
19/02/28 16:40:32 INFO mapreduce.Job:  map 36% reduce 0%
19/02/28 16:40:36 INFO mapreduce.Job:  map 37% reduce 0%
19/02/28 16:40:37 INFO mapreduce.Job:  map 38% reduce 0%
19/02/28 16:40:38 INFO mapreduce.Job:  map 39% reduce 0%
19/02/28 16:40:40 INFO mapreduce.Job:  map 40% reduce 0%
19/02/28 16:40:44 INFO mapreduce.Job:  map 41% reduce 0%
19/02/28 16:40:46 INFO mapreduce.Job:  map 42% reduce 0%
19/02/28 16:40:47 INFO mapreduce.Job:  map 43% reduce 0%
19/02/28 16:40:49 INFO mapreduce.Job:  map 44% reduce 0%
19/02/28 16:40:52 INFO mapreduce.Job:  map 45% reduce 0%
19/02/28 16:40:55 INFO mapreduce.Job:  map 46% reduce 0%
19/02/28 16:40:56 INFO mapreduce.Job:  map 47% reduce 0%
19/02/28 16:40:57 INFO mapreduce.Job:  map 48% reduce 0%
19/02/28 16:41:00 INFO mapreduce.Job:  map 49% reduce 0%
19/02/28 16:41:04 INFO mapreduce.Job:  map 50% reduce 0%
19/02/28 16:41:05 INFO mapreduce.Job:  map 51% reduce 0%
19/02/28 16:41:06 INFO mapreduce.Job:  map 52% reduce 0%
19/02/28 16:41:08 INFO mapreduce.Job:  map 53% reduce 0%
19/02/28 16:41:12 INFO mapreduce.Job:  map 54% reduce 0%
19/02/28 16:41:14 INFO mapreduce.Job:  map 55% reduce 0%
19/02/28 16:41:16 INFO mapreduce.Job:  map 56% reduce 0%
19/02/28 16:41:17 INFO mapreduce.Job:  map 57% reduce 0%
19/02/28 16:41:21 INFO mapreduce.Job:  map 58% reduce 0%
19/02/28 16:41:23 INFO mapreduce.Job:  map 59% reduce 0%
19/02/28 16:41:25 INFO mapreduce.Job:  map 60% reduce 0%
19/02/28 16:41:27 INFO mapreduce.Job:  map 61% reduce 0%
19/02/28 16:41:29 INFO mapreduce.Job:  map 62% reduce 0%
19/02/28 16:41:31 INFO mapreduce.Job:  map 63% reduce 0%
19/02/28 16:41:34 INFO mapreduce.Job:  map 64% reduce 0%
19/02/28 16:41:37 INFO mapreduce.Job:  map 65% reduce 0%
19/02/28 16:41:38 INFO mapreduce.Job:  map 66% reduce 0%
19/02/28 16:41:39 INFO mapreduce.Job:  map 67% reduce 0%
19/02/28 16:41:42 INFO mapreduce.Job:  map 68% reduce 0%
19/02/28 16:41:45 INFO mapreduce.Job:  map 69% reduce 0%
19/02/28 16:41:46 INFO mapreduce.Job:  map 70% reduce 0%
19/02/28 16:41:48 INFO mapreduce.Job:  map 71% reduce 0%
19/02/28 16:41:50 INFO mapreduce.Job:  map 72% reduce 0%
19/02/28 16:41:53 INFO mapreduce.Job:  map 73% reduce 0%
19/02/28 16:41:54 INFO mapreduce.Job:  map 74% reduce 0%
19/02/28 16:41:56 INFO mapreduce.Job:  map 75% reduce 0%
19/02/28 16:41:58 INFO mapreduce.Job:  map 76% reduce 0%
19/02/28 16:42:00 INFO mapreduce.Job:  map 77% reduce 0%
19/02/28 16:42:04 INFO mapreduce.Job:  map 78% reduce 0%
19/02/28 16:42:05 INFO mapreduce.Job:  map 79% reduce 0%
19/02/28 16:42:08 INFO mapreduce.Job:  map 80% reduce 0%
19/02/28 16:42:10 INFO mapreduce.Job:  map 81% reduce 0%
19/02/28 16:42:12 INFO mapreduce.Job:  map 82% reduce 0%
19/02/28 16:42:14 INFO mapreduce.Job:  map 83% reduce 0%
19/02/28 16:42:17 INFO mapreduce.Job:  map 84% reduce 0%
19/02/28 16:42:18 INFO mapreduce.Job:  map 85% reduce 0%
19/02/28 16:42:19 INFO mapreduce.Job:  map 86% reduce 0%
19/02/28 16:42:21 INFO mapreduce.Job:  map 87% reduce 0%
19/02/28 16:42:25 INFO mapreduce.Job:  map 88% reduce 0%
19/02/28 16:42:27 INFO mapreduce.Job:  map 89% reduce 0%
19/02/28 16:42:28 INFO mapreduce.Job:  map 90% reduce 0%
19/02/28 16:42:29 INFO mapreduce.Job:  map 91% reduce 0%
19/02/28 16:42:33 INFO mapreduce.Job:  map 92% reduce 0%
19/02/28 16:42:35 INFO mapreduce.Job:  map 93% reduce 0%
19/02/28 16:42:36 INFO mapreduce.Job:  map 94% reduce 0%
19/02/28 16:42:38 INFO mapreduce.Job:  map 95% reduce 0%
19/02/28 16:42:41 INFO mapreduce.Job:  map 96% reduce 0%
19/02/28 16:42:44 INFO mapreduce.Job:  map 97% reduce 0%
19/02/28 16:42:45 INFO mapreduce.Job:  map 98% reduce 0%
19/02/28 16:42:46 INFO mapreduce.Job:  map 99% reduce 0%
19/02/28 16:42:48 INFO mapreduce.Job:  map 100% reduce 0%
19/02/28 16:43:11 INFO mapreduce.Job: Job job_1551343125387_0001 completed successfully
19/02/28 16:43:12 INFO mapreduce.Job: Counters: 37
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=11931190
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=8497
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=100
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=0
        OSS: Number of bytes read=0
        OSS: Number of bytes written=1099511600
        OSS: Number of read operations=1100
        OSS: Number of large read operations=0
        OSS: Number of write operations=500
    Job Counters
        Killed map tasks=1
        Launched map tasks=100
        Other local map tasks=100
        Total time spent by all maps in occupied slots (ms)=704048
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=704048
        Total vcore-milliseconds taken by all map tasks=704048
        Total megabyte-milliseconds taken by all map tasks=720945152
    Map-Reduce Framework
        Map input records=10995116
        Map output records=10995116
        Input split bytes=8497
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=22387
        CPU time spent (ms)=224980
        Physical memory (bytes) snapshot=19855642624
        Virtual memory (bytes) snapshot=212926672896
        Total committed heap usage (bytes)=11358175232
    org.apache.hadoop.examples.terasort.TeraGen$Counters
        CHECKSUM=23608744984763050
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=1099511600
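
As a sanity check on the counters above: TeraGen writes 100-byte rows, so the row count passed on the command line determines the number of bytes written to OSS. A minimal sketch of that arithmetic (teragen_bytes is a hypothetical helper name):

```shell
# TeraGen rows are 100 bytes each, so total output is rows * 100.
teragen_bytes() {
  echo $(( $1 * 100 ))
}

rows=10995116
bytes=$(teragen_bytes "$rows")
echo "expected bytes: $bytes"                   # expected bytes: 1099511600
echo "approx MiB: $(( bytes / 1024 / 1024 ))"   # approx MiB: 1048
```

The result matches both "OSS: Number of bytes written" and "Bytes Written" in the job output.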

References

https://yq.aliyun.com/articles/292792?spm=a2c4e.11155435.0.0.7ccba82fbDwfhK

https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aliyun/src/site/markdown/tools/hadoop-aliyun/index.md
