Passing parameters in Hadoop Streaming

In Hadoop Streaming, when running a map/reduce job, we may want to read some runtime parameters to learn the status of the job. Many of these parameters, both job configuration and runtime values, can be obtained from os.environ in Python, e.g., the name of the input split file or the job ID of the mapred tasks.

os.environ is a dictionary-like object that holds the environment
variables. Hadoop passes these parameters to each map/reduce task by
setting environment variables for the task on each host.

In the map or reduce step, we can read a value with os.environ.get().
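
As a minimal sketch of such a lookup: Hadoop Streaming exposes job configuration keys as environment variables with the dots replaced by underscores, so mapred.job.id is read as mapred_job_id. The exact variable names depend on the Hadoop version, so the names below are assumptions to check against your cluster, and get_job_param and describe_task are hypothetical helpers, not part of Hadoop.

```python
import os

# Candidate variable names, newest first. These are assumptions: which
# ones are actually set depends on the Hadoop version in use.
INPUT_FILE_VARS = ("mapreduce_map_input_file", "map_input_file")
JOB_ID_VARS = ("mapreduce_job_id", "mapred_job_id")


def get_job_param(names, default=None, environ=None):
    """Return the first variable in `names` that is set, else `default`."""
    env = os.environ if environ is None else environ
    for name in names:
        value = env.get(name)  # .get() returns None instead of raising KeyError
        if value is not None:
            return value
    return default


def describe_task(environ=None):
    """Collect the input split file and job id for the current task."""
    return {
        "input_file": get_job_param(INPUT_FILE_VARS, "unknown", environ),
        "job_id": get_job_param(JOB_ID_VARS, "unknown", environ),
    }
```

Using os.environ.get() rather than os.environ[...] keeps the script runnable outside Hadoop, for example when testing the mapper locally with a pipe, because a missing variable yields the default instead of an exception.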

You can also pass configuration parameters explicitly to your script on the command line, e.g., -mapper "python map.py -i 1".
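
A sketch of how map.py might read that -i option with argparse follows. The source does not say what -i means, so treating it as an integer that is prefixed to every output record is purely an assumption for illustration; parse_args, run_mapper and main are hypothetical names.

```python
import argparse
import sys


def parse_args(argv=None):
    """Parse the mapper's command-line options (argv=None uses sys.argv)."""
    parser = argparse.ArgumentParser(description="Streaming mapper sketch")
    # The meaning of -i is an assumption; here it is just an integer tag.
    parser.add_argument("-i", type=int, default=0, help="hypothetical integer option")
    return parser.parse_args(argv)


def run_mapper(lines, i):
    """Yield one tab-separated key/value record per non-empty input line."""
    for line in lines:
        line = line.rstrip("\n")
        if line:
            yield f"{i}\t{line}"


def main(argv=None, stdin=None):
    """Read records from stdin and print the mapped records to stdout."""
    args = parse_args(argv)
    source = sys.stdin if stdin is None else stdin
    for record in run_mapper(source, args.i):
        print(record)
```

In the real map.py the file would end with a call to main(), so that -mapper "python map.py -i 1" runs it with i set to 1. Keeping the parsing and the mapping in separate functions makes both easy to check locally before submitting the job.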
