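Background: given a batch of card numbers, extract the transaction records for those cards from the transaction logs, which are stored as LZO-compressed text.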
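Without the cluster, the requirement is a semi-join: keep a transaction line only if its card number appears in the given list. Below is a minimal single-machine sketch of that semantics. It assumes, hypothetically, one card number per line in the list and tab-separated transaction lines with the card number in the first column. The real record layout is not given here, and the sample card numbers are made up.

```python
def extract_by_cards(card_lines, txn_lines):
    """Return the transaction lines whose card number appears in card_lines.

    Assumed (hypothetical) layouts: one card number per line in card_lines;
    tab-separated transaction lines with the card number in column 0.
    """
    # A set gives constant-time membership checks, but this only works while
    # the whole card list fits in one process's memory.
    wanted = {line.strip() for line in card_lines if line.strip()}
    result = []
    for line in txn_lines:
        line = line.rstrip("\n")
        if line and line.split("\t", 1)[0] in wanted:
            result.append(line)
    return result


# Made-up sample data in the assumed layout.
cards = ["6222000011112222\n", "6222000033334444\n"]
txns = [
    "6222000011112222\t20140101\t100.00\n",
    "6222000099998888\t20140101\t5.00\n",
    "6222000033334444\t20140102\t42.50\n",
]
for row in extract_by_cards(cards, txns):
    print(row)
```

This version is useful as a correctness oracle when checking the cluster job's output on a small sample.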
Simulated implementation:
The launch script (job configuration) is as follows:
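#!/bin/bash
# Usage: <this script> <input_path_1> <input_path_2>
# Remove the previous output first: the job fails if the output directory already exists.
hadoop fs -rmr card_match_2/output
# The map output key is split on "_" and partitioned on its first field only,
# so all records sharing that first field go to the same reducer.
# ignoreKey passes only the line text to map.py, not the byte-offset key
# produced by the LZO input format (streaming drops it automatically only for TextInputFormat).
hadoop jar ~/koulb/softjar/hadoop-streaming-2.0.0-mr1-cdh4.7.0.jar \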
-D map.output.key.field.separator=_ \
-D num.key.fields.for.partition=1 \
-D stream.map.input.ignoreKey=true \
-partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner \
-inputformat com.hadoop.mapred.DeprecatedLzoTextInputFormat \
-input "${1}" \
-input "${2}" \
-output card_match_2/output \
-file map.py \
-file red.py \
-mapper "python map.py" \
-reducer "python red.py" \
-jobconf mapred.job.priority=VERY_HIGH \
-jobconf mapred.reduce.tasks=40 \
-jobconf mapred.job.name="card_match"
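The bodies of map.py and red.py are not shown. Given the partitioner settings above, one plausible design is a reduce-side join. This is an assumption, not the author's confirmed code. The mapper tags each card-list line with the key `<card>_0` and each transaction line with `<card>_1`. KeyFieldBasedPartitioner splits the key on `_` and partitions on the first field only, so a card's marker and its transactions reach the same reducer, and the default key sort puts `_0` before `_1`. The reducer then keeps transactions whose card was just announced by a marker.

The sketch makes several assumptions:

- The two inputs (presumably the card list and the transaction logs passed as `${1}` and `${2}`) are told apart by line shape, with a single field meaning a card-list line.
- Transaction lines are tab-separated with the card number first.
- Card numbers contain no `_`.

`run_local` and `partition` are hypothetical helpers that imitate the map, partition, sort and reduce steps on one machine, with `zlib.crc32` standing in for Hadoop's hash.

```python
import zlib


def map_lines(lines):
    """Mapper body: map.py would apply this to sys.stdin.

    Emits "<card>_0\t" for a card-list line (a single field) and
    "<card>_1\t<line>" for a transaction line. Assumes cards never contain "_".
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        fields = line.split("\t")
        card = fields[0].strip()
        if len(fields) == 1:
            yield "%s_0\t" % card              # marker: this card was requested
        else:
            yield "%s_1\t%s" % (card, line)    # candidate transaction record


def reduce_lines(lines):
    """Reducer body: red.py would apply this to sys.stdin.

    Input arrives sorted by key, so a card's "_0" marker precedes its "_1"
    records; a "_1" record is kept only if its card matches the last marker.
    """
    wanted = None
    for line in lines:
        key, _, value = line.rstrip("\n").partition("\t")
        card, _, tag = key.rpartition("_")
        if tag == "0":
            wanted = card
        elif tag == "1" and card == wanted:
            yield value


def partition(key, num_reducers):
    """Hash only the first "_"-separated key field (crc32 is a stand-in hash)."""
    first_field = key.split("_", 1)[0]
    return (zlib.crc32(first_field.encode("utf-8")) & 0x7FFFFFFF) % num_reducers


def run_local(card_lines, txn_lines, num_reducers=40):
    """Hypothetical local run: map, partition, sort each partition by key, reduce."""
    buckets = [[] for _ in range(num_reducers)]
    for record in map_lines(list(card_lines) + list(txn_lines)):
        key = record.split("\t", 1)[0]
        buckets[partition(key, num_reducers)].append(record)
    output = []
    for bucket in buckets:
        bucket.sort(key=lambda r: r.split("\t", 1)[0])  # sort on the key only
        output.extend(reduce_lines(bucket))
    return output


print(run_local(["6222000011112222"],
                ["6222000011112222\t20140101\t100.00",
                 "6222000099998888\t20140101\t5.00"]))
```

In the job itself, map.py would loop `for out in map_lines(sys.stdin): sys.stdout.write(out + "\n")`, and red.py would do the same with `reduce_lines`. Partitioning on the card field alone, rather than on the whole key, is what keeps each card's marker and transactions together.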