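The problem

One of the guarantees MapReduce normally gives is that the data arriving at a Reducer is always sorted by the Reducer's input key. With a single Reducer the sort is straightforward, but jobs rarely run with just one Reducer. With multiple Reducers, the framework only guarantees that the records inside each Reducer are sorted by key; it makes no guarantee about the order across Reducers. The result can look like this:
reducer1: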

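Solving the total-sort problem

The trick to total sorting lies in the Partitioner implementation. We map the range of key values to an index (0-25); in this example the keys are English words. We then decide how to split the indexes into ranges and assign each range to the corresponding reducer.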

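Solution

Suppose we have 2 reducers available. We can simply assign (0-12) to the first reducer and (13-25) to the other.
Note that this indexes keys by their first letter only. Now suppose we have 30 reducers: if the index range is still determined by the first letter alone, 4 of the reducers are wasted, because there are only 26 indexes. In that case we need to redefine the index range and the way the index is computed. For example, using the first two letters, aa maps to 0×26^1 + 0×26^0 = 0, ab maps to 0×26^1 + 1×26^0 = 1, and so on up to zz = 675. We then assign the first thirtieth of the index range to reducer0, the next thirtieth to reducer1, and so on. If the range does not divide evenly, we could hand the leftover indexes to the last reducer, but that is not the best option: the last reducer may end up with too much data, which hurts the performance of that task. A better approach is: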
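Suppose the index range is (0, 82) and we have 30 reducers. Each reducer gets 83/30 = 2 indexes, with 23 left over. Under the naive scheme, reducer29 would have to handle the whole range (58, 82), that is 25 indexes, which is clearly unbalanced. Instead, we hand out the 23 leftover indexes one at a time to reducer0 through reducer22, so that no reducer holds more than 3 indexes.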
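The balanced split described above can be sketched roughly as follows. This is only an illustration in plain Java, not part of the job further down; the class name `BalancedRanges` and the method `reducerFor` are made up for this example, and the numbers 83 and 30 are the ones from the paragraph above.

```java
import java.util.Arrays;

final class BalancedRanges {
    // Returns the reducer that owns `index` when `indexRange` indexes are split
    // over `numReducers` reducers and the first (indexRange % numReducers)
    // reducers each take one extra index. (Hypothetical helper for illustration.)
    static int reducerFor(int index, int indexRange, int numReducers) {
        int base = indexRange / numReducers;   // 83 / 30 = 2
        int extra = indexRange % numReducers;  // 83 % 30 = 23
        int boundary = extra * (base + 1);     // indexes below this sit in the larger ranges
        if (index < boundary) {
            return index / (base + 1);
        }
        return extra + (index - boundary) / base;
    }

    public static void main(String[] args) {
        // Count how many indexes each of the 30 reducers receives.
        int[] counts = new int[30];
        for (int index = 0; index < 83; index++) {
            counts[reducerFor(index, 83, 30)]++;
        }
        System.out.println(Arrays.toString(counts));
    }
}
```

With these numbers, reducer0 through reducer22 each hold 3 indexes and the remaining seven reducers hold 2. Because the ranges stay contiguous and in ascending order, the output files are still globally ordered.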

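The next question is:
We may want to set the index range of the reducer input keys dynamically. To do that, our partitioner implements the Configurable interface. During initialization the Hadoop framework loads our custom Partitioner; when it instantiates the class through reflection, it checks whether the instance is Configurable, and if so it calls setConf and passes in the job's Configuration object. The Partitioner can then read the configured values.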
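To make the mechanism concrete, here is a rough stand-alone sketch of the "instantiate by reflection, then call setConf if the object is configurable" pattern. It deliberately does not use Hadoop: `SimpleConfigurable`, `RangePartitioner` and `newInstance` are hypothetical stand-ins invented for this example, and a plain `Map` plays the role of the job's Configuration, so the snippet runs without Hadoop on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

final class ConfigurableSketch {
    // Hypothetical stand-in for Hadoop's Configurable interface.
    interface SimpleConfigurable {
        void setConf(Map<String, String> conf);
    }

    // Hypothetical partitioner-like class that reads its index range from the config.
    static class RangePartitioner implements SimpleConfigurable {
        int indexRange = 26 * 26; // value used until setConf is called

        @Override
        public void setConf(Map<String, String> conf) {
            indexRange = Integer.parseInt(conf.getOrDefault("key.indexRange", "676"));
        }
    }

    // Creates the object through its no-arg constructor, then injects the
    // configuration only if the object declares that it wants one.
    static <T> T newInstance(Class<T> cls, Map<String, String> conf) {
        try {
            T instance = cls.getDeclaredConstructor().newInstance();
            if (instance instanceof SimpleConfigurable) {
                ((SimpleConfigurable) instance).setConf(conf);
            }
            return instance;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + cls.getName(), e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("key.indexRange", "26");
        RangePartitioner partitioner = newInstance(RangePartitioner.class, conf);
        System.out.println(partitioner.indexRange);
    }
}
```

The point of the pattern is that the class being created needs no special constructor: it only has to implement the interface, and whoever creates it takes care of handing over the configuration.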

Java implementation

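import java.io.IOException;

import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class GlobalSort {
    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();

        // Index range read by the partitioner: 26 means "index by first letter only".
        configuration.set("key.indexRange","26");

        Job job = Job.getInstance(configuration);
        job.setNumReduceTasks(2);
        job.setJarByClass(GlobalSort.class);
        job.setMapperClass(GlobalSortMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        job.setReducerClass(GlobalSortReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        // The custom partitioner is what makes the output globally ordered.
        job.setPartitionerClass(GlobalSortPartitioner.class);

        FileInputFormat.setInputPaths(job,new Path("F:\\wc\\input"));
        FileOutputFormat.setOutputPath(job,new Path("F:\\wc\\output"));
        // Report success or failure through the process exit code.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}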

class GlobalSortMapper extends Mapper<LongWritable,Text,Text,LongWritable>{
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // value holds one line of input; split it into words on spaces
        String[] splits = value.toString().split(" ");
        for(String str : splits){
            // Skip empty tokens (e.g. from consecutive spaces); the partitioner needs a first letter.
            if(str.isEmpty()){
                continue;
            }
            context.write(new Text(str), new LongWritable(1L));
        }
    }
}


class GlobalSortPartitioner  extends Partitioner<Text,LongWritable> implements Configurable {

    private Configuration configuration = null;
    private int indexRange = 0;

    @Override
    public int getPartition(Text text, LongWritable longWritable, int numPartitions) {
        // Keys are assumed to be non-empty words made of lowercase letters a-z.
        char[] chars = text.toString().toCharArray();
        int index = 0;
        if(indexRange==26){
            // A range of 26 means the index is derived from the first letter only.
            index = chars[0]-'a';
        }else if(indexRange == 26*26 ){
            // Here the index is derived from the first two letters.
            if (chars.length==1){
                // A one-letter key is indexed as if its second letter were 'a'.
                index = (chars[0]-'a')*26;
            }else{
                index = (chars[0]-'a')*26+(chars[1]-'a');
            }
        }
        if(indexRange<numPartitions){
            // Fewer indexes than reducers: each index gets its own reducer.
            // Valid partitions are 0..numPartitions-1, so numPartitions itself must never be returned.
            return index;
        }
        int perReducerCount = indexRange/numPartitions;

        for(int i = 0;i<numPartitions;i++){
            int min = i*perReducerCount;
            int max = (i+1)*perReducerCount-1;
            if(index>=min && index<=max){
                return i;
            }
        }
        // Leftover indexes all go to the last reducer (the first, less balanced scheme).
        return numPartitions-1;

    }

    @Override
    public void setConf(Configuration conf) {
        this.configuration = conf;
        indexRange = configuration.getInt("key.indexRange",26*26);
    }

    @Override
    public Configuration getConf() {
        return configuration;
    }
}

class GlobalSortReducer extends Reducer<Text,LongWritable,Text,LongWritable>{
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
        long count = 0;
        for(LongWritable value : values){
            count += value.get();
        }
        context.write(key, new LongWritable(count));
    }
}

Input file:

hello a
hello abc
helo jyw
he lq
mo no m n
zz za

Output:

part-r-00000
a   1
abc 1
he  1
hello   2
helo    1
jyw 1
lq  1
m   1
mo  1

part-r-00001
n   1
no  1
za  1
zz  1
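To see why the words land in these two files, the partitioning can be replayed without Hadoop. The sketch below is an approximation in plain Java: `PartitionSimulation`, `partition` and `run` are names invented for this example, a `TreeMap` stands in for the sorted, grouped reducer input, and it assumes lowercase words and no more than 26 partitions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class PartitionSimulation {
    // Same first-letter rule as the job above: 26 indexes cut into equal ranges,
    // with any leftover indexes going to the last partition.
    static int partition(String word, int numPartitions) {
        int index = word.charAt(0) - 'a';
        int perReducer = 26 / numPartitions;
        return Math.min(index / perReducer, numPartitions - 1);
    }

    // Counts words per partition; each TreeMap keeps its keys sorted,
    // standing in for the sorted input one reducer would see.
    static List<TreeMap<String, Long>> run(String[] lines, int numPartitions) {
        List<TreeMap<String, Long>> parts = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            parts.add(new TreeMap<>());
        }
        for (String line : lines) {
            for (String word : line.split(" ")) {
                parts.get(partition(word, numPartitions)).merge(word, 1L, Long::sum);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        String[] lines = {"hello a", "hello abc", "helo jyw", "he lq", "mo no m n", "zz za"};
        List<TreeMap<String, Long>> parts = run(lines, 2);
        for (int i = 0; i < parts.size(); i++) {
            System.out.println("partition " + i + ": " + parts.get(i));
        }
    }
}
```

With 2 partitions each one covers 13 letters, so everything starting with a through m goes to partition 0 and n through z to partition 1. Reading the partitions in order therefore gives one globally sorted list.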

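How to determine the index range

List every value the key can take; the number of distinct values is the number of indexes.
If the set of possible key values is unbounded, then, as in this example, pick some part of the key and enumerate all the values that part can take (values that differ in sort order).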
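For word keys, "some part of the key" can be the first k letters, which gives 26^k possible indexes. A minimal sketch of that idea follows, assuming lowercase a-z keys; `PrefixIndex`, `prefixIndex` and `indexRange` are hypothetical names for this example, and treating a missing letter as 'a' is a choice made here, not something the job above prescribes.

```java
final class PrefixIndex {
    // Number of distinct indexes when the first k letters are used: 26^k.
    static int indexRange(int k) {
        int range = 1;
        for (int i = 0; i < k; i++) {
            range *= 26;
        }
        return range;
    }

    // Index of a word based on its first k letters, read as a base-26 number.
    // A word shorter than k letters is padded as if the missing letters were 'a'.
    static int prefixIndex(String word, int k) {
        int index = 0;
        for (int i = 0; i < k; i++) {
            int letter = i < word.length() ? word.charAt(i) - 'a' : 0;
            index = index * 26 + letter;
        }
        return index;
    }

    public static void main(String[] args) {
        System.out.println(indexRange(2));         // 676
        System.out.println(prefixIndex("aa", 2));  // 0
        System.out.println(prefixIndex("ab", 2));  // 1
        System.out.println(prefixIndex("zz", 2));  // 675
    }
}
```

What matters for total sorting is that the index never decreases as the keys increase in sort order; the padding choice keeps that property for lowercase words, since "a" sorts before "ab" and both get an index no larger than that of "b".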

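Summary

step1: Implement the sort order in the compareTo method of the WritableComparable key, or write a custom Comparator that does the comparison
step2: Define a method that converts a Reducer key instance into an index value
step3: Implement a custom Partitioner
Know the full index range of the Reducer keys
Use the key's index to assign each instance to the corresponding Reducer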
