Understanding the Partitioner in Hadoop

Original

s20082043

2020-02-26 06:05

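The <key, value> pairs finally emitted by a Mapper must be sent to Reducers to be merged, and all pairs with the same key go to the same Reducer. Which Reducer handles which key is decided by the Partitioner. Partitioner is an abstract class, defined as follows: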

  public abstract class Partitioner<KEY, VALUE> {

    /**
     * Get the partition number for a given key (hence record) given the total
     * number of partitions i.e. number of reduce-tasks for the job.
     *
     * <p>Typically a hash function on a all or a subset of the key.</p>
     *
     * @param key the key to be partioned.
     * @param value the entry value.
     * @param numPartitions the total number of partitions.
     * @return the partition number for the <code>key</code>.
     */
    public abstract int getPartition(KEY key, VALUE value, int numPartitions);

  }

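The input is a <key, value> pair from the map output together with the number of Reducers; the output is the integer index of the assigned Reducer. In other words, it decides which Reducer each Mapper output pair is sent to. The system default Partitioner is HashPartitioner, which takes the key's hash value modulo the number of Reducers to pick the Reducer. This guarantees that pairs with the same key are always assigned to the same Reducer. With N Reducers, the indices are 0, 1, 2, ..., N-1.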
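To make the modulo rule concrete, here is a minimal standalone sketch that needs no Hadoop dependency. The class name HashPartitionDemo and the helper partitionFor are hypothetical names chosen for illustration; only the arithmetic mirrors what HashPartitioner does. It also shows why the hash is masked with Integer.MAX_VALUE first: Java's % operator can return a negative result for a negative hash code, which would not be a valid Reducer index.

```java
public class HashPartitionDemo {

    // Hypothetical helper: the same arithmetic as HashPartitioner.getPartition,
    // written without any Hadoop types so it can run on its own.
    static int partitionFor(Object key, int numReduceTasks) {
        // Clearing the sign bit keeps the result in [0, numReduceTasks).
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 4;

        // Equal keys always land in the same partition.
        String[] keys = {"apple", "banana", "apple", "cherry"};
        for (String key : keys) {
            System.out.println(key + " -> reducer " + partitionFor(key, numReduceTasks));
        }

        // An Integer's hashCode is its own value, so this key has a negative hash.
        Integer negativeKey = -7;
        // Without the mask, -7 % 4 evaluates to -3 in Java: not a valid index.
        System.out.println("unmasked: " + (negativeKey.hashCode() % numReduceTasks));
        System.out.println("masked:   " + partitionFor(negativeKey, numReduceTasks));
    }
}
```

Note that this scheme balances load only as well as the keys' hash codes are spread; a few very frequent keys still end up on a single Reducer.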

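JobContext.java selects the Partitioner class for a job as follows, falling back to HashPartitioner when none is configured: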

  /**
   * Get the {@link Partitioner} class for the job.
   *
   * @return the {@link Partitioner} class for the job.
   */
  @SuppressWarnings("unchecked")
  public Class<? extends Partitioner<?,?>> getPartitionerClass()
      throws ClassNotFoundException {
    return (Class<? extends Partitioner<?,?>>)
        conf.getClass(PARTITIONER_CLASS_ATTR, HashPartitioner.class);
  }

The system default, HashPartitioner.java, is implemented as follows:

  public class HashPartitioner<K, V> extends Partitioner<K, V> {
    /** Use {@link Object#hashCode()} to partition. */
    public int getPartition(K key, V value,
                            int numReduceTasks) {
      return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
  }

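You can extend the Partitioner abstract class to implement your own Partitioner, for example MyPartitioner, and register it with job.setPartitionerClass(MyPartitioner.class); so that the job uses it. Note that the method takes the class itself, not an instance.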
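A custom Partitioner can be sketched as follows. This is an illustration under stated assumptions, not code from Hadoop: PrefixPartitioner and its "prefix:rest" key format are hypothetical, and the nested Partitioner class is only a stand-in with the same method shape as org.apache.hadoop.mapreduce.Partitioner so that the sketch compiles without Hadoop on the classpath. In a real job you would extend the Hadoop class directly and would typically use Writable types such as Text and IntWritable instead of String and Integer.

```java
public class CustomPartitionerSketch {

    // Stand-in mirroring the shape of org.apache.hadoop.mapreduce.Partitioner.
    // Assumption: in a real job, import and extend the Hadoop class instead.
    public abstract static class Partitioner<KEY, VALUE> {
        public abstract int getPartition(KEY key, VALUE value, int numPartitions);
    }

    // Hypothetical partitioner: keys look like "prefix:rest", and every key
    // sharing a prefix is routed to the same reducer.
    public static class PrefixPartitioner extends Partitioner<String, Integer> {
        @Override
        public int getPartition(String key, Integer value, int numPartitions) {
            int sep = key.indexOf(':');
            // Fall back to the whole key when there is no separator.
            String prefix = sep >= 0 ? key.substring(0, sep) : key;
            // Same non-negative hash-and-modulo step as HashPartitioner.
            return (prefix.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }

    public static void main(String[] args) {
        PrefixPartitioner partitioner = new PrefixPartitioner();
        int numPartitions = 3;
        String[] keys = {"asia:001", "asia:002", "europe:001"};
        for (String key : keys) {
            System.out.println(key + " -> reducer "
                + partitioner.getPartition(key, 1, numPartitions));
        }
    }
}
```

With the real Hadoop class as the parent, such a partitioner would be registered in the driver with job.setPartitionerClass(PrefixPartitioner.class);. The value argument is ignored here, which is common: the partition normally depends only on the key so that equal keys stay together.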
