Writable subinterfaces:
Hadoop introduces the org.apache.hadoop.io.Writable interface, which every serializable object in Hadoop must implement.
In Hadoop 2.7.1, Writable has six subinterfaces:
Counter, CounterGroup, CounterGroupBase&lt;T&gt;, InputSplit, InputSplitWithLocationInfo, WritableComparable&lt;T&gt;
One of them lives in org.apache.hadoop.io: WritableComparable&lt;T&gt;.
As the name suggests, WritableComparable adds the ability to compare types: WritableComparables can be compared to each other, typically via Comparators. It was designed mainly for MapReduce, where the intermediate sort phase is essential; in the Hadoop MapReduce framework, any type used as a key must implement this interface. WritableComparable is serializable, so it likewise carries the two serialization/deserialization methods readFields() and write(); on top of that it adds comparability, so an implementation must also provide compareTo(), which encodes the comparison and sorting rules. A class implementing WritableComparable can therefore serve as an MR key that is both serializable and comparable. If a type is used only as a value, implementing Writable alone is sufficient. The WritableComparable interface is examined in more detail below.
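The write()/readFields()/compareTo() contract described above can be sketched in plain Java. To keep the example self-contained and runnable without Hadoop on the classpath, hypothetical local interfaces (MyWritable, MyWritableComparable) stand in for the real org.apache.hadoop.io ones; the method signatures match Hadoop's.

```java
import java.io.*;

// Hypothetical stand-ins for org.apache.hadoop.io.Writable and
// WritableComparable; the method signatures mirror the Hadoop interfaces.
interface MyWritable {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

interface MyWritableComparable<T> extends MyWritable, Comparable<T> {}

class IntPairKey implements MyWritableComparable<IntPairKey> {
  private int counter;
  private long timestamp;

  IntPairKey() {}  // no-arg constructor, needed so readFields() can repopulate an empty object
  IntPairKey(int c, long t) { counter = c; timestamp = t; }

  public void write(DataOutput out) throws IOException {
    out.writeInt(counter);
    out.writeLong(timestamp);
  }

  public void readFields(DataInput in) throws IOException {
    counter = in.readInt();
    timestamp = in.readLong();
  }

  public int compareTo(IntPairKey o) {  // defines the sort order of keys
    int c = Integer.compare(counter, o.counter);
    return c != 0 ? c : Long.compare(timestamp, o.timestamp);
  }

  int counter() { return counter; }
}

public class WritableSketch {
  public static void main(String[] args) throws IOException {
    // Round-trip: serialize with write(), restore with readFields().
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    new IntPairKey(42, 7L).write(new DataOutputStream(buf));

    IntPairKey restored = new IntPairKey();
    restored.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));

    System.out.println(restored.counter());  // 42
    System.out.println(new IntPairKey(1, 0L).compareTo(new IntPairKey(2, 0L)) < 0);  // true
  }
}
```

This is only a sketch of the contract: in a real job the class would implement the actual org.apache.hadoop.io.WritableComparable, and the framework, not your code, would call write()/readFields() while shuffling keys.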
There is also a related interface in org.apache.hadoop.io: WritableFactory.
The same package contains an implementation class: WritableFactories.
Their purpose is to register all Writable types with WritableFactories for unified management, and to construct the registered Writable objects through it. This centralized approach mainly makes the system's available Writable types easy to discover: in a large system, with Writable types scattered everywhere, it is not always obvious whether a given type is a Writable. Through a WritableFactory you can obtain the Writable object you want, and the Writable objects the factory produces may be used, for example, inside ObjectWritable's readFields() method.
Writable implementation classes:
Writable, of course, has a large set of implementation classes:
AbstractCounters, org.apache.hadoop.security.token.delegation.AbstractDelegationTokenIdentifier, AbstractMapWritable, AccessControlList, AggregatedLogFormat.LogKey, AMRMTokenIdentifier, ArrayPrimitiveWritable, ArrayWritable, BloomFilter, BooleanWritable, BytesWritable, ByteWritable, ClientToAMTokenIdentifier, ClusterMetrics, ClusterStatus, CombineFileSplit, CombineFileSplit, CompositeInputSplit, CompositeInputSplit, CompressedWritable, Configuration, ContainerTokenIdentifier, ContentSummary, Counters, Counters, Counters.Counter, Counters.Group, CountingBloomFilter, DoubleWritable, DynamicBloomFilter, EnumSetWritable, FileChecksum, FileSplit, FileSplit, FileStatus, org.apache.hadoop.util.bloom.Filter, FloatWritable, FsPermission, FsServerDefaults, FsStatus, GenericWritable, ID, ID, IntWritable, JobConf, JobID, JobID, JobQueueInfo, JobStatus, JobStatus, LocatedFileStatus, LongWritable, MapWritable, MD5Hash, MultiFileSplit, NMTokenIdentifier, NullWritable, ObjectWritable, QueueAclsInfo, QueueInfo, Record, RecordTypeInfo, RetouchedBloomFilter, RMDelegationTokenIdentifier, ShortWritable, SortedMapWritable, TaskAttemptID, TaskAttemptID, TaskCompletionEvent, TaskCompletionEvent, TaskID, TaskID, org.apache.hadoop.mapreduce.TaskReport, TaskReport, TaskTrackerInfo, Text, TimelineDelegationTokenIdentifier, org.apache.hadoop.security.token.TokenIdentifier, TupleWritable, TupleWritable, TwoDArrayWritable, VersionedWritable, VIntWritable, VLongWritable, YarnConfiguration, org.apache.hadoop.yarn.security.client.YARNDelegationTokenIdentifier
Related class diagrams:
Class diagram of org.apache.hadoop.io:
Key subinterface: WritableComparable
The package org.apache.hadoop.io contains an important subinterface of Writable: WritableComparable. As the class diagram above shows, the Writable counterparts of the Java primitive types, such as ByteWritable, IntWritable, and DoubleWritable, all implement WritableComparable.
The WritableComparable source in Hadoop 2.7.1:
package org.apache.hadoop.io;

import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;

/**
 * A {@link Writable} which is also {@link Comparable}.
 *
 * <p><code>WritableComparable</code>s can be compared to each other, typically
 * via <code>Comparator</code>s. Any type which is to be used as a
 * <code>key</code> in the Hadoop Map-Reduce framework should implement this
 * interface.</p>
 *
 * <p>Note that <code>hashCode()</code> is frequently used in Hadoop to partition
 * keys. It's important that your implementation of hashCode() returns the same
 * result across different instances of the JVM. Note also that the default
 * <code>hashCode()</code> implementation in <code>Object</code> does <b>not</b>
 * satisfy this property.</p>
 *
 * <p>Example:</p>
 * <p><blockquote><pre>
 *     public class MyWritableComparable implements WritableComparable<MyWritableComparable> {
 *       // Some data
 *       private int counter;
 *       private long timestamp;
 *
 *       public void write(DataOutput out) throws IOException {
 *         out.writeInt(counter);
 *         out.writeLong(timestamp);
 *       }
 *
 *       public void readFields(DataInput in) throws IOException {
 *         counter = in.readInt();
 *         timestamp = in.readLong();
 *       }
 *
 *       public int compareTo(MyWritableComparable o) {
 *         int thisValue = this.counter;
 *         int thatValue = o.counter;
 *         return (thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1));
 *       }
 *
 *       public int hashCode() {
 *         final int prime = 31;
 *         int result = 1;
 *         result = prime * result + counter;
 *         result = prime * result + (int) (timestamp ^ (timestamp >>> 32));
 *         return result;
 *       }
 *     }
 * </pre></blockquote></p>
 */
@InterfaceAudience.Public
@InterfaceStability.Stable
public interface WritableComparable<T> extends Writable, Comparable<T> {
}
Annotations on the WritableComparable interface:
Observing the class diagram:
In Hadoop 2.7.1, the Writable-related implementation classes in org.apache.hadoop.io fall into two groups: those that implement Writable directly, and those that implement WritableComparable.
From the analysis and class diagram above, the classes implementing WritableComparable give the Java basic types (and several ID and record types) comparison capability. In Hadoop 2.7.1, the classes that implement WritableComparable are:
- BooleanWritable, BytesWritable, ByteWritable, DoubleWritable, FloatWritable, ID, ID, IntWritable, JobID, JobID, LongWritable, MD5Hash, NullWritable, Record, RecordTypeInfo, ShortWritable, TaskAttemptID, TaskAttemptID, TaskID, TaskID, Text, VIntWritable, VLongWritable (names listed twice appear once in the old org.apache.hadoop.mapred API and once in the new org.apache.hadoop.mapreduce API)
When a custom serializable class is used as a key, keep in mind that hashCode() is frequently called when partitioning records by key ahead of the reduce phase. The method must return the same result across different JVM instances, and the default Object.hashCode() does not satisfy this property, so a custom key class must override hashCode(). Furthermore, if two objects are equal according to equals(), their hashCode() values must also be equal, so whenever you override hashCode() you should also override equals(Object o). The example in the WritableComparable javadoc does not implement equals(), but the actual implementation classes all do; readers interested in the details can study them.
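The hashCode()/equals() contract just described can be illustrated with a plain Java sketch. The class name and fields below are hypothetical; only the pattern of deriving both methods from the same field values matters.

```java
// Sketch of a key class whose hashCode() is deterministic across JVM
// instances and consistent with equals(). The EventKey class is hypothetical.
public class EventKey {
  private final int counter;
  private final long timestamp;

  public EventKey(int counter, long timestamp) {
    this.counter = counter;
    this.timestamp = timestamp;
  }

  @Override
  public int hashCode() {
    // Computed only from field values, never from object identity, so every
    // JVM instance produces the same hash for equal keys.
    final int prime = 31;
    int result = 1;
    result = prime * result + counter;
    result = prime * result + (int) (timestamp ^ (timestamp >>> 32));
    return result;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof EventKey)) return false;
    EventKey other = (EventKey) o;
    return counter == other.counter && timestamp == other.timestamp;
  }

  public static void main(String[] args) {
    EventKey a = new EventKey(5, 100L);
    EventKey b = new EventKey(5, 100L);
    // Distinct instances with equal fields: equal, and with equal hashes.
    System.out.println(a.equals(b) && a.hashCode() == b.hashCode());  // true
  }
}
```

Because both methods read the same fields, the contract "equal objects have equal hash codes" holds automatically, and partitioning by hashCode() sends equal keys to the same reducer regardless of which JVM computed the hash.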
The implementation classes that implement Writable directly, by contrast, do not implement the three comparison-related methods above: compareTo(), hashCode(), and equals().
Related interface: WritableFactory
Source:
package org.apache.hadoop.io;

import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;

/** A factory for a class of Writable.
 * @see WritableFactories
 */
@InterfaceAudience.Public
@InterfaceStability.Stable
public interface WritableFactory {
  /** Return a new instance. */
  Writable newInstance();
}
Implementation class: WritableFactories
Source:
package org.apache.hadoop.io;

import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.util.ReflectionUtils;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Factories for non-public writables. Defining a factory permits {@link
 * ObjectWritable} to be able to construct instances of non-public classes. */
@InterfaceAudience.Public
@InterfaceStability.Stable
public class WritableFactories {
  private static final Map<Class, WritableFactory> CLASS_TO_FACTORY =
      new ConcurrentHashMap<Class, WritableFactory>();

  private WritableFactories() {}  // singleton

  /** Define a factory for a class. */
  public static void setFactory(Class c, WritableFactory factory) {
    CLASS_TO_FACTORY.put(c, factory);
  }

  /** Return the factory defined for a class, or null if none. */
  public static WritableFactory getFactory(Class c) {
    return CLASS_TO_FACTORY.get(c);
  }

  /** Create a new instance of a class with a defined factory. */
  public static Writable newInstance(Class<? extends Writable> c, Configuration conf) {
    WritableFactory factory = WritableFactories.getFactory(c);
    if (factory != null) {
      Writable result = factory.newInstance();
      if (result instanceof Configurable) {
        ((Configurable) result).setConf(conf);
      }
      return result;
    } else {
      return ReflectionUtils.newInstance(c, conf);
    }
  }

  /** Create a new instance of a class with a defined factory. */
  public static Writable newInstance(Class<? extends Writable> c) {
    return newInstance(c, null);
  }
}
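The registration pattern WritableFactories implements can be sketched without a Hadoop dependency. The names below (Factory, Factories, FactoriesDemo) are hypothetical stand-ins, but the structure mirrors the source above: a private constructor, a static ConcurrentHashMap registry, and a fallback to reflection when no factory was registered.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for WritableFactory: one method returning a fresh instance.
interface Factory {
  Object newInstance();
}

// Hypothetical stand-in for WritableFactories: a static, thread-safe registry
// mapping a class to its factory, falling back to reflection when none exists.
final class Factories {
  private static final Map<Class<?>, Factory> CLASS_TO_FACTORY =
      new ConcurrentHashMap<>();

  private Factories() {}  // no instances; purely a static registry

  static void setFactory(Class<?> c, Factory factory) {
    CLASS_TO_FACTORY.put(c, factory);
  }

  static Object newInstance(Class<?> c) throws ReflectiveOperationException {
    Factory factory = CLASS_TO_FACTORY.get(c);
    if (factory != null) {
      return factory.newInstance();  // a registered factory takes precedence
    }
    return c.getDeclaredConstructor().newInstance();  // reflection fallback
  }
}

public class FactoriesDemo {
  public static void main(String[] args) throws Exception {
    // Register a factory for one class, then create instances both ways.
    Factories.setFactory(StringBuilder.class, () -> new StringBuilder("from-factory"));
    StringBuilder viaFactory = (StringBuilder) Factories.newInstance(StringBuilder.class);
    Object viaReflection = Factories.newInstance(java.util.ArrayList.class);

    System.out.println(viaFactory);  // from-factory
    System.out.println(viaReflection.getClass().getSimpleName());  // ArrayList
  }
}
```

The factory path is what lets the real WritableFactories construct non-public classes: reflection on a non-public constructor would fail, but a factory registered from inside the class's own package can call it freely.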