Serialization Mechanisms in the Hadoop Big Data Framework

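Object serialization encodes an object into a byte stream and reconstructs the object from that byte stream. Encoding an object into a byte stream is called serializing the object; the reverse process is called deserializing.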

1.1 Java's Built-in Serialization Mechanism

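Java serialization converts an object into a contiguous sequence of bytes that can later be restored to the original object state. The mechanism also handles differences between operating systems automatically: a Java object serialized on Windows can be reconstructed on UNIX, with no need to worry about how data is represented or how bytes are ordered on different machines.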

In Java, making instances of a class serializable is simple: add implements Serializable to the class declaration. Serializable is a marker interface with no methods. It is defined as follows:

public interface Serializable {
}

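Because Serializable declares no methods, the class needs no other changes. By declaring that it implements Serializable, the Block class immediately gains the serialization support that Java provides: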

public class Block implements Writable, Comparable<Block>, Serializable

Serialization is mainly used in I/O-related operations and is implemented through a pair of input/output streams. To serialize an object, create an ObjectOutputStream on top of some OutputStream and call writeObject().

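writeObject() writes the state of an object that implements Serializable, and the output is sent to the underlying OutputStream. To serialize several objects, call writeObject() on the same ObjectOutputStream once per object. The following example serializes an object: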

Block block1 = new Block(7806259420524417791L, 39447755L, 56736651L);
... ...
ByteArrayOutputStream out = new ByteArrayOutputStream();
ObjectOutputStream objOut = new ObjectOutputStream(out);
objOut.writeObject(block1);

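However, the serialized form is rather bloated. Take Block as an example: it holds only three long integers, yet its serialized form runs to more than a hundred bytes. The dump below, produced for a class named org.seandeng.test.Block, is 108 bytes long; most of it is the stream header, the class name, and the field descriptors, and only the last 24 bytes are the three long values: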

-84, -19, 0, 5, 115, 114, 0, 23, 111, 114, 103, 46, 115, 101, 97, 110, 100, 101, 110, 103, 46, 116, 101, 115, 116, 46, 66, 108, 111, 99, 107, 40, -7, 56, 46, 72, 64, -69, 45, 2, 0, 3, 74, 0, 7, 98, 108, 111, 99, 107, 73, 100, 74, 0, 16, 103, 101, 110, 101, 114, 97, 116, 105, 111, 110, 115, 83, 116, 97, 109, 112, 74, 0, 8, 110, 117, 109, 66, 121, 116, 101, 115, 120, 112, 108, 85, 103, -107, 104, -25, -110, -1, 0, 0, 0, 0, 3, 97, -69, -117, 0, 0, 0, 0, 2, 89, -20, -53
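The overhead described above can be reproduced with a small round trip. The following is a minimal sketch, not the author's Block class: SerializableBlock is a hypothetical stand-in with three long fields, and the helper names toBytes and fromBytes are assumptions made for illustration. The exact size depends on the class and field names, so the sketch only prints it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical stand-in for Block: three long fields and nothing else.
final class SerializableBlock implements Serializable {
    private static final long serialVersionUID = 1L;
    final long blockId;
    final long numBytes;
    final long generationStamp;

    SerializableBlock(long blockId, long numBytes, long generationStamp) {
        this.blockId = blockId;
        this.numBytes = numBytes;
        this.generationStamp = generationStamp;
    }

    // Serialize with ObjectOutputStream.writeObject().
    static byte[] toBytes(SerializableBlock block) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ObjectOutputStream objOut = new ObjectOutputStream(out)) {
            objOut.writeObject(block);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Deserialize with ObjectInputStream.readObject(); each call creates a new object.
    static SerializableBlock fromBytes(byte[] bytes) {
        try (ObjectInputStream objIn =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (SerializableBlock) objIn.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SerializableBlock block =
            new SerializableBlock(7806259420524417791L, 39447755L, 56736651L);
        byte[] bytes = toBytes(block);
        SerializableBlock copy = fromBytes(bytes);
        System.out.println("serialized size: " + bytes.length + " bytes (raw fields: 24)");
        System.out.println("same field values: " + (copy.blockId == block.blockId));
        System.out.println("same object: " + (copy == block));
    }
}
```

The first two bytes of the result are the stream magic number 0xAC 0xED, which is the -84, -19 at the start of the dump above.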

1.2 Hadoop's Serialization Mechanism

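Unlike Java serialization, where writeObject() is called on an ObjectOutputStream, Hadoop's serialization mechanism serializes an object to a stream by calling the object's own write() method, which takes a parameter of type DataOutput. Deserialization is similar: the object's readFields() method reads the data from the stream. Notably, Java deserialization creates a new object every time, whereas with Hadoop's mechanism the user can reuse an existing object. This reduces Java object allocation and garbage collection and improves application efficiency.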

public static void main(String[] args) {
    try {
        Block block1 = new Block(1L, 2L, 3L);
        ... ...
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        // DataOutputStream has no no-argument constructor; wrap the byte array stream.
        DataOutputStream dout = new DataOutputStream(bout);
        block1.write(dout);
        dout.close();
        ... ...
    }
    ... ...
}

Because serializing a Block object writes only its three long integers, the serialized form of block1 is 24 bytes in total.
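The 24-byte figure follows directly from how DataOutput writes a long: 8 bytes, big-endian, with no type information. A minimal sketch that checks this without Hadoop on the classpath; the class CompactBlockSize and its method writeThreeLongs are hypothetical names that only mimic what Block.write() is described as doing.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class CompactBlockSize {
    // Write the three field values only, as Block.write(DataOutput) is described to do.
    static byte[] writeThreeLongs(long blockId, long numBytes, long generationStamp) {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        try (DataOutputStream dout = new DataOutputStream(bout)) {
            dout.writeLong(blockId);
            dout.writeLong(numBytes);
            dout.writeLong(generationStamp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bout.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = writeThreeLongs(1L, 2L, 3L);
        System.out.println(bytes.length); // prints 24: three longs at 8 bytes each
    }
}
```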

1.3 The Hadoop Writable Mechanism

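Hadoop introduces the org.apache.hadoop.io.Writable interface, which every serializable object must implement. Its outline view in the Eclipse IDE looks like this:

[Figure: Eclipse outline view of the Writable interface]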

Unlike java.io.Serializable, Writable is not a marker interface. It declares two methods:

public interface Writable {
  /**
   * Serialize the fields of this object to <code>out</code>.
   * @param out <code>DataOutput</code> to serialize this object into.
   * @throws IOException
   */
  void write(DataOutput out) throws IOException;

  /**
   * Deserialize the fields of this object from <code>in</code>.
   * <p>For efficiency, implementations should attempt to re-use storage in the
   * existing object where possible.</p>
   * @param in <code>DataInput</code> to deserialize this object from.
   * @throws IOException
   */
  void readFields(DataInput in) throws IOException;
}

Writable.write(DataOutput out) writes the object to a binary DataOutput. Deserialization is done by readFields(DataInput in), which reads the object's state from a DataInput stream. Here is an example:

public class Block implements Writable {
    private long blockId;
    private long numBytes;
    private long generationsStamp;

    // Write the three fields in a fixed order: 3 x 8 = 24 bytes.
    public void write(DataOutput out) throws IOException {
        out.writeLong(blockId);
        out.writeLong(numBytes);
        out.writeLong(generationsStamp);
    }

    // Read the fields back in the same order, overwriting this object's state.
    public void readFields(DataInput in) throws IOException {
        this.blockId = in.readLong();
        this.numBytes = in.readLong();
        this.generationsStamp = in.readLong();
        if (numBytes < 0) {
            throw new IOException("Unexpected block size:" + numBytes);
        }
    }
}
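Because readFields() overwrites the state of an existing object, a single instance can be reused to read a whole stream of records. The sketch below illustrates that idea under stated assumptions: MiniWritable is a hypothetical stand-in for org.apache.hadoop.io.Writable so that the code compiles without Hadoop, and ReusableBlock, serializeAll, and totalBytes are illustrative names, not part of any library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for org.apache.hadoop.io.Writable.
interface MiniWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

final class ReusableBlock implements MiniWritable {
    long blockId;
    long numBytes;
    long generationStamp;

    ReusableBlock() {}

    ReusableBlock(long blockId, long numBytes, long generationStamp) {
        this.blockId = blockId;
        this.numBytes = numBytes;
        this.generationStamp = generationStamp;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(blockId);
        out.writeLong(numBytes);
        out.writeLong(generationStamp);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        blockId = in.readLong();
        numBytes = in.readLong();
        generationStamp = in.readLong();
        if (numBytes < 0) {
            throw new IOException("Unexpected block size:" + numBytes);
        }
    }

    // Serialize several blocks back to back into one byte array.
    static byte[] serializeAll(ReusableBlock... blocks) {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bout)) {
            for (ReusableBlock block : blocks) {
                block.write(out);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bout.toByteArray();
    }

    // Sum numBytes over all records while allocating only one ReusableBlock.
    static long totalBytes(byte[] data, int recordCount) {
        ReusableBlock scratch = new ReusableBlock();
        long total = 0;
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            for (int i = 0; i < recordCount; i++) {
                scratch.readFields(in); // overwrites the previous record's values
                total += scratch.numBytes;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] data = serializeAll(
            new ReusableBlock(1, 10, 0), new ReusableBlock(2, 20, 0), new ReusableBlock(3, 30, 0));
        System.out.println(data.length + " bytes, total numBytes = " + totalBytes(data, 3));
    }
}
```

With Java's built-in mechanism the same loop would allocate one new object per record, which is the allocation cost the Writable design avoids.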

Hadoop's serialization mechanism also includes several other important types: WritableComparable, RawComparator, and WritableComparator.

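Comparable is the interface a class implements when its objects can compare themselves with one another (an Integer, for example, can compare itself with another Integer). The class implements Comparable's compareTo() method, which takes the object to compare against.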

Comparator is a dedicated comparer that determines the relative order of two objects. An implementation provides Comparator's compare() method, which takes the two objects to be compared.
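The difference between the two styles of comparison can be sketched in plain Java. BlockIdKey is a hypothetical key class introduced only for illustration. The compareRaw method is an assumption about what comparing serialized data directly looks like (here, 8-byte big-endian longs); it is meant to convey the idea behind a raw comparator and is not Hadoop's RawComparator API.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Comparator;

final class BlockIdKey implements Comparable<BlockIdKey> {
    final long id;

    BlockIdKey(long id) {
        this.id = id;
    }

    // Comparable: the object compares itself with another instance.
    @Override
    public int compareTo(BlockIdKey other) {
        return Long.compare(id, other.id);
    }

    // Comparator: a separate object that compares two instances (descending order here).
    static final Comparator<BlockIdKey> DESCENDING = (a, b) -> Long.compare(b.id, a.id);

    // Serialize a key as an 8-byte big-endian long.
    static byte[] toBytes(long id) {
        return ByteBuffer.allocate(8).putLong(id).array();
    }

    // Raw comparison: order two serialized keys without creating BlockIdKey objects.
    static int compareRaw(byte[] b1, int s1, byte[] b2, int s2) {
        long left = ByteBuffer.wrap(b1, s1, 8).getLong();
        long right = ByteBuffer.wrap(b2, s2, 8).getLong();
        return Long.compare(left, right);
    }

    public static void main(String[] args) {
        BlockIdKey[] keys = { new BlockIdKey(9), new BlockIdKey(5), new BlockIdKey(7) };
        Arrays.sort(keys);             // natural order via compareTo()
        System.out.println(keys[0].id);
        Arrays.sort(keys, DESCENDING); // external order via compare()
        System.out.println(keys[0].id);
        System.out.println(compareRaw(toBytes(5), 0, toBytes(9), 0) < 0);
    }
}
```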

1.4 Typical Writable Classes in Detail

1.4.1 Writable Wrappers for Java Primitive Types

The Writable wrappers for the Java primitive types are listed in the table below:

Java primitive type     Writable wrapper
boolean                 BooleanWritable
byte                    ByteWritable
int                     IntWritable, VIntWritable
float                   FloatWritable
long                    LongWritable, VLongWritable
double                  DoubleWritable

Taking VIntWritable as an example, the code is as follows:

public class VIntWritable implements WritableComparable {
  private int value;

  public VIntWritable() {}

  public VIntWritable(int value) { set(value); }

  /** Set the value of this VIntWritable. */
  public void set(int value) { this.value = value; }

  /** Return the value of this VIntWritable. */
  public int get() { return value; }

  public void readFields(DataInput in) throws IOException {
    value = WritableUtils.readVInt(in);
  }

  public void write(DataOutput out) throws IOException {
    WritableUtils.writeVInt(out, value);
  }

  /** Compares two VIntWritables. */
  public int compareTo(Object o) {
    int thisValue = this.value;
    int thatValue = ((VIntWritable)o).value;
    return (thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1));
  }
}

VIntWritable reads and writes its data by calling readVInt() and writeVInt(), which are provided by the WritableUtils utility class.
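To give a feel for what a variable-length integer encoding does, here is a minimal, self-contained sketch. It follows the zero-compressed layout commonly described for Hadoop's variable-length integers: values from -112 to 127 take one byte, and other values take a marker byte followed by 1 to 8 big-endian bytes. That layout is an assumption here, not something stated in the text above, and should be checked against WritableUtils itself. VarLongSketch, encode, and decode are hypothetical names.

```java
import java.io.ByteArrayOutputStream;

final class VarLongSketch {
    // Encode a long into 1 to 9 bytes; small magnitudes need fewer bytes.
    static byte[] encode(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (value >= -112 && value <= 127) {
            out.write((int) value);      // fits in a single byte
            return out.toByteArray();
        }
        int marker = -112;               // marker base for positive values
        if (value < 0) {
            value = ~value;              // store the one's complement of negatives
            marker = -120;               // marker base for negative values
        }
        int count = 0;                   // number of payload bytes needed
        for (long tmp = value; tmp != 0; tmp >>>= 8) {
            count++;
        }
        out.write(marker - count);       // marker encodes both sign and length
        for (int shift = (count - 1) * 8; shift >= 0; shift -= 8) {
            out.write((int) (value >>> shift)); // big-endian payload, low 8 bits kept
        }
        return out.toByteArray();
    }

    // Decode a value produced by encode().
    static long decode(byte[] bytes) {
        byte first = bytes[0];
        if (first >= -112) {
            return first;                // single-byte value
        }
        boolean negative = first < -120;
        int count = negative ? -120 - first : -112 - first;
        long value = 0;
        for (int i = 1; i <= count; i++) {
            value = (value << 8) | (bytes[i] & 0xFF);
        }
        return negative ? ~value : value;
    }

    public static void main(String[] args) {
        long[] samples = { 5, 300, -200, Integer.MAX_VALUE };
        for (long sample : samples) {
            byte[] encoded = encode(sample);
            System.out.println(sample + " -> " + encoded.length + " byte(s), decoded "
                + decode(encoded));
        }
    }
}
```

Under this layout small values shrink to one byte, while a value such as Integer.MAX_VALUE takes five bytes, one more than a fixed-width int. A variable-length wrapper therefore pays off mainly when most values are small.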

1.4.2 Implementation of the ObjectWritable Class

For class instances, ObjectWritable provides a wrapper. The relevant code is as follows:

public class ObjectWritable implements Writable, Configurable {
  private Class declaredClass;
  private Object instance;
  private Configuration conf;

  public ObjectWritable() {}

  public ObjectWritable(Object instance) {
    set(instance);
  }

  public ObjectWritable(Class declaredClass, Object instance) {
    this.declaredClass = declaredClass;
    this.instance = instance;
  }

  /** Return the instance, or null if none. */
  public Object get() { return instance; }

  /** Return the class this is meant to be. */
  public Class getDeclaredClass() { return declaredClass; }

  /** Reset the instance. */
  public void set(Object instance) {
    this.declaredClass = instance.getClass();
    this.instance = instance;
  }

  public void readFields(DataInput in) throws IOException {
    readObject(in, this, this.conf);
  }

  public void write(DataOutput out) throws IOException {
    writeObject(out, instance, declaredClass, conf);
  }

  /** Write a {@link Writable}, {@link String}, primitive type, or an array of
   * the preceding. */
  public static void writeObject(DataOutput out, Object instance,
                                 Class declaredClass,
                                 Configuration conf) throws IOException {
    if (instance == null) {                       // null
      instance = new NullInstance(declaredClass, conf);
      declaredClass = Writable.class;
    }
    UTF8.writeString(out, declaredClass.getName()); // always write declared
    if (declaredClass.isArray()) {                // array
      int length = Array.getLength(instance);
      out.writeInt(length);
      for (int i = 0; i < length; i++) {
        writeObject(out, Array.get(instance, i),
                    declaredClass.getComponentType(), conf);
      }
    } else if (declaredClass == String.class) {   // String
      UTF8.writeString(out, (String)instance);
    } else if (declaredClass.isPrimitive()) {     // primitive type
      if (declaredClass == Boolean.TYPE) {        // boolean
        out.writeBoolean(((Boolean)instance).booleanValue());
      } else if (declaredClass == Character.TYPE) { // char
        out.writeChar(((Character)instance).charValue());
      } else if (declaredClass == Byte.TYPE) {    // byte
        out.writeByte(((Byte)instance).byteValue());
      } else if (declaredClass == Short.TYPE) {   // short
        out.writeShort(((Short)instance).shortValue());
      } else if (declaredClass == Integer.TYPE) { // int
        out.writeInt(((Integer)instance).intValue());
      } else if (declaredClass == Long.TYPE) {    // long
        out.writeLong(((Long)instance).longValue());
      } else if (declaredClass == Float.TYPE) {   // float
        out.writeFloat(((Float)instance).floatValue());
      } else if (declaredClass == Double.TYPE) {  // double
        out.writeDouble(((Double)instance).doubleValue());
      } else if (declaredClass == Void.TYPE) {    // void
      } else {
        throw new IllegalArgumentException("Not a primitive: "+declaredClass);
      }
    } else if (declaredClass.isEnum()) {         // enum
      UTF8.writeString(out, ((Enum)instance).name());
    } else if (Writable.class.isAssignableFrom(declaredClass)) { // Writable
      UTF8.writeString(out, instance.getClass().getName());
      ((Writable)instance).write(out);
    } else {
      throw new IOException("Can't write: "+instance+" as "+declaredClass);
    }
  }

  /** Read a {@link Writable}, {@link String}, primitive type, or an array of
   * the preceding. */
  public static Object readObject(DataInput in, Configuration conf)
    throws IOException {
    return readObject(in, null, conf);
  }

  /** Read a {@link Writable}, {@link String}, primitive type, or an array of
   * the preceding. */
  @SuppressWarnings("unchecked")
  public static Object readObject(DataInput in, ObjectWritable objectWritable, Configuration conf)
    throws IOException {
    String className = UTF8.readString(in);
    Class<?> declaredClass = PRIMITIVE_NAMES.get(className);
    if (declaredClass == null) {
      try {
        declaredClass = conf.getClassByName(className);
      } catch (ClassNotFoundException e) {
        throw new RuntimeException("readObject can't find class " + className, e);
      }
    }

    Object instance;
    if (declaredClass.isPrimitive()) {            // primitive types
      if (declaredClass == Boolean.TYPE) {             // boolean
        instance = Boolean.valueOf(in.readBoolean());
      } else if (declaredClass == Character.TYPE) {    // char
        instance = Character.valueOf(in.readChar());
      } else if (declaredClass == Byte.TYPE) {         // byte
        instance = Byte.valueOf(in.readByte());
      } else if (declaredClass == Short.TYPE) {        // short
        instance = Short.valueOf(in.readShort());
      } else if (declaredClass == Integer.TYPE) {      // int
        instance = Integer.valueOf(in.readInt());
      } else if (declaredClass == Long.TYPE) {         // long
        instance = Long.valueOf(in.readLong());
      } else if (declaredClass == Float.TYPE) {        // float
        instance = Float.valueOf(in.readFloat());
      } else if (declaredClass == Double.TYPE) {       // double
        instance = Double.valueOf(in.readDouble());
      } else if (declaredClass == Void.TYPE) {         // void
        instance = null;
      } else {
        throw new IllegalArgumentException("Not a primitive: "+declaredClass);
      }
    } else if (declaredClass.isArray()) {              // array
      int length = in.readInt();
      instance = Array.newInstance(declaredClass.getComponentType(), length);
      for (int i = 0; i < length; i++) {
        Array.set(instance, i, readObject(in, conf));
      }
    } else if (declaredClass == String.class) {        // String
      instance = UTF8.readString(in);
    } else if (declaredClass.isEnum()) {         // enum
      instance = Enum.valueOf((Class<? extends Enum>) declaredClass, UTF8.readString(in));
    } else {                                      // Writable
      Class instanceClass = null;
      String str = "";
      try {
        str = UTF8.readString(in);
        instanceClass = conf.getClassByName(str);
      } catch (ClassNotFoundException e) {
        throw new RuntimeException("readObject can't find class " + str, e);
      }
      Writable writable = WritableFactories.newInstance(instanceClass, conf);
      writable.readFields(in);
      instance = writable;
      if (instanceClass == NullInstance.class) {  // null
        declaredClass = ((NullInstance)instance).declaredClass;
        instance = null;
      }
    }

    if (objectWritable != null) {                 // store values
      objectWritable.declaredClass = declaredClass;
      objectWritable.instance = instance;
    }
    return instance;
  }

  ... ...
}

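readFields() deserializes one object. If the DataInput carries a Writable, readObject() in turn calls that object's readFields() (writable.readFields(in)), and this continues until the DataInput yields a non-Writable type. In this way the Writable objects in the DataInput are deserialized recursively.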
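The core idea in writeObject() and readObject() is that every value is preceded by a type name, so the reader knows how to decode what follows and can recurse into nested values. The sketch below shows that idea in a much reduced form using only java.io. TaggedObjectCodec and its type tags are hypothetical; it handles only Integer, Long, String, and Object[] values, and unlike ObjectWritable it has no handling for null, enums, or Writable instances.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

final class TaggedObjectCodec {
    // Write a type tag first, then the value; arrays recurse into their elements.
    static void writeObject(DataOutput out, Object value) throws IOException {
        if (value instanceof Integer) {
            out.writeUTF("int");
            out.writeInt((Integer) value);
        } else if (value instanceof Long) {
            out.writeUTF("long");
            out.writeLong((Long) value);
        } else if (value instanceof String) {
            out.writeUTF("java.lang.String");
            out.writeUTF((String) value);
        } else if (value instanceof Object[]) {
            Object[] items = (Object[]) value;
            out.writeUTF("array");
            out.writeInt(items.length);
            for (Object item : items) {
                writeObject(out, item);
            }
        } else {
            throw new IOException("Can't write: " + value);
        }
    }

    // Read the type tag, then decode the value that follows it.
    static Object readObject(DataInput in) throws IOException {
        String tag = in.readUTF();
        switch (tag) {
            case "int":
                return in.readInt();
            case "long":
                return in.readLong();
            case "java.lang.String":
                return in.readUTF();
            case "array": {
                Object[] items = new Object[in.readInt()];
                for (int i = 0; i < items.length; i++) {
                    items[i] = readObject(in); // recursive call per element
                }
                return items;
            }
            default:
                throw new IOException("Unknown type tag: " + tag);
        }
    }

    static byte[] toBytes(Object value) {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bout)) {
            writeObject(out, value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bout.toByteArray();
    }

    static Object fromBytes(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return readObject(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Object[] original = { 1, 2L, "three", new Object[] { 4 } };
        Object[] copy = (Object[]) fromBytes(toBytes(original));
        System.out.println(Arrays.deepEquals(original, copy));
    }
}
```

Writing the type name with every value is what makes the format self-describing, and it is also why such output is larger than a plain write() of the fields.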

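readObject() relies on the WritableFactories class. WritableFactories lets non-public Writable subclasses define an object factory, which then creates the Writable objects. The relevant code is as follows: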

public class WritableFactories {
  private static final HashMap<Class, WritableFactory> CLASS_TO_FACTORY =
    new HashMap<Class, WritableFactory>();

  private WritableFactories() {}                  // singleton

  /** Define a factory for a class. */
  public static synchronized void setFactory(Class c, WritableFactory factory) {
    CLASS_TO_FACTORY.put(c, factory);
  }

  /** Define a factory for a class. */
  public static synchronized WritableFactory getFactory(Class c) {
    return CLASS_TO_FACTORY.get(c);
  }

  /** Create a new instance of a class with a defined factory. */
  public static Writable newInstance(Class<? extends Writable> c, Configuration conf) {
    WritableFactory factory = WritableFactories.getFactory(c);
    if (factory != null) {
      Writable result = factory.newInstance();
      if (result instanceof Configurable) {
        ((Configurable) result).setConf(conf);
      }
      return result;
    } else {
      return ReflectionUtils.newInstance(c, conf);
    }
  }

  /** Create a new instance of a class with a defined factory. */
  public static Writable newInstance(Class<? extends Writable> c) {
    return newInstance(c, null);
  }
}

 

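WritableFactories.newInstance() looks up the WritableFactory registered for the given type and calls that factory's newInstance() to create the object. If the object is configurable, newInstance() also configures it through the object's setConf() method. If no factory is registered, it falls back to ReflectionUtils.newInstance().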
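The registry pattern used by WritableFactories can be sketched with the standard library alone. This is an illustration of the pattern rather than Hadoop's implementation: FactoryRegistrySketch, Tagged, and Plain are hypothetical names, java.util.function.Supplier stands in for WritableFactory, plain reflection stands in for ReflectionUtils, and the configuration step is left out.

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

final class FactoryRegistrySketch {
    private static final Map<Class<?>, Supplier<?>> FACTORIES = new HashMap<>();

    private FactoryRegistrySketch() {}

    // Register a factory for a type.
    static synchronized <T> void setFactory(Class<T> type, Supplier<? extends T> factory) {
        FACTORIES.put(type, factory);
    }

    // Use the registered factory if there is one; otherwise fall back to reflection.
    static synchronized <T> T newInstance(Class<T> type) {
        Supplier<?> factory = FACTORIES.get(type);
        if (factory != null) {
            return type.cast(factory.get());
        }
        try {
            Constructor<T> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true); // allow non-public no-argument constructors
            return ctor.newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + type.getName(), e);
        }
    }

    // Hypothetical type that records how it was created.
    static final class Tagged {
        final String origin;

        Tagged() {
            this("reflection");
        }

        Tagged(String origin) {
            this.origin = origin;
        }
    }

    // Hypothetical type that never gets a factory.
    static final class Plain {
    }

    public static void main(String[] args) {
        System.out.println(newInstance(Plain.class) != null);
        setFactory(Tagged.class, () -> new Tagged("factory"));
        System.out.println(newInstance(Tagged.class).origin);
    }
}
```

A factory gives the registering class control over construction, which is useful when the class or its constructor is not public and the caller cannot invoke new directly.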
