Today I wrote a Writable. Its code is as follows:
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Writable;

// ItemWritable is a separate custom Writable (not shown here).
public class CFWritable implements Writable {
    private IntWritable mark; // flag
    private List<ItemWritable> items;

    public CFWritable() {
        mark = new IntWritable(0);
        items = new ArrayList<ItemWritable>(2);
    }

    public CFWritable(int mark, List<ItemWritable> items) {
        this.mark = new IntWritable(mark);
        this.items = items;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(items.size());
        mark.write(out);
        for (ItemWritable item : items) {
            item.write(out);
        }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        int itemsSize = in.readInt();
        mark.readFields(in);
        for (int i = 0; i < itemsSize; i++) {
            ItemWritable item = new ItemWritable();
            item.readFields(in);
            items.add(item);
        }
    }

    public int getMark() {
        return mark.get();
    }

    public void setMark(int mark) {
        this.mark = new IntWritable(mark);
    }

    public List<ItemWritable> getItems() {
        return items;
    }

    public void setItems(List<ItemWritable> items) {
        this.items = items;
    }
}
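When I ran a cluster job with the code above, the reduce phase reached 66% and then barely moved. After some digging I reasoned that items never holds more than 100 elements, so processing a record should not be slow. To test this, I printed some debugging information in the program, including the size of items. The output puzzled me: the size of items was the running total of all the records read before it.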
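After going over the code carefully, it suddenly clicked: during reduce, MapReduce reuses the same Writable instance (instead of creating a new one for every value) to speed things up, and my class had no mechanism to clear items. So every call to readFields kept appending data to the end of items.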
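The effect can be reproduced outside Hadoop with a minimal sketch. It assumes a hypothetical stand-in class, BuggyRecord, with the same write/readFields shape as CFWritable but plain ints instead of IntWritable/ItemWritable, so it runs without Hadoop on the classpath. The loop in observedSizes refills one instance per serialized record, which is roughly what the reduce-side value iterator does; it is not Hadoop's actual code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ReuseDemo {

    // Hypothetical stand-in for the original CFWritable (plain ints, no Hadoop).
    static class BuggyRecord {
        int mark;
        List<Integer> items = new ArrayList<>();

        void write(DataOutput out) throws IOException {
            out.writeInt(items.size());
            out.writeInt(mark);
            for (int item : items) {
                out.writeInt(item);
            }
        }

        // Same bug as the original: items is never cleared before reading.
        void readFields(DataInput in) throws IOException {
            int size = in.readInt();
            mark = in.readInt();
            for (int i = 0; i < size; i++) {
                items.add(in.readInt());
            }
        }
    }

    // Serialize a fresh record holding n items.
    static byte[] serialize(int n) throws IOException {
        BuggyRecord r = new BuggyRecord();
        for (int i = 0; i < n; i++) {
            r.items.add(i);
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        r.write(new DataOutputStream(bytes));
        return bytes.toByteArray();
    }

    // Refill ONE instance per record and report its size after each read.
    static List<Integer> observedSizes(int[] itemCounts) {
        BuggyRecord reused = new BuggyRecord();
        List<Integer> sizes = new ArrayList<>();
        try {
            for (int n : itemCounts) {
                byte[] data = serialize(n);
                reused.readFields(new DataInputStream(new ByteArrayInputStream(data)));
                sizes.add(reused.items.size());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Every record holds 2 items, yet the reused instance keeps growing.
        System.out.println(observedSizes(new int[] {2, 2, 2})); // prints [2, 4, 6]
    }
}
```

The sizes grow as a running total, which matches the symptom seen in the logs.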
With the cause identified, I changed the code as follows:
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Writable;

// ItemWritable is a separate custom Writable (not shown here).
public class CFWritable implements Writable {
    private IntWritable mark; // flag
    private List<ItemWritable> items;

    public CFWritable() {
        mark = new IntWritable(0);
        items = new ArrayList<ItemWritable>(2);
    }

    public CFWritable(int mark, List<ItemWritable> items) {
        this.mark = new IntWritable(mark);
        this.items = items;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(items.size());
        mark.write(out);
        for (ItemWritable item : items) {
            item.write(out);
        }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        clear(); // first clear the values left over from the previous record
        int itemsSize = in.readInt();
        mark.readFields(in);
        for (int i = 0; i < itemsSize; i++) {
            ItemWritable item = new ItemWritable();
            item.readFields(in);
            items.add(item);
        }
    }

    public void clear() {
        items.clear();
    }

    public int getMark() {
        return mark.get();
    }

    public void setMark(int mark) {
        this.mark = new IntWritable(mark);
    }

    public List<ItemWritable> getItems() {
        return items;
    }

    public void setItems(List<ItemWritable> items) {
        this.items = items;
    }
}
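To check the fix without a cluster, here is a minimal sketch using the same kind of stand-in: FixedRecord and FixedReuseDemo are hypothetical names, the fields are plain ints instead of IntWritable/ItemWritable, and no Hadoop classes are involved. The only change from the buggy pattern is items.clear() at the start of readFields, so a reused instance now reports just the items of the record it last read.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class FixedReuseDemo {

    // Hypothetical stand-in for the fixed CFWritable (plain ints, no Hadoop).
    static class FixedRecord {
        int mark;
        List<Integer> items = new ArrayList<>();

        void write(DataOutput out) throws IOException {
            out.writeInt(items.size());
            out.writeInt(mark);
            for (int item : items) {
                out.writeInt(item);
            }
        }

        void readFields(DataInput in) throws IOException {
            items.clear(); // drop whatever the previous record left behind
            int size = in.readInt();
            mark = in.readInt();
            for (int i = 0; i < size; i++) {
                items.add(in.readInt());
            }
        }
    }

    // Serialize a fresh record holding n items.
    static byte[] serialize(int n) throws IOException {
        FixedRecord r = new FixedRecord();
        for (int i = 0; i < n; i++) {
            r.items.add(i);
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        r.write(new DataOutputStream(bytes));
        return bytes.toByteArray();
    }

    // Refill ONE instance per record, mimicking the reduce-side iterator.
    static List<Integer> observedSizes(int[] itemCounts) {
        FixedRecord reused = new FixedRecord();
        List<Integer> sizes = new ArrayList<>();
        try {
            for (int n : itemCounts) {
                byte[] data = serialize(n);
                reused.readFields(new DataInputStream(new ByteArrayInputStream(data)));
                sizes.add(reused.items.size());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(observedSizes(new int[] {2, 3, 1})); // prints [2, 3, 1]
    }
}
```

A related consequence of the same reuse: if a reducer needs to keep values beyond the current iteration, it should store copies rather than references to the reused object, since the next readFields call overwrites its contents.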