Big Data Learning Notes (day5) - Hadoop: Reading the Code of the Mapper and Reducer Classes

Original post

alvin_2005

2020-02-20 18:01

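Sources: http://www.aboutyun.com/thread-5597-1-1.html
http://www.aboutyun.com/thread-5598-1-1.html
Note: Most of the reference material dates from 2013 or earlier, so some statements may no longer hold for current Hadoop versions. Read it critically.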

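Today I continue reading code, this time Hadoop's Mapper and Reducer classes (the newer MapReduce API in the package org.apache.hadoop.mapreduce).
1. The Mapper class
Hadoop's Mapper class has four main methods: setup, cleanup, map, and run. The code is as follows: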

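import java.io.IOException;

public class Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
  // (the nested Context class and annotations are omitted from this excerpt)

  // Called once at the start of the task; empty by default.
  protected void setup(Context context) throws IOException, InterruptedException {
    // NOTHING
  }

  // Called once per input key/value pair. The default is an identity
  // function: it writes the input pair to the output unchanged.
  protected void map(KEYIN key, VALUEIN value,
                     Context context) throws IOException, InterruptedException {
    context.write((KEYOUT) key, (VALUEOUT) value);
  }

  // Called once at the end of the task; empty by default.
  protected void cleanup(Context context) throws IOException, InterruptedException {
    // NOTHING
  }

  // Drives the task: setup once, map for every record, cleanup once.
  public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    while (context.nextKeyValue()) {
      map(context.getCurrentKey(), context.getCurrentValue(), context);
    }
    cleanup(context);
  }
}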

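From the code above, we can see that when a map task runs, run() first calls setup() once, then calls map() once for every input key/value pair (looping for as long as context.nextKeyValue() returns true), and finally calls cleanup() once. By default, setup() and cleanup() do nothing, and map() simply writes each input pair to the output unchanged. So when map() alone cannot meet an application's needs, for example when some state must be prepared before the first record or a result can only be emitted after the last one, you can override setup() and cleanup() as well.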
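The calling order of run() can be tried out without a Hadoop cluster. Below is a minimal sketch under stated assumptions: SimpleMapper, InMapperWordCount and MapperLifecycleDemo are hypothetical names, not Hadoop classes, and the Context object is replaced by a plain input list and an output list. Only the setup/map/cleanup order follows Mapper.run(). The word counter illustrates "in-mapper combining": map() only updates a buffer created in setup(), and cleanup() emits each word once with its total.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for Hadoop's Mapper (no Hadoop dependency). The
// Context is replaced by an input list of records and an output list.
abstract class SimpleMapper<KIN, VIN, KOUT, VOUT> {
  private final List<Map.Entry<KOUT, VOUT>> output = new ArrayList<>();

  // Plays the role of context.write(key, value).
  protected void write(KOUT key, VOUT value) {
    output.add(new AbstractMap.SimpleEntry<>(key, value));
  }

  protected void setup() {}                        // empty by default, like Mapper.setup()
  protected abstract void map(KIN key, VIN value); // called once per input record
  protected void cleanup() {}                      // empty by default, like Mapper.cleanup()

  // Same order as Mapper.run(): setup once, map per record, cleanup once.
  public List<Map.Entry<KOUT, VOUT>> run(List<Map.Entry<KIN, VIN>> input) {
    setup();
    for (Map.Entry<KIN, VIN> record : input) {
      map(record.getKey(), record.getValue());
    }
    cleanup();
    return output;
  }
}

// Hypothetical word counter using in-mapper combining: setup() creates a
// per-task buffer, map() only updates it, cleanup() emits each word once.
class InMapperWordCount extends SimpleMapper<Long, String, String, Integer> {
  private Map<String, Integer> counts;

  @Override
  protected void setup() {
    counts = new TreeMap<>(); // sorted, so the output order is deterministic
  }

  @Override
  protected void map(Long offset, String line) {
    for (String word : line.split("\\s+")) {
      if (!word.isEmpty()) {
        counts.merge(word, 1, Integer::sum);
      }
    }
  }

  @Override
  protected void cleanup() {
    for (Map.Entry<String, Integer> e : counts.entrySet()) {
      write(e.getKey(), e.getValue());
    }
  }
}

class MapperLifecycleDemo {
  public static void main(String[] args) {
    List<Map.Entry<Long, String>> input = new ArrayList<>();
    input.add(new AbstractMap.SimpleEntry<>(0L, "a b a")); // assumed (byte offset, line)
    input.add(new AbstractMap.SimpleEntry<>(6L, "b a"));
    System.out.println(new InMapperWordCount().run(input)); // [a=3, b=2]
  }
}
```

Running MapperLifecycleDemo prints [a=3, b=2]. The trade-off is memory: the buffer holds every distinct word seen by the task, in exchange for far fewer intermediate pairs than writing (word, 1) from map() for every word.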
2. The Reducer class

Hadoop's Reducer class has three main methods: setup, cleanup, and reduce. The code is as follows:

  /**
   * Called once at the start of the task.
   */
  protected void setup(Context context
                       ) throws IOException, InterruptedException {
    // NOTHING
  }

  /**
   * This method is called once for each key. Most applications will define
   * their reduce class by overriding this method. The default implementation
   * is an identity function.
   */
  @SuppressWarnings("unchecked")
  protected void reduce(KEYIN key, Iterable<VALUEIN> values, Context context
                        ) throws IOException, InterruptedException {
    for (VALUEIN value : values) {
      context.write((KEYOUT) key, (VALUEOUT) value);
    }
  }

  /**
   * Called once at the end of the task.
   */
  protected void cleanup(Context context
                         ) throws IOException, InterruptedException {
    // NOTHING
  }
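The important detail in reduce() is that it is called once per key with all of that key's values, not once per value. The following sketch assumes the values have already been grouped by key (in Hadoop the shuffle and sort phase does this) and writes "key=value" strings to a list instead of a Context. KeyReducer, ReduceOncePerKey, runReduce and SUM are hypothetical names. identity() mimics the default reduce() shown above, and SUM is a typical override as in word count.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for Reducer.reduce(): receives one key plus all values
// grouped under it, and appends "key=value" strings to out (instead of a Context).
interface KeyReducer<K, V> {
  void reduce(K key, Iterable<V> values, List<String> out);
}

class ReduceOncePerKey {
  // Mirrors the default Reducer.reduce(): an identity function that writes
  // every value out with its key.
  static <K, V> KeyReducer<K, V> identity() {
    return (key, values, out) -> {
      for (V value : values) {
        out.add(key + "=" + value);
      }
    };
  }

  // A typical override: one output per key, the sum of its values.
  static final KeyReducer<String, Integer> SUM = (key, values, out) -> {
    int sum = 0;
    for (Integer v : values) {
      sum += v;
    }
    out.add(key + "=" + sum);
  };

  // Calls reduce() once per distinct key, in sorted key order. The input is
  // assumed to be grouped already (the shuffle/sort phase's job in Hadoop).
  static <K extends Comparable<K>, V> List<String> runReduce(
      Map<K, List<V>> grouped, KeyReducer<K, V> reducer) {
    List<String> out = new ArrayList<>();
    TreeMap<K, List<V>> sorted = new TreeMap<>(grouped);
    for (Map.Entry<K, List<V>> e : sorted.entrySet()) {
      reducer.reduce(e.getKey(), e.getValue(), out);
    }
    return out;
  }

  public static void main(String[] args) {
    Map<String, List<Integer>> grouped = new TreeMap<>();
    grouped.put("b", Arrays.asList(1, 1));
    grouped.put("a", Arrays.asList(1, 1, 1));
    // identity: one line per value -> [a=1, a=1, a=1, b=1, b=1]
    System.out.println(runReduce(grouped, ReduceOncePerKey.<String, Integer>identity()));
    // SUM: one line per key -> [a=3, b=2]
    System.out.println(runReduce(grouped, SUM));
  }
}
```

With the same grouped input, identity() produces one output line per value, while SUM produces one line per key.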

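When a reduce task runs, the framework calls the Reducer's run() method directly, and run() in turn calls the other three methods. Its code is as follows: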

  /**
   * Control how the reduce task works.
   */
  @SuppressWarnings("unchecked")
  public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    // nextKey() advances to the next distinct key; getValues() then iterates
    // over all values grouped under that key
    while (context.nextKey()) {
      reduce(context.getCurrentKey(), context.getValues(), context);
      // If a back up store is used, reset it
      ((ReduceContext.ValueIterator)
          (context.getValues().iterator())).resetBackupStore();
    }
    cleanup(context);
  }

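From the code above, we can see that run() calls setup() once at the start of the task, then calls reduce() once for each distinct key (nextKey() moves to the next key, and getValues() returns all values grouped under it), and finally calls cleanup() once at the end. By default, setup() and cleanup() do nothing, and reduce() writes every value out with its key unchanged. So when reduce() alone cannot meet an application's needs, you can override setup() and cleanup() as well, for example to initialize state before the first key or to emit a final result after the last one.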
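The following sketch imitates Reducer.run(), including the grouping that context.nextKey() performs over key-sorted input. It is a simplified illustration, not Hadoop code: SimpleReducer, MaxTotalReducer and ReducerLifecycleDemo are hypothetical names, values are passed as a List rather than Hadoop's single-pass Iterable, and output goes to a list of strings. MaxTotalReducer shows a case that reduce() alone cannot handle: the key with the largest total is only known after every key has been seen, so it is emitted in cleanup().

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Hadoop's Reducer (no Hadoop dependency). run()
// takes key/value pairs already sorted by key, as the shuffle/sort phase
// delivers them, and groups consecutive equal keys the way nextKey() does.
abstract class SimpleReducer<K, V> {
  protected final List<String> output = new ArrayList<>();

  protected void setup() {}                              // empty by default
  protected abstract void reduce(K key, List<V> values); // once per key
  protected void cleanup() {}                            // empty by default

  public List<String> run(List<Map.Entry<K, V>> sortedPairs) {
    setup();
    int i = 0;
    while (i < sortedPairs.size()) {            // like context.nextKey()
      K key = sortedPairs.get(i).getKey();
      List<V> values = new ArrayList<>();
      while (i < sortedPairs.size() && sortedPairs.get(i).getKey().equals(key)) {
        values.add(sortedPairs.get(i).getValue());
        i++;
      }
      reduce(key, values);                      // one call per key group
    }
    cleanup();
    return output;
  }
}

// Hypothetical example: reduce() only totals each key; cleanup() emits the
// key with the largest total (on a tie, the first key seen wins).
class MaxTotalReducer extends SimpleReducer<String, Integer> {
  private String bestKey;
  private int bestTotal;

  @Override
  protected void setup() {
    bestKey = null;
    bestTotal = Integer.MIN_VALUE;
  }

  @Override
  protected void reduce(String key, List<Integer> values) {
    int total = 0;
    for (Integer v : values) {
      total += v;
    }
    if (total > bestTotal) {
      bestKey = key;
      bestTotal = total;
    }
  }

  @Override
  protected void cleanup() {
    if (bestKey != null) {
      output.add(bestKey + "=" + bestTotal);
    }
  }
}

class ReducerLifecycleDemo {
  public static void main(String[] args) {
    List<Map.Entry<String, Integer>> sorted = new ArrayList<>();
    sorted.add(new AbstractMap.SimpleEntry<>("a", 2));
    sorted.add(new AbstractMap.SimpleEntry<>("b", 1));
    sorted.add(new AbstractMap.SimpleEntry<>("b", 4));
    sorted.add(new AbstractMap.SimpleEntry<>("c", 3));
    System.out.println(new MaxTotalReducer().run(sorted)); // [b=5]
  }
}
```

The demo prints [b=5]. Because cleanup() runs once per reduce task, a job with several reduce tasks would emit one such maximum per task, not one global maximum.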
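Summary:
Today I studied the main methods of Hadoop's Mapper and Reducer classes by reading their code. Both follow the same pattern: run() calls setup() once, then map() or reduce() repeatedly, then cleanup() once.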
