[Apache HBase Series] Using GORA, an ORM Framework for HBase

The open-source Apache Gora framework provides an in-memory data model and persistence for big data.

Gora supports column stores, key-value stores, document stores, and relational database management systems, and has extensive Apache Hadoop MapReduce support for analyzing the stored data.

Steps for using Gora:

1. Configure the gora.properties file

gora.datastore.default=org.apache.gora.hbase.store.HBaseStore
gora.datastore.autocreateschema=true


2. Define the data source bean in JSON format (an Avro schema)

Create a JSON file with the following content:

{
  "type": "record",
  "name": "Pageview",
  "namespace": "org.apache.gora.tutorial.log.generated",
  "fields" : [
    {"name": "url", "type": "string"},
    {"name": "timestamp", "type": "long"},
    {"name": "ip", "type": "string"},
    {"name": "httpMethod", "type": "string"},
    {"name": "httpStatusCode", "type": "int"},
    {"name": "responseSize", "type": "int"},
    {"name": "referrer", "type": "string"},
    {"name": "userAgent", "type": "string"}
  ]
}
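For orientation, the compiler in the next step turns this schema into a Java bean along these lines. The sketch below is a hand-written simplification, not the actual generated code (the real Avro-generated class also carries schema metadata and dirty-field tracking, and uses Avro's Utf8 for strings); only two accessor pairs are shown, the rest follow the same pattern:

```java
// Hand-written simplification of the bean generated from the Avro schema above.
public class Pageview {
  private String url;
  private long timestamp;
  private String ip;
  private String httpMethod;
  private int httpStatusCode;
  private int responseSize;
  private String referrer;
  private String userAgent;

  public String getUrl() { return url; }
  public void setUrl(String url) { this.url = url; }

  public long getTimestamp() { return timestamp; }
  public void setTimestamp(long timestamp) { this.timestamp = timestamp; }

  // ...accessors for the remaining fields follow the same pattern

  public static void main(String[] args) {
    Pageview p = new Pageview();
    p.setUrl("/index.html");
    p.setTimestamp(1236710649584L);
    System.out.println(p.getUrl() + " @ " + p.getTimestamp());
  }
}
```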

3. Apache Gora uses the Avro framework for its ORM-mapped entities. You can use the compiler bundled with Gora to compile the JSON schema into the entity class you need:

$ bin/gora goracompiler

The compiler usage is as follows:

Usage: GoraCompiler <schema file> <output dir> [-license <id>]
   <schema file>     - individual avsc file to be compiled or a directory path containing avsc files
   <output dir>      - output directory for generated Java files
   [-license <id>]   - the preferred license header to add to the
                       generated Java file. Current options include:
      ASLv2   (Apache Software License v2.0) 
      AGPLv3  (GNU Affero General Public License)
      CDDLv1  (Common Development and Distribution License v1.0)
      FDLv13  (GNU Free Documentation License v1.3)
      GPLv1   (GNU General Public License v1.0)
      GPLv2   (GNU General Public License v2.0)
      GPLv3   (GNU General Public License v3.0)
      LGPLv21 (GNU Lesser General Public License v2.1)
      LGPLv3  (GNU Lesser General Public License v3.0)

Example:

$ bin/gora goracompiler gora-tutorial/src/main/avro/pageview.json gora-tutorial/src/main/java/

4. Define the datastore mapping: gora-hbase-mapping.xml

After completing the three steps above, the next task is to configure the mapping between the entity and the table.

An example:

<!--  This is gora-sql-mapping.xml

<gora-orm>
  <class name="org.apache.gora.tutorial.log.generated.Pageview" keyClass="java.lang.Long" table="AccessLog">
   <primarykey column="line"/>
  <field name="url" column="url" length="512" primarykey="true"/>
  <field name="timestamp" column="timestamp"/>
  <field name="ip" column="ip" length="16"/>
  <field name="httpMethod" column="httpMethod" length="6"/>
  <field name="httpStatusCode" column="httpStatusCode"/>
  <field name="responseSize" column="responseSize"/>
  <field name="referrer" column="referrer" length="512"/>
  <field name="userAgent" column="userAgent" length="512"/>
  </class>

 ...

</gora-orm>

  -->

<gora-orm>
  <table name="AccessLog"> <!-- optional descriptors for tables -->
    <family name="common"/> <!-- This can also have params like compression, bloom filters -->
    <family name="http"/>
    <family name="misc"/>
  </table>

  <class name="org.apache.gora.tutorial.log.generated.Pageview" keyClass="java.lang.Long" table="AccessLog">
   <field name="url" family="common" qualifier="url"/>
   <field name="timestamp" family="common" qualifier="timestamp"/>
   <field name="ip" family="common" qualifier="ip" />
   <field name="httpMethod" family="http" qualifier="httpMethod"/>
   <field name="httpStatusCode" family="http" qualifier="httpStatusCode"/>
   <field name="responseSize" family="http" qualifier="responseSize"/>
   <field name="referrer" family="misc" qualifier="referrer"/>
   <field name="userAgent" family="misc" qualifier="userAgent"/>
  </class>

  ...

</gora-orm>
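As the inline comment on the table descriptor hints, a family element can also carry HBase tuning attributes. A hedged illustration (the attribute names shown here may vary across Gora versions, so check the gora-hbase documentation for your release):

```xml
<table name="AccessLog">
  <!-- hypothetical tuning: compressed family with a row-level bloom filter -->
  <family name="common" compression="GZ" bloomFilter="ROW" maxVersions="1"/>
  <family name="http"/>
  <family name="misc"/>
</table>
```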

5. API

1) Initialization: creating the HBaseStore

private DataStore<Long, Pageview> dataStore;

private void init() throws IOException {
  dataStore = DataStoreFactory.getDataStore(Long.class, Pageview.class);
}

Here Gora creates the corresponding HBase table for you, based on the entity class compiled above and on gora-hbase-mapping.xml.

2) Storing data

/** Stores the pageview object with the given key */
private void storePageview(long key, Pageview pageview) throws IOException {
  dataStore.put(key, pageview);
}
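In the Gora tutorial, Pageview objects are populated by parsing web-server access logs. As an illustration of where the field values can come from, here is a minimal, hedged parser for one common combined-log-style line; the log format, regular expression, and LogEntry holder class are assumptions made for this sketch, not part of Gora:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Minimal parser for an Apache combined-log-style access line (illustrative only). */
public class LogLineParser {
  // host ident user [date] "METHOD url PROTO" status size "referrer" "agent"
  private static final Pattern LINE = Pattern.compile(
      "^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\] \"(\\S+) (\\S+) [^\"]*\" (\\d{3}) (\\d+) \"([^\"]*)\" \"([^\"]*)\"$");

  /** Simple holder for the parsed fields; a real program would fill a Pageview instead. */
  public static final class LogEntry {
    public String ip, method, url, referrer, userAgent;
    public int status, size;
  }

  public static LogEntry parse(String line) {
    Matcher m = LINE.matcher(line);
    if (!m.matches()) {
      throw new IllegalArgumentException("unparseable line: " + line);
    }
    LogEntry e = new LogEntry();
    e.ip = m.group(1);
    e.method = m.group(3);
    e.url = m.group(4);
    e.status = Integer.parseInt(m.group(5));
    e.size = Integer.parseInt(m.group(6));
    e.referrer = m.group(7);
    e.userAgent = m.group(8);
    return e;
  }

  public static void main(String[] args) {
    LogEntry e = parse("10.0.0.1 - - [09/Mar/2009:04:04:09 +0200] "
        + "\"GET /index.php HTTP/1.1\" 200 1234 \"-\" \"Mozilla/5.0\"");
    System.out.println(e.method + " " + e.url + " -> " + e.status);
  }
}
```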

3) Reading data

/** Fetches a single pageview object and prints it*/
private void get(long key) throws IOException {
  Pageview pageview = dataStore.get(key);
  printPageview(pageview);
}

4) Querying

/** Queries and prints pageview object that have keys between startKey and endKey*/
private void query(long startKey, long endKey) throws IOException {
  Query<Long, Pageview> query = dataStore.newQuery();
  //set the properties of query
  query.setStartKey(startKey);
  query.setEndKey(endKey);

  Result<Long, Pageview> result = query.execute();

  printResult(result);
}

Iterating over the results:

private void printResult(Result<Long, Pageview> result) throws IOException {

  while(result.next()) { //advances the Result object and breaks if at end
    long resultKey = result.getKey(); //obtain current key
    Pageview resultPageview = result.get(); //obtain current value object

    //print the results
    System.out.println(resultKey + ":");
    printPageview(resultPageview);
  }

  System.out.println("Number of pageviews from the query:" + result.getOffset());
}

5) Deleting data

/**Deletes the pageview with the given line number */
private void delete(long lineNum) throws Exception {
  dataStore.delete(lineNum);
  dataStore.flush(); //write changes may need to be flushed before they are committed 
}

/** This method illustrates delete by query call */
private void deleteByQuery(long startKey, long endKey) throws IOException {
  //Constructs a query from the dataStore. The matching rows to this query will be deleted
  Query<Long, Pageview> query = dataStore.newQuery();
  //set the properties of query
  query.setStartKey(startKey);
  query.setEndKey(endKey);

  dataStore.deleteByQuery(query);
}

6) MapReduce support

JOB:

public Job createJob(DataStore<Long, Pageview> inStore
      , DataStore<String, MetricDatum> outStore, int numReducer) throws IOException {
    Job job = new Job(getConf());

    job.setJobName("Log Analytics");
    job.setNumReduceTasks(numReducer);
    job.setJarByClass(getClass());

    /* Mappers are initialized with GoraMapper.initMapper() or 
     * GoraInputFormat.setInput()*/
    GoraMapper.initMapperJob(job, inStore, TextLong.class, LongWritable.class
        , LogAnalyticsMapper.class, true);

    /* Reducers are initialized with GoraReducer#initReducer().
     * If the output is not to be persisted via Gora, any reducer 
     * can be used instead. */
    GoraReducer.initReducerJob(job, outStore, LogAnalyticsReducer.class);

    return job;
  }

Mapper:

private TextLong tuple; // reused (url, day) composite key
private LongWritable one = new LongWritable(1L); // constant count emitted per pageview

protected void map(Long key, Pageview pageview, Context context) 
  throws IOException ,InterruptedException {

  Utf8 url = pageview.getUrl();
  long day = getDay(pageview.getTimestamp());

  tuple.getKey().set(url.toString());
  tuple.getValue().set(day);

  context.write(tuple, one);
}
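The mapper above calls a getDay helper that is not shown. In the tutorial it buckets each pageview into its day; a minimal sketch, assuming the timestamp is epoch milliseconds and days are truncated in UTC:

```java
public class DayUtil {
  private static final long DAY_MILLIS = 24L * 60L * 60L * 1000L;

  /** Truncates an epoch-millisecond timestamp to 00:00 UTC of its day. */
  public static long getDay(long timestampMillis) {
    return (timestampMillis / DAY_MILLIS) * DAY_MILLIS;
  }

  public static void main(String[] args) {
    // one day plus one hour collapses back to exactly one day
    System.out.println(getDay(DAY_MILLIS + 3_600_000L) == DAY_MILLIS);
  }
}
```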

Reducer:

private MetricDatum metricDatum = new MetricDatum(); // reused output bean

protected void reduce(TextLong tuple
    , Iterable<LongWritable> values, Context context)
  throws IOException, InterruptedException {

  long sum = 0L; //sum up the values
  for(LongWritable value: values) {
    sum+= value.get();
  }

  String dimension = tuple.getKey().toString();
  long timestamp = tuple.getValue().get();

  metricDatum.setMetricDimension(new Utf8(dimension));
  metricDatum.setTimestamp(timestamp);

  String key = metricDatum.getMetricDimension().toString();
  metricDatum.setMetric(sum);

  context.write(key, metricDatum);
}
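Conceptually, the map/reduce pair above computes a count per (url, day) pair. Stripped of the Hadoop and Gora types, the aggregation amounts to the plain-Java sketch below (the class and method names here are illustrative, not part of the tutorial):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java restatement of the job's per-(url, day) counting logic. */
public class LogAnalyticsSketch {
  /** Counts occurrences of each (url, day) key, mirroring map -> shuffle -> reduce. */
  public static Map<String, Long> countByUrlAndDay(List<String> mappedKeys) {
    Map<String, Long> counts = new HashMap<>();
    for (String key : mappedKeys) {
      counts.merge(key, 1L, Long::sum); // the reducer's "sum += value.get()"
    }
    return counts;
  }

  public static void main(String[] args) {
    Map<String, Long> counts = countByUrlAndDay(
        List.of("/index_day1", "/index_day1", "/about_day1"));
    System.out.println(counts);
  }
}
```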

Besides HBase, Gora also supports SQL stores (MySQL, HSQLDB), DynamoDB, Cassandra, and Accumulo. If you need them, give the other backends a try; their usage is similar to what is shown above.





