Data compression on HBase will make your MapReduce job fly

Original

2018-09-03 18:58

If you need to run MapReduce jobs on data stored in HBase, remember to turn on compression.

I/O speed is very often the performance bottleneck, so focusing on I/O performance is generally a good practice for performance tuning.

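Data compression is one way to improve I/O performance: less data has to be read from disk and moved over the network, at the cost of some CPU time for decompression.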
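As a rough illustration of why compression reduces I/O, the sketch below compresses some repetitive sample data and compares the sizes. It uses zlib only because it ships with Python; LZO and Snappy are different algorithms with different speed and ratio trade-offs. The sample rows are hypothetical, and real ratios depend entirely on your data.

```python
import zlib

# Hypothetical sample data: repetitive rows, loosely resembling stored
# records whose column names and values recur often.
row = b"row-000001|cf:status=ACTIVE|cf:region=us-east|cf:score=42\n"
raw = row * 10_000

# zlib stands in for LZO/Snappy here (an assumption for illustration only).
compressed = zlib.compress(raw, 6)
ratio = len(raw) / len(compressed)

# Compression is lossless: decompressing gives back the original bytes.
assert zlib.decompress(compressed) == raw

print(f"raw: {len(raw)} bytes")
print(f"compressed: {len(compressed)} bytes")
print(f"ratio: {ratio:.1f}x")
```

Fewer bytes on disk means fewer bytes for each map task to read, which is where the job time saving comes from.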

The table below shows our case: LZO compression on HBase compared with no compression.

Compression algorithm	Record count	HDFS space usage (GB)	MapReduce job time
NONE	400,000	190	19 min 24 sec
LZO	400,000	46	9 min 34 sec

With LZO the job ran about twice as fast (roughly a 100% increase in throughput), and HDFS space usage dropped by about 76%. Impressive.
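The figures in the table can be checked with a few lines of arithmetic. This is a minimal sketch; the variable and function names are made up for illustration, and the inputs are copied from the table.

```python
# Values copied from the table above.
none_space_gb, lzo_space_gb = 190, 46
none_time_s = 19 * 60 + 24  # 19 min 24 sec
lzo_time_s = 9 * 60 + 34    # 9 min 34 sec


def speedup(baseline, improved):
    """How many times faster the improved run is than the baseline."""
    return baseline / improved


def reduction_pct(baseline, improved):
    """Percentage saved relative to the baseline."""
    return (baseline - improved) / baseline * 100


time_speedup = speedup(none_time_s, lzo_time_s)
time_saved = reduction_pct(none_time_s, lzo_time_s)
space_saved = reduction_pct(none_space_gb, lzo_space_gb)

print(f"speedup: {time_speedup:.2f}x")            # speedup: 2.03x
print(f"job time saved: {time_saved:.1f}%")       # job time saved: 50.7%
print(f"HDFS space saved: {space_saved:.1f}%")    # HDFS space saved: 75.8%
```

A 2.03x speedup is the same thing as halving the job time, which is what "almost 100%" refers to.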

As for the compression algorithm, Snappy is another option, and it appears to be faster than LZO.

See http://blog.cloudera.com/blog/2011/09/snappy-and-hadoop/ and http://blog.erdemagaoglu.com/post/4605524309/lzo-vs-snappy-vs-lzf-vs-zlib-a-comparison-of
