[Original] Java GC

Young generation and old generation. The young generation has 1 eden and 2 survivor spaces. Minor GC uses mark and copy, from eden and one survivor t
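A toy simulation of the mark-and-copy step may make the two-survivor scheme easier to picture. This is a minimal sketch, not HotSpot's collector: liveness is set by hand instead of being traced from GC roots, and the tenuring threshold of 3 is a hypothetical value (HotSpot controls the real one with -XX:MaxTenuringThreshold).

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a minor GC: live objects in eden + "from" survivor are copied
// to the "to" survivor (or promoted to old gen), then the survivor roles swap.
class MinorGcSketch {
    static final int TENURING_THRESHOLD = 3; // hypothetical value for illustration

    static final class Obj {
        final String name;
        boolean live; // set by the caller here; a real GC traces reachability from roots
        int age;      // number of minor GCs survived
        Obj(String name, boolean live) { this.name = name; this.live = live; }
    }

    List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>(); // survivor space currently holding objects
    List<Obj> to = new ArrayList<>();   // empty survivor space
    List<Obj> old = new ArrayList<>();

    void minorGc() {
        List<Obj> candidates = new ArrayList<>(eden);
        candidates.addAll(from);
        for (Obj o : candidates) {
            if (!o.live) continue; // dead objects are not copied, so they cost nothing
            o.age++;
            if (o.age >= TENURING_THRESHOLD) old.add(o); // promote to old generation
            else to.add(o);                              // copy to the empty survivor
        }
        // Eden and "from" are reclaimed wholesale; then the survivors swap roles.
        eden.clear();
        from.clear();
        List<Obj> tmp = from;
        from = to;
        to = tmp;
    }

    public static void main(String[] args) {
        MinorGcSketch gc = new MinorGcSketch();
        gc.eden.add(new Obj("kept", true));
        gc.eden.add(new Obj("garbage", false));
        gc.minorGc();
        System.out.println("survivor: " + gc.from.size() + ", old: " + gc.old.size());
    }
}
```

Running the model three times on one live object moves it eden, survivor, survivor, then old generation, while garbage never gets copied at all.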

[Original] HBase Filters, Counters & Coprocessors

- Scan: setCaching(rows), setBatch(cells)
- Filter -> FilterBase; setFilter(filter) method on Get and Scan
- CompareFilte
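To see how the two Scan settings interact, here is a rough estimate of how many Result objects and next() round trips a scan produces. It assumes every row has the same number of cells and ignores the extra RPCs for opening and closing the scanner as well as any result-size limit; the class and method names are hypothetical and this is plain arithmetic, not the HBase client API.

```java
// Back-of-the-envelope model of Scan.setBatch(cells) and Scan.setCaching(rows).
class ScanRpcEstimate {
    // setBatch(batch) caps the cells per Result, so a wider row is split into
    // several Results. batch <= 0 is treated as "no batching": one Result per row.
    static long results(long rows, long cellsPerRow, int batch) {
        long perRow = batch <= 0 ? 1 : (cellsPerRow + batch - 1) / batch; // ceiling division
        return rows * perRow;
    }

    // setCaching(caching) is how many Results one next() RPC returns to the client.
    static long rpcs(long rows, long cellsPerRow, int batch, int caching) {
        long r = results(rows, cellsPerRow, batch);
        return (r + caching - 1) / caching; // ceiling division
    }

    public static void main(String[] args) {
        // 10 rows x 20 cells, batch 5, caching 10: 40 Results in about 4 round trips.
        System.out.println(results(10, 20, 5) + " results, ~" + rpcs(10, 20, 5, 10) + " RPCs");
    }
}
```

The trade-off it exposes: a small batch keeps each Result small for very wide rows, but multiplies the Result count, which a larger caching value then has to absorb.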

[Original] HBase Region Split

- Split Policy (ConstantSizeRegionSplitPolicy, IncreasingToUpperBoundRegionSplitPolicy(default), KeyPrefixRegionSplitPo
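A sketch of the threshold IncreasingToUpperBoundRegionSplitPolicy checks, as I understand it for HBase 1.x/2.x: a region splits once a store grows past min(maxFileSize, initialSize * R^3), where R is the number of regions of the same table on that RegionServer and initialSize defaults to twice the memstore flush size. The defaults used below (10 GB max file size, 128 MB flush size) are assumptions, and the class is a stand-alone illustration, not the HBase class itself; check the formula against your version.

```java
// Stand-alone illustration of the IncreasingToUpperBound split threshold.
class SplitThresholdSketch {
    static final long MB = 1024L * 1024L;

    // min(maxFileSize, initialSize * R^3) with initialSize = 2 * flushSize;
    // R == 0 or R > 100 falls back to maxFileSize (this also avoids overflow).
    static long sizeToCheck(int regionsOfTableOnServer, long maxFileSize, long flushSize) {
        int r = regionsOfTableOnServer;
        if (r == 0 || r > 100) return maxFileSize;
        long initialSize = 2 * flushSize;
        return Math.min(maxFileSize, initialSize * r * r * r);
    }

    public static void main(String[] args) {
        long maxFileSize = 10L * 1024 * MB; // assumed hbase.hregion.max.filesize
        long flushSize = 128 * MB;          // assumed memstore flush size
        for (int r = 1; r <= 4; r++) {
            // prints 256, 2048, 6912, 10240 (MB)
            System.out.println(r + " region(s): " + sizeToCheck(r, maxFileSize, flushSize) / MB + " MB");
        }
    }
}
```

With these defaults a new table splits early (first at 256 MB), and once a server holds four regions of the table the cap of 10 GB takes over, which is what "increasing to upper bound" refers to.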

[Original] Submit Spark code to YARN

- configure Eclipse: add the scala-ide plugin and the m2e-scala plugin (http://alchim31.free.fr/m2e-scala/update-site/)
- config

[Original] JVM Troubleshooting

- jps, top and jstack: jps shows Java process info such as the main class name, the arguments passed to main, JVM arguments and the pid (jps -m -l); top to
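A common follow-up to top is locating the busy thread in a jstack dump: top -H -p <pid> lists per-thread CPU with decimal thread ids, while HotSpot's thread dump prints the same id in hex as nid=0x.... The sketch below assumes that dump format; the class name and the sample dump lines are made up for illustration (in a shell, printf '%x\n' <tid> does the same conversion).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class HotThreadFinder {
    // top -H shows thread ids in decimal; jstack prints them in hex as "nid=0x...".
    static String toNid(long tid) {
        return "nid=0x" + Long.toHexString(tid);
    }

    // Return the dump lines whose nid token equals the given thread id.
    // Whole-token comparison, so nid=0x1a03 does not also match nid=0x1a034.
    static List<String> findThread(String jstackOutput, long tid) {
        String nid = toNid(tid);
        List<String> hits = new ArrayList<>();
        for (String line : jstackOutput.split("\n")) {
            if (Arrays.asList(line.trim().split("\\s+")).contains(nid)) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up sample lines in the usual HotSpot thread-dump header format.
        String dump = "\"main\" #1 prio=5 os_prio=0 tid=0x00007f3c8c00a800 nid=0x1a03 runnable\n"
                + "\"worker-1\" #12 prio=5 os_prio=0 tid=0x00007f3c8c01f000 nid=0x1a04 waiting\n";
        System.out.println(findThread(dump, 6659)); // 6659 decimal == 0x1a03 hex
    }
}
```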

[Original] Spark - Running on a Cluster

- package the Spark app (maven): <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-shade-plugin<

[Original] Spark Troubleshooting and Performance Tuning

- spark master server, more memory: export SPARK_DAEMON_MEMORY=5g; spark.ui.retainedJobs 500 (the defaults are all 1000); spark.ui.retain

[Original] Spark - Tuning and Debugging Spark

- submit application (a SparkConf object cannot be changed after SparkContext creation); method 1: bin/spark-submit \ --class c

[Original] HBase Concept

- Data model: a sparse, distributed, persistent multidimensional sorted map (row:string, column:string, time:int64) -> stri
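The sorted-map view of the data model can be sketched with nested TreeMaps. This is a toy in-memory model, not how HBase stores data: String.compareTo stands in for HBase's byte-wise row key ordering (the two agree for ASCII keys), and versions are kept newest first so a plain read returns the latest timestamp. The class and method names are hypothetical.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model: row -> column -> timestamp -> value, every level sorted.
class SortedMapModel {
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    void put(String row, String column, long ts, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            // timestamps in descending order: newest version first
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(ts, value);
    }

    // Latest version of a cell, or null. Missing cells take no space (sparse).
    String get(String row, String column) {
        NavigableMap<String, NavigableMap<Long, String>> cols = rows.get(row);
        if (cols == null) return null;
        NavigableMap<Long, String> versions = cols.get(column);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    // Row keys iterate in sorted order, like a full-table scan.
    Iterable<String> rowKeys() {
        return rows.keySet();
    }

    public static void main(String[] args) {
        SortedMapModel t = new SortedMapModel();
        t.put("row2", "cf:a", 1L, "x");
        t.put("row1", "cf:a", 1L, "old");
        t.put("row1", "cf:a", 2L, "new");
        System.out.println(t.get("row1", "cf:a") + " " + t.rowKeys()); // new [row1, row2]
    }
}
```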

[Original] Bloom Filter

- space-efficient lookup for a fixed number of static elements.
- answers "may have" or "definitely does not have"
- n: number of elements
- k: nu
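A minimal Bloom filter sketch in Java. Assumptions: m is the bit-array size and k the number of hash functions; the k bit positions come from double hashing (h1 + i*h2) with a hypothetical second hash derived from String.hashCode, and sizing uses the standard formulas m = -n ln p / (ln 2)^2 and k = (m/n) ln 2 for n elements and a target false-positive rate p.

```java
import java.util.BitSet;

class BloomFilter {
    final int m; // number of bits
    final int k; // number of hash functions
    private final BitSet bits;

    BloomFilter(int m, int k) {
        this.m = m;
        this.k = k;
        this.bits = new BitSet(m);
    }

    // Standard sizing for n expected elements and false-positive rate p.
    static BloomFilter forExpected(int n, double p) {
        double ln2 = Math.log(2);
        int m = (int) Math.ceil(-n * Math.log(p) / (ln2 * ln2));
        int k = Math.max(1, (int) Math.round((double) m / n * ln2));
        return new BloomFilter(m, k);
    }

    // i-th bit position via double hashing; h2 is a hypothetical mix of h1, forced odd.
    private int index(String item, int i) {
        int h1 = item.hashCode();
        int h2 = (Integer.reverse(h1) * 0x9E3779B9) | 1;
        return Math.floorMod(h1 + i * h2, m);
    }

    void add(String item) {
        for (int i = 0; i < k; i++) bits.set(index(item, i));
    }

    boolean mightContain(String item) {
        for (int i = 0; i < k; i++) {
            if (!bits.get(index(item, i))) return false; // any clear bit: definitely absent
        }
        return true; // all k bits set: possibly present
    }

    public static void main(String[] args) {
        BloomFilter bf = BloomFilter.forExpected(1000, 0.01); // m = 9586 bits, k = 7
        bf.add("apple");
        System.out.println(bf.mightContain("apple")); // true
    }
}
```

mightContain can return true for an element that was never added (a false positive), but it never returns false for one that was added, which is exactly the "may have, definitely does not have" property; removing elements is not supported because clearing a bit could erase other elements.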