[Original] Scala notes (2) - Class, Object, Package & Import, and Inheritance

- Class: class Counter { private var value = 0 // You must initialize the field; otherwise it is abstract and the class must be declared abstract. d

[Original] Programming with RDD

- Passing functions to Spark (beware of references to the containing object, which must then be serializable): class Sea

[Original] YARN (Yet Another Resource Negotiator) - Cluster Manager

- What is YARN - YARN application run - Resource requests: all requests up front (Spark) or dynamic requests (MapReduce, map

[Original] Scala notes (4) - Collection

- Collection: Array is the equivalent of a Java array; its elements can be updated, but its size cannot change. Sequence: Vector is imm

[Original] Spark - Advanced Spark Programming

- Accumulator: val blankLines = new LongAccumulator; sc.register(blankLines). Use accumulators inside transformations for debugging

[Original] HBase MapReduce

- Data locality, block placement policy: the first replica is written to the DataNode where the region server runs. - TableInp

[Original] Cassandra, HBase and MongoDB

Cassandra: AP system, weak consistency, write-heavy, high availability, good for online use. HBase: CP system, good supp

[Original] Kudu - Impala

The number of partitions should equal the number of cores in the cluster. Kudu optimizes SQL when =, <=, <, >, >=, BETWEEN, or IN is used, but

[Original] Compile Spark source code

Change the Scala version to the one installed on your machine: ./dev/change-scala-version.sh <version>; shut down zinc: ./buil

[Original] Software Design

- Design principles. Open-closed: open for extension, closed for modification. Liskov substitution: any subclass can be in th

[Original] Java off-heap memory

- MappedByteBuffer: public void copyFile(String filename, String srcpath, String destpath) throws IOException { File
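
The excerpt above is cut off, so the following is only a minimal sketch of the same idea (copying a file through memory-mapped buffers), not the post's original method. The class name `MappedCopy` and the `Path`-based signature are assumptions made for illustration. Each mapping is limited to `Integer.MAX_VALUE` bytes, so this sketch only suits files under 2 GB.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not taken from the original post.
final class MappedCopy {

    // Copies src to dest by mapping both files into off-heap memory.
    static void copyFile(Path src, Path dest) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dest,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.READ,   // READ_WRITE mappings need a readable channel
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            if (size == 0) {
                return; // nothing to map; dest already exists and is empty
            }
            MappedByteBuffer source = in.map(FileChannel.MapMode.READ_ONLY, 0, size);
            // Mapping past the end of dest in READ_WRITE mode grows the file to `size`.
            MappedByteBuffer target = out.map(FileChannel.MapMode.READ_WRITE, 0, size);
            target.put(source); // bulk copy between the two mapped regions
            target.force();     // ask the OS to write the mapped pages to disk
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("mapped-src", ".txt");
        Path dest = Files.createTempFile("mapped-dest", ".txt");
        Files.write(src, "hello".getBytes());
        copyFile(src, dest);
        System.out.println(new String(Files.readAllBytes(dest)));
    }
}
```

The mapped regions live outside the Java heap and are released only when the buffers are garbage collected, which is why some platforms refuse to delete a file while it is still mapped.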

[Original] Kafka

- Use cases: log collection, message system, user activity, stream processing, event sourcing - Design: Kafka broker leader, multiple

[Original] B tree vs B+ tree

- B tree (key + data in every node): O(log_d(n)), where d is the degree of the tree and h is the height of the tree, h <= log_d((n+1)/2); non-leaf nod
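
To make the height bound h <= log_d((n+1)/2) concrete, here is a small sketch that evaluates it with integer arithmetic. It assumes d is the minimum degree (every non-root node has at least d children), which the excerpt does not state explicitly; the class and method names are hypothetical.

```java
// Hypothetical helper: evaluates the B-tree height bound h <= log_d((n + 1) / 2).
final class BTreeHeight {

    // Returns the largest h with d^h <= (n + 1) / 2, i.e. floor(log_d((n + 1) / 2)).
    // n is the number of keys, d the assumed minimum degree.
    static int maxHeight(long n, int d) {
        if (n < 1 || d < 2) {
            throw new IllegalArgumentException("need n >= 1 and d >= 2");
        }
        long limit = (n + 1) / 2;
        int h = 0;
        long power = 1; // invariant: power == d^h
        // power <= limit / d is the overflow-safe form of power * d <= limit.
        while (power <= limit / d) {
            power *= d;
            h++;
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(maxHeight(1_999, 10));
        System.out.println(maxHeight(999_999, 100));
    }
}
```

With a large degree the bound stays very small even for many keys, which is why a lookup touches only a handful of nodes.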

[Original] LSM (Log-Structured Merge-Tree)

- Sequential access is faster than random access -> WAL, append updates to a log - Memstore in memory for quick lookup -> M
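
The two ideas named above (append every update to a log, keep recent data in a sorted in-memory store) can be sketched as a toy class. This is an illustration under stated assumptions, not the post's own code: `TinyLsm` is a hypothetical name, in-memory lists stand in for the log file and the on-disk segments, and the flush step is the usual LSM behavior rather than something the truncated excerpt confirms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy LSM write path; all names are hypothetical and nothing touches disk.
final class TinyLsm {
    private final List<String> wal = new ArrayList<>();       // stand-in for an append-only log file
    private TreeMap<String, String> memstore = new TreeMap<>(); // sorted in-memory buffer
    private final List<TreeMap<String, String>> segments = new ArrayList<>(); // flushed sorted runs, oldest first
    private final int flushThreshold;

    TinyLsm(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void put(String key, String value) {
        wal.add(key + "=" + value); // 1. sequential append, never an in-place update
        memstore.put(key, value);   // 2. make the write visible to lookups
        if (memstore.size() >= flushThreshold) {
            flush();
        }
    }

    // Checks the memstore first, then segments from newest to oldest.
    String get(String key) {
        String value = memstore.get(key);
        if (value != null) {
            return value;
        }
        for (int i = segments.size() - 1; i >= 0; i--) {
            value = segments.get(i).get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    // Assumed flush step: freeze the memstore as a sorted segment and start a new one.
    private void flush() {
        segments.add(memstore);
        memstore = new TreeMap<>();
        wal.clear(); // the flushed entries no longer need replaying
    }

    int segmentCount() {
        return segments.size();
    }
}
```

Because newer data is always consulted first, an overwritten key returns its latest value even while the old value still sits in an earlier segment.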

[Original] Review list

DevOps; Spring Boot and microservices; gmlparser, design patterns; persistable queue, Java volatile, atomicity