Original Scala notes (2) - Class, Object, Package & Import, and Inheritance
- Class: class Counter { private var value = 0 // You must initialize the field; otherwise the field is abstract and the class must be declared abstract. d
Original Programming with RDD
- Passing functions to Spark (be careful with references to the containing object, which needs to be serializable): class Sea
Original YARN (Yet Another Resource Negotiator) - Cluster Manager
- What is YARN - YARN application run - Resource requests: all requests up front (Spark) or dynamic requests (MapReduce, map
Original Scala notes (4) - Collection
- Collection: Array is the equivalent of a Java array; it is mutable in terms of value updates, but not in size. Sequence: Vector is imm
Original Spark - Advanced Spark Programming
- Accumulator: val blankLines = new LongAccumulator; sc.register(blankLines). Put accumulators in transformations for debugging
Original HBase MapReduce
- Data locality, block placement policy: the first copy is written to the data node where the region server runs. - TableInp
Original Cassandra, HBase and MongoDB
Cassandra: AP system, weak consistency, heavy writes, high availability, good for online use. HBase: CP system, good supp
Original Kudu - Impala
The number of partitions equals the number of cores in the cluster. Kudu optimizes SQL if =, <=, <, >, >=, BETWEEN, or IN is used, but
Original Compile Spark source code
Change the Scala version to the one on your machine: ./dev/change-scala-version.sh <version>. Shut down zinc: ./buil
Original Software Design
- Design principles. Open-closed: open for extension, closed for modification. Liskov substitution: any subclass can be in th
Original Java off-heap memory
- MappedByteBuffer: public void copyFile(String filename, String srcpath, String destpath) throws IOException { File
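The excerpt above cuts off before the method body, so the following is only a minimal sketch of copying a file through a MappedByteBuffer, not the post's own code. The class name MappedCopy and the Path-based signature are assumptions made for illustration, and the sketch assumes the source file is smaller than 2 GB, since a single mapping is limited to Integer.MAX_VALUE bytes.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class; not taken from the original post.
public class MappedCopy {

    // Copies src to dest by mapping the whole source file into memory (off-heap)
    // and writing the mapped bytes to the destination channel.
    // Assumption: the source file is smaller than 2 GB.
    public static void copyFile(Path src, Path dest) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dest,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            MappedByteBuffer buffer =
                    in.map(FileChannel.MapMode.READ_ONLY, 0, in.size());
            // write() may consume only part of the buffer, so loop until it is drained.
            while (buffer.hasRemaining()) {
                out.write(buffer);
            }
        }
    }
}
```

The mapping lives outside the Java heap, so the file contents are not copied into a heap byte array on the way through.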
Original Kafka
- Use cases: log collection, message system, user activity, stream processing, event sourcing. - Design: Kafka broker leader, multiple
Original B tree vs B+ tree
- B tree (key + data in every node), O(log_d(n)), where d is the degree of the tree and h is the height of the tree, h <= log_d((n+1)/2). Non-leaf nod
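As a worked example of the height bound h <= log_d((n+1)/2), here is a small sketch that evaluates it with integer arithmetic. It assumes d is the minimum degree (every non-root node has at least d children), which is the reading under which a tree of height h holds at least 2*d^h - 1 keys. The class and method names are hypothetical.

```java
// Hypothetical helper; evaluates the bound h <= log_d((n + 1) / 2).
public class BTreeHeight {

    // Returns the largest h such that 2 * d^h - 1 <= n, i.e. the maximum height
    // of a B tree holding n keys, assuming d is the minimum degree (d >= 2).
    public static int maxHeight(long n, int d) {
        if (n < 1 || d < 2) {
            throw new IllegalArgumentException("need n >= 1 and d >= 2");
        }
        long limit = (n - 1) / 2 + 1; // floor((n + 1) / 2), written to avoid overflow
        int h = 0;
        long power = 1;               // invariant: power == d^h
        while (power <= limit / d) {  // equivalent to power * d <= limit
            power *= d;
            h++;
        }
        return h;
    }
}
```

For instance, with d = 100 and n = 1,000,000 keys the bound gives a height of at most 2, which is why a lookup touches only a handful of nodes.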
Original LSM Log-Structured Merge-Tree
- Sequential access is better than random access -> WAL, append updates to the log. - Memstore in memory for quick lookup -> M
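The two points visible in the excerpt (append every update to a log, keep a sorted in-memory store for lookups) can be sketched roughly as below. This is a toy illustration under stated assumptions: an in-memory list stands in for the on-disk write-ahead log, the class name TinyLsm is hypothetical, and flushing and compaction are left out entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy sketch of the LSM write path; not a real storage engine.
public class TinyLsm {
    // Stand-in for an append-only log file (assumption: a real WAL lives on disk).
    private final List<String> wal = new ArrayList<>();
    // Sorted in-memory store that serves reads for recent writes.
    private final TreeMap<String, String> memstore = new TreeMap<>();

    // Appends the update to the log first, then applies it to the memstore.
    public void put(String key, String value) {
        wal.add(key + "=" + value); // sequential append, never an in-place update
        memstore.put(key, value);   // latest value wins for lookups
    }

    // Looks up the most recent value held in memory, or null if absent.
    public String get(String key) {
        return memstore.get(key);
    }

    // Number of log records written so far.
    public int walSize() {
        return wal.size();
    }
}
```

Overwriting a key adds a second log record rather than modifying the first, which is what keeps the write path sequential.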
Original Review list
devops, Spring Boot and microservices, gmlparser, design patterns, persistable queue, Java volatile, atomicity