[Original] MapReduce Workflow
Check the output folder; calculate the input splits. The application master gets progress and completion reports from the tasks. It also requests…
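The "calculate splits" step can be sketched in plain Scala. This is a minimal sketch that, as far as we understand it, mirrors the logic of Hadoop's FileInputFormat: split size = max(minSize, min(maxSize, blockSize)), and the last split may be up to 10% larger than the split size. The object and method names are our own, and real input formats also take block locations and non-splittable files into account.

```scala
object SplitSketch {
  // The last split may be up to 10% larger than splitSize.
  private val SplitSlop = 1.1

  // splitSize = max(minSize, min(maxSize, blockSize))
  def computeSplitSize(blockSize: Long, minSize: Long, maxSize: Long): Long =
    math.max(minSize, math.min(maxSize, blockSize))

  // Returns (offset, length) pairs for a single file of the given length.
  def splits(fileLength: Long, splitSize: Long): List[(Long, Long)] = {
    val result = List.newBuilder[(Long, Long)]
    var remaining = fileLength
    while (remaining.toDouble / splitSize > SplitSlop) {
      result += ((fileLength - remaining, splitSize))
      remaining -= splitSize
    }
    // Whatever is left (at most 1.1 * splitSize) becomes the final split.
    if (remaining != 0) result += ((fileLength - remaining, remaining))
    result.result()
  }

  def main(args: Array[String]): Unit = {
    val mb = 1024L * 1024
    val size = computeSplitSize(blockSize = 128 * mb, minSize = 1, maxSize = Long.MaxValue)
    // A 300 MB file with 128 MB blocks: splits of 128, 128 and 44 MB.
    println(splits(300 * mb, size).map { case (off, len) => (off / mb, len / mb) })
  }
}
```

Note that a 130 MB file yields a single 130 MB split, because 130/128 does not exceed the 1.1 slop factor.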
[Original] scala notes (3) - Files & Regular Expression, Trait, Operation and Function
- Files & Regular Expressions: read from a file, a URL, or a string, and remember to close the source. val source = Source.fromFile("myfi…
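A small sketch of reading from a source and closing it, plus a regular expression. It reads from a string (Source.fromString) so it runs without a file; Source.fromFile or Source.fromURL would be used the same way. The object and method names are illustrative.

```scala
import scala.io.Source

object FilesAndRegex {
  // Read all lines and always close the source, even if reading fails.
  def readLines(source: Source): List[String] =
    try source.getLines().toList
    finally source.close()

  // A regular expression matching runs of digits.
  private val numPattern = "[0-9]+".r

  // Extract every number that appears in the text.
  def numbersIn(text: String): List[Int] =
    numPattern.findAllIn(text).map(_.toInt).toList

  def main(args: Array[String]): Unit = {
    val lines = readLines(Source.fromString("a 1\nb 22\nc 333"))
    println(lines)                   // List(a 1, b 22, c 333)
    println(lines.flatMap(numbersIn)) // List(1, 22, 333)
  }
}
```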
[Original] MapReduce Features
- Counters (values are definitive only once the job has completed successfully): Task Counters, Filesystem Counters, Job Counters…
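How per-task counters roll up into job totals can be illustrated in memory. This is a pure-Scala sketch, not Hadoop's Counter API (inside a Hadoop task one would typically call context.getCounter(...).increment(1)); the counter names are hypothetical. The totals only make sense once every task has reported, which is why job counter values are definitive only after the job succeeds.

```scala
import scala.collection.mutable

object CountersSketch {
  // Hypothetical user-defined counters (in Hadoop usually a Java enum).
  sealed trait RecordCounter
  case object Malformed extends RecordCounter
  case object MissingField extends RecordCounter

  // Counters kept by a single task while it processes records.
  final class TaskCounters {
    private val counts = mutable.Map.empty[RecordCounter, Long]
    def increment(c: RecordCounter, by: Long = 1L): Unit =
      counts(c) = counts.getOrElse(c, 0L) + by
    def snapshot: Map[RecordCounter, Long] = counts.toMap
  }

  // Job totals: the sum of each counter over all (successful) tasks.
  def jobTotals(tasks: Seq[TaskCounters]): Map[RecordCounter, Long] =
    tasks.flatMap(_.snapshot.toSeq)
      .groupBy(_._1)
      .map { case (counter, pairs) => counter -> pairs.map(_._2).sum }
}
```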
[Original] scala notes (5) - pattern and case class
- Pattern and Case Class: ch match { case _ if Character.isDigit(ch) => ... case '+' => ... case _ => ... }; prefix match…
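A runnable version of the character match, plus case classes under a sealed trait. Reading "prefix match" as matching a sequence by its leading element is our assumption; the Amount classes are illustrative names.

```scala
object PatternSketch {
  // Guard plus literal patterns, as in the ch match example.
  def classify(ch: Char): String = ch match {
    case _ if Character.isDigit(ch) => "digit"
    case '+'                        => "plus"
    case '-'                        => "minus"
    case _                          => "other"
  }

  // Match a list by its first element.
  def startsWithZero(xs: List[Int]): Boolean = xs match {
    case 0 :: _ => true
    case _      => false
  }

  // A sealed trait lets the compiler check matches on Amount for exhaustiveness.
  sealed trait Amount
  case class Dollar(value: Double) extends Amount
  case class Currency(value: Double, unit: String) extends Amount
  case object NoAmount extends Amount

  // Case classes can be deconstructed directly in patterns.
  def describe(a: Amount): String = a match {
    case Dollar(v)         => s"$v dollars"
    case Currency(v, unit) => s"$v $unit"
    case NoAmount          => "nothing"
  }
}
```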
[Original] scala type parameters
- Type bounds: class Pair[T <: Comparable[T]](val first: T, val second: T) { def smaller = if (first.compareTo(second)…
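A self-contained sketch of the upper-bound Pair. The excerpt is cut off, so the `< 0` comparison in smaller is our assumption. A context-bound variant is added for comparison: Int does not extend Comparable[Int], so it cannot be used with the upper-bound Pair, but it does have an implicit Ordering.

```scala
object TypeBoundsSketch {
  // Upper bound: T must be a subtype of Comparable[T] (e.g. String).
  class Pair[T <: Comparable[T]](val first: T, val second: T) {
    def smaller: T = if (first.compareTo(second) < 0) first else second
  }

  // Context bound: works for any T with an implicit Ordering[T], including Int.
  class OrderedPair[T: Ordering](val first: T, val second: T) {
    def smaller: T = if (implicitly[Ordering[T]].lt(first, second)) first else second
  }
}
```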
[Original] scala notes (6) - Annotation, Future and Type Parameter
- Annotation: class MyContainer[@specialized T]; def country: String @Localized; @Test(timeout = 0, expected = classOf[org.ju…
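A few standard-library annotations in runnable form. @specialized with explicit primitive types, @tailrec and @deprecated are real Scala annotations; the gcd functions are illustrative.

```scala
import scala.annotation.tailrec

object AnnotationSketch {
  // @specialized asks the compiler to generate Int and Double variants,
  // avoiding boxing for those primitive types.
  class MyContainer[@specialized(Int, Double) T](val value: T) {
    def get: T = value
  }

  // @tailrec makes compilation fail if the recursive call is not in tail position.
  @tailrec
  def gcd(a: Int, b: Int): Int = if (b == 0) a else gcd(b, a % b)

  // @deprecated(message, since): callers get a compiler warning.
  @deprecated("use gcd", "1.0")
  def oldGcd(a: Int, b: Int): Int = gcd(a, b)
}
```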
[Original] HBase Filters, Counters & Coprocessors
- Scan: setCaching(rows), setBatch(cells)
- Filter -> FilterBase; setFilter(filter) method on Get and Scan
- CompareFilte…
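The effect of setCaching and setBatch on a scan can be estimated with simple arithmetic. This is a sketch under our own assumptions, not HBase code: batching splits each row into ceil(cells / batch) Result objects, caching returns up to that many Results per next() round trip, and the extra RPCs for opening and closing the scanner are ignored. A batch value of 0 or less stands for "not set" here.

```scala
object ScanCostSketch {
  // Ceiling division for positive numbers.
  private def ceilDiv(a: Long, b: Long): Long = (a + b - 1) / b

  // Number of Result objects a scan produces.
  def results(rows: Long, cellsPerRow: Long, batch: Int): Long =
    if (batch <= 0) rows else rows * ceilDiv(cellsPerRow, batch)

  // Number of next() round trips, assuming `caching` Results per RPC.
  def fetchRpcs(rows: Long, cellsPerRow: Long, caching: Int, batch: Int): Long =
    ceilDiv(results(rows, cellsPerRow, batch), caching)
}
```

For example, 10 rows of 10 cells with batch 2 give 50 Results; with caching 6 that is 9 fetch round trips under these assumptions.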
[Original] scala notes (1) - Basic, Control & Function, Array and Map & Tuple
- Basics: val greeting: String = null; val xmax, ymax = 100 (both are set to 100); String -> StringOps (intersect, sorted, ...); Int ->…
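A runnable sketch of these basics: declaring two vals at once, the extra methods that StringOps and RichInt add, and a glimpse of arrays, maps and tuples. The values are illustrative.

```scala
import scala.collection.mutable.ArrayBuffer

object BasicsSketch {
  val xmax, ymax = 100 // both names are bound to 100

  // String is implicitly wrapped in StringOps, which adds collection methods.
  val common: String = "Hello".intersect("World") // "lo"
  val sortedChars: String = "scala".sorted         // "aacls"

  // Int is implicitly wrapped in RichInt, which adds max, to, etc.
  val biggest: Int = 3.max(7) // 7
  val oneToFive: Seq[Int] = 1.to(5)

  // Arrays, maps and tuples.
  val buffer: ArrayBuffer[Int] = ArrayBuffer(1, 2)
  buffer += 3
  val scores: Map[String, Int] = Map("Alice" -> 10, "Bob" -> 3)
  val (name, score) = ("Alice", scores("Alice")) // tuple destructuring
}
```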
[Original] scala notes (7) - Advanced Type and Implicit
- Advanced types: singleton type; def setTitle(title: String): this.type = { ...; this } (for subtypes); def set(obj: Titl…
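Why returning this.type matters for subtypes: a sketch with a hypothetical Document/Book pair. If setTitle returned Document, the chained addChapter call below would not compile.

```scala
object SingletonTypeSketch {
  class Document {
    private var title: String = ""
    // Returning this.type keeps the caller's static type (e.g. Book) for chaining.
    def setTitle(t: String): this.type = { title = t; this }
    def getTitle: String = title
  }

  class Book extends Document {
    private var chapters = Vector.empty[String]
    def addChapter(c: String): this.type = { chapters :+= c; this }
    def chapterCount: Int = chapters.size
  }

  // setTitle returns Book here, so addChapter is available.
  val book: Book = new Book().setTitle("Scala").addChapter("Types")
}
```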
[Original] MapReduce Types and Formats
- Types: map: (K1, V1) → list(K2, V2); combiner: (K2, list(V2)) → list(K2, V2); reduce: (K2, list(V2)) → list(K3, V3)
- partit…
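The type signatures above can be traced through an in-memory word count. This is a plain-Scala simulation, not the Hadoop API: the line index stands in for the byte offset key K1, the combiner runs once over all map output, and the partition function follows the formula of Hadoop's default HashPartitioner.

```scala
object MapReduceTypesSketch {
  // map: (K1, V1) -> list(K2, V2)
  def map(key: Long, line: String): List[(String, Int)] =
    line.split("\\s+").filter(_.nonEmpty).map(word => (word, 1)).toList

  // combiner: (K2, list(V2)) -> list(K2, V2); output types equal map output types
  def combine(word: String, counts: List[Int]): List[(String, Int)] =
    List((word, counts.sum))

  // reduce: (K2, list(V2)) -> list(K3, V3); here K3 = String, V3 = Int
  def reduce(word: String, counts: List[Int]): List[(String, Int)] =
    List((word, counts.sum))

  // Same formula as Hadoop's default HashPartitioner.
  def partition(key: String, numReduceTasks: Int): Int =
    (key.hashCode & Int.MaxValue) % numReduceTasks

  // Group values by key, as the shuffle does before combine/reduce.
  private def group[K, V](pairs: List[(K, V)]): List[(K, List[V])] =
    pairs.groupBy(_._1).toList.map { case (k, kvs) => (k, kvs.map(_._2)) }

  def run(lines: List[String], numReduceTasks: Int): Map[String, Int] = {
    val mapOutput = lines.zipWithIndex.flatMap { case (line, i) => map(i.toLong, line) }
    val combined = group(mapOutput).flatMap { case (k, vs) => combine(k, vs) }
    // Route each pair to a reducer, then reduce within each partition.
    val byPartition = combined.groupBy { case (k, _) => partition(k, numReduceTasks) }
    byPartition.values.toList.flatMap { part =>
      group(part).flatMap { case (k, vs) => reduce(k, vs) }
    }.toMap
  }
}
```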
[Original] Hadoop I/O
- Checksum: CRC-32C, computed for every 512 bytes. On write, the last datanode in the pipeline verifies the checksum; on read, block verification…
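The per-chunk checksum idea can be sketched with the JDK's java.util.zip.CRC32C (JDK 9 or later). HDFS uses its own checksum classes internally; this sketch only shows one CRC-32C value per 512-byte chunk and a reader-side verification. The object and method names are illustrative.

```scala
import java.util.zip.CRC32C

object ChecksumSketch {
  // One CRC-32C value per chunk of bytesPerChecksum bytes.
  def chunkChecksums(data: Array[Byte], bytesPerChecksum: Int = 512): Vector[Long] =
    data.grouped(bytesPerChecksum).map { chunk =>
      val crc = new CRC32C()
      crc.update(chunk, 0, chunk.length)
      crc.getValue
    }.toVector

  // A reader recomputes the checksums and compares them with the stored ones.
  def verify(data: Array[Byte], stored: Vector[Long], bytesPerChecksum: Int = 512): Boolean =
    chunkChecksums(data, bytesPerChecksum) == stored
}
```

A 1300-byte buffer yields three checksums (512 + 512 + 276 bytes), and flipping a single bit makes verification fail.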
[Original] HDFS
- Suitable for very large files (terabytes, petabytes); write once, read many times; handles node failure without noticeable in…
[Original] spark - Pair RDD (Key/Value Pairs)
- Create a pair RDD from a regular RDD by calling the map function: val pairs = lines.map(x => (x.split(" ")(0), x))
- transformatio…
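The pair-creating map can be tried on ordinary Scala collections. This is a local stand-in, not Spark: on an RDD the same lambda applies, and the grouping and filtering would be done with pair-RDD transformations such as reduceByKey and filter. The sample lines are made up.

```scala
object PairSketch {
  val lines: List[String] = List("spark is fast", "scala is concise", "spark runs on the jvm")

  // Same shape as the RDD code: key each line by its first word.
  val pairs: List[(String, String)] = lines.map(x => (x.split(" ")(0), x))

  // Local stand-in for a per-key count (on an RDD: map to (k, 1) then reduceByKey(_ + _)).
  val linesPerKey: Map[String, Int] =
    pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.size }

  // Local stand-in for filtering on the value of each pair.
  val shortLines: List[(String, String)] = pairs.filter { case (_, line) => line.length < 15 }
}
```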
[Original] MapReduce Application
- Configuration: conf.addDefaultResource, conf.addResource; overriding configuration: <property> <name>fs.defaultFS</name>…
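Configuration overriding can be modeled in a few lines. This is a pure-Scala sketch of the rule we understand Hadoop's Configuration to follow (resources added later override earlier ones, except properties marked final), not Hadoop code; the Property class and the example values are hypothetical.

```scala
object ConfigSketch {
  // One property definition from a resource; isFinal mirrors <final>true</final>.
  final case class Property(name: String, value: String, isFinal: Boolean = false)

  // Resources are applied in the order they were added. A later definition
  // overrides an earlier one unless the earlier one was marked final.
  def merge(resources: List[List[Property]]): Map[String, String] = {
    val start = (Map.empty[String, String], Set.empty[String])
    val (values, _) = resources.flatten.foldLeft(start) {
      case ((acc, finals), p) if finals.contains(p.name) => (acc, finals)
      case ((acc, finals), p) =>
        (acc + (p.name -> p.value), if (p.isFinal) finals + p.name else finals)
    }
    values
  }
}
```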
[Original] spark - Loading and Saving Data
- File formats: text file. sc.textFile loads a text file; sc.wholeTextFiles loads multiple files as (filename, entire content) pairs; u…
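The difference between textFile and wholeTextFiles can be imitated on a local directory. This is a stand-in using java.nio, not Spark: here the key is just the file name, whereas Spark's wholeTextFiles uses the full path. The method names copy Spark's only for readability.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

object WholeFilesSketch {
  // Like sc.wholeTextFiles: one (file name, entire content) pair per file.
  def wholeTextFiles(dir: Path): Map[String, String] =
    dir.toFile.listFiles().filter(_.isFile).map { f =>
      f.getName -> new String(Files.readAllBytes(f.toPath), StandardCharsets.UTF_8)
    }.toMap

  // Like sc.textFile on a directory: all lines of all files, file names dropped.
  def textFile(dir: Path): List[String] =
    wholeTextFiles(dir).toList.sortBy(_._1).flatMap { case (_, content) =>
      content.linesIterator.toList
    }
}
```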