[Original] MapReduce Workflow

- check output folder; calculate splits; the application master gets progress and completion reports from the tasks. It also requests…

[Original] scala notes (3) - Files & Regular Expression, Trait, Operation and Function

- Files & Regular Expressions: read from a file, URL or string; remember to close the source. val source = Source.fromFile("myfi…
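
The read-and-close pattern the excerpt points at can be sketched as follows (a minimal sketch; `readAll` and the temp-file demo are illustrative names of mine, not from the post):

```scala
import scala.io.Source
import java.nio.file.Files

object FileReadSketch {
  // Read a whole file into a String, making sure the Source is closed.
  def readAll(path: String): String = {
    val source = Source.fromFile(path) // Source.fromURL / Source.fromString work similarly
    try source.mkString
    finally source.close()             // remember to close the source
  }

  def main(args: Array[String]): Unit = {
    val tmp = Files.createTempFile("notes", ".txt")
    Files.write(tmp, "line1\nline2".getBytes("UTF-8"))
    println(readAll(tmp.toString))
  }
}
```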

[Original] MapReduce Features

- Counters (values are definitive only once the job has successfully completed): Task Counters, Filesystem Counters, Job Counters…

[Original] scala notes (5) - pattern and case class

- Pattern and Case Class: ch match { case _ if Character.isDigit(ch) => ... case '+' => ... case _ => ... } prefix match…
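
A runnable sketch of the match expression in the excerpt, plus a toy case-class hierarchy to show why case classes pair well with pattern matching (`classify`, `Expr`, `Num`, `Add` are illustrative names of mine, not from the post):

```scala
object PatternSketch {
  // The excerpt's shape: a guard pattern, a literal pattern, and a catch-all.
  def classify(ch: Char): String = ch match {
    case _ if Character.isDigit(ch) => "digit"
    case '+'                        => "plus"
    case _                          => "other"
  }

  // Case classes get extractors (plus equals/toString) for free,
  // so they deconstruct naturally in match expressions.
  sealed trait Expr
  case class Num(n: Int) extends Expr
  case class Add(l: Expr, r: Expr) extends Expr

  def eval(e: Expr): Int = e match {
    case Num(n)    => n
    case Add(l, r) => eval(l) + eval(r)
  }
}
```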

[Original] scala type parameters

- type bounds: class Pair[T <: Comparable[T]](val first: T, val second: T) { def smaller = if (first.compareTo(second)…
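
The excerpt's upper-bound example can be completed along these lines (the body of `smaller` is a sketch of the cut-off code, not a quote from the post; `Pair2` with a lower bound is my own addition for contrast):

```scala
object TypeBoundSketch {
  // Upper bound: T must be a subtype of Comparable[T], so compareTo is available.
  class Pair[T <: Comparable[T]](val first: T, val second: T) {
    def smaller: T = if (first.compareTo(second) < 0) first else second
  }

  // Lower bound, for contrast: replacing the first element may widen the pair's type.
  class Pair2[T](val first: T, val second: T) {
    def replaceFirst[R >: T](newFirst: R) = new Pair2[R](newFirst, second)
  }
}
```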

[Original] scala notes (6) - Annotation, Future and Type Parameter

- Annotation: class MyContainer[@specialized T]; def country: String @Localized; @Test(timeout = 0, expected = classOf[org.ju…

[Original] HBase Filters, Counters & Coprocessors

- Scan: setCaching(rows), setBatch(cells)
- Filter -> FilterBase; the setFilter(filter) method on Get and Scan
- CompareFilte…

[Original] scala notes (1) - Basic, Control & Function, Array and Map & Tuple

- Basics: val greeting: String = null; val xmax, ymax = 100 // both are set; String -> StringOps // intersect, sorted...; Int ->…
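
The basics in the excerpt can be seen working together in a few lines (a small sketch; the value names are mine):

```scala
object BasicsSketch {
  // A String-typed val may hold null; several vals declared together
  // all receive the value on the right-hand side.
  val greeting: String = null
  val xmax, ymax = 100

  // String is implicitly enriched to StringOps, so collection-style
  // methods like intersect and sorted apply directly.
  val common = "Hello".intersect("World") // shared characters, in order: "lo"
  val sorted = "badc".sorted              // "abcd"
}
```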

[Original] scala notes (7) - Advanced Type and Implicit

- advanced types: singleton type: def setTitle(title: String): this.type = { ...; this } // for subtypes; def set(obj: Titl…
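
What the `this.type` return type buys you is that method chaining keeps working for subtypes, as in this sketch (`Document`/`Book` are illustrative names of mine):

```scala
object SingletonTypeSketch {
  class Document {
    private var title = ""
    // Returning this.type (rather than Document) means that on a Book
    // the call still yields a Book, so subclass methods can follow.
    def setTitle(t: String): this.type = { title = t; this }
    def getTitle: String = title
  }

  class Book extends Document {
    private var chapters = 0
    def addChapter(): this.type = { chapters += 1; this }
    def chapterCount: Int = chapters
  }
}
```

`new Book().setTitle("...").addChapter()` compiles only because `setTitle` returns `this.type`; with return type `Document` the chained `addChapter` would not resolve.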

[Original] MapReduce Types and Formats

- types: map: (K1, V1) → list(K2, V2); combiner: (K2, list(V2)) → list(K2, V2); reduce: (K2, list(V2)) → list(K3, V3)
- partit…
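
The type signatures above can be modeled with plain Scala functions (this is a word-count sketch of the shapes only, not the Hadoop API; `mapFn`, `reduceFn`, `run` are my own names):

```scala
object MapReduceTypesSketch {
  // map: (K1, V1) -> list(K2, V2) — here (byte offset, line) -> (word, 1) pairs
  def mapFn(offset: Long, line: String): List[(String, Int)] =
    line.split("\\s+").filter(_.nonEmpty).map(w => (w, 1)).toList

  // reduce: (K2, list(V2)) -> list(K3, V3) — here (word, counts) -> (word, sum)
  def reduceFn(word: String, counts: List[Int]): List[(String, Int)] =
    List((word, counts.sum))

  // The framework's job: shuffle/group map output by key, then reduce each group.
  def run(lines: List[String]): Map[String, Int] =
    lines.zipWithIndex
      .flatMap { case (l, i) => mapFn(i.toLong, l) }
      .groupBy(_._1)
      .flatMap { case (k, kvs) => reduceFn(k, kvs.map(_._2)) }
}
```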

[Original] Hadoop I/O

- checksum: CRC-32C, computed for every 512 bytes; on write, the last datanode of the pipeline verifies the checksum; on read, block verification…
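
The per-chunk checksumming idea can be sketched with the JDK's `CRC32C` (Java 9+). This illustrates the scheme only and is not HDFS code; `chunkChecksums` and `verify` are my own names:

```scala
import java.util.zip.CRC32C

object ChecksumSketch {
  // One CRC-32C value per 512-byte chunk, mirroring dfs.bytes-per-checksum.
  val BytesPerChecksum = 512

  def chunkChecksums(data: Array[Byte]): Seq[Long] =
    data.grouped(BytesPerChecksum).map { chunk =>
      val crc = new CRC32C()
      crc.update(chunk, 0, chunk.length)
      crc.getValue
    }.toSeq

  // Verification recomputes and compares, as the last datanode in the
  // write pipeline (and the client on read) would.
  def verify(data: Array[Byte], stored: Seq[Long]): Boolean =
    chunkChecksums(data) == stored
}
```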

[Original] HDFS

- suitable for: very large files (terabytes, petabytes); write once and read many times; handling node failure without noticeable in…

[Original] spark - Pair RDD (Key/Value Pairs)

- Create Pair RDD: from a regular RDD by calling the map function. val pairs = lines.map(x => (x.split(" ")(0), x)) transformatio…
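
The excerpt's transformation can be tried on a plain `List` to keep things self-contained without a SparkContext (on a real RDD the lambda is identical; `toPairs` and the local `reduceByKey` analogue are my own names):

```scala
object PairRddSketch {
  // Same map as the excerpt: key = first word of the line, value = whole line.
  def toPairs(lines: List[String]): List[(String, String)] =
    lines.map(x => (x.split(" ")(0), x))

  // Local analogue of reduceByKey: group by key, then fold each group's values.
  def reduceByKey[K, V](pairs: List[(K, V)])(f: (V, V) => V): Map[K, V] =
    pairs.groupBy(_._1).map { case (k, kvs) => (k, kvs.map(_._2).reduce(f)) }
}
```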

[Original] MapReduce Application

- Configuration: conf.addDefaultResource, conf.addResource; configuration overridden: <property> <name>fs.defaultFS</name>…

[Original] spark - Loading and Saving Data

- File Formats: Text Files: sc.textFile loads a text file; sc.wholeTextFiles loads multiple files as (filename, entire content) u…