Spark utils classes

@(spark)[reading]
Brief descriptions, in alphabetical order.

ActorLogReceive

Logs every message an actor receives.
/**                                                                                                                                                                     
 * A trait to enable logging all Akka actor messages. Here's an example of using this:                                                                                  
 *                                                                                                                                                                      
 * {{{                                                                                                                                                                  
 *   class BlockManagerMasterActor extends Actor with ActorLogReceive with Logging {                                                                                    
 *     ...                                                                                                                                                              
 *     override def receiveWithLogging = {                                                                                                                              
 *       case GetLocations(blockId) =>                                                                                                                                  
 *         sender ! getLocations(blockId)                                                                                                                               
 *       ...                                                                                                                                                            
 *     }                                                                                                                                                                
 *     ...                                                                                                                                                              
 *   }                                                                                                                                                                  
 * }}}                                                                                                                                                                  
 *                                                                                                                                                                      
 */                                                                                                                                                                     
private[spark] trait ActorLogReceive {

AkkaUtils

Akka-related helpers, mainly for starting the Akka service.

AsynchronousListenerBus

An asynchronous ListenerBus.

BoundedPriorityQueue

/**                                                                                                                                                                     
 * Bounded priority queue. This class wraps the original PriorityQueue                                                                                                  
 * class and modifies it such that only the top K elements are retained.                                                                                                
 * The top K elements are defined by an implicit Ordering[A].                                                                                                           
 */                                                                                                                                                                     
private[spark] class BoundedPriorityQueue[A](maxSize: Int)(implicit ord: Ordering[A])
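
The "only the top K elements are retained" behaviour can be sketched with a min-heap: once the queue is full, a new element gets in only if it beats the current smallest one. `TopKQueue` and its method names below are hypothetical stand-ins built on `java.util.PriorityQueue`, not Spark's implementation.

```scala
import java.util.{PriorityQueue => JPriorityQueue}
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch: keeps only the `maxSize` largest elements under `ord`.
// Assumes maxSize >= 1.
class TopKQueue[A](maxSize: Int)(implicit ord: Ordering[A]) {
  // Min-heap under `ord`, so the head is the smallest retained element.
  private val heap = new JPriorityQueue[A](maxSize, ord)

  def add(elem: A): this.type = {
    if (heap.size() < maxSize) {
      heap.offer(elem)
    } else if (ord.gt(elem, heap.peek())) {
      // Full: evict the smallest element to make room for a larger one.
      heap.poll()
      heap.offer(elem)
    }
    this
  }

  def size: Int = heap.size()

  // Retained elements, largest first.
  def toDescendingList: List[A] = {
    val buf = ArrayBuffer.empty[A]
    val it = heap.iterator()
    while (it.hasNext()) buf += it.next()
    buf.toList.sorted(ord.reverse)
  }
}
```

Adding an element costs O(log K), and memory stays bounded by K no matter how many elements pass through.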

ByteBufferInputStream

/**                                                                                                                                                                     
 * Reads data from a ByteBuffer, and optionally cleans it up using BlockManager.dispose()                                                                               
 * at the end of the stream (e.g. to close a memory-mapped file).                                                                                                       
 */                                                                                                                                                                     
private[spark]                                                                                                                                                          
class ByteBufferInputStream(private var buffer: ByteBuffer, dispose: Boolean = false)
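
A minimal sketch of the reading side, assuming nothing beyond `java.io.InputStream` and `java.nio.ByteBuffer`. The class name `BufferBackedInputStream` is made up, and the optional dispose step mentioned in the comment above is left out because it depends on BlockManager.

```scala
import java.io.InputStream
import java.nio.ByteBuffer

// Hypothetical sketch: serves bytes from a ByteBuffer until it is drained.
class BufferBackedInputStream(private var buffer: ByteBuffer) extends InputStream {

  // Drops the reference once the buffer is exhausted and signals end of stream.
  private def exhausted: Boolean = {
    if (buffer != null && buffer.remaining() == 0) buffer = null
    buffer == null
  }

  override def read(): Int =
    if (exhausted) -1 else buffer.get() & 0xFF // mask to return 0..255

  override def read(dest: Array[Byte], offset: Int, length: Int): Int = {
    if (exhausted) {
      -1
    } else {
      val amount = math.min(buffer.remaining(), length)
      buffer.get(dest, offset, amount)
      amount
    }
  }
}
```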

Clock

/**                                                                                                                                                                     
 * An interface to represent clocks, so that they can be mocked out in unit tests.                                                                                      
 */                                                                                                                                                                     
private[spark] trait Clock {                                                                                                                                            
  def getTimeMillis(): Long                                                                                                                                             
  def waitTillTime(targetTime: Long): Long                                                                                                                              
}

Defines the Clock interface; SystemClock is implemented in the same file.

ClosureCleaner

Tied to the Scala language itself (closure handling).

CollectionsUtils

Currently contains only binary search.
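
For reference, a generic binary search over a sorted sequence might look like the sketch below. `BinarySearchSketch` is a hypothetical name, and the return convention (a negative value encoding the insertion point when the key is absent) is borrowed from `java.util.Arrays.binarySearch`, not taken from CollectionsUtils.

```scala
object BinarySearchSketch {
  // Returns the index of `key` in the sorted `arr`,
  // or -(insertionPoint) - 1 if `key` is not present.
  def search[K](arr: IndexedSeq[K], key: K)(implicit ord: Ordering[K]): Int = {
    var lo = 0
    var hi = arr.length - 1
    while (lo <= hi) {
      val mid = lo + (hi - lo) / 2 // avoids overflow of (lo + hi)
      val c = ord.compare(arr(mid), key)
      if (c < 0) lo = mid + 1
      else if (c > 0) hi = mid - 1
      else return mid
    }
    -(lo + 1)
  }
}
```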

CompletionIterator

/**                                                                                                                                                                     
 * Wrapper around an iterator which calls a completion method after it successfully iterates                                                                            
 * through all the elements.                                                                                                                                            
 */                                                                                                                                                                     
private[spark]                                                                                                                                                          
// scalastyle:off                                                                                                                                                       
abstract class CompletionIterator[ +A, +I <: Iterator[A]](sub: I) extends Iterator[A] {
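
The idea is easy to sketch: delegate to the wrapped iterator and fire a callback the first time `hasNext` turns false. `CompletingIterator` and the `onComplete` parameter are hypothetical; the real class is abstract and shaped differently.

```scala
// Hypothetical sketch: runs `onComplete` exactly once, when the wrapped
// iterator reports that it has no more elements.
class CompletingIterator[A](sub: Iterator[A], onComplete: () => Unit) extends Iterator[A] {
  private var completed = false

  override def hasNext: Boolean = {
    val more = sub.hasNext
    if (!more && !completed) {
      completed = true
      onComplete()
    }
    more
  }

  override def next(): A = sub.next()
}
```

This is handy for releasing a resource (a file, a buffer) as soon as a consumer has read everything, without the consumer having to know about it.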

Distribution

/**                                                                                                                                                                     
 * Util for getting some stats from a small sample of numeric values, with some handy                                                                                   
 * summary functions.                                                                                                                                                   
 *                                                                                                                                                                      
 * Entirely in memory, not intended as a good way to compute stats over large data sets.                                                                                
 *                                                                                                                                                                      
 * Assumes you are giving it a non-empty set of data                                                                                                                    
 */                                                                                                                                                                     
private[spark] class Distribution(val data: Array[Double], val startIdx: Int, val endIdx: Int) {

EventLoop

/**                                                                                                                                                                     
 * An event loop to receive events from the caller and process all events in the event thread. It                                                                       
 * will start an exclusive event thread to process all events.                                                                                                          
 *                                                                                                                                                                      
 * Note: The event queue will grow indefinitely. So subclasses should make sure `onReceive` can                                                                         
 * handle events in time to avoid the potential OOM.                                                                                                                    
 */                                                                                                                                                                     
private[spark] abstract class EventLoop[E](name: String) extends Logging {

IdGenerator

/**                                                                                                                                                                     
 * A util used to get a unique generation ID. This is a wrapper around Java's                                                                                           
 * AtomicInteger. An example usage is in BlockManager, where each BlockManager                                                                                          
 * instance would start an Akka actor and we use this utility to assign the Akka                                                                                        
 * actors unique names.                                                                                                                                                 
 */                                                                                                                                                                     
private[spark] class IdGenerator {
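
Since the comment says this is just a wrapper around AtomicInteger, a sketch is only a few lines. `SimpleIdGenerator` is a hypothetical name, and starting the sequence at 1 is an assumption.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical sketch: hands out 1, 2, 3, ... safely across threads.
class SimpleIdGenerator {
  private val id = new AtomicInteger(0)

  def next: Int = id.incrementAndGet()
}
```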

IntParam

/**                                                                                                                                                                     
 * An extractor object for parsing strings into integers.                                                                                                               
 */                                                                                                                                                                     
private[spark] object IntParam {

Why does every project roll its own version of something like this?
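
For readers new to extractor objects, here is a small sketch of how one is typically used in pattern matching, for example when parsing command-line arguments. `IntValue`, `ArgsSketch` and the `--port` flag are all hypothetical.

```scala
// Hypothetical extractor: matches strings that parse as an Int.
object IntValue {
  def unapply(str: String): Option[Int] =
    try Some(str.toInt)
    catch { case _: NumberFormatException => None }
}

object ArgsSketch {
  // Returns the value following "--port", if it parses as an integer.
  def port(args: List[String]): Option[Int] = args match {
    case "--port" :: IntValue(p) :: _ => Some(p)
    case _ :: tail => port(tail)
    case Nil => None
  }
}
```

The extractor keeps the parsing failure out of the match arms: a non-numeric value simply fails to match.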

JsonProtocol

Converts between SparkListenerEvent and JSON in both directions. SparkListenerEvent is a trait and can be thought of as the message type.

ListenerBus

/**                                                                                                                                                                     
 * An event bus which posts events to its listeners.                                                                                                                    
 */                                                                                                                                                                     
private[spark] trait ListenerBus[L <: AnyRef, E] extends Logging {

ManualClock

/**                                                                                                                                                                     
 * A `Clock` whose time can be manually set and modified. Its reported time does not change                                                                             
 * as time elapses, but only as its time is modified by callers. This is mainly useful for                                                                              
 * testing.                                                                                                                                                             
 *                                                                                                                                                                      
 * @param time initial time (in milliseconds since the epoch)                                                                                                           
 */                                                                                                                                                                     
private[spark] class ManualClock(private var time: Long) extends Clock {
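
To see why a settable clock helps in tests, here is a sketch where the code under test depends only on a clock trait. All names (`SimpleClock`, `RealClock`, `FakeClock`, `Timeout`) are hypothetical and only loosely modelled on the Clock trait quoted earlier.

```scala
// Hypothetical clock abstraction.
trait SimpleClock {
  def getTimeMillis(): Long
}

// Production implementation backed by the system clock.
class RealClock extends SimpleClock {
  def getTimeMillis(): Long = System.currentTimeMillis()
}

// Test implementation: time moves only when the caller says so.
class FakeClock(private var time: Long) extends SimpleClock {
  def getTimeMillis(): Long = synchronized { time }
  def advance(deltaMs: Long): Unit = synchronized { time += deltaMs }
}

// Code under test: expires `limitMs` after construction.
class Timeout(clock: SimpleClock, limitMs: Long) {
  private val start = clock.getTimeMillis()
  def expired: Boolean = clock.getTimeMillis() - start >= limitMs
}
```

With `FakeClock`, a test can step across the timeout boundary deterministically instead of sleeping.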

MemoryParam

/**                                                                                                                                                                     
 * An extractor object for parsing JVM memory strings, such as "10g", into an Int representing                                                                          
 * the number of megabytes. Supports the same formats as Utils.memoryStringToMb.                                                                                        
 */                                                                                                                                                                     
private[spark] object MemoryParam {
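
A sketch of such an extractor follows. The suffix rules (k, m, g, t as binary multiples, and a bare number meaning bytes) are my assumption about what strings like "10g" mean, and `MemoryMb` is a hypothetical name. Overflow for very large values is not handled.

```scala
// Hypothetical extractor: parses a JVM-style memory string into megabytes.
object MemoryMb {
  def unapply(str: String): Option[Int] = {
    val s = str.trim.toLowerCase
    try {
      if (s.endsWith("k")) Some((s.dropRight(1).toLong / 1024).toInt)
      else if (s.endsWith("m")) Some(s.dropRight(1).toInt)
      else if (s.endsWith("g")) Some(s.dropRight(1).toInt * 1024)
      else if (s.endsWith("t")) Some(s.dropRight(1).toInt * 1024 * 1024)
      else Some((s.toLong / 1024 / 1024).toInt) // no suffix: treated as bytes
    } catch {
      case _: NumberFormatException => None
    }
  }
}
```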

MetadataCleaner

/**                                                                                                                                                                     
 * Runs a timer task to periodically clean up metadata (e.g. old files or hashtable entries)                                                                            
 */                                                                                                                                                                     
private[spark] class MetadataCleaner(                                                                                                                                   
    cleanerType: MetadataCleanerType.MetadataCleanerType,                                                                                                               
    cleanupFunc: (Long) => Unit,                                                                                                                                        
    conf: SparkConf)

MutablePair

A pair.

/**                                                                                                                                                                     
 * :: DeveloperApi ::                                                                                                                                                   
 * A tuple of 2 elements. This can be used as an alternative to Scala's Tuple2 when we want to                                                                          
 * minimize object allocation.                                                                                                                                          
 *                                                                                                                                                                      
 * @param  _1   Element 1 of this MutablePair                                                                                                                           
 * @param  _2   Element 2 of this MutablePair                                                                                                                           
 */                                                                                                                                                                     
@DeveloperApi                                                                                                                                                           
case class MutablePair[@specialized(Int, Long, Double, Char, Boolean/* , AnyRef */) T1,                                                                                 
                       @specialized(Int, Long, Double, Char, Boolean/* , AnyRef */) T2]

MutableURLClassLoader

/**                                                                                                                                                                     
 * URL class loader that exposes the `addURL` and `getURLs` methods in URLClassLoader.                                                                                  
 */                                                                                                                                                                     
private[spark] class MutableURLClassLoader(urls: Array[URL], parent: ClassLoader)

NextIterator

/** Provides a basic/boilerplate Iterator implementation. */                                                                                                            
private[spark] abstract class NextIterator[U] extends Iterator[U] {
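
The boilerplate being factored out is the usual look-ahead dance: fetch one element in `hasNext`, hand it over in `next`, and clean up once the source is exhausted. The sketch below uses hypothetical names (`LookaheadIterator`, `fetch`, `LineIterator`) and an `Option`-based contract that is my own simplification.

```scala
import java.io.{BufferedReader, StringReader}

// Hypothetical sketch of a look-ahead iterator base class.
abstract class LookaheadIterator[U] extends Iterator[U] {
  private var buffered: Option[U] = None
  private var finished = false

  // Subclasses return Some(element), or None when the source is exhausted.
  protected def fetch(): Option[U]

  // Called once, right after the source is found to be exhausted.
  protected def close(): Unit

  override def hasNext: Boolean = {
    if (!finished && buffered.isEmpty) {
      buffered = fetch()
      if (buffered.isEmpty) {
        finished = true
        close()
      }
    }
    !finished
  }

  override def next(): U = {
    if (!hasNext) throw new NoSuchElementException("End of stream")
    val value = buffered.get
    buffered = None
    value
  }
}

// Example subclass: iterates over the lines of a reader and closes it at the end.
class LineIterator(reader: BufferedReader) extends LookaheadIterator[String] {
  var closed = false
  protected def fetch(): Option[String] = Option(reader.readLine())
  protected def close(): Unit = { reader.close(); closed = true }
}
```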

ParentClassLoader

/**                                                                                                                                                                     
 * A class loader which makes some protected methods in ClassLoader accessible.
 */                                                                                                                                                                     
private[spark] class ParentClassLoader(parent: ClassLoader) extends ClassLoader(parent) {

SerializableBuffer

/**                                                                                                                                                                     
 * A wrapper around a java.nio.ByteBuffer that is serializable through Java serialization, to make                                                                      
 * it easier to pass ByteBuffers in case class messages.                                                                                                                
 */                                                                                                                                                                     
private[spark]                                                                                                                                                          
class SerializableBuffer(@transient var buffer: ByteBuffer) extends Serializable {

SizeEstimator

/**                                                                                                                                                                     
 * Estimates the sizes of Java objects (number of bytes of memory they occupy), for use in                                                                              
 * memory-aware caches.                                                                                                                                                 
 *                                                                                                                                                                      
 * Based on the following JavaWorld article:                                                                                                                            
 * http://www.javaworld.com/javaworld/javaqa/2003-12/02-qa-1226-sizeof.html                                                                                             
 */                                                                                                                                                                     
private[spark] object SizeEstimator extends Logging {

SignalLogger

/**                                                                                                                                                                     
 * Used to log signals received. This can be very useful in debugging crashes or kills.                                                                                 
 *                                                                                                                                                                      
 * Inspired by Colin Patrick McCabe's similar class from Hadoop.                                                                                                        
 */                                                                                                                                                                     
private[spark] object SignalLogger {

SparkExitCode

private[spark] object SparkExitCode {                                                                                                                                   
  /** The default uncaught exception handler was reached. */                                                                                                            
  val UNCAUGHT_EXCEPTION = 50                                                                                                                                           

  /** The default uncaught exception handler was called and an exception was encountered while                                                                          
      logging the exception. */                                                                                                                                         
  val UNCAUGHT_EXCEPTION_TWICE = 51                                                                                                                                     

  /** The default uncaught exception handler was reached, and the uncaught exception was an                                                                             
      OutOfMemoryError. */                                                                                                                                              
  val OOM = 52                                                                                                                                                          

}

StatCounter

/**                                                                                                                                                                     
 * A class for tracking the statistics of a set of numbers (count, mean and variance) in a                                                                              
 * numerically robust way. Includes support for merging two StatCounters. Based on Welford                                                                              
 * and Chan's [[http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance algorithms]]                                                                           
 * for running variance.                                                                                                                                                
 *                                                                                                                                                                      
 * @constructor Initialize the StatCounter with the given values.                                                                                                       
 */                                                                                                                                                                     
class StatCounter(values: TraversableOnce[Double]) extends Serializable {
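
As a rough illustration of the two algorithms named in the comment: Welford's update handles one value at a time, and Chan's formula merges two partial results. `RunningStats` is a hypothetical class, and reporting the population variance (dividing by n) is a choice made for this sketch.

```scala
// Hypothetical sketch: running count, mean and variance with merge support.
class RunningStats extends Serializable {
  private var n: Long = 0L
  private var mu: Double = 0.0
  private var m2: Double = 0.0 // sum of squared deviations from the mean

  // Welford's single-value update.
  def add(x: Double): this.type = {
    n += 1
    val delta = x - mu
    mu += delta / n
    m2 += delta * (x - mu)
    this
  }

  // Chan's pairwise merge of two partial results.
  def merge(other: RunningStats): this.type = {
    if (other.n > 0) {
      if (n == 0) {
        n = other.n; mu = other.mu; m2 = other.m2
      } else {
        val delta = other.mu - mu
        val total = n + other.n
        mu += delta * other.n / total
        m2 += other.m2 + delta * delta * n * other.n / total
        n = total
      }
    }
    this
  }

  def count: Long = n
  def mean: Double = mu
  def variance: Double = if (n == 0) Double.NaN else m2 / n
}
```

The merge step is what makes this usable in a distributed setting: each partition computes its own stats, and the partial results are combined without revisiting the data.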

TimeStampedHashMap

/**                                                                                                                                                                     
 * This is a custom implementation of scala.collection.mutable.Map which stores the insertion                                                                           
 * timestamp along with each key-value pair. If specified, the timestamp of each pair can be                                                                            
 * updated every time it is accessed. Key-value pairs whose timestamp are older than a particular                                                                       
 * threshold time can then be removed using the clearOldValues method. This is intended to                                                                              
 * be a drop-in replacement of scala.collection.mutable.HashMap.                                                                                                        
 *                                                                                                                                                                      
 * @param updateTimeStampOnGet Whether timestamp of a pair will be updated when it is accessed                                                                          
 */                                                                                                                                                                     
private[spark] class TimeStampedHashMap[A, B](updateTimeStampOnGet: Boolean = false)
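
A stripped-down sketch of the idea: store a timestamp next to each value, optionally refresh it on read, and sweep out anything older than a threshold. `TimestampedMap` is a hypothetical name, the `now` parameter is an injected time source added here so the behaviour can be tested, and unlike a real drop-in replacement this sketch is not thread-safe and does not implement the full Map interface.

```scala
import scala.collection.mutable

// Hypothetical sketch: a map that remembers when each entry was last written
// (or last read, if updateTimeStampOnGet is set).
class TimestampedMap[A, B](
    updateTimeStampOnGet: Boolean = false,
    now: () => Long = () => System.currentTimeMillis()) {

  private val entries = mutable.HashMap.empty[A, (B, Long)]

  def put(key: A, value: B): Unit = entries.update(key, (value, now()))

  def get(key: A): Option[B] = entries.get(key).map { case (value, _) =>
    if (updateTimeStampOnGet) entries.update(key, (value, now()))
    value
  }

  // Removes every entry whose timestamp is older than `threshTime`.
  def clearOldValues(threshTime: Long): Unit = {
    val stale = entries.iterator.collect { case (k, (_, ts)) if ts < threshTime => k }.toList
    stale.foreach(k => entries.remove(k))
  }

  def size: Int = entries.size
}
```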

Utils

Various utility methods used by Spark.
Truly a grab bag.

logging

Offers two modes: file append and rolling.

io

Contains a single class: ByteArrayChunkOutputStream

/**                                                                                                                                                                     
 * An OutputStream that writes to fixed-size chunks of byte arrays.                                                                                                     
 *                                                                                                                                                                      
 * @param chunkSize size of each chunk, in bytes.                                                                                                                       
 */                                                                                                                                                                     
private[spark]                                                                                                                                                          
class ByteArrayChunkOutputStream(chunkSize: Int) extends OutputStream {
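
Writing into fixed-size chunks avoids the copy-and-grow cost of one large contiguous array. A minimal sketch, with the hypothetical names `ChunkedOutputStream` and `toArrays`:

```scala
import java.io.OutputStream
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch: appends bytes into a list of fixed-size byte arrays.
class ChunkedOutputStream(chunkSize: Int) extends OutputStream {
  require(chunkSize > 0, "chunkSize must be positive")

  private val chunks = new ArrayBuffer[Array[Byte]]
  // Write position inside the last chunk; chunkSize means "last chunk is full".
  private var position = chunkSize

  override def write(b: Int): Unit = {
    if (position == chunkSize) {
      chunks += new Array[Byte](chunkSize)
      position = 0
    }
    chunks.last(position) = b.toByte
    position += 1
  }

  // Returns the chunks in order; the last one is trimmed to the bytes written.
  def toArrays: Array[Array[Byte]] = {
    if (chunks.isEmpty) Array.empty[Array[Byte]]
    else (chunks.init :+ chunks.last.take(position)).toArray
  }
}
```

A production version would override the bulk `write(bytes, off, len)` as well; here the inherited one simply calls `write(b)` per byte.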

random

Provides various random-number utilities; I have not looked at them closely.

blesslyy
