Spark-utils classes
@(spark)[reading]
Brief descriptions, in alphabetical order.
ActorLogReceive
Logs all Actor messages.
/**
* A trait to enable logging all Akka actor messages. Here's an example of using this:
*
* {{{
* class BlockManagerMasterActor extends Actor with ActorLogReceive with Logging {
* ...
* override def receiveWithLogging = {
* case GetLocations(blockId) =>
* sender ! getLocations(blockId)
* ...
* }
* ...
* }
* }}}
*
*/
private[spark] trait ActorLogReceive {
AkkaUtils
Akka-related helpers, mainly for starting the Akka service.
AsynchronousListenerBus
An asynchronous ListenerBus.
BoundedPriorityQueue
/**
* Bounded priority queue. This class wraps the original PriorityQueue
* class and modifies it such that only the top K elements are retained.
* The top K elements are defined by an implicit Ordering[A].
*/
private[spark] class BoundedPriorityQueue[A](maxSize: Int)(implicit ord: Ordering[A])
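The "retain only the top K" behaviour can be sketched with a min-heap: once the queue is full, a new element is admitted only if it beats the smallest retained one. This is a minimal sketch under assumptions, not Spark's implementation; the class name `TopKQueue` and the helper `toSortedList` are hypothetical.

```scala
import java.util.{PriorityQueue => JPriorityQueue}

// Hypothetical sketch, not Spark's class: keeps only the maxSize largest elements.
class TopKQueue[A](maxSize: Int)(implicit ord: Ordering[A]) {
  require(maxSize > 0, "maxSize must be positive")

  // java.util.PriorityQueue is a min-heap under the given comparator,
  // so peek() returns the smallest element currently retained.
  private val underlying = new JPriorityQueue[A](maxSize, ord)

  def size: Int = underlying.size()

  def add(elem: A): this.type = {
    if (underlying.size() < maxSize) {
      underlying.offer(elem)
    } else if (ord.gt(elem, underlying.peek())) {
      // The new element beats the current minimum, so evict the minimum.
      underlying.poll()
      underlying.offer(elem)
    }
    this
  }

  // Retained elements in descending order.
  def toSortedList: List[A] = {
    val it = underlying.iterator()
    var out = List.empty[A]
    while (it.hasNext) out = it.next() :: out
    out.sorted(ord.reverse)
  }
}
```

Each insertion costs O(log K), and memory stays bounded by K regardless of how many elements are offered.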
ByteBufferInputStream
/**
* Reads data from a ByteBuffer, and optionally cleans it up using BlockManager.dispose()
* at the end of the stream (e.g. to close a memory-mapped file).
*/
private[spark]
class ByteBufferInputStream(private var buffer: ByteBuffer, dispose: Boolean = false)
Clock
/**
* An interface to represent clocks, so that they can be mocked out in unit tests.
*/
private[spark] trait Clock {
def getTimeMillis(): Long
def waitTillTime(targetTime: Long): Long
}
Provides a Clock interface; SystemClock is implemented in the same file.
ClosureCleaner
Tied to the language itself (Scala closures).
CollectionsUtils
Currently contains only binary search.
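For reference, a generic binary search over a sorted array might look like the following. This is a hand-written sketch, not the code in CollectionsUtils; the object name `BinarySearchSketch` is hypothetical, and the return convention for a missing key is borrowed from `java.util.Arrays.binarySearch`.

```scala
// Hypothetical sketch, not Spark's CollectionsUtils.
object BinarySearchSketch {
  // Returns the index of `key` in the sorted array, or -(insertionPoint) - 1 if absent,
  // mirroring the contract of java.util.Arrays.binarySearch.
  def search[K](sorted: Array[K], key: K)(implicit ord: Ordering[K]): Int = {
    var low = 0
    var high = sorted.length - 1
    while (low <= high) {
      val mid = (low + high) >>> 1 // unsigned shift avoids overflow on huge arrays
      val cmp = ord.compare(sorted(mid), key)
      if (cmp < 0) low = mid + 1
      else if (cmp > 0) high = mid - 1
      else return mid
    }
    -(low + 1)
  }
}
```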
CompletionIterator
/**
* Wrapper around an iterator which calls a completion method after it successfully iterates
* through all the elements.
*/
private[spark]
// scalastyle:off
abstract class CompletionIterator[ +A, +I <: Iterator[A]](sub: I) extends Iterator[A] {
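The idea of a completion callback can be sketched as a thin wrapper that fires once, the first time `hasNext` reports exhaustion. This is a simplified sketch: the class name `CompletingIterator` and the function-valued `completion` parameter are assumptions, whereas the real class is abstract and has a different signature.

```scala
// Hypothetical sketch: runs `completion` exactly once, when hasNext first returns false.
class CompletingIterator[A](sub: Iterator[A], completion: () => Unit) extends Iterator[A] {
  private var completed = false

  override def next(): A = sub.next()

  override def hasNext: Boolean = {
    val more = sub.hasNext
    if (!more && !completed) {
      completed = true
      completion()
    }
    more
  }
}
```

A typical use would be releasing a resource (a file handle, a lock) once a consumer has drained the iterator. Note that the callback never fires if the consumer abandons the iterator early.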
Distribution
/**
* Util for getting some stats from a small sample of numeric values, with some handy
* summary functions.
*
* Entirely in memory, not intended as a good way to compute stats over large data sets.
*
* Assumes you are giving it a non-empty set of data
*/
private[spark] class Distribution(val data: Array[Double], val startIdx: Int, val endIdx: Int) {
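A rough sketch of in-memory summary statistics over a small sample follows. The class name `SketchDistribution`, the `quantile` method, and its index rule are assumptions for illustration and may differ from what Distribution actually does.

```scala
// Hypothetical sketch, not Spark's Distribution: sorts a small sample once and
// answers quantile queries by index lookup.
class SketchDistribution(data: Array[Double]) {
  require(data.nonEmpty, "data must be non-empty")

  private val sorted = data.sorted

  // Picks the element at position floor(p * n), clamped to the last index.
  def quantile(p: Double): Double = {
    require(p >= 0.0 && p <= 1.0, "p must be in [0, 1]")
    val idx = math.min((p * sorted.length).toInt, sorted.length - 1)
    sorted(idx)
  }

  def mean: Double = sorted.sum / sorted.length
}
```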
EventLoop
/**
* An event loop to receive events from the caller and process all events in the event thread. It
* will start an exclusive event thread to process all events.
*
* Note: The event queue will grow indefinitely. So subclasses should make sure `onReceive` can
* handle events in time to avoid the potential OOM.
*/
private[spark] abstract class EventLoop[E](name: String) extends Logging {
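The pattern described in the comment, one dedicated thread draining an unbounded queue, can be sketched as follows. This is a minimal sketch under assumptions: the class name `SketchEventLoop` is hypothetical, and the method names `start`, `stop`, `post`, and `onError` are illustrative choices (only `onReceive` appears in the excerpt above).

```scala
import java.util.concurrent.LinkedBlockingDeque
import java.util.concurrent.atomic.AtomicBoolean
import scala.util.control.NonFatal

// Hypothetical sketch, not Spark's class.
abstract class SketchEventLoop[E](name: String) {
  // Unbounded queue: it grows without limit if onReceive cannot keep up.
  private val eventQueue = new LinkedBlockingDeque[E]()
  private val stopped = new AtomicBoolean(false)

  private val eventThread = new Thread(name) {
    setDaemon(true)
    override def run(): Unit = {
      try {
        while (!stopped.get()) {
          val event = eventQueue.take()
          try {
            onReceive(event)
          } catch {
            case NonFatal(e) => onError(e)
          }
        }
      } catch {
        case _: InterruptedException => // stop() interrupted a blocked take(); exit quietly
      }
    }
  }

  def start(): Unit = eventThread.start()

  // Must not be called from the event thread itself, since it joins that thread.
  def stop(): Unit = {
    if (stopped.compareAndSet(false, true)) {
      eventThread.interrupt()
      eventThread.join()
    }
  }

  def post(event: E): Unit = eventQueue.put(event)

  protected def onReceive(event: E): Unit
  protected def onError(e: Throwable): Unit
}
```

Because all events are handled on one thread, `onReceive` needs no locking of its own state, at the cost of a slow handler delaying every later event.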
IdGenerator
/**
* A util used to get a unique generation ID. This is a wrapper around Java's
* AtomicInteger. An example usage is in BlockManager, where each BlockManager
* instance would start an Akka actor and we use this utility to assign the Akka
* actors unique names.
*/
private[spark] class IdGenerator {
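A wrapper of this kind is only a few lines. The sketch below assumes a method called `next` and a class name `SketchIdGenerator`, both chosen for illustration.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical sketch: hands out 1, 2, 3, ... safely across threads.
class SketchIdGenerator {
  private val id = new AtomicInteger

  def next: Int = id.incrementAndGet()
}
```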
IntParam
/**
* An extractor object for parsing strings into integers.
*/
private[spark] object IntParam {
Why does every project roll its own version of something like this?
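An extractor object is an object with an `unapply` method, which lets a string be parsed inside a pattern match. A minimal sketch follows; the names `IntString` and `PortParser` are hypothetical.

```scala
// Hypothetical sketch of an integer extractor.
object IntString {
  def unapply(str: String): Option[Int] = {
    try {
      Some(str.toInt)
    } catch {
      case _: NumberFormatException => None
    }
  }
}

// Hypothetical usage: parse a command-line argument, falling back to a default.
object PortParser {
  def parse(arg: String, default: Int): Int = arg match {
    case IntString(port) => port
    case _ => default
  }
}
```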
JsonProtocol
Converts between SparkListenerEvent and JSON in both directions. SparkListenerEvent is a trait and can be thought of as the message type.
ListenerBus
/**
* An event bus which posts events to its listeners.
*/
private[spark] trait ListenerBus[L <: AnyRef, E] extends Logging {
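The shape of such a bus can be sketched as a listener list plus a dispatch hook that subclasses fill in. This is a sketch under assumptions: the names `SketchListenerBus`, `addListener`, `postToAll`, `onPostEvent`, and `GreetingListener` are illustrative and not confirmed by the excerpt above.

```scala
import java.util.concurrent.CopyOnWriteArrayList
import scala.util.control.NonFatal

// Hypothetical sketch, not Spark's trait.
trait SketchListenerBus[L <: AnyRef, E] {
  // Copy-on-write list: posting iterates over a stable snapshot while listeners are added.
  private val listeners = new CopyOnWriteArrayList[L]()

  def addListener(listener: L): Unit = {
    listeners.add(listener)
  }

  // Delivers the event to every listener; one failing listener does not block the rest.
  def postToAll(event: E): Unit = {
    val iter = listeners.iterator()
    while (iter.hasNext) {
      val listener = iter.next()
      try {
        onPostEvent(listener, event)
      } catch {
        case NonFatal(e) =>
          System.err.println(s"Listener ${listener.getClass.getName} threw: $e")
      }
    }
  }

  // Subclasses decide which listener callback an event maps to.
  def onPostEvent(listener: L, event: E): Unit
}

// Hypothetical listener type used only for demonstration.
trait GreetingListener {
  def onGreeting(message: String): Unit
}
```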
ManualClock
/**
* A `Clock` whose time can be manually set and modified. Its reported time does not change
* as time elapses, but only as its time is modified by callers. This is mainly useful for
* testing.
*
* @param time initial time (in milliseconds since the epoch)
*/
private[spark] class ManualClock(private var time: Long) extends Clock {
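A manually driven clock can be sketched against the `Clock` trait quoted earlier. The trait is repeated here so the block is self-contained; the class name `FakeClock` and the methods `setTime` and `advance` are assumptions for illustration.

```scala
// Same shape as the Clock trait quoted above, repeated so this block compiles alone.
trait Clock {
  def getTimeMillis(): Long
  def waitTillTime(targetTime: Long): Long
}

// Hypothetical sketch: time moves only when a caller moves it.
class FakeClock(private var time: Long) extends Clock {
  override def getTimeMillis(): Long = synchronized { time }

  def setTime(timeToSet: Long): Unit = synchronized {
    time = timeToSet
    notifyAll() // wake threads blocked in waitTillTime
  }

  def advance(timeToAdd: Long): Unit = synchronized {
    time += timeToAdd
    notifyAll()
  }

  // Blocks until another thread moves the clock to at least targetTime.
  override def waitTillTime(targetTime: Long): Long = synchronized {
    while (time < targetTime) {
      wait(100)
    }
    time
  }
}
```

In a test, code under test receives the `Clock` and the test calls `advance` to trigger timeouts deterministically instead of sleeping.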
MemoryParam
/**
* An extractor object for parsing JVM memory strings, such as "10g", into an Int representing
* the number of megabytes. Supports the same formats as Utils.memoryStringToMb.
*/
private[spark] object MemoryParam {
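A sketch of such an extractor follows. The suffix handling (`k`, `m`, `g`, `t`, and a bare number treated as bytes) is an assumption about what "the same formats as Utils.memoryStringToMb" means, and the name `MemoryString` is hypothetical.

```scala
import java.util.Locale

// Hypothetical sketch of a memory-string extractor yielding megabytes.
object MemoryString {
  // Assumed suffix rules: k = kilobytes, m = megabytes, g = gigabytes, t = terabytes,
  // no suffix = bytes. Large values can overflow Int; this sketch does not guard against it.
  def toMb(str: String): Int = {
    val lower = str.trim.toLowerCase(Locale.ROOT)
    require(lower.nonEmpty, "empty memory string")
    val number = lower.dropRight(1)
    lower.last match {
      case 'k' => (number.toLong / 1024).toInt
      case 'm' => number.toInt
      case 'g' => number.toInt * 1024
      case 't' => number.toInt * 1024 * 1024
      case _ => (lower.toLong / 1024 / 1024).toInt
    }
  }

  def unapply(str: String): Option[Int] = {
    try {
      Some(toMb(str))
    } catch {
      // NumberFormatException is a subclass of IllegalArgumentException.
      case _: IllegalArgumentException => None
    }
  }
}
```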
MetadataCleaner
/**
* Runs a timer task to periodically clean up metadata (e.g. old files or hashtable entries)
*/
private[spark] class MetadataCleaner(
cleanerType: MetadataCleanerType.MetadataCleanerType,
cleanupFunc: (Long) => Unit,
conf: SparkConf)
MutablePair
It is a pair.
/**
* :: DeveloperApi ::
* A tuple of 2 elements. This can be used as an alternative to Scala's Tuple2 when we want to
* minimize object allocation.
*
* @param _1 Element 1 of this MutablePair
* @param _2 Element 2 of this MutablePair
*/
@DeveloperApi
case class MutablePair[@specialized(Int, Long, Double, Char, Boolean/* , AnyRef */) T1,
@specialized(Int, Long, Double, Char, Boolean/* , AnyRef */) T2]
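The allocation-saving idea is that one pair object is overwritten in place rather than creating a new tuple per record. A simplified sketch, without `@specialized` and with the hypothetical names `ReusablePair`, `first`, and `second`:

```scala
// Hypothetical sketch: a pair whose fields are overwritten instead of reallocated.
final class ReusablePair[A, B](var first: A, var second: B) {
  // Overwrites both fields and returns the same instance.
  def update(a: A, b: B): this.type = {
    first = a
    second = b
    this
  }

  override def toString: String = s"($first,$second)"
}
```

The trade-off is aliasing: anyone holding a reference sees the new values after `update`, so the pair must be consumed or copied before it is reused.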
MutableURLClassLoader
/**
* URL class loader that exposes the `addURL` and `getURLs` methods in URLClassLoader.
*/
private[spark] class MutableURLClassLoader(urls: Array[URL], parent: ClassLoader)
NextIterator
/** Provides a basic/boilerplate Iterator implementation. */
private[spark] abstract class NextIterator[U] extends Iterator[U] {
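The boilerplate in question is the one-element lookahead needed when a source can only say "give me the next item or tell me you are done". A sketch under assumptions follows; the names `LookaheadIterator`, `getNext`, `finished`, `close`, and `closeIfNeeded` are illustrative.

```scala
// Hypothetical sketch: subclasses implement getNext() and set `finished` when exhausted.
abstract class LookaheadIterator[U] extends Iterator[U] {
  private var gotNext = false
  private var nextValue: Option[U] = None
  private var closed = false
  protected var finished = false

  // Returns the next element, or sets `finished = true` (the return value is then ignored).
  protected def getNext(): U

  // Releases underlying resources; called at most once.
  protected def close(): Unit

  def closeIfNeeded(): Unit = {
    if (!closed) {
      closed = true
      close()
    }
  }

  override def hasNext: Boolean = {
    if (!finished && !gotNext) {
      nextValue = Some(getNext())
      if (finished) closeIfNeeded()
      gotNext = true
    }
    !finished
  }

  override def next(): U = {
    if (!hasNext) throw new NoSuchElementException("End of stream")
    gotNext = false
    nextValue.get
  }
}
```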
ParentClassLoader
/**
* A class loader which makes some protected methods in ClassLoader accessible.
*/
private[spark] class ParentClassLoader(parent: ClassLoader) extends ClassLoader(parent) {
SerializableBuffer
/**
* A wrapper around a java.nio.ByteBuffer that is serializable through Java serialization, to make
* it easier to pass ByteBuffers in case class messages.
*/
private[spark]
class SerializableBuffer(@transient var buffer: ByteBuffer) extends Serializable {
SizeEstimator
/**
* Estimates the sizes of Java objects (number of bytes of memory they occupy), for use in
* memory-aware caches.
*
* Based on the following JavaWorld article:
* http://www.javaworld.com/javaworld/javaqa/2003-12/02-qa-1226-sizeof.html
*/
private[spark] object SizeEstimator extends Logging {
SignalLogger
/**
* Used to log signals received. This can be very useful in debugging crashes or kills.
*
* Inspired by Colin Patrick McCabe's similar class from Hadoop.
*/
private[spark] object SignalLogger {
SparkExitCode
private[spark] object SparkExitCode {
/** The default uncaught exception handler was reached. */
val UNCAUGHT_EXCEPTION = 50
/** The default uncaught exception handler was called and an exception was encountered while
logging the exception. */
val UNCAUGHT_EXCEPTION_TWICE = 51
/** The default uncaught exception handler was reached, and the uncaught exception was an
OutOfMemoryError. */
val OOM = 52
}
StatCounter
/**
* A class for tracking the statistics of a set of numbers (count, mean and variance) in a
* numerically robust way. Includes support for merging two StatCounters. Based on Welford
* and Chan's [[http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance algorithms]]
* for running variance.
*
* @constructor Initialize the StatCounter with the given values.
*/
class StatCounter(values: TraversableOnce[Double]) extends Serializable {
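The two algorithms named in the comment can be sketched together: Welford's update folds in one value at a time, and Chan's formula merges two partial summaries. This is a minimal sketch, not Spark's code; the class name `SketchStatCounter` and its method names are assumptions, and `variance` here is the population variance.

```scala
// Hypothetical sketch of a running count/mean/variance tracker.
class SketchStatCounter extends Serializable {
  private var n: Long = 0L
  private var mu: Double = 0.0
  private var m2: Double = 0.0 // running sum of squared deviations from the mean

  // Welford's single-value update.
  def add(x: Double): this.type = {
    n += 1
    val delta = x - mu
    mu += delta / n
    m2 += delta * (x - mu)
    this
  }

  // Chan's parallel merge of two partial summaries.
  def merge(other: SketchStatCounter): this.type = {
    if (other.n > 0) {
      if (n == 0) {
        n = other.n
        mu = other.mu
        m2 = other.m2
      } else {
        val delta = other.mu - mu
        val total = n + other.n
        mu += delta * other.n / total
        m2 += other.m2 + delta * delta * n * other.n / total
        n = total
      }
    }
    this
  }

  def count: Long = n
  def mean: Double = mu
  def variance: Double = if (n == 0) Double.NaN else m2 / n
}
```

Mergeability is what makes this useful in a distributed setting: each partition builds its own summary, and the summaries are then combined without revisiting the data.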
TimeStampedHashMap
/**
* This is a custom implementation of scala.collection.mutable.Map which stores the insertion
* timestamp along with each key-value pair. If specified, the timestamp of each pair can be
* updated every time it is accessed. Key-value pairs whose timestamps are older than a particular
* threshold time can then be removed using the clearOldValues method. This is intended to
* be a drop-in replacement of scala.collection.mutable.HashMap.
*
* @param updateTimeStampOnGet Whether timestamp of a pair will be updated when it is accessed
*/
private[spark] class TimeStampedHashMap[A, B](updateTimeStampOnGet: Boolean = false)
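The mechanism can be sketched by storing a `(value, timestamp)` tuple per key. This sketch does not implement the full `mutable.Map` interface; the class name `TimestampedMap` and the injectable `now` parameter (added to make the example testable) are assumptions.

```scala
import scala.collection.mutable

// Hypothetical sketch: each value is stored with the time it was last written (or read).
class TimestampedMap[K, V](
    updateTimeStampOnGet: Boolean = false,
    now: () => Long = () => System.currentTimeMillis()) {

  private val entries = mutable.HashMap.empty[K, (V, Long)]

  def put(key: K, value: V): Unit = {
    entries.update(key, (value, now()))
  }

  def get(key: K): Option[V] = {
    entries.get(key).map { case (value, _) =>
      // Optionally refresh the timestamp so frequently read entries survive cleanup.
      if (updateTimeStampOnGet) entries.update(key, (value, now()))
      value
    }
  }

  def size: Int = entries.size

  // Removes every entry whose timestamp is strictly older than threshTime.
  def clearOldValues(threshTime: Long): Unit = {
    val stale = entries.iterator.collect { case (k, (_, ts)) if ts < threshTime => k }.toList
    stale.foreach(k => entries.remove(k))
  }
}
```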
Utils
Various utility methods used by Spark.
It really is a grab bag.
logging
Provides two mechanisms: file appending (fileappend) and rolling (roll).
io
Contains only one class: ByteArrayChunkOutputStream
/**
* An OutputStream that writes to fixed-size chunks of byte arrays.
*
* @param chunkSize size of each chunk, in bytes.
*/
private[spark]
class ByteArrayChunkOutputStream(chunkSize: Int) extends OutputStream {
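Writing into fixed-size chunks avoids the repeated copy-and-grow of one large contiguous array. A minimal sketch follows; the class name `ChunkedOutputStream` and the accessor `toArrays` are assumptions, and only the single-byte `write` is overridden for brevity.

```scala
import java.io.OutputStream
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch, not Spark's class.
class ChunkedOutputStream(chunkSize: Int) extends OutputStream {
  require(chunkSize > 0, "chunkSize must be positive")

  private val chunks = new ArrayBuffer[Array[Byte]]
  // Write offset within the last chunk; starting at chunkSize forces
  // allocation of the first chunk on the first write.
  private var position = chunkSize

  override def write(b: Int): Unit = {
    if (position == chunkSize) {
      chunks += new Array[Byte](chunkSize)
      position = 0
    }
    chunks.last(position) = b.toByte
    position += 1
  }

  // The inherited write(Array[Byte]) overloads delegate to write(Int) byte by byte.

  // Returns the chunks written so far, with the last one trimmed to its used length.
  def toArrays: Array[Array[Byte]] = {
    chunks.zipWithIndex.map { case (chunk, i) =>
      if (i == chunks.length - 1) java.util.Arrays.copyOf(chunk, position) else chunk
    }.toArray
  }
}
```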
random
Provides various random-related methods; I have not looked at it closely.