

In short, this part does the job of the optimizer. There is a paper on it that is very clearly written and can be read as the high-level design.


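1. parse (turn the SQL statement into a valid syntax tree)
2. resolve (verify that the columns, tables and so on actually exist, and bind the schema of each table and column to its concrete name)
3. Generate the concrete logical plan. For details see catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala; typical operators are filter, project, sort, union and so on.
4. This is a rule-based optimizer. The code is in catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala

1. In principle, Catalyst is not necessarily tied to Spark; it can be viewed as a SQL optimizer in its own right.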
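
To make step 4 more concrete, here is a minimal sketch of what "rule-based" means: each rule is a function from plan to plan, and the optimizer keeps applying the rules until the plan stops changing. The names `Plan`, `Scan`, `ToyOptimizer`, `combineFilters` and `collapseProject` are hypothetical stand-ins invented for this example, not the classes in Optimizer.scala.

```scala
// Toy logical plan nodes (hypothetical, not Catalyst's real operators).
sealed trait Plan
case class Scan(table: String) extends Plan
case class Filter(condition: String, child: Plan) extends Plan
case class Project(columns: Seq[String], child: Plan) extends Plan

object ToyOptimizer {
  type Rule = Plan => Plan

  // Rewrite the children first, then try the rule on the rebuilt node.
  def transformUp(plan: Plan)(rule: PartialFunction[Plan, Plan]): Plan = {
    val withNewChildren: Plan = plan match {
      case Filter(c, child)   => Filter(c, transformUp(child)(rule))
      case Project(cs, child) => Project(cs, transformUp(child)(rule))
      case s: Scan            => s
    }
    rule.applyOrElse(withNewChildren, identity[Plan])
  }

  // Rule 1: merge two adjacent filters into one conjunctive filter.
  val combineFilters: Rule = plan => transformUp(plan) {
    case Filter(outer, Filter(inner, child)) => Filter(s"($inner) AND ($outer)", child)
  }

  // Rule 2: drop a projection stacked directly on an identical projection.
  val collapseProject: Rule = plan => transformUp(plan) {
    case Project(cols, Project(inner, child)) if cols == inner => Project(cols, child)
  }

  // Apply every rule in order, repeating until a fixed point is reached.
  def optimize(plan: Plan, rules: Seq[Rule], maxIterations: Int = 100): Plan = {
    var current = plan
    var changed = true
    var i = 0
    while (changed && i < maxIterations) {
      val next = rules.foldLeft(current)((p, r) => r(p))
      changed = next != current
      current = next
      i += 1
    }
    current
  }
}
```

Calling `ToyOptimizer.optimize(plan, Seq(ToyOptimizer.combineFilters, ToyOptimizer.collapseProject))` on a plan with stacked filters and duplicate projections returns the simplified plan. The iteration cap is only a safety net for rules that never converge.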




/**
 * ::DeveloperApi::
 * The data type for User Defined Types (UDTs).
 * This interface allows a user to make their own classes more interoperable with SparkSQL;
 * e.g., by creating a [[UserDefinedType]] for a class X, it becomes possible to create
 * a `DataFrame` which has class X in the schema.
 * For SparkSQL to recognize UDTs, the UDT must be annotated with
 * [[SQLUserDefinedType]].
 * The conversion via `serialize` occurs when instantiating a `DataFrame` from another RDD.
 * The conversion via `deserialize` occurs when reading from a `DataFrame`.
 */
abstract class UserDefinedType[UserType] extends DataType with Serializable {


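class PointUDT extends UserDefinedType[Point] {
  // Our native structure: a point is stored as a struct of two doubles
  def dataType = StructType(Seq(
    StructField("x", DoubleType),
    StructField("y", DoubleType)
  ))
  // Convert a Point into its Row representation
  def serialize(p: Point) = Row(p.x, p.y)
  // Rebuild a Point from its Row representation
  def deserialize(r: Row) =
    Point(r.getDouble(0), r.getDouble(1))
}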



/**
 * A mutable implementation of BigDecimal that can hold a Long if values are small enough.
 * The semantics of the fields are as follows:
 * - _precision and _scale represent the SQL precision and scale we are looking for
 * - If decimalVal is set, it represents the whole decimal value
 * - Otherwise, the decimal value is longVal / (10 ** _scale)
 */
final class Decimal extends Ordered[Decimal] with Serializable {
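
The "longVal / (10 ** _scale)" rule is easier to see with numbers. The sketch below is a hypothetical toy class (`SmallDecimal` is a name made up here, not Spark's `Decimal`); it only illustrates storing a decimal as an unscaled Long plus a scale, and falling back when the value does not fit.

```scala
// Hypothetical toy class, not Spark's Decimal: an unscaled Long plus a scale.
final case class SmallDecimal(longVal: Long, scale: Int) {
  // The represented value is longVal / (10 ** scale).
  def toBigDecimal: BigDecimal = BigDecimal(longVal, scale)
}

object SmallDecimal {
  // Use the compact form only when the unscaled value fits in a signed Long.
  def fromBigDecimal(d: BigDecimal): Option[SmallDecimal] = {
    val unscaled = d.bigDecimal.unscaledValue()
    if (unscaled.bitLength() <= 63) Some(SmallDecimal(unscaled.longValue(), d.scale))
    else None
  }
}
```

For example, `SmallDecimal(12345L, 2)` stands for 123.45. A value whose unscaled digits overflow a Long yields `None`, which is where a full BigDecimal would have to take over.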


/**
 * :: DeveloperApi ::
 * Metadata is a wrapper over Map[String, Any] that limits the value type to simple ones: Boolean,
 * Long, Double, String, Metadata, Array[Boolean], Array[Long], Array[Double], Array[String], and
 * Array[Metadata]. JSON is used for serialization.
 * The default constructor is private. User should use either [[MetadataBuilder]] or
 * [[Metadata.fromJson()]] to create Metadata instances.
 * @param map an immutable map that stores the data
 */
sealed class Metadata private[types] (private[types] val map: Map[String, Any])
  extends Serializable {


  1. Read the parser documentation carefully, especially the operators
  2. In regular expressions: (?i) starts case-insensitive mode, (?-i) turns off case-insensitive mode
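
A quick way to check the two flags is a throwaway pattern. The pattern and input strings below are made up for illustration; only the `(?i)` and `(?-i)` flags come from the note above.

```scala
// (?i) switches case-insensitive matching on; (?-i) switches it off again.
val keyword = "(?i)select(?-i) FROM".r

// "select" may appear in any case, while "FROM" must match exactly.
val mixedCaseKeyword = keyword.findFirstIn("SeLeCt FROM t").isDefined
val lowerCaseFrom = keyword.findFirstIn("select from t").isDefined
```

Here `mixedCaseKeyword` is true and `lowerCaseFrom` is false, because the second half of the pattern is matched case-sensitively.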


The main data type in Catalyst is a tree composed of node objects. Each node has a node type and zero or more children. New node types are defined in Scala as subclasses of the TreeNode class. These objects are immutable and can be manipulated using functional transformations, as discussed in the next subsection.

abstract class TreeNode[BaseType <: TreeNode[BaseType]] {                                                                                                               
  self: BaseType with Product =>   
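
The idea of immutable nodes rewritten by functional transformations can be sketched in a few lines. This is a hypothetical miniature (`Expr`, `Literal`, `Attribute`, `Add` and this `transformUp` are defined here for the example); Catalyst's real TreeNode is much richer.

```scala
// Hypothetical miniature tree; every node is immutable.
sealed trait Expr {
  def children: Seq[Expr]
  def withNewChildren(newChildren: Seq[Expr]): Expr

  // Post-order transform: rewrite the children first, then this node.
  // Nodes the rule does not match are returned unchanged.
  def transformUp(rule: PartialFunction[Expr, Expr]): Expr = {
    val rewritten = withNewChildren(children.map(_.transformUp(rule)))
    rule.applyOrElse(rewritten, identity[Expr])
  }
}

case class Literal(value: Int) extends Expr {
  def children: Seq[Expr] = Nil
  def withNewChildren(newChildren: Seq[Expr]): Expr = this
}

case class Attribute(name: String) extends Expr {
  def children: Seq[Expr] = Nil
  def withNewChildren(newChildren: Seq[Expr]): Expr = this
}

case class Add(left: Expr, right: Expr) extends Expr {
  def children: Seq[Expr] = Seq(left, right)
  def withNewChildren(newChildren: Seq[Expr]): Expr = Add(newChildren(0), newChildren(1))
}

// x + (1 + 2): the rule folds the constant subtree and leaves the rest alone.
val tree = Add(Attribute("x"), Add(Literal(1), Literal(2)))
val folded = tree.transformUp {
  case Add(Literal(a), Literal(b)) => Literal(a + b)
}
```

The original `tree` is untouched; `folded` is a new tree equal to `Add(Attribute("x"), Literal(3))`. Writing the rule as a partial function means it only has to describe the shapes it cares about.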




abstract class Expression extends TreeNode[Expression] {                                                                                                                
  self: Product =>                                                                                                                                                      

  /** The narrowest possible type that is produced when this expression is evaluated. */                                                                                
  type EvaluatedType <: Any                                                                                                                                             

  /**
   * Returns true when an expression is a candidate for static evaluation before the query is
   * executed.
   * The following conditions are used to determine suitability for constant folding:
   *  - A [[Coalesce]] is foldable if all of its children are foldable
   *  - A [[BinaryExpression]] is foldable if its both left and right child are foldable
   *  - A [[Not]], [[IsNull]], or [[IsNotNull]] is foldable if its child is foldable
   *  - A [[Literal]] is foldable
   *  - A [[Cast]] or [[UnaryMinus]] is foldable if its child is foldable
   */
  def foldable: Boolean = false                                                                                                                                         
  def nullable: Boolean                                                                                                                                                 
  def references: AttributeSet = AttributeSet(children.flatMap(_.references.iterator))                                                                                  

  /** Returns the result of evaluating this expression on a given input Row */                                                                                          
  def eval(input: Row = null): EvaluatedType                                                                                                                            

  /**
   * Returns `true` if this expression and all its children have been resolved to a specific schema
   * and `false` if it still contains any unresolved placeholders. Implementations of expressions
   * should override this if the resolution of this type of expression involves more than just
   * the resolution of its children.
   */
  lazy val resolved: Boolean = childrenResolved                                                                                                                         

  /**
   * Returns the [[DataType]] of the result of evaluating this expression.  It is
   * invalid to query the dataType of an unresolved expression (i.e., when `resolved` == false).
   */
  def dataType: DataType                                                                                                                                                

  /**
   * Returns true if all the children of this expression have been resolved to a specific schema
   * and false if any still contains any unresolved placeholders.
   */
  def childrenResolved: Boolean = !children.exists(!_.resolved)                                                                                                         

  /**
   * Returns a string representation of this expression that does not have developer centric
   * debugging information like the expression id.
   */
  def prettyString: String = {                                                                                                                                          
    transform {                                                                                                                                                         
      case a: AttributeReference => PrettyAttribute(                                                                                                             
      case u: UnresolvedAttribute => PrettyAttribute(                                                                                                            
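
The `foldable` and `resolved` comments in the excerpt describe properties computed recursively over the children. A rough sketch of that recursion, using hypothetical toy nodes (`Ex`, `Lit`, `Col`, `UnresolvedCol`, `Plus`) rather than Catalyst's real expression classes:

```scala
// Hypothetical toy expressions showing how the two flags propagate.
sealed trait Ex {
  def children: Seq[Ex]
  // Not foldable unless a subclass says otherwise.
  def foldable: Boolean = false
  // Resolved when every child is resolved; same as !children.exists(!_.resolved).
  def resolved: Boolean = children.forall(_.resolved)
}

// A literal is always foldable.
case class Lit(value: Int) extends Ex {
  def children: Seq[Ex] = Nil
  override def foldable: Boolean = true
}

// A column already bound to a schema: resolved, but its value is unknown
// before execution, so it is not foldable.
case class Col(name: String) extends Ex {
  def children: Seq[Ex] = Nil
}

// A placeholder for a name that has not been looked up yet.
case class UnresolvedCol(name: String) extends Ex {
  def children: Seq[Ex] = Nil
  override def resolved: Boolean = false
}

// A binary expression is foldable only if both sides are foldable.
case class Plus(left: Ex, right: Ex) extends Ex {
  def children: Seq[Ex] = Seq(left, right)
  override def foldable: Boolean = left.foldable && right.foldable
}
```

With these definitions `Plus(Lit(1), Lit(2)).foldable` is true, `Plus(Lit(1), Col("a")).foldable` is false, and any tree containing an `UnresolvedCol` reports `resolved == false`.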




class SqlLexical extends StdLexical {


/**
 * A very simple SQL parser.  Based loosely on:
 * Limitations:
 *  - Only supports a very limited subset of SQL.
 * This is currently included mostly for illustrative purposes.  Users wanting more complete support
 * for a SQL like language should checkout the HiveQL support in the sql/hive sub-project.
 */
class SqlParser extends AbstractSparkSQLParser with DataTypeParser {




abstract class QueryPlan[PlanType <: TreeNode[PlanType]] extends TreeNode[PlanType] {   



sealed abstract class JoinType                                                                                                                                          

case object Inner extends JoinType                                                                                                                                      

case object LeftOuter extends JoinType                                                                                                                                  

case object RightOuter extends JoinType                                                                                                                                 

case object FullOuter extends JoinType                                                                                                                                  

case object LeftSemi extends JoinType  
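
What each join type returns is easiest to see on tiny inputs. The sketch below re-declares the same case objects so that it compiles on its own; `ToyJoin`, its `Row` alias and the nested-loop `join` are assumptions made for this example and say nothing about how Spark actually executes joins.

```scala
// Re-declared here so the sketch is self-contained.
sealed abstract class JoinType
case object Inner extends JoinType
case object LeftOuter extends JoinType
case object RightOuter extends JoinType
case object FullOuter extends JoinType
case object LeftSemi extends JoinType

object ToyJoin {
  type Row = (Int, String) // (join key, payload); a hypothetical row shape
  type Out = (Option[String], Option[String])

  // Naive nested-loop join returning (left payload, right payload) pairs.
  def join(left: Seq[Row], right: Seq[Row], joinType: JoinType): Seq[Out] = {
    val matched: Seq[Out] =
      for ((lk, lv) <- left; (rk, rv) <- right if lk == rk) yield (Option(lv), Option(rv))
    val leftOnly: Seq[Out] = left.collect {
      case (k, v) if !right.exists(_._1 == k) => (Option(v), Option.empty[String])
    }
    val rightOnly: Seq[Out] = right.collect {
      case (k, v) if !left.exists(_._1 == k) => (Option.empty[String], Option(v))
    }
    joinType match {
      case Inner      => matched
      case LeftOuter  => matched ++ leftOnly
      case RightOuter => matched ++ rightOnly
      case FullOuter  => matched ++ leftOnly ++ rightOnly
      // Left semi: left rows that have a match, without any right-side columns.
      case LeftSemi => left.collect {
        case (k, v) if right.exists(_._1 == k) => (Option(v), Option.empty[String])
      }
    }
  }
}
```

Because `JoinType` is sealed, the compiler can check that the `match` covers every join type, which is presumably one reason to model the types as case objects.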

