Metadata Comparison: Atlas vs Amundsen vs TDH-catalog (Part 1)


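I. Atlas
Apache Atlas is an open-source metadata management system from the Apache Software Foundation. It integrates with components such as Hive, Storm, Kafka, HBase, and Sqoop to manage their metadata and record data lineage.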

System architecture:

[Figure: Apache Atlas system architecture]

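Metadata Sources: Atlas currently supports ingesting and managing metadata from the following sources: HBase, Hive, Sqoop, Storm, and Kafka.

Messaging: In addition to the API, users can integrate with Atlas through a Kafka-based messaging interface.

Ingest/Export: The Ingest component allows metadata to be added to Atlas. Similarly, the Export component exposes metadata changes detected by Atlas as events.

Type System: Users define a model for the metadata objects they want to manage. The model is made up of definitions called "types"; instances of types, called "entities", represent the actual metadata objects being managed.

Graph Engine: Internally, Atlas manages metadata objects using a graph model.

Titan: Atlas uses the Titan graph database to store metadata objects (Atlas 1.0 and later replaced Titan with its successor, JanusGraph).

Metadata Store (HBase): HBase is used to store the metadata.

Index Store (Solr): Solr is used to build the search index.

API: All Atlas functionality is exposed to end users through a REST API that allows types and entities to be created, updated, and deleted. It is also the primary way to query and discover the types and entities managed by Atlas.

Atlas Admin UI: A web-based application that lets data stewards and data scientists discover and annotate metadata. The Admin UI provides a search interface and a SQL-like query language for querying the metadata types and objects managed by Atlas.

Tag Based Policies: The access-control module, which enforces tag-based access policies (through integration with Apache Ranger).

Business Taxonomy: Business classification of metadata.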
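To make the Type System and Graph Engine descriptions more concrete, here is a minimal in-memory Python sketch of how types, entities and lineage relate. It is not Atlas code: the classes `TypeDef`, `Entity` and `MetadataGraph` and the method `upstream_lineage` are hypothetical. The type names `hive_table` and `hive_process` are borrowed from the Hive model that ships with Atlas, and lineage follows Atlas's general idea that a Process entity links input DataSets to output DataSets; the attribute check and the breadth-first walk are simplifying assumptions.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class TypeDef:
    """A type: a named set of attribute names (hypothetical, heavily simplified)."""
    name: str
    attributes: set[str]
    supertype: str | None = None  # e.g. "DataSet" or "Process"


@dataclass
class Entity:
    """An entity: one instance of a type, i.e. one managed metadata object."""
    guid: str
    type_name: str
    attributes: dict = field(default_factory=dict)


class MetadataGraph:
    """Stand-in for the graph store: entities are vertices and
    process inputs/outputs act as edges."""

    def __init__(self):
        self.types: dict[str, TypeDef] = {}
        self.entities: dict[str, Entity] = {}

    def define_type(self, typedef: TypeDef) -> None:
        self.types[typedef.name] = typedef

    def add_entity(self, entity: Entity) -> None:
        typedef = self.types[entity.type_name]  # KeyError if the type is unknown
        unknown = set(entity.attributes) - typedef.attributes
        if unknown:
            raise ValueError(f"unknown attributes for {typedef.name}: {unknown}")
        self.entities[entity.guid] = entity

    def _processes(self):
        # Every entity whose type derives from "Process".
        for entity in self.entities.values():
            if self.types[entity.type_name].supertype == "Process":
                yield entity

    def upstream_lineage(self, guid: str) -> list[str]:
        """Breadth-first walk from a dataset to every dataset it was derived from."""
        seen, order, pending = {guid}, [], deque([guid])
        while pending:
            current = pending.popleft()
            for proc in self._processes():
                if current in proc.attributes.get("outputs", []):
                    for source in proc.attributes.get("inputs", []):
                        if source not in seen:
                            seen.add(source)
                            order.append(source)
                            pending.append(source)
        return order


g = MetadataGraph()
g.define_type(TypeDef("hive_table", {"qualifiedName", "name"}, "DataSet"))
g.define_type(TypeDef("hive_process", {"qualifiedName", "inputs", "outputs"}, "Process"))
for table in ("t_orders", "t_users", "t_daily", "t_report"):
    g.add_entity(Entity(table, "hive_table", {"name": table}))
g.add_entity(Entity("p1", "hive_process", {"inputs": ["t_orders", "t_users"], "outputs": ["t_daily"]}))
g.add_entity(Entity("p2", "hive_process", {"inputs": ["t_daily"], "outputs": ["t_report"]}))
print(g.upstream_lineage("t_report"))  # ['t_daily', 't_orders', 't_users']
```

Because the walk is breadth-first, direct parents are listed before grandparents, which roughly corresponds to increasing lineage depth.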

• GitHub repository

https://github.com/apache/atlas

• Installation guide

https://atlas.apache.org/#/Installation

• Connecting metadata sources

Metadata sources

Atlas supports integration with many sources of metadata out of the box, and more integrations will be added in the future. Currently, Atlas supports ingesting and managing metadata from the following sources: HBase, Hive, Sqoop, Storm, and Kafka.
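Besides the REST API, sources usually reach Atlas through hooks: a hook running inside the source system (for example the Hive hook) publishes notification messages to a Kafka topic (`ATLAS_HOOK` by default), Atlas consumes them to create or update entities, and it publishes entity-change events on a second topic (`ATLAS_ENTITIES`) for downstream consumers. The sketch below imitates that Ingest/Export flow with Python's `queue.Queue` standing in for Kafka. The functions `hive_hook_emit` and `atlas_consume` and the message fields are simplified, hypothetical names, not Atlas's real notification format.

```python
import json
import queue


def hive_hook_emit(hook_topic: queue.Queue, table_name: str) -> None:
    """Hypothetical hook: after DDL runs in the source system, publish a JSON notification."""
    message = {"type": "ENTITY_CREATE",
               "entity": {"typeName": "hive_table", "name": table_name}}
    hook_topic.put(json.dumps(message))


def atlas_consume(hook_topic: queue.Queue, entities_topic: queue.Queue, store: dict) -> int:
    """Drain the hook topic (Ingest), store each entity under a new GUID,
    and publish an entity-change event (Export). Returns the number ingested."""
    ingested = 0
    while not hook_topic.empty():
        message = json.loads(hook_topic.get())
        if message["type"] != "ENTITY_CREATE":
            continue  # this sketch only handles entity creation
        guid = f"guid-{len(store) + 1}"
        store[guid] = message["entity"]
        entities_topic.put(json.dumps({"operation": "ENTITY_CREATE", "guid": guid}))
        ingested += 1
    return ingested


# queue.Queue objects stand in for the ATLAS_HOOK and ATLAS_ENTITIES Kafka topics.
hook, entities, store = queue.Queue(), queue.Queue(), {}
hive_hook_emit(hook, "orders")
hive_hook_emit(hook, "users")
print(atlas_consume(hook, entities, store))  # 2
print(store["guid-2"])                       # {'typeName': 'hive_table', 'name': 'users'}
print(json.loads(entities.get()))            # {'operation': 'ENTITY_CREATE', 'guid': 'guid-1'}
```

Decoupling the hook from the server this way means the source system never blocks on Atlas being available; messages simply wait in the topic.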

Features:

1. Search by table name;

2. Add tags or classifications to tables or files;

3. Search by classification or tag.
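The three features can be illustrated with a toy in-memory catalog. This is a hypothetical sketch, not how Atlas implements them (Atlas serves these operations through its REST API, backed by the graph store and the Solr index); `Catalog` and its methods are made-up names, and name search is assumed to be a case-insensitive substring match.

```python
from __future__ import annotations


class Catalog:
    """Toy catalog: each table or file name maps to its set of classifications (tags)."""

    def __init__(self):
        self.assets: dict[str, set[str]] = {}

    def register(self, name: str) -> None:
        self.assets.setdefault(name, set())

    def search_by_name(self, text: str) -> list[str]:
        # Feature 1: case-insensitive substring match on the asset name.
        needle = text.lower()
        return sorted(name for name in self.assets if needle in name.lower())

    def add_classification(self, name: str, tag: str) -> None:
        # Feature 2: attach a tag/classification to a table or file.
        if name not in self.assets:
            raise KeyError(f"unknown asset: {name}")
        self.assets[name].add(tag)

    def search_by_classification(self, tag: str) -> list[str]:
        # Feature 3: every asset carrying the given tag.
        return sorted(name for name, tags in self.assets.items() if tag in tags)


catalog = Catalog()
for asset in ["dw.orders", "dw.order_items", "ods.users", "/data/raw/users.csv"]:
    catalog.register(asset)
catalog.add_classification("ods.users", "PII")
catalog.add_classification("/data/raw/users.csv", "PII")
print(catalog.search_by_name("ORDER"))          # ['dw.order_items', 'dw.orders']
print(catalog.search_by_classification("PII"))  # ['/data/raw/users.csv', 'ods.users']
```

Tags are stored as a set per asset, so classifying the same table twice is harmless.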


Code study

[Figure: Atlas source code package structure]

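The package names make the role of each package fairly self-explanatory, so here we focus on how Atlas parses queries. Note that the grammar below defines Atlas's own SQL-like DSL for searching metadata (the query language offered by the Admin UI and the search API); it does not parse the SQL executed in Hive or other sources. Lineage relationships are reported to Atlas by hooks running in those source systems, such as the Hive hook.

Atlas uses ANTLR 4 to parse these DSL queries; for general ANTLR 4 usage, readers can consult its documentation. The key grammar rules are listed below.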


File: AtlasDSLParser.g4

/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

parser grammar AtlasDSLParser;

options { tokenVocab=AtlasDSLLexer; }

// Core rules
identifier: ID ;
operator: (K_LT | K_LTE | K_EQ | K_NEQ | K_GT | K_GTE | K_LIKE) ;
sortOrder: K_ASC | K_DESC ;
valueArray: K_LBRACKET ID (K_COMMA ID)* K_RBRACKET ;
literal: BOOL | NUMBER | FLOATING_NUMBER | (ID | valueArray) ;

// Composite rules
limitClause: K_LIMIT NUMBER ;
offsetClause: K_OFFSET NUMBER ;
atomE: (identifier | literal) | K_LPAREN expr K_RPAREN ;
multiERight: (K_STAR | K_DIV) atomE ;
multiE: atomE multiERight* ;
arithERight: (K_PLUS | K_MINUS) multiE ;
arithE: multiE arithERight* ;
comparisonClause: arithE operator arithE ;
isClause: arithE (K_ISA | K_IS) identifier ;
hasClause: arithE K_HAS identifier ;
countClause: K_COUNT K_LPAREN K_RPAREN ;
maxClause: K_MAX K_LPAREN expr K_RPAREN ;
minClause: K_MIN K_LPAREN expr K_RPAREN ;
sumClause: K_SUM K_LPAREN expr K_RPAREN ;
exprRight: (K_AND | K_OR) compE ;
compE: comparisonClause
     | isClause
     | hasClause
     | arithE
     | countClause
     | maxClause
     | minClause
     | sumClause
     ;
expr: compE exprRight* ;
limitOffset: limitClause offsetClause? ;
selectExpression: expr (K_AS identifier)? ;
selectExpr: selectExpression (K_COMMA selectExpression)* ;
aliasExpr: (identifier | literal) K_AS identifier ;
orderByExpr: K_ORDERBY expr sortOrder? ;
fromSrc: aliasExpr | (identifier | literal) ;
whereClause: K_WHERE expr ;
fromExpression: fromSrc whereClause? ;
fromClause: K_FROM fromExpression ;
selectClause: K_SELECT selectExpr ;
singleQrySrc: fromClause | whereClause | fromExpression | expr ;
groupByExpression: K_GROUPBY K_LPAREN selectExpr K_RPAREN ;
commaDelimitedQueries: singleQrySrc (K_COMMA singleQrySrc)* ;
spaceDelimitedQueries: singleQrySrc singleQrySrc* ;
querySrc: commaDelimitedQueries | spaceDelimitedQueries ;
query: querySrc groupByExpression?
       selectClause?
       orderByExpr?
       limitOffset? EOF;
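To see how the parser rules compose, here is a hand-written recursive-descent parser for a small subset of the grammar above: `fromClause` with an optional `whereClause`, `expr` as `compE (K_AND | K_OR) compE ...` where each `compE` is a `comparisonClause` with a single token on each side, and `limitOffset`. It is an illustration in Python, not the ANTLR-generated parser Atlas uses; the class name `MiniDslParser`, its crude tokenizer and the dict/tuple output shape are our own assumptions, and arithmetic, `isa`/`has`, aggregates, `select`, `groupby` and `orderby` are left out.

```python
from __future__ import annotations

import re

# One token per match: quoted strings, integers, two-char operators before
# one-char ones, then words (identifiers and keywords).
TOKEN_RE = re.compile(r"""\s*("[^"]*"|'[^']*'|\d+|<=|>=|!=|[<>=]|\w+)""")
OPERATORS = {"<", "<=", "=", "!=", ">", ">=", "like"}


def tokenize(text: str) -> list[str]:
    """Split a query into raw tokens (much cruder than AtlasDSLLexer)."""
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class MiniDslParser:
    """Recursive descent: one method per supported grammar rule."""

    def __init__(self, text: str):
        self.tokens = tokenize(text)
        self.pos = 0

    def _peek(self) -> str | None:
        # Lower-cased lookahead: keywords and word operators are case-insensitive.
        return self.tokens[self.pos].lower() if self.pos < len(self.tokens) else None

    def _next(self) -> str:
        if self.pos >= len(self.tokens):
            raise SyntaxError("unexpected end of query")
        self.pos += 1
        return self.tokens[self.pos - 1]

    def _expect(self, keyword: str) -> None:
        if self._peek() != keyword:
            raise SyntaxError(f"expected {keyword!r}, got {self._peek()!r}")
        self.pos += 1

    def query(self) -> dict:
        # query (subset): fromClause limitOffset? EOF
        result = self.from_clause()
        if self._peek() == "limit":
            result.update(self.limit_offset())
        if self._peek() is not None:
            raise SyntaxError(f"unexpected token {self.tokens[self.pos]!r}")
        return result

    def from_clause(self) -> dict:
        # fromClause: K_FROM fromExpression;  fromExpression: fromSrc whereClause?
        self._expect("from")
        result = {"from": self._next()}
        if self._peek() == "where":
            self._expect("where")  # whereClause: K_WHERE expr
            result["where"] = self.expr()
        return result

    def expr(self) -> tuple:
        # expr: compE exprRight*;  exprRight: (K_AND | K_OR) compE
        # The grammar keeps the chain flat; this sketch folds it left to right.
        node = self.comparison()
        while self._peek() in ("and", "or"):
            op = self._next().lower()
            node = (op, node, self.comparison())
        return node

    def comparison(self) -> tuple:
        # comparisonClause: arithE operator arithE, each arithE cut down to one token.
        left = self._next()
        op = self._next().lower()
        if op not in OPERATORS:
            raise SyntaxError(f"expected a comparison operator, got {op!r}")
        return (op, left, self._next())

    def limit_offset(self) -> dict:
        # limitOffset: limitClause offsetClause?;  limitClause: K_LIMIT NUMBER
        self._expect("limit")
        result = {"limit": int(self._next())}
        if self._peek() == "offset":
            self._expect("offset")
            result["offset"] = int(self._next())
        return result


ast = MiniDslParser('from hive_table where name = "orders" and owner != "etl" limit 10 offset 5').query()
print(ast)
# {'from': 'hive_table', 'where': ('and', ('=', 'name', '"orders"'), ('!=', 'owner', '"etl"')), 'limit': 10, 'offset': 5}
```

Each method corresponds to one rule, which is also roughly how a visitor over the ANTLR parse tree walks `fromClause`, `whereClause` and `limitOffset`.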

File: AtlasDSLLexer.g4

/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

lexer grammar AtlasDSLLexer;

fragment A: ('A'|'a');
fragment B: ('B'|'b');
fragment C: ('C'|'c');
fragment D: ('D'|'d');
fragment E: ('E'|'e');
fragment F: ('F'|'f');
fragment G: ('G'|'g');
fragment H: ('H'|'h');
fragment I: ('I'|'i');
fragment J: ('J'|'j');
fragment K: ('K'|'k');
fragment L: ('L'|'l');
fragment M: ('M'|'m');
fragment N: ('N'|'n');
fragment O: ('O'|'o');
fragment P: ('P'|'p');
fragment Q: ('Q'|'q');
fragment R: ('R'|'r');
fragment S: ('S'|'s');
fragment T: ('T'|'t');
fragment U: ('U'|'u');
fragment V: ('V'|'v');
fragment W: ('W'|'w');
fragment X: ('X'|'x');
fragment Y: ('Y'|'y');
fragment Z: ('Z'|'z');

fragment DIGIT: [0-9];
fragment LETTER: 'a'..'z'| 'A'..'Z' | '_';

// Comment skipping
SINGLE_LINE_COMMENT: '--' ~[\r\n]* -> channel(HIDDEN) ;
MULTILINE_COMMENT : '/*' .*? ( '*/' | EOF ) -> channel(HIDDEN) ;
WS: (' ' ' '* | [ \n\t\r]+) -> channel(HIDDEN) ;

// Lexer rules
NUMBER: (K_PLUS | K_MINUS)? DIGIT DIGIT* (E (K_PLUS | K_MINUS)? DIGIT DIGIT*)? ;
FLOATING_NUMBER: (K_PLUS | K_MINUS)? DIGIT+ K_DOT DIGIT+ (E (K_PLUS | K_MINUS)? DIGIT DIGIT*)? ;
BOOL: K_TRUE | K_FALSE ;

K_COMMA: ',' ;
K_PLUS: '+' ;
K_MINUS: '-' ;
K_STAR: '*' ;
K_DIV: '/' ;
K_DOT: '.' ;
K_LIKE: L I K E ;
K_AND: A N D ;
K_OR: O R ;
K_LPAREN: '(' ;
K_LBRACKET: '[' ;
K_RPAREN: ')' ;
K_RBRACKET: ']' ;
K_LT: '<' | L T ;
K_LTE: '<=' | L T E ;
K_EQ: '=' | E Q ;
K_NEQ: '!=' | N E Q ;
K_GT: '>' | G T ;
K_GTE: '>=' | G T E ;
K_FROM: F R O M ;
K_WHERE: W H E R E ;
K_ORDERBY: O R D E R B Y ;
K_GROUPBY: G R O U P B Y ;
K_LIMIT: L I M I T ;
K_SELECT: S E L E C T ;
K_MAX: M A X ;
K_MIN: M I N ;
K_SUM: S U M ;
K_COUNT: C O U N T ;
K_OFFSET: O F F S E T ;
K_AS: A S ;
K_ISA: I S A ;
K_IS: I S ;
K_HAS: H A S ;
K_ASC: A S C ;
K_DESC: D E S C ;
K_TRUE: T R U E ;
K_FALSE: F A L S E ;

KEYWORD: K_LIKE
       | K_DOT
       | K_SELECT
       | K_AS
       | K_HAS
       | K_IS
       | K_ISA
       | K_WHERE
       | K_LIMIT
       | K_TRUE
       | K_FALSE
       | K_AND
       | K_OR
       | K_GROUPBY
       | K_ORDERBY
       | K_SUM
       | K_MIN
       | K_MAX
       | K_OFFSET
       | K_FROM
       | K_DESC
       | K_ASC
       | K_COUNT
       ;

ID: STRING
  |LETTER (LETTER|DIGIT)*
  | LETTER (LETTER|DIGIT)* KEYWORD KEYWORD*
  | KEYWORD KEYWORD* LETTER (LETTER|DIGIT)*
  | LETTER (LETTER|DIGIT)* KEYWORD KEYWORD* LETTER (LETTER|DIGIT)*
  ;

STRING: '"' ~('"')* '"' | '\'' ~('\'')* '\'' | '`' ~('`')* '`';
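The lexer makes every keyword case-insensitive by spelling it out with single-letter fragments (`fragment F: ('F'|'f');`, so `K_FROM: F R O M ;` matches `from`, `FROM` or `From`), sends comments and whitespace to the hidden channel, and accepts strings in double quotes, single quotes or backticks. An ANTLR lexer takes the longest match and, on a tie, the rule defined first; that is why `from` becomes `K_FROM` while `fromDate` becomes an `ID`, and why a quoted string comes out as `ID` (which lists `STRING` as an alternative and is defined before it). The Python sketch below imitates that behavior for a handful of tokens. It is a hand-written approximation with hypothetical names (`RULES`, `tokenize`), not code generated from `AtlasDSLLexer.g4`, and it omits the `ID` alternatives that embed keywords.

```python
from __future__ import annotations

import re

STRING = r'"[^"]*"' + r"|'[^']*'" + r"|`[^`]*`"

# (token type, pattern) in the grammar's order; on equal-length matches the
# earlier entry wins, as in an ANTLR lexer. re.IGNORECASE plays the role of
# the single-letter fragments such as A: ('A'|'a').
RULES = [
    ("SINGLE_LINE_COMMENT", r"--[^\r\n]*"),
    ("MULTILINE_COMMENT", r"/\*.*?(?:\*/|\Z)"),
    ("WS", r"[ \t\r\n]+"),
    ("NUMBER", r"[+-]?\d+(?:e[+-]?\d+)?"),
    ("FLOATING_NUMBER", r"[+-]?\d+\.\d+(?:e[+-]?\d+)?"),
    ("BOOL", r"true|false"),
    ("K_COMMA", r","),
    ("K_LPAREN", r"\("),
    ("K_RPAREN", r"\)"),
    ("K_LIKE", r"like"),
    ("K_AND", r"and"),
    ("K_OR", r"or"),
    ("K_LT", r"<|lt"),
    ("K_LTE", r"<=|lte"),
    ("K_EQ", r"=|eq"),
    ("K_NEQ", r"!=|neq"),
    ("K_GT", r">|gt"),
    ("K_GTE", r">=|gte"),
    ("K_FROM", r"from"),
    ("K_WHERE", r"where"),
    ("K_LIMIT", r"limit"),
    ("K_SELECT", r"select"),
    ("K_OFFSET", r"offset"),
    # Only the STRING and LETTER (LETTER|DIGIT)* alternatives of ID are modeled.
    ("ID", STRING + r"|[a-z_][a-z0-9_]*"),
]
COMPILED = [(name, re.compile(p, re.IGNORECASE | re.DOTALL)) for name, p in RULES]
HIDDEN = {"SINGLE_LINE_COMMENT", "MULTILINE_COMMENT", "WS"}  # -> channel(HIDDEN)


def tokenize(text: str) -> list[tuple[str, str]]:
    """Longest match wins; ties go to the rule listed first. Hidden tokens are dropped."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, regex in COMPILED:
            match = regex.match(text, pos)
            if match and match.end() > best_end:  # strictly longer, so earlier rules keep ties
                best_name, best_end = name, match.end()
        if best_name is None:
            raise SyntaxError(f"no token matches at position {pos}: {text[pos]!r}")
        if best_name not in HIDDEN:
            tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


print(tokenize("FROM hive_table WHERE name = 'orders' -- note\nlimit 10"))
# [('K_FROM', 'FROM'), ('ID', 'hive_table'), ('K_WHERE', 'WHERE'), ('ID', 'name'),
#  ('K_EQ', '='), ('ID', "'orders'"), ('K_LIMIT', 'limit'), ('NUMBER', '10')]
print(tokenize("fromDate"))  # [('ID', 'fromDate')]
```

Note that, as in the real grammar, the word forms of the operators (`lt`, `eq`, `gte` and so on) are reserved: `eq` on its own lexes as `K_EQ`, not as an identifier.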
