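I recently started querying data with the Presto engine and noticed a few syntax differences from MySQL and PostgreSQL, so I wrote this article as a record.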
Contents
- 1 Data Types
- 2 SELECT Queries
- 2.1 WITH Clause
- 2.2 GROUP BY Clause
- 2.2.1 GROUP BY
- 2.2.2 GROUPING SETS
- 2.2.3 CUBE
- 2.2.4 ROLLUP
- 2.2.5 Differences between GROUP BY, CUBE, and ROLLUP
- 2.2.6 Combining GROUPING SETS, CUBE, and ROLLUP
- 2.3 HAVING
- 2.4 UNION, INTERSECT, EXCEPT
- 2.5 ORDER BY
- 2.6 LIMIT
- 3 Other Common SQL
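Presto is an open-source distributed SQL query engine, originally developed at Facebook, for running interactive analytic queries against data sources of all sizes, from gigabytes to petabytes. This article covers the most commonly used SQL; for details, see the official documentation: https://prestodb.github.io/docs/current/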
1 Data Types
- Boolean: true, false
- Integer: tinyint, smallint, integer, bigint
- Floating-Point: real, double
- Fixed-Precision: decimal
- String: varchar, char, varbinary, json
- Date and Time: date, time, time with time zone, timestamp, timestamp with time zone, interval year to month, interval day to second
- Structural: array, map, row
- Network Address: ipaddress
- HyperLogLog: HyperLogLog, P4HyperLogLog
- Quantile Digest: QDigest
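As a quick illustration of the type families listed above, the following sketch builds one value per family and asks Presto for its type with typeof(). The column aliases are made up for this example, and the exact type names printed may vary between Presto versions.

```sql
-- One literal or cast per type family; typeof() returns the type name as a varchar.
SELECT
    typeof(true)                               AS boolean_type,
    typeof(CAST(1 AS tinyint))                 AS integer_type,
    typeof(CAST(1.5 AS double))                AS floating_type,
    typeof(DECIMAL '12.34')                    AS decimal_type,
    typeof(CAST('abc' AS varchar))             AS string_type,
    typeof(DATE '2020-01-31')                  AS date_type,
    typeof(INTERVAL '2' DAY)                   AS interval_type,
    typeof(ARRAY[1, 2, 3])                     AS array_type,
    typeof(MAP(ARRAY['a', 'b'], ARRAY[1, 2]))  AS map_type,
    typeof(IPADDRESS '10.0.0.1')               AS ipaddress_type;
```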
2 SELECT Queries
[ WITH with_query [, ...] ]
SELECT [ ALL | DISTINCT ] select_expr [, ...]
[ FROM from_item [, ...] ]
[ WHERE condition ]
[ GROUP BY [ ALL | DISTINCT ] grouping_element [, ...] ]
[ HAVING condition]
[ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select ]
[ ORDER BY expression [ ASC | DESC ] [, ...] ]
[ LIMIT [ count | ALL ] ]
The parameters above can take the following forms:
- from_item
table_name [ [ AS ] alias [ ( column_alias [, ...] ) ] ]
from_item join_type from_item [ ON join_condition | USING ( join_column [, ...] ) ]
- join_type
[ INNER ] JOIN
LEFT [ OUTER ] JOIN
RIGHT [ OUTER ] JOIN
FULL [ OUTER ] JOIN
CROSS JOIN
- grouping_element
()
expression
GROUPING SETS ( ( column [, ...] ) [, ...] )
CUBE ( column [, ...] )
ROLLUP ( column [, ...] )
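To make the from_item and join_type forms above more concrete, here is a small sketch. The first two queries assume TPC-H style tables customer and nation that share a nationkey column (both table names appear later in this article); the third is self-contained and only uses inline VALUES with table and column aliases.

```sql
-- Table aliases plus an explicit ON condition.
SELECT c.name AS customer_name, n.name AS nation_name
FROM customer AS c
INNER JOIN nation AS n ON c.nationkey = n.nationkey;

-- USING names the shared join column once instead of spelling out the ON condition.
SELECT count(*) AS matched_rows
FROM customer
JOIN nation USING (nationkey);

-- alias ( column_alias, ... ) names the columns of an inline relation.
-- The CROSS JOIN pairs every size with every color (2 x 2 = 4 rows).
SELECT s.size, c.color
FROM (VALUES 'S', 'M') AS s (size)
CROSS JOIN (VALUES 'red', 'blue') AS c (color);
```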
2.1 WITH Clause
The WITH clause defines named relations for use within a query.
WITH x AS (SELECT a, MAX(b) AS b FROM t GROUP BY a)
SELECT a, b FROM x;
is equivalent to
SELECT a, b
FROM (
SELECT a, MAX(b) AS b FROM t GROUP BY a
) AS x;
WITH can also define several relations at once:
WITH
t1 AS (SELECT a, MAX(b) AS b FROM x GROUP BY a),
t2 AS (SELECT a, AVG(d) AS d FROM y GROUP BY a)
SELECT t1.*, t2.*
FROM t1
JOIN t2 ON t1.a = t2.a;
The relations can also be chained, each one referring to those defined before it:
WITH
x AS (SELECT a FROM t),
y AS (SELECT a AS b FROM x),
z AS (SELECT b AS c FROM y)
SELECT c FROM z;
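2.2 GROUP BY Clause
2.2.1 GROUP BY
When GROUP BY is used in a SELECT statement, every output expression must be either an aggregate function or a column present in the GROUP BY clause.
The following two queries are equivalent: both group by the column nationkey and count the rows in each group. GROUP BY 2 refers to the second output expression by its ordinal position.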
SELECT count(*), nationkey FROM customer GROUP BY 2;
SELECT count(*), nationkey FROM customer GROUP BY nationkey;
The grouping column does not have to appear in the output:
SELECT count(*) FROM customer GROUP BY mktsegment;
2.2.2 GROUPING SETS
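GROUPING SETS lets you specify several sets of columns to group by in a single query. In each result row, the columns that are not part of that row's grouping set are set to NULL.
A query that uses the complex grouping syntax (GROUPING SETS, CUBE, or ROLLUP) reads the underlying data source only once, whereas the equivalent UNION ALL query reads it once per branch (three times in the example below). This is why a UNION ALL query can produce inconsistent results when the data source is not deterministic.
Consider the following table: SELECT * FROM shipping;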
origin_state | origin_zip | destination_state | destination_zip | package_weight
--------------+------------+-------------------+-----------------+----------------
California | 94131 | New Jersey | 8648 | 13
California | 94131 | New Jersey | 8540 | 42
New Jersey | 7081 | Connecticut | 6708 | 225
California | 90210 | Connecticut | 6927 | 1337
California | 94131 | Colorado | 80302 | 5
New York | 10002 | New Jersey | 8540 | 3
(6 rows)
SELECT origin_state, origin_zip, destination_state, sum(package_weight)
FROM shipping
GROUP BY GROUPING SETS (
(origin_state),
(origin_state, origin_zip),
(destination_state));
This query is logically equivalent to a UNION ALL of several GROUP BY queries:
SELECT origin_state, NULL, NULL, sum(package_weight)
FROM shipping GROUP BY origin_state
UNION ALL
SELECT origin_state, origin_zip, NULL, sum(package_weight)
FROM shipping GROUP BY origin_state, origin_zip
UNION ALL
SELECT NULL, NULL, destination_state, sum(package_weight)
FROM shipping GROUP BY destination_state;
The result is:
origin_state | origin_zip | destination_state | _col0
--------------+------------+-------------------+-------
New Jersey | NULL | NULL | 225
California | NULL | NULL | 1397
New York | NULL | NULL | 3
California | 90210 | NULL | 1337
California | 94131 | NULL | 60
New Jersey | 7081 | NULL | 225
New York | 10002 | NULL | 3
NULL | NULL | Colorado | 5
NULL | NULL | New Jersey | 58
NULL | NULL | Connecticut | 1562
(10 rows)
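The NULLs in this result only tell you that a column was not part of the grouping set, which becomes ambiguous if the data itself contains NULLs. Presto's grouping() function can tell the rows apart. The sketch below is self-contained: it rebuilds the six shipping rows inline with VALUES (the alias t and the output name grouping_id are made up for this example) and adds grouping() to the query above.

```sql
WITH shipping AS (
    SELECT *
    FROM (VALUES
        ('California', 94131, 'New Jersey',   8648,   13),
        ('California', 94131, 'New Jersey',   8540,   42),
        ('New Jersey',  7081, 'Connecticut',  6708,  225),
        ('California', 90210, 'Connecticut',  6927, 1337),
        ('California', 94131, 'Colorado',    80302,    5),
        ('New York',   10002, 'New Jersey',   8540,    3)
    ) AS t (origin_state, origin_zip, destination_state, destination_zip, package_weight)
)
SELECT origin_state, origin_zip, destination_state,
       sum(package_weight) AS total_weight,
       -- One bit per argument, rightmost argument = least significant bit;
       -- a bit is 0 when the column is part of the row's grouping set, 1 otherwise.
       grouping(origin_state, origin_zip, destination_state) AS grouping_id
FROM shipping
GROUP BY GROUPING SETS (
    (origin_state),
    (origin_state, origin_zip),
    (destination_state));
```

With this bit layout, rows from (origin_state) carry grouping_id 3, rows from (origin_state, origin_zip) carry 1, and rows from (destination_state) carry 6.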
2.2.3 CUBE
CUBE generates all possible grouping sets for the given columns. For example, the possible grouping sets for (origin_state, destination_state) are:
(origin_state, destination_state),
(origin_state),
(destination_state),
()
SELECT origin_state, destination_state, sum(package_weight)
FROM shipping
GROUP BY CUBE (origin_state, destination_state);
is equivalent to
SELECT origin_state, destination_state, sum(package_weight)
FROM shipping
GROUP BY GROUPING SETS (
(origin_state, destination_state),
(origin_state),
(destination_state),
());
2.2.4 ROLLUP
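ROLLUP generates the hierarchical subtotals for a given list of columns: one grouping set for each prefix of the list, plus the grand total. This is a subset of what CUBE would produce.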
SELECT origin_state, origin_zip, sum(package_weight)
FROM shipping
GROUP BY ROLLUP (origin_state, origin_zip);
is equivalent to
SELECT origin_state, origin_zip, sum(package_weight)
FROM shipping
GROUP BY GROUPING SETS ((origin_state, origin_zip), (origin_state), ());
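2.2.5 Differences between GROUP BY, CUBE, and ROLLUP
Suppose you group by columns 1, 2, and 3. GROUP BY aggregates only the single grouping (1, 2, 3). CUBE produces every combination of the three columns, down to the grand total. ROLLUP produces only the hierarchy that starts at column 1, that is, each prefix of the column list plus the grand total.
The tables below make this easier to see. Each row is one grouping set, and NULL marks a column that is aggregated away.
GROUP BY 1, 2, 3
col 1 | col 2 | col 3 |
---|---|---|
1 | 2 | 3 |
CUBE (1, 2, 3)
col 1 | col 2 | col 3 |
---|---|---|
1 | 2 | 3 |
1 | 2 | NULL |
1 | NULL | 3 |
1 | NULL | NULL |
NULL | 2 | 3 |
NULL | 2 | NULL |
NULL | NULL | 3 |
NULL | NULL | NULL |
ROLLUP (1, 2, 3)
col 1 | col 2 | col 3 |
---|---|---|
1 | 2 | 3 |
1 | 2 | NULL |
1 | NULL | NULL |
NULL | NULL | NULL |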
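The comparison above can be checked with a small self-contained query. This is only a sketch: the relation t, its columns a, b, c, and the subquery aliases g, r, and cu are made up for the example. Because the input has a single row, every grouping set yields exactly one output row, so counting the output rows counts the grouping sets.

```sql
SELECT 'GROUP BY' AS variant, count(*) AS grouping_sets
FROM (SELECT a, b, c
      FROM (VALUES (1, 2, 3)) AS t (a, b, c)
      GROUP BY a, b, c) AS g
UNION ALL
SELECT 'ROLLUP', count(*)
FROM (SELECT a, b, c
      FROM (VALUES (1, 2, 3)) AS t (a, b, c)
      GROUP BY ROLLUP (a, b, c)) AS r
UNION ALL
SELECT 'CUBE', count(*)
FROM (SELECT a, b, c
      FROM (VALUES (1, 2, 3)) AS t (a, b, c)
      GROUP BY CUBE (a, b, c)) AS cu;
```

GROUP BY reports 1 grouping set, ROLLUP reports 4 (the three prefixes plus the grand total), and CUBE reports 8 (every subset of the three columns).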
2.2.6 Combining GROUPING SETS, CUBE, and ROLLUP
SELECT origin_state, destination_state, origin_zip, sum(package_weight)
FROM shipping
GROUP BY
GROUPING SETS ((origin_state, destination_state)),
ROLLUP (origin_zip);
is equivalent to
SELECT origin_state, destination_state, origin_zip, sum(package_weight)
FROM shipping
GROUP BY
GROUPING SETS ((origin_state, destination_state)),
GROUPING SETS ((origin_zip), ());
which is logically equivalent to
SELECT origin_state, destination_state, origin_zip, sum(package_weight)
FROM shipping
GROUP BY GROUPING SETS (
(origin_state, destination_state, origin_zip),
(origin_state, destination_state));
2.3 HAVING
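HAVING is used together with aggregate functions and GROUP BY to filter the groups that GROUP BY produces; groups that do not satisfy the condition are dropped.
The following query selects the groups from the customer table whose total account balance exceeds the specified value: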
SELECT count(*), mktsegment, nationkey,
CAST(sum(acctbal) AS bigint) AS totalbal
FROM customer
GROUP BY mktsegment, nationkey
HAVING sum(acctbal) > 5700000
ORDER BY totalbal DESC;
2.4 UNION, INTERSECT, EXCEPT
query UNION [ALL | DISTINCT] query
query INTERSECT [DISTINCT] query
query EXCEPT [DISTINCT] query
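- ALL: all rows are included in the final result set, duplicates included
- DISTINCT: only unique rows are included in the combined result set
- If neither is specified, the behavior defaults to DISTINCT.
Notes
- INTERSECT and EXCEPT do not support the ALL argument.
- Multiple set operations are processed from left to right, unless the order is explicitly specified with parentheses.
- INTERSECT binds more tightly than UNION and EXCEPT.
For example: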
A UNION B INTERSECT C EXCEPT D
is equivalent to
A UNION (B INTERSECT C) EXCEPT D
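The precedence rule is easy to verify with constant queries. In this minimal sketch the literals 1, 2, and 3 are arbitrary; the only difference between the two statements is the pair of parentheses.

```sql
-- INTERSECT is evaluated first: (2 INTERSECT 3) is empty,
-- so the UNION returns the single row 1.
SELECT 1 UNION SELECT 2 INTERSECT SELECT 3;

-- Parentheses force the UNION to run first: {1, 2} INTERSECT {3} is empty,
-- so this query returns no rows.
(SELECT 1 UNION SELECT 2) INTERSECT SELECT 3;
```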
2.4.1 UNION
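The following query returns 13 and 42:
SELECT 13 UNION SELECT 42;
The following query also returns 13 and 42, because UNION removes the duplicate 13:
SELECT 13 UNION SELECT * FROM (VALUES 42, 13);
The following query returns 13, 42, and 13, because UNION ALL keeps duplicates:
SELECT 13 UNION ALL SELECT * FROM (VALUES 42, 13);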
2.4.2 INTERSECT
INTERSECT returns only the rows that appear in the result sets of both the first and the second query.
For example, the following returns 13:
SELECT * FROM (VALUES 13, 42)
INTERSECT
SELECT 13;
2.4.3 EXCEPT
EXCEPT returns the rows that are in the result set of the first query but not in that of the second.
For example, the following returns 42:
SELECT * FROM (VALUES 13, 42)
EXCEPT
SELECT 13;
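2.5 ORDER BY
ORDER BY usually goes at the end of a SELECT statement and is evaluated after any GROUP BY or HAVING clause. The default ordering is ASC NULLS LAST.
ORDER BY expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...]
- ASC: ascending order, smallest first (the default)
- DESC: descending order
- NULLS FIRST: NULL values sort before all non-null values
- NULLS LAST: NULL values sort after all non-null values (the default, regardless of ASC or DESC)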
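The effect of the null ordering is easiest to see on a tiny inline relation. In this sketch the relation t and its column x are made up for the example, and the NULL is cast explicitly so that the column has an unambiguous integer type.

```sql
-- Default ordering (ASC NULLS LAST): 1, 3, NULL
SELECT x FROM (VALUES 3, CAST(NULL AS integer), 1) AS t (x)
ORDER BY x;

-- DESC still keeps NULL at the end by default: 3, 1, NULL
SELECT x FROM (VALUES 3, CAST(NULL AS integer), 1) AS t (x)
ORDER BY x DESC;

-- NULLS FIRST moves NULL to the front: NULL, 3, 1
SELECT x FROM (VALUES 3, CAST(NULL AS integer), 1) AS t (x)
ORDER BY x DESC NULLS FIRST;
```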
2.6 LIMIT
LIMIT 5 returns at most 5 rows; LIMIT ALL returns every row, which is the same as omitting the LIMIT clause.
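LIMIT is most predictable when combined with ORDER BY, since without an ordering the rows that are kept are arbitrary. A minimal self-contained sketch (the relation t and its column x are made up for the example):

```sql
-- Keep the two largest values: 9, 5
SELECT x
FROM (VALUES 5, 3, 9, 1) AS t (x)
ORDER BY x DESC
LIMIT 2;
```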
3 Other Common SQL
3.1 SCHEMA Operations
3.1.1 Renaming a SCHEMA
ALTER SCHEMA name RENAME TO new_name
eg: rename the schema web to traffic
ALTER SCHEMA web RENAME TO traffic
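3.1.2 Creating a SCHEMA
CREATE SCHEMA [ IF NOT EXISTS ] schema_name
[ WITH ( property_name = expression [, ...] ) ]
- IF NOT EXISTS is the safer form: it suppresses the error raised when the schema already exists
- WITH sets properties on the new schema. The following query lists all available schema properties:
SELECT * FROM system.metadata.schema_properties
eg:
Create a schema named web in the current catalog:
CREATE SCHEMA web
Create a schema named sales in the hive catalog:
CREATE SCHEMA hive.sales
Create the schema traffic if it does not already exist: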
CREATE SCHEMA IF NOT EXISTS traffic
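The WITH clause has no example above, so here is a hedged sketch. It assumes a catalog named hive backed by the Hive connector, which exposes a location schema property; the bucket path is a placeholder, and other connectors may expose different properties or none at all. Check system.metadata.schema_properties on your own cluster first.

```sql
-- Assumption: a catalog named hive whose connector supports the "location" schema property.
-- 's3://my-bucket/' is a placeholder path.
CREATE SCHEMA IF NOT EXISTS hive.web
WITH (location = 's3://my-bucket/');
```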
3.1.3 Dropping a SCHEMA
DROP SCHEMA [ IF EXISTS ] schema_name
- IF EXISTS suppresses the error raised when the schema does not exist
eg:
Drop the schema named web:
DROP SCHEMA web
Drop the schema named sales if it exists:
DROP SCHEMA IF EXISTS sales
3.2 TABLE Operations
3.2.1 Creating a TABLE
CREATE TABLE [ IF NOT EXISTS ]
table_name (
{ column_name data_type [ COMMENT comment ] [ WITH ( property_name = expression [, ...] ) ]
| LIKE existing_table_name [ { INCLUDING | EXCLUDING } PROPERTIES ] }
[, ...]
)
[ COMMENT table_comment ]
[ WITH ( property_name = expression [, ...] ) ]
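- IF NOT EXISTS is the safer form: it suppresses the error raised when the table already exists
- WITH sets properties on the new table or on individual columns:
The following query lists all available table properties:
SELECT * FROM system.metadata.table_properties
The following query lists all available column properties:
SELECT * FROM system.metadata.column_properties
- COMMENT adds a comment to the table or to a column
- LIKE includes all the column definitions of an existing table in the new table. Multiple LIKE clauses may be specified, which allows columns to be copied from several tables.
eg:
Create a table named orders with a table comment: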
CREATE TABLE orders (
orderkey bigint,
orderstatus varchar,
totalprice double,
orderdate date
)
COMMENT 'A table to keep track of orders.'
WITH (format = 'ORC')
Create a table named bigger_orders that contains all the columns of orders, with extra columns added at the start and end:
CREATE TABLE bigger_orders (
another_orderkey bigint,
LIKE orders,
another_orderdate date
)
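Two options from the syntax above are not exercised by these examples: IF NOT EXISTS with a column-level COMMENT, and LIKE ... INCLUDING PROPERTIES. The sketch below reuses the orders table from this section; the table name orders_copy and the column loaded_at are made up for illustration, and whether properties such as format are available depends on the connector.

```sql
-- Create orders only if it is missing, with a comment on one column.
CREATE TABLE IF NOT EXISTS orders (
    orderkey bigint,
    orderstatus varchar,
    totalprice double COMMENT 'Price in cents.',
    orderdate date
)
COMMENT 'A table to keep track of orders.';

-- Copy the column definitions of orders together with its table properties
-- (for example format = 'ORC'), then add one extra column.
CREATE TABLE orders_copy (
    LIKE orders INCLUDING PROPERTIES,
    loaded_at timestamp
);
```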
3.2.2 Showing the CREATE TABLE Statement
SHOW CREATE TABLE table_name
3.2.3 Altering a TABLE
Rename a table:
ALTER TABLE name RENAME TO new_name
Add a column:
ALTER TABLE name ADD COLUMN column_name data_type [ COMMENT comment ] [ WITH ( property_name = expression [, ...] ) ]
Drop a column:
ALTER TABLE name DROP COLUMN column_name
Rename a column:
ALTER TABLE name RENAME COLUMN column_name TO new_column_name
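For a concrete picture of the four forms, the statements below walk a hypothetical table named users (with a column id) through each change in turn. The table and column names are assumptions for illustration, and some connectors support only a subset of these operations.

```sql
-- Rename the table users to people.
ALTER TABLE users RENAME TO people;

-- Add a varchar column named zip.
ALTER TABLE people ADD COLUMN zip varchar;

-- Rename the column id to user_id.
ALTER TABLE people RENAME COLUMN id TO user_id;

-- Drop the zip column again.
ALTER TABLE people DROP COLUMN zip;
```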
3.2.4 Dropping a TABLE
DROP TABLE [ IF EXISTS ] table_name
- IF EXISTS suppresses the error raised when the table does not exist
eg:
Drop the table named web:
DROP TABLE web
Drop the table named sales if it exists:
DROP TABLE IF EXISTS sales
3.2.5 CREATE TABLE AS: Creating a Table from Query Results
CREATE TABLE [ IF NOT EXISTS ] table_name [ ( column_alias, ... ) ]
[ COMMENT table_comment ]
[ WITH ( property_name = expression [, ...] ) ]
AS query
[ WITH [ NO ] DATA ]
eg:
Create a new table orders_column_aliased whose columns order_date and total_price are populated from the columns orderdate and totalprice of the orders table:
CREATE TABLE orders_column_aliased (order_date, total_price)
AS
SELECT orderdate, totalprice
FROM orders
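The remaining options of the syntax (COMMENT, WITH properties, and WITH NO DATA) can be sketched as follows. The table names orders_by_date and empty_nation are illustrative, and format = 'ORC' assumes a connector, such as Hive, that supports that table property.

```sql
-- Summarize orders per day into a new table, with a comment and a storage format.
CREATE TABLE orders_by_date
COMMENT 'Summary of orders by date'
WITH (format = 'ORC')
AS
SELECT orderdate, sum(totalprice) AS price
FROM orders
GROUP BY orderdate;

-- Copy only the schema of nation: WITH NO DATA creates the table without inserting rows.
CREATE TABLE empty_nation AS
SELECT *
FROM nation
WITH NO DATA;
```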
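3.3 ANALYZE: Collecting Table and Column Statistics
ANALYZE collects statistics for a table and its columns. At the time of writing, this statement is supported only by the Hive connector.
ANALYZE table_name [ WITH ( property_name = expression [, ...] ) ]
- WITH passes connector-specific properties to the statement:
The following query lists all available ANALYZE properties: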
SELECT * FROM system.metadata.analyze_properties
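A few usage sketches follow. The table names web, stores, and sales are placeholders, and the partitions property assumes a Hive table partitioned on a single date-like column; confirm the properties available on your cluster with the query above.

```sql
-- Analyze a table in the current catalog and schema.
ANALYZE web;

-- Analyze a table using its fully qualified name.
ANALYZE hive.default.stores;

-- Analyze only selected partitions of a partitioned Hive table.
-- Each inner array holds the partition key values of one partition.
ANALYZE hive.default.sales
WITH (partitions = ARRAY[ARRAY['1992-01-01'], ARRAY['1992-01-02']]);
```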
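3.4 CALL: Calling a Procedure
CALL invokes a procedure provided by a connector. Some connectors, such as the PostgreSQL Connector, sit in front of systems that have their own stored procedures; those are separate from the connector-defined procedures and cannot be invoked through CALL.
CALL procedure_name ( [ name => ] expression [, ...] )
eg:
Call a procedure using positional arguments:
CALL test(123, 'apple');
Call a procedure using named arguments:
CALL test(name => 'apple', id => 123);
Call a procedure that takes no arguments, using a fully qualified name: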
CALL catalog.schema.test();
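3.5 START TRANSACTION, ROLLBACK, COMMIT: Transactions
Start a transaction (READ WRITE by default):
START TRANSACTION [ mode [, ...] ]
Roll back the current transaction:
ROLLBACK [ WORK ]
Commit the current transaction:
COMMIT [ WORK ]
- mode is one of the following:
ISOLATION LEVEL { READ UNCOMMITTED | READ COMMITTED | REPEATABLE READ | SERIALIZABLE }
READ { ONLY | WRITE }
eg:
Start a transaction with the default settings (READ WRITE):
START TRANSACTION;
Start a repeatable read transaction:
START TRANSACTION ISOLATION LEVEL REPEATABLE READ;
Start a read-write transaction:
START TRANSACTION READ WRITE;
Start a read-only transaction at the READ COMMITTED isolation level (non-repeatable reads are possible):
START TRANSACTION ISOLATION LEVEL READ COMMITTED, READ ONLY;
Start a read-write serializable transaction: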
START TRANSACTION READ WRITE, ISOLATION LEVEL SERIALIZABLE;
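Put together, a session might look like the sketch below. It reuses the orders and cities tables from elsewhere in this article and assumes the connector behind them supports transactions at the requested level; many connectors restrict what can be done inside an explicit transaction, so treat this as an outline of the statement order rather than something guaranteed to run everywhere.

```sql
-- A read-only transaction: several queries see a consistent session, then it is closed.
START TRANSACTION ISOLATION LEVEL READ COMMITTED, READ ONLY;
SELECT count(*) FROM orders;
COMMIT;

-- A read-write transaction that is abandoned: the insert is discarded by ROLLBACK.
START TRANSACTION READ WRITE;
INSERT INTO cities VALUES (4, 'Berkeley');
ROLLBACK;
```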
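3.6 DELETE: Deleting Rows
Some connectors have limited or no support for DELETE; see the documentation of the specific connector.
DELETE FROM table_name [ WHERE condition ]
eg:
Delete the rows of the lineitem table where shipmode = 'AIR':
DELETE FROM lineitem WHERE shipmode = 'AIR';
Delete all rows from the orders table: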
DELETE FROM orders;
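3.8 PREPARE, EXECUTE, DEALLOCATE PREPARE
Prepare a statement and save it under the name statement_name:
PREPARE statement_name FROM statement
Execute the prepared statement named statement_name:
EXECUTE statement_name [ USING parameter1 [ , parameter2, ... ] ]
Remove the prepared statement named statement_name:
DEALLOCATE PREPARE statement_name
eg:
Prepare a SQL statement with two parameters:
PREPARE my_select2 FROM
SELECT name FROM nation WHERE regionkey = ? and nationkey < ?;
Execute the statement, supplying values for the ? placeholders in order:
EXECUTE my_select2 USING 1, 3;
Together, the two statements above are equivalent to running the following SQL: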
SELECT name FROM nation WHERE regionkey = 1 AND nationkey < 3;
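Prepared statements are not limited to SELECT, and they stay registered in the session until they are removed. The sketch below prepares an INSERT without parameters, runs it, and then cleans up; the statement name my_insert and the inserted row are made up for this example, and the cities table is the one used in the INSERT section of this article.

```sql
-- Prepare an INSERT; no USING clause is needed because it has no ? placeholders.
PREPARE my_insert FROM
INSERT INTO cities VALUES (4, 'Berkeley');

-- Run it.
EXECUTE my_insert;

-- Remove both prepared statements from the session once they are no longer needed.
DEALLOCATE PREPARE my_insert;
DEALLOCATE PREPARE my_select2;
```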
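3.9 INSERT: Inserting Data
INSERT INTO table_name [ ( column [, ... ] ) ] query
eg:
Insert into the orders table all the rows of the new_orders table:
INSERT INTO orders SELECT * FROM new_orders;
Insert a single row into the cities table:
INSERT INTO cities VALUES (1, 'San Francisco');
Insert multiple rows into the cities table:
INSERT INTO cities VALUES (2, 'San Jose'), (3, 'Oakland');
Insert a row into the nation table with an explicit column list. Any column of the table that is missing from the list is filled with NULL: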
INSERT INTO nation (nationkey, name, regionkey, comment)
VALUES (26, 'POLAND', 3, 'no comment');
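3.10 Listing CATALOGS, SCHEMAS, TABLES, and COLUMNS
List the available catalogs (the top level of the data warehouse hierarchy):
SHOW CATALOGS [ LIKE pattern ]
List the schemas in a catalog (the current catalog if none is given):
SHOW SCHEMAS [ FROM catalog ] [ LIKE pattern ]
List the tables in a schema:
SHOW TABLES [ FROM schema ] [ LIKE pattern ]
Show the columns of a table. The result has the columns Column, Type, Extra, and Comment:
DESCRIBE table_name
This is equivalent to SHOW COLUMNS FROM table_name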
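A typical way to drill down from catalog to column is sketched below. The catalog hive, the schema web, and the table orders are assumptions for illustration; in a LIKE pattern, % matches any sequence of characters and _ matches a single character.

```sql
-- 1. Which catalogs are configured?
SHOW CATALOGS;

-- 2. Which schemas in the hive catalog start with "web"?
SHOW SCHEMAS FROM hive LIKE 'web%';

-- 3. Which tables in hive.web start with "order"?
SHOW TABLES FROM hive.web LIKE 'order%';

-- 4. What columns does the table have? Both statements return the same information.
DESCRIBE hive.web.orders;
SHOW COLUMNS FROM hive.web.orders;
```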