Useful SQL in Presto

Expand one or more arrays into rows (multiple columns)

Expand one array

Convert one array into rows, one row per element. For example, convert the following

speeds
[1,2,3,4]

to

speed
1
2
3
4

with the following SQL

WITH t AS
  (SELECT array[1,
                2,
                3,
                4] AS speeds)
SELECT speed
FROM t
CROSS JOIN unnest(speeds) AS u(speed)
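
If the position of each element matters, a variant with WITH ORDINALITY may help. The following is a minimal sketch: it adds a 1-based index column next to each expanded value. The alias u and the column name idx are illustrative names, not part of the original query.

```sql
-- Expand the array and keep each element's 1-based position.
-- u and idx are illustrative names chosen for this sketch.
WITH t AS
  (SELECT array[1, 2, 3, 4] AS speeds)
SELECT idx,
       speed
FROM t
CROSS JOIN unnest(speeds) WITH ORDINALITY AS u(speed, idx)
ORDER BY idx
```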

Expand multiple arrays

Convert several arrays into rows, pairing their elements by position. For example, convert the following

speeds costs
[1,2,3,4] [5,6,7,8]

to

speed cost
1 5
2 6
3 7
4 8

with the following SQL

WITH t AS
  (SELECT array[1,
                2,
                3,
                4] AS speeds, array[5,
                                    6,
                                    7,
                                    8] AS costs)
SELECT speed,
       cost
FROM t
CROSS JOIN unnest(speeds, costs) AS u(speed, cost)
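
One case worth checking is arrays of different lengths. As a sketch, assuming a shorter costs array than in the example above: unnest produces as many rows as the longest array and pads the shorter one with NULL, so the last two rows here should have a NULL cost. The alias u is an illustrative name.

```sql
-- costs has only two elements, so unnest pads the missing
-- positions with NULL (rows 3 and 4 have cost = NULL).
WITH t AS
  (SELECT array[1, 2, 3, 4] AS speeds,
          array[5, 6] AS costs)
SELECT speed,
       cost
FROM t
CROSS JOIN unnest(speeds, costs) AS u(speed, cost)
```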

Histogram with equal width

Given a raw table

val
1
6
2
11

we build a histogram with a bucket width of 5. A val_interval of 0 means that 2 values (1 and 2) fall in the interval [0,4]; a val_interval of 5 means that 1 value (6) falls in the interval [5,9].

val_interval total_count
0 2
5 1
10 1

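with the SQL below. This relies on val being an integer column: val / 5 is then integer division, so val / 5 * 5 rounds each non-negative val down to a multiple of 5.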

SELECT val / 5 * 5 AS val_interval,
       count(1) AS total_count
FROM t
GROUP BY val / 5 * 5
ORDER BY val / 5 * 5
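
For fractional values, val / 5 * 5 no longer truncates, so the buckets collapse. A possible workaround is to call floor explicitly. The sketch below uses made-up sample data (1.5, 6.2, 2.0, 11.9) that is not from the original table; with it, 1.5 and 2.0 should land in bucket 0, 6.2 in bucket 5, and 11.9 in bucket 10.

```sql
-- Equal-width buckets of width 5 for DOUBLE values.
-- The sample values are assumptions made for this sketch.
WITH t AS
  (SELECT val
   FROM (VALUES 1.5E0, 6.2E0, 2.0E0, 11.9E0) AS v(val))
SELECT floor(val / 5) * 5 AS val_interval,
       count(*) AS total_count
FROM t
GROUP BY floor(val / 5) * 5
ORDER BY floor(val / 5) * 5
```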

Numeric Histogram in Presto

Given the input table

speeds
[1,2,5,8,9]

first, expand the array into one row per element

speed
1
2
5
8
9

with the SQL

SELECT speed
FROM t
CROSS JOIN unnest(speeds) AS u(speed)

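Then build a numeric histogram with 2 buckets. numeric_histogram computes an approximate histogram and returns a map from bucket value to bucket count; here 2.67 is the mean of 1, 2 and 5, and 8.50 is the mean of 8 and 9 (values shown rounded to two decimals). We get the result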

speed_interval total_count
2.67 3.00
8.50 2.00

with the SQL

WITH t AS
  (SELECT array[1,
                2,
                5,
                8,
                9] AS speeds)
SELECT speed_interval,
       total_count
FROM
  (SELECT numeric_histogram(2,speed)
   FROM t
   CROSS JOIN unnest(speeds) AS u(speed)) AS x(hist)
CROSS JOIN UNNEST(hist) AS t (speed_interval, total_count)
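
The result table above shows values with two decimals, while the map returned by numeric_histogram holds unrounded doubles. A sketch of one way to get that presentation is to round the map key in the outer query. The aliases u and h and the names bucket_center and bucket_count are illustrative, not from the original query.

```sql
-- Round the bucket value to two decimals for display.
-- u, h, bucket_center and bucket_count are illustrative names.
WITH t AS
  (SELECT array[1, 2, 5, 8, 9] AS speeds)
SELECT round(bucket_center, 2) AS speed_interval,
       bucket_count AS total_count
FROM
  (SELECT numeric_histogram(2, speed)
   FROM t
   CROSS JOIN unnest(speeds) AS u(speed)) AS x(hist)
CROSS JOIN unnest(hist) AS h(bucket_center, bucket_count)
ORDER BY bucket_center
```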

Zip, filter in Presto

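Zip two arrays into an array of rows, then keep only the rows whose speeds field is at least 3, with the SQL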

SELECT filter(cast(zip(array[1,2,3,4], array[5,6,7,8]) AS array<Row(speeds bigint, costs bigint)>), x -> x.speeds >= 3) as filtered

and we get

filtered
[[3,7],[4,8]]
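
If rows are more convenient than an array, the same filter can probably be expressed with unnest and a WHERE clause instead of zip and filter. This is a sketch of that alternative, not the original approach; the alias u and the column names speed and cost are illustrative. It should return the two rows (3, 7) and (4, 8).

```sql
-- Same pairing and filter as zip + filter, but returned as rows.
-- u, speed and cost are illustrative names.
SELECT speed,
       cost
FROM unnest(array[1, 2, 3, 4], array[5, 6, 7, 8]) AS u(speed, cost)
WHERE speed >= 3
```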