For other window functions, see:
Window functions: sum, avg, max, min
Window functions: row_number, rank, dense_rank
1. Sample data
id crtime pv
cookie1,2015-04-10,1
cookie1,2015-04-11,5
cookie1,2015-04-12,7
cookie1,2015-04-13,3
cookie1,2015-04-14,2
cookie1,2015-04-15,4
cookie1,2015-04-16,4
cookie2,2015-04-10,2
cookie2,2015-04-11,3
cookie2,2015-04-12,5
cookie2,2015-04-13,6
cookie2,2015-04-14,3
cookie2,2015-04-15,9
cookie2,2015-04-16,7
2. ntile(n)
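ntile(n) slices the ordered rows of each partition into n slices and returns each row's slice number, starting from 1. The slices are as equal in size as possible; when the rows cannot be divided evenly, the leftover rows are handed out one per slice, starting from the first slice.
For example, if 1 row is left over, it goes to the first slice.
If 2 rows are left over, they go to the first and second slices respectively.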
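The distribution rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not Hive's implementation; the function names `ntile_sizes` and `ntile_labels` are made up for this sketch.

```python
def ntile_sizes(row_count, n):
    """Return how many rows land in each of the n slices."""
    base, extra = divmod(row_count, n)
    # The first `extra` slices each receive one leftover row.
    return [base + 1 if i < extra else base for i in range(n)]


def ntile_labels(row_count, n):
    """Return the 1-based slice number of every row, in sorted order."""
    labels = []
    for slice_no, size in enumerate(ntile_sizes(row_count, n), start=1):
        labels.extend([slice_no] * size)
    return labels


print(ntile_sizes(7, 3))   # [3, 2, 2]
print(ntile_labels(7, 5))  # [1, 1, 2, 2, 3, 4, 5]
```

With 7 rows and 3 slices, `divmod(7, 3)` gives a base size of 2 and 1 leftover row, so only the first slice grows to 3 rows.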
2.1 Example
select id,crtime,pv,
ntile(2) over(partition by id order by crtime) n2, -- 2 slices
ntile(3) over(partition by id order by crtime) n3, -- 3 slices
ntile(4) over(partition by id order by crtime) n4, -- 4 slices
ntile(5) over(partition by id order by crtime) n5 -- 5 slices
from nt;
->
id crtime pv n2 n3 n4 n5
cookie1 2015-04-10 1 1 1 1 1
cookie1 2015-04-11 5 1 1 1 1
cookie1 2015-04-12 7 1 1 2 2
cookie1 2015-04-13 3 1 2 2 2
cookie1 2015-04-14 2 2 2 3 3
cookie1 2015-04-15 4 2 3 3 4
cookie1 2015-04-16 4 2 3 4 5
cookie2 2015-04-10 2 1 1 1 1
cookie2 2015-04-11 3 1 1 1 1
cookie2 2015-04-12 5 1 1 2 2
cookie2 2015-04-13 6 1 2 2 2
cookie2 2015-04-14 3 2 2 3 3
cookie2 2015-04-15 9 2 3 3 4
cookie2 2015-04-16 7 2 3 4 5
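As the output shows, cookie1 has 7 rows. Split into 2 slices, 7/2 leaves a remainder of 1, which goes to slice 1, so there are four 1s and three 2s;
split into 3 slices, 7/3 leaves a remainder of 1, which goes to slice 1, so there are three 1s, two 2s, and two 3s;
split into 4 slices, 7/4 leaves a remainder of 3, which goes to slices 1, 2, and 3, so there are two 1s, two 2s, two 3s, and one 4;
split into 5 slices, 7/5 leaves a remainder of 2, which goes to slices 1 and 2, so there are two 1s, two 2s, one 3, one 4, and one 5.
Requirement: for each cookie, what is the total pv over the first 1/3 of its days?
Approach: use ntile(3) to split each cookie's days into three slices, then sum pv over the rows whose ntile value is 1.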
select t.id,sum(t.pv) spv from
(select id,crtime,pv,ntile(3) over(partition by id order by crtime) nt3 from nt) t
where t.nt3 = 1
group by t.id;
->
id spv
cookie1 13
cookie2 10
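To cross-check these totals by hand, the same "first slice" logic can be replayed in Python on the sample data from section 1. This is a sketch under the slicing rule described in section 2; the helper name `first_slice_pv` is hypothetical and does not correspond to anything in Hive.

```python
from collections import defaultdict

# Sample rows from section 1: (id, crtime, pv)
rows = [
    ("cookie1", "2015-04-10", 1), ("cookie1", "2015-04-11", 5),
    ("cookie1", "2015-04-12", 7), ("cookie1", "2015-04-13", 3),
    ("cookie1", "2015-04-14", 2), ("cookie1", "2015-04-15", 4),
    ("cookie1", "2015-04-16", 4),
    ("cookie2", "2015-04-10", 2), ("cookie2", "2015-04-11", 3),
    ("cookie2", "2015-04-12", 5), ("cookie2", "2015-04-13", 6),
    ("cookie2", "2015-04-14", 3), ("cookie2", "2015-04-15", 9),
    ("cookie2", "2015-04-16", 7),
]


def first_slice_pv(rows, n):
    """Sum pv over the rows that would fall into slice 1 of n, per id."""
    groups = defaultdict(list)
    for cookie_id, crtime, pv in rows:
        groups[cookie_id].append((crtime, pv))
    totals = {}
    for cookie_id, items in groups.items():
        items.sort()  # order by crtime; ISO dates sort correctly as strings
        base, extra = divmod(len(items), n)
        first_size = base + (1 if extra > 0 else 0)  # slice 1 takes a leftover row if any
        totals[cookie_id] = sum(pv for _, pv in items[:first_size])
    return totals


print(first_slice_pv(rows, 3))  # {'cookie1': 13, 'cookie2': 10}
```

Each cookie has 7 days, so slice 1 of 3 covers the first 3 days: 1+5+7 for cookie1 and 2+3+5 for cookie2.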
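3. lag, lead, first_value, last_value
These functions are often used on time series. lag and lead do not support a rows between (window) clause; first_value and last_value are evaluated over the window frame, which by default ends at the current row when order by is given.
lag(col,n,default): returns the value of col from the nth row above the current row within the window.
- col: the column name; n: how many rows to look back, 1 if omitted; default: the value returned when there is no nth row above, NULL if omitted.
lead(col,n,default): returns the value of col from the nth row below the current row within the window.
- col: the column name; n: how many rows to look ahead, 1 if omitted; default: the value returned when there is no nth row below, NULL if omitted.
first_value(col): the first value in the partition after sorting, counted up to the current row.
last_value(col): the last value in the partition after sorting, counted up to the current row.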
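The look-back and look-ahead behaviour of lag and lead can be modelled on a plain Python list that stands for one partition, already sorted. This is a simplified sketch that assumes n is non-negative; the Python functions `lag` and `lead` below are illustrative stand-ins, not Hive code.

```python
def lag(values, n=1, default=None):
    """For each row, return the value n rows above it, or default if there is none."""
    return [values[i - n] if i - n >= 0 else default
            for i in range(len(values))]


def lead(values, n=1, default=None):
    """For each row, return the value n rows below it, or default if there is none."""
    return [values[i + n] if i + n < len(values) else default
            for i in range(len(values))]


days = ["2015-04-10", "2015-04-11", "2015-04-12", "2015-04-13"]
print(lag(days, 1, "a"))   # ['a', '2015-04-10', '2015-04-11', '2015-04-12']
print(lead(days, 2, "b"))  # ['2015-04-12', '2015-04-13', 'b', 'b']
```

The default only appears at the edges of the partition: the first n rows for lag and the last n rows for lead.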
3.1 Example
select *,
lag(crtime,1,'a') over(partition by id order by crtime) lagc,
lead(crtime,2,'b') over(partition by id order by crtime) leadc,
first_value(pv) over(partition by id order by crtime) fpv,
last_value(pv) over(partition by id order by crtime) lpv
from nt;
->
id crtime pv lagc leadc fpv lpv
cookie1 2015-04-10 1 a 2015-04-12 1 1
cookie1 2015-04-11 5 2015-04-10 2015-04-13 1 5
cookie1 2015-04-12 7 2015-04-11 2015-04-14 1 7
cookie1 2015-04-13 3 2015-04-12 2015-04-15 1 3
cookie1 2015-04-14 2 2015-04-13 2015-04-16 1 2
cookie1 2015-04-15 4 2015-04-14 b 1 4
cookie1 2015-04-16 4 2015-04-15 b 1 4
cookie2 2015-04-10 2 a 2015-04-12 2 2
cookie2 2015-04-11 3 2015-04-10 2015-04-13 2 3
cookie2 2015-04-12 5 2015-04-11 2015-04-14 2 5
cookie2 2015-04-13 6 2015-04-12 2015-04-15 2 6
cookie2 2015-04-14 3 2015-04-13 2015-04-16 2 3
cookie2 2015-04-15 9 2015-04-14 b 2 9
cookie2 2015-04-16 7 2015-04-15 b 2 7
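The lpv column simply repeats pv because the values are taken "up to the current row". The following Python sketch makes that explicit by modelling the set of rows visible to each row as a list slice. The function `frame_values` and its `whole_partition` flag are assumptions made for illustration; the flag stands for the case where every row sees the entire group, as in the query without order by later in this section.

```python
def frame_values(values, whole_partition=False):
    """Return (first_value, last_value) for every row of one sorted partition."""
    result = []
    for i in range(len(values)):
        # Rows visible to row i: everything up to and including row i,
        # or the whole partition when whole_partition is set.
        frame = values if whole_partition else values[: i + 1]
        result.append((frame[0], frame[-1]))
    return result


pv = [1, 5, 7, 3, 2, 4, 4]  # cookie1's pv ordered by crtime
print(frame_values(pv))
# [(1, 1), (1, 5), (1, 7), (1, 3), (1, 2), (1, 4), (1, 4)]
print(frame_values(pv, whole_partition=True)[0])  # (1, 4)
```

When the visible rows stop at the current row, the last value is always the current row's own pv, which matches the lpv column above.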
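3.1.1 Question 1: how to get the last pv value of each group?
Sort by crtime in descending order and take first_value, so that every row sees the group's last pv: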
select *,
first_value(pv) over(partition by id order by crtime desc) newpv
from nt;
->
id crtime pv newpv
cookie1 2015-04-16 4 4
cookie1 2015-04-15 4 4
cookie1 2015-04-14 2 4
cookie1 2015-04-13 3 4
cookie1 2015-04-12 7 4
cookie1 2015-04-11 5 4
cookie1 2015-04-10 1 4
cookie2 2015-04-16 7 7
cookie2 2015-04-15 9 7
cookie2 2015-04-14 3 7
cookie2 2015-04-13 6 7
cookie2 2015-04-12 5 7
cookie2 2015-04-11 3 7
cookie2 2015-04-10 2 7
The rows now come back with crtime in descending order. To get ascending order, add order by id,crtime to the outer query:
select *,
first_value(pv) over(partition by id order by crtime desc) newpv
from nt
order by id,crtime;
->
id crtime pv newpv
cookie1 2015-04-10 1 4
cookie1 2015-04-11 5 4
cookie1 2015-04-12 7 4
cookie1 2015-04-13 3 4
cookie1 2015-04-14 2 4
cookie1 2015-04-15 4 4
cookie1 2015-04-16 4 4
cookie2 2015-04-10 2 7
cookie2 2015-04-11 3 7
cookie2 2015-04-12 5 7
cookie2 2015-04-13 6 7
cookie2 2015-04-14 3 7
cookie2 2015-04-15 9 7
cookie2 2015-04-16 7 7
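3.1.2 Question 2: what happens without order by?
Without order by, the rows in each partition are in no guaranteed order, so crtime is neither ascending nor descending and the results depend on the order in which the rows happen to be read.
select *,
lag(pv) over(partition by id) lagc, -- defaults to the previous row; NULL when there is no previous row
lead(pv) over(partition by id) leadc -- defaults to the next row; NULL when there is no next row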
from nt;
->
id crtime pv lagc leadc
cookie1 2015-04-10 1 NULL 4
cookie1 2015-04-16 4 1 4
cookie1 2015-04-15 4 4 2
cookie1 2015-04-14 2 4 3
cookie1 2015-04-13 3 2 7
cookie1 2015-04-12 7 3 5
cookie1 2015-04-11 5 7 NULL
cookie2 2015-04-16 7 NULL 9
cookie2 2015-04-15 9 7 3
cookie2 2015-04-14 3 9 6
cookie2 2015-04-13 6 3 5
cookie2 2015-04-12 5 6 3
cookie2 2015-04-11 3 5 2
cookie2 2015-04-10 2 3 NULL
select *,
first_value(pv) over(partition by id) fpv, -- first value of the group
last_value(pv) over(partition by id) lpv -- last value of the group
from nt;
->
id crtime pv fpv lpv
cookie1 2015-04-10 1 1 5
cookie1 2015-04-16 4 1 5
cookie1 2015-04-15 4 1 5
cookie1 2015-04-14 2 1 5
cookie1 2015-04-13 3 1 5
cookie1 2015-04-12 7 1 5
cookie1 2015-04-11 5 1 5
cookie2 2015-04-16 7 7 2
cookie2 2015-04-15 9 7 2
cookie2 2015-04-14 3 7 2
cookie2 2015-04-13 6 7 2
cookie2 2015-04-12 5 7 2
cookie2 2015-04-11 3 7 2
cookie2 2015-04-10 2 7 2