Oracle 11g Study Notes: Analytic Functions

Sample table:
[Figure: the all_sales sample table]

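Ranking functions

rank()/dense_rank()
Both return the rank of an item within its group. When several items tie, rank() leaves a gap in the numbering after the tie, whereas dense_rank() does not.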

select emp_id, sum(amount),
    -- "nulls last" is optional; "nulls first" is the alternative
    rank() over (order by sum(amount) desc nulls last) as rank,
    dense_rank() over (order by sum(amount) desc) as dense_rank
from all_sales
group by emp_id;

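[Figure: query output]

desc in this statement means descending order; replace it with asc to rank in ascending order.
The optional nulls first / nulls last clause in the rank() call controls whether null results are ranked first or last. By default Oracle treats nulls as the largest value, so they come last in ascending order and first in descending order.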
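To make the difference between the two numbering schemes concrete, here is a minimal sketch in plain Python (not Oracle code). The function name and the totals are hypothetical and only illustrate how ties are numbered in descending order.

```python
def rank_and_dense_rank(values):
    """Return (value, rank, dense_rank) tuples in descending order of value."""
    ordered = sorted(values, reverse=True)
    rows = []
    rank = 0
    dense = 0
    for position, value in enumerate(ordered, start=1):
        if position == 1 or value != ordered[position - 2]:
            rank = position   # rank() jumps to the row position after a tie
            dense += 1        # dense_rank() simply moves to the next integer
        rows.append((value, rank, dense))
    return rows


# Hypothetical totals: two employees tie at 300.
ranked = rank_and_dense_rank([500, 300, 300, 100])
for row in ranked:
    print(row)
```

After the tie at 300, the rank column skips from 2 to 4, while the dense_rank column continues with 3.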

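cume_dist and percent_rank functions
cume_dist computes the cumulative distribution of a value within a group: the number of rows at or before the current row in the sort order, divided by the total number of rows.
percent_rank computes the percentage rank of a value within a group: (rank - 1) / (number of rows - 1).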

select 
    prd_type_id, sum(amount),
    cume_dist() over (order by sum(amount) desc) as cume_dist,
    percent_rank() over (order by sum(amount) desc) as percent_rank 
    from all_sales
    where year = 2003
    group by prd_type_id
    order by sum(amount) desc;

[Figure: query output]
The output makes it easy to see what the two functions do.
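The arithmetic behind the two columns can be sketched in plain Python (not Oracle code). The function name and the five totals are hypothetical; the sketch assumes a descending sort, as in the query.

```python
def cume_dist_and_percent_rank(values):
    """Return (value, cume_dist, percent_rank) tuples for a descending sort."""
    ordered = sorted(values, reverse=True)
    n = len(ordered)
    rows = []
    for value in ordered:
        # Rows at or before this one in descending order (ties included).
        at_or_before = sum(1 for v in ordered if v >= value)
        # rank() of this value: one more than the number of strictly larger values.
        rank = sum(1 for v in ordered if v > value) + 1
        cume = at_or_before / n
        pct = (rank - 1) / (n - 1) if n > 1 else 0.0
        rows.append((value, cume, pct))
    return rows


dist = cume_dist_and_percent_rank([500, 400, 300, 200, 100])
for row in dist:
    print(row)
```

cume_dist never starts at 0 (the first row already counts itself), whereas percent_rank always starts at 0 and ends at 1 when there are no ties at the bottom.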
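ntile(bucket) function
Computes n-tiles (tertiles, quartiles, and so on): bucket specifies the number of buckets, and the rows are divided into that many buckets.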

select prd_type_id, sum(amount),
    ntile(3) over(order by sum(amount) desc) as ntile
from all_sales
where year = 2003
and amount is not null
group by prd_type_id
order by sum(amount) desc;

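[Figure: query output]
The output shows that the bucket count sets the highest bucket number; when the rows do not divide evenly, the extra rows go to the first buckets.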
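How the rows are spread over the buckets can be sketched in plain Python (not Oracle code). The function name is hypothetical; the sketch assumes the rows are already sorted and only returns the bucket number for each row position.

```python
def ntile(row_count, buckets):
    """Return the bucket number assigned to each of row_count ordered rows."""
    base, extra = divmod(row_count, buckets)
    numbers = []
    for bucket in range(1, buckets + 1):
        # The first `extra` buckets each receive one additional row.
        size = base + (1 if bucket <= extra else 0)
        numbers.extend([bucket] * size)
    return numbers


tiles = ntile(4, 3)
print(tiles)
```

With four rows and three buckets, the one leftover row lands in bucket 1, so the first two rows share bucket 1.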

row_number
row_number() returns a number for each row in a group, starting at 1.

select 
    prd_type_id, sum(amount),
    dense_rank() over (order by sum(amount) desc nulls last) as dense_rank,
    row_number() over (order by sum(amount) desc nulls last) as row_number
from all_sales
where year = 2003
group by prd_type_id 
order by sum(amount) desc;

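[Figure: query output]
Judging from the output, row_number() works like dense_rank(), except that rows with equal values never share a number: each row gets its own consecutive number.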

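Inverse percentile functions

There are two inverse percentile functions: percentile_disc(x) and percentile_cont(x).
They do the opposite of cume_dist() and percent_rank(). percentile_disc(x) walks through the cumulative distribution values in each group until it finds one greater than or equal to x, and returns the corresponding value.
percentile_cont(x) walks through the percent rank values in each group until it finds one greater than or equal to x; when x falls between two percent rank values, it interpolates linearly between the two corresponding values.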

select 
    percentile_cont(0.6) within group (order by sum(amount) desc) as percentile_cont,
    percentile_disc(0.5) within group (order by sum(amount) desc) as percentile_disc
    from all_sales
    where year = 2003 and amount is not null
    group by prd_type_id;

Open question: what exactly percentile_cont() computes was not clear to me from this output.
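One reading that, as far as I can tell, matches Oracle's documented behaviour: percentile_cont(x) locates the (possibly fractional) row number 1 + x * (n - 1) in the ordered values and interpolates linearly between the two neighbouring rows, while percentile_disc(x) always returns a value that really occurs. The plain Python sketch below (not Oracle code) uses hypothetical function names and hypothetical totals already sorted in descending order.

```python
import math


def percentile_cont(p, ordered):
    """Interpolate at row number 1 + p * (n - 1) of the ordered values."""
    rn = 1 + p * (len(ordered) - 1)
    frn, crn = math.floor(rn), math.ceil(rn)
    if frn == crn:
        return ordered[frn - 1]
    # Weighted mix of the two rows that surround the fractional row number.
    return (crn - rn) * ordered[frn - 1] + (rn - frn) * ordered[crn - 1]


def percentile_disc(p, ordered):
    """Return the first value whose cumulative distribution is >= p."""
    n = len(ordered)
    for position, value in enumerate(ordered, start=1):
        if position / n >= p:
            return value
    return ordered[-1]


totals = [400, 300, 200, 100]          # hypothetical, sorted descending
cont = percentile_cont(0.5, totals)    # halfway between 300 and 200
disc = percentile_disc(0.5, totals)    # an actual value from the list
print(cont, disc)
```

Here the continuous version returns 250.0, a value that does not appear in the data, while the discrete version returns 300.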

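Window functions

Window functions compute cumulative sums and moving averages over a range of rows, a range of values, or an interval of time. Windows can be combined with these functions: sum(), avg(), max(), min(), count(), variance(), stddev(), first_value(), last_value().

So what are window functions actually used for?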

  1. Computing a cumulative sum
select
    month, sum(amount) as m_amount,
    sum(sum(amount)) over (order by month rows between unbounded preceding and current row) as cumulative_amount
    from all_sales
    where year = 2003
    group by month
    order by month;

[Figure: query output]

  2. Computing a moving average
select
    month as month, sum(amount) as month_amount,
    sum(sum(amount)) over (order by month rows between unbounded preceding and current row) as sum,
    avg(sum(amount)) over (order by month rows between unbounded preceding and current row) as avg
    from all_sales
    where year = 2003 and month between 6 and 12
    group by month
    order by month;

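[Figure: query output]

The output shows that the moving average here is the average of all rows from the start of the window to the current row. Which rows are included depends on the window size: a frame such as rows between 3 preceding and current row would average only the current row and the three before it.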

  3. Computing a centered average
select 
    month, sum(amount) as month_amount,
    sum(sum(amount)) over (order by month rows between 1 preceding and 1 following) as moving_sum,
    avg(sum(amount)) over (order by month rows between 1 preceding and 1 following) as moving_average
    from all_sales
    where year = 2003
    group by month
    order by month;

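[Figure: query output]
The output shows that the centered average is the average of the previous month, the current month, and the next month (the first and last months have only two values to average).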

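first_value and last_value
These functions return the first and last rows of the window.
They are used in the same way as the functions above; trying them is left to the reader.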

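Addendum: the examples above show what the clause inside over does.
It defines a window, and the function in front of it operates only on the rows inside that window. Window specifications:
between 1 preceding and 1 following: from the previous row to the next row
between unbounded preceding and current row: from the first row of the partition to the current row ("unbounded" means there is no limit on how far back the window reaches)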
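The two window specifications can be sketched in plain Python (not Oracle code) as a slice over the ordered rows. The function name, its parameters, and the monthly totals are hypothetical; preceding=None stands in for unbounded preceding.

```python
def window_avg(amounts, preceding, following):
    """Average over rows [i - preceding, i + following] for every row i.

    preceding=None plays the role of "unbounded preceding".
    """
    result = []
    for i in range(len(amounts)):
        start = 0 if preceding is None else max(0, i - preceding)
        end = min(len(amounts), i + following + 1)
        frame = amounts[start:end]   # the rows visible to the function
        result.append(sum(frame) / len(frame))
    return result


monthly = [100, 200, 600, 400]                # hypothetical monthly totals
running_avg = window_avg(monthly, None, 0)    # unbounded preceding .. current row
centered_avg = window_avg(monthly, 1, 1)      # 1 preceding .. 1 following
print(running_avg)
print(centered_avg)
```

At the edges the frame is simply cut off, which is why the first and last centered averages are computed from two rows instead of three.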

Reporting functions

Reporting functions perform calculations across groups and across partitions within groups.

Reporting on totals

select 
    month, prd_type_id,
    sum(sum(amount)) over (partition by month) as total_month_amount,
    sum(sum(amount)) over (partition by prd_type_id) as total_type_amount
    from all_sales
    where year = 2003 and month <= 3
    group by month, prd_type_id
    order by month, prd_type_id;

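[Figure: query output]

Breaking down the expression:
sum(amount) computes the sales total for each group; the outer sum() computes the grand total over those group totals.
over (partition by month) makes the outer sum() compute the total for each month.

Using the ratio_to_report function
This function computes the ratio of a value to the sum of a set of values.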

select month, 
sum(amount) as prd_type_amount,
ratio_to_report(sum(amount)) over (partition by month) as prd_type_ratio
from all_sales
where year = 2003 and month <= 3
group by month, prd_type_id
order by month;

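[Figure: query output]
Monthly totals:
[Figure: monthly totals]
Comparing the two figures shows that the function computes the ratio of each row's value to the total for its month.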
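The calculation can be sketched in plain Python (not Oracle code): first total the amounts per partition, then divide each row by its partition total. The function name and the (month, amount) rows are hypothetical.

```python
from collections import defaultdict


def ratio_to_report(rows):
    """rows: (month, amount) pairs; returns (month, amount, ratio within month)."""
    totals = defaultdict(float)
    for month, amount in rows:
        totals[month] += amount   # same role as sum(...) over (partition by month)
    return [(month, amount, amount / totals[month]) for month, amount in rows]


report = ratio_to_report([(1, 100), (1, 300), (2, 50), (2, 50)])
for row in report:
    print(row)
```

Within each month the ratios add up to 1.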

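Lag and lead functions

lag and lead return data from the row at a given distance before or after the current row. The second argument is that distance in rows; the query below uses 2, so despite the alias names the values come from two months earlier and two months later.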

select month, sum(amount) as month_amount,
lag(sum(amount), 2) over (order by month) as previous_month_amount,
lead(sum(amount), 2) over (order by month) as next_month_amount
from all_sales
where year = 2003
group by month
order by month;

[Figure: query output]
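The offset lookup can be sketched in plain Python (not Oracle code). The function names mirror the SQL functions, but the signatures and the monthly totals are hypothetical; rows without a partner at the requested distance get the default, here None in place of null.

```python
def lag(values, offset=1, default=None):
    """Value `offset` rows before each row, or `default` if there is none."""
    return [values[i - offset] if i - offset >= 0 else default
            for i in range(len(values))]


def lead(values, offset=1, default=None):
    """Value `offset` rows after each row, or `default` if there is none."""
    return [values[i + offset] if i + offset < len(values) else default
            for i in range(len(values))]


monthly = [10, 20, 30, 40]        # hypothetical monthly totals
two_back = lag(monthly, 2)
two_ahead = lead(monthly, 2)
print(two_back)
print(two_ahead)
```

With an offset of 2, the first two rows have nothing to look back to and the last two rows have nothing to look ahead to.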

First and last functions

first and last return the first and last values in an ordered group. They can be used with these functions: min(), max(), sum(), avg(), stddev(), variance().

select 
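    -- order by sum(amount) is ascending, so "first" is the lowest total and "last" the highest
    max(month) keep (dense_rank first order by sum(amount)) as lowest_sales_month,
    min(month) keep (dense_rank last order by sum(amount)) as highest_sales_month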
    from all_sales
    where year = 2003
    group by month 
    order by month;
-------------------------------------
select 
    month,  sum(amount)
from all_sales
    group by month
    order by sum(amount);

[Figure: query output]

[Figure: query output]
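What keep (dense_rank first ...) and keep (dense_rank last ...) select can be sketched in plain Python (not Oracle code). The function name and the (month, total) rows are hypothetical; the sketch assumes an ascending sort on the total and resolves ties with min(), which is only one of the aggregates that could be applied.

```python
def keep_first_last(rows):
    """rows: (month, total) pairs.

    Returns (month with the lowest total, month with the highest total),
    taking the smallest month when several months tie.
    """
    lowest = min(total for _, total in rows)
    highest = max(total for _, total in rows)
    # "first" in ascending order: rows whose total ranks first, i.e. the lowest.
    first_month = min(month for month, total in rows if total == lowest)
    # "last" in ascending order: rows whose total ranks last, i.e. the highest.
    last_month = min(month for month, total in rows if total == highest)
    return first_month, last_month


months = keep_first_last([(1, 300), (2, 100), (3, 500), (4, 100)])
print(months)
```

Months 2 and 4 tie for the lowest total, so the aggregate decides which one is reported; with min() it is month 2.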

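Linear regression functions

Linear regression functions fit an ordinary least squares regression line to a set of number pairs. They can be used as aggregate, window, or reporting functions.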

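Function              Description
regr_avgx(y,x)        Removes pairs in which x or y is null, then returns the average of x
regr_avgy(y,x)        Removes pairs in which x or y is null, then returns the average of y
regr_count(y,x)       Returns the number of non-null pairs available for fitting the regression line
regr_intercept(y,x)   Returns the intercept of the regression line on the y axis
regr_r2(y,x)          Returns the coefficient of determination (R-squared) of the regression line
regr_slope(y,x)       Returns the slope of the regression line
regr_sxx(y,x)         Returns regr_count(y,x) * var_pop(x)
regr_sxy(y,x)         Returns regr_count(y,x) * covar_pop(y,x)
regr_syy(y,x)         Returns regr_count(y,x) * var_pop(y)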
select 
    prd_type_id,
    regr_avgx(&&y,&&x) as avgx,
    regr_avgy(&&y,&&x) as avgy,
    regr_count(&&y,&&x) as count,
    regr_intercept(&&y,&&x) as intercept,
    regr_r2(&&y,&&x) as r2, 
    regr_slope(&&y,&&x) as slope,
    regr_sxx(&&y,&&x) as sxx,   
    regr_sxy(&&y,&&x) as sxy,
    regr_syy(&&y,&&x) as syy
from all_sales
where year = 2003
group by prd_type_id;

Parameters: y=amount, x=month
[Figure: query output]

I have no background in this area, so it is hard for me to take in.
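The quantities in the table can be computed by hand for a tiny data set, which may make them easier to digest. The plain Python sketch below (not Oracle code) uses a hypothetical function name and hypothetical (y, x) pairs, and assumes x and y each take at least two distinct values so that no division by zero occurs.

```python
def regr(pairs):
    """pairs: (y, x) tuples. Pairs with None on either side are dropped first."""
    clean = [(y, x) for y, x in pairs if y is not None and x is not None]
    n = len(clean)
    avg_x = sum(x for _, x in clean) / n
    avg_y = sum(y for y, _ in clean) / n
    # Sums of squared deviations and cross products (count times the population variance or covariance).
    sxx = sum((x - avg_x) ** 2 for _, x in clean)
    syy = sum((y - avg_y) ** 2 for y, _ in clean)
    sxy = sum((x - avg_x) * (y - avg_y) for y, x in clean)
    slope = sxy / sxx
    intercept = avg_y - slope * avg_x
    r2 = (sxy ** 2) / (sxx * syy)
    return {"count": n, "avgx": avg_x, "avgy": avg_y, "sxx": sxx, "syy": syy,
            "sxy": sxy, "slope": slope, "intercept": intercept, "r2": r2}


# Hypothetical data lying exactly on y = 2x + 1, plus one pair with a missing y.
fit = regr([(3, 1), (5, 2), (7, 3), (None, 4)])
print(fit)
```

Because the three usable points lie exactly on a line, the slope is 2, the intercept is 1, and R-squared is 1; the pair with the missing y is ignored.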

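Hypothetical rank and distribution functions

Hypothetical rank and distribution functions compute the rank and percentile that a new row would have in a table, without inserting the row. The following functions can be used for hypothetical calculations: rank(), dense_rank(), percent_rank(), and cume_dist().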

select 
    prd_type_id, sum(amount),
    rank() over (order by sum(amount) desc) as rank, 
    percent_rank() over (order by sum(amount) desc) as percent_rank
    from all_sales
    where year = 2003 and amount is not null
    group by prd_type_id
    order by sum(amount);
---------------------------------------
select 
    rank(&&amount) within group (order by sum(amount) desc) as rank,
    percent_rank(&&amount) within group (order by sum(amount) desc) as percent_rank
    from all_sales
    where year = 2003 and amount is not null
    group by prd_type_id
    order by prd_type_id;

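[Figure: query output]

Parameter: amount=500000
[Figure: query output]

This makes it possible to estimate where a value would rank, which is what a hypothetical rank is.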
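The idea can be sketched in plain Python (not Oracle code): count how many existing values would sort ahead of the new one, without adding it to the data. The function name and the totals are hypothetical, and the sketch assumes a descending sort as in the queries above.

```python
def hypothetical_rank(new_value, values):
    """Rank and percent rank new_value would get in descending order.

    The existing values are only read, never modified.
    """
    rank = sum(1 for v in values if v > new_value) + 1
    # With the new row counted there would be len(values) + 1 rows,
    # so the percent_rank denominator (rows - 1) equals len(values).
    percent_rank = (rank - 1) / len(values)
    return rank, percent_rank


existing = [500, 400, 200, 100]       # hypothetical totals per product type
result = hypothetical_rank(300, existing)
print(result)
```

A total of 300 would rank third among these four values, with a percent rank of 0.5, and the list itself is left unchanged.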
