Embedding SPL in Java to Implement Data Grouping Easily (Combined Promotional Edition)

Problem Introduction

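Implementing SQL-style GROUP BY aggregation in Java is fairly tedious. You usually have to declare a data structure (a Java entity class) first, loop over a Java collection, and add each element to a sub-collection according to the grouping condition. Java 8 lambdas (streams) make the code much more concise. However, grouping is usually followed by aggregation, and you still have to write aggregate functions such as sum(), count(*), and topN() separately. These are only the most common grouping and aggregation operations. For irregular kinds such as alignment grouping, enumeration grouping, and multiple grouping, combined with other aggregate functions (FIRST, LAST, …), the code becomes very long and cannot be reused.

It would help to have middleware dedicated to this kind of computation. You would describe the algorithm in an SQL-like script, call the script directly from Java, and get back a result set. The Java edition of esProc with its SPL scripts provides exactly this mechanism. The examples below show how to use it.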

SPL Implementation

Regular Grouping

The file duty.xlsx stores each person's overtime records:

Count the number of duty days for each person:

Save the script as CountName.dfx (it is used later when embedding in Java).
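
For comparison with the SPL script, the same per-person count can be sketched in plain Java with streams. This is a minimal sketch and not part of the original example. The Duty record (date, name) is a hypothetical stand-in for a row of duty.xlsx, whose real columns are not shown here. The sketch assumes one row per person per duty day, so counting rows equals counting days, and it uses a Java 16+ record for brevity.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical stand-in for one row of duty.xlsx (the real columns are not shown).
record Duty(LocalDate date, String name) {}

final class CountNameSketch {
    // GROUP BY name + COUNT(*): one entry per person, names kept sorted by TreeMap.
    // Assumes one row per person per duty day, so the row count equals the day count.
    static Map<String, Long> countByName(List<Duty> duties) {
        return duties.stream()
                .collect(Collectors.groupingBy(Duty::name, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Made-up sample rows, for illustration only
        List<Duty> duties = List.of(
                new Duty(LocalDate.of(2018, 3, 1), "Alice"),
                new Duty(LocalDate.of(2018, 3, 2), "Bob"),
                new Duty(LocalDate.of(2018, 3, 3), "Alice"));
        System.out.println(countByName(duties)); // prints {Alice=2, Bob=1}
    }
}
```

Even this simplest case needs a row type and a collector chain. Each new aggregate (sum, TopN, first/last) adds another downstream collector.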

TopN per Group

Get each person's overtime records for the first three days of each month:

Save the script as RecMonTop3.dfx (it is used later when embedding in Java).
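
A plain-Java sketch of the same TopN-per-group logic follows, for comparison. It assumes hypothetical DutyRec rows holding a date and a name. It reads "first three days" as the three earliest dates in each (month, person) group: sort once by date, group by a (month, name) key, then keep the first three records of each group. The record and method names are illustrative, not from the article.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical row type; the real duty.xlsx columns are not shown in the article.
record DutyRec(LocalDate date, String name) {}

// Grouping key: one group per (month, person).
record MonthName(YearMonth month, String name) {}

final class RecMonTop3Sketch {
    // sorted() is stable and groupingBy preserves encounter order,
    // so each group's list comes out in date order.
    static Map<MonthName, List<DutyRec>> top3(List<DutyRec> rows) {
        Map<MonthName, List<DutyRec>> groups = rows.stream()
                .sorted(Comparator.comparing(DutyRec::date))
                .collect(Collectors.groupingBy(
                        r -> new MonthName(YearMonth.from(r.date()), r.name()),
                        LinkedHashMap::new,
                        Collectors.toList()));
        // Keep only the first (earliest) three rows of each group
        groups.replaceAll((key, list) -> List.copyOf(list.subList(0, Math.min(3, list.size()))));
        return groups;
    }

    public static void main(String[] args) {
        // Made-up sample rows, for illustration only
        List<DutyRec> rows = List.of(
                new DutyRec(LocalDate.of(2018, 3, 5), "Alice"),
                new DutyRec(LocalDate.of(2018, 3, 1), "Alice"),
                new DutyRec(LocalDate.of(2018, 3, 3), "Alice"),
                new DutyRec(LocalDate.of(2018, 3, 2), "Alice"),
                new DutyRec(LocalDate.of(2018, 4, 1), "Bob"));
        top3(rows).forEach((key, recs) -> System.out.println(key + " -> " + recs));
    }
}
```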

Calling from Java

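Embedding SPL in a Java application is straightforward: the script is loaded through JDBC by calling it as a stored procedure. A sample call to CountName.dfx, saved in the regular-grouping example, looks like this: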

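import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
...
Connection con = null;
com.esproc.jdbc.InternalCStatement st = null;
// Load the esProc JDBC driver and open a local (embedded) connection
Class.forName("com.esproc.jdbc.InternalDriver");
con = DriverManager.getConnection("jdbc:esproc:local://");
// Call the stored procedure; CountName is the file name of the dfx script
st = (com.esproc.jdbc.InternalCStatement) con.prepareCall("call CountName()");
// Execute the stored procedure
st.execute();
// Get the result set
ResultSet rs = st.getResultSet();
...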

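RecMonTop3.dfx works the same way: just call RecMonTop3(). Both result sets can also be returned at once. This Java fragment only sketches how to embed SPL. For the detailed steps, see "How to Call SPL Scripts from Java", which is also simple, so they are not repeated here. SPL also provides an ODBC driver, and embedding it in languages that support ODBC works in a similar way.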

Extended Excerpts

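Data grouping can be divided into many more kinds, such as alignment grouping, enumeration grouping, and multiple grouping. The official SPL forum, 乾學院, has summaries and examples for them. Two kinds are excerpted here.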

SPL Alignment Grouping

Example 1: List, in order, the number of countries that use Chinese, English, and French as an official language

MySQL8:
with t(name,ord) as (select 'Chinese',1
union all select 'English',2
union all select 'French',3)
select t.name, count(s.countrycode) cnt
from t left join world.countrylanguage s
on t.name=s.language and s.isofficial='T'
group by t.name,t.ord
order by t.ord;

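Note: the character set of the table must match the character set of the database session.

(1) show variables like 'character_set_connection' shows the current session's character set

(2) show create table world.countrylanguage shows the table's character set

(3) set character_set_connection=[charset] changes the current session's character set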

esProc SPL:

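A1: Connect to the database

A2: Query all official-language records

A3: The languages to be listed

A4: Align all records by Language to the corresponding positions in A3

A5: Build a table sequence of each language and the number of countries that use it as an official language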
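
The alignment step in A4 can be illustrated in Java. The sketch below is an assumption about the semantics described above, not esProc's implementation. It creates one group per language in the given order. A group stays empty when no record matches, so positions never shift. Records whose language is not listed are discarded. CountryLang is a hypothetical row type holding only the official-language records that A2 returns.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical row of world.countrylanguage (official-language rows only).
record CountryLang(String countryCode, String language) {}

final class AlignGroupSketch {
    // One group per key, in the order of `keys`. A key with no matching rows
    // still gets an (empty) group; rows whose language is not in `keys` are dropped.
    static List<List<CountryLang>> align(List<CountryLang> rows, List<String> keys) {
        Map<String, Integer> position = new HashMap<>();
        List<List<CountryLang>> groups = new ArrayList<>();
        for (int i = 0; i < keys.size(); i++) {
            position.put(keys.get(i), i);
            groups.add(new ArrayList<>());
        }
        for (CountryLang row : rows) {
            Integer pos = position.get(row.language());
            if (pos != null) {
                groups.get(pos).add(row);
            }
        }
        return groups;
    }

    // Counterpart of A5: language -> number of countries, in key order.
    static Map<String, Integer> countPerKey(List<CountryLang> rows, List<String> keys) {
        List<List<CountryLang>> groups = align(rows, keys);
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i < keys.size(); i++) {
            counts.put(keys.get(i), groups.get(i).size());
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up rows, for illustration only
        List<CountryLang> rows = List.of(
                new CountryLang("AAA", "English"),
                new CountryLang("BBB", "Chinese"),
                new CountryLang("CCC", "German"));
        System.out.println(countPerKey(rows, List.of("Chinese", "English", "French")));
        // prints {Chinese=1, English=1, French=0}
    }
}
```

Keeping an empty group for French is what the SQL needs a separate ordering CTE and a left join to achieve.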

Example 2: List, in order, the number of countries that use Chinese, English, French, and other languages as an official language

MySQL8:
with t(name,ord) as (select 'Chinese',1 union all select 'English',2
union all select 'French',3 union all select 'Other', 4),
s(name, cnt) as (
select language, count(countrycode) cnt
from world.countrylanguage s
where s.isofficial='T' and language in ('Chinese','English','French')
group by language
union all
select 'Other', count(distinct countrycode) cnt
from world.countrylanguage s
where isofficial='T' and language not in ('Chinese','English','French')
)
select t.name, s.cnt
from t left join s using (name)
order by t.ord; 

esProc SPL:

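A4: Align all records by Language to the corresponding positions in A3.to(3), and append one extra group for the records that cannot be aligned

A5: For the 4th group, count the number of distinct CountryCode values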
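
A hypothetical Java sketch of this variant follows. It aligns records to the listed languages and appends one catch-all group for records that match none of them. In that last group it counts distinct country codes, because one country may have several other official languages. This mirrors count(distinct countrycode) in the SQL. The OfficialLang record and the method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical row of world.countrylanguage (official-language rows only).
record OfficialLang(String countryCode, String language) {}

final class AlignWithOtherSketch {
    // keys.size() + 1 groups: one per key, plus a last group for unmatched rows.
    static List<List<OfficialLang>> alignWithOther(List<OfficialLang> rows, List<String> keys) {
        Map<String, Integer> position = new HashMap<>();
        List<List<OfficialLang>> groups = new ArrayList<>();
        for (int i = 0; i < keys.size(); i++) {
            position.put(keys.get(i), i);
            groups.add(new ArrayList<>());
        }
        groups.add(new ArrayList<>()); // extra group for rows that cannot be aligned
        for (OfficialLang row : rows) {
            int pos = position.getOrDefault(row.language(), keys.size());
            groups.get(pos).add(row);
        }
        return groups;
    }

    // Named groups count rows; the last ("Other") group counts distinct country codes.
    static List<Integer> counts(List<OfficialLang> rows, List<String> keys) {
        List<List<OfficialLang>> groups = alignWithOther(rows, keys);
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < keys.size(); i++) {
            result.add(groups.get(i).size());
        }
        Set<String> otherCountries = new HashSet<>();
        for (OfficialLang row : groups.get(keys.size())) {
            otherCountries.add(row.countryCode());
        }
        result.add(otherCountries.size());
        return result;
    }

    public static void main(String[] args) {
        // Made-up rows, for illustration only
        List<OfficialLang> rows = List.of(
                new OfficialLang("AAA", "Chinese"),
                new OfficialLang("BBB", "English"),
                new OfficialLang("CCC", "German"),
                new OfficialLang("CCC", "Italian"),
                new OfficialLang("DDD", "Dutch"));
        System.out.println(counts(rows, List.of("Chinese", "English", "French")));
        // prints [1, 1, 0, 2]
    }
}
```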

SPL Enumeration Grouping

Example 1: List, in order, the number of cities of each size class

MySQL8:
with t as (select * from world.city where CountryCode='CHN'),
segment(class,start,end) as (select 'tiny', 0, 200000
union all select 'small',  200000, 1000000
union all select 'medium', 1000000, 2000000
union all select 'big', 2000000, 100000000
)
select class, count(1) cnt
from segment s join t on t.population>=s.start and t.population<s.end
group by class, start
order by start; 

esProc SPL:

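A3: ${…} performs macro substitution: the result of the expression inside the braces is evaluated as a new expression. The result is the sequence ["?<200000","?<1000000","?<2000000","?<100000000"]

A5: For each record in A2, find the first condition in A3 that holds, and append the record to the corresponding group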
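
The first-match rule in A5 can be sketched in Java as follows. This is an illustration of the described behavior, not esProc code. Each predicate plays the role of one condition in A3, and the thresholds follow the SQL segments. City is a hypothetical row type. A city that satisfies no condition is left out, as the join in the SQL does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical row of world.city (only the fields used here).
record City(String name, long population) {}

final class EnumFirstSketch {
    // Conditions in the same order as the SQL segments: tiny, small, medium, big.
    // They overlap on purpose (a city below 200000 also satisfies "< 1000000");
    // taking only the first match turns them into disjoint size classes.
    static final List<Predicate<City>> SIZE_CLASSES = List.of(
            c -> c.population() < 200_000,
            c -> c.population() < 1_000_000,
            c -> c.population() < 2_000_000,
            c -> c.population() < 100_000_000);

    // Each city joins the group of the first condition it satisfies;
    // a city that satisfies none is left out.
    static List<List<City>> enumFirst(List<City> cities, List<Predicate<City>> conditions) {
        List<List<City>> groups = new ArrayList<>();
        for (int i = 0; i < conditions.size(); i++) {
            groups.add(new ArrayList<>());
        }
        for (City c : cities) {
            for (int i = 0; i < conditions.size(); i++) {
                if (conditions.get(i).test(c)) {
                    groups.get(i).add(c);
                    break; // first match only
                }
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Made-up cities, for illustration only
        List<City> cities = List.of(
                new City("A", 150_000), new City("B", 800_000),
                new City("C", 900_000), new City("D", 5_000_000));
        List<String> names = List.of("tiny", "small", "medium", "big");
        List<List<City>> groups = enumFirst(cities, SIZE_CLASSES);
        for (int i = 0; i < names.size(); i++) {
            System.out.println(names.get(i) + " " + groups.get(i).size());
        }
    }
}
```

Writing the conditions as simple upper bounds, instead of explicit [start, end) ranges, is what the first-match rule makes possible.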

Example 2: List the number of big cities in East China, the number of big cities in other regions, and the number of non-big cities

MySQL8:
with t as (select * from world.city where CountryCode='CHN')
select 'East&Big' class, count(*) cnt
from t
where population>=2000000
and district in ('Shanghai','Jiangsu','Shandong','Zhejiang','Anhui','Jiangxi')
union all
select 'Other&Big', count(*)
from t
where population>=2000000
and district not in ('Shanghai','Jiangsu','Shandong','Zhejiang','Anhui','Jiangxi')
union all
select 'Not Big', count(*)
from t
where population<2000000; 

esProc SPL:

A5: enum@n puts the records that satisfy none of the conditions in A4 into an extra group appended at the end
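
Here is a sketch of the enum@n behavior in Java, under stated assumptions. The first matching condition wins, and anything that matches no condition lands in an extra trailing group. The two conditions in CONDITIONS are hypothetical. They were chosen so that the three groups reproduce the SQL classes East&Big, Other&Big, and Not Big, because the actual A4 conditions are not shown in the article. ChnCity is likewise a made-up row type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical row of world.city (only the fields used here).
record ChnCity(String name, String district, long population) {}

final class EnumRestSketch {
    // East China provinces used in this example
    static final Set<String> EAST =
            Set.of("Shanghai", "Jiangsu", "Shandong", "Zhejiang", "Anhui", "Jiangxi");

    // Hypothetical conditions [East&Big, Big]. Because only the first match counts,
    // the second group ends up holding big cities outside East China (Other&Big).
    static final List<Predicate<ChnCity>> CONDITIONS = List.of(
            c -> c.population() >= 2_000_000 && EAST.contains(c.district()),
            c -> c.population() >= 2_000_000);

    // First matching condition wins; cities that match no condition go into
    // one extra group appended at the end (here: Not Big).
    static List<List<ChnCity>> enumWithRest(List<ChnCity> cities, List<Predicate<ChnCity>> conditions) {
        List<List<ChnCity>> groups = new ArrayList<>();
        for (int i = 0; i <= conditions.size(); i++) { // one extra slot for the rest
            groups.add(new ArrayList<>());
        }
        for (ChnCity c : cities) {
            int target = conditions.size(); // default: the extra last group
            for (int i = 0; i < conditions.size(); i++) {
                if (conditions.get(i).test(c)) {
                    target = i;
                    break;
                }
            }
            groups.get(target).add(c);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Made-up cities, for illustration only
        List<ChnCity> cities = List.of(
                new ChnCity("A", "Shanghai", 9_000_000),
                new ChnCity("B", "Jiangsu", 2_500_000),
                new ChnCity("C", "Sichuan", 3_000_000),
                new ChnCity("D", "Zhejiang", 800_000));
        List<String> classes = List.of("East&Big", "Other&Big", "Not Big");
        List<List<ChnCity>> groups = enumWithRest(cities, CONDITIONS);
        for (int i = 0; i < classes.size(); i++) {
            System.out.println(classes.get(i) + " " + groups.get(i).size());
        }
    }
}
```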

Example 3: List the number of big cities in all regions, the number of big cities in East China, and the number of non-big cities

MySQL8:
with t as (select * from world.city where CountryCode='CHN')
select 'Big' class, count(*) cnt
from t
where population>=2000000
union all
select 'East&Big' class, count(*) cnt
from t
where population>=2000000
and district in ('Shanghai','Jiangsu','Shandong','Zhejiang','Anhui','Jiangxi')
union all
select 'Not Big' class, count(*) cnt
from t
where population<2000000; 

esProc SPL:

A6: When a record in A2 satisfies several conditions in A4, enum@r appends it to every corresponding group
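
Unlike first-match grouping, enum@r tests every condition. The hypothetical Java sketch below adds a city to every group whose condition it satisfies. A big city in East China is therefore counted in both Big and East&Big, just as the two overlapping SQL branches count it twice. The conditions and the CnCity row type are assumptions made for illustration, since the actual A4 cell is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical row of world.city (only the fields used here).
record CnCity(String name, String district, long population) {}

final class EnumRepeatSketch {
    // East China provinces used in this example
    static final Set<String> EAST =
            Set.of("Shanghai", "Jiangsu", "Shandong", "Zhejiang", "Anhui", "Jiangxi");

    // Hypothetical conditions in the order of the SQL classes: Big, East&Big, Not Big.
    static final List<Predicate<CnCity>> CONDITIONS = List.of(
            c -> c.population() >= 2_000_000,
            c -> c.population() >= 2_000_000 && EAST.contains(c.district()),
            c -> c.population() < 2_000_000);

    // Every condition is tested and there is no early break,
    // so one city can be added to several groups.
    static List<List<CnCity>> enumRepeat(List<CnCity> cities, List<Predicate<CnCity>> conditions) {
        List<List<CnCity>> groups = new ArrayList<>();
        for (int i = 0; i < conditions.size(); i++) {
            groups.add(new ArrayList<>());
        }
        for (CnCity c : cities) {
            for (int i = 0; i < conditions.size(); i++) {
                if (conditions.get(i).test(c)) {
                    groups.get(i).add(c);
                }
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Made-up cities, for illustration only
        List<CnCity> cities = List.of(
                new CnCity("A", "Shanghai", 9_000_000),
                new CnCity("B", "Sichuan", 3_000_000),
                new CnCity("C", "Jiangsu", 700_000));
        List<String> classes = List.of("Big", "East&Big", "Not Big");
        List<List<CnCity>> groups = enumRepeat(cities, CONDITIONS);
        for (int i = 0; i < classes.size(); i++) {
            System.out.println(classes.get(i) + " " + groups.get(i).size());
        }
    }
}
```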

Summary of Advantages

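SQL when there is a database, SPL when there is not
Aggregating data directly in Java is laborious: the code is long and cannot be reused, and in many cases the data is not in a database at all. With SPL, you can compute in Java as conveniently as you would with SQL.
Worry-free everyday use: a free Starter Edition with a perpetual license
If the data to be analyzed is one-off or temporary, Raqsoft esProc offers a free trial license every month, which can be renewed again and again at no cost. However, if esProc is integrated with a Java application and deployed on a server for long-term use, renewing the trial license regularly is a hassle. Raqsoft provides a Starter Edition with a perpetual license that removes this concern. For how to obtain it, see "How to Use Raqsoft esProc for Free?"
Technical documentation and community support
The official esProc documentation contains many ready-made examples, and solutions to common problems can be found there. With the Starter Edition, you can use SPL's regular features and also ask about any problem on 乾學院, where the vendor provides free technical support to Starter Edition users.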