I recently had a requirement: package data files into an archive and upload it to the server, then, as part of the upload, parse the data files and clean the data into the database.
Analysis:
- There are several different types of data files. Uploading them one at a time would be painful for users, so multiple files are packaged and uploaded as a single archive. Each upload is recorded, and clicking a record shows the detail records for the files inside that archive.
- For cleansing and loading the data, the project uses Kettle, an open-source ETL tool. The name reflects its philosophy: pour all kinds of data into one kettle and let it flow out in a specified format, which fits this requirement well.
- Kettle is strict about the format of the data files it cleans, so after upload and before cleansing, the data files inside the archive must be validated, with detailed error messages for files that do not conform.
The project uses Spring Boot + Thymeleaf + MyBatis, with the MyBatis plugin PageHelper for pagination and a simple Page wrapper class.
**Project GitHub repository: kettle-springboot**
Implementation:
The complete project code is on GitHub for anyone with a similar requirement; this article covers only the key steps.
1. Integrating Kettle with Spring Boot
- POM
<!-- pentaho-kettle -->
<dependency>
    <groupId>pentaho-kettle</groupId>
    <artifactId>kettle-core</artifactId>
    <version>${kettle-version}</version>
</dependency>
<dependency>
    <groupId>pentaho-kettle</groupId>
    <artifactId>kettle-engine</artifactId>
    <version>${kettle-version}</version>
</dependency>
<dependency>
    <groupId>pentaho-kettle</groupId>
    <artifactId>kettle-dbdialog</artifactId>
    <version>${kettle-version}</version>
</dependency>
<!-- janino is required for Kettle to execute complex scripts -->
<dependency>
    <groupId>org.codehaus.janino</groupId>
    <artifactId>janino</artifactId>
    <version>${janino-version}</version>
</dependency>
Note the last dependency (janino): it is not needed when running simple jobs and transformations, but complex jobs fail with an error without it.
- application.yml
The Kettle configuration (repository, template directory, log level, and so on) lives in Spring Boot's default configuration file. You could also split it into a separate file, or hard-code it (not recommended).
# Kettle configuration
kettle:
  filerepository:
    path: D:/ch/Kettle-repo/test
    id: kettleRepo
    name: kettleRepo
    description: Enshi Kettle file repository
  templates: # path to the data template files
    path: D:/ch/Kettle-repo/templates
  log:
    level: basic # one of: nothing, error, minimal, basic, detailed, debug, rowlevel
    path: D:/hx/log/kettle_log
Note: kettle.filerepository.path is the path of the local Kettle file repository. The Java code below reads the repository from this path and fetches the jobs and transformations it contains (a database repository would also work if you prefer).
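The kettle.log.level string from the configuration above has to be turned into a Kettle log-level constant somewhere (the code below calls a getLogerLevel helper for this). A minimal, framework-free sketch of such a mapping, using an illustrative local enum rather than Kettle's real org.pentaho.di.core.logging.LogLevel:

```java
// Sketch only: maps the kettle.log.level config string to a log-level constant.
// The Level enum is illustrative; the real project maps to Kettle's LogLevel.
public class KettleLogLevelMapper {
    public enum Level { NOTHING, ERROR, MINIMAL, BASIC, DETAILED, DEBUG, ROWLEVEL }

    public static Level fromConfig(String value) {
        if (value == null) {
            return Level.BASIC; // default used in the yml above
        }
        try {
            return Level.valueOf(value.trim().toUpperCase());
        } catch (IllegalArgumentException e) {
            return Level.BASIC; // fall back to a sane default on unknown input
        }
    }
}
```

Falling back to a default rather than throwing keeps a typo in application.yml from preventing startup.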
- Java code. The calls into Kettle that run jobs and transformations are wrapped in a helper class. Only the key parts are listed here for reference; the full code is in this project's GitHub repository.
/**
 * Set up the Kettle file repository environment.
 */
public KettleFileRepository fileRepositoryCon() throws KettleException {
    String msg;
    // Initialization (done once at application startup):
    /*EnvUtil.environmentInit();
    KettleEnvironment.init();*/
    // Repository metadata
    KettleFileRepositoryMeta fileRepositoryMeta = new KettleFileRepositoryMeta(this.KETTLE_REPO_ID, this.KETTLE_REPO_NAME, this.KETTLE_REPO_DESC, this.KETTLE_REPO_PATH);
    // File-based repository
    KettleFileRepository repo = new KettleFileRepository();
    repo.init(fileRepositoryMeta);
    // Connect with the repository's default username and password
    repo.connect("", "");
    if (repo.isConnected()) {
        msg = "Connected to Kettle file repository [" + KETTLE_REPO_PATH + "]";
        logger.info(msg);
        return repo;
    } else {
        msg = "Failed to connect to Kettle file repository [" + KETTLE_REPO_PATH + "]";
        logger.error(msg);
        throw new KettleDcException(msg);
    }
}
public void callTrans(String transPath, String transName, Map<String, String> namedParams, String[] clParams) throws Exception {
    String msg;
    KettleFileRepository repo = this.fileRepositoryCon();
    TransMeta transMeta = this.loadTrans(repo, transPath, transName);
    // Transformation
    Trans trans = new Trans(transMeta);
    // Set named parameters
    if (null != namedParams) {
        for (Map.Entry<String, String> entry : namedParams.entrySet()) {
            trans.setParameterValue(entry.getKey(), entry.getValue());
        }
    }
    trans.setLogLevel(this.getLogerLevel(KETTLE_LOG_LEVEL));
    // Execute
    trans.execute(clParams);
    trans.waitUntilFinished();
    // Collect the log
    String logChannelId = trans.getLogChannelId();
    LoggingBuffer appender = KettleLogStore.getAppender();
    String logText = appender.getBuffer(logChannelId, true).toString();
    logger.info(logText);
    // Throw on errors
    if (trans.getErrors() > 0) {
        msg = "There are errors during transformation execution!";
        logger.error(msg);
        throw new KettleDcException(msg);
    }
}
public boolean callJob(String jobPath, String jobName, Map<String, String> variables, String[] clParams) throws Exception {
    String msg;
    KettleFileRepository repo = this.fileRepositoryCon();
    JobMeta jobMeta = this.loadJob(repo, jobPath, jobName);
    Job job = new Job(repo, jobMeta);
    // Pass variables to the job script; the script reads them as ${variableName}
    if (null != variables) {
        for (Map.Entry<String, String> entry : variables.entrySet()) {
            job.setVariable(entry.getKey(), entry.getValue());
        }
    }
    // Set the log level
    job.setLogLevel(this.getLogerLevel(KETTLE_LOG_LEVEL));
    job.setArguments(clParams);
    job.start();
    job.waitUntilFinished();
    // Collect the log
    String logChannelId = job.getLogChannelId();
    LoggingBuffer appender = KettleLogStore.getAppender();
    String logText = appender.getBuffer(logChannelId, true).toString();
    logger.info(logText);
    if (job.getErrors() > 0) {
        msg = "There are errors during job execution!";
        logger.error(msg);
        throw new KettleDcException(msg);
    }
    return true;
}
/**
 * Load a transformation.
 */
private TransMeta loadTrans(KettleFileRepository repo, String transPath, String transName) throws Exception {
    String msg;
    // Find the repository directory by its string path
    RepositoryDirectoryInterface dir = repo.findDirectory(transPath);
    if (null == dir) {
        msg = "Transformation path does not exist in the Kettle repository [" + repo.getRepositoryMeta().getBaseDirectory() + transPath + "]!";
        throw new KettleDcException(msg);
    }
    TransMeta transMeta = repo.loadTransformation(repo.getTransformationID(transName, dir), null);
    if (null == transMeta) {
        msg = "Transformation [" + transName + "] does not exist in Kettle repository directory [" + dir.getPath() + "]!";
        throw new KettleDcException(msg);
    }
    return transMeta;
}
/**
 * Load a job.
 */
private JobMeta loadJob(KettleFileRepository repo, String jobPath, String jobName) throws Exception {
    String msg;
    // Find the repository directory by its string path
    RepositoryDirectoryInterface dir = repo.findDirectory(jobPath);
    if (null == dir) {
        msg = "Job path does not exist in the Kettle repository [" + repo.getRepositoryMeta().getBaseDirectory() + jobPath + "]!";
        throw new KettleDcException(msg);
    }
    JobMeta jobMeta = repo.loadJob(repo.getJobId(jobName, dir), null);
    if (null == jobMeta) {
        msg = "Job [" + jobName + "] does not exist in Kettle repository directory [" + dir.getPath() + "]!";
        throw new KettleDcException(msg);
    }
    return jobMeta;
}
Invocation:
Map<String, String> variables = new HashMap<>();
// Pass in the path the archive was extracted to
variables.put("param", FileUtil.getBasePath(t_relative_path));
boolean re = kettleManager.callJob(t_job_path, t_job_name, variables, null);
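The job script then reads the variable as ${param}. As an illustration of how this placeholder style works (Kettle's own resolution lives in its variables machinery; this standalone sketch only mirrors the idea):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: Kettle resolves ${name} placeholders from the variables
// set via job.setVariable(...). This sketch shows the substitution idea;
// it is not Kettle's actual implementation.
public class VariableSubstitution {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    public static String resolve(String template, Map<String, String> vars) {
        Matcher m = VAR.matcher(template);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Unknown variables are left as-is rather than replaced with ""
            String value = vars.getOrDefault(m.group(1), m.group(0));
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

With variables = {param: "/data/unzipped"}, the template "${param}/a.csv" resolves to "/data/unzipped/a.csv".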
2. Integrating MyBatis with Spring Boot
- POM
<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>druid</artifactId>
    <version>${druid-version}</version>
</dependency>
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>${mysql-connector-java-version}</version>
    <!--<scope>runtime</scope>-->
</dependency>
<dependency>
    <groupId>org.mybatis.spring.boot</groupId>
    <artifactId>mybatis-spring-boot-starter</artifactId>
    <version>${mybatis-spring-boot-starter-version}</version>
</dependency>
The database connection pool is Druid; its configuration lives in druid.properties.
- MyBatis configuration
The MyBatis configuration file is resources/mybatis/mybatis-config.xml, and the mapper.xml files live under resources/mybatis/mapper/. To print SQL statements to the console for debugging, add the following to mybatis-config.xml:
<setting name="logImpl" value="STDOUT_LOGGING"/>
The Spring Boot side of the MyBatis configuration is done with a @Configuration class, as follows:
@Configuration
@PropertySource(value = "classpath:druid.properties")
public class SpringConfig {
    @Bean(name = "dataSource")
    @ConfigurationProperties(prefix = "spring.datasource")
    public DataSource druidDataSource() {
        return new DruidDataSource();
    }

    /*================== MyBatis configuration ====================*/
    @Bean(name = "sqlSessionFactory")
    @Primary
    public SqlSessionFactory sqlSessionFactory(@Qualifier("dataSource") DataSource dataSource) throws Exception {
        // Required: without this line, MyBatis type aliases cannot be resolved when running from a packaged jar
        VFS.addImplClass(SpringBootVFS.class);
        SqlSessionFactoryBean bean = new SqlSessionFactoryBean();
        bean.setDataSource(dataSource);
        // Set the main MyBatis configuration file
        ResourcePatternResolver resolver = new PathMatchingResourcePatternResolver();
        Resource mybatisConfigXml = resolver.getResource("classpath:mybatis/mybatis-config.xml");
        bean.setConfigLocation(mybatisConfigXml);
        // Set the mapper.xml scan path (essential; otherwise the mapper.xml files are not found)
        Resource[] mapperResources = resolver.getResources("classpath:mybatis/mapper/*.xml");
        bean.setMapperLocations(mapperResources);
        // Set the type-alias package so parameterType/resultType in mapper.xml need not be fully qualified
        bean.setTypeAliasesPackage("com.ch.dataclean.model");
        return bean.getObject();
    }

    @Bean(name = "sqlSessionTemplate")
    @Primary
    public SqlSessionTemplate sqlSessionTemplate(@Qualifier("sqlSessionFactory") SqlSessionFactory sqlSessionFactory) throws Exception {
        return new SqlSessionTemplate(sqlSessionFactory);
    }

    // Initialize the Kettle environment
    @Bean(name = "KettleEnvironmentInit")
    public StartInit startInit() {
        return new StartInit();
    }
}
Important: when configuring the SqlSessionFactory bean, be sure to include [VFS.addImplClass(SpringBootVFS.class);]. Without it, everything runs fine in the IDE, but when packaged and run as a jar, MyBatis cannot resolve type aliases, even with the @Alias("") annotation.
- DAO
public interface DAO {
    /**
     * Save an object.
     */
    public Object save(String str, Object obj) throws Exception;

    /**
     * Update an object.
     */
    public Object update(String str, Object obj) throws Exception;

    /**
     * Delete an object.
     */
    public Object delete(String str, Object obj) throws Exception;

    /**
     * Find a single object.
     */
    public Object findForObject(String str, Object obj) throws Exception;

    /**
     * Find a list of objects.
     */
    public Object findForList(String str, Object obj) throws Exception;

    /**
     * Find objects and wrap them into a Map.
     */
    public Object findForMap(String sql, Object obj, String key, String value) throws Exception;
}
@Repository
public class DaoSupport implements DAO {
    @Resource(name = "sqlSessionTemplate")
    private SqlSessionTemplate sqlSessionTemplate;

    /**
     * Save an object.
     */
    public Object save(String str, Object obj) throws Exception {
        return sqlSessionTemplate.insert(str, obj);
    }

    /**
     * Batch save.
     */
    public Object batchSave(String str, List objs) throws Exception {
        return sqlSessionTemplate.insert(str, objs);
    }

    /**
     * Update an object.
     */
    public Object update(String str, Object obj) throws Exception {
        return sqlSessionTemplate.update(str, obj);
    }

    /**
     * Batch update.
     */
    public void batchUpdate(String str, List objs) throws Exception {
        SqlSessionFactory sqlSessionFactory = sqlSessionTemplate.getSqlSessionFactory();
        // Batch executor
        SqlSession sqlSession = sqlSessionFactory.openSession(ExecutorType.BATCH, false);
        try {
            if (objs != null) {
                for (int i = 0, size = objs.size(); i < size; i++) {
                    sqlSession.update(str, objs.get(i));
                }
                sqlSession.flushStatements();
                sqlSession.commit();
                sqlSession.clearCache();
            }
        } finally {
            sqlSession.close();
        }
    }

    /**
     * Batch delete.
     */
    public Object batchDelete(String str, List objs) throws Exception {
        return sqlSessionTemplate.delete(str, objs);
    }

    /**
     * Delete an object.
     */
    public Object delete(String str, Object obj) throws Exception {
        return sqlSessionTemplate.delete(str, obj);
    }

    /**
     * Find a single object.
     */
    public Object findForObject(String str, Object obj) throws Exception {
        return sqlSessionTemplate.selectOne(str, obj);
    }

    /**
     * Find a list of objects.
     */
    public Object findForList(String str, Object obj) throws Exception {
        return sqlSessionTemplate.selectList(str, obj);
    }

    public Object findForMap(String str, Object obj, String key, String value) throws Exception {
        return sqlSessionTemplate.selectMap(str, obj, key);
    }

    public SqlSessionTemplate getSqlSessionTemplate() {
        return sqlSessionTemplate;
    }

    public void setSqlSessionTemplate(SqlSessionTemplate sqlSessionTemplate) {
        this.sqlSessionTemplate = sqlSessionTemplate;
    }
}
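The batchUpdate method above flushes the whole batch once at the end. For very large lists it is common to flush in fixed-size chunks instead, so statements do not pile up in the batch executor. A framework-free sketch of that chunking logic, with a Consumer standing in for the flushStatements()/commit() calls (the names here are illustrative, not part of the project):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch: run items through a batch in chunks of `batchSize`, invoking
// `flush` for each full chunk and once more for any remainder.
public class ChunkedBatch {
    public static <T> int runInChunks(List<T> items, int batchSize, Consumer<List<T>> flush) {
        int flushes = 0;
        List<T> buffer = new ArrayList<>();
        for (T item : items) {
            buffer.add(item);
            if (buffer.size() == batchSize) {
                flush.accept(buffer); // stands in for sqlSession.flushStatements()/commit()
                buffer = new ArrayList<>();
                flushes++;
            }
        }
        if (!buffer.isEmpty()) { // flush the remainder
            flush.accept(buffer);
            flushes++;
        }
        return flushes;
    }
}
```

With 10 items and a chunk size of 4, this flushes three times (4 + 4 + 2).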
3. The PageHelper plugin
- POM
<dependency>
    <groupId>com.github.pagehelper</groupId>
    <artifactId>pagehelper</artifactId>
    <version>5.1.8</version>
</dependency>
This article uses PageHelper 5.1.8; any version from 5.0 onward will do. Versions before 5.0 lack the following method, so they cannot set the sort order this way:
public static <E> Page<E> startPage(int pageNum, int pageSize, String orderBy)
- Configuration
Add the following under the configuration element of the MyBatis configuration file mybatis-config.xml:
<plugins>
    <!-- com.github.pagehelper is the package containing the PageHelper classes -->
    <plugin interceptor="com.github.pagehelper.PageInterceptor">
        <!-- Optional since 4.0.0 -->
        <!-- Set the database dialect: Oracle, MySQL, MariaDB, SQLite, Hsqldb, or PostgreSQL -->
        <!--<property name="helperDialect" value="mysql"/>-->
        <!-- Available since 3.3.0: make paging parameters "reasonable"; disabled (false) by default -->
        <!-- When enabled, pageNum < 1 queries the first page and pageNum > pages queries the last page -->
        <!-- When disabled, pageNum < 1 or pageNum > pages returns empty data -->
        <property name="reasonable" value="true"/>
    </plugin>
</plugins>
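To make the reasonable setting concrete, the clamping behavior it describes can be sketched in plain Java (an illustration of the documented behavior, not PageHelper's actual code):

```java
// Sketch of PageHelper's "reasonable" behavior: out-of-range page numbers
// are clamped into [1, totalPages] instead of producing an empty result.
public class ReasonablePaging {
    public static int clampPageNum(int pageNum, int totalPages) {
        if (totalPages < 1) return 1;                // no data: stay on page 1
        if (pageNum < 1) return 1;                   // below range -> first page
        if (pageNum > totalPages) return totalPages; // above range -> last page
        return pageNum;
    }
}
```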
- Page.java
//此處省略setter/getter
/**
* Description: pagehelper分頁實體類
* Created by Aaron on 2018/11/21
*/
public class Page<T> {
private int pageNum = 1;
private int pageSize = 10;
private int startRow;
private int endRow;
private long total;
private int pages;
//排序
private String orderBy;
private List<T> rows;
/**
* 分頁查詢
*/
public Page<T> queryForPage(SqlSessionTemplate sqlSessionTemplate, String sqlMappingStr, Map param, Page page){
if(null != this.orderBy && "" != this.orderBy.trim()){
PageHelper.startPage(page.getPageNum(),page.getPageSize(),this.orderBy);
}else {
PageHelper.startPage(page.getPageNum(),page.getPageSize());
}
List<T> list = sqlSessionTemplate.selectList(sqlMappingStr, param);
PageInfo pageInfo = new PageInfo(list);
page.setPageNum(pageInfo.getPageNum());
page.setPageSize(pageInfo.getPageSize());
page.setRows(list);
page.setTotal(pageInfo.getTotal());
page.setPages(pageInfo.getPages());
page.setStartRow(pageInfo.getEndRow());
page.setEndRow(pageInfo.getEndRow());
return page;
}
}
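For intuition about how the fields of Page relate, the arithmetic PageInfo performs can be sketched as follows (1-based row numbering; an illustration of the relationships, not PageHelper's exact implementation):

```java
// Sketch: deriving pages, startRow, and endRow from total, pageNum, and pageSize.
public class PageMath {
    public static int pages(long total, int pageSize) {
        return (int) ((total + pageSize - 1) / pageSize); // ceiling division
    }

    public static int startRow(int pageNum, int pageSize) {
        return (pageNum - 1) * pageSize + 1; // first row on this page (1-based)
    }

    public static long endRow(int pageNum, int pageSize, long total) {
        return Math.min((long) pageNum * pageSize, total); // last page may be partial
    }
}
```

For example, with total = 95 and pageSize = 10 there are 10 pages, and page 10 spans rows 91 to 95.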
Conclusion: this project can also serve as a Spring Boot starter project for anyone learning the framework; it contains some ideas I worked out while integrating Spring Boot with other frameworks. I used to write pagination by hand, which had the drawback of needing a separate SQL query for the total count. My impression of PageHelper so far is that it is genuinely convenient; how it performs I don't yet know, but that's a question for later: ship first, iterate after. That's enough for now; staying up late is bad for you, and my head is no longer clear. The project source is at kettle-springboot. It's fairly simple, and I hope it's useful to you. Good night, everyone.