How Kettle writes 1 million rows to Redis in under 3 seconds
Results
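First, the result. As the screenshot below shows, writing 1 million rows locally took 2.3 s, roughly 440,000 rows per second. Here is how it was done: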
Sample of the stored data structure:
Generate Rows
Generates the test data.
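For readers who have not used the Generate Rows step, its behavior can be approximated in plain Java: it emits a fixed number of rows that all carry the same constant fields. This is an illustration only; GenerateRowsSketch and the fields name and amount are hypothetical, since the article does not list its actual test fields.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GenerateRowsSketch {

    // Returns `limit` rows, each a separate copy of the same constant fields.
    static List<Map<String, Object>> generateRows(int limit, Map<String, Object> constants) {
        List<Map<String, Object>> rows = new ArrayList<>(limit);
        for (int i = 0; i < limit; i++) {
            // A fresh map per row, so a later step can add fields to one row
            // (for example a sequence key) without affecting the others.
            rows.add(new LinkedHashMap<>(constants));
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, Object> constants = new LinkedHashMap<>();
        constants.put("name", "test"); // hypothetical constant field
        constants.put("amount", 100);  // hypothetical constant field
        // Small limit for demonstration; the article generates 1,000,000 rows.
        List<Map<String, Object>> rows = generateRows(5, constants);
        System.out.println(rows.size() + " rows, first = " + rows.get(0));
    }
}
```

Inside Kettle the rows are streamed to the next step rather than collected in a list, so the real step does not need to hold all 1 million rows in memory.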
Add sequence
Generates the Redis key values.
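The Add sequence step attaches an increasing counter to each row, and the Java step later reads it as the Redis key through get(Fields.In, "id"). A minimal sketch of that counter, assuming a start value of 1 and an increment of 1 (the step's actual settings are not stated here); AddSequenceSketch is a hypothetical name:

```java
public class AddSequenceSketch {
    private long next;
    private final long increment;

    AddSequenceSketch(long start, long increment) {
        this.next = start;
        this.increment = increment;
    }

    // Returns the current value as a String key, then advances the counter.
    String nextKey() {
        String key = Long.toString(next);
        next += increment;
        return key;
    }

    public static void main(String[] args) {
        AddSequenceSketch seq = new AddSequenceSketch(1, 1); // assumed start 1, step 1
        for (int i = 0; i < 3; i++) {
            System.out.println("id = " + seq.nextKey());
        }
    }
}
```

Uniqueness matters here: Redis SET overwrites an existing key, so a repeated sequence value would silently replace an earlier row instead of adding a new one.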
JSON output
Wraps each source row into a JSON document, which is what gets stored in Redis.
JSON output, Fields tab: lists the fields to include in the JSON.
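The JSON output step turns the selected fields into one JSON string per row, which the Java step then reads from the JsonData field (presumably the step's output value field) and stores as the Redis value. The sketch below shows the idea with hand-written serialization and string escaping. JsonRowSketch and its sample fields are hypothetical, and the real step's layout (for example, whether rows are wrapped in a named block) is configured in its own dialog and may differ from this flat object.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class JsonRowSketch {

    // Escapes a string for use inside a JSON string literal.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 2);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Other control characters become \\uXXXX escapes.
                        sb.append('\\').append('u').append(String.format("%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Serializes one row as a flat JSON object, keeping field order.
    // Numbers and booleans are written unquoted (NaN/Infinity are not handled).
    static String toJson(Map<String, Object> row) {
        StringBuilder sb = new StringBuilder("{");
        boolean firstField = true;
        for (Map.Entry<String, Object> e : row.entrySet()) {
            if (!firstField) {
                sb.append(',');
            }
            firstField = false;
            sb.append('"').append(escape(e.getKey())).append("\":");
            Object v = e.getValue();
            if (v == null) {
                sb.append("null");
            } else if (v instanceof Number || v instanceof Boolean) {
                sb.append(v);
            } else {
                sb.append('"').append(escape(v.toString())).append('"');
            }
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("name", "test"); // hypothetical field
        row.put("amount", 100);  // hypothetical field
        System.out.println(toJson(row));
    }
}
```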
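Writing to Redis with Java
This step (a User Defined Java Class) relies mainly on Jedis's Pipeline class to submit commands in batches.
The full code: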
// etl-java-redis: "User Defined Java Class" step that writes each row to Redis
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;
import redis.clients.jedis.Pipeline;

private Jedis jedis = null;
private JedisPool pool = null;
private Pipeline pipe = null;
private int cache_size = 10000; // batch size: commands queued before each sync
private int cur_size = 0;       // commands queued since the last sync

public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException {
    if (first) {
        first = false;
        // Connect to the Redis server; settings come from named parameters
        String redis_ip = getVariable("redis.ip", "127.0.0.1");
        String redis_port = getVariable("redis.port", "6379");
        String redis_password = getVariable("redis.password", "");
        cache_size = Integer.parseInt(getVariable("redis.cache_size", "10000"));
        logBasic("Redis server: " + redis_ip + ":" + redis_port); // never log the password
        // Treat an empty password as "no password"; otherwise Jedis may send AUTH with an empty string
        String password = (redis_password == null || redis_password.isEmpty()) ? null : redis_password;
        // Connection pool
        JedisPoolConfig config = new JedisPoolConfig();
        config.setMaxIdle(8);
        config.setMaxTotal(18);
        pool = new JedisPool(config, redis_ip, Integer.parseInt(redis_port), 2000, password);
        jedis = pool.getResource();
        jedis.select(1); // switch to database 1
        logBasic("Server is running: " + jedis.ping()); // check the connection before pipelining
        pipe = jedis.pipelined(); // create the Pipeline object
    }
    Object[] r = getRow();
    if (r == null) {
        // No more input: send the last partial batch, then release resources
        setOutputDone();
        pipe.sync();
        jedis.close(); // returns the connection to the pool
        pool.close();
        return false;
    }
    // It is always safest to call createOutputRow() to ensure that your output row's Object[] is large
    // enough to handle any new fields you are creating in this step.
    r = createOutputRow(r, data.outputRowMeta.size());
    /*
    Redis data layout (Redis String)
    key   : value of the "id" field
    value : JsonData
    */
    String key = get(Fields.In, "id").getString(r);
    String value = get(Fields.In, "JsonData").getString(r);
    logDebug(key + "\t" + value);
    // Queue the write in the pipeline
    pipe.set(key, value);
    cur_size++;
    if (cur_size >= cache_size) { // batch is full: send it and read the replies
        pipe.sync();
        cur_size = 0; // reset the counter
    }
    // Send the row on to the next step.
    putRow(data.outputRowMeta, r);
    return true;
}
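To see why the batch size matters, the flush-every-N logic of processRow() can be isolated from Kettle and Redis. This is a hypothetical sketch: BatchWriterSketch and its sender callback stand in for Jedis's Pipeline and pipe.sync(), and no network is involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchWriterSketch {
    private final int batchSize;
    private final Consumer<List<String[]>> sender; // plays the role of pipe.sync()
    private final List<String[]> pending = new ArrayList<>();
    private int flushCount = 0;

    BatchWriterSketch(int batchSize, Consumer<List<String[]>> sender) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sender = sender;
    }

    // Queue one SET; send the batch once it is full (like cur_size >= cache_size).
    void set(String key, String value) {
        pending.add(new String[] {key, value});
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Send whatever is queued; does nothing when the queue is empty.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sender.accept(new ArrayList<>(pending));
        pending.clear();
        flushCount++;
    }

    int flushCount() {
        return flushCount;
    }

    public static void main(String[] args) {
        BatchWriterSketch writer = new BatchWriterSketch(10_000, batch -> { });
        for (int i = 1; i <= 1_000_000; i++) {
            writer.set(Integer.toString(i), "{}");
        }
        writer.flush(); // final flush, like pipe.sync() when getRow() returns null
        System.out.println(writer.flushCount() + " syncs for 1000000 rows");
    }
}
```

With cache_size = 10000, the client stops to wait for server replies 100 times for 1,000,000 rows instead of once per row, which is presumably where most of the speedup comes from. A larger batch means fewer waits, but more pending responses are held between syncs.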
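Named parameters
Values that change between environments, such as the redis.ip, redis.port, redis.password and redis.cache_size variables read in the code, are stored as named parameters so the transformation is easy to migrate.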
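Kettle resolves names such as redis.ip through its own variable space, and getVariable(name, default) in the code returns the given default when the parameter is not set. The sketch below only mimics that lookup-with-default behavior, using java.util.Properties as a stand-in; NamedParamsSketch and the address 10.0.0.5 are hypothetical.

```java
import java.util.Properties;

public class NamedParamsSketch {

    // Defaults mirror the ones passed to getVariable() in the Java step.
    static Properties defaults() {
        Properties p = new Properties();
        p.setProperty("redis.ip", "127.0.0.1");
        p.setProperty("redis.port", "6379");
        p.setProperty("redis.password", "");
        p.setProperty("redis.cache_size", "10000");
        return p;
    }

    // Parses the batch size the same way the step does.
    static int cacheSize(Properties p) {
        return Integer.parseInt(p.getProperty("redis.cache_size"));
    }

    public static void main(String[] args) {
        // Lookups fall back to the defaults for any key not set here.
        Properties params = new Properties(defaults());
        params.setProperty("redis.ip", "10.0.0.5"); // hypothetical target environment
        System.out.println(params.getProperty("redis.ip") + ":" + params.getProperty("redis.port")
                + ", batch " + cacheSize(params));
    }
}
```

Moving the transformation to another environment then only means supplying different parameter values; the step's code stays unchanged.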
– End –