Using MySQL as the example database, this post walks through several efficient ways to batch-insert data and compares them by simulating 100,000 rows.
1. Environment configuration
application.yml:
server:
  port: 8086
spring:
  application:
    name: batch
  jpa:
    database: mysql
    show-sql: true
    properties:
      hibernate:
        dialect: org.hibernate.dialect.MySQL5InnoDBDialect
        generate_statistics: true
        jdbc:
          batch_size: 500
          batch_versioned_data: true
        order_inserts: true
        order_updates: true
  datasource:
    url: jdbc:mysql://localhost:3306/hr?rewriteBatchedStatements=true&serverTimezone=UTC&useUnicode=true&characterEncoding=utf-8&useSSL=true&allowMultiQueries=true
    username: root
    password: ****
    driver-class-name: com.mysql.cj.jdbc.Driver
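The `rewriteBatchedStatements=true` URL parameter is the key setting for MySQL: it lets Connector/J collapse a JDBC batch of single-row INSERTs into one multi-row INSERT, cutting round trips drastically. A minimal sketch of that rewriting (illustration only — `RewriteSketch` is a made-up name, and the real rewriting happens inside the driver):

```java
class RewriteSketch {
    /** Mimics how the driver rewrites a batched single-row INSERT
     *  into one multi-row statement when rewriteBatchedStatements=true. */
    static String rewrite(String valuesClause, int rows) {
        StringBuilder sb = new StringBuilder("INSERT INTO t_user (id, name, age) VALUES ");
        for (int i = 0; i < rows; i++) {
            if (i > 0) sb.append(',');
            sb.append(valuesClause);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Three batched parameter sets become a single statement:
        System.out.println(rewrite("(?,?,?)", 3));
        // INSERT INTO t_user (id, name, age) VALUES (?,?,?),(?,?,?),(?,?,?)
    }
}
```

Without this parameter, `executeBatch()` still sends one network round trip per row, and none of the approaches below gets its full speedup.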
Test data:
private final List<User> userList = new ArrayList<>();

public void inits() {
    // simulate 100,000 rows
    for (int i = 0; i < 100000; i++) {
        User user = new User();
        user.setAge(i);
        user.setId(String.valueOf(i));
        user.setName("name" + i);
        userList.add(user);
    }
}
2. Comparing several batch-insert approaches
(1) JPA's saveAll method
JPA's saveAll is far too slow here: even with the configuration above, 100,000 rows take about 23 s. A likely culprit is that with manually assigned ids, Spring Data's save() considers every entity already persistent and calls merge(), issuing an extra SELECT per row.
(2) EntityManager's persist method
@PersistenceContext
private EntityManager em;
private static final int BATCH_SIZE = 10000;
/**
 * Batch insert; relies on the batch configuration above.
 * 100,000 rows took 2549 ms.
 * @param list
 */
@Transactional(rollbackFor = Exception.class)
public void batchInsertWithEntityManager(List<T> list){
    Iterator<T> iterator = list.iterator();
    int index = 0;
    while (iterator.hasNext()){
        em.persist(iterator.next());
        index++;
        if (index % BATCH_SIZE == 0){
            em.flush();
            em.clear();
        }
    }
    // flush the final, partial chunk
    if (index % BATCH_SIZE != 0){
        em.flush();
        em.clear();
    }
}
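The flush/clear cadence in this loop is what keeps the persistence context from growing without bound: after every BATCH_SIZE persists, pending inserts are pushed to the driver and the first-level cache is emptied, with one trailing flush for any remainder. A standalone sketch of that cadence, using a counter in place of a real EntityManager (`FlushCadence` is a made-up name for illustration):

```java
class FlushCadence {
    /** Returns how many flush() calls the loop in batchInsertWithEntityManager
     *  performs for a list of the given size, including the trailing flush
     *  for a partial final chunk. */
    static int flushCount(int listSize, int batchSize) {
        int flushes = 0;
        int index = 0;
        for (int i = 0; i < listSize; i++) {       // em.persist(...)
            index++;
            if (index % batchSize == 0) flushes++; // em.flush(); em.clear();
        }
        if (index % batchSize != 0) flushes++;     // trailing partial chunk
        return flushes;
    }

    public static void main(String[] args) {
        System.out.println(flushCount(100000, 10000)); // 10 full chunks
        System.out.println(flushCount(100001, 10000)); // 10 full + 1 partial = 11
    }
}
```

Skipping the clear() would keep all 100,000 managed entities in memory for the whole transaction; skipping the trailing flush would silently drop the last partial chunk.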
Result:
@Test
public void testBatchInsert(){
long saveStart = System.currentTimeMillis();
batchDao.batchInsertWithEntityManager(userList);
long saveEnd = System.currentTimeMillis();
System.out.println("the save total time is "+(saveEnd-saveStart)+" ms"); //the save total time is 2549 ms
userDao.deleteAllInBatch(); // single statement, bulk delete
}
(3) JdbcTemplate's batchUpdate method
@Autowired
private JdbcTemplate jdbcTemplate;
/**
 * Batch insert via JdbcTemplate.batchUpdate; the SQL is hand-written,
 * and configuration (the rewriteBatchedStatements URL parameter) is still required.
 * @param list
 */
public void batchWithJDBCTemplate(List<User> list){
    String sql = "INSERT INTO t_user (id, name, age) VALUES (?, ?, ?)";
jdbcTemplate.batchUpdate(sql,new BatchPreparedStatementSetter() {
@Override
public void setValues(PreparedStatement ps, int i) throws SQLException {
ps.setString(1,list.get(i).getId());
ps.setString(2,list.get(i).getName());
ps.setInt(3,list.get(i).getAge());
}
@Override
public int getBatchSize() {
return list.size();
}
});
}
Result:
@Test
public void testBatchWithJDBC(){
long saveStart = System.currentTimeMillis();
batchDao.batchWithJDBCTemplate(userList);
long saveEnd = System.currentTimeMillis();
System.out.println("the save total time is "+(saveEnd-saveStart)+" ms"); // the save total time is 1078 ms
userDao.deleteAllInBatch(); // single statement, bulk delete
}
(4) Native JDBC (raw SQL)
/**
 * Execute the batch with raw JDBC; no JPA configuration needed.
 * Resources are closed via try-with-resources, and any SQLException
 * propagates so the caller can react (the original silently swallowed it).
 * @param list
 */
public void batchWithNativeSql(List<User> list) throws SQLException {
    String sql = "INSERT INTO t_user (id, name, age) VALUES (?, ?, ?)";
    DataSource dataSource = jdbcTemplate.getDataSource();
    try (Connection connection = dataSource.getConnection();
         PreparedStatement ps = connection.prepareStatement(sql)) {
        connection.setAutoCommit(false);
        final int batchSize = 10000;
        int count = 0;
        for (User user : list) {
            ps.setString(1, user.getId());
            ps.setString(2, user.getName());
            ps.setInt(3, user.getAge());
            ps.addBatch();
            count++;
            // flush a full chunk, or the final partial chunk
            if (count % batchSize == 0 || count == list.size()) {
                ps.executeBatch();
                ps.clearBatch();
            }
        }
        connection.commit();
    }
}
Result:
@Test
public void testBatchWithNativeSql() throws SQLException {
long saveStart = System.currentTimeMillis();
batchDao.batchWithNativeSql(userList);
long saveEnd = System.currentTimeMillis();
System.out.println("the save total time is "+(saveEnd-saveStart)+" ms"); // the save total time is 899 ms
userDao.deleteAllInBatch();
}
3. Conclusion
Timings for 100,000 rows: saveAll ≈ 23 s; EntityManager persist with flush/clear ≈ 2549 ms; JdbcTemplate.batchUpdate ≈ 1078 ms; native JDBC batch ≈ 899 ms.
Both JdbcTemplate's batchUpdate and the native JDBC approach handle large volumes well; the former needs the configuration described above, while the latter does not. For small batches, pick whichever fits your needs.
Tip: if the configuration seems to have no effect, re-check application.yml first; note also that Hibernate silently disables JDBC batching for entities whose ids use IDENTITY generation (the manually assigned String ids used here are batch-friendly). Also check whether the table has triggers; if it does, talk to the owners — consider extracting the trigger's logic into explicit batch inserts into the tables involved.