Exporting Millions of MySQL Rows from Spring Boot While Avoiding OOM

 

Introduction

Dynamic data export is a feature most projects end up needing. The basic flow is: query the data from MySQL, load it into memory, build an Excel or CSV file from it, and stream the file back to the frontend as the response.

See https://grokonez.com/spring-framework/spring-boot/excel-file-download-from-springboot-restapi-apache-poi-mysql for a typical example; Excel downloads in Spring Boot are almost always implemented this way.

This is a workable approach, but once the MySQL dataset grows large, into the hundreds of thousands, millions, or tens of millions of rows, loading it all into memory will inevitably throw an OutOfMemoryError.

There are generally two angles from which to avoid OOM.

The first is product design. Before writing any code, ask the product owner a few questions:

  1. Why do we need to export this much data in the first place? Is the requirement even reasonable?
  2. How will access be controlled? Are we sure a million-row export won't leak company secrets?
  3. If millions of rows really must be exported, why not have the big-data team or a DBA produce the file and deliver it by email?
  4. Why implement this in backend application logic at all? Have the time and bandwidth costs been considered?
  5. Would a paginated export (say, 20,000 rows per click, in batches) not satisfy the business need?

If the product owner won't budge and insists on a one-shot export of the full dataset, then the only option left is to solve it technically.

From a technical standpoint, to avoid OOM we must stick to one principle:

  • Never load the full dataset into memory at once.

Since full loading is off the table, the goal becomes loading the data in batches. In fact, the Stream API introduced in Java 8 can hand you rows one at a time: we can flush each row to the file as it arrives and then evict it from memory, thereby avoiding OOM.
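The write-then-discard idea can be sketched without any database at all. In the minimal, self-contained sketch below, a Java 8 Stream of strings stands in for a streamed query result, and a StringWriter stands in for the HTTP response writer; the names are illustrative, not from the demo project:

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.stream.Stream;

public class StreamFlushSketch {
    public static void main(String[] args) {
        // Stands in for the servlet response writer in the real export.
        StringWriter buffer = new StringWriter();
        PrintWriter out = new PrintWriter(buffer);
        // The Stream yields one row at a time; each row is written immediately
        // and becomes garbage-collectable right after, so memory use stays flat.
        try (Stream<String> rows = Stream.of("row-1", "row-2", "row-3")) {
            rows.forEach(out::println);
        }
        out.flush();
        System.out.print(buffer);
    }
}
```

In the real solutions below, the Stream is backed by a live database cursor instead of an in-memory list, which is exactly why it must be consumed inside an open transaction.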

Because rows are flushed to the file one by one, Excel is no longer a suitable output format. The recommendation here:

  • Use CSV instead of Excel.
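One thing Excel handles for you that raw CSV does not is field quoting: values containing commas, quotes, or newlines must be escaped. A minimal RFC 4180-style escaping helper is sketched below; it is a hypothetical utility, not part of the demo project:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class CsvLineSketch {
    // Quote a field when it contains a comma, quote, or newline (RFC 4180 style),
    // doubling any embedded quotes.
    static String escape(String field) {
        if (field == null) return "";
        if (field.contains(",") || field.contains("\"") || field.contains("\n")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    // Join escaped fields into one CSV line.
    static String toCsvLine(List<String> fields) {
        return fields.stream().map(CsvLineSketch::escape).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(toCsvLine(Arrays.asList("id", "hello, world", "say \"hi\"")));
    }
}
```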

Given that the common persistence frameworks in Spring Boot today are JPA and MyBatis, we can look at how to implement a million-row export with each.

Exporting Millions of Rows with JPA

For the details, see: http://knes1.github.io/blog/2015/2015-10-19-streaming-mysql-results-using-java8-streams-and-spring-data.html

The accompanying project is at: https://github.com/knes1/todo

The core annotations are shown below; they go on the relevant Repository. The method's return type is declared as Stream, and a fetch size of Integer.MIN_VALUE tells the JDBC driver to stream results back one row at a time.

	@QueryHints(value = @QueryHint(name = HINT_FETCH_SIZE, value = "" + Integer.MIN_VALUE))
	@Query(value = "select t from Todo t")
	Stream<Todo> streamAll();

In addition, the method that consumes the Stream must be annotated with @Transactional(readOnly = true) to keep the transaction read-only (and to keep it open while the Stream is being traversed).

You also need to inject an EntityManager and call detach on each object the Stream traverses, evicting it from the persistence context so it can be garbage-collected.

	@RequestMapping(value = "/todos.csv", method = RequestMethod.GET)
	@Transactional(readOnly = true)
	public void exportTodosCSV(HttpServletResponse response) {
		response.addHeader("Content-Type", "application/csv");
		response.addHeader("Content-Disposition", "attachment; filename=todos.csv");
		response.setCharacterEncoding("UTF-8");
		try(Stream<Todo> todoStream = todoRepository.streamAll()) {
			PrintWriter out = response.getWriter();
			todoStream.forEach(rethrowConsumer(todo -> {
				String line = todoToCSV(todo);
				out.write(line);
				out.write("\n");
				entityManager.detach(todo);
			}));
			out.flush();
		} catch (IOException e) {
		log.error("Exception occurred " + e.getMessage(), e);
			throw new RuntimeException("Exception occurred while exporting results", e);
		}
	}

Exporting Millions of Rows with MyBatis

To fetch rows one at a time with MyBatis, you must implement a custom ResultHandler, and add fetchSize="-2147483648" (Integer.MIN_VALUE) to the corresponding select statement in mapper.xml.
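A sketch of what that mapper.xml entry might look like is below. The statement id and resultMap name follow the demo's naming conventions and are assumptions; MyBatis reads the fetchSize and resultSetType attributes directly off the <select> element:

```xml
<!-- Hypothetical sketch; the real statement is generated by mybatis-generator.
     fetchSize="-2147483648" (Integer.MIN_VALUE) together with a FORWARD_ONLY
     result set is what makes the MySQL driver stream rows one at a time. -->
<select id="selectByExample" resultMap="BaseResultMap"
        fetchSize="-2147483648" resultSetType="FORWARD_ONLY">
  select * from trading_details_download
  <if test="orderByClause != null">order by ${orderByClause}</if>
</select>
```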

Finally, run the query through SqlSession and handle each returned row.

The code is as follows.

The custom ResultHandler that receives each row:

@Slf4j
public class CustomResultHandler implements ResultHandler {

  private final CallbackProcesser callbackProcesser;

  public CustomResultHandler(
      CallbackProcesser callbackProcesser) {
    super();
    this.callbackProcesser = callbackProcesser;
  }

  @Override
  public void handleResult(ResultContext resultContext) {
    TradingDetailsDownload detailsDownload = (TradingDetailsDownload)resultContext.getResultObject();
    log.info("detailsDownload:{}",detailsDownload);
    callbackProcesser.processData(detailsDownload);
  }
}

The callback class CallbackProcesser, invoked for each fetched row; its sole job is to write the object out as CSV.

public class CallbackProcesser {

  private final HttpServletResponse response;

  public CallbackProcesser(HttpServletResponse response) {
    this.response = response;
    String fileName = System.currentTimeMillis() + ".csv";
    this.response.addHeader("Content-Type", "application/csv");
    this.response.addHeader("Content-Disposition", "attachment; filename="+fileName);
    this.response.setCharacterEncoding("UTF-8");
  }

  public <E> void processData(E record) {
    try {
      response.getWriter().write(record.toString()); // for CSV output, toString must join the fields with ","
      response.getWriter().write("\n");
    }catch (IOException e){
      e.printStackTrace();
    }
  }
}
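The comment in processData notes that toString must join the fields with commas. A minimal sketch of such an override is below; TradingRow and its two fields are illustrative, not the demo's real entity:

```java
// TradingRow and its two fields are illustrative, not the demo's real entity.
public class TradingRow {
    private final Integer id;
    private final String coreComId;

    public TradingRow(Integer id, String coreComId) {
        this.id = id;
        this.coreComId = coreComId;
    }

    // Fields joined with "," so each row becomes one CSV line; a real entity
    // with free-text fields would also need CSV escaping.
    @Override
    public String toString() {
        return id + "," + coreComId;
    }

    public static void main(String[] args) {
        System.out.println(new TradingRow(1, "APRG21246"));
    }
}
```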

The core service that fetches the data (this is just a quick demo, so no interface is extracted):

@Service
public class TradingDetailsService {

  private final SqlSessionTemplate sqlSessionTemplate;

  public TradingDetailsService(SqlSessionTemplate sqlSessionTemplate) {
    this.sqlSessionTemplate = sqlSessionTemplate;
  }

  public void downloadAsCsv(String coreComId, HttpServletResponse httpServletResponse)
      throws IOException {
    TradingDetailsDownloadExample tradingDetailsDownloadExample = new TradingDetailsDownloadExample();
    TradingDetailsDownloadExample.Criteria criteria = tradingDetailsDownloadExample.createCriteria();
    criteria.andIdIsNotNull();
    criteria.andCoreComIdEqualTo(coreComId);
    tradingDetailsDownloadExample.setOrderByClause(" id desc");
    HashMap<String, Object> param = new HashMap<>();
    param.put("oredCriteria", tradingDetailsDownloadExample.getOredCriteria());
    param.put("orderByClause", tradingDetailsDownloadExample.getOrderByClause());
    CustomResultHandler customResultHandler = new CustomResultHandler(new CallbackProcesser(httpServletResponse));
    sqlSessionTemplate.select(
        "com.example.demo.mapper.TradingDetailsDownloadMapper.selectByExample", param, customResultHandler);
    httpServletResponse.getWriter().flush();
    httpServletResponse.getWriter().close();
  }
}

The controller serving as the download entry point:

@RestController
@RequestMapping("down")
public class HelloController {

  private final TradingDetailsService tradingDetailsService;

  public HelloController(TradingDetailsService tradingDetailsService) {
    this.tradingDetailsService = tradingDetailsService;
  }

  @GetMapping("download_csv")
  public void downloadAsCsv(@RequestParam("coreComId") String coreComId, HttpServletResponse response)
      throws IOException {
    tradingDetailsService.downloadAsCsv(coreComId, response);
  }
}

The entity class, generated by mybatis-generator. The real table has many more fields; only a few are shown below:

public class TradingDetailsDownload {
    /**
     * This field was generated by MyBatis Generator.
     * This field corresponds to the database column trading_details_download.id
     *
     * @mbggenerated Thu Oct 31 11:43:00 CST 2019
     */
    private Integer id;

    /**
     * This field was generated by MyBatis Generator.
     * This field corresponds to the database column trading_details_download.core_com_id
     *
     * @mbggenerated Thu Oct 31 11:43:00 CST 2019
     */
    private String coreComId;

    /**
     * This field was generated by MyBatis Generator.
     * This field corresponds to the database column trading_details_download.add
     *
     * @mbggenerated Thu Oct 31 11:43:00 CST 2019
     */
    private String add;

    /**
     * This field was generated by MyBatis Generator.
     * This field corresponds to the database column trading_details_download.amt_rate
     *
     * @mbggenerated Thu Oct 31 11:43:00 CST 2019
     */
    private Double amtRate;

    /**
     * This field was generated by MyBatis Generator.
     * This field corresponds to the database column trading_details_download.bank_acceptance_bill_amt
     *
     * @mbggenerated Thu Oct 31 11:43:00 CST 2019
     */
    private Double bankAcceptanceBillAmt;

    /**
     * This field was generated by MyBatis Generator.
     * This field corresponds to the database column trading_details_download.brch
     *
     * @mbggenerated Thu Oct 31 11:43:00 CST 2019
     */
    private String brch;

    /**
     * This field was generated by MyBatis Generator.
     * This field corresponds to the database column trading_details_download.city
     *
     * @mbggenerated Thu Oct 31 11:43:00 CST 2019
     */
    private String city;

    /**
     * This field was generated by MyBatis Generator.
     * This field corresponds to the database column trading_details_download.com_typ
     *
     * @mbggenerated Thu Oct 31 11:43:00 CST 2019
     */
    private String comTyp;

    /**
     * This field was generated by MyBatis Generator.
     * This field corresponds to the database column trading_details_download.conduct_financial_transactions_amt
     *
     * @mbggenerated Thu Oct 31 11:43:00 CST 2019
     */
    private Double conductFinancialTransactionsAmt;

    /**
     * This field was generated by MyBatis Generator.
     * This field corresponds to the database column trading_details_download.core_com_nam
     *
     * @mbggenerated Thu Oct 31 11:43:00 CST 2019
     */
    private String coreComNam;
}

Write a stored procedure to insert a million rows of fake data:

CREATE DEFINER=`root`@`%` PROCEDURE `NewProc`()
BEGIN

SET @num = 1;
WHILE @num <= 1000000 DO


INSERT INTO `trading_details_download`(`core_com_id`, `add`, `amt_rate`, `bank_acceptance_bill_amt`, `brch`, `city`, `com_typ`, `conduct_financial_transactions_amt`, `core_com_nam`, `credit_amt`, `cust_id`, `district`, `dpst`, `dpst_daily`, `following_loan`, `fpa`, `industry1`, `industry2`, `is_client`, `is_hitech`, `is_loan`, `is_new_sp_or_sa`, `is_on_board`, `is_recent_three_mons_new_sp_or_sa`, `is_sl`, `is_vip_com`, `last_year_dpst`, `last_year_dpst_daily`, `letter_of_credit_amt`, `letter_of_guarantee_amt`, `level`, `loan_amt`, `nam`, `non_performing_loan`, `province`, `recent_year_in`, `recent_year_integerims`, `recent_year_out`, `recent_year_out_tims`, `registered_capital`, `registered_date`, `role`, `scale`, `substitute_tims`, `team`, `tel`, `this_year_amt`, `this_year_eva`, `this_year_in`, `this_year_substitute_amt`, `this_year_substitute_tims`, `this_year_tims`, `three_month_core_com_in`, `three_month_core_com_out`, `trade_mons`, `un_cmb_amt`) VALUES ('APRG21246', '科興科學園A1棟49樓', 0.1422, 3208132746360.17, '某上市企業總部', '漳州', '合夥', 8619650727872.58, '某醫藥技術有限公司', 6624346214937.97, '4888864469', '龍文區', 2319401156645.59, 8648101521823.08, 4026817601311.52, 2791285180576.86, '電力、熱力、燃氣及水生產和供應業', '大健康交易鏈', '是', '是', '是', '否', '是', '否', '否', '總部', 1421134076032.25, 6948196537024.35, 4534195429054.29, 220191634435.78, 2, 8460047551335.62, '北京燕京啤酒股份有限公司', 6320618506709.96, '福建', 7848516990131.7, NULL, 2512926179345.88, 760435, 1327348247660.02, '20130121', '供應商', '中型', '是', '戰略客戶團隊', '18308778888', 7538773630340.53, 3520434718096.73, 1607690495836.96, 7458777479144.64, 216257, 855652, 6486558126565.19, 5002169133815.36, 3, 2213873926430.35);

SET @num = @num + 1;

END WHILE;

END

Then run the stored procedure:

call NewProc();

Then hit http://localhost:8080/down/download_csv?coreComId=APRG21246 in a browser, while watching the java process with top or Task Manager: memory usage stays at a fairly low level throughout.

On a modest machine (8 GB RAM, 2.3 GHz Intel Core i5), CPU usage peaked at roughly 70%-80%.

The experiment confirms that this approach does avoid OOM and successfully exports the CSV, although the resulting file itself is still fairly large.
