Thoughts on batch update approaches in scheduled tasks

See also:

https://www.cnblogs.com/ShaYeBlog/p/5762553.html

https://blog.csdn.net/li396864285/article/details/53607536

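I. Scenario:

A scheduled task needs to sync data from an intermediate table to a business table, then mark the intermediate rows as synced.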

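II. Approach:

[Batch processing] Each query fetches 500 rows. Those 500 rows are written to the business table in one batch, and their status in the intermediate table is updated in one batch.

[Loop processing] Each run of the scheduled task uses a do-while loop: as long as the query still finds unsynced rows in the intermediate table, the sync runs again. The benefit:

Each run processes as much data as it can, and may even clear every unsynced row written to the intermediate table over a period of time.

One might ask whether shortening the task's execution period would also process data quickly. It would, and the two approaches can be combined.

Another concern: if the period is too short, the next run may start before the previous one finishes, causing concurrency problems where the same data is fetched and processed more than once. In this scenario, intermediate rows would be fetched repeatedly, written to the business table repeatedly, and updated in the intermediate table repeatedly. The repeated write to the business table causes no error, because the table has a unique index and the write is an upsert (update if the row exists, insert if it does not). The repeated update of the intermediate table is also harmless, because each update only sets the status to synced.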
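The idempotency argument above can be checked with a small in-memory sketch. The class below is a hypothetical stand-in for a table with a unique index, not the real mapper: replaying the same batch leaves the table unchanged.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** In-memory stand-in for a business table with a unique index (hypothetical, for illustration only). */
class IdempotentUpsertSketch {

    /** Unique key -> stored value; the map key plays the role of the unique index. */
    private final Map<String, String> businessTable = new LinkedHashMap<>();

    /** Updates the row if the key exists, inserts it otherwise. */
    void upsert(String uniqueKey, String value) {
        businessTable.put(uniqueKey, value);
    }

    /** Applies a batch of {uniqueKey, value} pairs. */
    void applyBatch(List<String[]> batch) {
        for (String[] row : batch) {
            upsert(row[0], row[1]);
        }
    }

    /** Returns a copy of the current table contents. */
    Map<String, String> snapshot() {
        return new LinkedHashMap<>(businessTable);
    }

    public static void main(String[] args) {
        IdempotentUpsertSketch table = new IdempotentUpsertSketch();
        List<String[]> batch = List.of(
                new String[] {"DC01|BIG|SMALL|P1|LOC1", "area-1"},
                new String[] {"DC01|BIG|SMALL|P2|LOC1", "area-2"});
        table.applyBatch(batch);
        Map<String, String> afterFirstRun = table.snapshot();
        table.applyBatch(batch); // an overlapping run replays the same batch
        System.out.println("unchanged after replay: " + afterFirstRun.equals(table.snapshot()));
    }
}
```

The property only holds because every write is keyed by the unique index. An operation that appends rows or increments a counter would not be safe to replay this way.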

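In other scenarios, too short a period can cause serious problems. One earlier case was in a warehouse system: a scheduled task fetched raw data and assembled it into pick orders. The raw data was fetched more than once, so duplicate pick orders were created. In such cases, weigh the business execution time against the task's period, and consider adding a distributed lock.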
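A minimal sketch of guarding a run with a lock follows. The `LockStore` interface and all names here are hypothetical. The in-memory implementation only protects a single JVM and is there so the sketch runs on its own; a real deployment would back the same contract with something like Redis or ZooKeeper and would also need lock expiry, which is left out.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Sketch of guarding a job run with a lock; every name here is hypothetical. */
class JobLockSketch {

    /** Minimal lock contract that a shared store would implement in a real deployment. */
    interface LockStore {
        boolean tryLock(String key, String owner);

        void unlock(String key, String owner);
    }

    /** In-memory stand-in so the sketch is runnable; it is not a distributed lock. */
    static class InMemoryLockStore implements LockStore {
        private final ConcurrentMap<String, String> holders = new ConcurrentHashMap<>();

        @Override
        public boolean tryLock(String key, String owner) {
            // Succeeds only if nobody currently holds the key.
            return holders.putIfAbsent(key, owner) == null;
        }

        @Override
        public void unlock(String key, String owner) {
            // Removes the entry only if this owner is the current holder.
            holders.remove(key, owner);
        }
    }

    /** Runs the task if the lock is free; returns false when another run holds it. */
    static boolean runExclusively(LockStore locks, String key, String owner, Runnable task) {
        if (!locks.tryLock(key, owner)) {
            return false;
        }
        try {
            task.run();
            return true;
        } finally {
            locks.unlock(key, owner);
        }
    }

    public static void main(String[] args) {
        LockStore locks = new InMemoryLockStore();
        boolean ran = runExclusively(locks, "pick-order-job", "node-1",
                () -> System.out.println("assembling pick orders"));
        System.out.println("ran=" + ran);
    }
}
```

A run that cannot take the lock simply skips its turn, so the raw data is only ever assembled by one run at a time.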

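III. Open questions:

1. What is the basis for processing 500 rows per batch, and how many rows would be more appropriate? What problems arise if a batch is too large? The query may become slow. The update may take locks (a table lock or row-level locks?) and cause lock contention. What other problems could it cause?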
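This sketch does not answer which batch size is right. It only shows one way to keep the size a parameter, so that values other than 500 can be tried against real query and lock timings. The helper name `partition` is an assumption for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Splits a list into fixed-size batches; a hypothetical helper for experimenting with batch sizes. */
class BatchPartitionSketch {

    /** Returns consecutive batches of at most batchSize items, preserving order. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(from, to)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long id = 1; id <= 1201; id++) {
            ids.add(id);
        }
        for (List<Long> batch : partition(ids, 500)) {
            System.out.println("batch of " + batch.size());
        }
    }
}
```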

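2. Can do-while avoid the concurrency problem caused by too short an execution period?

Because do-while runs the body first and checks the condition afterwards, the next condition check happens only after the business logic in the body has finished. Within a single run, unsynced rows in the intermediate table are therefore not fetched twice. This holds only inside one run; overlap between separate runs is the case discussed in section II.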
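The single-run behaviour can be simulated in memory. The sketch below uses a hypothetical stand-in for the intermediate table and counts how often each row is handed to the sync step. Because a batch is marked as synced before the next fetch, no row is picked up twice.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** In-memory simulation of the do-while drain loop; all names are hypothetical. */
class DrainLoopSketch {

    /** Row id -> synced flag; stands in for the intermediate table. */
    private final Map<Long, Boolean> midTable = new LinkedHashMap<>();

    /** Row id -> number of times the row was handed to the sync step. */
    final Map<Long, Integer> fetchCounts = new LinkedHashMap<>();

    DrainLoopSketch(int rows) {
        for (long id = 1; id <= rows; id++) {
            midTable.put(id, false);
        }
    }

    /** Returns up to limit ids of rows that are not synced yet. */
    List<Long> selectPending(int limit) {
        List<Long> ids = new ArrayList<>();
        for (Map.Entry<Long, Boolean> entry : midTable.entrySet()) {
            if (!entry.getValue() && ids.size() < limit) {
                ids.add(entry.getKey());
            }
        }
        return ids;
    }

    /** Processes a batch and marks its rows as synced. */
    void sync(List<Long> ids) {
        for (Long id : ids) {
            fetchCounts.merge(id, 1, Integer::sum);
            midTable.put(id, true);
        }
    }

    /** Repeats fetch-then-sync until a fetch comes back empty; returns the iteration count. */
    int drain(int batchSize) {
        int iterations = 0;
        List<Long> batch;
        do {
            batch = selectPending(batchSize);
            sync(batch);
            iterations++;
        } while (!batch.isEmpty());
        return iterations;
    }

    public static void main(String[] args) {
        DrainLoopSketch sketch = new DrainLoopSketch(12);
        System.out.println("iterations: " + sketch.drain(5));
        System.out.println("fetch counts: " + sketch.fetchCounts);
    }
}
```

The final iteration always fetches an empty batch, which is why the sync step has to tolerate an empty list.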

IV. Scheduled task implementation

The job runs on Dangdang's elastic-job.

V. Code

1. Scheduled job class

@Component
@Slf4j
@ElasticSimpleJob(taskName = "syncBaseDeliveryAreaConfigFreshJob")
public class SyncBaseDeliveryAreaConfigFreshJob implements SimpleJob {

    @Resource
    private BaseDeliveryAreaConfigFreshService baseDeliveryAreaConfigFreshService;

    @Override
    public void execute(ShardingContext shardingContext) {
        log.info("SyncBaseDeliveryAreaConfigFreshJob start:{}", shardingContext);
        try {
            List<MidDeliveryAreaConfigFresh> list;
            do {
                list = baseDeliveryAreaConfigFreshService.selectPendingDataList(BizConstant.SPLIT_NUM);
                baseDeliveryAreaConfigFreshService.syncBaseDeliveryAreaConfigFresh(list);
            }
            while (CollectionUtils.isNotEmpty(list));
        } catch (Exception e) {
            // No placeholder needed: SLF4J logs the stack trace of a trailing Throwable.
            log.error("SyncBaseDeliveryAreaConfigFreshJob exception", e);
        }
        log.info("SyncBaseDeliveryAreaConfigFreshJob end");
    }
}

2. Service implementation

    @Transactional
    @Override
    public void syncBaseDeliveryAreaConfigFresh(List<MidDeliveryAreaConfigFresh> list) {
        if (CollectionUtils.isEmpty(list)) {
            return;
        }
        // Look up delivery area names from the intermediate table; the result table may not be synced yet
        List<String> deliveryAreaCodeList = list.stream().map(MidDeliveryAreaConfigFresh::getDeliveryAreaCode).distinct().collect(Collectors.toList());
        List<MidDeliveryAreaFresh> midDeliveryAreaFreshList = midDeliveryAreaFreshMapper.queryByDeliveryAreaCodes(deliveryAreaCodeList);
        Map<String, String> map = midDeliveryAreaFreshList.stream().collect(Collectors.toMap(v -> v.getDeliveryAreaCode(), v -> v.getDescription(), (a, b) -> b));

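        // Group by the result table's unique index and keep the last duplicate of each group.
        // Assumes the list is ordered oldest-first; the "|" delimiter keeps keys such as "a" + "bc" and "ab" + "c" apart.
        Map<String, List<MidDeliveryAreaConfigFresh>> mapGroup = list.stream().collect(Collectors.groupingBy(v ->
                v.getDcCode() + "|" + v.getBigCategoryCode() + "|" + v.getSmallCategoryCode() + "|" + v.getProductCode() + "|" + v.getStockLoc()));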
        List<MidDeliveryAreaConfigFresh> resultList = mapGroup.values().stream().map(listv -> listv.get(listv.size() - 1)).collect(Collectors.toList());

        // Insert or update the rows with the latest creation time into the result table
        baseDeliveryAreaConfigFreshMapper.saveOrUpdateBatch(conventResult(resultList, map));

        // Mark every intermediate-table row handled in this batch as synced
        List<Long> ids = list.stream().map(MidDeliveryAreaConfigFresh::getId).collect(Collectors.toList());
        midDeliveryAreaConfigFreshMapper.updateProcessFlagByIds(ids);

    }

    @Override
    public List<MidDeliveryAreaConfigFresh> selectPendingDataList(int num) {
        return midDeliveryAreaConfigFreshMapper.selectPendingDataList(num);
    }
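The "keep the last row per unique key" step can also be written in a single pass with `Collectors.toMap` and a merge function. This is an alternative sketch, not the code used above. The `Row` class and its fields are simplified, hypothetical stand-ins for `MidDeliveryAreaConfigFresh`, and the input is assumed to be ordered oldest-first.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Keeps the newest row per unique key; names are hypothetical. */
class KeepLatestPerKeySketch {

    /** Simplified stand-in for an intermediate-table row. */
    static final class Row {
        final long id;
        final String dcCode;
        final String productCode;
        final String deliveryAreaCode;

        Row(long id, String dcCode, String productCode, String deliveryAreaCode) {
            this.id = id;
            this.dcCode = dcCode;
            this.productCode = productCode;
            this.deliveryAreaCode = deliveryAreaCode;
        }

        /** Joins the key fields with a delimiter so different field splits cannot collide. */
        String uniqueKey() {
            return dcCode + "|" + productCode;
        }
    }

    /** Returns one row per unique key, taking the later row when keys repeat. */
    static List<Row> keepLatestPerKey(List<Row> rowsOldestFirst) {
        Map<String, Row> latest = rowsOldestFirst.stream().collect(Collectors.toMap(
                Row::uniqueKey,
                Function.identity(),
                (older, newer) -> newer, // on a duplicate key the later row wins
                LinkedHashMap::new));    // keeps first-seen key order
        return new ArrayList<>(latest.values());
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row(1L, "DC01", "P1", "AREA-A"),
                new Row(2L, "DC01", "P2", "AREA-B"),
                new Row(3L, "DC01", "P1", "AREA-C"));
        for (Row row : keepLatestPerKey(rows)) {
            System.out.println(row.uniqueKey() + " -> id " + row.id);
        }
    }
}
```

Either form depends on the query returning rows in creation order. Without an explicit ordering in the SQL, "last in the list" is not guaranteed to mean "newest".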

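3. Batch insert or update of the business table

Pay attention to what the duplicate check is based on, which is the table's unique index: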

void saveOrUpdateBatch(@Param("list") List<BaseDeliveryAreaConfigFresh> baseDeliveryAreaConfigFreshList);

 <insert id="saveOrUpdateBatch">
        insert into base_delivery_area_config_fresh (
        dc_code, stock_loc_code, delivery_area_code, delivery_area_name, small_category_code,
        big_category_code, product_code, is_delete, created_time, created_by, updated_time,
        updated_by
        )values
        <foreach collection="list" item="item" separator=",">
            (
            #{item.dcCode,jdbcType=VARCHAR},
            #{item.stockLocCode,jdbcType=VARCHAR},
            #{item.deliveryAreaCode,jdbcType=VARCHAR},
            #{item.deliveryAreaName,jdbcType=VARCHAR},
            #{item.smallCategoryCode,jdbcType=VARCHAR},
            #{item.bigCategoryCode,jdbcType=VARCHAR},
            #{item.productCode,jdbcType=VARCHAR},
            #{item.isDelete,jdbcType=INTEGER},
            #{item.createdTime,jdbcType=TIMESTAMP},
            #{item.createdBy,jdbcType=VARCHAR},
            #{item.updatedTime,jdbcType=TIMESTAMP},
            #{item.updatedBy,jdbcType=VARCHAR}
            )
        </foreach>
        ON DUPLICATE KEY UPDATE
        `dc_code` = VALUES(`dc_code`),
        `stock_loc_code` = VALUES(`stock_loc_code`),
        `delivery_area_code` = VALUES(`delivery_area_code`),
        `delivery_area_name` = VALUES(`delivery_area_name`),
        `small_category_code` = VALUES(`small_category_code`),
        `big_category_code` = VALUES(`big_category_code`),
        `product_code` = VALUES(`product_code`),
        `is_delete` = VALUES(`is_delete`),
        `created_time` = VALUES(`created_time`),
        `updated_time` = VALUES(`updated_time`),
        `created_by` = VALUES(`created_by`),
        `updated_by` = VALUES(`updated_by`)
    </insert>
</mapper>

4. Updating the intermediate table

void updateProcessFlagByIds(@Param("ids") List<Long> ids);

<update id="updateProcessFlagByIds">
  update mid_delivery_area_config_fresh set process_flag = 1
  where id in
  <foreach collection="ids" item="id" open="(" close=")" separator=",">
    #{id}
  </foreach>
</update>
