Plenty of devs have this problem:
- The projects at work just don't have much data, so you never hit a real SQL-tuning scenario: single tables with a few tens of thousands of rows; what on earth am I supposed to optimize?
- The business barely cares about performance and is nowhere near any bottleneck: hey, it's not like the project can't run, what on earth am I supposed to optimize?
Fair enough. If your project is small, you'll rarely touch performance tuning at either the data layer or the application layer. But we can generate the data ourselves!!
Today I've got a demo that not only puts multithreading to work in a real project, but also uses it to pump test data into a database, so you can get a taste of optimizing a table with a serious row count.
Let's set a small goal: insert 100 million rows today!!
First, get one thing straight: never use a technology just to use it; technology exists to serve a requirement:
- Insert 100 million rows: that's the requirement;
- Use multithreaded asynchronous inserts for throughput: that's the approach.
1. To simulate a real-world scenario as closely as possible, let's new up an object
The phone and createTime fields push the duplication rate way down. Setting the other fields aside, this pair keeps duplicates rare (strictly speaking, phone alone is drawn from an 88-million-value range, so across 100 million rows some repeats are unavoidable, but combined with createTime the data is realistic enough). As a bonus, createTime lets us count inserts per second afterwards, nice~
import java.math.BigDecimal;
import java.time.LocalDateTime;
import lombok.Builder;
import lombok.Data;

@Data
@Builder // Lombok; Person.builder() is used below to assemble random instances
public class Person {
    private Long id;
    private String name;              // name
    private Long phone;               // phone number
    private BigDecimal salary;        // salary
    private String company;           // company
    private Integer ifSingle;         // single or not
    private Integer sex;              // sex
    private String address;           // address
    private LocalDateTime createTime; // set by now() in the mapper SQL
    private String createUser;
}
2. For the fastest inserts, use the MyISAM engine and an auto-increment primary key
DDL:
CREATE TABLE `person` (
  `id` bigint NOT NULL AUTO_INCREMENT,
  `name` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  `phone` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  `salary` decimal(10,2) NOT NULL,
  `company` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  `if_single` tinyint NOT NULL,
  `sex` tinyint NOT NULL,
  `address` varchar(225) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  `create_time` datetime NOT NULL,
  `create_user` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;
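By the way, once the load finishes you can sanity-check the low-duplication claim from step 1. This query is my own addition, not part of the demo, and on 100 million rows the full scan will take a while:

-- List (phone, create_time) pairs that occur more than once
SELECT phone, create_time, COUNT(*) AS cnt
FROM person
GROUP BY phone, create_time
HAVING cnt > 1
LIMIT 10;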
3. To make the data look real, we need some value pools and a bit of randomness
Value pools for a few of the fields:
private String[] names = {"黃某人", "負債程序猿", "譚sir", "郭德綱", "蔡徐雞", "蔡徐老母雞", "李狗蛋", "鐵蛋", "趙鐵柱"};
private String[] addrs = {"二仙橋", "成華大道", "春熙路", "錦裏", "寬窄巷子", "雙子塔", "天府大道", "軟件園", "熊貓大道", "交子大道"};
private String[] companys = {"京東", "騰訊", "百度", "小米", "米哈遊", "網易", "字節跳動", "美團", "螞蟻", "完美世界"};
Building a random Person:
private Person getPerson() {
    // createTime is omitted here; the mapper SQL fills it with now()
    return Person.builder()
            .name(names[random.nextInt(names.length)])
            .phone(18800000000L + random.nextInt(88888888))
            .salary(new BigDecimal(random.nextInt(99999)))
            .company(companys[random.nextInt(companys.length)])
            .ifSingle(random.nextInt(2))
            .sex(random.nextInt(2))
            .address("四川省成都市" + addrs[random.nextInt(addrs.length)])
            .createUser(names[random.nextInt(names.length)])
            .build();
}
4. The ORM layer is MyBatis
<insert id="insertList" parameterType="com.example.demos.entity.Person">
insert into person (name, phone, salary, company, if_single, sex, address, create_time, create_user)
values
<foreach collection="list" item="item" separator=",">
(#{item.name}, #{item.phone}, #{item.salary}, #{item.company}, #{item.ifSingle}, #{item.sex},
#{item.address}, now(), #{item.createUser})
</foreach>
</insert>
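For completeness, here's roughly what the mapper interface this XML binds to looks like. The method name matches the calls in the service below; the package and annotation are my assumptions, since the original doesn't show this file:

import java.util.List;
import org.apache.ibatis.annotations.Mapper;

@Mapper
public interface PersonMapper {
    // MyBatis exposes a single List parameter as "list", matching the <foreach> collection above
    int insertList(List<Person> list);
}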
Prep work done; time to write the core logic.
The idea:
1. To drive insert throughput up, inserting one row at a time is out; we have to batch with foreach. In my tests, batches of up to ~30k rows gave the best cost/benefit and fit within MySQL's default packet limit (see the quick check after this list), so no config changes are needed;
2. As promised at the top, we'll insert asynchronously from multiple threads: max out the application layer first and see whether MySQL can take it;
3. We obviously can't hand-submit a hundred million inserts, and a bulk load this size takes real time; nobody wants to babysit it, so my approach is a scheduled task.
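As referenced in point 1, a single batched INSERT statement has to fit inside max_allowed_packet (a standard MySQL variable; 4MB by default on 5.7, 64MB on 8.0):

-- The whole batched INSERT must fit within this limit
SHOW VARIABLES LIKE 'max_allowed_packet';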
Enough talk, here's the demo:
@Component
public class PersonService {
    private static final int THREAD_COUNT = 10;

    @Autowired
    private PersonMapper personMapper;
    @Autowired
    private ThreadPoolExecutor executor;

    // Counts finished tasks across rounds
    private final AtomicInteger finishedTasks = new AtomicInteger();
    private final Random random = new Random();

    private String[] names = {"黃某人", "負債程序猿", "譚sir", "郭德綱", "蔡徐雞", "蔡徐母雞", "李狗蛋", "鐵蛋", "趙鐵柱"};
    private String[] addrs = {"二仙橋", "成華大道", "春熙路", "錦裏", "寬窄巷子", "雙子塔", "天府大道", "軟件園", "熊貓大道", "交子大道"};
    private String[] companys = {"京東", "騰訊", "百度", "小米", "米哈遊", "網易", "字節跳動", "美團", "螞蟻", "完美世界"};

    // Every 15 seconds, submit THREAD_COUNT tasks; each task inserts 20 batches x 5,000 rows
    @Scheduled(cron = "0/15 * * * * ?")
    public void insertList() {
        System.out.println("Round started, tasks submitted: " + THREAD_COUNT);
        long start = System.currentTimeMillis();
        AtomicLong end = new AtomicLong();
        for (int i = 0; i < THREAD_COUNT; i++) {
            Runnable task = () -> {
                try {
                    for (int j = 0; j < 20; j++) {
                        personMapper.insertList(getPersonList(5000));
                    }
                    end.set(System.currentTimeMillis());
                    System.out.println("Round elapsed: " + (end.get() - start) + "ms"
                            + " | tasks finished: " + finishedTasks.incrementAndGet()
                            + " | tasks queued: " + executor.getQueue().size());
                } catch (Exception e) {
                    e.printStackTrace();
                }
            };
            try {
                executor.execute(task);
            } catch (Exception e) {
                // Thrown by the pool's rejection policy when the queue is full
                System.out.println(e.getMessage());
            }
        }
    }

    private ArrayList<Person> getPersonList(int count) {
        ArrayList<Person> persons = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            persons.add(getPerson());
        }
        return persons;
    }

    private Person getPerson() {
        // createTime is omitted here; the mapper SQL fills it with now()
        return Person.builder()
                .name(names[random.nextInt(names.length)])
                .phone(18800000000L + random.nextInt(88888888))
                .salary(new BigDecimal(random.nextInt(99999)))
                .company(companys[random.nextInt(companys.length)])
                .ifSingle(random.nextInt(2))
                .sex(random.nextInt(2))
                .address("四川省成都市" + addrs[random.nextInt(addrs.length)])
                .createUser(names[random.nextInt(names.length)])
                .build();
    }
}
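One prerequisite the class above relies on: @Scheduled only fires if scheduling is enabled in the Spring context. A minimal bootstrap (the class name here is mine; your entry point may differ):

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.scheduling.annotation.EnableScheduling;

@SpringBootApplication
@EnableScheduling // without this, the scheduled insert task never runs
public class DemoApplication {
    public static void main(String[] args) {
        SpringApplication.run(DemoApplication.class, args);
    }
}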
My thread-pool configuration. My machine is pretty weak, only 12 logical threads…
@Configuration
public class ThreadPoolExecutorConfig {
    @Bean
    public ThreadPoolExecutor threadPoolExecutor() {
        // 10 core threads, up to 12, 5s idle timeout, bounded queue of 100 tasks
        ThreadPoolExecutor executor = new ThreadPoolExecutor(10, 12, 5, TimeUnit.SECONDS, new ArrayBlockingQueue<>(100));
        executor.allowCoreThreadTimeOut(true);
        return executor;
    }
}
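Note that with this bounded queue, the pool's default AbortPolicy makes execute() throw RejectedExecutionException once the queue fills, and the try/catch in the service just logs it, dropping that task's 100k rows. If you'd rather throttle than drop, a CallerRunsPolicy variant (my own tweak, not what I ran) makes the submitting thread do the work itself when the queue is full:

@Bean
public ThreadPoolExecutor threadPoolExecutor() {
    // Same sizing, but a full queue slows the caller down instead of rejecting tasks
    return new ThreadPoolExecutor(10, 12, 5, TimeUnit.SECONDS,
            new ArrayBlockingQueue<>(100), new ThreadPoolExecutor.CallerRunsPolicy());
}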
Testing
The table starts out empty.
Fire the project up.
The inserts are already flowing; leave it in the background and let it grind!
25 minutes later, check the database:
104 million rows inserted. Goal achieved.
The first row landed at 15:54:15, so the whole load took about 25 minutes.
Now let's see how many rows land per second, straight from the database: just count the rows of any given second.
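Since create_time has one-second precision and is set by now() at insert time, a query like this does it; the timestamp below is a placeholder, pick any second inside your own run:

-- Rows inserted during one specific second (placeholder timestamp)
SELECT COUNT(*) FROM person WHERE create_time = '2024-01-01 15:55:00';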
85k a second. Blazing fast.
A few core points in this demo:
- Threads: my CPU has only twelve logical threads, so I set ten core threads and left two for everything else;
- Per-task logic: each task loops 20 times, inserting 5,000 rows per iteration;
- Building the random objects: I didn't time object creation, because even constructing 1,000,000 objects is pure in-memory work; next to the I/O of the inserts the cost is negligible, so I skipped measuring it. Try it yourself if you're curious;
- Throughput: the combination you're seeing (10 * 20 * 5000) is a half-finished product I'd only tuned a few times; it comes out to 1,000,000 rows in 12.5s. Certainly not the ceiling, but good enough!!
For reference, my earlier tuning runs (threads * loops * batch size):
- 10 * 100 * 1000: 22-23s
- 10 * 50 * 2000: 19-20s
- 10 * 10 * 10000: 18-20s
Use them as a starting point for deeper tuning.
Oh, and if you want even more speed: don't put indexes on the table; maintaining indexes during inserts is a non-trivial cost!
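If you do need indexes for the tuning exercises afterwards, the usual move is to build them once the load is finished. A hypothetical example on create_time:

-- Build the index once after loading, instead of maintaining it on every insert
ALTER TABLE person ADD INDEX idx_create_time (create_time);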