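MySQL 5.7 introduced a JSON type, and MongoDB stores JSON documents as well. As a DBA, I am often asked by developers (RD) which one to choose.
Below I compare JSON in MySQL 5.7 with JSON in MongoDB; the results should make the choice clearer.
1. Test data preparation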
mysql> select count(*) from m_test;
+----------+
| count(*) |
+----------+
| 20999199 |
+----------+
mongodb
db.test2.count();
21000001
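Both hold about 20 million records.
MySQL's buffer pool is set to 12 GB and the data file is also 12 GB, so the queries are essentially served from memory.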
db.test2.find().limit(1);
{ "_id" : ObjectId("5e205cf703eb69e9acd2f0c7"), "user_basic_info" : { "name" : "張三", "age" : "29", "address" : { "city" : "上海", "province" : "上海" } }, "work_exprs" : [ { "company" : "公司999999", "date_range" : "2001-2003" }, { "company" : "公司999999999999", "date_range" : "2003-2004" } ], "educations" : [ { "school" : "學校A", "date_range" : "1995-1998" }, { "school" : "學校B", "date_range" : "1995-1998" } ], "name" : "張三999999" }
mysql> select * from m_test limit 1\G
*************************** 1. row ***************************
id: 35212
user_info: {"name": "張三999999", "educations": [{"school": "學校A", "date_range": "1995-1998"}, {"school": "學校B", "date_range": "1995-1998"}], "work_exprs": [{"company": "公司999999", "date_range": "2001-2003"}, {"company": "公司999999999999", "date_range": "2003-2004"}], "user_basic_info": {"age": "29", "name": "張三", "address": {"city": "上海", "province": "上海"}}}
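No index has been added on the table's fields yet. First, query by name.
The MySQL query was run several times to remove the effect of loading data into memory.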
mysql> select * from m_test where json_extract(json_extract(user_info,'$.user_basic_info'),'$.name')='張三3752028932291';
+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id | user_info |
+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 35213 | {"educations": [{"school": "學校A", "company": "學校3752028932291", "date_range": "1995-1998"}, {"school": "學校B", "company": "學校37520289322913752028932291", "date_range": "1995-1998"}], "work_exprs": [{"company": "公司3752028932291", "date_range": "2001-2003"}, {"company": "公司37520289322913752028932291", "date_range": "2003-2004"}], "user_basic_info": {"age": "3752028932291", "name": "張三3752028932291", "address": {"city": "上海", "province": "上海"}}} |
+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (39.03 sec)
db.test2.find({"user_basic_info.name":"張三3752028932291"})
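This took nearly 20 s.
So without an index, MySQL (39 s) is about twice as slow as MongoDB on this query.
Next, add an index on the MySQL JSON field and compare the indexed case.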
Create Table: CREATE TABLE `m_test` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_info` json DEFAULT NULL,
`names_virtual2` varchar(50) GENERATED ALWAYS AS (json_extract(json_extract(`user_info`,'$.user_basic_info'),'$.name')) VIRTUAL,
`names_virtual3` varchar(50) GENERATED ALWAYS AS (json_extract(json_extract(`user_info`,'$.user_basic_info'),'$.name')) STORED,
PRIMARY KEY (`id`),
KEY `idx_name3` (`names_virtual3`),
KEY `idx_name` (`names_virtual2`)
) ENGINE=InnoDB AUTO_INCREMENT=21039616 DEFAULT CHARSET=utf8
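The table now has two generated columns (one VIRTUAL, one STORED), each with an index. Yet when I query through the indexed columns, no record is found, even though the indexes are in place.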
mysql> select * from m_test where id=35213\G
*************************** 1. row ***************************
id: 35213
user_info: {"educations": [{"school": "學校A", "company": "學校3752028932291", "date_range": "1995-1998"}, {"school": "學校B", "company": "學校37520289322913752028932291", "date_range": "1995-1998"}], "work_exprs": [{"company": "公司3752028932291", "date_range": "2001-2003"}, {"company": "公司37520289322913752028932291", "date_range": "2003-2004"}], "user_basic_info": {"age": "3752028932291", "name": "張三3752028932291", "address": {"city": "上海", "province": "上海"}}}
names_virtual2: "張三3752028932291"
names_virtual3: "張三3752028932291"
1 row in set (0.00 sec)
mysql> select * from m_test where names_virtual2='張三3752028932291';
Empty set (0.00 sec)
mysql> select * from m_test where names_virtual3='張三3752028932291';
Empty set (0.00 sec)
mysql> select * from m_test where json_extract(json_extract(`user_info`,'$.user_basic_info'),'$.name')='張三3752028932291';
+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+-----------------------+
| id | user_info | names_virtual2 | names_virtual3 |
+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+-----------------------+
| 35213 | {"educations": [{"school": "學校A", "company": "學校3752028932291", "date_range": "1995-1998"}, {"school": "學校B", "company": "學校37520289322913752028932291", "date_range": "1995-1998"}], "work_exprs": [{"company": "公司3752028932291", "date_range": "2001-2003"}, {"company": "公司37520289322913752028932291", "date_range": "2003-2004"}], "user_basic_info": {"age": "3752028932291", "name": "張三3752028932291", "address": {"city": "上海", "province": "上海"}}} | "張三3752028932291" | "張三3752028932291" |
+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+-----------------------+
1 row in set (1 min 28.49 sec)
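After adding an index in MongoDB, the query time is 0 s.
The index lookups above find nothing because json_extract returns a JSON value, so the stored string still carries its double quotes (see names_virtual2: "張三3752028932291" in the output above). The generated column needs to return a plain string instead.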
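The quoting behavior can be reproduced without the test table. The following is a minimal sketch, assuming a MySQL 5.7 session with a utf8 or utf8mb4 connection character set; the variable @doc and the column aliases are illustrative names, not part of the original test:

```sql
-- Illustrative document; @doc is a hypothetical session variable.
SET @doc = '{"user_basic_info": {"name": "張三3752028932291"}}';

-- JSON_EXTRACT returns a JSON string, so the value keeps its double quotes.
-- JSON_UNQUOTE strips them and yields a plain SQL string.
SELECT JSON_EXTRACT(@doc, '$.user_basic_info.name') AS as_json,
       JSON_UNQUOTE(JSON_EXTRACT(@doc, '$.user_basic_info.name')) AS as_text;
```

A varchar generated column built on JSON_EXTRACT alone stores the quoted form, so comparing it with the bare string '張三3752028932291' matches nothing.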
In addition, creating the STORED generated column doubled the disk usage outright, to 20 GB, while the MongoDB data files take 2.4 GB.
The correct approach is:
alter table m_test add names_virtual2 varchar(50) GENERATED ALWAYS AS (user_info->>"$.user_basic_info.name") VIRTUAL;
Query OK, 0 rows affected (0.09 sec)
mysql> show create table m_test\G
*************************** 1. row ***************************
Table: m_test
Create Table: CREATE TABLE `m_test` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_info` json DEFAULT NULL,
`names_virtual2` varchar(50) GENERATED ALWAYS AS (json_unquote(json_extract(`user_info`,'$.user_basic_info.name'))) VIRTUAL,
PRIMARY KEY (`id`),
KEY `idx_name` (`names_virtual2`)
) ENGINE=InnoDB AUTO_INCREMENT=21039616 DEFAULT CHARSET=utf8
With this definition the query returns the correct row, and the query time is also 0 s.
mysql> select * from m_test where names_virtual2='張三3752028932291'\G
*************************** 1. row ***************************
id: 35213
user_info: {"educations": [{"school": "學校A", "company": "學校3752028932291", "date_range": "1995-1998"}, {"school": "學校B", "company": "學校37520289322913752028932291", "date_range": "1995-1998"}], "work_exprs": [{"company": "公司3752028932291", "date_range": "2001-2003"}, {"company": "公司37520289322913752028932291", "date_range": "2003-2004"}], "user_basic_info": {"age": "3752028932291", "name": "張三3752028932291", "address": {"city": "上海", "province": "上海"}}}
names_virtual2: 張三3752028932291
1 row in set (0.00 sec)
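To confirm that the lookup is served by the index rather than a full scan, one can inspect the query plan. This is a sketch assuming the table definition shown above; the transcript does not show how idx_name was created, so the ADD INDEX statement here is an assumption about that step:

```sql
-- Assumed step: index the generated column (skip if idx_name already exists,
-- otherwise MySQL reports a duplicate key name).
ALTER TABLE m_test ADD INDEX idx_name (names_virtual2);

-- The `key` column of the plan should show idx_name if the index is chosen.
EXPLAIN SELECT * FROM m_test WHERE names_virtual2 = '張三3752028932291';
```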
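Based on the tests above, at 20 million records an indexed single-record lookup on a JSON field is equally fast in MySQL and MongoDB.
MySQL uses 15 GB of disk, far more than MongoDB's 2.4 GB.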
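The MySQL disk figure can be cross-checked from inside the server. A rough sketch using information_schema, assuming the session's current database is the one holding m_test; for InnoDB these values are estimates, and the data_gb and index_gb aliases are illustrative:

```sql
-- Approximate on-disk size of the table data and its indexes, in GB.
SELECT table_name,
       ROUND(data_length / 1024 / 1024 / 1024, 2)  AS data_gb,
       ROUND(index_length / 1024 / 1024 / 1024, 2) AS index_gb
FROM information_schema.tables
WHERE table_schema = DATABASE()
  AND table_name = 'm_test';
```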