Converting a JSON Array into Rows in MySQL

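1. Background

A MySQL table has a string-typed column that stores an array in JSON format.
Because the length of a single MySQL column is limited, a JSON array that grows too long overflows the column and raises a length error, so the column needs to be split out into a separate MySQL table.

This document applies to cases where the maximum length of the JSON array is known. In most cases that maximum can be derived from the declared length of the string column.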
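
As a rough sketch of that derivation, the maximum element count is the column width divided by the space one serialized element takes. The figures used here are assumptions borrowed from the example later in this document: a VARCHAR(1024) column and about 31 characters per element once the quotes and separator are counted.

```sql
-- Assumed figures: 1024 = column width, 31 = characters per serialized element.
SELECT FLOOR(1024 / 31) AS max_elements;  -- 33, so subscripts 0 to 32 cover the array
```

If the serialized elements are shorter than assumed, more of them fit in the column, so the estimate should use the smallest size an element can have.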

2. Basics

Starting with version 5.7 (5.7.8, to be precise), MySQL provides JSON functions that can parse strings as JSON. Two of them are used here: JSON_EXTRACT and JSON_UNQUOTE.

  • JSON_EXTRACT(json_doc, path[, path] ...)

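    Parses the JSON document in json_doc and returns the data selected by the path arguments. The result is NULL if any argument is NULL or if no path locates a value in the document. An error is raised if json_doc is not a valid JSON document or if any path is not a valid path expression.
    If the path arguments can match more than one value (several paths, or a single path containing a wildcard), the matched values are automatically wrapped in an array, in the order of the paths that produced them. Otherwise the result is the single matched value.

    Examples: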

    mysql> SELECT JSON_EXTRACT('[10, 20, [30, 40]]', '$[1]');
    +--------------------------------------------+
    | JSON_EXTRACT('[10, 20, [30, 40]]', '$[1]') |
    +--------------------------------------------+
    | 20                                         |
    +--------------------------------------------+
    mysql> SELECT JSON_EXTRACT('[10, 20, [30, 40]]', '$[1]', '$[0]');
    +----------------------------------------------------+
    | JSON_EXTRACT('[10, 20, [30, 40]]', '$[1]', '$[0]') |
    +----------------------------------------------------+
    | [20, 10]                                           |
    +----------------------------------------------------+
    mysql> SELECT JSON_EXTRACT('[10, 20, [30, 40]]', '$[2][*]');
    +-----------------------------------------------+
    | JSON_EXTRACT('[10, 20, [30, 40]]', '$[2][*]') |
    +-----------------------------------------------+
    | [30, 40]                                      |
    +-----------------------------------------------+
    
  • JSON_UNQUOTE(json_val)

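    Unquotes a JSON value and returns the result as a utf8mb4 string. Returns NULL if the argument is NULL.

    For ordinary strings, the function simply strips the surrounding double quotes. For strings containing escape sequences, the result depends on sql_mode. This document does not cover that in detail.

    Examples:

    • Ordinary strings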
    mysql> SET @j = '"abc"';
    mysql> SELECT @j, JSON_UNQUOTE(@j);
    +-------+------------------+
    | @j    | JSON_UNQUOTE(@j) |
    +-------+------------------+
    | "abc" | abc              |
    +-------+------------------+
    mysql> SET @j = '[1, 2, 3]';
    mysql> SELECT @j, JSON_UNQUOTE(@j);
    +-----------+------------------+
    | @j        | JSON_UNQUOTE(@j) |
    +-----------+------------------+
    | [1, 2, 3] | [1, 2, 3]        |
    +-----------+------------------+
    
    • Strings with escape sequences
    mysql> SELECT @@sql_mode;
    +------------+
    | @@sql_mode |
    +------------+
    |            |
    +------------+
    
    mysql> SELECT JSON_UNQUOTE('"\\t\\u0032"');
    +------------------------------+
    | JSON_UNQUOTE('"\\t\\u0032"') |
    +------------------------------+
    |       2                           |
    +------------------------------+
    
    mysql> SET @@sql_mode = 'NO_BACKSLASH_ESCAPES';
    mysql> SELECT JSON_UNQUOTE('"\\t\\u0032"');
    +------------------------------+
    | JSON_UNQUOTE('"\\t\\u0032"') |
    +------------------------------+
    | \t\u0032                     |
    +------------------------------+
    
    mysql> SELECT JSON_UNQUOTE('"\t\u0032"');
    +----------------------------+
    | JSON_UNQUOTE('"\t\u0032"') |
    +----------------------------+
    |       2                         |
    +----------------------------+
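
The two functions are typically combined, which is also how they are used in the rest of this document. The following is a minimal sketch; the variable name @doc is chosen here for illustration only, and the array holds the sample IDs used later in this document.

```sql
SET @doc = '["20200520_072820_00012_syrpv", "20200520_072820_00013_syrpv"]';

SELECT
    -- A JSON string, still wrapped in double quotes
    JSON_EXTRACT(@doc, '$[0]')               AS extracted,
    -- The plain string, with the double quotes stripped
    JSON_UNQUOTE(JSON_EXTRACT(@doc, '$[0]')) AS unquoted,
    -- No element at subscript 5, so the result is NULL
    JSON_EXTRACT(@doc, '$[5]')               AS out_of_range;
```

The NULL returned for a subscript past the end of the array is what makes it possible to probe a fixed range of subscripts and discard the misses.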
    

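3. Implementation

To convert a JSON array into rows, every element of the array has to be visited.

  • Enumerate the subscripts and join them with the JSON array to obtain every array element.
  • Filter out the empty (NULL) results.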
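
The two steps can be sketched on a literal array before touching any real table. The derived-table names d and indexes, the column name doc, and the subscript range 0 to 3 are illustrative assumptions.

```sql
SELECT
    idx,
    JSON_UNQUOTE(JSON_EXTRACT(doc, CONCAT('$[', idx, ']'))) AS element
FROM (SELECT '["a", "b"]' AS doc) AS d
-- Step 1: enumerate candidate subscripts and pair each one with the array
JOIN (
    SELECT 0 AS idx UNION ALL
    SELECT 1 UNION ALL
    SELECT 2 UNION ALL
    SELECT 3
) AS indexes
-- Step 2: subscripts past the end of the array yield NULL and are dropped
WHERE JSON_EXTRACT(doc, CONCAT('$[', idx, ']')) IS NOT NULL
ORDER BY idx;
```

Only subscripts 0 and 1 survive the filter, giving one row per array element.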

3.1 Data preparation

-- Create the source table
CREATE TABLE `application_info` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'Primary key',
  `application_id` varchar(100) NOT NULL COMMENT 'Task ID on the execution engine, e.g. a Presto job ID or a YARN applicationId',
  `query_id_str` VARCHAR(1024) COMMENT 'JSON array',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Job information table';

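-- Insert sample data (the column list is required because `id` is auto-generated)
INSERT INTO `application_info` (`application_id`, `query_id_str`)
VALUES ('application_01', '["20200520_072820_00012_syrpv","20200520_072820_00013_syrpv"]');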

-- Create the split table
CREATE TABLE `application_job_id_of_engine` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'Auto-increment primary key',
  `application_info_id` bigint(20) NOT NULL COMMENT 'Primary key of the task',
  `application_id` varchar(100) NOT NULL COMMENT 'Task ID. Redundant column that makes it quick to look up the DS task ID when troubleshooting',
  `job_id` varchar(100) NOT NULL COMMENT 'Unique identifier of the task in the execution engine. Presto - query_id; YARN - application_id',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Unique identifier of the task in the execution engine. Presto - query_id; YARN - application_id';

3.2 Migrating the data

INSERT INTO application_job_id_of_engine (application_info_id ,application_id , job_id )
SELECT
    id,
    application_id,
    JSON_UNQUOTE(JSON_EXTRACT(query_id_str , CONCAT('$[', idx, ']'))) AS query_id
FROM application_info
-- Inline derived table that generates the subscripts of the JSON array
JOIN (
    SELECT  0  AS idx UNION
    SELECT  1  AS idx UNION
    SELECT  2  AS idx UNION
    SELECT  3  AS idx UNION
    SELECT  4  AS idx UNION
    SELECT  5  AS idx UNION
    SELECT  6  AS idx UNION
    SELECT  7  AS idx UNION
    SELECT  8  AS idx UNION
    SELECT  9  AS idx UNION
    SELECT  10 AS idx UNION
    SELECT  11 AS idx UNION
    SELECT  12 AS idx UNION
    SELECT  13 AS idx UNION
    SELECT  14 AS idx UNION
    SELECT  15 AS idx UNION
    SELECT  16 AS idx UNION
    SELECT  17 AS idx UNION
    SELECT  18 AS idx UNION
    SELECT  19 AS idx UNION
    SELECT  20 AS idx UNION
    SELECT  21 AS idx UNION
    SELECT  22 AS idx UNION
    SELECT  23 AS idx UNION
    SELECT  24 AS idx UNION
    SELECT  25 AS idx UNION
    SELECT  26 AS idx UNION
    SELECT  27 AS idx UNION
    SELECT  28 AS idx UNION
    SELECT  29 AS idx UNION
    SELECT  30 AS idx UNION
    SELECT  31 AS idx UNION
    SELECT  32 AS idx
    -- query_id_str (VARCHAR(1024)) holds at most 33 query_ids at 31 characters per entry
) AS indexes
-- Filter out subscripts that have no element
WHERE JSON_EXTRACT(query_id_str, CONCAT('$[', idx, ']')) IS NOT NULL
ORDER BY id;
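
After the migration it is worth checking that no element was missed, for example because an array turned out to be longer than the enumerated subscripts. The query below is a minimal sketch of such a check; the aliases ai, j, expected_rows and migrated_rows are illustrative, and it assumes that every non-NULL query_id_str is valid JSON, since JSON_LENGTH raises an error otherwise.

```sql
-- List source rows whose array length differs from the number of migrated rows.
SELECT
    ai.id,
    JSON_LENGTH(ai.query_id_str) AS expected_rows,
    COUNT(j.id)                  AS migrated_rows
FROM application_info AS ai
LEFT JOIN application_job_id_of_engine AS j
       ON j.application_info_id = ai.id
GROUP BY ai.id, ai.query_id_str
HAVING expected_rows <> migrated_rows;
```

An empty result means every array element ended up as a row in application_job_id_of_engine.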