Tech Share | Various Uses of MySQL LOAD DATA

Author: 餘振興 (Yu Zhenxing)

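Member of the Actionsky (愛可生) DBA team. Familiar with Oracle, MySQL, MongoDB and Redis, and currently digging into TiDB; good at architecture design, fault diagnosis, data migration and disaster-recovery setup. Handles issues in customers' MySQL environments and in the day-to-day operation of the company's in-house DMP database management platform. Enjoys sharing technical knowledge and writing technical documentation.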

Source: original submission

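* Produced by the Actionsky Open Source Community (愛可生開源社區). Original content may not be used without authorization; to republish, please contact the editor and credit the source.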

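Contents

I. LOAD Background

II. Basic LOAD Parameters

III. LOAD Sample Data and Table Structure

IV. LOAD Scenario Examples

Scenario 1. The LOAD file has more fields than the table
Scenario 2. The LOAD file has fewer fields than the table
Scenario 3. Generating custom column data during LOAD
Scenario 4. Loading fixed-length data

V. LOAD Summary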

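I. LOAD Background

Database operations inevitably involve processing text data and importing it into a database. This article collects several common import and export scenarios and demonstrates each one with an example.

II. Basic LOAD Parameters

All later examples use CSV sample data exported with the command below (comma , as the field separator and double quote " as the enclosing character).

-- Basic export parameters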
select * into outfile '/data/mysql/3306/tmp/employees.txt'
character set utf8mb4
fields terminated by ','
enclosed by '"'
lines terminated by '\n'
from employees.employees limit 10;

-- Basic import parameters
load data infile '/data/mysql/3306/tmp/employees.txt'
replace into table demo.emp
character set utf8mb4
fields terminated by ','
enclosed by '"'
lines terminated by '\n'
...
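To make the FIELDS and LINES options concrete, here is a rough Python analogue that reads two lines in this format with the standard csv module. It is a sketch under a stated assumption, not how MySQL parses the file: it only agrees with MySQL for simple values with no embedded quotes, backslashes, line breaks or NULLs, because MySQL marks such characters with its ESCAPED BY prefix (a backslash by default) whereas the csv module doubles quotes.

```python
import csv
import io

# Two sample lines in the format written by the SELECT ... INTO OUTFILE above:
# fields terminated by ',', enclosed by '"', lines terminated by '\n'.
SAMPLE = (
    '"10001","1953-09-02","Georgi","Facello","M","1986-06-26"\n'
    '"10002","1964-06-02","Bezalel","Simmel","F","1985-11-21"\n'
)


def read_enclosed_csv(text):
    """Split text into rows of fields and strip the enclosing double quotes."""
    # Assumption: values contain no '"', backslash, line break or NULL marker.
    # MySQL would write those with its ESCAPED BY prefix (a backslash by
    # default), which the csv module does not understand.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=",", quotechar='"')
    return [row for row in reader]


rows = read_enclosed_csv(SAMPLE)
print(rows[0])  # ['10001', '1953-09-02', 'Georgi', 'Facello', 'M', '1986-06-26']
```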

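III. LOAD Sample Data and Table Structure

Below are the sample data, the table structure, and how the two map to each other.

-- Contents of the exported data file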
[root@10-186-61-162 tmp]# cat employees.txt
"10001","1953-09-02","Georgi","Facello","M","1986-06-26"
"10002","1964-06-02","Bezalel","Simmel","F","1985-11-21"
"10003","1959-12-03","Parto","Bamford","M","1986-08-28"
"10004","1954-05-01","Chirstian","Koblick","M","1986-12-01"
"10005","1955-01-21","Kyoichi","Maliniak","M","1989-09-12"
"10006","1953-04-20","Anneke","Preusig","F","1989-06-02"
"10007","1957-05-23","Tzvetan","Zielinski","F","1989-02-10"
"10008","1958-02-19","Saniya","Kalloufi","M","1994-09-15"
"10009","1952-04-19","Sumant","Peac","F","1985-02-18"
"10010","1963-06-01","Duangkaew","Piveteau","F","1989-08-24"

-- Sample table structure
SQL > desc demo.emp;
+-------------+---------------+------+-----+---------+-------+
| Field       | Type          | Null | Key | Default | Extra |
+-------------+---------------+------+-----+---------+-------+
| emp_no      | int           | NO   | PRI | NULL    |       |
| birth_date  | date          | NO   |     | NULL    |       |
| first_name  | varchar(16)   | NO   |     | NULL    |       |
| last_name   | varchar(16)   | NO   |     | NULL    |       |
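| fullname    | varchar(32)   | YES  |     | NULL    |       | -- new column in the table; not present in the exported data file
| gender      | enum('M','F') | NO   |     | NULL    |       |
| hire_date   | date          | NO   |     | NULL    |       |
| modify_date | datetime      | YES  |     | NULL    |       | -- new column in the table; not present in the exported data file
| delete_flag | char(1)       | YES  |     | NULL    |       | -- new column in the table; not present in the exported data file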
+-------------+---------------+------+-----+---------+-------+

-- Mapping between the exported data and the table columns
emp_no   birth_date    first_name   last_name    gender hire_date
"10001"  "1953-09-02"  "Georgi"     "Facello"    "M"    "1986-06-26"
"10002"  "1964-06-02"  "Bezalel"    "Simmel"     "F"    "1985-11-21"
"10003"  "1959-12-03"  "Parto"      "Bamford"    "M"    "1986-08-28"
"10004"  "1954-05-01"  "Chirstian"  "Koblick"    "M"    "1986-12-01"
"10005"  "1955-01-21"  "Kyoichi"    "Maliniak"   "M"    "1989-09-12"
"10006"  "1953-04-20"  "Anneke"     "Preusig"    "F"    "1989-06-02"
"10007"  "1957-05-23"  "Tzvetan"    "Zielinski"  "F"    "1989-02-10"
"10008"  "1958-02-19"  "Saniya"     "Kalloufi"   "M"    "1994-09-15"
"10009"  "1952-04-19"  "Sumant"     "Peac"       "F"    "1985-02-18"
"10010"  "1963-06-01"  "Duangkaew"  "Piveteau"   "F"    "1989-08-24"

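IV. LOAD Scenario Examples

Scenario 1. The LOAD file has more fields than the table

Only part of the data in the text file needs to be imported into the table.

-- Temporarily create a table with 2 columns (where 1=0 copies only the column definitions, not the rows)
SQL > create table emp_tmp select emp_no,hire_date from emp where 1=0;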
SQL > desc emp_tmp;
+-----------+------+------+-----+---------+-------+
| Field     | Type | Null | Key | Default | Extra |
+-----------+------+------+-----+---------+-------+
| emp_no    | int  | NO   |     | NULL    |       |
| hire_date | date | NO   |     | NULL    |       |
+-----------+------+------+-----+---------+-------+

-- Import statement
load data infile '/data/mysql/3306/tmp/employees.txt'
replace into table demo.emp_tmp
character set utf8mb4
fields terminated by ','
enclosed by '"'
lines terminated by '\n'
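(@C1,@C2,@C3,@C4,@C5,@C6) -- these correspond to the 6 columns of data in employees.txt
-- map only 2 of the exported columns to table columns; the order of these independent assignments does not affect the result
set hire_date=@C6,
    emp_no=@C1;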

-- 導入數據結果示例
SQL > select * from emp_tmp;
+--------+------------+
| emp_no | hire_date  |
+--------+------------+
|  10001 | 1986-06-26 |
|  10002 | 1985-11-21 |
|  10003 | 1986-08-28 |
|  10004 | 1986-12-01 |
|  10005 | 1989-09-12 |
|  10006 | 1989-06-02 |
|  10007 | 1989-02-10 |
|  10008 | 1994-09-15 |
|  10009 | 1985-02-18 |
|  10010 | 1989-08-24 |
+--------+------------+
10 rows in set (0.0016 sec)
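The effect of the user-variable list can be modeled in a few lines of Python. The helper load_subset below is hypothetical: it numbers the fields of one line C1..C6 (standing in for @C1..@C6), keeps only the variables named in the mapping, and shows that reordering independent assignments does not change the resulting row.

```python
# Hypothetical helper that mimics Scenario 1: every field of a line is captured
# into C1..C6 (like @C1..@C6), and only the variables named in SET reach the table.
def load_subset(fields, mapping):
    """fields: parsed columns of one line; mapping: {table_column: variable_name}."""
    variables = {f"C{i}": value for i, value in enumerate(fields, start=1)}
    # Fields that no assignment refers to (here C2..C5) are simply discarded.
    return {column: variables[var] for column, var in mapping.items()}


line = ["10001", "1953-09-02", "Georgi", "Facello", "M", "1986-06-26"]
row_a = load_subset(line, {"hire_date": "C6", "emp_no": "C1"})
row_b = load_subset(line, {"emp_no": "C1", "hire_date": "C6"})
print(row_a == row_b)  # True
```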

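Scenario 2. The LOAD file has fewer fields than the table

The table has columns for all the data in the text file, plus some additional columns.

-- Import statement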
load data infile '/data/mysql/3306/tmp/employees.txt'
replace into table demo.emp
character set utf8mb4
fields terminated by ','
enclosed by '"'
lines terminated by '\n'
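(@C1,@C2,@C3,@C4,@C5,@C6) -- these correspond to the 6 columns of data in employees.txt
-- map the file's fields to table columns; the table's extra columns are not assigned and take their default (NULL)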
set emp_no=@C1,
   birth_date=@C2,
   first_name=@C3,
   last_name=@C4,
   gender=@C5,
   hire_date=@C6;
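A similar hypothetical sketch for Scenario 2: build_row starts every emp column at None and then fills only the mapped columns, which is why fullname, modify_date and delete_flag stay empty. Modeling the default as None is an assumption that holds here only because those three columns are nullable with a NULL default.

```python
EMP_COLUMNS = ["emp_no", "birth_date", "first_name", "last_name", "fullname",
               "gender", "hire_date", "modify_date", "delete_flag"]


def build_row(fields, mapping, columns=EMP_COLUMNS):
    """Assign mapped variables to columns; every other column keeps its default.

    Assumption: the default is modeled as None (SQL NULL), which matches the
    three extra emp columns because they are nullable with DEFAULT NULL.
    """
    variables = {f"C{i}": value for i, value in enumerate(fields, start=1)}
    row = dict.fromkeys(columns)  # every column starts as None
    for column, var in mapping.items():
        row[column] = variables[var]
    return row


SCENARIO_2_SET = {"emp_no": "C1", "birth_date": "C2", "first_name": "C3",
                  "last_name": "C4", "gender": "C5", "hire_date": "C6"}
row = build_row(["10001", "1953-09-02", "Georgi", "Facello", "M", "1986-06-26"],
                SCENARIO_2_SET)
unset = [col for col, value in row.items() if value is None]
print(unset)  # ['fullname', 'modify_date', 'delete_flag']
```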

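Scenario 3. Generating custom column data during LOAD

Scenario 2 shows that the new emp columns fullname, modify_date and delete_flag are not handled during the import and end up as NULL. To populate them, define their data during LOAD with functions supported by MySQL or with fixed values. Fields that do exist in the file can be passed through functions as well, so combining export and import gives you simple ETL functionality, as shown below:

-- Import statement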
load data infile '/data/mysql/3306/tmp/employees.txt'
replace into table demo.emp
character set utf8mb4
fields terminated by ','
enclosed by '"'
lines terminated by '\n'
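(@C1,@C2,@C3,@C4,@C5,@C6)              -- these correspond to the 6 columns of data in employees.txt

-- The part below explicitly maps table columns to fields in the data file; values missing from the file are generated with functions (fixed values also work)
set emp_no=@C1,
   birth_date=@C2,
   first_name=upper(@C3),              -- convert the imported value to uppercase
   last_name=lower(@C4),               -- convert the imported value to lowercase
   fullname=concat(first_name,' ',last_name),    -- concatenate the first_name and last_name values assigned above
   gender=@C5,
   hire_date=@C6,
   modify_date=now(),                 -- current timestamp
   delete_flag=if(hire_date<'1988-01-01','Y','N'); -- derive the value conditionally from another column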
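The SET clause above relies on assignments being applied in order, so that fullname reads the already converted first_name and last_name, and delete_flag reads the freshly assigned hire_date. The Python sketch below mirrors that left-to-right evaluation for one line; apply_set_clause is a hypothetical name, and comparing ISO date strings stands in for MySQL's DATE comparison. MySQL's now() returns the time the statement started, so every row of one LOAD gets the same modify_date; the sketch lets the caller pass that timestamp in.

```python
from datetime import datetime


def apply_set_clause(fields, now=None):
    """Sketch of Scenario 3's SET clause for one parsed line (@C1..@C6)."""
    c1, c2, c3, c4, c5, c6 = fields
    row = {}
    # Assignments run in SET order, so later expressions see the values
    # already stored by earlier ones (fullname and delete_flag depend on this).
    row["emp_no"] = c1
    row["birth_date"] = c2
    row["first_name"] = c3.upper()                               # upper(@C3)
    row["last_name"] = c4.lower()                                # lower(@C4)
    row["fullname"] = f'{row["first_name"]} {row["last_name"]}'  # concat(first_name,' ',last_name)
    row["gender"] = c5
    row["hire_date"] = c6
    # now(): one timestamp per statement in MySQL, so accept it from the caller.
    row["modify_date"] = now if now is not None else datetime.now()
    # 'YYYY-MM-DD' strings compare in date order.
    row["delete_flag"] = "Y" if row["hire_date"] < "1988-01-01" else "N"
    return row


stamp = datetime(2021, 1, 1)
georgi = apply_set_clause(["10001", "1953-09-02", "Georgi", "Facello", "M", "1986-06-26"], stamp)
kyoichi = apply_set_clause(["10005", "1955-01-21", "Kyoichi", "Maliniak", "M", "1989-09-12"], stamp)
print(georgi["fullname"], georgi["delete_flag"])    # GEORGI facello Y
print(kyoichi["fullname"], kyoichi["delete_flag"])  # KYOICHI maliniak N
```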

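Scenario 4. Loading fixed-length data

In fixed-length data every column occupies a fixed number of characters, as shown below, so each column's value can be produced by using a function to extract a fixed-length substring.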

SQL > select 
    c1 as sample_data,
    substr(c1,1,3)  as c1,
    substr(c1,4,3)  as c2,
    substr(c1,7,2)  as c3,
    substr(c1,9,5)  as c4,
    substr(c1,14,3) as c5,
    substr(c1,17,3) as c6 from t1\G
    
*************************** 1. row ***************************
sample_data: ABC餘振興CDMySQLEFG數據庫
         c1: ABC
         c2: 餘振興
         c3: CD
         c4: MySQL
         c5: EFG
         c6: 數據庫
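SUBSTR counts positions in characters, not bytes, which is why the three-character Chinese name occupies positions 4 to 6 even though it takes 9 bytes in utf8mb4. Python string slicing also counts characters, so a small hypothetical helper reproduces the query; it covers positive positions only, whereas MySQL also accepts negative ones that count from the end.

```python
def mysql_substr(s, pos, length):
    """Mimic MySQL SUBSTR(str, pos, len) for pos >= 1: 1-based, counted in characters."""
    start = pos - 1
    return s[start:start + length]


sample = "ABC餘振興CDMySQLEFG數據庫"
spec = [(1, 3), (4, 3), (7, 2), (9, 5), (14, 3), (17, 3)]  # (pos, len) pairs from the query
parts = [mysql_substr(sample, pos, length) for pos, length in spec]
print(parts)                                     # ['ABC', '餘振興', 'CD', 'MySQL', 'EFG', '數據庫']
print(len(sample), len(sample.encode("utf-8")))  # 19 31
```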

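Importing fixed-length data requires knowing exactly how many characters each column occupies. As sample input, the following query generates fixed-length data from the existing table by padding each value with spaces using rpad.

-- Generate fixed-length data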
SQL > select 
    concat(rpad(emp_no,10,' '),
          rpad(birth_date,19,' '),
          rpad(first_name,14,' '),
          rpad(last_name,16,' '),
          rpad(gender,2,' '),
          rpad(hire_date,19,' ')) as fixed_length_data 
      from employees.employees limit 10;

+----------------------------------------------------------------------------------+
| fixed_length_data                                                                |
+----------------------------------------------------------------------------------+
| 10001     1953-09-02         Georgi        Facello         M 1986-06-26          |
| 10002     1964-06-02         Bezalel       Simmel          F 1985-11-21          |
| 10003     1959-12-03         Parto         Bamford         M 1986-08-28          |
| 10004     1954-05-01         Chirstian     Koblick         M 1986-12-01          |
| 10005     1955-01-21         Kyoichi       Maliniak        M 1989-09-12          |
| 10006     1953-04-20         Anneke        Preusig         F 1989-06-02          |
| 10007     1957-05-23         Tzvetan       Zielinski       F 1989-02-10          |
| 10008     1958-02-19         Saniya        Kalloufi        M 1994-09-15          |
| 10009     1952-04-19         Sumant        Peac            F 1985-02-18          |
| 10010     1963-06-01         Duangkaew     Piveteau        F 1989-08-24          |
+----------------------------------------------------------------------------------+
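The widths above add up to an 80-character line. One detail worth knowing before choosing widths: MySQL's RPAD shortens a value that is longer than the target length, so a too-narrow column silently truncates data. A hypothetical Python equivalent for a single-character pad (the long test value is made up for illustration):

```python
WIDTHS = [10, 19, 14, 16, 2, 19]  # widths used in the rpad() calls above


def mysql_rpad(s, length, pad=" "):
    """Mimic MySQL RPAD with a one-character pad: pad on the right, or cut to length."""
    return s[:length].ljust(length, pad)


def to_fixed_length(values, widths=WIDTHS):
    """Concatenate rpad(value, width) for every column, like the concat(...) query."""
    return "".join(mysql_rpad(str(v), w) for v, w in zip(values, widths))


line = to_fixed_length(["10001", "1953-09-02", "Georgi", "Facello", "M", "1986-06-26"])
print(len(line))                          # 80
print(mysql_rpad("ABCDEFGHIJKLMNOP", 14))  # ABCDEFGHIJKLMN (silently truncated)
```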

-- Export fixed-length data
select 
    concat(rpad(emp_no,10,' '),
          rpad(birth_date,19,' '),
          rpad(first_name,14,' '),
          rpad(last_name,16,' '),
          rpad(gender,2,' '),
          rpad(hire_date,19,' ')) as fixed_length_data 
into outfile '/data/mysql/3306/tmp/employees_fixed.txt'
character set utf8mb4
lines terminated by '\n'
from employees.employees limit 10;

-- Sample of the exported data
[root@10-186-61-162 tmp]# cat employees_fixed.txt
10001     1953-09-02         Georgi        Facello         M 1986-06-26
10002     1964-06-02         Bezalel       Simmel          F 1985-11-21
10003     1959-12-03         Parto         Bamford         M 1986-08-28
10004     1954-05-01         Chirstian     Koblick         M 1986-12-01
10005     1955-01-21         Kyoichi       Maliniak        M 1989-09-12
10006     1953-04-20         Anneke        Preusig         F 1989-06-02
10007     1957-05-23         Tzvetan       Zielinski       F 1989-02-10
10008     1958-02-19         Saniya        Kalloufi        M 1994-09-15
10009     1952-04-19         Sumant        Peac            F 1985-02-18
10010     1963-06-01         Duangkaew     Piveteau        F 1989-08-24

-- Import fixed-length data
load data infile '/data/mysql/3306/tmp/employees_fixed.txt'
replace into table demo.emp
character set utf8mb4
fields terminated by ','  -- the fixed-length data contains no ',' or '"', so each whole line lands in @row
enclosed by '"'
lines terminated by '\n'
(@row)  -- treat the entire line as a single value
set emp_no   = trim(substr(@row,1,10)),   -- take the first 10 characters with substr and strip leading/trailing spaces
   birth_date = trim(substr(@row,11,19)),  -- the remaining columns follow the same pattern
   first_name = trim(substr(@row,30,14)),
   last_name  = trim(substr(@row,44,16)),
   fullname  = concat(first_name,' ',last_name),  -- concatenate first_name and last_name
   gender   = trim(substr(@row,60,2)),
   hire_date  = trim(substr(@row,62,19)),
   modify_date = now(),
   delete_flag = if(hire_date<'1988-01-01','Y','N'); -- derive the value conditionally from another column
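The start positions in the substr calls are cumulative sums of the rpad widths used to generate the file (10, 19, 14, 16, 2, 19). The Python sketch below derives them instead of hard-coding them and applies the same trim(substr(...)) logic to one line; parse_fixed and the constants are hypothetical names for illustration.

```python
from itertools import accumulate

WIDTHS = [10, 19, 14, 16, 2, 19]
COLUMNS = ["emp_no", "birth_date", "first_name", "last_name", "gender", "hire_date"]
# 1-based start positions for substr(@row, start, width): [1, 11, 30, 44, 60, 62]
STARTS = list(accumulate([1] + WIDTHS[:-1]))


def parse_fixed(row):
    """Emulate trim(substr(@row, start, width)) for each column of one line."""
    return {col: row[s - 1:s - 1 + w].strip(" ")  # TRIM() removes spaces by default
            for col, s, w in zip(COLUMNS, STARTS, WIDTHS)}


# One line of employees_fixed.txt, rebuilt here with the same widths.
values = ["10001", "1953-09-02", "Georgi", "Facello", "M", "1986-06-26"]
sample = "".join(v.ljust(w) for v, w in zip(values, WIDTHS))
print(STARTS)               # [1, 11, 30, 44, 60, 62]
print(parse_fixed(sample))
```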

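V. LOAD Summary

1. By default, LOAD reads the text file's fields from left to right and its rows from top to bottom, assigning the fields to the table's columns in order.

2. If the table structure does not match the text data, number the columns of the text file sequentially (for example @C1, @C2, ...) and map them explicitly to table columns, so that no data ends up in the wrong column.

3. For large text files, split the file by lines into several smaller files, for example with split -l, and load them one by one.

4. After an import, run the following statements to check whether it produced warnings or errors and how many rows were imported:

GET DIAGNOSTICS @p1=NUMBER,@p2=ROW_COUNT;  -- run immediately after LOAD DATA
select @p1 AS CONDITION_COUNT,@p2 as ROW_COUNT;  -- CONDITION_COUNT = number of warnings/errors/notes; list them with SHOW WARNINGS

5. If the text data differs greatly from the table structure, or the data needs cleansing and transformation, a dedicated ETL tool is still the better choice; alternatively, load the data into MySQL roughly as-is first and then process and transform it there.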
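Item 3 suggests splitting a large file by lines. split -l N does this from the shell; for anyone scripting the import, here is a hedged Python sketch of the same idea. Splitting on line boundaries keeps every record intact, so each chunk can be loaded with the same LOAD DATA statement. split_by_lines is a hypothetical helper, and the demo uses throwaway files in a temporary directory.

```python
import tempfile
from itertools import islice
from pathlib import Path


def split_by_lines(src, lines_per_chunk, out_dir):
    """Copy src into numbered chunk files holding at most lines_per_chunk whole lines.

    Like `split -l`, this never cuts a record in half.
    """
    src, out_dir = Path(src), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks = []
    with src.open("rb") as f:  # binary mode: lines end at b'\n', no decoding needed
        while True:
            lines = list(islice(f, lines_per_chunk))
            if not lines:
                break
            chunk = out_dir / f"{src.name}.part{len(chunks):04d}"
            chunk.write_bytes(b"".join(lines))
            chunks.append(chunk)
    return chunks


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "demo.txt"  # throwaway demo file
    src.write_bytes(b"".join(b"row %d\n" % i for i in range(10)))
    parts = split_by_lines(src, 4, Path(tmp) / "parts")
    part_sizes = [len(p.read_bytes().splitlines()) for p in parts]
    rejoined_ok = b"".join(p.read_bytes() for p in parts) == src.read_bytes()

print(part_sizes, rejoined_ok)  # [4, 4, 2] True
```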


This article was shared from the WeChat official account 愛可生開源社區 (ActiontechOSS).