Sqoop (pronounced like "scoop") is an open-source tool mainly used to transfer data between Hadoop (Hive) and traditional databases (MySQL, PostgreSQL, ...). It can import data from a relational database (e.g. MySQL, Oracle, Postgres) into HDFS, and it can also export data from HDFS back into a relational database.
Sqoop configuration is simple: just edit sqoop-env.sh under conf.
Here we only add the Hadoop and Hive paths.
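A minimal sqoop-env.sh might look like the sketch below; the /opt/cdh-5.5.0 install paths are assumptions (borrowed from the options-file example later in these notes), so adjust them to your cluster:

```shell
# conf/sqoop-env.sh -- example only; the install paths are assumptions
export HADOOP_COMMON_HOME=/opt/cdh-5.5.0/hadoop
export HADOOP_MAPRED_HOME=/opt/cdh-5.5.0/hadoop
export HIVE_HOME=/opt/cdh-5.5.0/hive
```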
View the available commands and options:
bin/sqoop help
1. List the databases in MySQL
bin/sqoop list-databases \
--connect jdbc:mysql://yourAddress:3306 \
--username root \
--password 1234
// prepare the data in MySQL
create table user(
id tinyint(4) not null auto_increment,
name varchar(255) default null,
passwd varchar(255) default null,
primary key (id)
);
insert into user values ('1','a','a');
insert into user values ('2','b','b');
insert into user values ('3','c','c');
insert into user values ('4','d','d');
insert into user values ('5','e','e');
// import the data
By default the output goes under the HDFS user directory (/user/<username>/<table>).
bin/sqoop import \
--connect jdbc:mysql://yourAddress:3306/yourDB \
--username root \
--password 1234 \
--table user
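One way to check the result from the Hadoop home directory (a sketch; the HDFS home /user/root and the default comma delimiter are assumptions):

```shell
# list the map-task output files under the default target directory
bin/hdfs dfs -ls /user/root/user
# print the first few imported rows (comma-delimited by default)
bin/hdfs dfs -text /user/root/user/part-m-00000 | head -n 5
```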
To import into a custom target directory instead:
bin/sqoop import \
--connect jdbc:mysql://yourAddress:3306/yourDB \
--username root \
--password 1234 \
--table user \
--target-dir /user/BPF/sqoop/imp_user \
--num-mappers 1
Sqoop is implemented on top of MapReduce; an import runs map tasks only (there is no reduce phase).
=============================================================================
Import as Parquet
bin/sqoop import \
--connect jdbc:mysql://yourAddress:3306/yourDB \
--username root \
--password 1234 \
--table user \
--target-dir /user/BPF/sqoop/imp_user_parquet \
--num-mappers 1 \
--as-parquetfile
Create a table in Hive to verify the data imported above:
create table default.user(
id int,
name string,
passwd string
)
stored as parquetfile;
load data inpath '/user/BPF/sqoop/imp_user_parquet' into table default.user;
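A quick sanity check in Hive after the load (a sketch; assumes the hive CLI is on the PATH):

```shell
# run a one-off query against the freshly loaded table
hive -e "select * from default.user limit 5;"
```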
In real projects the data usually needs preliminary cleansing first; use --query with a SQL statement:
bin/sqoop import \
--connect jdbc:mysql://yourAddress:3306/yourDB \
--username root \
--password 1234 \
--query 'select id, name from user where $CONDITIONS' \
--target-dir /user/BPF/sqoop/imp_user_query \
--num-mappers 1
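With --query and more than one mapper, Sqoop cannot partition the work on its own and needs a --split-by column; a sketch (splitting on the id column is an assumption):

```shell
bin/sqoop import \
--connect jdbc:mysql://yourAddress:3306/yourDB \
--username root \
--password 1234 \
--query 'select id, name from user where $CONDITIONS' \
--split-by id \
--target-dir /user/BPF/sqoop/imp_user_query2 \
--num-mappers 2
```

Note that the query is in single quotes so the shell does not expand $CONDITIONS; Sqoop replaces it with a range predicate for each mapper.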
=============================================================================
Snappy compression
bin/sqoop import \
--connect jdbc:mysql://yourAddress:3306/yourDB \
--username root \
--password 1234 \
--table user \
--delete-target-dir \
--target-dir /user/BPF/sqoop/imp_user_compress \
--num-mappers 1 \
--compress \
--compression-codec org.apache.hadoop.io.compress.SnappyCodec \
--fields-terminated-by '\t'
Test the Snappy-compressed data:
create table default.user_snappy(
id int,
name string,
passwd string
)
row format delimited fields terminated by '\t';
load data inpath '/user/BPF/sqoop/imp_user_compress' into table default.user_snappy;
MySQL ----> extract data (optionally compressed, e.g. Snappy) ----> HDFS ----> load data ----> Hive table ----> query and analysis
3. Export data from HDFS to MySQL
bin/sqoop export \
--connect jdbc:mysql://yourAddress:3306/yourDB \
--username root \
--password 1234 \
--table user \
--export-dir /user/BPF/sqoop/exp/user \
--num-mappers 1
Import from MySQL directly into a Hive table:
bin/sqoop import \
--connect jdbc:mysql://yourAddress:3306/yourDB \
--username root \
--password 1234 \
--table user \
--fields-terminated-by '\t' \
--delete-target-dir \
--num-mappers 1 \
--hive-database default \
--hive-import \
--hive-table user_hive
Create the target table in MySQL:
create table user2(
id tinyint(4) not null auto_increment,
name varchar(255) default null,
passwd varchar(255) default null,
primary key (id)
);
Export the Hive table's data to MySQL:
bin/sqoop export \
--connect jdbc:mysql://yourAddress:3306/yourDB \
--username root \
--password 1234 \
--table user2 \
--export-dir /user/hive/warehouse/user_hive \
--num-mappers 1 \
--input-fields-terminated-by '\t'
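To confirm the export landed, count the rows on the MySQL side (a sketch; credentials and database name as in the examples above):

```shell
# expect one row per record exported from the Hive warehouse directory
mysql -uroot -p1234 -e "select count(*) from yourDB.user2;"
```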
Sqoop can also read its arguments from an options file:
sqoop --options-file /opt/cdh-5.5.0/datas/sqoop_import-hdfs.txt
Contents of sqoop_import-hdfs.txt:
import
--connect
jdbc:mysql://yourAddress:3306/yourDB
--username
root
--password
1234
--table
user
--target-dir
/user/BPF/sqoop/imp_user_option
--num-mappers
1