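Data Warehouse 08: Installing and Configuring MySQL and Sqoop (installing MySQL and Sqoop on Linux, the business-data generator jar and its configuration file, the mysql_to_hdfs sync script)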


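Welcome to the author's personal tech blog: http://rukihuang.xyz/
The study videos come from 尚硅谷 (Atguigu). Video series: Atguigu big data project, data warehouse, e-commerce data warehouse V1.2 (new edition). Respect!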

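1 Installing MySQL

  • Both uninstalling and installing must be done as the root user.

1.1 Preparing the installation package

  1. Check whether MySQL is already installed; if it is, uninstall it first (as root)
#Check
rpm -qa|grep mysql
#Uninstall
rpm -e --nodeps mysql-libs-5.1.73-7.el6.x86_64
  2. Unzip mysql-libs.zip into the /opt/software directory
unzip mysql-libs.zip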
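
The check in step 1 uses a case-sensitive grep, so it would not list packages named MySQL-server or MySQL-client from an earlier installation. A minimal sketch of a case-insensitive filter follows; the function name filter_mysql_packages and the sample package list are illustrative assumptions, not part of the original steps.

```shell
#!/bin/bash
# Hypothetical helper: keep only package names that mention MySQL,
# ignoring case so that both "mysql-libs" and "MySQL-server" match.
filter_mysql_packages() {
    grep -i 'mysql' || true   # an empty result is not treated as an error
}

# Sample input standing in for the output of `rpm -qa`.
sample_packages='bash-4.1.2-15.el6.x86_64
mysql-libs-5.1.73-7.el6.x86_64
MySQL-server-5.6.24-1.el6.x86_64'

# Prints the two MySQL-related package names.
printf '%s\n' "$sample_packages" | filter_mysql_packages
```

On a real host the input would come from the package manager, for example rpm -qa | filter_mysql_packages.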

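1.2 Installing the MySQL server

  1. Install the MySQL server (from the /opt/software/mysql-libs directory)
    1. If the installation reports an error (it did for me), see this article: https://blog.csdn.net/qq_42191775/article/details/103939104
rpm -ivh MySQL-server-5.6.24-1.el6.x86_64.rpm
  2. View the generated random password (needed later to change the password)
cat /root/.mysql_secret
  3. Check the MySQL status
service mysql status
  4. Start MySQL
service mysql start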
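
To avoid copying the random password by hand, it can be pulled out of the secret file. The sketch below assumes that the password is the last whitespace-separated field on the last non-empty line of /root/.mysql_secret; verify this against the actual file first. The function name and the demo file contents are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print the last whitespace-separated field of the
# last non-empty line of the given file.
extract_random_password() {
    awk 'NF { last = $NF } END { if (last != "") print last }' "$1"
}

# Demo against a temporary file that imitates the assumed format.
demo_secret=$(mktemp)
echo '# The random password set for the root user at Mon Mar 16 10:00:00 2020 (local time): Ab3dEfGh1jKlMn0p' > "$demo_secret"
extract_random_password "$demo_secret"
rm -f "$demo_secret"
```

On the server itself the call would be extract_random_password /root/.mysql_secret, run as root.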

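1.3 Installing the MySQL client

  1. Install the MySQL client (from the /opt/software/mysql-libs directory)
rpm -ivh MySQL-client-5.6.24-1.el6.x86_64.rpm
  2. Connect to MySQL
mysql -uroot -p[random password from step 2 of section 1.2]
  3. Change the password
set password=password('root');
  4. Exit
exit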

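1.4 Host configuration in MySQL (the user table)

Configure MySQL so that the root user, with its password, can log in to the database from any host.

  1. Log in to MySQL
mysql -uroot -proot
  2. List the databases
show databases;
  3. Switch to the mysql database
use mysql;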

  4. List all tables in the mysql database
show tables;

  5. Query the user table
select User, Host, Password from user;
  6. Update the user table, changing the Host column from localhost to %
update user set host='%' where host='localhost';
  7. Delete the root user's other Host entries
delete from user where Host='hadoop102';
delete from user where Host='127.0.0.1';
delete from user where Host='::1';
  8. Flush the privileges
flush privileges;

  9. Exit
quit
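
The interactive steps in this section could also be collected into one batch of statements. The sketch below only prints the SQL. Unlike the steps above, it restricts every statement to User='root', which is an assumption about the intent; the function name root_any_host_sql is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the statements from the steps above.
# Arguments are the extra Host values whose root entries should be deleted.
root_any_host_sql() {
    local host
    echo "USE mysql;"
    echo "UPDATE user SET Host='%' WHERE User='root' AND Host='localhost';"
    for host in "$@"; do
        echo "DELETE FROM user WHERE User='root' AND Host='${host}';"
    done
    echo "FLUSH PRIVILEGES;"
}

# Print the batch for the hosts used in this article.
root_any_host_sql hadoop102 127.0.0.1 ::1
```

After reviewing the printed statements, they could be piped into the client, for example root_any_host_sql hadoop102 127.0.0.1 ::1 | mysql -uroot -p.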

2 Installing Sqoop

2.1 Installing Sqoop

  1. Upload the package sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz to /opt/software on hadoop102
  2. Extract the package into /opt/module
tar -zxf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz -C /opt/module/
  3. Rename the folder (in /opt/module)
mv sqoop-1.4.6.bin__hadoop-2.0.4-alpha/ sqoop

2.2 Editing the configuration file

  1. Go to the /opt/module/sqoop/conf directory and rename the configuration template
mv sqoop-env-template.sh sqoop-env.sh
  2. Edit sqoop-env.sh
#Add the following lines
export HADOOP_COMMON_HOME=/opt/module/hadoop-2.7.2
export HADOOP_MAPRED_HOME=/opt/module/hadoop-2.7.2
export HIVE_HOME=/opt/module/hive
export ZOOKEEPER_HOME=/opt/module/zookeeper-3.4.10
export ZOOCFGDIR=/opt/module/zookeeper-3.4.10/conf
export HBASE_HOME=/opt/module/hbase
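
A wrong path in sqoop-env.sh tends to surface only later as a confusing runtime error, so it can help to check that every exported directory exists. This is a minimal sketch: the function name missing_env_dirs is hypothetical, it only understands plain export NAME=/path lines, and the demo uses temporary paths rather than the real /opt/module layout.

```shell
#!/bin/bash
# Hypothetical helper: read "export NAME=/path" lines from a sqoop-env.sh
# style file and print the NAME of every entry whose path is not a directory.
missing_env_dirs() {
    local line name path
    while IFS= read -r line; do
        case "$line" in
            export\ *=*) ;;      # handle simple export lines only
            *) continue ;;       # skip comments and everything else
        esac
        line=${line#export }
        name=${line%%=*}
        path=${line#*=}
        [ -d "$path" ] || echo "$name"
    done < "$1"
}

# Demo with a temporary directory: hive exists, hbase does not.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/hive"
cat > "$demo_dir/sqoop-env.sh" <<EOF
# demo file
export HIVE_HOME=$demo_dir/hive
export HBASE_HOME=$demo_dir/hbase
EOF
missing_env_dirs "$demo_dir/sqoop-env.sh"   # prints HBASE_HOME
rm -rf "$demo_dir"
```

Against the real installation the call would be missing_env_dirs /opt/module/sqoop/conf/sqoop-env.sh; no output means every listed directory exists.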

2.3 Copying the JDBC driver

  1. Go to /opt/software/mysql-libs and extract mysql-connector-java-5.1.27.tar.gz into the current directory
tar -zxvf mysql-connector-java-5.1.27.tar.gz
  2. Go to /opt/software/mysql-libs/mysql-connector-java-5.1.27 and copy the JDBC driver into Sqoop's lib directory.
cp mysql-connector-java-5.1.27-bin.jar /opt/module/sqoop/lib/

2.4 Testing whether Sqoop can connect to the database

Run the following from the /opt/module/sqoop directory:
bin/sqoop list-databases --connect jdbc:mysql://hadoop102:3306/ --username root --password root
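
Passing --password on the command line leaves the password in the shell history; Sqoop also accepts -P, which prompts for it on the console. The sketch below only assembles and prints such a command line. The function name list_databases_cmd is hypothetical, and its defaults are the host, port and user from this article.

```shell
#!/bin/bash
# Hypothetical helper: print a list-databases command line that uses -P
# (prompt for the password) instead of --password.
list_databases_cmd() {
    local host=${1:-hadoop102} port=${2:-3306} user=${3:-root}
    local -a cmd=(bin/sqoop list-databases
                  --connect "jdbc:mysql://${host}:${port}/"
                  --username "$user"
                  -P)
    printf '%s\n' "${cmd[*]}"
}

list_databases_cmd
# prints: bin/sqoop list-databases --connect jdbc:mysql://hadoop102:3306/ --username root -P
```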

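3 Generating the business data

  1. Create the db_log folder under /opt/module/ on hadoop102
mkdir db_log
  2. Upload gmall-mock-db-2020-03-16-SNAPSHOT.jar and application.properties to /opt/module/db_log on hadoop102.

  3. Adjust the settings in application.properties as needed

    1. Set mock.date (for example mock.date=2020-03-11) to generate data for that day
    2. Set mock.clear=1 to delete the existing data and generate new random data; mock.clear=0 keeps the existing data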
logging.pattern.console=%m%n
logging.level.root=info

spring.datasource.driver-class-name=com.mysql.jdbc.Driver
spring.datasource.url=jdbc:mysql://hadoop102:3306/gmall?characterEncoding=utf-8&useSSL=false&serverTimezone=GMT%2B8
spring.datasource.username=root
spring.datasource.password=root


logging.pattern.console=%m%n

mybatis-plus.global-config.db-config.field-strategy=not_null

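#Business date
mock.date=2020-03-11
#Whether to reset: 1 = reset, 0 = do not reset
mock.clear=0

#Number of new users to generate
mock.user.count=50
#Percentage of male users
mock.user.male-rate=20

#Percentage of favorites that are cancelled
mock.favor.cancel-rate=10
#Number of favorites
mock.favor.count=100

#Number of shopping carts
mock.cart.count=10
#Maximum quantity of each item in a cart
mock.cart.sku-maxcount-per-cart=3

#Percentage of users who place an order
mock.order.user-rate=80
#Percentage of cart items that users purchase
mock.order.sku-rate=70
#Whether orders take part in promotional activities
mock.order.join-activity=1
#Whether orders use coupons
mock.order.use-coupon=1
#Number of users who claim coupons
mock.coupon.user-count=10

#Payment percentage
mock.payment.rate=70
#Payment method ratio, Alipay:WeChat:UnionPay
mock.payment.payment-type=30:60:10

#Review ratio, good:neutral:bad:automatic
mock.comment.appraise-rate=30:10:10:50

#Refund reason ratio: quality issue, product does not match its description, out of stock, wrong size, ordered by mistake, no longer wanted, other
mock.refund.reason-rate=30:10:20:5:15:5:5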
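  4. Run the following command in that directory to generate data for the date configured in mock.date (2020-03-10 for the first import in section 5, then 2020-03-11):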
java -jar gmall-mock-db-2020-03-16-SNAPSHOT.jar
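
Switching mock.date and mock.clear between runs can be scripted instead of edited by hand. The following is a sketch under the assumption that each key appears as a plain key=value line; set_prop is a hypothetical helper, and the demo works on a temporary file rather than the real application.properties.

```shell
#!/bin/bash
# Hypothetical helper: set key=value in a properties file, replacing the
# existing line for that key or appending one if the key is absent.
set_prop() {
    local file=$1 key=$2 value=$3 tmp
    tmp=$(mktemp)
    awk -F= -v k="$key" -v v="$value" '
        $1 == k { print k "=" v; found = 1; next }   # replace matching key
        { print }                                    # keep other lines as-is
        END { if (!found) print k "=" v }            # append if key was missing
    ' "$file" > "$tmp"
    cat "$tmp" > "$file"    # overwrite in place, keeping the file permissions
    rm -f "$tmp"
}

# Demo on a temporary file.
demo_props=$(mktemp)
printf 'mock.date=2020-03-11\nmock.clear=0\n' > "$demo_props"
set_prop "$demo_props" mock.date 2020-03-10
set_prop "$demo_props" mock.clear 1
cat "$demo_props"
rm -f "$demo_props"
```

In /opt/module/db_log, a run for one day would then be set_prop application.properties mock.date 2020-03-10 followed by the java -jar command above.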

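4 Sync strategies

4.1 Full sync

  • Store a complete copy of the data every day. Suitable for tables that are not large and that receive both new rows and changes to existing rows each day.

4.2 Incremental sync

  • Store only the day's new data. Suitable for large tables that only ever receive new rows.

4.3 New-and-changed sync

  • Store the rows whose creation time or operation time falls on the current day.

4.4 Special strategy

  • Data that only needs to be synced once, such as objective-world dimensions, the date dimension, and the region dimension.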
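
In terms of the query sent to MySQL, the strategies differ only in the WHERE clause: full and special syncs read every row, while the other two filter on one or more time columns. A minimal sketch of that mapping follows; the function name sync_where_clause and the strategy keywords are illustrative assumptions.

```shell
#!/bin/bash
# Hypothetical helper: build the WHERE clause for one table.
#   $1   strategy: full | special | incremental | new_and_changed
#   $2   business date (YYYY-MM-DD)
#   $3.. time columns compared against the date (filtered strategies only)
sync_where_clause() {
    local strategy=$1 do_date=$2 clause="" col
    shift 2
    case "$strategy" in
        full|special)
            echo "where 1=1" ;;
        incremental|new_and_changed)
            if [ "$#" -eq 0 ]; then
                echo "at least one time column is required" >&2
                return 1
            fi
            for col in "$@"; do
                [ -n "$clause" ] && clause+=" or "
                clause+="date_format(${col},'%Y-%m-%d')='${do_date}'"
            done
            echo "where (${clause})" ;;
        *)
            echo "unknown strategy: $strategy" >&2
            return 1 ;;
    esac
}

sync_where_clause full 2020-03-10
sync_where_clause new_and_changed 2020-03-10 create_time operate_time
```

The second call prints a clause that matches rows created or modified on 2020-03-10.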

5 Writing the mysql->sqoop->hdfs script

  1. Create mysql_to_hdfs.sh in the /home/atguigu/bin directory
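#! /bin/bash
# Usage: mysql_to_hdfs.sh <table name|first|all> [date]
sqoop=/opt/module/sqoop/bin/sqoop

# Default to yesterday; a second script argument overrides the date
do_date=`date -d '-1 day' +%F`
if [[ -n "$2" ]]; then
	do_date=$2
fi
# import_data <table> <query>
# Inside this function $1 is the table name and $2 is the query text,
# not the script arguments. Every query must end in a where clause,
# because "and $CONDITIONS" is appended to it.
import_data(){
$sqoop import \
--connect jdbc:mysql://hadoop102:3306/gmall \
--username root \
--password root \
--target-dir /origin_data/gmall/db/$1/$do_date \
--delete-target-dir \
--query "$2 and \$CONDITIONS" \
--num-mappers 1 \
--fields-terminated-by '\t' \
--compress \
--compression-codec lzop \
--null-string '\\N' \
--null-non-string '\\N'

# Build an index for the LZO files that were just imported
hadoop jar /opt/module/hadoop-2.7.2/share/hadoop/common/hadoop-lzo-0.4.20.jar com.hadoop.compression.lzo.DistributedLzoIndexer /origin_data/gmall/db/$1/$do_date
}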
import_order_info(){
	import_data order_info "select
			id,
			final_total_amount,
			order_status,
			user_id,
			out_trade_no,
			create_time,
			operate_time,
			province_id,
			benefit_reduce_amount,
			original_total_amount,
			feight_fee
		from order_info
		where (date_format(create_time,'%Y-%m-%d')='$do_date'
		or date_format(operate_time,'%Y-%m-%d')='$do_date')"
}
import_coupon_use(){
	import_data coupon_use "select
			id,
			coupon_id,
			user_id,
			order_id,
			coupon_status,
			get_time,
			using_time,
			used_time
		from coupon_use
		where (date_format(get_time,'%Y-%m-%d')='$do_date'
		or date_format(using_time,'%Y-%m-%d')='$do_date'
		or date_format(used_time,'%Y-%m-%d')='$do_date')"
}
import_order_status_log(){
	import_data order_status_log "select
			id,
			order_id,
			order_status,
			operate_time
		from order_status_log
		where
		date_format(operate_time,'%Y-%m-%d')='$do_date'"
}
import_activity_order(){
	import_data activity_order "select
			id,
			activity_id,
			order_id,
			create_time
		from activity_order
		where
		date_format(create_time,'%Y-%m-%d')='$do_date'"
}
import_user_info(){
	import_data "user_info" "select
			id,
			name,
			birthday,
			gender,
			email,
			user_level,
			create_time,
			operate_time
		from user_info
		where (DATE_FORMAT(create_time,'%Y-%m-%d')='$do_date'
		or DATE_FORMAT(operate_time,'%Y-%m-%d')='$do_date')"
}
import_order_detail(){
	import_data order_detail "select
			od.id,
			order_id,
			user_id,
			sku_id,
			sku_name,
			order_price,
			sku_num,
			od.create_time
		from order_detail od
		join order_info oi
		on od.order_id=oi.id
		where
		DATE_FORMAT(od.create_time,'%Y-%m-%d')='$do_date'"
}
import_payment_info(){
	import_data "payment_info" "select
			id,
			out_trade_no,
			order_id,
			user_id,
			alipay_trade_no,
			total_amount,
			subject,
			payment_type,
			payment_time
		from payment_info
		where
		DATE_FORMAT(payment_time,'%Y-%m-%d')='$do_date'"
}
import_comment_info(){
	import_data comment_info "select
			id,
			user_id,
			sku_id,
			spu_id,
			order_id,
			appraise,
			comment_txt,
			create_time
		from comment_info
		where date_format(create_time,'%Y-%m-%d')='$do_date'"
}
import_order_refund_info(){
	import_data order_refund_info "select
			id,
			user_id,
			order_id,
			sku_id,
			refund_type,
			refund_num,
			refund_amount,
			refund_reason_type,
			create_time
		from order_refund_info
		where
		date_format(create_time,'%Y-%m-%d')='$do_date'"
}
import_sku_info(){
	import_data sku_info "select
			id,
			spu_id,
			price,
			sku_name,
			sku_desc,
			weight,
			tm_id,
			category3_id,
			create_time
		from sku_info where 1=1"
}
import_base_category1(){
	import_data "base_category1" "select
			id,
			name
		from base_category1 where 1=1"
}
import_base_category2(){
	import_data "base_category2" "select
			id,
			name,
			category1_id
		from base_category2 where 1=1"
}
import_base_category3(){
	import_data "base_category3" "select
			id,
			name,
			category2_id
		from base_category3 where 1=1"
}
import_base_province(){
	import_data base_province "select
			id,
			name,
			region_id,
			area_code,
			iso_code
		from base_province
		where 1=1"
}
import_base_region(){
	import_data base_region "select
			id,
			region_name
		from base_region
		where 1=1"
}
import_base_trademark(){
	import_data base_trademark "select
			tm_id,
			tm_name
		from base_trademark
		where 1=1"
}
import_spu_info(){
	import_data spu_info "select
			id,
			spu_name,
			category3_id,
			tm_id
		from spu_info
		where 1=1"
}
import_favor_info(){
	import_data favor_info "select
			id,
			user_id,
			sku_id,
			spu_id,
			is_cancel,
			create_time,
			cancel_time
		from favor_info
		where 1=1"
}
import_cart_info(){
	import_data cart_info "select
		id,
		user_id,
		sku_id,
		cart_price,
		sku_num,
		sku_name,
		create_time,
		operate_time,
		is_ordered,
		order_time
	from cart_info
	where 1=1"
}
import_coupon_info(){
	import_data coupon_info "select
			id,
			coupon_name,
			coupon_type,
			condition_amount,
			condition_num,
			activity_id,
			benefit_amount,
			benefit_discount,
			create_time,
			range_type,
			spu_id,
			tm_id,
			category3_id,
			limit_num,
			operate_time,
			expire_time
		from coupon_info
		where 1=1"
}
import_activity_info(){
	import_data activity_info "select
			id,
			activity_name,
			activity_type,
			start_time,
			end_time,
			create_time
		from activity_info
		where 1=1"
}
import_activity_rule(){
	import_data activity_rule "select
			id,
			activity_id,
			condition_amount,
			condition_num,
			benefit_amount,
			benefit_discount,
			benefit_level
		from activity_rule
		where 1=1"
}
import_base_dic(){
	import_data base_dic "select
			dic_code,
			dic_name,
			parent_code,
			create_time,
			operate_time
		from base_dic
		where 1=1"
}
case $1 in
"order_info")
	import_order_info
;;
"base_category1")
	import_base_category1
;;
"base_category2")
	import_base_category2
;;
"base_category3")
	import_base_category3
;;
"order_detail")
	import_order_detail
;;
"sku_info")
	import_sku_info
;;
"user_info")
	import_user_info
;;
"payment_info")
	import_payment_info
;;
"base_province")
	import_base_province
;;
"base_region")
	import_base_region
;;
"base_trademark")
	import_base_trademark
;;
"activity_info")
	import_activity_info
;;
"activity_order")
	import_activity_order
;;
"cart_info")
	import_cart_info
;;
"comment_info")
	import_comment_info
;;
"coupon_info")
	import_coupon_info
;;
"coupon_use")
	import_coupon_use
;;
"favor_info")
	import_favor_info
;;
"order_refund_info")
	import_order_refund_info
;;
"order_status_log")
	import_order_status_log
;;
"spu_info")
	import_spu_info
;;
"activity_rule")
	import_activity_rule
;;
"base_dic")
	import_base_dic
;;
"first")
	import_base_category1
	import_base_category2
	import_base_category3
	import_order_info
	import_order_detail
	import_sku_info
	import_user_info
	import_payment_info
	import_base_province
	import_base_region
	import_base_trademark
	import_activity_info
	import_activity_order
	import_cart_info
	import_comment_info
	import_coupon_use
	import_coupon_info
	import_favor_info
	import_order_refund_info
	import_order_status_log
	import_spu_info
	import_activity_rule
	import_base_dic
;;
"all")
	import_base_category1
	import_base_category2
	import_base_category3
	import_order_info
	import_order_detail
	import_sku_info
	import_user_info
	import_payment_info
	import_base_trademark
	import_activity_info
	import_activity_order
	import_cart_info
	import_comment_info
	import_coupon_use
	import_coupon_info
	import_favor_info
	import_order_refund_info
	import_order_status_log
	import_spu_info
	import_activity_rule
	import_base_dic
;;
esac
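  • Note 1: [ -n value ] tests whether a variable's value is non-empty
    • If the value is non-empty, it returns true
    • If the value is empty, it returns false
  2. Make the script executable
chmod 777 mysql_to_hdfs.sh
  3. Initial import
mysql_to_hdfs.sh first 2020-03-10
  4. Daily import (all skips base_province and base_region, which only need to be synced once)
mysql_to_hdfs.sh all 2020-03-11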
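
The date handling at the top of the script, and a way to replay several missed days, can be sketched as follows. Both functions are hypothetical helpers rather than part of the original script; they rely on GNU date (the -d option), as the script itself does, and the loop only prints the commands it would run.

```shell
#!/bin/bash
# Hypothetical helper mirroring the date handling at the top of the script:
# return the given date when it is non-empty, otherwise yesterday.
resolve_do_date() {
    local do_date
    do_date=$(date -d '-1 day' +%F)
    if [[ -n "$1" ]]; then
        do_date=$1
    fi
    echo "$do_date"
}

# Hypothetical helper: print every date from $1 to $2 inclusive.
# ISO dates (YYYY-MM-DD) sort correctly as plain strings.
date_range() {
    local day=$1 end=$2
    while [[ ! "$day" > "$end" ]]; do
        echo "$day"
        day=$(date -d "$day + 1 day" +%F) || return 1
    done
}

resolve_do_date 2020-03-10   # prints 2020-03-10
resolve_do_date ""           # prints yesterday's date

# Dry run of a backfill: print one command per day.
for day in $(date_range 2020-03-11 2020-03-13); do
    echo "mysql_to_hdfs.sh all $day"
done
```

Removing the echo in the loop would run the imports for real, one day at a time.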

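5.1 Project experience

  • Hive stores NULL as "\N" in its underlying files, while MySQL stores NULL as NULL. To keep the data consistent on both sides, use the --input-null-string and --input-null-non-string parameters when exporting data, and --null-string and --null-non-string when importing data.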
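
One way to see the effect of the import flags is to look for \N markers in the imported text. The sketch below counts them per line of tab-separated input; count_null_fields is a hypothetical helper and the sample row is made up.

```shell
#!/bin/bash
# Hypothetical helper: for each tab-separated line on standard input,
# print how many fields are exactly \N.
count_null_fields() {
    awk -F'\t' '{
        n = 0
        for (i = 1; i <= NF; i++) if ($i == "\\N") n++
        print n
    }'
}

# One sample row: id 1, a NULL column, then a string column.
printf '1\t\\N\tabc\n' | count_null_fields   # prints 1
```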