Big Data Basics (2): Installing Hadoop, Maven, HBase, Hive and Sqoop on Ubuntu 14.04.04, and Sqoop Import/Export with HDFS, Hive and MySQL

Installing Hadoop, Maven, HBase, Hive and Sqoop on Ubuntu 14.04.04

2016.05.15

Test environment for this article:

Hadoop 2.6.2, Ubuntu 14.04.04 amd64, JDK 1.8

Versions installed:

Maven 3.3.9, HBase 1.1.5, Hive 1.2.1, Sqoop2 (1.99.6) and Sqoop1 (1.4.6)

This article also draws on several other posts; links to the originals are given where applicable.


Prerequisite: Hadoop must already be installed.
Reference: http://blog.csdn.net/xanxus46/article/details/45133977

The installation steps in this article can support basic Hadoop log analysis; for a detailed tutorial on that, see:

http://www.cnblogs.com/edisonchou/p/4449082.html


I. Maven
1. Install the JDK
2. Download:
http://maven.apache.org/download.cgi
wget http://mirrors.cnnic.cn/apache/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz
3. Extract:
tar -xzf apache-maven-3.3.9-bin.tar.gz
4、配置環境變量
vi ~/.bashrc
export MAVEN_HOME=/home/Hadoop/apache-maven-3.3.9
export PATH=$MAVEN_HOME/bin:$PATH
Apply the changes:
source ~/.bashrc
5. Verify
$ mvn --version
Result:
root@spark:/usr/local/maven/apache-maven-3.3.9# mvn --version
Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-11T00:41:47+08:00)
Maven home: /usr/local/maven/apache-maven-3.3.9
Java version: 1.8.0_65, vendor: Oracle Corporation
Java home: /usr/lib/java/jdk1.8.0_65/jre
Default locale: en_HK, platform encoding: UTF-8
OS name: "linux", version: "3.19.0-58-generic", arch: "amd64", family: "unix"
root@spark:/usr/local/maven/apache-maven-3.3.9# 
Reference: http://www.linuxidc.com/Linux/2015-03/114619.htm
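
Beyond mvn --version, a quick way to check that Maven can actually resolve dependencies and run a build is to generate and package a throwaway quickstart project (a hedged sketch; the groupId/artifactId are arbitrary examples and the build needs internet access to download plugins):

cd /tmp
mvn archetype:generate -DgroupId=com.example -DartifactId=mvn-test \
    -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false
cd mvn-test
mvn package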


II. HBase
1. Download:
http://mirrors.hust.edu.cn/apache/hbase/stable/
http://mirrors.hust.edu.cn/apache/hbase/stable/hbase-1.1.5-bin.tar.gz
2. Extract:
HBase can be installed in three modes: standalone, pseudo-distributed and fully distributed; only the fully distributed mode is covered here. It assumes the Hadoop cluster and ZooKeeper are already installed and running correctly.
Step 1: download the package, extract it to a suitable location, and assign ownership to the hadoop user (the account that runs Hadoop, here root).
hbase-1.1.5 is used here with Hadoop 2.6, extracted under /usr/local:
tar -zxvf hbase-1.1.5-bin.tar.gz
mkdir /usr/local/hbase
mv hbase-1.1.5 /usr/local/hbase
cd /usr/local
chmod -R 775 hbase
chown -R root: hbase
3、環境變量
$vi ~/.bashrc
export HBASE_HOME=/usr/local/hbase/hbase-1.1.5
PATH=$HBASE_HOME/bin:$PATH
source ~/.bashrc
4. Configuration files
4.1 JDK [a default JDK setting already exists, so this can be left unchanged]
sudo vim /usr/local/hbase/hbase-1.1.5/conf/hbase-env.sh
Set JAVA_HOME to your JDK installation directory (on this machine, /usr/lib/java/jdk1.8.0_65).
4.2 hbase-site.xml
/usr/local/hbase/hbase-1.1.5/conf/hbase-site.xml
<configuration>
        <property>
                <name>hbase.rootdir</name>
                <value>hdfs://spark:9000/hbase</value>
        </property>
        <property>
                <name>hbase.cluster.distributed</name>
                <value>true</value>
        </property>
</configuration>
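For a genuinely distributed cluster, two more settings usually accompany the hbase-site.xml above: the list of region server hosts in conf/regionservers, and the ZooKeeper quorum. A minimal sketch, assuming hypothetical worker hosts slave1/slave2 and ZooKeeper running on spark (adjust to your own cluster):

conf/regionservers:
slave1
slave2

and in hbase-site.xml:
        <property>
                <name>hbase.zookeeper.quorum</name>
                <value>spark</value>
        </property>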
5. Verify
Start Hadoop first:
sbin/start-dfs.sh
sbin/start-yarn.sh
$hbase shell
Result:
root@spark:/usr/local/hbase/hbase-1.1.5/bin# hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hbase/hbase-1.1.5/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/hadoop-2.6.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.1.5, r239b80456118175b340b2e562a5568b5c744252e, Sun May  8 20:29:26 PDT 2016
hbase(main):001:0> 
Reference: http://blog.csdn.net/xanxus46/article/details/45133977

Cluster installation: http://blog.sina.com.cn/s/blog_6145ed810102vtws.html
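
Once the shell comes up, a quick smoke test confirms that HBase can create and serve a table over HDFS (a minimal sketch; the table name and column family below are arbitrary examples):

hbase(main):001:0> create 'smoke_test', 'cf'
hbase(main):002:0> put 'smoke_test', 'row1', 'cf:msg', 'hello'
hbase(main):003:0> scan 'smoke_test'
hbase(main):004:0> disable 'smoke_test'
hbase(main):005:0> drop 'smoke_test'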



III. Hive
1. Download: http://apache.fayea.com/hive/stable/
http://apache.fayea.com/hive/stable/apache-hive-1.2.1-bin.tar.gz
2. Extract:
tar xvzf apache-hive-1.2.1-bin.tar.gz 
3、環境變量
root@spark:/home/alex/xdowns# vi ~/.bashrc
export HIVE_HOME=/usr/local/hive/apache-hive-1.2.1-bin
export PATH=$PATH:$HIVE_HOME/bin
root@spark:/home/alex/xdowns# source ~/.bashrc
4. Edit the configuration files
First copy hive-env.sh.template and hive-default.xml.template and rename them to hive-env.sh and hive-site.xml.
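A minimal sketch of that copy step, assuming the install location from step 3:

cd /usr/local/hive/apache-hive-1.2.1-bin/conf
cp hive-env.sh.template hive-env.sh
cp hive-default.xml.template hive-site.xml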

Edit hive-env.sh (here /usr/local/hive/apache-hive-1.2.1-bin/conf/hive-env.sh; the paths below are adjusted to this installation) as follows:

export HADOOP_HEAPSIZE=1024

# Set HADOOP_HOME to point to a specific hadoop install directory
HADOOP_HOME=/usr/local/hadoop/hadoop-2.6.2

# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/usr/local/hive/apache-hive-1.2.1-bin/conf

# Folder containing extra libraries required for hive compilation/execution can be controlled by:
export HIVE_AUX_JARS_PATH=/usr/local/hive/apache-hive-1.2.1-bin/lib
Edit hive-site.xml (here /usr/local/hive/apache-hive-1.2.1-bin/conf/hive-site.xml) as follows:

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>hdfs://spark:9000/hbase</value>
</property>
<property>
  <name>hive.querylog.location</name>
  <value>/usr/hadoop/hive/log</value>
  <description>
    Directory where Hive query logs are stored
  </description>
</property>
5. Connect to MySQL [optional]
5.1 Stop MySQL from binding only to localhost
The default MySQL installation only allows local logins, so edit its configuration file and comment out the bind-address line:
vi /etc/mysql/my.cnf
#bind-address           = 127.0.0.1
5.2 Restart MySQL: service mysql restart
5.3 Log in to MySQL: mysql -uroot -proot
Create the hive database:
create database hive;
show databases;
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| hive               |
| mysql              |
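If Hive will connect with a dedicated MySQL account rather than root, you can also create one and grant it access to the hive database (a hedged sketch; the user name and password are placeholders):

mysql> CREATE USER 'hive'@'%' IDENTIFIED BY 'hive_password';
mysql> GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'%';
mysql> FLUSH PRIVILEGES;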
5.4 Edit the Hive configuration file hive-site.xml
Update the following properties:

<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://192.168.10.180:3306/hive?characterEncoding=UTF-8</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>alextong</value>
    </property>
</configuration>
5.5 Copy the MySQL JDBC driver into Hive's lib directory
The version downloaded here is mysql-connector-java-5.0.8-bin.jar:
http://dev.mysql.com/downloads/connector/j/5.0.html
tar xvzf mysql-connector-java-5.0.8.tar.gz
mv mysql-connector-java-5.0.8-bin.jar apache-hive-1.2.1-bin/lib
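
With the driver in place, Hive 1.2.1 also ships a schematool that can initialize the metastore schema in MySQL up front, instead of letting Hive create the tables lazily on first use (optional; a hedged sketch, assuming HIVE_HOME/bin is on PATH as set in step 3):

schematool -dbType mysql -initSchema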

6. Verify
6.1 Start Hadoop first
start-dfs.sh
start-yarn.sh
6.2 hive
$hive
xxxxx
hive>
6.3 Create tables in Hive
6.3.1
hive> show databases;
OK
default
Time taken: 1.078 seconds, Fetched: 1 row(s)
hive> 
6.3.2 Create a table named test
hive>create table test(id int,name string);
6.4 Verify in MySQL
Log in to MySQL and inspect the metastore metadata:
use hive;
show tables;
mysql> use hive;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A


Database changed
mysql> show tables;
+---------------------------+
| Tables_in_hive            |
+---------------------------+
| BUCKETING_COLS            |
| CDS                       |
| COLUMNS_V2                |
| DATABASE_PARAMS           |
| DBS                       |
| FUNCS                     |
| FUNC_RU                   |
| GLOBAL_PRIVS              |
| PARTITIONS                |
| PARTITION_KEYS            |
| PART_COL_STATS            |
| ROLES                     |
| SDS                       |
| SD_PARAMS                 |
| SEQUENCE_TABLE            |
| SERDES                    |
| SERDE_PARAMS              |
| SKEWED_COL_NAMES          |
| SKEWED_COL_VALUE_LOC_MAP  |
| SKEWED_STRING_LIST        |
| SKEWED_STRING_LIST_VALUES |
| SKEWED_VALUES             |
| SORT_COLS                 |
| TABLE_PARAMS              |
| TAB_COL_STATS             |
| TBLS                      |
| VERSION                   |
+---------------------------+


select * from TBLS;
Success.
6.5 Detailed verification
6.5.1 Create a test file
root@spark:~# vi add.txt
5
2
:wq
6.5.2 Upload it to HDFS
root@spark:~# hadoop fs -put /home/alex/xdowns/add.txt /user
root@spark:~# hadoop fs -ls /user
-rw-r--r--   1 root supergroup        148 2016-05-15 16:03 /user/add.txt
6.5.3 Create a table in Hive
hive> create table tester(id int);
OK
Time taken: 0.301 seconds


6.5.4 Load data with Hive
a. From a file on HDFS
hive> load data inpath 'hdfs://spark:9000/user/add.txt' into table tester;
Loading data to table default.tester
Table default.tester stats: [numFiles=1, totalSize=3]
OK
After the load completes, the source file disappears from its original HDFS location (LOAD DATA INPATH moves it into the Hive warehouse directory).
6.5.5 Query the result with SELECT
hive> select * from tester;
OK
5
2
Time taken: 0.313 seconds, Fetched: 2 row(s)
hive> 
6.5.6 Check the metastore in MySQL
mysql> SELECT * FROM TBLS;
+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | RETENTION | SD_ID | TBL_NAME | TBL_TYPE      | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT |
+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+
|      1 |  1463298658 |     1 |                0 | root  |         0 |     1 | test     | MANAGED_TABLE | NULL               | NULL               |
|      2 |  1463299661 |     1 |                0 | root  |         0 |     2 | testadd  | MANAGED_TABLE | NULL               | NULL               |
|      6 |  1463300857 |     2 |                0 | root  |         0 |     6 | testadd  | MANAGED_TABLE | NULL               | NULL               |
|     11 |  1463301301 |     1 |                0 | root  |         0 |    11 | test_add | MANAGED_TABLE | NULL               | NULL               |
|     12 |  1463301398 |     1 |                0 | root  |         0 |    12 | tester   | MANAGED_TABLE | NULL               | NULL               |
+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+
5 rows in set (0.01 sec)

b. From a local file:
hive> load data local inpath 'add.txt' into table testadd;
Exit with quit;
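Note: the testadd table in the local-load example above is just another plain test table created the same way as the others; a hypothetical sketch:

hive> create table testadd(id int);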
7. Errors encountered
7.1 java.io.tmpdir
Exception in thread "main" java.lang.RuntimeException: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir}/${system:user.name}
Fix:
http://blog.csdn.net/zwx19921215/article/details/42776589
Replace every entry in hive-site.xml that contains ${system:java.io.tmpdir} with an absolute path, e.g. /usr/local/hive/log:
  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/usr/local/hive/log</value>
    <description>Local scratch space for Hive jobs</description>
  </property>
  <property>
    <name>hive.downloaded.resources.dir</name>
    <value>/usr/local/hive/log</value>
  </property>
  <property>
    <name>hive.querylog.location</name>
    <value>/usr/local/hive/log</value>
    <description>Location of Hive run time structured log file</description>
  </property>
7.2 jline
[ERROR] Terminal initialization failed; falling back to unsupported
java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected
Fix:
https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started
http://stackoverflow.com/questions/28997441/hive-startup-error-terminal-initialization-failed-falling-back-to-unsupporte
vi ~/.bashrc
export HADOOP_USER_CLASSPATH_FIRST=true
source ~/.bashrc
7.3 Character set problem
For direct MetaStore DB connections, we don't support retries at the client level.


When creating a table in Hive, the following error is reported:


create table years (year string, event string) row format delimited fields terminated by '\t';
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:For direct MetaStore DB connections, we don't support retries at the client level.)


This is a character set issue; configure the MySQL character set:


mysql> alter database hive character set latin1;




IV. Sqoop2 installation (for Sqoop1, see Part V)
(It is best to install HBase and Hive before installing Sqoop.)
Sqoop 1.4.6 also supports Hadoop 2.6.2.
1. Download:
http://mirror.bit.edu.cn/apache/sqoop/1.99.6/
http://mirror.bit.edu.cn/apache/sqoop/1.99.6/sqoop-1.99.6-bin-hadoop200.tar.gz
2. Extract:
tar xvzf sqoop-1.99.6-bin-hadoop200.tar.gz
3、環境變量
vi ~/.bashrc
export SQOOP_HOME=/usr/local/sqoop/sqoop-1.99.6-bin-hadoop200
export PATH=$SQOOP_HOME/bin:$PATH
export CATALINA_HOME=$SQOOP_HOME/server  
export LOGDIR=$SQOOP_HOME/logs  
source ~/.bashrc
4. Configuration files
4.1 Configure ${SQOOP_HOME}/server/conf/catalina.properties [this includes the Hive jar entries]
Find the common.loader line, delete all the existing Hadoop and Hive jar paths, and add the jar paths of the local Hadoop 2 installation [keep everything on one line, no line breaks]:
common.loader=${catalina.base}/lib,${catalina.base}/lib/*.jar,${catalina.home}/lib,${catalina.home}/lib/*.jar,${catalina.home}/../lib/*.jar,/usr/local/hadoop/hadoop-2.6.2/share/hadoop/yarn/lib/*.jar,/usr/local/hadoop/hadoop-2.6.2/share/hadoop/yarn/*.jar,/usr/local/hadoop/hadoop-2.6.2/share/hadoop/hdfs/*.jar,/usr/local/hadoop/hadoop-2.6.2/share/hadoop/hdfs/lib/*.jar,/usr/local/hadoop/hadoop-2.6.2/share/hadoop/mapreduce/*.jar,/usr/local/hadoop/hadoop-2.6.2/share/hadoop/mapreduce/lib/*.jar,/usr/local/hadoop/hadoop-2.6.2/share/hadoop/tools/*.jar,/usr/local/hadoop/hadoop-2.6.2/share/hadoop/tools/lib/*.jar,/usr/local/hadoop/hadoop-2.6.2/share/hadoop/common/lib/*.jar,/usr/local/hadoop/hadoop-2.6.2/share/hadoop/common/*.jar,/usr/local/hadoop/hadoop-2.6.2/share/hadoop/httpfs/tomcat/lib/*.jar,/usr/local/hive/apache-hive-1.2.1-bin/lib/*.jar
[If you also need to import into Hive or HBase, the corresponding jars must be added as well.
Because the added jars already include log4j, rename Sqoop's own log4j jar to avoid a conflict:

mv ./server/webapps/sqoop/WEB-INF/lib/log4j-1.2.16.jar ./server/webapps/sqoop/WEB-INF/lib/log4j-1.2.16.jar.bak]
4.2 Configure ${SQOOP_HOME}/server/conf/sqoop.properties
# Hadoop configuration directory
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/usr/local/hadoop/hadoop-2.6.2/etc/hadoop/
5. Replace @LOGDIR@ and @BASEDIR@ (in sqoop.properties) [optional], e.g. with:
/usr/local/sqoop/sqoop-1.99.6-bin-hadoop200/base
/usr/local/sqoop/sqoop-1.99.6-bin-hadoop200/logs
6. JDBC driver
Copy your database's JDBC driver into the Sqoop lib directory, creating the directory if it does not exist.
Here, download the MySQL driver mysql-connector-java-5.1.16-bin.jar and place it under /usr/local/sqoop/sqoop-1.99.6-bin-hadoop200/server/lib.
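
Before starting the server, the configuration can optionally be sanity-checked with the bundled verification tool (this assumes the sqoop2-tool script is present in this 1.99.6 distribution; if it is not, skip this step):

cd /usr/local/sqoop/sqoop-1.99.6-bin-hadoop200
./bin/sqoop2-tool verify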
7. Start
7.1 Start Hadoop first
./start-dfs.sh
./start-yarn.sh
7.2 Start Sqoop
7.2.1 Start the server: [root@db12c sqoop]# ./bin/sqoop.sh server start
Sqoop home directory: /home/likehua/sqoop/sqoop
Setting SQOOP_HTTP_PORT:     12000
Setting SQOOP_ADMIN_PORT:     12001
Using   CATALINA_OPTS:
Adding to CATALINA_OPTS:    -Dsqoop.http.port=12000 -Dsqoop.admin.port=12001
Using CATALINA_BASE:   /home/likehua/sqoop/sqoop/server
Using CATALINA_HOME:   /home/likehua/sqoop/sqoop/server
Using CATALINA_TMPDIR: /home/likehua/sqoop/sqoop/server/temp
Using JRE_HOME:        /usr/local/jdk1.7.0
Using CLASSPATH:       /home/likehua/sqoop/sqoop/server/bin/bootstrap.jar
(The Sqoop server is a service that runs on Tomcat.)
[To stop the Sqoop server: ./bin/sqoop.sh server stop]
7.2.2 Start the Sqoop client:
Note: if sqoop2-shell prints warnings about Hadoop jars, the jar paths in step 4.1 were incomplete or wrong; redo that configuration and keep common.loader on a single line.
Also, Sqoop2 (1.99.x) drops some of the old commands; for example, typing sqoop by itself no longer opens the shell.
[root@db12c sqoop]# bin/sqoop.sh client
Sqoop home directory: /home/likehua/sqoop/sqoop
Sqoop Shell: Type 'help' or '\h' for help.


sqoop:000> show version --all
(Show the version: show version --all; list connectors: show connector --all; create a connection: create connection --cid 1)


client version:
  Sqoop 1.99.3 revision 2404393160301df16a94716a3034e31b03e27b0b
  Compiled by mengweid on Fri Oct 18 14:15:53 EDT 2013
server version:
  Sqoop 1.99.3 revision 2404393160301df16a94716a3034e31b03e27b0b
  Compiled by mengweid on Fri Oct 18 14:15:53 EDT 2013
Protocol version:
  [1]
sqoop:000>
Main references: http://www.th7.cn/db/nosql/201510/134172.shtml
http://www.cnblogs.com/likehua/p/3825489.html
Type exit to quit.
8. Using Sqoop2 to export data from Hive to MySQL
Start the client:
cd /usr/local/sqoop/sqoop-1.99.6-bin-hadoop200/bin
./sqoop2-shell 
Point the client at the server:
sqoop:000> set server --host spark --port 12000 --webapp sqoop
Server is set successfully
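
After setting the server, the same commands from 7.2.2 can be used to confirm that the client actually reaches it:

sqoop:000> show version --all
sqoop:000> show connector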



V. Sqoop1 installation

Installing Sqoop 1.4.6 on Hadoop 2.6.2, with Hive, HDFS and MySQL import/export


Environment:
Ubuntu 14.04.04 amd64, JDK 1.8, Hadoop 2.6.2, Hive, HBase


Note: Sqoop2 (1.99.6) is still somewhat awkward to use; it may be better to wait until it matures. Sqoop1 (1.4.6) is simpler to operate, and either can be chosen.
For installing Sqoop2, see Part IV above.


Reference: http://www.tuicool.com/articles/FZRJbuz
1. Download:
http://www.apache.org/dyn/closer.lua/sqoop/1.4.6
http://apache.fayea.com/sqoop/1.4.6/sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz


2. Extract:
tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz


3. Configure
cd /usr/local/sqoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/conf
cp sqoop-env-template.sh sqoop-env.sh
vi sqoop-env.sh
Add the following [adjust the Hive, HBase and ZooKeeper entries to your own paths; ideally all of them are installed]:
#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/home/hadoop/hadoop


#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/home/hadoop/hadoop


#set the path to where bin/hbase is available
export HBASE_HOME=/home/hadoop/hbase


#Set the path to where bin/hive is available
export HIVE_HOME=/home/hadoop/hive


#Set the path for where zookeper config dir is
export ZOOCFGDIR=/home/hadoop/zookeeper


4. Add the MySQL connector jar (adjust the paths to your own Hive and Sqoop lib directories):
cp ~/hive/lib/mysql-connector-java-5.1.30.jar ~/sqoop/lib/
Or download one yourself and place it in the corresponding directory.


5、添加環境變量
vi ~/.bashrc
export SQOOP_HOME=/home/hadoop/sqoop
export PATH=$PATH:$SBT_HOME/bin:$SQOOP_HOME/bin
export CLASSPATH=$CLASSPATH:$SQOOP_HOME/lib
source ~/.bashrc


6. Test the MySQL connection
sqoop list-databases --connect jdbc:mysql://127.0.0.1:3306/ --username root -P
An error appears:
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
Add ZOOKEEPER_HOME:
vi ~/.bashrc
export ZOOKEEPER_HOME=/opt/zookeeper/zookeeper
export PATH=${ZOOKEEPER_HOME}/bin:$PATH
Test again:
sqoop list-databases --connect jdbc:mysql://127.0.0.1:3306/ --username root -P
There are still warnings about ACCUMULO_HOME and the like; ignore them and enter the MySQL password.
Enter password: 
2016-05-18 19:16:15,336 INFO  [main] manager.MySQLManager: Preparing to use a MySQL streaming resultset.
information_schema
hive
mysql
performance_schema
test_hdfs


7. Import a MySQL table into HDFS
Note: start Hadoop first, otherwise you will get connection refused errors.
root@spark:~# /usr/local/hadoop/hadoop-2.6.2/sbin/start-dfs.sh
root@spark:~# /usr/local/hadoop/hadoop-2.6.2/sbin/start-yarn.sh
Then:
root@spark:/usr/local/sqoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/lib# sqoop import -m 1  --connect jdbc:mysql://127.0.0.1:3306/traincorpus --driver com.mysql.jdbc.Driver --username root -P --table testtable --target-dir /user/test111 
Notes:
-m 1 sets the number of map tasks;
--target-dir must not already exist, otherwise the job fails because the directory exists; to have it deleted automatically, add --delete-target-dir;
if you hit an error like 'xxx streaming xxx .close()', add --driver com.mysql.jdbc.Driver.
References: http://stackoverflow.com/questions/26375269/sqoop-error-manager-sqlmanager-error-reading-from-database-java-sql-sqlexcept
http://www.cognoschina.net/home/space.php?uid=173321&do=blog&id=121081
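
For reference, the same import can be narrowed down with standard Sqoop1 options such as --columns and --where (a hedged sketch; the column names for testtable are hypothetical):

sqoop import -m 1 --connect jdbc:mysql://127.0.0.1:3306/traincorpus --driver com.mysql.jdbc.Driver --username root -P --table testtable --columns "id,name" --where "id > 0" --target-dir /user/test_subset --delete-target-dir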


8. Import a MySQL table into Hive
This is simply the step-7 command with --hive-import appended; note that the data lands under the warehouse path specified in hive-site.xml.
root@spark:/usr/local/sqoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/lib# sqoop import -m 1  --connect jdbc:mysql://127.0.0.1:3306/traincorpus --driver com.mysql.jdbc.Driver --username root -P --table testtable --target-dir /user/test222 --hive-import


9. Export from HDFS to MySQL
Use sqoop export; --export-dir is the HDFS path and everything else is as in step 7. Note that --table must be an empty table created in MySQL beforehand.
root@spark:/usr/local/sqoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/lib# sqoop export --connect jdbc:mysql://127.0.0.1:3306/traincorpus --driver com.mysql.jdbc.Driver --username root -P --table testtable --export-dir /user/test111


10. Export from Hive to MySQL
Same as exporting from HDFS to MySQL; again --table must be an empty table created in MySQL beforehand.
root@spark:/usr/local/sqoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/lib# sqoop export --connect jdbc:mysql://127.0.0.1:3306/traincorpus --driver com.mysql.jdbc.Driver --username root -P --table testtable --export-dir /user/test222
A second example:
root@spark:/usr/local/sqoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/lib# sqoop export --connect jdbc:mysql://192.168.10.180:3306/traincorpus --driver com.mysql.jdbc.Driver --username root -P --table testtable3 --export-dir /hbase/tester
2016-05-18 20:57:29,927 INFO  [main] mapreduce.ExportJobBase: Transferred 125 bytes in 50.8713 seconds (2.4572 bytes/sec)
2016-05-18 20:57:29,968 INFO  [main] mapreduce.ExportJobBase: Exported 2 records.




11. Sqoop job
sqoop job --create myjob -- import --connect jdbc:mysql://192.168.10.180:3306/test --username root --password 123456 --table mytabs --fields-terminated-by '\t'
Here myjob is the job name. Although the password is stored with the job, by default Sqoop still prompts for it at execution time; to let the job run without prompting next time, uncomment sqoop.metastore.client.record.password in conf/sqoop-site.xml.
Other job commands: (1) sqoop job --list, to list jobs; (2) sqoop job --delete myjob, to delete a job. A short sketch of these commands follows below.
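
A short sketch of those job commands, plus the sqoop-site.xml property mentioned above (the job name myjob is the one created in the example):

sqoop job --list
sqoop job --exec myjob
sqoop job --delete myjob

<property>
  <name>sqoop.metastore.client.record.password</name>
  <value>true</value>
</property>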


See also:
http://www.th7.cn/db/mysql/201405/54683.shtml



#########################################

Troubleshooting references:

Hive problems and solutions

1. After hiveserver2 starts, beeline cannot connect.
Cause: permissions.
Solution:
/user/hive/warehouse
/tmp
/history (if a job server is configured, /history also needs adjusting)
Hive reads from and writes to these three directories at runtime, so open up their permissions:
hadoop fs -chmod -R 777 /tmp
hadoop fs -chmod -R 777 /user/hive/warehouse
2. beeline connection-refused errors
Cause: a known upstream bug.
Solution: adjust the following properties:
hive.server2.long.polling.timeout


hive.server2.thrift.bind.host (change the host to your own host)
3. Character set problems: garbled text and display-length errors
Cause: character set / encoding mismatch.
Solution: in the MySQL database configured in hive-site.xml, run: alter database hive character set latin1;
(The original post illustrates this error with a screenshot, which is not reproduced here.)
4. FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:For direct MetaStore DB connections, we don't support retries at the client level.)
In my case this happened because MySQL was not on the local machine (a local database is assumed by default), so a remote metastore server has to be configured:

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://lza01:9083</value>
  <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>

Then start the metastore service on the Hive server side: hive --service metastore




5. FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:javax.jdo.JDODataStoreException: An exception was thrown while adding/validating class(es) : Specified key was too long; max key length is 767 bytes
Change the MySQL character set:
alter database hive character set latin1;
Source: Yunfan Big Data Academy (http://www.yfteach.com), "Hive installation problems and solutions"




1 Cannot execute statement: impossible to write to binary log since BINLOG_FORMAT = STATEMENT…


When starting Hive, the following error is reported:


Caused by: javax.jdo.JDOException: Couldnt obtain a new sequence (unique id) : Cannot execute statement: impossible to write to binary log since BINLOG_FORMAT = STATEMENT and at least one table uses a storage engine limited to row-based logging. InnoDB is limited to row-logging when transaction isolation level is READ COMMITTED or READ UNCOMMITTED.
NestedThrowables:
java.sql.SQLException: Cannot execute statement: impossible to write to binary log since BINLOG_FORMAT = STATEMENT and at least one table uses a storage engine limited to row-based logging. InnoDB is limited to row-logging when transaction isolation level is READ COMMITTED or READ UNCOMMITTED.
This problem is caused by a misconfigured MySQL metastore for Hive and can be fixed as follows:


mysql> set global binlog_format='MIXED';
2 For direct MetaStore DB connections, we don’t support retries at the client level.


When creating a table in Hive, the following error is reported:


create table years (year string, event string) row format delimited fields terminated by '\t';
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:For direct MetaStore DB connections, we don't support retries at the client level.)
This is a character set issue; configure the MySQL character set:


mysql> alter database hive character set latin1;
3 HiveConf of name hive.metastore.local does not exist


When running the Hive client, the following warning appears:


WARN conf.HiveConf: HiveConf of name hive.metastore.local does not exist
In Hive 0.10, 0.11 and later versions, the hive.metastore.local property is no longer used; simply remove it from hive-site.xml.


4 Permission denied: user=anonymous, access=EXECUTE, inode="/tmp"


Hive startup reports the following error:


(Permission denied: user=anonymous, access=EXECUTE, inode="/tmp":hadoop:supergroup:drwx------
This is because Hive does not have permission on the HDFS /tmp directory; grant it:


hadoop dfs -chmod -R 777 /tmp
5 To be continued
http://blog.csdn.net/cjfeii/article/details/49363653