Shark Deployment Notes

Following the official CDH5 documentation, set up the MySQL account and access grants for the Hive metastore.
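A minimal sketch of the grant steps, assuming the metastore database is named metastore, the account is hive, and connections come from metastorehost (all placeholder names; the CDH5 docs give the authoritative sequence):

mysql> CREATE DATABASE metastore;
mysql> CREATE USER 'hive'@'metastorehost' IDENTIFIED BY 'mypassword';
mysql> GRANT ALL PRIVILEGES ON metastore.* TO 'hive'@'metastorehost';
mysql> FLUSH PRIVILEGES;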


Configure hive-site.xml to enable the Table Lock Manager (this requires a ZooKeeper cluster):
<property>
  <name>hive.support.concurrency</name>
  <description>Enable Hive's Table Lock Manager Service</description>
  <value>true</value>
</property>

<property>
  <name>hive.zookeeper.quorum</name>
  <description>Zookeeper quorum used by Hive's Table Lock Manager</description>
  <value>zk1.myco.com,zk2.myco.com,zk3.myco.com</value>
</property>

<!-- ZooKeeper client port -->
<property>
  <name>hive.zookeeper.client.port</name>
  <value>2181</value>
</property>

<!-- znode namespace Hive uses in the ZooKeeper cluster -->
<property>
  <name>hive.zookeeper.namespace</name>
  <value>hive_zookeeper_namespace</value>
</property>

<!-- warehouse directory in HDFS -->
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
</property>
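Before restarting Hive, it is worth a quick sanity check that the quorum is reachable, e.g. with the CDH zookeeper-client (host and port as configured above); a successful connect plus a root listing confirms that quorum member is up:

# zookeeper-client -server zk1.myco.com:2181
ls /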

---------------------------------------------------------

Configure Hive to use YARN
# vi /etc/default/hive-server2
Add the following line:
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce

Start the metastore:
# service hive-metastore start
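Later steps assume the metastore Thrift service is listening on port 9083 (see the sharkserver2 notes further down); a quick check:

# netstat -lnpt | grep 9083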

Create the HDFS warehouse directory:
# sudo -u hdfs hadoop fs -mkdir -p /user/hive/warehouse
# sudo -u hdfs hadoop fs -chmod 1777 /user/hive/warehouse
The sticky bit (t) allows users to create and use their own tables while preventing them from dropping tables they do not own.
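To confirm, list the parent directory; the warehouse entry's permission string should end in t:

# sudo -u hdfs hadoop fs -ls /user/hive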

Start hive server2 (the hive metastore must be running before this starts):
# service hive-server2 start

Test the connection with beeline:
# /usr/lib/hive/bin/beeline
beeline> !connect jdbc:hive2://localhost:10000 org.apache.hive.jdbc.HiveDriver

0: jdbc:hive2://localhost:10000> show tables;
+-----------+
| tab_name  |
+-----------+
+-----------+
No rows selected (1.596 seconds)

---------------------------------------------------------------------------------


Shark startup error
# /usr/lib/shark/bin/shark-withinfo -skipRddReload -hiveconf hive.root.logger=INFO,console
Starting the Shark Command Line Client
Exception in thread "main" java.lang.ClassFormatError: org.apache.hadoop.hive.cli.CliDriver (unrecognized class file version)
It turned out Java had been switched to version 1.5; reinstalling version 1.7 resolved the problem.
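On a RHEL-family system (yum is used elsewhere in these notes), the active JDK can be checked and switched like so:

# java -version
# alternatives --config java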
----------------------------------------------------------------------------------

Contents of shark-env.sh:
export SPARK_MEM=1g

export SHARK_MASTER_MEM=512m

export HIVE_HOME="/usr/lib/hive"
export HIVE_CONF_DIR="/etc/hive/conf"

export HADOOP_HOME="/usr/lib/hadoop"
export SPARK_HOME="/usr/lib/spark"
export MASTER="spark://saltdb:7077"

export SHARK_EXEC_MODE=yarn
export SPARK_ASSEMBLY_JAR="/usr/lib/spark/assembly/lib/spark-assembly_2.10-0.9.0-cdh5.0.0-hadoop2.3.0-cdh5.0.0.jar"
export SHARK_ASSEMBLY_JAR="/usr/lib/shark/target/scala-2.10/shark_2.10-0.9.1.jar"


SPARK_JAVA_OPTS=" -Dspark.local.dir=/tmp "
SPARK_JAVA_OPTS+="-Dspark.kryoserializer.buffer.mb=10 "
SPARK_JAVA_OPTS+="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps "
export SPARK_JAVA_OPTS


Startup error
# /usr/lib/shark/bin/shark-withinfo -skipRddReload
14/05/12 13:32:06 INFO ui.SparkUI: Started Spark Web UI at http://saltdb:4040
Exception in thread "main" org.apache.spark.SparkException: YARN mode not available ?
     Caused by: java.lang.ClassNotFoundException: org.apache.spark.scheduler.cluster.YarnClientClusterScheduler
Fix: add the following to the /usr/lib/shark/run script:

# check for shark with spark on yarn params
if [ "x$SHARK_EXEC_MODE" == "xyarn" ] ; then
  if [ "x$SPARK_ASSEMBLY_JAR" == "x" ] ; then
    echo "No SPARK_ASSEMBLY_JAR specified. Please set SPARK_ASSEMBLY_JAR for spark on yarn mode."
    exit 1
  else
    export SPARK_JAR=$SPARK_ASSEMBLY_JAR
    if [ -f "$SPARK_JAR" ] ; then
      SPARK_CLASSPATH+=":$SPARK_JAR"
      echo "SPARK CLASSPATH : "$SPARK_CLASSPATH
    fi
  fi
fi


Starting in server mode (as the hdfs user)
$ bin/shark sharkserver2   (this only launches the Shark command line)
$ bin/shark --service sharkserver2
(starts the actual server; note that hive-metastore must already be running, with port 9083 open)
$ bin/shark --service cli   (launches the interactive CLI, whereas sharkserver2 exposes a HiveServer2-compatible Thrift/JDBC endpoint that clients such as beeline connect to)


Connecting with beeline:
# /usr/lib/shark/bin/beeline
beeline>  !connect jdbc:hive2://localhost:10000/default


Launcher commands
shark-withinfo and shark-withdebug both invoke Shark with log output directed to the console; shark-withdebug produces the more detailed output.
 ./shark -H prints help information

 

A query hangs while running

./bin/shark-withinfo

14/05/13 09:23:19 WARN scheduler.TaskSetManager: Loss was due to java.lang.RuntimeException
java.lang.RuntimeException: readObject can't find class org.apache.hadoop.hive.conf.HiveConf
Fix
Try copying hive-common*.jar to each NodeManager's YARN lib directory (this is the Hive 0.11 jar bundled with Shark):
# cp ./edu.berkeley.cs.shark/hive-common/hive-common-0.11.0-shark-0.9.1.jar /usr/lib/hadoop-yarn/lib/
Then copy all the Hive-related jars to the YARN lib directory:
# cp -v /opt/edu.berkeley.cs.shark/hive-*/* /usr/lib/hadoop-yarn/lib/
And copy the Shark jar to each NodeManager's YARN lib directory as well:
# cp /tmp/shark_2.10-0.9.1.jar /usr/lib/hadoop-yarn/lib/
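The cp commands above only fix the local node. A sketch for pushing the same jars to every NodeManager, assuming hypothetical hostnames nm1..nm3 and passwordless ssh for root:

# for host in nm1 nm2 nm3; do
>   scp /opt/edu.berkeley.cs.shark/hive-*/* /tmp/shark_2.10-0.9.1.jar $host:/usr/lib/hadoop-yarn/lib/
> done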

The query now executes normally:

shark> select count(*) from media_visit_info;
OK
6186276
Time taken: 17.044 seconds

Running a query through sharkserver2 over JDBC produced an error:
org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.security.AccessControlException: Permission denied: user=anonymous, access=ALL, inode="/tmp/hive-hdfs/hive_2014-05-13_17-31-19_028_189004440533846959/_task_tmp.-ext-10001":hdfs:hadoop:drwxr-xr-x

 


Try installing Hive on all NodeManager nodes:
yum install hive   (note: this installs only hive, not hive-metastore or hive-server2)

The error then changed to: Loss was due to java.lang.ClassNotFoundException: shark.execution.HadoopTableReader$$anonfun$7

Try installing the Shark jar on all nodes.

After that, starting sharkserver2 and running a query through beeline produced the following error:

org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.security.AccessControlException: Permission denied: user=anonymous, access=ALL, inode="/tmp/hive-hdfs/hive_2014-05-16_14-39-35_906_907709274884560817/_task_tmp.-ext-10001":hdfs:hadoop:drwxr-xr-x

Fix
Specify a username when logging in with beeline:

beeline> !connect jdbc:hive2://localhost:10000/default
Connecting to jdbc:hive2://localhost:10000/default
Enter username for jdbc:hive2://localhost:10000/default: hdfs
Enter password for jdbc:hive2://localhost:10000/default:

Or specify it directly on the connect line:
0: jdbc:hive2://localhost:10000/default> !connect jdbc:hive2://localhost:10000/default hdfs ''
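An alternative that was not tried here would be to open up the HiveServer2 scratch directory the same way as the warehouse directory, since the denied inode lives under /tmp/hive-hdfs:

# sudo -u hdfs hadoop fs -chmod -R 1777 /tmp/hive-hdfs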


Once sharkserver2 is running, queries submitted through beeline can be viewed in the Spark web UI at http://192.168.10.240:4040/

=================================================================

Building Shark from source

First, install the JDK and Scala:
# wget http://www.scala-lang.org/files/archive/scala-2.10.3.tgz
# tar xvf scala-2.10.3.tgz
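Shark's scripts and build expect SCALA_HOME to point at the unpacked directory (the /opt path below assumes the tarball was unpacked there; set it in shark-env.sh or the shell):

# export SCALA_HOME=/opt/scala-2.10.3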

Download Shark:
# git clone https://github.com/amplab/shark.git -b branch-0.9 shark-0.9


Build command:
SHARK_HADOOP_VERSION=2.0.0-cdh4.4.0 sbt/sbt package -Dsbt.override.build.repos=true
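The version string above targets CDH4. For the CDH5.0.0 cluster used earlier in these notes, the matching value would presumably mirror the assembly jar name (hadoop 2.3.0-cdh5.0.0):

SHARK_HADOOP_VERSION=2.3.0-cdh5.0.0 sbt/sbt package -Dsbt.override.build.repos=true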

 
