Step 1: Build a Spark assembly with Hive support
http://blog.csdn.net/xiao_jun_0820/article/details/44178169
Step 2: Make the Spark installed by Cloudera Manager support HQL
http://blog.csdn.net/xiao_jun_0820/article/details/44680925
It turns out that CDH 5.5 ships neither the spark-sql nor the sparkR launcher scripts, and the R folder is missing as well.
Step 3: Copy the missing files and set up the environment
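The copy step can be sketched roughly as follows. The source path (a self-built assembly from step 1, here assumed to be under /opt/spark-build) and the exact file list are assumptions; adjust them to where your build actually landed:

```shell
# Sketch: copy the launcher scripts and R folder that CDH 5.5 does not ship
# from a self-built Spark assembly into the CDH parcel. Paths are assumptions.
copy_spark_sql_bits() {
  local src="$1"   # e.g. /opt/spark-build/spark-1.5.0-bin-custom (assumed)
  local dst="$2"   # e.g. /opt/cloudera/parcels/CDH/lib/spark
  cp "$src/bin/spark-sql" "$src/bin/sparkR" "$dst/bin/"
  cp -r "$src/R" "$dst/"   # the R folder missing from the CDH parcel
}

# copy_spark_sql_bits /opt/spark-build/spark-1.5.0-bin-custom /opt/cloudera/parcels/CDH/lib/spark
```

Running it as root (the parcel directory is normally not writable by ordinary users) should be enough for the launchers to appear on the parcel's bin path.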
Running SparkR:

sparkR --master yarn --executor-memory 1g

fails with a complaint that the Hadoop configuration cannot be found:

IllegalArgumentException: requirement failed: Cannot read Hadoop config dir /opt/cloudera/parcels/CDH/lib/spark/conf/yarn-conf.
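The failure is just Spark being unable to read a Hadoop config directory, so a quick pre-flight check before launching can save a round trip. The helper below is hypothetical (not part of Spark or CDH); it only verifies that the directory exists and contains core-site.xml:

```shell
# Hypothetical pre-flight check: does the given (or $HADOOP_CONF_DIR)
# directory exist and contain core-site.xml?
check_hadoop_conf() {
  local dir="${1:-$HADOOP_CONF_DIR}"
  if [ -d "$dir" ] && [ -f "$dir/core-site.xml" ]; then
    echo "OK: $dir"
    return 0
  else
    echo "MISSING: $dir" >&2
    return 1
  fi
}

# check_hadoop_conf /etc/hadoop/conf
```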
The fix is to add the following environment variables to /etc/profile:

vi /etc/profile

export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
export HADOOP_CONF_DIR=/etc/hadoop/conf
export HADOOP_CMD=/opt/cloudera/parcels/CDH/bin/hadoop
export HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive
export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin:$SCALA_HOME/bin

Then reload the profile:

source /etc/profile
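After sourcing the profile, it can be worth failing loudly if any of the variables above is still empty (easy to miss when the profile is only sourced in some shells). A small helper sketch, not part of any tool:

```shell
# Hypothetical sanity check: report any required variable that is unset or empty.
require_env() {
  local name val
  for name in "$@"; do
    eval "val=\${$name}"
    if [ -z "$val" ]; then
      echo "missing: $name" >&2
      return 1
    fi
  done
  echo "all set"
}

# usage after `source /etc/profile`:
# require_env HADOOP_HOME HADOOP_CONF_DIR HIVE_HOME SPARK_HOME
```

If this prints "all set", retrying `sparkR --master yarn --executor-memory 1g` should get past the Hadoop config error.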