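I spent the past two days wrestling with Hive permission problems, so I'm writing it down.

I won't go into Hive's basic configuration; set that up yourself, there are piles of guides online.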

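1. Background

Requirement: the hdfs and hive users must each be able to work with the databases they created themselves, and permissions must not get mixed up. The conditions are below [basically running naked with no security configured at all, and still one pitfall after another].

1. Hive has no authorization configured; the default NONE is used.

2. Hadoop uses its default and simplest security mechanism, Simple.

3. Directory permissions must not be set to 777.

4. Permission checking must be enabled in hdfs-site.xml (dfs.permissions is the older, deprecated name of dfs.permissions.enabled; Hadoop still accepts it).

hdfs-site.xml: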

<property>
    <name>dfs.permissions</name>
    <value>true</value>
</property>

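5. In hive-site.xml, configure hive.server2.enable.doAs:

    <property>
        <name>hive.server2.enable.doAs</name>
        <value>true</value>
        <description>
            If set to false, the YARN jobs submitted through
            HiveServer2 all run as the hive user.
            If set to true, they run as the actual connecting user.
        </description>
    </property>

If hive.server2.enable.doAs is set to false, every user that connects through HiveServer2 runs as the hive user.

In that case, a database I created in Hive as the hdfs user, for example, throws errors when accessed through HiveServer2,

because whatever user name you connect with, HiveServer2 accesses the hdfs-owned Hive data with the hive user's permissions.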
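To make the doAs behaviour concrete, here is a small hypothetical model (not Hive or HDFS source code). It first picks the user HiveServer2 acts as, then applies a simplified HDFS-style owner/group/other permission check; the HDFS superuser bypass and ACLs are ignored. The directory owner hdfs:hdfs, the mode 0755 (chosen to respect the "no 777" rule) and the assumption that each user's only group is its own name are all illustrative.

```java
import java.util.Set;

// Hypothetical model for illustration only; not Hive or HDFS source code.
final class DoAsModel {

    // User that HiveServer2 acts as towards HDFS/YARN:
    // the connecting session user when doAs=true, otherwise the HiveServer2 service user.
    static String effectiveUser(boolean doAs, String sessionUser, String serviceUser) {
        return doAs ? sessionUser : serviceUser;
    }

    // Simplified HDFS-style permission check (superuser bypass and ACLs are ignored).
    // mode is an octal literal such as 0755; bit is 4 (read), 2 (write) or 1 (execute).
    static boolean permitted(String user, Set<String> userGroups,
                             String owner, String group, int mode, int bit) {
        int bits;
        if (user.equals(owner)) {
            bits = (mode >> 6) & 7;          // owner bits
        } else if (userGroups.contains(group)) {
            bits = (mode >> 3) & 7;          // group bits
        } else {
            bits = mode & 7;                 // "other" bits
        }
        return (bits & bit) != 0;
    }

    public static void main(String[] args) {
        // Assumed setup: database directory owned by hdfs:hdfs with mode 0755,
        // and each user belongs only to a group named after itself.
        for (boolean doAs : new boolean[] {false, true}) {
            String user = effectiveUser(doAs, "hdfs", "hive");
            boolean canWrite = permitted(user, Set.of(user), "hdfs", "hdfs", 0755, 2);
            System.out.println("doAs=" + doAs + " -> acts as " + user + ", write allowed: " + canWrite);
        }
    }
}
```

With doAs=false the hdfs session ends up acting as hive, which is neither owner nor group member, so only the "other" bits (r-x) apply and writes into the hdfs-owned database directory are denied; with doAs=true the same session acts as hdfs and gets the owner bits (rwx).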

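2. Command-line permission configuration

[This is actually the most basic configuration.]

Configure proxy users in core-site.xml.

List here the users that will connect through Hive to create databases and work with data.

I use just two users as examples: hive and hdfs. The proxy-user keys Hadoop reads from core-site.xml are hadoop.proxyuser.<user>.hosts and hadoop.proxyuser.<user>.groups; the httpfs.proxyuser.* keys belong to the HttpFS server's httpfs-site.xml and are not read by the NameNode or the ResourceManager.

<configuration>
        <property>
                <name>hadoop.proxyuser.hive.hosts</name>
                <value>*</value>
        </property>

        <property>
                <name>hadoop.proxyuser.hive.groups</name>
                <value>*</value>
        </property>

        <property>
                <name>hadoop.proxyuser.hdfs.hosts</name>
                <value>*</value>
        </property>

        <property>
                <name>hadoop.proxyuser.hdfs.groups</name>
                <value>*</value>
        </property>
</configuration>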

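Once this is configured, you can work on the command line as hive or hdfs, each with its own permissions. The user name defaults to the currently logged-in OS user.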

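3. Fetching metadata over Thrift and operating on the data directly [Presto]

I use Presto as the example here, because that's what I use to connect to Hive.

Here is the hive.properties file from Presto's etc/catalog directory.

As you can see, the metastore is reached over Thrift.

But which user does it operate as?? I couldn't see where that is configured. For example, how do the hdfs and hive users each work on their own databases? How are their permissions kept apart??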

[root@master catalog]# pwd
/opt/presto/etc/catalog
[root@master catalog]# ll
total 8
-rw-rw-r--. 1 presto hadoop 172 Jun 18 13:07 hive.properties
-rw-rw-r--. 1 presto hadoop 124 Apr 18 17:33 mysql.properties
[root@master catalog]# more hive.properties
connector.name=hive-hadoop2
hive.metastore.uri=thrift://hive-metaserver:9083
hive.config.resources=/opt/hadoop/etc/hadoop/core-site.xml,/opt/hadoop/etc/hadoop/hdfs-site.xml

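Straight to the answer.

When Kerberos is not used with HDFS, Presto accesses HDFS as the OS user of the Presto process (i.e. the system user).

For example, if Presto runs as root, it accesses HDFS with root's permissions.

You can override this user name by setting the HADOOP_USER_NAME system property in Presto's JVM config, replacing hdfs_user with the appropriate user name: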

-DHADOOP_USER_NAME=hdfs_user
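As I understand Hadoop's UserGroupInformation under SIMPLE authentication, the client picks its user name in this order: the HADOOP_USER_NAME environment variable, then the HADOOP_USER_NAME JVM system property (what -D sets), otherwise the OS login user. A simplified sketch of that lookup; the class and method names are hypothetical and this is not Hadoop's code.

```java
import java.util.Map;

// Hypothetical sketch of the SIMPLE-auth user lookup as I understand Hadoop's
// UserGroupInformation; not Hadoop's code.
final class SimpleAuthUser {

    static String resolve(Map<String, String> env, Map<String, String> sysProps, String osUser) {
        String user = env.get("HADOOP_USER_NAME");        // 1. environment variable
        if (user == null) {
            user = sysProps.get("HADOOP_USER_NAME");      // 2. JVM system property (-DHADOOP_USER_NAME=...)
        }
        return user != null ? user : osUser;              // 3. fall back to the OS login user
    }

    // Applies the same rule to the current JVM.
    static String resolveForThisJvm() {
        String prop = System.getProperty("HADOOP_USER_NAME");
        Map<String, String> sysProps = (prop == null)
                ? Map.<String, String>of()
                : Map.of("HADOOP_USER_NAME", prop);
        return resolve(System.getenv(), sysProps, System.getProperty("user.name"));
    }

    public static void main(String[] args) {
        System.out.println("Hadoop user for this JVM: " + resolveForThisJvm());
    }
}
```

This is also why plain command-line use acts as whichever OS user is logged in, and why -DHADOOP_USER_NAME=hdfs in jvm.config makes the Presto process appear to HDFS as hdfs.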

[root@master etc]# pwd
/opt/presto/etc
[root@master etc]# ll
total 16
drwxr-xr-x. 2 presto hadoop  53 Jun 18 13:07 catalog
-rw-r--r--. 1 presto hadoop 177 Jun 18 13:34 config.properties
-rw-r--r--. 1 presto hadoop 194 Jun 18 13:34 jvm.config
-rw-rw-r--. 1 presto presto  25 Apr 18 17:33 log.properties
-rw-r--r--. 1 presto hadoop  85 Jun 18 13:34 node.properties
[root@master etc]# more jvm.config
-server
-Xmx2G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
-DHADOOP_USER_NAME=hdfs

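Verify by logging in:

cd ${PRESTO_HOME}/bin
./presto --server 192.168.100.100:8989 --catalog hive --schema default

Of course, the account you log in with here is not necessarily the one actually used: whatever account you log in as, the operations may still be performed with hive's permissions.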

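4. Permission settings for beeline, HiveServer2, and Hive on Spark

Some of you may ask why this gets its own section. Because I hit a bizarre problem that cost me a whole day. Let me show the error first.

As the hdfs user, I queried a database that hdfs had created from the command line. The query failed with an error, yet the same query as hive returned results.

Conditions:

Hive runs on Spark (Hive on Spark). The error appears when the query involves an aggregation, e.g. select count(*) from table.

Hive is accessed with beeline: beeline -u 'jdbc:hive2://localhost:10000/default' -n hdfs

Java code connects through HiveServer2 at jdbc:hive2://localhost:10000/default, logging in as the hdfs user.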
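A minimal sketch of such a Java client using plain JDBC. It assumes the Hive JDBC driver (hive-jdbc) is on the classpath and HiveServer2 listens on localhost:10000 as above; the table name my_table is a hypothetical placeholder. Without the driver, DriverManager simply throws an SQLException ("No suitable driver").

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Minimal sketch: assumes hive-jdbc is on the classpath; "my_table" below is a placeholder.
final class HiveCountQuery {

    // Connects to HiveServer2 as `user` and runs an aggregation over `table`
    // (`table` is concatenated into the SQL, so it must be a trusted identifier).
    static long countRows(String url, String user, String table) throws SQLException {
        // The user name is the session user; with hive.server2.enable.doAs=true,
        // HiveServer2 (running as hive) submits the job on this user's behalf.
        try (Connection conn = DriverManager.getConnection(url, user, "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM " + table)) {
            if (!rs.next()) {
                throw new SQLException("COUNT(*) returned no rows");
            }
            return rs.getLong(1);
        }
    }

    public static void main(String[] args) throws SQLException {
        long n = countRows("jdbc:hive2://localhost:10000/default", "hdfs", "my_table");
        System.out.println("rows: " + n);
    }
}
```

The COUNT(*) matters here: as described above, it is the aggregation that makes Hive on Spark launch a job on YARN, which is where the failure showed up.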

Error message:

hiveserver2.log:

FAILED: Execution Error, return code 30041 from org.apache.hadoop.hive.ql.exec.spark.SparkTask.

Failed to create Spark client for Spark session 3688eac6-e3ff-4c45-b0c1-ef09c34909ef

2020-06-18 19:49:29,031 INFO  [23b1bf4c-9333-4095-b6f5-0f362ef59609 HiveServer2-Handler-Pool: Thread-75] reducesink.VectorReduceSinkEmptyKeyOperator: VectorReduceSinkEmptyKeyOperator constructor vectorReduceSinkInfo org.apache.hadoop.hive.ql.plan.VectorReduceSinkInfo@582d2544
Query ID = hive_20200618194928_671bb64c-92e7-4930-8b09-b27bab7a64a0
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create Spark client for Spark session 3688eac6-e3ff-4c45-b0c1-ef09c34909ef)'
FAILED: Execution Error, return code 30041 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to create Spark client for Spark session 3688eac6-e3ff-4c45-b0c1-ef09c34909ef
OK

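The only error message is: Failed to create Spark client for Spark session

So why??? Hive runs on Spark, and Spark runs in YARN mode, so I went through every log and finally found this in:

hadoop-yarn-resourcemanager-xxxx-103.log

2020-06-18 20:49:58,590 INFO org.apache.hadoop.ipc.Server: Connection from 192.168.xxx.103:55442 for protocol org.apache.hadoop.yarn.api.ApplicationClientProtocolPB is unauthorized for user hdfs (auth:PROXY) via hive (auth:SIMPLE)

Note this part: is unauthorized for user hdfs (auth:PROXY) via hive (auth:SIMPLE). In other words, HiveServer2, running as the real user hive, tried to submit to YARN on behalf of hdfs, and the ResourceManager refused to let hive act as a proxy for hdfs.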
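The log line can be read mechanically. A small hypothetical helper (its regex only covers the "user X (auth:PROXY) via Y (auth:Z)" shape shown above) extracts the effective user, the real user and the real user's auth method, and derives the proxy-user keys the real user needs in the server's core-site.xml.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the regex only covers the "user X (auth:PROXY) via Y (auth:Z)" shape seen above.
final class UnauthorizedLine {

    static final class Users {
        final String effectiveUser;   // user being impersonated (hdfs in the log)
        final String realUser;        // user that actually connected (hive in the log)
        final String realAuth;        // auth method of the real user (SIMPLE in the log)

        Users(String effectiveUser, String realUser, String realAuth) {
            this.effectiveUser = effectiveUser;
            this.realUser = realUser;
            this.realAuth = realAuth;
        }
    }

    private static final Pattern PROXY_LINE = Pattern.compile(
            "is unauthorized for user (\\S+) \\(auth:PROXY\\) via (\\S+) \\(auth:(\\w+)\\)");

    static Optional<Users> parse(String logLine) {
        Matcher m = PROXY_LINE.matcher(logLine);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new Users(m.group(1), m.group(2), m.group(3)));
    }

    // Proxy-user keys the *real* user needs in the server's core-site.xml.
    static List<String> requiredProxyKeys(Users u) {
        return List.of("hadoop.proxyuser." + u.realUser + ".hosts",
                       "hadoop.proxyuser." + u.realUser + ".groups");
    }
}
```

For the ResourceManager line above this yields effective user hdfs, real user hive, and the keys hadoop.proxyuser.hive.hosts and hadoop.proxyuser.hive.groups: it is hive, not hdfs, that must be allowed to proxy.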

I went through the Hadoop 3.1.3 source code.

Clearly the user failed authorization. But why?

Many posts online say it's because the proxy users aren't configured in core-site.xml. But they "really were already there" (in core-site.xml Hadoop reads the hadoop.proxyuser.* keys):

<configuration>
        <property>
                <name>hadoop.proxyuser.hive.hosts</name>
                <value>*</value>
        </property>

        <property>
                <name>hadoop.proxyuser.hive.groups</name>
                <value>*</value>
        </property>

        <property>
                <name>hadoop.proxyuser.hdfs.hosts</name>
                <value>*</value>
        </property>

        <property>
                <name>hadoop.proxyuser.hdfs.groups</name>
                <value>*</value>
        </property>
</configuration>

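Then I dug through all sorts of material and tried all sorts of things. Nothing worked....

Out of options, I turned off the permission check (dfs.permissions in hdfs-site.xml)....

Still didn't work............

Finally I also added the equivalent settings to httpfs-site.xml:

<configuration>
        <property>
                <name>httpfs.proxyuser.hive.hosts</name>
                <value>*</value>
        </property>

        <property>
                <name>httpfs.proxyuser.hive.groups</name>
                <value>*</value>
        </property>

        <property>
                <name>httpfs.proxyuser.hdfs.hosts</name>
                <value>*</value>
        </property>

        <property>
                <name>httpfs.proxyuser.hdfs.groups</name>
                <value>*</value>
        </property>
</configuration>

Then, full of joy, I thought it was working.

I switched Hadoop's file-permission check (dfs.permissions in hdfs-site.xml) back on, and it broke again!!!!!!!!!!!

By now it was day two and still unsolved. I was about to lose it....

I even started to suspect a version incompatibility.

Out of options, I resorted to remote debugging [I won't go into detail; suffice it to say it's a huge pain].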

The permission check happens here:

// If you're interested, read Hadoop's authorization source:
org.apache.hadoop.ipc.Server#authorizeConnection

/**
 * Authorize proxy users to access this server
 * @throws RpcServerException - user is not allowed to proxy
 */
private void authorizeConnection() throws RpcServerException {
    try {
        // If auth method is TOKEN, the token was obtained by the
        // real user for the effective user, therefore not required to
        // authorize real user. doAs is allowed only for simple or kerberos
        // authentication
        if (user != null && user.getRealUser() != null
                && (authMethod != AuthMethod.TOKEN)) {
            // In our case (SIMPLE auth, hdfs proxied via hive) execution goes through here
            ProxyUsers.authorize(user, this.getHostAddress());
        }
        authorize(user, protocolName, getHostInetAddress());
        if (LOG.isDebugEnabled()) {
            LOG.debug("Successfully authorized " + connectionContext);
        }
        rpcMetrics.incrAuthorizationSuccesses();
    } catch (AuthorizationException ae) {
        LOG.info("Connection from " + this
                + " for protocol " + connectionContext.getProtocol()
                + " is unauthorized for user " + user);
        rpcMetrics.incrAuthorizationFailures();
        throw new FatalRpcServerException(
                RpcErrorCodeProto.FATAL_UNAUTHORIZED, ae);
    }
}
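For illustration only, a simplified model of the decision ProxyUsers.authorize makes (not Hadoop's implementation): the real user may impersonate someone only if the server's own configuration has hadoop.proxyuser.<real user>.hosts covering the client address and hadoop.proxyuser.<real user>.groups covering one of the effective user's groups. Real Hadoop also supports a .users key, CIDR ranges and host names, all omitted here. The model also shows why httpfs.proxyuser.* keys cannot help this check: only the hadoop.proxyuser prefix is looked up.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Simplified, hypothetical model of a proxy-user check; not Hadoop's implementation.
// Omits the .users key, CIDR ranges and host-name matching that Hadoop supports.
final class ProxyAuthModel {

    // May `realUser` (e.g. hive) impersonate a user in `effectiveUserGroups`, connecting from `clientIp`?
    static boolean allowed(Map<String, String> serverConf, String realUser,
                           Set<String> effectiveUserGroups, String clientIp) {
        String prefix = "hadoop.proxyuser." + realUser;
        String hosts = serverConf.get(prefix + ".hosts");
        String groups = serverConf.get(prefix + ".groups");
        if (hosts == null || groups == null) {
            return false;   // no entry for the real user in this server's config -> unauthorized
        }
        return matches(groups, effectiveUserGroups) && matches(hosts, Set.of(clientIp));
    }

    // "*" allows everything; otherwise a comma-separated list of exact values.
    private static boolean matches(String configured, Set<String> candidates) {
        String trimmed = configured.trim();
        if (trimmed.equals("*")) {
            return true;
        }
        Set<String> allowedValues = new HashSet<>(Arrays.asList(trimmed.split("\\s*,\\s*")));
        for (String candidate : candidates) {
            if (allowedValues.contains(candidate)) {
                return true;
            }
        }
        return false;
    }
}
```

Note that serverConf is the configuration of the server doing the check (here the ResourceManager), not of the client machine, which is why a correct core-site.xml on another node does not help.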

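While debugging, I found that when the configuration was loaded, the proxy-user entries did not include hive at all???

Thinking it over: the authorization failure happened when the job was submitted to YARN, and the ResourceManager is responsible for authorization and resource scheduling there.

So I went straight to the proxy-user configuration on the ResourceManager node:

its core-site.xml.

A herd of mythical beasts stampeded past..........

Then another herd stampeded past..........

Then two more herds stampeded past..........

Again, again, and again, another herd stampeded past..........

Again, again, again, and again, another herd stampeded past..........

Again, again, again, again, and again, another herd stampeded past..........

Again, again, again, again, again, and again, another herd stampeded past..........

I took a walk, made a cup of coffee, came back, uncommented the proxy-user entries, and restarted. Everything works now.....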
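Since the culprit was a commented-out block in the ResourceManager's core-site.xml, a quick sanity check is to parse the file and list the hadoop.proxyuser.* properties that are actually in effect; a commented-out property is a comment node, so an XML parser never sees it as a property. A hypothetical helper using only the JDK's DOM parser, meant to be run on the ResourceManager host. It ignores the other configuration files the ResourceManager also loads, so treat it as a first check only.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sanity check, not a Hadoop tool: list the hadoop.proxyuser.* properties
// that a core-site.xml really defines. Commented-out <property> blocks are comment nodes,
// so the DOM parser never returns them as elements.
final class ProxyUserConfigCheck {

    static Map<String, String> proxyUserEntries(InputStream coreSiteXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(coreSiteXml);
        NodeList props = doc.getElementsByTagName("property");
        Map<String, String> result = new TreeMap<>();
        for (int i = 0; i < props.getLength(); i++) {
            Element property = (Element) props.item(i);
            String name = childText(property, "name");
            if (name != null && name.startsWith("hadoop.proxyuser.")) {
                result.put(name, childText(property, "value"));
            }
        }
        return result;
    }

    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        // Example: java ProxyUserConfigCheck /opt/hadoop/etc/hadoop/core-site.xml  (on the ResourceManager host)
        try (InputStream in = new FileInputStream(args[0])) {
            Map<String, String> entries = proxyUserEntries(in);
            if (entries.isEmpty()) {
                System.out.println("No active hadoop.proxyuser.* entries found");
            }
            entries.forEach((k, v) -> System.out.println(k + " = " + v));
        }
    }
}
```

If hadoop.proxyuser.hive.hosts and hadoop.proxyuser.hive.groups are missing from the output on the ResourceManager host, the "unauthorized for user hdfs (auth:PROXY) via hive" error is the expected result.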

