A Detailed Look at Authorization Checks for Hive Query Submission

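I have been looking into Hue recently and ran into a problem: submitting an HQL statement from the Hive Editor fails with an authorization error like this: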

Authorization failed:No privilege 'Select' found for inputs {database:xxx, table:xxx, columnName:xxx}. Use show grant to get more details.

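The Hue login user is hadoop. Querying through the Hive CLI works fine, but through Hue, which connects to HiveServer2, the same table cannot be queried. To rule out Hue as the cause, I connected to HiveServer2 with Beeline and got the same authorization error. The stack trace is shown in the figure below.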

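Following the stack trace, I traced through the source (only the more important code is listed). Hive's authorization flow for a submitted SQL statement is as follows: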

    // Abridged excerpts from the Hive source; not compilable as-is.
    Driver.compile(String command, boolean resetTaskIds) {
      if (HiveConf.getBoolVar(conf,
          HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED)) {
        try {
          perfLogger.PerfLogBegin(LOG, PerfLogger.DO_AUTHORIZATION);
          // Run the authorization check
          doAuthorization(sem);
        } // catch/finally omitted
      }
    }

    Driver.doAuthorization(BaseSemanticAnalyzer sem) {
      // The operation type is QUERY (or CREATE TABLE AS SELECT)
      if (op.equals(HiveOperation.CREATETABLE_AS_SELECT)
          || op.equals(HiveOperation.QUERY)) {
        if (cols != null && cols.size() > 0) {
          // Delegate to the authorizer for the finer-grained check
          ss.getAuthorizer().authorize(tbl, null, cols,
              op.getInputRequiredPrivileges(), null);
        }
      }
    }

    BitSetCheckedAuthorizationProvider.authorize(Table table, Partition part,
        List<String> columns, Privilege[] inputRequiredPriv,
        Privilege[] outputRequiredPriv) {
      // Check the user's privileges on the DB and Table;
      // if they already suffice, no column check follows
      authorizeUserDBAndTable(table, inputRequiredPriv, outputRequiredPriv,
          inputCheck, outputCheck);
      // Check the user's privileges on each column of the Table
      for (String col : columns) {
        PrincipalPrivilegeSet partColumnPrivileges = hive_db.get_privilege_set(
            HiveObjectType.COLUMN, table.getDbName(), table.getTableName(),
            partValues, col, this.getAuthenticator().getUserName(),
            this.getAuthenticator().getGroupNames());
        authorizePrivileges(partColumnPrivileges, inputRequiredPriv, inputCheck2,
            outputRequiredPriv, outputCheck2);
      }
    }

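Hive's authorization first calls authorizeUserDBAndTable to verify whether the user may access the DB and Table being queried, which corresponds to the DB_PRIVS and TBL_PRIVS tables in the MetaStore. During this check, Hive talks to the HiveMetaStore process over Thrift to fetch the relevant rows from the MetaStore database. If the user holds the required privilege at a coarser granularity, the check returns immediately and no finer-grained check runs. In other words, if the user has the required privileges on the DB, the privileges on the Table and Column are not checked.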
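
The coarse-to-fine short-circuit described above can be sketched roughly as follows. This is a simplified illustration, not Hive's code: the class CoarseToFineAuthorizer, its in-memory grant map, and the object names used with it are hypothetical stand-ins for the MetaStore privilege tables.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory stand-in for DB-, table- and column-level grants. */
final class CoarseToFineAuthorizer {
    // Key format (an assumption of this sketch): "user|db", "user|db.table", "user|db.table.column"
    private final Map<String, Set<String>> grants = new HashMap<>();

    void grant(String user, String object, String privilege) {
        grants.computeIfAbsent(user + "|" + object, k -> new HashSet<>()).add(privilege);
    }

    private boolean has(String user, String object, String privilege) {
        return grants.getOrDefault(user + "|" + object, Collections.<String>emptySet())
                     .contains(privilege);
    }

    boolean authorize(String user, String db, String table,
                      List<String> columns, String privilege) {
        // A DB-level grant is enough: stop here, nothing finer is checked.
        if (has(user, db, privilege)) {
            return true;
        }
        // Otherwise a table-level grant is enough.
        if (has(user, db + "." + table, privilege)) {
            return true;
        }
        // Finally, every referenced column must carry its own grant.
        for (String col : columns) {
            if (!has(user, db + "." + table + "." + col, privilege)) {
                return false;
            }
        }
        // No coarser grant and no columns to check: deny.
        return !columns.isEmpty();
    }
}
```

With a Select grant on a database for hadoop and none for hive, the same query is allowed for hadoop at the first step and rejected for hive at the column step, which mirrors the error message at the top of this post.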

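I checked the DB_PRIVS table: the hadoop user does have the Select privilege on the database in question, which is why access works in traditional CLI mode. Nothing in the code above was surprising either, because CLI mode and HiveServer mode share the same authorization code. So I attached a remote debugger and found that this.getAuthenticator().getUserName() returns hive, that is, the user that started HiveServer2, not hadoop, the user that submitted the SQL. Following that lead, I found the code that sets the authenticator's properties: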

    SessionState.start(SessionState startSs) {
        // Instantiate the default HadoopDefaultAuthenticator. Inside this method the class
        // is loaded via ReflectionUtils, which in turn calls HadoopDefaultAuthenticator.setConf
        startSs.authenticator = HiveUtils.getAuthenticator(startSs.getConf(),
            HiveConf.ConfVars.HIVE_AUTHENTICATOR_MANAGER);
    }
    HadoopDefaultAuthenticator.setConf(Configuration conf) {
        ugi = ShimLoader.getHadoopShims().getUGIForConf(conf);
    }
    HadoopShimsSecure.getUGIForConf(Configuration conf) throws IOException {
        return UserGroupInformation.getCurrentUser();
    }
    
    UserGroupInformation.getCurrentUser() throws IOException {
      AccessControlContext context = AccessController.getContext();
      Subject subject = Subject.getSubject(context);
      // Right after HiveServer starts, subject is null, so getLoginUser() is called
      if (subject == null || subject.getPrincipals(User.class).isEmpty()) {
        return getLoginUser();
      } else {
        return new UserGroupInformation(subject);
      }
    }

    UserGroupInformation.getLoginUser() {
      if (loginUser == null) {
        try {
          Subject subject = new Subject();
          LoginContext login;
          if (isSecurityEnabled()) {
            login = newLoginContext(HadoopConfiguration.USER_KERBEROS_CONFIG_NAME,
                subject, new HadoopConfiguration());
          } else {
            login = newLoginContext(HadoopConfiguration.SIMPLE_CONFIG_NAME,
                subject, new HadoopConfiguration());
          }
          login.login();
          loginUser = new UserGroupInformation(subject);
          loginUser.setLogin(login);
          loginUser.setAuthenticationMethod(isSecurityEnabled() ?
                                            AuthenticationMethod.KERBEROS :
                                            AuthenticationMethod.SIMPLE);
          loginUser = new UserGroupInformation(login.getSubject());
          String fileLocation = System.getenv(HADOOP_TOKEN_FILE_LOCATION);
          if (fileLocation != null) {
            Credentials cred = Credentials.readTokenStorageFile(
                new File(fileLocation), conf);
            loginUser.addCredentials(cred);
          }
          loginUser.spawnAutoRenewalThreadForUserCreds();
        } catch (LoginException le) {
          LOG.debug("failure to login", le);
          throw new IOException("failure to login", le);
        }
        if (LOG.isDebugEnabled()) {
          LOG.debug("UGI loginUser:" + loginUser);
        }
      }
      return loginUser;
    }

The first time getLoginUser() is called after HiveServer starts, loginUser is null, so a LoginContext is created and its login method is called. login eventually calls the commit() method of HadoopLoginModule, whose logic is roughly:

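1. If Kerberos is in use, the user is the Kerberos login user.

2. If the Kerberos user is null and security is not enabled, the user is taken from the HADOOP_USER_NAME environment variable.

3. If HADOOP_USER_NAME is not set in the environment, the operating system user is used, that is, the user that started the HiveServer2 process.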
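
The three steps above can be sketched as a small pure function. This is only an illustration of the order of precedence, not the Hadoop implementation: LoginUserResolver and its parameters are hypothetical, and the environment is passed in as a map so the behavior is easy to try out.

```java
import java.util.Map;

/** Hypothetical helper that mirrors the precedence described for commit(). */
final class LoginUserResolver {
    static String resolve(String kerberosUser, boolean securityEnabled,
                          Map<String, String> env, String osUser) {
        // 1. A Kerberos login user wins if there is one.
        if (kerberosUser != null) {
            return kerberosUser;
        }
        // 2. Without security, fall back to HADOOP_USER_NAME from the environment.
        if (!securityEnabled) {
            String envUser = env.get("HADOOP_USER_NAME");
            if (envUser != null && !envUser.isEmpty()) {
                return envUser;
            }
        }
        // 3. Otherwise use the OS user that owns the process.
        return osUser;
    }
}
```

For a live process, a call such as LoginUserResolver.resolve(null, false, System.getenv(), System.getProperty("user.name")) would follow the same path as an unsecured HiveServer2 started without HADOOP_USER_NAME, ending at step 3.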

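From then on, the user in effect is the one that started HiveServer2, so the authenticator's UserName is hive. When Hive looks up the MetaStore privilege tables as hive, it finds no matching grants and authorization fails. One fix is to grant the hive user the required privileges, but then the authorization check becomes meaningless. A better approach is to implement your own hive.security.authenticator.manager so that authorization is checked against the user who submitted the SQL.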
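
As a rough sketch of that idea, the authenticator below reports the user bound to the current session and falls back to the process user only when none is bound. It is deliberately self-contained and makes several assumptions: UserProvider is a hypothetical stand-in for the methods of Hive's authenticator interface used in this post, and the ThreadLocal binding assumes the server's session layer makes the connecting user available before a query is compiled. A real implementation would implement Hive's own interface and be configured through hive.security.authenticator.manager.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for the authenticator methods used in this post. */
interface UserProvider {
    String getUserName();
    List<String> getGroupNames();
}

/** Sketch: report the session's user rather than the server process user. */
final class SessionUserAuthenticator implements UserProvider {
    // Assumption: the session layer binds the connecting user on the handling thread.
    private static final ThreadLocal<String> SESSION_USER = new ThreadLocal<>();

    // Fallback, for example the user that started the server.
    private final String processUser;

    SessionUserAuthenticator(String processUser) {
        this.processUser = processUser;
    }

    static void bindSessionUser(String user) {
        SESSION_USER.set(user);
    }

    static void clearSessionUser() {
        SESSION_USER.remove();
    }

    @Override
    public String getUserName() {
        String sessionUser = SESSION_USER.get();
        return sessionUser != null ? sessionUser : processUser;
    }

    @Override
    public List<String> getGroupNames() {
        // Group lookup is out of scope for this sketch.
        return Collections.emptyList();
    }
}
```

With this shape, a session opened by hadoop would be authorized against hadoop's grants in DB_PRIVS, while code running outside any session still sees the process user.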
