Pitfalls from Upgrading Hive 1.1.0 to 2.3.6

1. union all: column types on the two sides must match

Hive attempts implicit conversions across type groups; the supported conversions follow the implicit type-conversion matrix in the Hive documentation.
Example:

hive> select 1 as c1,   2 as c2
    > union all
    > select 1.0 as c1, "2" as c2;
FAILED: SemanticException Schema of both sides of union should match: 
Column _col1 is of type int on first table and type string on second table. 
Cannot tell the position of null AST.

Use cast("2" as int) to force "2" to int.
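With the cast applied, the query above parses cleanly (a sketch; the column aliases are illustrative):

```sql
select 1   as c1, 2                 as c2
union all
select 1.0 as c1, cast("2" as int) as c2;
-- c1: int vs double -> implicitly widened to double (same type group)
-- c2: int vs int    -> matches after the explicit cast
```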

2. date_add and date_sub now return date instead of string

This causes type-mismatch exceptions in custom UDFs that expect a string argument.
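A minimal workaround, assuming the UDF expects a string: cast the result back explicitly at the call site (my_udf and t below are placeholders for your own UDF and table).

```sql
-- Hive 1.1: date_add returned string; Hive 2.3 returns date.
-- Restore the old behavior for the UDF's benefit:
select my_udf(cast(date_add('2020-01-01', 1) as string)) from t;
```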

3. order by is not allowed inside a union all branch; it may only appear after the whole union all

An order by before a union all is meaningless anyway, but such SQL did exist in our production environment.
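A sketch of the two forms, with hypothetical tables t1 and t2 (the first must either be rewritten as the second, or have the ordered branch wrapped in a subquery):

```sql
-- Rejected in Hive 2.3: order by inside a union branch
-- select id from t1 order by id
-- union all
-- select id from t2;

-- Accepted: order by applies to the whole union
select id from t1
union all
select id from t2
order by id;
```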

4. Table/column names or aliases that clash with reserved words

The following reserved words must be wrapped in backticks (``) when used as table/column names or aliases:

ALL, ALTER, AND, ARRAY, AS, AUTHORIZATION, BETWEEN, BIGINT, BINARY, BOOLEAN, BOTH, BY, CASE, CAST, CHAR, COLUMN, CONF, CREATE, CROSS, CUBE, CURRENT, CURRENT_DATE, CURRENT_TIMESTAMP, CURSOR, DATABASE, DATE, DECIMAL, DELETE, DESCRIBE, DISTINCT, DOUBLE, DROP, ELSE, END, EXCHANGE, EXISTS, EXTENDED, EXTERNAL, FALSE, FETCH, FLOAT, FOLLOWING, FOR, FROM, FULL, FUNCTION, GRANT, GROUP, GROUPING, HAVING, IF, IMPORT, IN, INNER, INSERT, INT, INTERSECT, INTERVAL, INTO, IS, JOIN, LATERAL, LEFT, LESS, LIKE, LOCAL, MACRO, MAP, MORE, NONE, NOT, NULL, OF, ON, OR, ORDER, OUT, OUTER, OVER, PARTIALSCAN, PARTITION, PERCENT, PRECEDING, PRESERVE, PROCEDURE, RANGE, READS, REDUCE, REVOKE, RIGHT, ROLLUP, ROW, ROWS, SELECT, SET, SMALLINT, TABLE, TABLESAMPLE, THEN, TIMESTAMP, TO, TRANSFORM, TRIGGER, TRUE, TRUNCATE, UNBOUNDED, UNION, UNIQUEJOIN, UPDATE, USER, USING, UTC_TMESTAMP, VALUES, VARCHAR, WHEN, WHERE, WINDOW, WITH, COMMIT, ONLY, REGEXP, RLIKE, ROLLBACK, START, CACHE, CONSTRAINT, FOREIGN, PRIMARY, REFERENCES, DAYOFWEEK, EXTRACT, FLOOR, INTEGER, PRECISION, VIEWS
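
For example, with the now-reserved word date (and a hypothetical table t_demo):

```sql
-- Fails in Hive 2.3: date is a reserved word
-- create table t_demo (date string, cnt int);

-- Works: quote the identifier with backticks
create table t_demo (`date` string, cnt int);
select `date`, cnt from t_demo;
```
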

5. Parameter names in set statements are case-sensitive


6. A NULL without an explicit type is no longer compatible

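A sketch of the usual fix: give every bare NULL an explicit type so both sides of a union (or an insert's target column) agree. The failing form shown here is an assumption about the symptom, not taken from the original post:

```sql
-- May fail after the upgrade: a bare NULL carries the untyped void type
-- select null as c union all select 'x' as c;

-- Safe: type the NULL explicitly
select cast(null as string) as c
union all
select 'x' as c;
```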

7. Metastore schema migration

Run the following statements against the Hive 1.1 metastore database, export the data with mysqldump -t, drop the database and re-initialize it with the Hive 2.3 schemaTool, then import the data back.

alter table partitions drop foreign key PARTITIONS_FK3;
alter table partitions drop link_target_id;
alter table tbls drop foreign key TBLS_FK3;
alter table tbls drop link_target_id;
alter table tbls drop OWNER_TYPE;
update roles set role_name=lower(role_name);

alter table tbls add `IS_REWRITE_ENABLED` bit(1) NOT NULL DEFAULT b'0';

alter table notification_log add `MESSAGE_FORMAT` varchar(16) DEFAULT NULL;

drop table auth_user;
drop table tmp_user_map_d;
drop table version;

CREATE TABLE `key_constraints` (
`CHILD_CD_ID` bigint(20) DEFAULT NULL,
`CHILD_INTEGER_IDX` int(11) DEFAULT NULL,
`CHILD_TBL_ID` bigint(20) DEFAULT NULL,
`PARENT_CD_ID` bigint(20) NOT NULL,
`PARENT_INTEGER_IDX` int(11) NOT NULL,
`PARENT_TBL_ID` bigint(20) NOT NULL,
`POSITION` bigint(20) NOT NULL,
`CONSTRAINT_NAME` varchar(400) NOT NULL,
`CONSTRAINT_TYPE` smallint(6) NOT NULL,
`UPDATE_RULE` smallint(6) DEFAULT NULL,
`DELETE_RULE` smallint(6) DEFAULT NULL,
`ENABLE_VALIDATE_RELY` smallint(6) NOT NULL,
PRIMARY KEY (`CONSTRAINT_NAME`,`POSITION`),
KEY `CONSTRAINTS_PARENT_TABLE_ID_INDEX` (`PARENT_TBL_ID`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
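
The surrounding dump/restore flow might look like this (database name, user, and credentials are placeholders; rehearse against a backup first):

```shell
# 1. apply the ALTER/UPDATE/DROP statements above to the Hive 1.1 metastore
# 2. export data only (-t skips CREATE TABLE statements)
mysqldump -t -u hive -p hive_metastore > metastore_data.sql
# 3. drop and recreate the metastore database
mysql -u hive -p -e "DROP DATABASE hive_metastore; CREATE DATABASE hive_metastore;"
# 4. initialize the Hive 2.3 schema
schematool -dbType mysql -initSchema
# 5. re-import the data
mysql -u hive -p hive_metastore < metastore_data.sql
```
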

8. User roles: Hive 2.3 lowercases all newly created role names before storing them in MySQL, but authorization compares names with a plain =. Roles created in Hive 1.1 with uppercase characters therefore fail authorization.

The exception:

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Error granting roles for A to role B: null

So convert every role_name in the roles table to lowercase:

update roles set role_name=lower(role_name);


9. An odd number of double quotes inside single quotes must be escaped

For example, 'date":"(.+)"}' in a SQL statement triggers an <EOF> parse error.
It must be rewritten as 'date\":\"(.+)\"}'.

10. Authorization compatibility: we chose to keep using Hive 1.1.0's AuthorizerV1 for now.

Hive 2.3.6 uses AuthorizerV2 by default. Setting the following in hive-site.xml switches back to AuthorizerV1:

  <property>
    <name>hive.security.authorization.manager</name>
    <value>org.apache.hadoop.hive.ql.security.authorization.DefaultHiveAuthorizationProvider</value>
    <description>
      The Hive client authorization manager class name. The user defined authorization class should implement
      interface org.apache.hadoop.hive.ql.security.authorization.HiveAuthorizationProvider.
    </description>
  </property>

(Beyond this, some code changes were also needed for compatibility with the ALL privilege; that code is not included here.)

The ALL privilege no longer works and must be replaced with select, insert, update, delete.
Once authorization is enabled (hive.security.authorization.enabled=true), ALL can no longer stand in for select, insert, update, and delete, so the Hive metastore database needs fixing. Check the following tables:
DB_PRIVS
PART_COL_PRIVS
PART_PRIVS
TBL_COL_PRIVS
TBL_PRIVS

Write a script that regenerates the privileges, expanding each ALL into four rows (SELECT, UPDATE, INSERT, DELETE), and then delete all the ALL rows:

delete from TBL_PRIVS where TBL_PRIV like 'ALL%';
delete from DB_PRIVS where DB_PRIV like 'ALL%';
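A sketch of the expansion in MySQL, shown for TBL_PRIVS only (column list taken from the Hive 2.3 MySQL schema; verify it against your metastore, and repeat the pattern for DB_PRIVS and the other privilege tables):

```sql
-- allocate new primary keys above the current maximum
SET @id := (SELECT MAX(TBL_GRANT_ID) FROM TBL_PRIVS);

-- expand each ALL grant into four concrete grants
INSERT INTO TBL_PRIVS
  (TBL_GRANT_ID, CREATE_TIME, GRANT_OPTION, GRANTOR, GRANTOR_TYPE,
   PRINCIPAL_NAME, PRINCIPAL_TYPE, TBL_PRIV, TBL_ID)
SELECT @id := @id + 1, p.CREATE_TIME, p.GRANT_OPTION, p.GRANTOR, p.GRANTOR_TYPE,
       p.PRINCIPAL_NAME, p.PRINCIPAL_TYPE, v.priv, p.TBL_ID
FROM TBL_PRIVS p
JOIN (SELECT 'SELECT' AS priv UNION ALL SELECT 'INSERT'
      UNION ALL SELECT 'UPDATE' UNION ALL SELECT 'DELETE') v
WHERE p.TBL_PRIV = 'ALL';

-- then remove the old ALL rows
DELETE FROM TBL_PRIVS WHERE TBL_PRIV LIKE 'ALL%';
```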

Besides this, also keep an eye on the following parameter:

  <property>
    <name>hive.security.authorization.createtable.owner.grants</name>
    <value>select,insert,update,delete</value>
    <description>
      The privileges automatically granted to the owner whenever a table gets created.
      An example like "select,drop" will grant select and drop privilege to the owner
      of the table. Note that the default gives the creator of a table no access to the
      table (but see HIVE-8067).
    </description>
  </property>

11. A SQL statement fails with an error we suspect is a Hive bug

The error:

Task with the most failures(4):
-----
Task ID:
  task_1566481621886_4925755_m_000000

URL:
  http://TXIDC65-bigdata-resourcemanager1:8088/taskdetails.jsp?jobid=job_1566481621886_4925755&tipid=task_1566481621886_4925755_m_000000
-----
Diagnostic Messages for this Task:
Error: java.io.IOException: java.lang.reflect.InvocationTargetException
	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:271)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.<init>(HadoopShimsSecure.java:217)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:345)
	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:695)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:169)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:438)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1924)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:257)
	... 11 more
Caused by: java.lang.NullPointerException
	at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
	at org.apache.hadoop.hive.ql.io.parquet.ProjectionPusher.pushProjectionsAndFilters(ProjectionPusher.java:118)
	at org.apache.hadoop.hive.ql.io.parquet.ProjectionPusher.pushProjectionsAndFilters(ProjectionPusher.java:189)
	at org.apache.hadoop.hive.ql.io.parquet.ParquetRecordReaderBase.getSplit(ParquetRecordReaderBase.java:75)
	at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:75)
	at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:60)
	at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:75)
	at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:99)
	... 16 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

Fix:
In org/apache/hadoop/hive/ql/plan/TableScanDesc.java, change
private List<String> neededNestedColumnPaths;
to
private List<String> neededNestedColumnPaths = new ArrayList<String>();

12. MetaException(message:Version information not found in metastore.)

Set hive.metastore.schema.verification to false in hive-site.xml.
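The corresponding hive-site.xml fragment:

```xml
<property>
  <name>hive.metastore.schema.verification</name>
  <value>false</value>
</property>
```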

13. Check whether the privilege tables tbl_privs and db_privs contain any rows whose TBL_PRIV or DB_PRIV is CREATE; if they do, replace them with ALL, SELECT, or whatever fits each case.

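A quick check against the MySQL metastore (table-name casing depends on your MySQL settings):

```sql
SELECT * FROM TBL_PRIVS WHERE TBL_PRIV = 'CREATE';
SELECT * FROM DB_PRIVS  WHERE DB_PRIV  = 'CREATE';
```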

14. Different authorizer versions

Hive 1.1.0 uses AuthorizerV1:

<property>
    <name>hive.security.authorization.manager</name>
    <value>org.apache.hadoop.hive.ql.security.authorization.DefaultHiveAuthorizationProvider</value>
</property>

Hive 2.3.6 uses AuthorizerV2:

<property>
    <name>hive.security.authorization.manager</name>
    <value>org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory</value>
</property>

The two differ substantially: for example, to create or drop a table in a database, V2 requires you to be the owner of that database, while V1 only requires the corresponding privilege.

Solutions:
1. Migrate to the V2 permission model; to be fair, V2 manages privileges more rigorously and fits data-warehouse conventions better.
2. Modify Operation2Privilege.java and set each operation's required privilege to whatever you need.

15. User: root is not allowed to impersonate root
20/01/14 10:11:59 ERROR HiveConnection: Error opening session
org.apache.thrift.protocol.TProtocolException: Required field 'serverProtocolVersion' is unset! Struct:TOpenSessionResp(status:TStatus(statusCode:ERROR_STATUS, infoMessages:[*org.apache.hive.service.cli.HiveSQLException:Failed to open new session: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: root is not allowed to impersonate root:14:13, 

This is not strictly a migration issue; the standard fixes for it are covered elsewhere.

In our case we had configured:

  <property>
    <name>hadoop.proxyuser.hive.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hive.groups</name>
    <value>*</value>
  </property>

so it was enough to su hive and start the metastore and hiveserver as the hive user.

16. set hive.vectorized.execution.enabled=false

This parameter, which controls vectorized execution, defaults to false. Enabling it produces the error below (root cause still to be investigated):

Error: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:456)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1924)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
	... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
	... 14 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)

17. The double-quote problem

This expression triggers the <EOF> parse error described above:
from_unixtime(unix_timestamp(regexp_replace(regexp_replace(regexp_extract(get_json_object(content,'$.createTime'),'date":"(.+)"}',1),'T',' '),'Z',''))+28800,'yyyy-MM-dd HH:mm:ss') create_time

The escaped version:

from_unixtime(unix_timestamp(regexp_replace(regexp_replace(regexp_extract(get_json_object(content,'$.createTime'),'date\":\"(.+)\"}',1),'T',' '),'Z',''))+28800,'yyyy-MM-dd HH:mm:ss') create_time