(Notes) org.apache.hadoop.hive.serde

Original

2020-03-13 06:41

1. The official Hive CREATE TABLE syntax

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...)
[SORTED BY (col_name [ASC|DESC], ...)]
INTO num_buckets BUCKETS]
[ROW FORMAT row_format]
[STORED AS file_format]
[LOCATION hdfs_path]

2. How Hive serializes and deserializes data

1. Serialization
Row object --> Serializer --> <key, value> --> OutputFileFormat --> HDFS files
2. Deserialization
HDFS files --> InputFileFormat --> <key, value> --> Deserializer --> Row object
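The two flows above can be sketched without any Hadoop dependency. This is only an illustration: the class `SerDeFlowSketch`, the in-memory "file", and the helper names are hypothetical, and the Ctrl-A (`\u0001`) field delimiter is an assumption borrowed from Hive's default text layout. The key half of each `<key, value>` pair is left out of the sketch, since the row data travels in the value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SerDeFlowSketch {
    // Assumption: fields are joined with Ctrl-A, Hive's default text field delimiter.
    static final String FIELD_DELIM = "\u0001";

    // Serializer step: Row object -> value.
    static String serialize(List<String> row) {
        return String.join(FIELD_DELIM, row);
    }

    // Deserializer step: value -> Row object. The -1 limit keeps trailing empty fields.
    static List<String> deserialize(String value) {
        return Arrays.asList(value.split(FIELD_DELIM, -1));
    }

    // Stand-in for OutputFileFormat: one serialized record per line of an in-memory "file".
    static List<String> writeFile(List<List<String>> rows) {
        List<String> file = new ArrayList<>();
        for (List<String> row : rows) {
            file.add(serialize(row));
        }
        return file;
    }

    // Stand-in for InputFileFormat: hands each stored record to the deserializer.
    static List<List<String>> readFile(List<String> file) {
        List<List<String>> rows = new ArrayList<>();
        for (String value : file) {
            rows.add(deserialize(value));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("Tom", "M", "30"),
                Arrays.asList("Ann", "F", ""));
        List<String> file = writeFile(rows);
        // The round trip should give back the rows that were written.
        System.out.println(readFile(file).equals(rows)); // prints true
    }
}
```

In real Hive the file format and the SerDe are separate plug-ins, which is why the sketch keeps `writeFile`/`readFile` apart from `serialize`/`deserialize`.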

3. Hive SerDe

SerDe types and example usage:

LazySimpleSerDe: the SerDe (`org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe`) used for the plain-text file format `TEXTFILE`.

CREATE TABLE test_serde_lz
STORED AS TEXTFILE AS
SELECT name from employee;

ColumnarSerDe: the built-in SerDe for RCFile.

CREATE TABLE test_serde_cs
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
STORED AS RCFILE AS
SELECT name from employee;

RegexSerDe: a built-in SerDe that parses text files with Java regular expressions (the class used below ships in the hive-contrib jar).

CREATE TABLE test_serde_rex(
name string,
sex string,
age string
)
ROW FORMAT SERDE
'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES(
'input.regex' = '([^,]*),([^,]*),([^,]*)',
'output.format.string' = '%1$s %2$s %3$s'
)
STORED AS TEXTFILE;

HBaseSerDe: a built-in SerDe that integrates Hive with HBase. With HBaseSerDe, a Hive table can be stored in HBase.

CREATE TABLE test_serde_hb(
id string,
name string,
sex string,
age string
)
ROW FORMAT SERDE
'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY
'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
"hbase.columns.mapping"=
":key,info:name,info:sex,info:age"
)
TBLPROPERTIES("hbase.table.name" = "test_serde");

AvroSerDe: the built-in SerDe for reading and writing Avro data in Hive tables (see http://avro.apache.org/). Avro is an RPC and serialization framework. Native support through `CREATE TABLE ... STORED AS AVRO` only arrived in Hive 0.14.0.

CREATE TABLE test_serde_avro(
name string,
sex string,
age string
)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';

For details, see: https://cwiki.apache.org/confluence/display/Hive/AvroSerDe

ParquetHiveSerDe: the built-in SerDe for reading and writing Parquet data in Hive. Natively supported since Hive 0.13.0.

CREATE TABLE test_serde_parquet
STORED AS PARQUET AS
SELECT name from employee;

OpenCSVSerDe: a SerDe for reading and writing CSV data, first released with Hive 0.14.0. It can also be installed by downloading the source from GitHub (https://github.com/ogrodnek/csv-serde).

CREATE TABLE test_serde_csv(
name string,
sex string,
age string
)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE;

JSONSerDe: a third-party SerDe for reading JSON records with Hive. You can install it by downloading the source (https://github.com/rcongiu/Hive-JSON-Serde).

CREATE TABLE test_serde_js(
name string,
sex string,
age string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE;
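To make the RegexSerDe properties more concrete, here is a rough sketch of how an `input.regex` with three capture groups and an `output.format.string` could map between a text line and the three columns. It uses only `java.util.regex` and `String.format`; the class `RegexRowSketch` and its methods are hypothetical, the pattern `([^,]*),([^,]*),([^,]*)` is assumed, and this is not the Hive implementation. Returning all-null columns for a non-matching line follows the behaviour described in the RegexSerDe Javadoc.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexRowSketch {
    // Assumed 'input.regex': three comma-separated fields, one capture group per column.
    static final Pattern INPUT_REGEX = Pattern.compile("([^,]*),([^,]*),([^,]*)");
    // Assumed 'output.format.string': columns written back separated by spaces.
    static final String OUTPUT_FORMAT = "%1$s %2$s %3$s";

    // Deserialize: capture group i becomes column i.
    // A line that does not match the whole pattern yields all-null columns.
    static List<String> deserialize(String line) {
        Matcher m = INPUT_REGEX.matcher(line);
        if (!m.matches()) {
            return Collections.nCopies(m.groupCount(), null);
        }
        List<String> cols = new ArrayList<>();
        for (int i = 1; i <= m.groupCount(); i++) {
            cols.add(m.group(i));
        }
        return cols;
    }

    // Serialize: columns are substituted into the format string by position.
    static String serialize(List<String> cols) {
        return String.format(OUTPUT_FORMAT, cols.toArray());
    }

    public static void main(String[] args) {
        List<String> row = deserialize("Tom,M,30");
        System.out.println(row);            // prints [Tom, M, 30]
        System.out.println(serialize(row)); // prints Tom M 30
    }
}
```

Each group needs a quantifier such as `*`: a group written as `([^,])` matches exactly one character, so multi-character fields would not match.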

4. Official examples

RegEx

ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES
(
"input.regex" = "<regex>"
)
STORED AS TEXTFILE;

JSON

ROW FORMAT SERDE
'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE

ADD JAR /usr/lib/hive-hcatalog/lib/hive-hcatalog-core.jar;

CREATE TABLE my_table(a string, b bigint, ...)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE;

CSV/TSV

CREATE TABLE my_table(a string, b string, ...)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
"separatorChar" = "\t",
"quoteChar" = "'",
"escapeChar" = "\\"
)
STORED AS TEXTFILE;
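The three properties in the CSV/TSV example control how one line is split into fields. The following is a simplified sketch of that idea, not the parsing algorithm used by OpenCSVSerde: the class `DelimitedLineSketch` and the method `parse` are hypothetical, and edge cases such as unbalanced quotes are ignored.

```java
import java.util.ArrayList;
import java.util.List;

class DelimitedLineSketch {
    // Splits one line into fields using the given separator, quote and escape characters.
    static List<String> parse(String line, char separatorChar, char quoteChar, char escapeChar) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == escapeChar && i + 1 < line.length()) {
                // The escape character makes the next character literal.
                current.append(line.charAt(++i));
            } else if (c == quoteChar) {
                // Quote characters toggle quoted mode and are not part of the field.
                inQuotes = !inQuotes;
            } else if (c == separatorChar && !inQuotes) {
                // An unquoted separator ends the current field.
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // Same settings as the table above: tab separator, single quote, backslash escape.
        List<String> fields = parse("a\t'b\tc'\td", '\t', '\'', '\\');
        System.out.println(fields.size()); // prints 3
    }
}
```

Every field comes back as a string here, which matches the way the example table declares its columns as `string`.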

5. Serialization and deserialization methods

1. Serialization
public abstract Writable serialize(Object obj, ObjectInspector objInspector)
      throws SerDeException;
2. Deserialization
public abstract Object deserialize(Writable blob) throws SerDeException;
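One way to read the `serialize(Object obj, ObjectInspector objInspector)` signature is that the row arrives as an opaque `Object` and the inspector tells the SerDe how to take it apart. The sketch below imitates that shape with plain Java types. Everything in it is hypothetical: `RowInspector` stands in for `ObjectInspector`, `String` stands in for `Writable`, and `ToySerDe`/`CommaSerDe` are not Hive classes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

class InspectorSketch {
    // Hypothetical stand-in for ObjectInspector: knows how to pull fields out of an opaque row.
    interface RowInspector {
        List<Object> fieldsOf(Object row);
    }

    // Hypothetical stand-in for the two abstract methods, with String in place of Writable.
    static abstract class ToySerDe {
        abstract String serialize(Object obj, RowInspector inspector);
        abstract Object deserialize(String blob);
    }

    // A toy comma-delimited implementation.
    static class CommaSerDe extends ToySerDe {
        @Override
        String serialize(Object obj, RowInspector inspector) {
            StringJoiner joiner = new StringJoiner(",");
            // The SerDe never casts obj itself; it only sees what the inspector exposes.
            for (Object field : inspector.fieldsOf(obj)) {
                joiner.add(String.valueOf(field));
            }
            return joiner.toString();
        }

        @Override
        Object deserialize(String blob) {
            // The returned row object here is simply a list of string fields.
            return Arrays.asList(blob.split(",", -1));
        }
    }

    public static void main(String[] args) {
        ToySerDe serde = new CommaSerDe();
        // Two different in-memory row layouts, each paired with its own inspector.
        Object arrayRow = new Object[] {"Tom", 30};
        Object listRow = Arrays.asList("Ann", 28);
        System.out.println(serde.serialize(arrayRow, row -> Arrays.asList((Object[]) row))); // prints Tom,30
        System.out.println(serde.serialize(listRow, row -> (List<Object>) row));             // prints Ann,28
    }
}
```

Because the inspector is passed in, the same serializer can handle rows held in different in-memory layouts, which is the point of keeping the row type as `Object`.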

Thanks to: https://blog.csdn.net/sinat_29581293/article/details/82106703

Reference sites explaining Hive:

https://programtalk.com/java-api-usage-examples/org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe/

https://data-flair.training/blogs/hive-serde/
