Hive的UDF函数简单示例开发

原創

fa124607857

2020-06-29 08:12

Hive函数

1.1、内置函数

内容较多，见《Hive官方文档》

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

1）查看系统自带的函数

hive> show functions;

2）显示自带的函数的用法

hive>desc function upper;

3）详细显示自带的函数的用法

hive>desc function extended upper;

1.2Hive自定义函数

1）Hive 自带了一些函数，比如：max/min等，但是数量有限，自己可以通过自定义UDF来方便的扩展。

2）当Hive提供的内置函数无法满足你的业务处理需要时，此时就可以考虑使用用户自定义函数（UDF：user-defined function）。

3）根据用户自定义函数类别分为以下三种：

（1）UDF（User-Defined-Function）

一进一出

（2）UDAF（User-Defined Aggregation Function）

聚集函数，多进一出

类似于：count/max/min

（3）UDTF（User-Defined Table-Generating Functions）

一进多出

如lateral view explore()

4）官方文档地址

https://cwiki.apache.org/confluence/display/Hive/HivePlugins

5）编程步骤：

（1）继承org.apache.hadoop.hive.ql.UDF

（2）需要实现evaluate函数；evaluate函数支持重载；

6）注意事项

（1）UDF必须要有返回类型，可以返回null，但是返回类型不能为void；

（2）UDF中常用Text/LongWritable等类型，不推荐使用java类型；

1.3、UDF开发实例

简单UDF示例

第一步：创建maven java 工程，导入jar包

pom文件

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<artifactId>bigdata_12</artifactId>
<groupId>cn.itcast.bigdata</groupId>
<version>1.0-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>

<artifactId>day06_hiveudf</artifactId>

<repositories>
<repository>
<id>cloudera</id>
<url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>
</repositories>

<dependencies>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.6.0-cdh5.14.0</version>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
<version>1.1.0-cdh5.14.0</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.0</version>
<configuration>
<source>1.8</source>
<target>1.8</target>
<encoding>UTF-8</encoding>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>2.2</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*/RSA</exclude>
</excludes>
</filter>
</filters>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>

</project>

第二步：开发java类继承UDF，并重载evaluate方法

package cn.itcast.hive.udf;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class MyUDF extends UDF {
/**
* 继承UDF这个类然后重写evalute这个方法，方法的参数就是
* 我们传入进来的数据
* @param line
* @return
*/
public Text evaluate(final Text line){
if(null != line && !line.toString().equals("")){
//将我们的一行数据转换成大写
String str = line.toString().toUpperCase();
line.set(str);
return line;
}
line.set("");
return line;
}

}

第三步：将我们的项目打包，并上传到hive的lib目录下

第四步：添加我们的jar包

重命名我们的jar包名称

cd /export/servers/hive-1.1.0-cdh5.14.0/lib

mv original-day_06_hive_udf-1.0-SNAPSHOT.jar udf.jar

hive的客户端添加我们的jar包

add jar /export/servers/hive-1.1.0-cdh5.14.0/lib/udf.jar;

第五步：设置函数与我们的自定义函数关联

create temporary function tolowercase as 'cn.itcast.udf.ItcastUDF';

第六步：使用自定义函数

select tolowercase('abc');

發表評論

所有評論

還沒有人評論，想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.

Hive的UDF函数简单示例开发

Hive函数

1.1、内置函数

1.2Hive自定义函数

1.3、UDF开发实例

flume+kafka整合採集數據案例

sparkStreaming介紹及sparkStreaming整合Kafka

Spark 數據全局排序實現以及RangePartitioner的使用示例

大數據開發之sqoop數據遷移工具簡介

Sigmoid函數求導

https://yachay.unat.edu.pe/blog/index.php?comment_area=format_blog&comment_component=blog&comment_co

linux以太網驅動總結