Hive User-Defined Functions

When the built-in functions cannot complete an analysis task, you need to write a user-defined function.

show functions; -- list all built-in functions

desc function upper; -- show the usage of one specific function

desc function extended upper; -- same, with a worked example

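## Three categories

## UDF: one row in, one value out, e.g. cleaning raw-file fields that contain [] or ""

## UDAF: many rows in, one value out, e.g. sum() avg() max() min()

## UDTF: one row in, many rows out, e.g. ip -> country, province, city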
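To make the three shapes concrete, here is a minimal plain-Java sketch. It uses no Hive classes. The class FunctionShapes and its method names are hypothetical, and the UDTF-style method splits a comma-separated string instead of doing a real IP lookup, which would need a geolocation database.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration of the three function shapes; no Hive classes are used.
class FunctionShapes {

    // UDF shape: one value in, one value out.
    // Strips square brackets and double quotes from a raw field.
    static String cleanField(String raw) {
        if (raw == null) {
            return null;
        }
        return raw.replaceAll("[\\[\\]\"]", "");
    }

    // UDAF shape: many values in, one value out (here, a sum).
    static double sum(List<Double> values) {
        double total = 0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    // UDTF shape: one value in, several values out.
    // Splits a comma-separated location string into its parts.
    static List<String> explodeLocation(String location) {
        return Arrays.asList(location.split(","));
    }

    public static void main(String[] args) {
        System.out.println(cleanField("[\"31/Aug/2015:00:04:37\"]"));
        System.out.println(sum(Arrays.asList(1000.0, 2500.0)));
        System.out.println(explodeLocation("China,Guangdong,Shenzhen"));
    }
}
```

In Hive itself each shape has its own base class and lifecycle. The sketch only shows how many values go in and come out.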

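Developing a UDF

** The class must extend UDF

** Implement a method named evaluate; it may be overloaded

** evaluate must declare a return type; it may return null, but the return type cannot be void

** Prefer Hadoop writable types such as Text/LongWritable for arguments and return values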
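These rules make more sense once you consider that evaluate is looked up by name and argument types at run time rather than through an interface. The following is a rough sketch of that idea using plain Java reflection. LowerSketch and EvaluateLookup are hypothetical names, the lookup is heavily simplified, and it is not Hive's actual resolver.

```java
import java.lang.reflect.Method;
import java.util.Locale;

// Hypothetical stand-in for a UDF class: two overloads of evaluate.
class LowerSketch {
    public String evaluate(String s) {
        return s == null ? null : s.toLowerCase(Locale.ROOT);
    }

    public String evaluate(String s, String suffix) {
        return s == null ? null : s.toLowerCase(Locale.ROOT) + suffix;
    }
}

class EvaluateLookup {

    // Picks the evaluate overload that matches the runtime argument types,
    // rejects a void return type, and invokes it.
    // Simplification: arguments must be non-null and match the parameter types exactly.
    static Object call(Object udf, Object... args) {
        Class<?>[] types = new Class<?>[args.length];
        for (int i = 0; i < args.length; i++) {
            types[i] = args[i].getClass();
        }
        try {
            Method m = udf.getClass().getMethod("evaluate", types);
            if (m.getReturnType() == void.class) {
                throw new IllegalStateException("evaluate must not return void");
            }
            return m.invoke(udf, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("no usable evaluate method", e);
        }
    }

    public static void main(String[] args) {
        LowerSketch udf = new LowerSketch();
        System.out.println(call(udf, "SMITH"));        // one-argument overload
        System.out.println(call(udf, "SMITH", "_x"));  // two-argument overload
    }
}
```

Because the overload is chosen from the argument types, a call with types that no evaluate method accepts fails at lookup time. In this sketch that surfaces as an IllegalStateException.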

## 1. Create a Maven project

## 2. Edit pom.xml to add the dependencies

<dependency>
   <groupId>org.apache.hive</groupId>
   ...
</dependency>
<dependency>
   <groupId>org.apache.hive</groupId>
   ...
</dependency>

<dependency>
   <groupId>org.apache.hadoop</groupId>
   <artifactId>hadoop-common</artifactId>
   <version>2.5.0</version>
</dependency>

## 3. Replace the repository

## 4. Use a repository that hosts the jars Hive depends on

## Implementation (the class must define a method named evaluate)

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class SalaryUDF extends UDF {

    public Text evaluate(Text salaryText) {
        // 1. Return null when the input is null
        if (salaryText == null) {
            return null;
        }

        // 2. Return null when the input cannot be parsed as a double
        double salary;
        try {
            salary = Double.parseDouble(salaryText.toString());
        } catch (NumberFormatException e) {
            return null;
        }

        // 3. Assign the salary to a group
        Text text = new Text();
        if (salary > 3000) {
            text.set("group above 3000");
        } else if (salary > 2000) {
            text.set("group above 2000 and at most 3000");
        } else {
            text.set("group at most 2000");
        }
        return text;
    }
}
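The bucketing rule can be checked without a Hive runtime by copying it into a plain static method. The sketch below does that under a few assumptions: SalaryGroups and classify are hypothetical names, String stands in for Text, and the label strings are placeholders chosen for the sketch.

```java
// Plain-Java copy of the bucketing rule so it can be exercised without Hive.
class SalaryGroups {

    // Returns null for null or non-numeric input, otherwise a group label.
    // The label strings are placeholders chosen for this sketch.
    static String classify(String salaryText) {
        if (salaryText == null) {
            return null;
        }
        double salary;
        try {
            salary = Double.parseDouble(salaryText);
        } catch (NumberFormatException e) {
            return null;
        }
        if (salary > 3000) {
            return "above 3000";
        } else if (salary > 2000) {
            return "2000 to 3000";
        } else {
            return "2000 or below";
        }
    }

    public static void main(String[] args) {
        // Boundary values, a non-numeric value, and null
        for (String s : new String[] {"3500", "3000", "2000", "abc", null}) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

Testing the boundaries this way confirms that 3000 falls into the middle group and 2000 into the lowest one, matching the comparisons in the UDF.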

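## 5. Write and use the UDF

1. Write the code.

2. Export the program as a jar, copy it to the target machine, and add it to the Hive session:

hive> add jar /home/beifeng/jars/lower.jar ;

3. Create a temporary function:

hive> CREATE TEMPORARY FUNCTION my_lower AS 'package_name.ClassName';

4. Check that the function is registered, then call it:

hive> show functions ;

hive> select my_lower(ename) from emp ;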
