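In Hive, use
show functions to list all functions that Hive supports
describe function xxx to see the definition of a specific function xxx
The table below lists all functions supported by Hive 1.1.0, together with their definitions.
Only a small number of them are used regularly; examples of the commonly used functions will be covered in detail separately.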
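
As a minimal sketch of the two commands above, assuming an open Hive CLI or Beeline session: `abs` is used here only as an example function name, and the exact text printed depends on the Hive version.

```sql
-- List every function registered in the current Hive session.
SHOW FUNCTIONS;

-- Print the one-line definition of a single function (abs is just an example).
DESCRIBE FUNCTION abs;

-- EXTENDED prints a longer description when the function provides one.
DESCRIBE FUNCTION EXTENDED abs;
```
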
Hive function | Description |
---|---|
! | ! a - Logical not |
!= | a != b - Returns TRUE if a is not equal to b |
% | a % b - Returns the remainder when dividing a by b |
& | a & b - Bitwise and |
* | a * b - Multiplies a by b |
+ | a + b - Returns a+b |
- | a - b - Returns the difference a-b |
/ | a / b - Divide a by b |
< | a < b - Returns TRUE if a is less than b |
<= | a <= b - Returns TRUE if a is not greater than b |
<=> | a <=> b - Returns same result with EQUAL(=) operator for non-null operands, but returns TRUE if both are NULL, FALSE if one of them is NULL |
<> | a <> b - Returns TRUE if a is not equal to b |
= | a = b - Returns TRUE if a equals b and false otherwise |
== | a == b - Returns TRUE if a equals b and false otherwise |
> | a > b - Returns TRUE if a is greater than b |
>= | a >= b - Returns TRUE if a is not smaller than b |
^ | a ^ b - Bitwise exclusive or |
abs | abs(x) - returns the absolute value of x |
acos | acos(x) - returns the arc cosine of x if -1<=x<=1 or NULL otherwise |
add_months | add_months(start_date, num_months) - Returns the date that is num_months after start_date. |
and | a and b - Logical and |
array | array(n0, n1…) - Creates an array with the given elements |
array_contains | array_contains(array, value) - Returns TRUE if the array contains value. |
ascii | ascii(str) - returns the numeric value of the first character of str |
asin | asin(x) - returns the arc sine of x if -1<=x<=1 or NULL otherwise |
assert_true | assert_true(condition) - Throw an exception if ‘condition’ is not true. |
atan | atan(x) - returns the atan (arctan) of x (x is in radians) |
avg | avg(x) - Returns the mean of a set of numbers |
base64 | base64(bin) - Convert the argument from binary to a base 64 string |
between | between a [NOT] BETWEEN b AND c - evaluate if a is [not] in between b and c |
bin | bin(n) - returns n in binary |
case | CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END - When a = b, returns c; when a = d, return e; else return f |
cbrt | cbrt(double) - Returns the cube root of a double value. |
ceil | ceil(x) - Find the smallest integer not smaller than x |
ceiling | ceiling(x) - Find the smallest integer not smaller than x |
coalesce | coalesce(a1, a2, …) - Returns the first non-null argument |
collect_list | collect_list(x) - Returns a list of objects with duplicates |
collect_set | collect_set(x) - Returns a set of objects with duplicate elements eliminated |
compute_stats | compute_stats(x) - Returns the statistical summary of a set of primitive type values. |
concat | concat(str1, str2, … strN) - returns the concatenation of str1, str2, … strN or concat(bin1, bin2, … binN) - returns the concatenation of bytes in binary data bin1, bin2, … binN |
concat_ws | concat_ws(separator, [string \| array(string)]+) - returns the concatenation of the strings separated by the separator. |
context_ngrams | context_ngrams(expr, array<string1, string2, …>, k, pf) estimates the top-k most frequent n-grams that fit into the specified context. The second parameter specifies a string of words that specify the positions of the n-gram elements, with a null value standing in for a ‘blank’ that must be filled by an n-gram element. |
conv | conv(num, from_base, to_base) - convert num from from_base to to_base |
corr | corr(x,y) - Returns the Pearson coefficient of correlation between a set of number pairs |
cos | cos(x) - returns the cosine of x (x is in radians) |
count | count(*) - Returns the total number of retrieved rows, including rows containing NULL values. count(expr) - Returns the number of rows for which the supplied expression is non-NULL. count(DISTINCT expr[, expr…]) - Returns the number of rows for which the supplied expression(s) are unique and non-NULL. |
covar_pop | covar_pop(x,y) - Returns the population covariance of a set of number pairs |
covar_samp | covar_samp(x,y) - Returns the sample covariance of a set of number pairs |
crc32 | crc32(str or bin) - Computes a cyclic redundancy check value for string or binary argument and returns bigint value. |
create_union | create_union(tag, obj1, obj2, obj3, …) - Creates a union with the object for given tag |
cume_dist | There is no documentation for function ‘cume_dist’ |
current_database | current_database() - returns currently using database name |
current_date | current_date() - Returns the current date at the start of query evaluation. All calls of current_date within the same query return the same value. |
current_timestamp | current_timestamp() - Returns the current timestamp at the start of query evaluation. All calls of current_timestamp within the same query return the same value. |
current_user | current_user() - Returns current user name |
date_add | date_add(start_date, num_days) - Returns the date that is num_days after start_date. |
date_format | date_format(date/timestamp/string, fmt) - converts a date/timestamp/string to a value of string in the format specified by the date format fmt. |
date_sub | date_sub(start_date, num_days) - Returns the date that is num_days before start_date. |
datediff | datediff(date1, date2) - Returns the number of days between date1 and date2 |
day | day(date) - Returns the date of the month of date |
dayofmonth | dayofmonth(date) - Returns the date of the month of date |
dayofweek | dayofweek(param) - Returns the day of the week of date/timestamp (1 = Sunday, 2 = Monday, …, 7 = Saturday) |
decode | decode(bin, str) - Decode the first argument using the second argument character set |
degrees | degrees(x) - Converts radians to degrees |
dense_rank | There is no documentation for function ‘dense_rank’ |
div | a div b - Divide a by b rounded to the long integer |
e | e() - returns E |
elt | elt(n, str1, str2, …) - returns the n-th string |
encode | encode(str, str) - Encode the first argument using the second argument character set |
ewah_bitmap | ewah_bitmap(expr) - Returns an EWAH-compressed bitmap representation of a column. |
ewah_bitmap_and | ewah_bitmap_and(b1, b2) - Return an EWAH-compressed bitmap that is the bitwise AND of two bitmaps. |
ewah_bitmap_empty | ewah_bitmap_empty(bitmap) - Predicate that tests whether an EWAH-compressed bitmap is all zeros |
ewah_bitmap_or | ewah_bitmap_or(b1, b2) - Return an EWAH-compressed bitmap that is the bitwise OR of two bitmaps. |
exp | exp(x) - Returns e to the power of x |
explode | explode(a) - separates the elements of array a into multiple rows, or the elements of a map into multiple rows and columns |
field | field(str, str1, str2, …) - returns the index of str in the str1,str2,… list or 0 if not found |
find_in_set | find_in_set(str,str_array) - Returns the first occurrence of str in str_array where str_array is a comma-delimited string. Returns null if either argument is null. Returns 0 if the first argument has any commas. |
first_value | There is no documentation for function ‘first_value’ |
floor | floor(x) - Find the largest integer not greater than x |
format_number | format_number(X, D) - Formats the number X to a format like ‘#,###,###.##’, rounded to D decimal places, and returns the result as a string. If D is 0, the result has no decimal point or fractional part. This is supposed to function like MySQL’s FORMAT |
from_unixtime | from_unixtime(unix_time, format) - returns unix_time in the specified format |
from_utc_timestamp | from_utc_timestamp(timestamp, string timezone) - Assumes given timestamp is UTC and converts to given timezone (as of Hive 0.8.0) |
get_json_object | get_json_object(json_txt, path) - Extract a json object from path |
greatest | greatest(v1, v2, …) - Returns the greatest value in a list of values |
hash | hash(a1, a2, …) - Returns a hash value of the arguments |
hex | hex(n, bin, or str) - Convert the argument to hexadecimal |
histogram_numeric | histogram_numeric(expr, nb) - Computes a histogram on numeric ‘expr’ using nb bins. |
hour | hour(date) - Returns the hour of date |
if | IF(expr1,expr2,expr3) - If expr1 is TRUE (expr1 <> 0 and expr1 <> NULL) then IF() returns expr2; otherwise it returns expr3. IF() returns a numeric or string value, depending on the context in which it is used. |
in | test in(val1, val2…) - returns true if test equals any valN |
in_file | in_file(str, filename) - Returns true if str appears in the file |
index | index(a, n) - Returns the n-th element of a |
initcap | initcap(str) - Returns str, with the first letter of each word in uppercase, all other letters in lowercase. Words are delimited by white space. |
inline | inline( ARRAY( STRUCT()[,STRUCT()] - explodes an array of structs into a table |
instr | instr(str, substr) - Returns the index of the first occurrence of substr in str |
isnotnull | isnotnull a - Returns true if a is not NULL and false otherwise |
isnull | isnull a - Returns true if a is NULL and false otherwise |
java_method | java_method(class,method[,arg1[,arg2…]]) calls method with reflection |
json_tuple | json_tuple(jsonStr, p1, p2, …, pn) - like get_json_object, but it takes multiple names and return a tuple. All the input parameters and output column types are string. |
lag | LAG (scalar_expression [,offset] [,default]) OVER ([query_partition_clause] order_by_clause); The LAG function is used to access data from a previous row. |
last_day | last_day(date) - Returns the last day of the month which the date belongs to. |
last_value | There is no documentation for function ‘last_value’ |
lcase | lcase(str) - Returns str with all characters changed to lowercase |
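lead | LEAD (scalar_expression [,offset] [,default]) OVER ([query_partition_clause] order_by_clause); The LEAD function is used to return data from the next row. |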
least | least(v1, v2, …) - Returns the least value in a list of values |
length | length(str \| binary) - Returns the length of str or number of bytes in binary data |
levenshtein | levenshtein(str1, str2) - This function calculates the Levenshtein distance between two strings. |
like | like(str, pattern) - Checks if str matches pattern |
ln | ln(x) - Returns the natural logarithm of x |
locate | locate(substr, str[, pos]) - Returns the position of the first occurrence of substr in str after position pos |
log | log([b], x) - Returns the logarithm of x with base b |
log10 | log10(x) - Returns the logarithm of x with base 10 |
log2 | log2(x) - Returns the logarithm of x with base 2 |
logged_in_user | logged_in_user() - Returns logged in user name |
lower | lower(str) - Returns str with all characters changed to lowercase |
lpad | lpad(str, len, pad) - Returns str, left-padded with pad to a length of len |
ltrim | ltrim(str) - Removes the leading space characters from str |
map | map(key0, value0, key1, value1…) - Creates a map with the given key/value pairs |
map_keys | map_keys(map) - Returns an unordered array containing the keys of the input map. |
map_values | map_values(map) - Returns an unordered array containing the values of the input map. |
matchpath | There is no documentation for function ‘matchpath’ |
max | max(expr) - Returns the maximum value of expr |
md5 | md5(str or bin) - Calculates an MD5 128-bit checksum for the string or binary. |
min | min(expr) - Returns the minimum value of expr |
minute | minute(date) - Returns the minute of date |
month | month(date) - Returns the month of date |
months_between | months_between(date1, date2) - returns number of months between dates date1 and date2 |
named_struct | named_struct(name1, val1, name2, val2, …) - Creates a struct with the given field names and values |
negative | negative a - Returns -a |
next_day | next_day(start_date, day_of_week) - Returns the first date which is later than start_date and named as indicated. |
ngrams | ngrams(expr, n, k, pf) - Estimates the top-k n-grams in rows that consist of sequences of strings, represented as arrays of strings, or arrays of arrays of strings. ‘pf’ is an optional precision factor that controls memory usage. |
noop | There is no documentation for function ‘noop’ |
noopstreaming | There is no documentation for function ‘noopstreaming’ |
noopwithmap | There is no documentation for function ‘noopwithmap’ |
noopwithmapstreaming | There is no documentation for function ‘noopwithmapstreaming’ |
not | not a - Logical not |
ntile | There is no documentation for function ‘ntile’ |
nvl | nvl(value,default_value) - Returns default value if value is null else returns value |
or | a or b - Logical or |
parse_url | parse_url(url, partToExtract[, key]) - extracts a part from a URL |
parse_url_tuple | parse_url_tuple(url, partname1, partname2, …, partnameN) - extracts N (N>=1) parts from a URL. |
percent_rank | There is no documentation for function ‘percent_rank’ |
percentile | percentile(expr, pc) - Returns the percentile(s) of expr at pc (range: [0,1]). pc can be a double or double array |
percentile_approx | percentile_approx(expr, pc, [nb]) - For very large data, computes an approximate percentile value from a histogram, using the optional argument [nb] as the number of histogram bins to use. A higher value of nb results in a more accurate approximation, at the cost of higher memory usage. |
pi | pi() - returns pi |
pmod | a pmod b - Compute the positive modulo |
posexplode | posexplode(a) - behaves like explode for arrays, but includes the position of items in the original array |
positive | positive a - Returns a |
pow | pow(x1, x2) - raise x1 to the power of x2 |
power | power(x1, x2) - raise x1 to the power of x2 |
printf | printf(String format, Obj… args) - function that can format strings according to printf-style format strings |
radians | radians(x) - Converts degrees to radians |
rand | rand([seed]) - Returns a pseudorandom number between 0 and 1 |
rank | There is no documentation for function ‘rank’ |
reflect | reflect(class,method[,arg1[,arg2…]]) calls method with reflection |
reflect2 | reflect2(arg0,method[,arg1[,arg2…]]) calls method of arg0 with reflection |
regexp | str regexp regexp - Returns true if str matches regexp and false otherwise |
regexp_extract | regexp_extract(str, regexp[, idx]) - extracts a group that matches regexp |
regexp_replace | regexp_replace(str, regexp, rep) - replace all substrings of str that match regexp with rep |
repeat | repeat(str, n) - repeat str n times |
reverse | reverse(str) - reverse str |
rlike | str rlike regexp - Returns true if str matches regexp and false otherwise |
round | round(x[, d]) - round x to d decimal places |
row_number | There is no documentation for function ‘row_number’ |
rpad | rpad(str, len, pad) - Returns str, right-padded with pad to a length of len |
rtrim | rtrim(str) - Removes the trailing space characters from str |
second | second(date) - Returns the second of date |
sentences | sentences(str, lang, country) - Splits str into arrays of sentences, where each sentence is an array of words. The ‘lang’ and ‘country’ arguments are optional, and if omitted, the default locale is used. |
sha2 | sha2(string/binary, len) - Calculates the SHA-2 family of hash functions (SHA-224, SHA-256, SHA-384, and SHA-512). |
sign | sign(x) - returns the sign of x |
sin | sin(x) - returns the sine of x (x is in radians) |
size | size(a) - Returns the size of a |
sort_array | sort_array(array(obj1, obj2,…)) - Sorts the input array in ascending order according to the natural ordering of the array elements. |
soundex | soundex(string) - Returns soundex code of the string. |
space | space(n) - returns n spaces |
split | split(str, regex) - Splits str around occurrences that match regex |
sqrt | sqrt(x) - returns the square root of x |
stack | stack(n, cols…) - turns k columns into n rows of size k/n each |
std | std(x) - Returns the standard deviation of a set of numbers |
stddev | stddev(x) - Returns the standard deviation of a set of numbers |
stddev_pop | stddev_pop(x) - Returns the standard deviation of a set of numbers |
stddev_samp | stddev_samp(x) - Returns the sample standard deviation of a set of numbers |
str_to_map | str_to_map(text, delimiter1, delimiter2) - Creates a map by parsing text |
struct | struct(col1, col2, col3, …) - Creates a struct with the given field values |
substr | substr(str, pos[, len]) - returns the substring of str that starts at pos and is of length len, or substr(bin, pos[, len]) - returns the slice of byte array that starts at pos and is of length len |
substring | substring(str, pos[, len]) - returns the substring of str that starts at pos and is of length len, or substring(bin, pos[, len]) - returns the slice of byte array that starts at pos and is of length len |
sum | sum(x) - Returns the sum of a set of numbers |
tan | tan(x) - returns the tangent of x (x is in radians) |
to_date | to_date(expr) - Extracts the date part of the date or datetime expression expr |
to_unix_timestamp | to_unix_timestamp(date[, pattern]) - Returns the UNIX timestamp |
to_utc_timestamp | to_utc_timestamp(timestamp, string timezone) - Assumes given timestamp is in given timezone and converts to UTC (as of Hive 0.8.0) |
translate | translate(input, from, to) - translates the input string by replacing the characters present in the from string with the corresponding characters in the to string |
trim | trim(str) - Removes the leading and trailing space characters from str |
trunc | trunc(date, fmt) - Returns date with the time portion of the day truncated to the unit specified by the format model fmt. If you omit fmt, then date is truncated to the nearest day. It now only supports ‘MONTH’/‘MON’/‘MM’ and ‘YEAR’/‘YYYY’/‘YY’ as format. |
ucase | ucase(str) - Returns str with all characters changed to uppercase |
unbase64 | unbase64(str) - Convert the argument from a base 64 string to binary |
unhex | unhex(str) - Converts hexadecimal argument to binary |
unix_timestamp | unix_timestamp(date[, pattern]) - Converts the time to a number |
upper | upper(str) - Returns str with all characters changed to uppercase |
uuid | uuid() - Returns a universally unique identifier (UUID) string. |
var_pop | var_pop(x) - Returns the variance of a set of numbers |
var_samp | var_samp(x) - Returns the sample variance of a set of numbers |
variance | variance(x) - Returns the variance of a set of numbers |
version | version() - Returns the Hive build version string - includes base version and revision. |
weekofyear | weekofyear(date) - Returns the week of the year of the given date. A week is considered to start on a Monday and week 1 is the first week with >3 days. |
when | CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END - When a = true, returns b; when c = true, return d; else return e |
windowingtablefunction | There is no documentation for function ‘windowingtablefunction’ |
xpath | xpath(xml, xpath) - Returns a string array of values within xml nodes that match the xpath expression |
xpath_boolean | xpath_boolean(xml, xpath) - Evaluates a boolean xpath expression |
xpath_double | xpath_double(xml, xpath) - Returns a double value that matches the xpath expression |
xpath_float | xpath_float(xml, xpath) - Returns a float value that matches the xpath expression |
xpath_int | xpath_int(xml, xpath) - Returns an integer value that matches the xpath expression |
xpath_long | xpath_long(xml, xpath) - Returns a long value that matches the xpath expression |
xpath_number | xpath_number(xml, xpath) - Returns a double value that matches the xpath expression |
xpath_short | xpath_short(xml, xpath) - Returns a short value that matches the xpath expression |
xpath_string | xpath_string(xml, xpath) - Returns the text contents of the first xml node that matches the xpath expression |
year | year(date) - Returns the year of date |
\| | a \| b - Bitwise or |
~ | ~ n - Bitwise not |
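
To show how a few of the entries above might be called, here is a small sketch. It assumes a Hive version that allows SELECT without a FROM clause; the literal values and column aliases are arbitrary and chosen only for illustration, and no expected output is shown.

```sql
-- Null-safe equality: per the table, <=> returns TRUE when both operands are NULL.
SELECT CAST(NULL AS INT) <=> CAST(NULL AS INT) AS both_null,
       1 <=> CAST(NULL AS INT)                 AS one_null;

-- String helpers: join with a separator, left-pad, and search a comma-delimited list.
SELECT concat_ws('-', 'a', 'b', 'c') AS joined,
       lpad('7', 3, '0')             AS padded,
       find_in_set('b', 'a,b,c')     AS list_position;

-- Date helpers: shift a date, count days between two dates, find the end of a month.
SELECT date_add('2015-01-31', 1)              AS next_day_value,
       datediff('2015-03-01', '2015-01-01')   AS days_between,
       last_day('2015-02-10')                 AS month_end;

-- NULL fallbacks: nvl takes one default, coalesce returns the first non-null argument.
SELECT nvl(CAST(NULL AS STRING), 'default')      AS nvl_result,
       coalesce(CAST(NULL AS STRING), NULL, 'x') AS coalesce_result;
```

Running describe function on any of these names prints the matching definition from the table, which is a quick way to confirm argument order before using a function in a real query.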