Finding a Hive Table's Storage Location and Checking Its File Size and Partition Files

(Author: 陳玓玏)

Sometimes we need to check the size of the files backing a Hive table. This takes two steps:

  1. Find where the Hive table is stored in HDFS;
  2. Check the size of the table's files.

1. Find where the Hive table is stored in HDFS
Use show create table tableName to look it up:

0: jdbc:hive2://nfjd-hadoop02-node46.jpushoa.> show create table tmp.cdl_push_r;
INFO  : Compiling command(queryId=hive_20191108203838_0355393e-9a92-44d3-a0c4-bf52cabcfa4b): show create table tmp.cdl_push_r
INFO  : UserName: chendl
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:createtab_stmt, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=hive_20191108203838_0355393e-9a92-44d3-a0c4-bf52cabcfa4b); Time taken: 0.398 seconds
INFO  : Executing command(queryId=hive_20191108203838_0355393e-9a92-44d3-a0c4-bf52cabcfa4b): show create table tmp.cdl_push_r
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hive_20191108203838_0355393e-9a92-44d3-a0c4-bf52cabcfa4b); Time taken: 0.042 seconds
INFO  : OK
CREATE TABLE `tmp.cdl_push_r`(
  `imei` string, 
  `recall_date` bigint, 
  `feature` string, 
  `value` bigint)
PARTITIONED BY ( 
  `customer_name` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://nameservice1/user/hive/warehouse/tmp.db/cdl_push_r'
TBLPROPERTIES (
  'spark.sql.create.version'='2.4.3', 
  'spark.sql.sources.schema.numPartCols'='1', 
  'spark.sql.sources.schema.numParts'='1', 
  'spark.sql.sources.schema.part.0'='{\"type\":\"struct\",\"fields\":[{\"name\":\"im\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"recall_date\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}},{\"name\":\"feature\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"value\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}},{\"name\":\"customer_name\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}}]}', 
  'spark.sql.sources.schema.partCol.0'='customer_name', 
  'transient_lastDdlTime'='1570433559')
22 rows selected (0.514 seconds)

The LOCATION in the output, 'hdfs://nameservice1/user/hive/warehouse/tmp.db/cdl_push_r', is where the Hive table is stored.
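If you redirect the show create table output to a file, the path can also be extracted automatically instead of copied by hand. A minimal sketch, assuming the two-line LOCATION layout shown above (the file name ddl.txt and the awk pattern are illustrative, not part of the original article):

```shell
# Write the two relevant lines from the DDL output into a sample file
# (in practice: beeline/hive output redirected to ddl.txt).
printf "LOCATION\n  'hdfs://nameservice1/user/hive/warehouse/tmp.db/cdl_push_r'\n" > ddl.txt

# The path sits on the line after "LOCATION"; strip the spaces and quotes.
location=$(awk "/^LOCATION\$/{getline; gsub(/[ ']/,\"\"); print}" ddl.txt)
echo "$location"
```

The extracted path can then be passed straight to hadoop fs -du -s -h "$location" in the next step.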
2. Check the size of the Hive table's files
Using the storage location found above, check the table's size with an hdfs command; the last argument is just the LOCATION copied from the output:

[chendl@cdl]$ hadoop fs -du -s -h hdfs://nameservice1/user/hive/warehouse/tmp.db/cdl_push_r
222.5 M  445.0 M  hdfs://nameservice1/user/hive/warehouse/tmp.db/cdl_push_r
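The two numbers mean different things: the first (222.5 M) is the raw size of the data, and the second (445.0 M) is the total disk space consumed across all HDFS replicas; here the ratio of 2 reflects the replication factor. A small sketch parsing the line above (the sample line is copied from this output; the field positions assume the -h format shown):

```shell
# Parse the `hadoop fs -du -s -h` line shown above: field 1 is the raw
# size, field 3 is the size including replication (fields 2 and 4 are units).
line='222.5 M  445.0 M  hdfs://nameservice1/user/hive/warehouse/tmp.db/cdl_push_r'
raw=$(echo "$line" | awk '{print $1, $2}')
with_replicas=$(echo "$line" | awk '{print $3, $4}')
echo "raw=$raw, with replication=$with_replicas"
```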

You can also check how many partitions the table has:

[chendl@cdl]$ hadoop fs -ls hdfs://nameservice1/user/hive/warehouse/tmp.db/cdl_push_r
Found 1 items
drwxrwx---+  - chendl hive          0 2019-10-10 03:04 hdfs://nameservice1/user/hive/warehouse/tmp.db/cdl_push_r/customer_name=test190924
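Each partition appears as a directory named partition_column=value, so the partition count is simply the number of such directories. A sketch counting them from the listing above (the sample listing is the one shown here; against a live cluster you would pipe hadoop fs -ls directly instead):

```shell
# Count partition directories in an `hadoop fs -ls` listing: each
# partition of this table is a directory named customer_name=<value>.
ls_output='Found 1 items
drwxrwx---+  - chendl hive          0 2019-10-10 03:04 hdfs://nameservice1/user/hive/warehouse/tmp.db/cdl_push_r/customer_name=test190924'
partitions=$(printf '%s\n' "$ls_output" | grep -c 'customer_name=')
echo "partitions: $partitions"
```

On the Hive side, show partitions tmp.cdl_push_r returns the same list without touching HDFS directly.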

Reference: https://blog.csdn.net/lilychen1983/article/details/80912876
