Finding a Hive Table's Storage Location and Checking Its File Size and Partition Files

(Author: 陳玓玏)

Sometimes we need to check the size of the files backing a Hive table. This takes two steps:

  1. Find where the Hive table is stored in HDFS;
  2. Check the size of the table's files.

1. Find where the Hive table is stored in HDFS
Use show create table tableName to look it up:

0: jdbc:hive2://nfjd-hadoop02-node46.jpushoa.> show create table tmp.cdl_push_r;
INFO  : Compiling command(queryId=hive_20191108203838_0355393e-9a92-44d3-a0c4-bf52cabcfa4b): show create table tmp.cdl_push_r
INFO  : UserName: chendl
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:createtab_stmt, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=hive_20191108203838_0355393e-9a92-44d3-a0c4-bf52cabcfa4b); Time taken: 0.398 seconds
INFO  : Executing command(queryId=hive_20191108203838_0355393e-9a92-44d3-a0c4-bf52cabcfa4b): show create table tmp.cdl_push_r
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hive_20191108203838_0355393e-9a92-44d3-a0c4-bf52cabcfa4b); Time taken: 0.042 seconds
INFO  : OK
CREATE TABLE `tmp.cdl_push_r`(
  `imei` string, 
  `recall_date` bigint, 
  `feature` string, 
  `value` bigint)
PARTITIONED BY ( 
  `customer_name` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://nameservice1/user/hive/warehouse/tmp.db/cdl_push_r'
TBLPROPERTIES (
  'spark.sql.create.version'='2.4.3', 
  'spark.sql.sources.schema.numPartCols'='1', 
  'spark.sql.sources.schema.numParts'='1', 
  'spark.sql.sources.schema.part.0'='{\"type\":\"struct\",\"fields\":[{\"name\":\"im\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"recall_date\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}},{\"name\":\"feature\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"value\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}},{\"name\":\"customer_name\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}}]}', 
  'spark.sql.sources.schema.partCol.0'='customer_name', 
  'transient_lastDdlTime'='1570433559')
22 rows selected (0.514 seconds)

The LOCATION in the output, 'hdfs://nameservice1/user/hive/warehouse/tmp.db/cdl_push_r', is where the Hive table is stored.
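If you redirect the show create table output to a file, the path can also be extracted automatically instead of copied by hand. A minimal sketch, assuming the two-line LOCATION layout shown above (the file name ddl.txt and the awk pattern are illustrative, not part of the original article):

```shell
# Write the two relevant lines from the DDL output into a sample file
# (in practice: beeline/hive output redirected to ddl.txt).
printf "LOCATION\n  'hdfs://nameservice1/user/hive/warehouse/tmp.db/cdl_push_r'\n" > ddl.txt

# The path sits on the line after "LOCATION"; strip the spaces and quotes.
location=$(awk "/^LOCATION\$/{getline; gsub(/[ ']/,\"\"); print}" ddl.txt)
echo "$location"
```

The extracted path can then be passed straight to hadoop fs -du -s -h "$location" in the next step.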
2. Check the size of the Hive table's files
Using the storage location found above, check the table's size with an hdfs command; the last argument is just the LOCATION copied from the output:

[chendl@cdl]$ hadoop fs -du -s -h hdfs://nameservice1/user/hive/warehouse/tmp.db/cdl_push_r
222.5 M  445.0 M  hdfs://nameservice1/user/hive/warehouse/tmp.db/cdl_push_r
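The two numbers mean different things: the first (222.5 M) is the raw size of the data, and the second (445.0 M) is the total disk space consumed across all HDFS replicas; here the ratio of 2 reflects the replication factor. A small sketch parsing the line above (the sample line is copied from this output; the field positions assume the -h format shown):

```shell
# Parse the `hadoop fs -du -s -h` line shown above: field 1 is the raw
# size, field 3 is the size including replication (fields 2 and 4 are units).
line='222.5 M  445.0 M  hdfs://nameservice1/user/hive/warehouse/tmp.db/cdl_push_r'
raw=$(echo "$line" | awk '{print $1, $2}')
with_replicas=$(echo "$line" | awk '{print $3, $4}')
echo "raw=$raw, with replication=$with_replicas"
```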

You can also check how many partitions the table has:

[chendl@cdl]$ hadoop fs -ls hdfs://nameservice1/user/hive/warehouse/tmp.db/cdl_push_r
Found 1 items
drwxrwx---+  - chendl hive          0 2019-10-10 03:04 hdfs://nameservice1/user/hive/warehouse/tmp.db/cdl_push_r/customer_name=test190924
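Each partition appears as a directory named partition_column=value, so the partition count is simply the number of such directories. A sketch counting them from the listing above (the sample listing is the one shown here; against a live cluster you would pipe hadoop fs -ls directly instead):

```shell
# Count partition directories in an `hadoop fs -ls` listing: each
# partition of this table is a directory named customer_name=<value>.
ls_output='Found 1 items
drwxrwx---+  - chendl hive          0 2019-10-10 03:04 hdfs://nameservice1/user/hive/warehouse/tmp.db/cdl_push_r/customer_name=test190924'
partitions=$(printf '%s\n' "$ls_output" | grep -c 'customer_name=')
echo "partitions: $partitions"
```

On the Hive side, show partitions tmp.cdl_push_r returns the same list without touching HDFS directly.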

Reference: https://blog.csdn.net/lilychen1983/article/details/80912876
