HDFS quota: switching from physical space to logical space

1. How quotas are currently set and used

1. Main setQuota flow from the client to the NN

Example shell entry points for setQuota:

hdfs dfsadmin -D fs.defaultFS=DClusterNmg4 -setQuota  1819200 hdfs://ns1/user/prod_xxx
hdfs dfsadmin -D fs.defaultFS=DClusterNmg4 -setSpaceQuota  666T hdfs://ns1/user/prod_xxx
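The `666T` argument is a size with a binary suffix; the tests later in this post show `1g` arriving at the NN as 1073741824 bytes. As a rough illustration of that mapping, here is a minimal sketch of a suffix parser. `QuotaSizeParser` and `parse` are hypothetical names for this example, not Hadoop's own parser, and the sketch assumes 1024-based units only.

```java
// Hypothetical helper for illustration only; not part of Hadoop.
final class QuotaSizeParser {
    private QuotaSizeParser() {}

    /** Parses values such as "1g", "666T" or "1819200" into bytes, assuming 1024-based units. */
    static long parse(String text) {
        String s = text.trim().toLowerCase();
        if (s.isEmpty()) {
            throw new IllegalArgumentException("empty size");
        }
        int shift = 0;
        // k, m, g, t, p map to shifts of 10, 20, 30, 40, 50 bits.
        int idx = "kmgtp".indexOf(s.charAt(s.length() - 1));
        if (idx >= 0) {
            shift = 10 * (idx + 1);
            s = s.substring(0, s.length() - 1);
        }
        long value = Long.parseLong(s);
        if (value < 0 || value > (Long.MAX_VALUE >> shift)) {
            throw new IllegalArgumentException("size out of range: " + text);
        }
        return value << shift;
    }

    public static void main(String[] args) {
        System.out.println(parse("1g"));   // 1073741824
        System.out.println(parse("666T")); // 666 * 2^40 bytes
    }
}
```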

On the NN side, the main flow of this shell command from the client to the NN is as follows:
——1.DFSAdmin$SetSpaceQuotaCommand#run

——2.DistributedFileSystem#setQuota

——3.DFSClient#setQuota

——4.FSNamesystem#setQuota

——5.FSDirAttrOp#unprotectedSetQuota

——6.INodeDirectory#setQuota

DirectoryWithQuotaFeature holds two pieces of state, the quota value and the usage:

private QuotaCounts quota;
private QuotaCounts usage;

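When a quota is set, the long value passed in from the client is stored directly in the Feature.

Therefore, when the quota changes from physical to logical space, the setQuota path needs no change.
The quota is persisted to the fsimage, while usage is computed dynamically on every load, so it is the usage computation that has to change.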
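To make the split concrete, the following is a simplified model rather than the Hadoop classes themselves: the quota is just a stored number, while usage is derived from the files under the directory. `QuotaUsageModel`, `FileEntry`, `physicalUsage`, and `logicalUsage` are hypothetical names, and the model assumes replicated (non-EC) files whose physical footprint is length times replication.

```java
// Simplified model for illustration; these are not Hadoop classes.
final class QuotaUsageModel {
    /** Minimal stand-in for a file: logical length in bytes plus replication factor. */
    static final class FileEntry {
        final long length;
        final short replication;

        FileEntry(long length, short replication) {
            this.length = length;
            this.replication = replication;
        }
    }

    /** Physical accounting: every replica counts against the quota. */
    static long physicalUsage(FileEntry[] files) {
        long total = 0;
        for (FileEntry f : files) {
            total += f.length * f.replication;
        }
        return total;
    }

    /** Logical accounting: only the file length counts. */
    static long logicalUsage(FileEntry[] files) {
        long total = 0;
        for (FileEntry f : files) {
            total += f.length;
        }
        return total;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        FileEntry[] files = {
            new FileEntry(100 * mib, (short) 3),
            new FileEntry(100 * mib, (short) 3),
        };
        System.out.println("physical = " + physicalUsage(files) / mib + " MiB"); // 600 MiB
        System.out.println("logical  = " + logicalUsage(files) / mib + " MiB");  // 200 MiB
    }
}
```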

2. Viewing quota with count -q / -u

The client code behind hadoop fs -count -q and hadoop fs -count -u is as follows:

// Count.java
protected void processPath(PathData src) throws IOException {
  StringBuilder outputString = new StringBuilder();
  if (showQuotasAndUsageOnly || showQuotabyType) {
    QuotaUsage usage = src.fs.getQuotaUsage(src.path);
    outputString.append(usage.toString(
        isHumanReadable(), showQuotabyType, storageTypes));
  } else {
    ContentSummary summary = src.fs.getContentSummary(src.path);
    outputString.append(summary.toString(
        showQuotas, isHumanReadable(), excludeSnapshots));
  }
  if (displayECPolicy) {
    ContentSummary summary = src.fs.getContentSummary(src.path);
    if (!summary.getErasureCodingPolicy().equals("Replicated")) {
      outputString.append("EC:");
    }
    outputString.append(summary.getErasureCodingPolicy());
    outputString.append(" ");
  }
  outputString.append(src);
  out.println(outputString.toString());
}

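The main computation maps directly to the NN-side methods of the same name. Two methods are involved:
- src.fs.getQuotaUsage(src.path): used when only the four quota-related columns are shown: QUOTA, REM_QUOTA, SPACE_QUOTA, REM_SPACE_QUOTA (physical space)
- src.fs.getContentSummary(src.path): in addition to the above, also shows DIR_COUNT, FILE_COUNT, CONTENT_SIZE (logical space used)

Note: getQuotaUsage and getContentSummary follow different code paths:
- getQuotaUsage: reads the usage field of DirectoryWithQuotaFeature directly. This is a cached value held in memory; it is computed at NN startup by summing over all subdirectories.
- getContentSummary: recomputed on every call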
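The difference between the two paths can be sketched with a toy directory tree: one method returns a cached counter that is adjusted on every change, the other walks the subtree on each call. `QuotaDir`, `addFile`, `cachedUsage`, `recomputeUsage`, and `rebuildCache` are hypothetical names; the real NN code is considerably more involved.

```java
// Toy model for illustration; not the NameNode implementation.
final class QuotaDir {
    private final QuotaDir parent;
    private final java.util.List<QuotaDir> children = new java.util.ArrayList<>();
    private long ownBytes;    // bytes of files directly inside this directory
    private long cachedUsage; // running total for the whole subtree

    QuotaDir(QuotaDir parent) {
        this.parent = parent;
        if (parent != null) {
            parent.children.add(this);
        }
    }

    /** Adds a file and propagates the delta to the cached counter of every ancestor. */
    void addFile(long bytes) {
        ownBytes += bytes;
        for (QuotaDir d = this; d != null; d = d.parent) {
            d.cachedUsage += bytes;
        }
    }

    /** getQuotaUsage-style read: returns the cached value without touching the tree. */
    long cachedUsage() {
        return cachedUsage;
    }

    /** getContentSummary-style read: walks the subtree on every call. */
    long recomputeUsage() {
        long total = ownBytes;
        for (QuotaDir c : children) {
            total += c.recomputeUsage();
        }
        return total;
    }

    /** What a restart does in this model: rebuild every cached counter from the tree. */
    long rebuildCache() {
        long total = ownBytes;
        for (QuotaDir c : children) {
            total += c.rebuildCache();
        }
        cachedUsage = total;
        return total;
    }
}
```

In this model the two reads agree only as long as the incremental updates and the rebuild use the same accounting rule, which is why a restart is a useful test case.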

 

2. How quota is enforced in Hadoop

When a quota limit is exceeded, the NameNode returns a DSQuotaExceededException, shown below:

// DSQuotaExceededException
public String getMessage() {
  String msg = super.getMessage();
  if (msg == null) {
    return "The DiskSpace quota" + (pathName == null ? "" : " of " + pathName)
        + " is exceeded: quota = " + quota
        + " B = " + long2String(quota, "B", 2)
        + " but diskspace consumed = " + count
        + " B = " + long2String(count, "B", 2);
  } else {
    return msg;
  }
}

Searching for every caller of this exception gives the following:
——1.DirectoryWithQuotaFeature

  • DirectoryWithQuotaFeature#verifyNamespaceQuota
  • DirectoryWithQuotaFeature#verifyStoragespaceQuota

——2.DFSOutputStream

  • DFSOutputStream#addBlock
    • dfsClient.namenode.addBlock
  • DFSOutputStream#newStreamForCreate
    • dfsClient.namenode.create

——3.DFSClient
DFSClient calls the corresponding NameNode methods directly, as follows:

  • DFSClient#createSymlink
    • namenode.createSymlink
  • DFSClient#callAppend
    • DFSOutputStream.newStreamForAppend
  • DFSClient#setReplication
    • namenode.setReplication
  • DFSClient#rename
    • namenode.rename
    • namenode.rename2
  • DFSClient#primitiveMkdir
    • namenode.mkdirs
  • DFSClient#setQuota
    • namenode.setQuota

So the NN performs quota checks for the following operations:

  • create
  • append
  • setReplication
  • rename
  • mkdirs
  • setQuota

The checks are implemented by:

  • DirectoryWithQuotaFeature#verifyNamespaceQuota
  • DirectoryWithQuotaFeature#verifyStoragespaceQuota

static boolean isViolated(final long quota, final long usage,
    final long delta) {
  return quota >= 0 && delta > 0 && usage > quota - delta;
}
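Plugging in the numbers from the tests later in this post shows how the predicate behaves. The method body is the one quoted above; the wrapper class `QuotaCheckDemo` and the sample values are only for illustration.

```java
// Wrapper class is hypothetical; isViolated is copied from the snippet above.
final class QuotaCheckDemo {
    static boolean isViolated(final long quota, final long usage, final long delta) {
        return quota >= 0 && delta > 0 && usage > quota - delta;
    }

    public static void main(String[] args) {
        final long mib = 1024L * 1024L;
        final long quota = 1024 * mib; // 1 GiB space quota

        // 900 MiB used, 128 MiB requested: 900 > 1024 - 128, so the quota is violated.
        System.out.println(isViolated(quota, 900 * mib, 128 * mib));   // true
        // 900 MiB used, 100 MiB requested: 900 <= 1024 - 100, so it fits.
        System.out.println(isViolated(quota, 900 * mib, 100 * mib));   // false
        // A non-positive delta (for example a delete) never violates the quota.
        System.out.println(isViolated(quota, 1000 * mib, -100 * mib)); // false
        // A negative quota is treated by this predicate as "no limit".
        System.out.println(isViolated(-1, 900 * mib, 128 * mib));      // false
    }
}
```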

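3. Switching SpaceQuota to logical space

1. Changes

There are two main changes:

  1. Operations such as create/mv/setrep check the storage increment (delta). This check is changed from physical space to logical space, and the logic that updates the quota usage changes with it.

  2. The initialization of the usage variable in DirectoryWithQuotaFeature is changed from physical space to logical space.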
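As a rough sketch of the first change, the delta charged for each operation could be modelled as below. The class `SpaceDelta` and its methods are hypothetical and cover replicated files only. The physical formulas assume the footprint is length times replication; the logical formulas follow the behaviour observed in the tests below (a new block reserves one full block size, and setrep is charged nothing).

```java
// Hypothetical model of the per-operation delta; not the actual patch.
final class SpaceDelta {
    private SpaceDelta() {}

    /** Delta when a new block is allocated during a write. */
    static long forNewBlock(long blockSize, short replication, boolean logical) {
        return logical ? blockSize : blockSize * replication;
    }

    /** Delta when an existing file is moved under a directory with a quota. */
    static long forRename(long fileLength, short replication, boolean logical) {
        return logical ? fileLength : fileLength * replication;
    }

    /** Delta when the replication factor of a file changes. */
    static long forSetReplication(long fileLength, short oldRep, short newRep, boolean logical) {
        return logical ? 0L : fileLength * (newRep - oldRep);
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        System.out.println(forNewBlock(128 * mib, (short) 3, false) / mib);                   // 384
        System.out.println(forNewBlock(128 * mib, (short) 3, true) / mib);                    // 128
        System.out.println(forSetReplication(100 * mib, (short) 3, (short) 10, false) / mib); // 700
        System.out.println(forSetReplication(100 * mib, (short) 3, (short) 10, true) / mib);  // 0
    }
}
```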

2. Testing

The following tests cover SpaceQuota after the switch to logical space.

# Create a directory and set a quota

[hadoop@cluster-host1 quota]$ hadoop fs -mkdir /test_quota/quota_1g

[hadoop@cluster-host1 quota]$ hadoop dfsadmin -setSpaceQuota 1g /test_quota/quota_1g

[hadoop@cluster-host1 quota]$ hadoop fs -count -q -v -h /test_quota/quota_1g

       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME

        none             inf             1 G             1 G            1            0                  0 /test_quota/quota_1g

# Create a 100 M file

[hadoop@cluster-host1 quota]$ dd if=/dev/zero of=100m bs=1M count=100

# Upload the file

[hadoop@cluster-host1 quota]$ hadoop fs -put 100m /test_quota/quota_1g/100m_1

# View the quota with both -q and -u

[hadoop@cluster-host1 quota]$ hadoop fs -count -q -v -h /test_quota/quota_1g

       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME

        none             inf             1 G           924 M            1            1              100 M /test_quota/quota_1g

[hadoop@cluster-host1 quota]$ hadoop fs -count -u -v -h /test_quota/quota_1g

       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA PATHNAME

        none             inf             1 G           924 M /test_quota/quota_1g

# Upload a second file

[hadoop@cluster-host1 quota]$ hadoop fs -put 100m /test_quota/quota_1g/100m_2

[hadoop@cluster-host1 quota]$ hadoop fs -count -q -v -h /test_quota/quota_1g

       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME

        none             inf             1 G           824 M            1            2              200 M /test_quota/quota_1g

[hadoop@cluster-host1 quota]$ hadoop fs -count -u -v -h /test_quota/quota_1g

       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA PATHNAME

        none             inf             1 G           824 M /test_quota/quota_1g

# Upload the files in between (commands omitted)

[hadoop@cluster-host1 quota]$ hadoop fs -put 100m /quota_1g/100m_8

[hadoop@cluster-host1 quota]$ hadoop fs -put 100m /quota_1g/100m_9

 

put

# Uploading the 10th file exceeds the quota

[hadoop@cluster-host1 quota]$ hadoop fs -put 100m /test_quota/quota_1g/100m_10

put: The DiskSpace quota of /test_quota/quota_1g is exceeded: quota = 1073741824 B = 1 GB but diskspace consumed = 1077936128 B = 1.00 GB

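# The quota still shows 124 M of logical space remaining, yet uploading a 100 M file exceeds it. The reason is that a write must reserve a full block (128 M in the test environment).

# 1077936128/1024/1024 = 1028 M, that is 900 M already used plus one 128 M block

# View the quota at this point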

[hadoop@cluster-host1 quota]$ hadoop fs -count -q -v -h /test_quota/quota_1g

       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME

        none             inf             1 G           124 M            1            9              900 M /test_quota/quota_1g

[hadoop@cluster-host1 quota]$ hadoop fs -count -u -v -h /test_quota/quota_1g

       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA PATHNAME

        none             inf             1 G           124 M /test_quota/quota_1g
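The put results above are consistent with a reserve-then-settle model: each new block first charges a full block size against the quota, and once the file is closed the charge shrinks to the actual length. The sketch below replays the numbers from this test under that assumption. `QuotaDirSim`, `reserveBlock`, and `settle` are hypothetical names, and the 128 MiB block size is the value of the test environment.

```java
// Hypothetical simulation of the observed behaviour; not NameNode code.
final class QuotaDirSim {
    static final long MIB = 1024L * 1024L;

    private final long quota;
    private long usage;

    QuotaDirSim(long quota, long usage) {
        this.quota = quota;
        this.usage = usage;
    }

    /** Charges one full block up front; returns false and charges nothing if it does not fit. */
    boolean reserveBlock(long blockSize) {
        if (usage > quota - blockSize) {
            return false;
        }
        usage += blockSize;
        return true;
    }

    /** After the block is finalized, refunds the unused part of the reservation. */
    void settle(long blockSize, long actualLength) {
        usage -= (blockSize - actualLength);
    }

    long usage() {
        return usage;
    }

    long remaining() {
        return quota - usage;
    }

    public static void main(String[] args) {
        QuotaDirSim dir = new QuotaDirSim(1024 * MIB, 800 * MIB);

        // 9th file: 800 + 128 <= 1024, so the block is granted and later settled to 100 MiB.
        System.out.println(dir.reserveBlock(128 * MIB)); // true
        dir.settle(128 * MIB, 100 * MIB);
        System.out.println(dir.remaining() / MIB);       // 124

        // 10th file: 900 + 128 = 1028 > 1024, rejected although 124 MiB remain.
        System.out.println(dir.reserveBlock(128 * MIB)); // false
        System.out.println(dir.usage() + 128 * MIB);     // 1077936128, the "consumed" value in the error
    }
}
```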

mv

# The first mv succeeds: only the file size is checked, not the block size

[hadoop@cluster-host1 quota]$ hadoop fs -mv /test/100m /test_quota/quota_1g/100m_10

[hadoop@cluster-host1 quota]$ hadoop fs -count -u -v -h /test_quota/quota_1g

       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA PATHNAME

        none             inf             1 G            24 M /test_quota/quota_1g

[hadoop@cluster-host1 quota]$ hadoop fs -count -q -v -h /test_quota/quota_1g

       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME

        none             inf             1 G            24 M            1           10             1000 M /test_quota/quota_1g

# The second mv fails: it needs 1100 M, but the quota is only 1024 M

[hadoop@cluster-host1 quota]$ hadoop fs -mv /test/100m /test_quota/quota_1g/100m_11

mv: The DiskSpace quota of /test_quota/quota_1g is exceeded: quota = 1073741824 B = 1 GB but diskspace consumed = 1153433600 B = 1.07 GB

# 1153433600 B = 1100 M

setrep

[hadoop@cluster-host1 quota]$ hadoop fs -setrep 10 /test_quota/quota_1g

Replication 10 set: /test_quota/quota_1g/100m_1

Replication 10 set: /test_quota/quota_1g/100m_10

Replication 10 set: /test_quota/quota_1g/100m_2

Replication 10 set: /test_quota/quota_1g/100m_3

Replication 10 set: /test_quota/quota_1g/100m_4

Replication 10 set: /test_quota/quota_1g/100m_5

Replication 10 set: /test_quota/quota_1g/100m_6

Replication 10 set: /test_quota/quota_1g/100m_7

Replication 10 set: /test_quota/quota_1g/100m_8

Replication 10 set: /test_quota/quota_1g/100m_9

[hadoop@cluster-host1 quota]$ hadoop fs -count -q -v -h /test_quota/quota_1g

       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME

        none             inf             1 G            24 M            1           10             1000 M /test_quota/quota_1g

[hadoop@cluster-host1 quota]$ hadoop fs -count -u -v -h /test_quota/quota_1g

       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA PATHNAME

        none             inf             1 G            24 M /test_quota/quota_1g

# Raising the replication factor is not limited by the quota, as expected

du

[hadoop@cluster-host1 quota]$ hadoop fs -du -s -h /test_quota/quota_1g

1000 M  9.8 G  /test_quota/quota_1g
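The two du columns are the logical size and the space consumed across all replicas. A quick check of the arithmetic, assuming ten 100 MiB files each accounted at replication 10 (a small sketch; `DuCheck` and `summary` are hypothetical names):

```java
// Hypothetical helper that reproduces the two du columns from first principles.
final class DuCheck {
    private DuCheck() {}

    /** Formats "<logical> M  <physical> G" for a set of equally sized, equally replicated files. */
    static String summary(int fileCount, long fileBytes, int replication) {
        final long mib = 1024L * 1024L;
        long logical = fileCount * fileBytes;
        long physical = logical * replication;
        return String.format(java.util.Locale.ROOT, "%d M  %.1f G",
            logical / mib, physical / (1024.0 * mib));
    }

    public static void main(String[] args) {
        // 10 files * 100 MiB = 1000 MiB logical; times 10 replicas = 10000 MiB, about 9.8 GiB.
        System.out.println(summary(10, 100L * 1024 * 1024, 10)); // 1000 M  9.8 G
    }
}
```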

rm

[hadoop@cluster-host1 quota]$ hadoop fs -rm /test_quota/quota_1g/100m_10

Deleted /test_quota/quota_1g/100m_10

[hadoop@cluster-host1 quota]$ hadoop fs -count -q -v -h /test_quota/quota_1g

       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME

        none             inf             1 G           124 M            1            9              900 M /test_quota/quota_1g

[hadoop@cluster-host1 quota]$ hadoop fs -count -u -v -h /test_quota/quota_1g

       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA PATHNAME

        none             inf             1 G           124 M /test_quota/quota_1g

[hadoop@cluster-host1 quota]$ hadoop fs -rm /test_quota/quota_1g/100m_9

Deleted /test_quota/quota_1g/100m_9

[hadoop@cluster-host1 quota]$ hadoop fs -count -q -v -h /test_quota/quota_1g

       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME

        none             inf             1 G           224 M            1            8              800 M /test_quota/quota_1g

[hadoop@cluster-host1 quota]$ hadoop fs -count -u -v -h /test_quota/quota_1g

       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA PATHNAME

        none             inf             1 G           224 M /test_quota/quota_1g

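Note: after the NN restarts, the usage held in the quota feature is recomputed. Testing of the previous version showed that, after an NN restart, the remaining space reported by hadoop fs -count -u was wrong (it had been computed from physical space). This part therefore has to be tested.

Check after the restart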

[hadoop@cluster-host1 quota]$ hadoop fs -count -q -v -h /test_quota/quota_1g

2020-05-20 17:19:22,567 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME

        none             inf             1 G           224 M            1            8              800 M /test_quota/quota_1g

[hadoop@cluster-host1 quota]$ hadoop fs -count -u -v -h /test_quota/quota_1g

2020-05-20 17:19:32,411 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA PATHNAME

        none             inf             1 G           224 M /test_quota/quota_1g

# As expected

cp

[hadoop@cluster-host1 quota]$ hadoop fs -cp /test/100m /test_quota/quota_1g/100m_9

[hadoop@cluster-host1 quota]$ hadoop fs -count -q -v -h /test_quota/quota_1g

2020-05-20 17:22:10,908 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME

        none             inf             1 G           124 M            1            9              900 M /test_quota/quota_1g

[hadoop@cluster-host1 quota]$ hadoop fs -count -u -v -h /test_quota/quota_1g

       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA PATHNAME

        none             inf             1 G           124 M /test_quota/quota_1g

[hadoop@cluster-host1 quota]$ hadoop fs -cp /test/100m /test_quota/quota_1g/100m_10

cp: The DiskSpace quota of /test_quota/quota_1g is exceeded: quota = 1073741824 B = 1 GB but diskspace consumed = 1077936128 B = 1.00 GB

Subdirectory test

[hadoop@cluster-host1 quota]$ hadoop fs -count -q -v -h /test_quota/quota_1g

       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME

        none             inf             1 G           224 M            1            8              800 M /test_quota/quota_1g

[hadoop@cluster-host1 quota]$ hadoop fs -put 100m /test_quota/quota_1g/a/100m_1

[hadoop@cluster-host1 quota]$ hadoop fs -count -q -v -h /test_quota/quota_1g

       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME

        none             inf             1 G           124 M            2            9              900 M /test_quota/quota_1g

[hadoop@cluster-host1 quota]$ hadoop fs -put 100m /test_quota/quota_1g/a/100m_2

put: The DiskSpace quota of /test_quota/quota_1g is exceeded: quota = 1073741824 B = 1 GB but diskspace consumed = 1077936128 B = 1.00 GB

EC test

[hadoop@cluster-host1 quota]$ hadoop dfsadmin -setSpaceQuota 1g /test_quota/quota_1g_2

# Set up the EC directory

[hadoop@cluster-host1 quota]$ hdfs ec -setPolicy -path /test_quota/quota_1g_2/ec -policy RS-3-2-1024k

Set RS-3-2-1024k erasure coding policy on /test_quota/quota_1g_2/ec

[hadoop@cluster-host1 quota]$ hadoop fs -count -q -v -h /test_quota/quota_1g_2

       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME

        none             inf             1 G             1 G            2            0                  0 /test_quota/quota_1g_2

# Write an EC file

[hadoop@cluster-host1 quota]$ hadoop fs -put 200m /test_quota/quota_1g_2/ec/200m_1

[hadoop@cluster-host1 quota]$ hadoop fs -count -u -v -h /test_quota/quota_1g_2

       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA PATHNAME

        none             inf             1 G           824 M /test_quota/quota_1g_2

[hadoop@cluster-host1 quota]$ hadoop fs -count -q -v -h /test_quota/quota_1g_2

       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME

        none             inf             1 G           824 M            2            1              200 M /test_quota/quota_1g_2

# Write a second EC file

[hadoop@cluster-host1 quota]$ hadoop fs -put 200m /test_quota/quota_1g_2/ec/200m_2

[hadoop@cluster-host1 quota]$ hadoop fs -count -q -v -h /test_quota/quota_1g_2

       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME

        none             inf             1 G           624 M            2            2              400 M /test_quota/quota_1g_2

[hadoop@cluster-host1 quota]$ hadoop fs -count -u -v -h /test_quota/quota_1g_2

       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA PATHNAME

        none             inf             1 G           624 M /test_quota/quota_1g_2
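For comparison, under an RS-3-2 policy every 3 data cells carry 2 parity cells, so the raw footprint is roughly 5/3 of the logical size, while the test above shows the quota being charged only the logical 200 M per file. A minimal sketch of that arithmetic, ignoring cell and stripe rounding; `EcFootprint` and `approxPhysical` are hypothetical names:

```java
// Hypothetical arithmetic helper; ignores partial-stripe rounding.
final class EcFootprint {
    private EcFootprint() {}

    /** Approximate raw bytes for a file striped as RS(dataUnits, parityUnits). */
    static long approxPhysical(long logicalBytes, int dataUnits, int parityUnits) {
        return logicalBytes * (dataUnits + parityUnits) / dataUnits;
    }

    public static void main(String[] args) {
        final long mib = 1024L * 1024L;
        long logical = 200 * mib;

        // Raw space for one 200 MiB file under RS-3-2.
        System.out.println(approxPhysical(logical, 3, 2) / mib); // 333
        // Remaining logical quota after two such files under a 1 GiB quota.
        System.out.println((1024 * mib - 2 * logical) / mib);    // 624
    }
}
```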

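Further testing

Replicated files and EC files still need to be tested for sizes smaller than, equal to, and larger than one block (or block group).

4. Possible issues

1. The fields in the fsimage need no change.

2. All existing quotas must be located and, after the upgrade, rewritten as logical-space values.

3. The ratio between namequota and spacequota.

4. Quotas can also be enforced per storage type for finer-grained limits; the internal version does not take this into account.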

 
