FOCUS: Shedding Light on the High Search Response Time in the Wild (study notes)

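Problem: in operations, once high search response time (HSRT) has been observed, use machine learning algorithms to pin down the anomaly behind it.

Study notes on the first AIOps paper I read.

The paper analyzes the factors that make search requests in a web application slow, i.e. that cause high search response time.

It then uses a decision tree algorithm to find the conditions that lead to HSRT and to analyze the types of those conditions.

By analyzing how much weight each influencing factor carries for the KPI (key performance indicator), optimization can be targeted.

1. Part 1 raises three questions:

① Which attributes in the log data carry more weight in causing HSRT?

② Which condition types are more prevalent, i.e. hotter?

③ How do attributes affect the condition types?

Process: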

a decision tree based classifier to identify HSRT conditions in search logs of each day; a clustering based condition type miner to combine similar HSRT conditions into one type, and find the prevalent condition types across days; and an attribute effect estimator to analyze the effect of each individual attribute on SRT within a prevalent condition type.

The initial analysis showed that optimizing image transmission would help SRT; after images were switched to base64 encoding, SRT improved.

2. Part 2:

① What the logs contain

The logs are a 1% sample from Baidu. Each log entry consists of: 1) SRT and SRT components; 2) several attributes believed to affect SRT.

1)SRT and SRT components

SRT can be broken down into four main components: Tserver, the server response time of the HTML file, which is recorded by servers; Tnet = t2 − t1 − Tserver , the network transmission time of the HTML file; Tbrowser = t3 − t2, the browser parsing time of the HTML; Tother = t4 − t3, the remaining time spent before the page is rendered, e.g., download time of images from image servers.
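The decomposition above can be checked with a few lines of Python. This is only a sketch: the function name `srt_components` and the timestamp values are made up for illustration, and the notes do not say exactly which events t1 to t4 mark. Since the four components sum to t4 − t1, that difference is taken here to be the total SRT.

```python
def srt_components(t1, t2, t3, t4, t_server):
    """Split SRT into the four components using the formulas quoted above.

    All arguments are in seconds. t_server is the server-recorded response time.
    """
    return {
        "Tserver": t_server,
        "Tnet": t2 - t1 - t_server,   # network transmission time of the HTML
        "Tbrowser": t3 - t2,          # browser parsing time of the HTML
        "Tother": t4 - t3,            # remaining time before the page is rendered
    }


# Made-up timestamps, chosen only to make the arithmetic easy to follow.
parts = srt_components(t1=0.0, t2=0.5, t3=0.75, t4=1.25, t_server=0.25)
total_srt = sum(parts.values())  # equals t4 - t1 by construction
```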

2) query attributes

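browser engine; ISP; location (province, etc.); #images (number of images embedded in the HTML); ads (whether the page carries ad links); loading mode (synchronous or asynchronous); background PVs (reflects the request load: the number of requests within 30 seconds after the query comes in, normalized by the maximum value)

CDF: cumulative distribution function, which describes a probability distribution.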
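As a reminder of what a CDF means for SRT data, here is a minimal sketch of an empirical CDF. The helper name `empirical_cdf` and the sample values are invented for illustration and do not come from the paper.

```python
from bisect import bisect_right


def empirical_cdf(samples):
    """Return a function F where F(x) is the fraction of samples <= x."""
    ordered = sorted(samples)
    n = len(ordered)

    def cdf(x):
        # bisect_right counts how many sorted samples are <= x.
        return bisect_right(ordered, x) / n

    return cdf


# Made-up SRT samples in seconds.
srt_seconds = [0.3, 0.4, 0.6, 0.9, 1.2, 1.5, 0.7, 0.5]
cdf = empirical_cdf(srt_seconds)
share_within_1s = cdf(1.0)  # fraction of queries answered within 1 s
```

One minus `cdf(1.0)` is then the share of queries slower than 1 s.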

② HSRT and HSRT conditions

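HSRT is defined as an SRT of more than 1 s.

The paper plots the HSRT fraction for each value of every attribute (some x-axes are normalized); this is the one-dimensional analysis of Fig. 3.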
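The per-attribute statistic is simple to compute. The sketch below assumes each log record is a dict with an `srt` field in seconds; the field names, the helper `hsrt_fraction_by_value`, and the records themselves are hypothetical.

```python
from collections import defaultdict

HSRT_THRESHOLD_S = 1.0  # HSRT means SRT above 1 s, as defined in the notes


def hsrt_fraction_by_value(logs, attribute):
    """For one attribute, return {value: fraction of records with HSRT}."""
    totals = defaultdict(int)
    slow = defaultdict(int)
    for rec in logs:
        value = rec[attribute]
        totals[value] += 1
        if rec["srt"] > HSRT_THRESHOLD_S:
            slow[value] += 1
    return {value: slow[value] / totals[value] for value in totals}


# Made-up records for illustration only.
logs = [
    {"srt": 0.6, "ads": "no", "isp": "A"},
    {"srt": 1.4, "ads": "yes", "isp": "A"},
    {"srt": 1.1, "ads": "yes", "isp": "B"},
    {"srt": 0.8, "ads": "yes", "isp": "B"},
    {"srt": 0.5, "ads": "no", "isp": "B"},
]
by_ads = hsrt_fraction_by_value(logs, "ads")
```

This looks at one attribute at a time, which is exactly the limitation the first challenge in the list of challenges points out.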

* Challenges of identifying HSRT Conditions:

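A condition may be a combination of several attributes, and the relationships among attributes are fairly complex.

Three challenges:

1. The earlier one-dimensional analysis of the HSRT fraction (Fig. 3) is not sufficient, because a condition may combine several attributes.

2. Attributes depend on each other: Fig. 4(a).

3. Overlap in attributes between conditions has to be avoided, e.g. {#images > 30}, {ads = yes}, and {#images > 20, ads = yes}.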

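3. Part 3: Design

① Core Idea and System Overview

This part addresses the three challenges above.

To find the conditions of interest, the problem is recast as binary classification in a multidimensional space, with each attribute as one dimension.

A decision tree is the algorithm chosen.

The FOCUS system works as follows:

1. First, a decision tree is run on each day's data to find the HSRT conditions.

2. A clustering algorithm, the condition type miner, groups similar HSRT conditions into condition types.

3. The attribute effect estimator analyzes how an attribute affects SRT and the SRT components within a condition type.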

② Decision Tree Based Classifier

The decision tree is used to obtain all HSRT conditions.

1) Expressing Attribute splits

2) Evaluating splits:

3) Stopping Tree Growing

4) Assigning Labels

5) Identifying HSRT branching attribute conditions
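Two of these steps are easy to illustrate in code. The sketch below assumes information gain as the split criterion, which is a common choice but is not stated in these notes, and it reads HSRT conditions off a hand-built toy tree rather than a trained one. The tuple layout of the tree and both function names are hypothetical.

```python
import math


def entropy(labels):
    """Binary entropy of a list of 0/1 labels (1 = HSRT)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def information_gain(labels, left, right):
    """Entropy reduction when `labels` is split into `left` and `right`."""
    n = len(labels)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - weighted


def hsrt_conditions(node, path=()):
    """Collect the root-to-leaf paths that end in a leaf labeled "HSRT".

    Internal nodes are (test, yes_subtree, no_subtree); leaves are strings.
    """
    if isinstance(node, str):
        return [list(path)] if node == "HSRT" else []
    test, yes_branch, no_branch = node
    return (hsrt_conditions(yes_branch, path + (test,))
            + hsrt_conditions(no_branch, path + (f"not ({test})",)))


# Toy tree, written by hand for illustration.
toy_tree = ("#images > 30", ("ads = yes", "HSRT", "non-HSRT"), "non-HSRT")
conditions = hsrt_conditions(toy_tree)
perfect_split_gain = information_gain([1, 1, 0, 0], [1, 1], [0, 0])
```

Each returned path is a conjunction of attribute tests, which is the shape an HSRT condition takes in the rest of these notes.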

③ Condition Type Miner

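Very similar conditions are grouped into one condition type, using a clustering algorithm.

Conditions cannot be put into one type merely because they involve the same attributes; that granularity is too coarse.

Criteria for a condition type:

(i) the same combination of attributes: the set of attributes is identical.

(ii) the same value for each categoric attribute: categorical attributes take the same values.

(iii) similar intervals for each numeric attribute: numeric attributes have similar value ranges.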
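The three criteria can be turned into a small grouping routine. The sketch below is my own illustration, not the paper's miner: it assumes numeric conditions are stored as closed (low, high) intervals, measures "similar" by interval overlap ratio with an arbitrary threshold of 0.5, and groups greedily against the first member of each group.

```python
def interval_overlap(a, b):
    """Overlap length divided by the length of the span covering both intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    span = max(a[1], b[1]) - min(a[0], b[0])
    return inter / span if span > 0 else 1.0


def same_type(c1, c2, min_overlap=0.5):
    """Apply criteria (i) to (iii) to two conditions given as dicts."""
    if c1.keys() != c2.keys():                      # (i) same attribute set
        return False
    for attr in c1:
        v1, v2 = c1[attr], c2[attr]
        if isinstance(v1, tuple) != isinstance(v2, tuple):
            return False
        if isinstance(v1, tuple):                   # (iii) similar intervals
            if interval_overlap(v1, v2) < min_overlap:
                return False
        elif v1 != v2:                              # (ii) same categorical value
            return False
    return True


def group_into_types(conditions, min_overlap=0.5):
    """Greedy grouping: join the first group whose first member matches."""
    groups = []
    for cond in conditions:
        for group in groups:
            if same_type(group[0], cond, min_overlap):
                group.append(cond)
                break
        else:
            groups.append([cond])
    return groups


# Made-up daily conditions; "#images > 30" is capped at 100 to form an interval.
daily_conditions = [
    {"#images": (30, 100), "ads": "yes"},
    {"#images": (28, 100), "ads": "yes"},
    {"#images": (30, 100), "ads": "no"},
    {"#images": (80, 100), "ads": "yes"},
]
types = group_into_types(daily_conditions)
```

Counting how many days contribute a condition to each group would then give a rough measure of how prevalent a type is.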

④ Attribute Effect Estimator

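A condition type is a conjunction of one or more attribute conditions: C = {c1 ∧ c2 ∧ ··· ci ··· ∧ cn}

Question: how does the attribute condition ci affect SRT and the SRT components (Tnet, Tserver, etc.)?

Method:

1. Take one condition type C at a time and negate one of its conditions ci.

2. Compute and compare the HSRT fraction, and compute the values of the SRT components.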
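The flip-and-compare idea can be sketched as follows. The record fields, the predicate dictionary, and the function names are hypothetical, and the data is invented; the sketch only compares HSRT fractions, but the mean of each SRT component could be compared between the two groups in the same way.

```python
def matches(record, conditions, flipped=None):
    """True if the record satisfies every condition, with `flipped` negated."""
    return all(
        (not predicate(record)) if name == flipped else predicate(record)
        for name, predicate in conditions.items()
    )


def hsrt_fraction(records):
    """Fraction of records with SRT above 1 s (assumes a non-empty list)."""
    return sum(r["srt"] > 1.0 for r in records) / len(records)


def attribute_effect(logs, conditions, flipped):
    """Change in HSRT fraction when one condition of the type is negated."""
    base = [r for r in logs if matches(r, conditions)]
    alternative = [r for r in logs if matches(r, conditions, flipped)]
    return hsrt_fraction(alternative) - hsrt_fraction(base)


# Made-up records for illustration only.
logs = [
    {"srt": 1.4, "images": 40, "ads": True},
    {"srt": 1.2, "images": 35, "ads": True},
    {"srt": 0.9, "images": 50, "ads": True},
    {"srt": 1.3, "images": 10, "ads": True},
    {"srt": 0.6, "images": 5, "ads": True},
    {"srt": 0.7, "images": 8, "ads": True},
    {"srt": 0.5, "images": 12, "ads": True},
    {"srt": 0.8, "images": 45, "ads": False},
]
condition_type = {
    "#images > 30": lambda r: r["images"] > 30,
    "ads = yes": lambda r: r["ads"],
}
effect = attribute_effect(logs, condition_type, flipped="#images > 30")
```

A negative `effect` means the HSRT fraction drops when that condition is negated while the rest of the type is kept fixed.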

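4. Part 4: RESULTS

① Data collected from the 1st to the 31st

The results of the two algorithms (FOCUS and the clustering algorithm it is compared against) are compared on three measures: the number of conditions found; the number of HSRT records that satisfy an HSRT condition divided by the total number of HSRT records; and the same ratio computed on the test data. On all three, FOCUS does better.

② Condition types:

The other attributes mentioned earlier have little effect and do not appear in any condition.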

③ Attribute Effects

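The effect of an attribute within a condition type is analyzed by flipping that one attribute condition.

The results are then sorted by HSRT%; "-" means a decrease and "+" means the time increased.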

④ Observations by Further Investigation

Further observations:

Questions raised:

1) Why does reducing #images increase Tserver, the time that servers prepare the result HTML (row 1, 2, 3, and 4 of Table IV)?

2) How do ads inflate SRT? Why do the pages with ads need more Tnet and Tbrowser (row 7)?

3) Why does WebKit engine perform better, especially greatly decreasing Tbrowser (row 5, 10, 11, and 12)?

4) It is natural that switching ISPs can affect network transmission time (Tnet), but why does switching to China Telecom reduce Tserver by over 20% (row 6, 8, and 9)?

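Explanations:

1) Popular queries tend to be image-heavy, and their results are more likely to be cached.

2) Pages with ads need extra download and parsing time.

3) WebKit's own strong performance and related optimizations.

4) Reasons specific to the ISP.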

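5. Part 5: HSRT optimization in practice

The attribute effects within the condition types show that #images has potential for optimization.

The optimization applied was Base64 encoding of images, and the paper reports the resulting improvement.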
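For reference, Base64 encoding is typically used to inline an image into the HTML as a data URI, so the browser does not need a separate request to an image server. The notes do not spell out how the encoding was deployed, so the sketch below is only an assumption about the general technique; `inline_image` is a hypothetical helper, and the bytes are a placeholder, not a real image.

```python
import base64


def inline_image(image_bytes, mime_type="image/png"):
    """Return an <img> tag whose source is a Base64 data URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f'<img src="data:{mime_type};base64,{encoded}">'


# Placeholder bytes (only the PNG signature), used to keep the example short.
placeholder = b"\x89PNG\r\n\x1a\n"
tag = inline_image(placeholder)
```

The trade-off is a larger HTML file, since Base64 output is about a third bigger than the raw bytes.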

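However, the HSRT condition types changed afterwards, and new condition types appeared.

The effect of each attribute is then evaluated again for the new condition types, and the next round of optimization is targeted accordingly.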

6. Part 6:

Useful reference papers:

Similar work: Y. Chen, R. Mahajan, B. Sridharan, and Z.-L. Zhang, “A provider-side view of web search response time,” in SIGCOMM, ACM, 2013.

Clustering algorithm used for comparison: J. Jiang, V. Sekar, et al., “Shedding light on the structure of internet video quality problems in the wild,” in CoNEXT, 2013.
