What to do when the Linux load average is high?

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"01 The uptime command"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When a system starts to feel slow, the first thing we usually do is run top or uptime to check the current load. For example, here is what uptime returned on my machine."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":""},"content":[{"type":"text","text":"$ uptime\n 08:31:49 up 27 min, 1 user, load average: 0.07, 0.04, 0.00"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The first few fields should be familiar: the current time, how long the system has been up, and the number of logged-in users. The last field is the system load average."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":""},"content":[{"type":"text","text":"08:31:49 // current time\nup 27 min // system uptime\n1 user // number of logged-in users\nload average: 0.07, 0.04, 0.00 // load average"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The three load average numbers are, in order, "},{"type":"text","marks":[{"type":"strong"}],"text":"the average load over the past 1, 5, and 15 minutes"},{"type":"text","text":". Comparing them gives a quick indication of whether the load is trending down or up."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If "},{"type":"codeinline","content":[{"type":"text","text":"load average: 1.00, 5.00, 10.00"}]},{"type":"text","text":" shows three numbers that "},{"type":"text","marks":[{"type":"strong"}],"text":"increase"},{"type":"text","text":" from left to right, the load over the past 1 minute is "},{"type":"text","marks":[{"type":"strong"}],"text":"lower"},{"type":"text","text":" than over the past 15 minutes, so the load is trending "},{"type":"text","marks":[{"type":"strong"}],"text":"down"},{"type":"text","text":"."}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If "},{"type":"codeinline","content":[{"type":"text","text":"load average: 10.00, 5.00, 1.00"}]},{"type":"text","text":" shows three numbers that "},{"type":"text","marks":[{"type":"strong"}],"text":"decrease"},{"type":"text","text":" from left to right, the load over the past 1 minute is "},{"type":"text","marks":[{"type":"strong"}],"text":"higher"},{"type":"text","text":" than over the past 15 minutes, so the load is trending "},{"type":"text","marks":[{"type":"strong"}],"text":"up"},{"type":"text","text":"."}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If "},{"type":"codeinline","content":[{"type":"text","text":"load average: 0.07, 0.04, 0.00"}]},{"type":"text","text":" shows three numbers that are the same or nearly so, the load is stable."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"So when analyzing system load, always look at the averages for all three time windows."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"horizontalrule"},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"02 What load average means"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Many people take load average to mean CPU utilization per unit of time. That is incorrect. Load average is related to CPU utilization, but not directly."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Put simply, load average is the average number of processes in the "},{"type":"text","marks":[{"type":"strong"}],"text":"runnable"},{"type":"text","text":" or "},{"type":"text","marks":[{"type":"strong"}],"text":"uninterruptible"},{"type":"text","text":" state per unit of time, in other words the "},{"type":"text","marks":[{"type":"strong"}],"text":"average number of active processes"},{"type":"text","text":". It has no direct relationship to CPU utilization."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Runnable processes are those currently using a CPU or waiting for one. They are the processes shown in state R by the ps command."}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Uninterruptible processes are those inside a critical kernel code path that must not be interrupted, most commonly waiting for an I/O response from a hardware device. They are the processes shown in state D by the ps command."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Load average, then, is simply the average number of active processes, or more intuitively the number of active processes per unit of time."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Since what is being averaged is the number of active processes, the ideal case is exactly one process running on each CPU, so that every CPU is fully used."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For example, a load average of 2 means:"}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"On a system with 2 CPUs, every CPU is exactly fully occupied."}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"On a system with 4 CPUs, the CPUs are 50% idle."}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"On a system with only 1 CPU, half of the processes cannot get CPU time."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"horizontalrule"},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"03 What load average is reasonable"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To judge whether the current load average is reasonable, "},{"type":"text","marks":[{"type":"strong"}],"text":"you first need to know how many CPUs the system has"},{"type":"text","text":". You can get this from the lscpu command or by reading /proc/cpuinfo."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"bash"},"content":[{"type":"text","text":"# Check the number of CPUs with lscpu\n$ lscpu\nArchitecture: x86_64\nCPU op-mode(s): 32-bit, 64-bit\nByte Order: Little Endian\nCPU(s): 4 # this number is the CPU count\n....\n\n# Count the CPUs listed in /proc/cpuinfo\n$ grep 'model name' /proc/cpuinfo | wc -l\n4"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With the CPU count in hand, we can tell that the system is overloaded whenever the load average exceeds the number of CPUs."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Here is another example. Suppose a single-CPU system shows load averages of 1.73, 0.60, 7.98:"}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Over the past 1 minute, the system was overloaded by 73% (a load of 1.73 against a capacity of 1)."}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Over the past 15 minutes, it was overloaded by 698% (a load of 7.98 against a capacity of 1). Taken as a whole, the load is trending down."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"When the load average exceeds 70% of the CPU count"},{"type":"text","text":", it is time to investigate the cause. Once the load gets too high, processes may respond more slowly, which in turn affects the normal operation of services."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"horizontalrule"},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"04 Load average vs. CPU utilization"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Load average and CPU utilization are easily confused, so let me draw the distinction here."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To repeat, load average is the number of processes in the runnable or uninterruptible state per unit of time. It therefore counts not only processes that are "},{"type":"text","marks":[{"type":"strong"}],"text":"using a CPU"},{"type":"text","text":" but also those "},{"type":"text","marks":[{"type":"strong"}],"text":"waiting for a CPU"},{"type":"text","text":" and those "},{"type":"text","marks":[{"type":"strong"}],"text":"waiting for I/O"},{"type":"text","text":"."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"CPU utilization, by contrast, measures how busy the CPUs are per unit of time, and it does not necessarily track load average. For example:"}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"CPU-bound processes use a lot of CPU and push the load average up; in this case the two metrics agree."}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"I/O-bound processes also push the load average up while they wait for I/O, but CPU utilization is not necessarily high."}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A large number of processes waiting to be scheduled onto a CPU also pushes the load average up, and CPU utilization is fairly high in this case as well."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"horizontalrule"},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"05 Commands for analyzing a rising load average"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We now know what can drive the load average up: looking at CPU utilization alone is not enough, and we also need to check whether the system's I/O wait time is high."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When the load average rises, you can use the "},{"type":"codeinline","content":[{"type":"text","text":"mpstat"}]},{"type":"text","text":" command to inspect CPU performance."}]},{"type":"codeblock","attrs":{"lang":""},"content":[{"type":"text","text":"# -P ALL monitors every CPU; the trailing 5 prints one report every 5 seconds\n$ mpstat -P ALL 5\nLinux 2.6.32-431.el6.x86_64 (lzc) \t11/05/2019 \t_x86_64_\t(2 CPU)\n\n07:51:45 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle\n07:51:50 PM all 42.90 0.00 49.39 0.41 0.00 4.56 0.00 0.00 2.74\n07:51:50 PM 0 44.38 0.00 48.67 0.41 0.00 2.86 0.00 0.00 3.68\n07:51:50 PM 1 41.57 0.00 49.80 0.40 0.00 6.43 0.00 0.00 1.81"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"From the output above:"}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"User-space CPU utilization (%usr) is high, at roughly 42% to 44%."}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kernel-space CPU utilization (%sys) is also high, at close to 50%."}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"I/O wait (%iowait) accounts for only about 0.41%."}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The idle rate (%idle) is only about 2% to 4%."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We can infer that the rise in load average here is caused by high CPU utilization."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If instead only I/O wait (%iowait) were high while user-space and kernel-space utilization stayed low, the rise in load average would be due to increased iowait."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Once we know whether rising CPU utilization or rising iowait is behind the load average, we still need to find which process is responsible. We can use "},{"type":"codeinline","content":[{"type":"text","text":"pidstat"}]},{"type":"text","text":" for that:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":""},"content":[{"type":"text","text":"# -u reports CPU metrics; the trailing 1 prints one report every second\n$ pidstat -u 1\n08:07:55 PM PID %usr %system %guest %CPU CPU Command\n08:07:56 PM 4 0.00 1.00 0.00 1.00 0 ksoftirqd/0\n08:07:56 PM 9 0.00 1.00 0.00 1.00 1 ksoftirqd/1\n08:07:56 PM 11 0.00 16.00 0.00 16.00 0 events/0\n08:07:56 PM 12 0.00 20.00 0.00 20.00 1 events/1\n08:07:56 PM 616 7.00 6.00 0.00 13.00 1 pppoe\n08:07:56 PM 2745 6.00 6.00 0.00 12.00 1 pppoe"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The kernel threads "},{"type":"codeinline","content":[{"type":"text","text":"events/0"}]},{"type":"text","text":" and "},{"type":"codeinline","content":[{"type":"text","text":"events/1"}]},{"type":"text","text":" show the highest CPU utilization in this sample, so these two are the likely cause of the rising load average."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"horizontalrule"},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"06 Summary"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Load average offers a quick way to gauge overall system performance and reflects the overall load. On its own, however, it does not tell us where the bottleneck is. So when interpreting load average, keep in mind:"}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A high load average may be caused by CPU-bound processes."}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A high load average does not necessarily mean high CPU utilization; I/O may simply be busier."}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When the load is high, you can use tools such as "},{"type":"codeinline","content":[{"type":"text","text":"mpstat"}]},{"type":"text","text":" and "},{"type":"codeinline","content":[{"type":"text","text":"pidstat"}]},{"type":"text","text":" to help track down where the load comes from."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}
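The rules of thumb from sections 01 and 03 can be written out as a small Python sketch. This is only an illustration under stated assumptions, not a monitoring tool: the function names (`parse_loadavg`, `assess`, `trend`) and the 0.1 tolerance used to call a load "stable" are made up for this example, while the 70% threshold and the comparison against the CPU count come from the article. The input is assumed to have the layout of Linux's /proc/loadavg, whose first three fields are the 1-, 5- and 15-minute averages.

```python
def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute averages from /proc/loadavg-style text."""
    one, five, fifteen = (float(field) for field in text.split()[:3])
    return one, five, fifteen


def assess(load, cpu_count, threshold=0.7):
    """Classify one load figure against the CPU count (hypothetical helper).

    Above the CPU count the system is overloaded; above 70% of the CPU
    count (the article's rule of thumb) it is worth investigating.
    """
    if load > cpu_count:
        return "overloaded"
    if load > cpu_count * threshold:
        return "investigate"
    return "ok"


def trend(one, fifteen, tolerance=0.1):
    """Compare the 1-minute and 15-minute averages (hypothetical helper).

    The tolerance is an assumed value: differences smaller than it are
    treated as a stable load.
    """
    if one > fifteen + tolerance:
        return "rising"
    if one < fifteen - tolerance:
        return "falling"
    return "stable"
```

On a Linux host the inputs could come from `open("/proc/loadavg").read()` and `os.cpu_count()`. For the single-CPU example in section 03, `assess(1.73, 1)` gives "overloaded" and `trend(1.73, 7.98)` gives "falling".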