Performance Tuning (Part 3): CPU Performance Analysis

April 5, 2005

Printed from: unix中文寶庫 (Unix Chinese Treasury)

URL: http://www.douzhe.com/article/article.php/634

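1: CPU architecture and how it works

2: The operating system and processes

3: Metrics for measuring how busy the CPU is

4: Symptoms of a CPU bottleneck

5: Which processes are the heavy CPU consumers?

6: Analyzing CPU utilization with sar

7: Analyzing run queue length with sar

8: Analyzing system calls with sar

9: Measuring the execution efficiency of a command or program with time

10: Finding the most CPU-hungry processes with top

11: Checking overall system status with uptime

12: Analyzing CPU utilization with GlancePlus

13: Tuning CPU-bound systems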

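CPU architecture and how it works

When we say "CPU" we usually mean the microprocessor. A CPU generally consists of the following main components:

CPU (central processing unit)

cache: the cache is high-speed memory with an access time of typically 10-20 nanoseconds (ns), so the CPU can access it once per clock cycle; ordinary main memory has an access time of 80-90 ns. The size of the cache has a large effect on CPU performance.

TLB (translation lookaside buffer): the TLB is a high-speed cache that holds recently used virtual addresses together with their corresponding physical addresses, so that virtual addresses can be translated into physical addresses quickly. The TLB is a subset of the system translation tables held in memory; a TLB entry normally refers to a memory page rather than to an individual memory address. Its size has a large effect on CPU performance.

coprocessor

Different CPUs generally have different clock frequencies and cache sizes.

In one clock cycle a CPU can generally fetch one instruction from the cache and execute it. In theory, therefore, the higher the clock frequency, the more instructions the CPU can execute per unit of time. Some current CPUs can execute several instructions per clock cycle; the PA8500, for example, can execute four.

The size of the cache limits how efficiently the CPU runs: however fast the clock, a CPU that cannot get its data simply spins idle. Cache size is therefore very important. The cache is further divided into a data cache and an instruction cache, which hold the data and the instructions that have been prefetched from memory and are about to be used.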
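To make the effect of the cache concrete, here is a rough sketch of the average access time in a simple one-level cache model. The 15 ns and 85 ns defaults are midpoints of the ranges quoted above; the hit ratios are illustrative assumptions, not measurements, and the model ignores the extra cost of detecting a miss.

```python
def effective_access_ns(hit_ratio, cache_ns=15.0, memory_ns=85.0):
    """Average access time for a simplified one-level cache model.

    hit_ratio is the fraction of accesses served by the cache (assumed value).
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ns + (1.0 - hit_ratio) * memory_ns


# Illustrative hit ratios only; real values depend on cache size and workload.
for ratio in (0.50, 0.90, 0.99):
    print(f"hit ratio {ratio:.2f}: {effective_access_ns(ratio):.1f} ns")
```

A larger cache generally raises the hit ratio, which is why the average moves from near the memory figure toward the cache figure.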

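Virtual addressing

The virtual address space of a system is usually far larger than its physical address space. On a 64-bit system, for example, the theoretical address space is 2 to the 64th power bytes (2**64 bytes, about 18,447 PB), but because of cost the physical memory actually installed is at most a dozen or so GB.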
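The size of a 64-bit address space can be checked with a few lines of arithmetic. This sketch assumes byte-addressable memory and reports the result in both decimal petabytes and binary exbibytes.

```python
address_bits = 64
total_bytes = 2 ** address_bits          # number of distinct byte addresses

petabytes = total_bytes / 10 ** 15       # decimal petabytes (10^15 bytes)
exbibytes = total_bytes // 2 ** 60       # binary exbibytes (2^60 bytes)

print(f"{total_bytes} bytes")
print(f"about {petabytes:,.0f} PB (decimal)")
print(f"exactly {exbibytes} EiB (binary)")
```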

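Every process has its own unique virtual address space. For a process to run, however, its virtual addresses must be mapped to physical addresses, which requires the TLB, the cache, and main memory to work together. If the required information is not in memory, a page fault occurs.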
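The lookup order described above can be sketched as a toy model. The 4096-byte page size, the dictionary-based TLB and page table, and the sample mappings are all assumptions made for illustration; real hardware and HP-UX data structures differ.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


class PageFault(Exception):
    """Raised when the requested page is not resident in physical memory."""


def translate(vaddr, tlb, page_table):
    """Translate a virtual address; return (physical_address, how_it_was_found)."""
    page, offset = divmod(vaddr, PAGE_SIZE)
    if page in tlb:                          # fast path: translation is cached
        return tlb[page] * PAGE_SIZE + offset, "TLB hit"
    if page in page_table:                   # slower path: consult the table in memory
        tlb[page] = page_table[page]         # remember the translation for next time
        return page_table[page] * PAGE_SIZE + offset, "TLB miss"
    raise PageFault(f"virtual page {page} is not in memory")


page_table = {0: 7, 1: 3}   # hypothetical mapping: virtual page -> physical frame
tlb = {}

print(translate(4100, tlb, page_table))   # first access to virtual page 1
print(translate(4200, tlb, page_table))   # second access to the same page
```

The second lookup is served from the TLB because the translation works per page, not per address.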

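Pipelining

The TLB and the cache try to supply the CPU with the information it needs within one clock cycle. Even if they succeed 100% of the time, however, the CPU must spend one clock cycle fetching the next instruction and another clock cycle executing it, so CPU utilization is only 50%. The usual way to keep the CPU busier is pipelining. The PA8500, for example, uses a seven-stage pipeline.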
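The 50% figure, and the gain from pipelining, can be reproduced with a small cycle count. This is a sketch of an ideal pipeline with no stalls, branch penalties, or cache misses, which is an assumption; real pipelines fall short of it.

```python
def cycles_without_pipeline(instructions, steps_per_instruction=2):
    """Each instruction finishes all its steps (fetch, execute) before the next starts."""
    return instructions * steps_per_instruction


def cycles_with_pipeline(instructions, stages):
    """Ideal pipeline: once it is full, one instruction completes every cycle."""
    if instructions == 0:
        return 0
    return stages + instructions - 1


n = 1000
for label, cycles in (
    ("no pipeline, fetch then execute", cycles_without_pipeline(n)),
    ("ideal 7-stage pipeline", cycles_with_pipeline(n, 7)),
):
    print(f"{label}: {cycles} cycles, {n / cycles:.1%} of one instruction per cycle")
```

With overlapping stages the fill time of the pipeline is paid only once, so throughput approaches one instruction per cycle as the instruction count grows.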

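The operating system and processes

HP-UX is a multiuser, multitasking UNIX operating system. Its performance depends on the number of users, the kinds of tasks they run, and the hardware and software configuration.

HP-UX has two execution levels:

User level: users interact with the operating system, for example by running applications and system commands. User-level code reaches the kernel level through the system call interface.

Kernel level: the operating system automatically performs certain functions, mainly operations on hardware.

In the operating system, user programs run as processes. A process can be in one of the following states: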

SRUN

SSLEEP

SZOMB

SIDL

SSTOP

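CPU scheduling

Once the data a process needs has been brought into memory, the process waits for the CPU scheduler to give it CPU time. In HP-UX, each process normally gets a fixed timeslice in which to run; the timeslice is one tenth of a second (1/10 s).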
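The effect of a fixed timeslice can be illustrated with a simple round-robin simulation. This is only a sketch: it uses the 100 ms timeslice from the text but ignores priorities, I/O waits, and context-switch overhead, and the process names and CPU demands are hypothetical.

```python
from collections import deque

TIMESLICE_MS = 100  # one tenth of a second, as stated in the text


def round_robin(cpu_needs_ms, timeslice_ms=TIMESLICE_MS):
    """Return {name: completion_time_ms} for processes sharing one CPU.

    cpu_needs_ms maps a process name to the CPU time it needs, in milliseconds.
    """
    queue = deque(cpu_needs_ms.items())
    clock = 0
    finished = {}
    while queue:
        name, remaining = queue.popleft()
        run = min(timeslice_ms, remaining)   # run for one slice or until done
        clock += run
        remaining -= run
        if remaining > 0:
            queue.append((name, remaining))  # back to the end of the run queue
        else:
            finished[name] = clock
    return finished


# Hypothetical workload: three processes with different CPU demands.
print(round_robin({"A": 250, "B": 100, "C": 150}))
```

Short jobs finish early even when they start behind longer ones, which is the point of slicing CPU time.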

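Because HP-UX is a multitasking operating system, it needs a mechanism to control the order in which processes execute; that mechanism is the interrupt. The clock interrupt handler is the system software that services clock interrupts: it gathers system and accounting statistics and performs context switches. System performance also depends on how often these interrupts occur.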

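Processes and priorities

Every process has its own priority:

Real-time priorities: -32 to 127. A process that is to run at a real-time priority must be set up with the rtprio command;

Timeshare system priorities: 128 to 177;

Timeshare user priorities: 178 to 251;

Priorities 252 to 255 are used by the system as virtual memory management priorities for process deactivation.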
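The priority bands listed above can be captured in a small lookup, which is handy when reading priority values from tools such as top. The function name and labels are illustrative; the ranges are taken directly from the list above.

```python
# (lowest, highest, label) for each band described in the text.
PRIORITY_BANDS = [
    (-32, 127, "real-time"),
    (128, 177, "timeshare (system)"),
    (178, 251, "timeshare (user)"),
    (252, 255, "memory management (process deactivation)"),
]


def priority_class(priority):
    """Return the name of the band that a priority value falls into."""
    for low, high, label in PRIORITY_BANDS:
        if low <= priority <= high:
            return label
    raise ValueError(f"priority {priority} is outside the documented ranges")


for value in (-5, 154, 200, 253):
    print(value, "->", priority_class(value))
```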

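The initial priority of a timeshare process is a fixed value assigned by the system. Users can change the priority of a timeshare process by changing its nice value: as the process executes, its priority is lowered according to the nice value, and while it waits to run, its priority is raised again according to the nice value. The system default nice value is 20.

In performance analysis we care not only about how long a process takes to complete, but also about where that time is spent and how much is spent in each place.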

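Metrics for measuring how busy the CPU is

Before deciding whether the system has enough CPU capacity, we need to know who is using the CPU, how much they use, and for how long. The following are commonly used metrics:

1) CPU use by users

CPU running ordinary user processes

CPU running niced processes

CPU running real-time processes

2) CPU use by the system

System calls

I/O management: interrupts and drivers

Memory management: paging and swapping

Process management: context switches and process startup

3) WIO: the percentage of time the CPU is idle because processes are waiting for I/O, mainly block I/O, raw I/O, and VM paging/swap-ins;

4) CPU idle percentage, that is, idle time other than the WIO above;

5) The percentage of CPU time spent on context switching (context switch CPU utilization)

6) nice

7) real-time

8) Run queue length, that is, the number of processes in the runnable state; what we really care about, though, is how long these processes wait to be scheduled onto a CPU;

9) Load average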

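Symptoms of a CPU bottleneck

The CPU is like a human brain, carrying out whatever tasks it is given. Given too many tasks, it cannot keep up and its efficiency drops. Just as an illness shows typical symptoms, a system whose CPU has become the performance bottleneck shows typical symptoms too:

Slow response time

Zero percent idle CPU

High percent user CPU

High percent system CPU

Large run queue size sustained over time

Processes blocked on priority

Note that these symptoms do not necessarily mean the system is short of CPU. In fact, some of them may well be caused by a shortage of another resource. When memory is short, for example, the CPU is kept busy with memory management; on the surface CPU utilization is 100% and the CPU even looks inadequate, but concluding from this alone that adding CPUs will solve the problem would be a serious mistake.

So, once again: conclusions should be drawn only after analyzing the system with different tools and from different angles, and even then experience plays an irreplaceable role.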

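Which processes are the heavy CPU consumers?

Not all processes in the operating system use the CPU in the same way. Some typically need more CPU timeslices than others to get their work done. Typical heavy CPU consumers are:

Process creation

Terminal character processes (MUX- and LAN-based)

Compute-intensive processes and real-time processes

X terminals and X servers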

Analyzing CPU utilization with sar

The command forms for analyzing CPU utilization with sar are:

#sar -u, which reports the data that sa1 collects periodically in the background;

#sar -u 5 100, which samples every 5 seconds, 100 times in all;

sar -u: Report CPU utilization (the default); portion of time running in one of several modes. On a multi-processor system, if the -M option is used together with the -u option, per-CPU utilization as well as the average CPU utilization of all the processors are reported. If the -M option is not used, only the average CPU utilization of all the processors is reported:

cpu: cpu number (only on a multi-processor system with the -M option);

%usr: user mode;

%sys: system mode;

%wio: idle with some process waiting for I/O (only block I/O, raw I/O, or VM pageins/swapins indicated);

%idle: otherwise idle;

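Interpreting the results

First look at the %idle column. If it is close to zero, look at the corresponding %wio value. If %wio is greater than 7, the disks or other I/O may have a problem, and further analysis is needed:

Use iostat to see how busy each disk is, for example #iostat -t 5 2, which samples every 5 seconds, twice in all;

Use sar -d to analyze the activity of each block device (disk or tape);

Use sar -b to analyze buffer cache activity;

Use sar -w to analyze the deactivation/reactivation and switching activities of the system;

If %idle is small and the corresponding %wio is also small, look at the %usr and %sys columns. A large %usr means user processes are consuming a lot of CPU time; a large %sys means a lot of time is going into system work. Further analysis is needed:

Use GlancePlus to examine the process using the most CPU time and find out why it uses so much.

If %sys is large, use sar -c to break down system call activity and see what the system calls are mainly doing. Also check for other bottlenecks: paging, for example, can drive %sys up. In that case use sar -q to check the run queue length, and GlancePlus and vmstat to check memory usage;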
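The decision steps above can be sketched as a small function. The %wio limit of 7 comes from the text; the cutoff for "close to zero" idle time (5 here) and the rule of comparing %usr with %sys are assumptions added for illustration, and the function name is hypothetical.

```python
def interpret_sar_u(usr, system, wio, idle, idle_low=5.0, wio_high=7.0):
    """Suggest a next step from one sar -u sample (all values are percentages).

    idle_low is an assumed cutoff for "close to zero"; wio_high is from the text.
    """
    if idle > idle_low:
        return "idle time available: sar -u shows no CPU bottleneck"
    if wio > wio_high:
        return "I/O wait is high: check iostat, sar -d, sar -b, sar -w"
    if usr >= system:
        return "user mode dominates: examine the top CPU-consuming processes"
    return "system mode dominates: check sar -c, and paging with sar -q and vmstat"


# Hypothetical samples in the order %usr, %sys, %wio, %idle.
for sample in ((20, 10, 5, 65), (30, 15, 53, 2), (85, 12, 2, 1), (25, 70, 3, 2)):
    print(sample, "->", interpret_sar_u(*sample))
```

A single sample is rarely conclusive; the same checks are more meaningful when applied to values sustained over many intervals.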

Analyzing run queue length with sar

The command forms for analyzing run queue length with sar are:

#sar -q, which reports the data that sa1 collects periodically in the background;

#sar -q 5 100, which samples every 5 seconds, 100 times in all;

sar -q: Report average queue length while occupied, and percent of time occupied. On a multi-processor machine, if the -M option is used together with the -q option, the per-CPU run queue as well as the average run queue of all the processors are reported. If the -M option is not used, only the average run queue information of all the processors is reported:

cpu: cpu number (only on a multi-processor system with the -M option);

runq-sz: Average length of the run queue(s) of processes (in memory and runnable);

%runocc: The percentage of time the run queue(s) were occupied by processes (in memory and runnable);

swpq-sz: Average length of the swap queue of runnable processes (processes swapped out but ready to run);

%swpocc: The percentage of time the swap queue of runnable processes (processes swapped out but ready to run) was occupied.

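Interpreting the results:

The smaller these values are, the better.

If runq-sz is greater than 4, or %swpocc is greater than 5, the CPU or memory may have a problem, and further analysis is needed:

Use sar -u to analyze CPU usage;

Use sar -w to analyze the deactivation/reactivation and switching activities of the system;

GlancePlus can also be used;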
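The two thresholds above can be applied to a series of samples rather than a single reading. This sketch averages the samples first, which is an assumption about how to judge "sustained" load; the function name and the sample values are hypothetical.

```python
def check_run_queue(samples, runq_limit=4.0, swpocc_limit=5.0):
    """Check averaged sar -q samples against the thresholds from the text.

    samples is a list of (runq_sz, swpocc) pairs from successive intervals.
    Returns a list of findings; an empty list means neither limit was exceeded.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    avg_runq = sum(runq for runq, _ in samples) / len(samples)
    avg_swpocc = sum(swpocc for _, swpocc in samples) / len(samples)
    findings = []
    if avg_runq > runq_limit:
        findings.append(f"average runq-sz {avg_runq:.1f} exceeds {runq_limit}")
    if avg_swpocc > swpocc_limit:
        findings.append(f"average %swpocc {avg_swpocc:.1f} exceeds {swpocc_limit}")
    return findings


# Hypothetical readings: (runq-sz, %swpocc) for three intervals.
print(check_run_queue([(5.2, 0.0), (6.1, 2.0), (4.9, 1.0)]))
```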

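Analyzing system calls with sar

The command forms for analyzing system calls with sar are:

#sar -c, which reports the data that sa1 collects periodically in the background;

#sar -c 5 100, which samples every 5 seconds, 100 times in all;

sar -c: Report system calls: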

scall/s: Number of system calls of all types per second;

sread/s: Number of read() and/or readv() system calls per second;

swrit/s: Number of write() and/or writev() system calls per second;

fork/s: Number of fork() and/or vfork() system calls per second;

exec/s: Number of exec() system calls per second;

rchar/s: Number of characters transferred by read system calls (block devices only) per second;

wchar/s: Number of characters transferred by write system calls (block devices only) per second.

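Interpreting the results:

If the scall/s value is large, the reason for so many system calls needs careful analysis.

Look at the fork/s and exec/s columns to see whether the system is creating large numbers of new processes.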
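When many samples have to be compared, it can help to turn each sar -c data line into named fields. The sketch below assumes a line made of a timestamp followed by the seven columns listed above, in that order; the sample line and its numbers are invented for illustration.

```python
FIELDS = ("scall/s", "sread/s", "swrit/s", "fork/s", "exec/s", "rchar/s", "wchar/s")


def parse_sar_c_line(line):
    """Split one data line (timestamp plus seven columns) into a dict of floats."""
    parts = line.split()
    values = parts[1:]                       # parts[0] is the timestamp
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(values)}")
    return dict(zip(FIELDS, (float(v) for v in values)))


def summarize(row):
    """Report the share of read/write calls and the process creation rates."""
    total = row["scall/s"]
    rw_share = (row["sread/s"] + row["swrit/s"]) / total if total else 0.0
    return {
        "read_write_share": rw_share,
        "forks_per_second": row["fork/s"],
        "execs_per_second": row["exec/s"],
    }


sample = "10:15:05 4200 900 600 12.50 11.00 1500000 800000"  # hypothetical data
print(summarize(parse_sar_c_line(sample)))
```

What counts as a "large" value depends on the system, so the sketch reports rates and shares without applying a threshold.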

Measuring the execution efficiency of a command or program with time

The time command measures how efficiently a command executes. Its syntax is:

time command

command is executed. Upon completion, time prints the elapsed time during the command, the time spent in the system, and the time spent executing the command. Times are reported in seconds.

Execution time can depend on the performance of the memory in which the program is running.

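When a process seems to perform poorly, the simplest first step is to use time to see how its execution time is distributed, and then to investigate further with other tools.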
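The same three figures can be collected from a script. This is a minimal sketch using Python's standard library on a POSIX system, not a replacement for time itself; the function name is hypothetical and the child command is only a placeholder.

```python
import os
import subprocess
import sys
import time


def time_command(argv):
    """Run argv and return (real, user, system) seconds, similar to time(1)."""
    before = os.times()
    start = time.perf_counter()
    subprocess.run(argv, check=False)        # waits for the child to finish
    real = time.perf_counter() - start
    after = os.times()
    # CPU time of waited-for children is accumulated in the children_* fields.
    user = after.children_user - before.children_user
    system = after.children_system - before.children_system
    return real, user, system


# Placeholder workload: a short CPU-bound Python one-liner.
real, user, system = time_command([sys.executable, "-c", "sum(range(10**6))"])
print(f"real {real:.2f}s  user {user:.2f}s  sys {system:.2f}s")
```

If real time is much larger than user plus system time, the command spent most of its life waiting rather than computing.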

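Finding the most CPU-hungry processes with top

The top command shows the processes that use the most CPU. Its display is updated dynamically according to how much CPU each process is using.

Its syntax is:

top [-s time] [-d count] [-q] [-u] [-h] [-n number]

The options have the following meanings:

-s time: set the screen refresh interval to time seconds; the default is 5 seconds;

-d count: exit after refreshing the screen count times;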

-q: This option runs the top program at the same priority as if it is executed via a nice -20 command so that it will execute faster (see nice(1)). This can be very useful in discovering any system problem when the system is very sluggish. This option is accessible only to users who have appropriate privileges.

-u: User ID (uid) numbers are displayed instead of usernames. This improves execution speed by eliminating the additional time required to map uid numbers to user names.

-h: Hides the individual CPU state information for systems having multiple processors. Only the average CPU status will be displayed.

-n number: Show only number processes per screen. Note that this option is ignored if number is greater than the maximum number of processes that can be displayed per screen.

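While top is running, the following keys page through the display:

j: page forward;

k: page backward;

t: return to the first page;

Interpreting the results:

top gives a quick picture of current CPU usage; the processes using the most CPU are the ones to focus on.

The RES column (the current size of the process resident in memory) shows how much memory each process occupies.

The NICE column shows whether a nice value is being used to adjust the process's share of the workload.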

Checking overall system status with uptime

uptime prints the current time, the length of time the system has been up, the number of users logged on to the system, and the average number of jobs in the run queue over the last 1, 5, and 15 minutes.

w is linked to uptime and prints the same output as uptime -w, displaying a summary of the current activity on the system.

Its syntax is:

uptime [-hlsuw] [user]

w [-hlsuw] [user]

The options have the following meanings:

-h: Suppress the first line and the heading line. This option should not be used with the -u option. This option assumes the use of the -w option to uptime.

-l: Use long output. This option assumes the use of the -w option to uptime.

-s: Use the short form of output for displaying terminal information. The terminal name is abbreviated; the login time and CPU times are suppressed.

-u: Print only the first line describing the overall state of the system. This is the default for the uptime command.

-w: Print a summary of the current activity on the system for each user. This is the default for the w command.
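The load averages printed by uptime are easier to compare across machines when divided by the number of CPUs. The sketch below extracts them from a line of text; the sample line, its exact layout, and the CPU count of 4 are assumptions made for illustration.

```python
import re

# Matches the three figures after "load average:" in an uptime-style line.
LOAD_RE = re.compile(r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)")


def load_averages(uptime_line):
    """Return the 1-, 5-, and 15-minute load averages as floats."""
    match = LOAD_RE.search(uptime_line)
    if match is None:
        raise ValueError("no load average found in line")
    return tuple(float(value) for value in match.groups())


def load_per_cpu(load, cpu_count):
    """Normalize a load average by the number of CPUs."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load / cpu_count


# Hypothetical uptime output; the real layout can vary between systems.
line = " 10:15am  up 12 days,  3:04,  5 users,  load average: 3.20, 2.10, 1.05"
one, five, fifteen = load_averages(line)
print("load averages:", one, five, fifteen)
print("1-minute load per CPU on 4 CPUs:", load_per_cpu(one, 4))
```

Comparing the three figures also shows the trend: a 1-minute value well above the 15-minute value means the load is rising.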

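Analyzing CPU utilization with GlancePlus

HP's GlancePlus tool can analyze in detail both overall activity and any single process.

1) Analyzing overall CPU usage:

Start GlancePlus;

Press ? to open the online help screen;

Press c to open the CPU detail screen;

Press b to page backward and f to page forward;

The CPU Detail Screen shows how CPU time is distributed: how much goes to users, how much to the system, and so on.

2) Analyzing the CPU usage of a single process:

Start GlancePlus;

Press ? to open the online help screen;

Press g to open the process list screen;

Press s to open the process selection screen; the busiest process is normally offered as the default;

Enter the ID of the process you want to examine;

Press b to page backward and f to page forward;

When analyzing a single process, the values to watch are usually:

CPU Usage;

User CPU;

System CPU;

Priority;

Logical and Physical Reads and Writes;

Total RSS/VSS;

blocked on (displayed by pressing shift+>);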

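Tuning CPU-bound systems

1) Hardware-based approaches:

Upgrade to a faster CPU;

Upgrade to a larger cache;

Add more CPUs;

Distribute the applications across several systems;

Use diskless nodes;

Add a floating-point processor;

2) Software-based approaches:

Run batch jobs outside peak hours;

Nice unimportant applications;

Use the rtprio command to favor important applications;

Use the plock command to favor important applications;

Turn off system accounting;

Consider using Taskbroker or DCE;

Optimize the applications;

Consider using Process Resource Manager (PRM), although PRM is available only on HP-UX.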
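As one illustration of the software-based approaches, an unimportant job can be started at a lower priority from a script. This is a sketch for POSIX systems using Python's standard library; the function name, the increment of 10, and the placeholder child command are assumptions, and it does the same job as launching the command through nice.

```python
import os
import subprocess
import sys


def run_niced(argv, increment=10):
    """Run argv with its nice value raised by `increment` (lower priority)."""
    return subprocess.run(
        argv,
        preexec_fn=lambda: os.nice(increment),  # runs in the child before exec
        capture_output=True,
        text=True,
    )


# Placeholder job: the child just reports its own nice value.
result = run_niced([sys.executable, "-c", "import os; print(os.nice(0))"])
print("parent nice value:", os.nice(0))
print("child nice value:", result.stdout.strip())
```

Raising the nice value needs no special privileges, whereas lowering it (raising priority) normally does.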

s_sword
