[Repost] Health Checks for nginx Backend Nodes

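Introduction

This article covers health checks for nginx backend nodes. Before getting to that, here is a brief look at the modules nginx mainly uses for reverse proxying.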

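Native nginx modules

When nginx is used as a reverse proxy, two modules are almost always involved:
1. ngx_http_proxy_module
Allows requests to be passed to another server. Commonly used directives in this module: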

proxy_pass
proxy_cache
proxy_connect_timeout
proxy_read_timeout
proxy_send_timeout
proxy_next_upstream

2. ngx_http_upstream_module
Defines groups of servers that can be referenced by proxy_pass, fastcgi_pass, and similar directives. Commonly used directives in this module:

upstream
server
ip_hash

Default load balancing configuration

http {
    upstream myapp1 {
        server srv1.example.com;
        server srv2.example.com;
        server srv3.example.com;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://myapp1;
        }
    }
}

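With this configuration nginx uses its default load balancing method, round-robin. Several other parameters also take their default values; written out explicitly, the configuration is equivalent to: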

http {
    upstream myapp1 {
        server srv1.example.com weight=1 max_fails=1 fail_timeout=10;
        server srv2.example.com weight=1 max_fails=1 fail_timeout=10;
        server srv3.example.com weight=1 max_fails=1 fail_timeout=10;
    }

    server {
        listen 80;
        # nginx directives take space-separated arguments, not "name=value"
        proxy_send_timeout 60;
        proxy_connect_timeout 60;
        proxy_read_timeout 60;
        proxy_next_upstream error timeout;

        location / {
            proxy_pass http://myapp1;
        }
    }
}

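Where:
1. Failover

Syntax:  proxy_read_timeout time;
Default: proxy_read_timeout 60s;
Context: http, server, location

Defines a timeout for reading a response from the proxied server. The timeout is set only between two successive read operations, not for the transmission of the whole response. If the proxied server does not transmit anything within this time, the connection is closed.

Syntax:  proxy_connect_timeout time;
Default: proxy_connect_timeout 60s;
Context: http, server, location

Defines a timeout for establishing a connection with the proxied server. Note that this timeout usually cannot exceed 75 seconds.

Syntax:  proxy_send_timeout time;
Default: proxy_send_timeout 60s;
Context: http, server, location

Sets a timeout for transmitting a request to the proxied server. The timeout is set only between two successive write operations, not for the transmission of the whole request. If the proxied server does not receive anything within this time, the connection is closed.

Syntax:  proxy_next_upstream error | timeout | invalid_header | http_500 | http_502 | http_503 | http_504 | http_403 | http_404 | http_429 | non_idempotent | off ...;
Default: proxy_next_upstream error timeout;
Context: http, server, location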

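Specifies in which cases a failed request should be passed to the next backend server:

error           an error occurred while establishing a connection with the backend server, passing a request to it, or reading the response header
timeout         a timeout occurred while establishing a connection with the backend server, passing a request to it, or reading the response header
invalid_header  the backend server returned an empty response or an invalid response header
http_500        the backend server returned a response with status code 500
http_502        the backend server returned a response with status code 502
http_503        the backend server returned a response with status code 503
http_504        the backend server returned a response with status code 504
http_404        the backend server returned a response with status code 404
off             disables passing a request to the next backend server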

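As these directives show, under the default configuration, when a backend node hits an error or a timeout, nginx fails over through proxy_next_upstream and automatically passes the request that was sent to the unhealthy node on to a healthy one. What counts as a timeout is governed by proxy_send_timeout, proxy_connect_timeout, and proxy_read_timeout. Beyond error and timeout, more specific trigger conditions such as http_502 and http_503 can be configured.

Note: a request can be passed to the next backend server only if nothing has been sent to the client yet. If an error or timeout occurs while the response is being transferred to the client, it cannot be recovered from.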
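
The failover rule can be illustrated with a small Python sketch. It is a simplified model for illustration only, not nginx's implementation: the function name should_try_next and its parameters are hypothetical, and details such as the non_idempotent condition are left out.

```python
# Simplified model of the proxy_next_upstream decision (illustrative only).
DEFAULT_CONDITIONS = frozenset({"error", "timeout"})  # the documented default


def should_try_next(failure, conditions=DEFAULT_CONDITIONS, response_started=False):
    """Return True if a failed request may be passed to the next server.

    failure          -- what went wrong, e.g. "error", "timeout", "http_502"
    conditions       -- the values configured in proxy_next_upstream
    response_started -- True once any data has been sent to the client
    """
    if "off" in conditions:
        return False  # failover is disabled entirely
    if response_started:
        return False  # too late: part of the response already reached the client
    return failure in conditions


if __name__ == "__main__":
    print(should_try_next("timeout"))                         # default conditions
    print(should_try_next("http_502"))                        # not in the default set
    print(should_try_next("http_502", {"error", "http_502"}))
    print(should_try_next("error", response_started=True))
```

With the default conditions, only error and timeout trigger a retry; a 502 response is passed on only when http_502 is listed explicitly.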

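2. Health checks

Syntax:  server address [parameters];
Default: -
Context: upstream

max_fails=number   Sets the number of unsuccessful attempts to communicate with the server. If this many attempts fail within the period set by fail_timeout, nginx considers the server unavailable and does not try it again for the next fail_timeout period. The default is 1. A value of 0 disables the counting of attempts, so the backend node is not health checked and the server is always considered available.
fail_timeout=time  Sets both the period during which failed attempts are counted and the period for which the server is then considered unavailable. The default is 10 seconds.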

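A few points need explaining:
1. How is a failed attempt defined?
According to the official documentation, it is defined by the proxy_next_upstream, fastcgi_next_upstream, uwsgi_next_upstream, scgi_next_upstream, memcached_next_upstream, and grpc_next_upstream directives, that is, the error, timeout, and http_xxx status code conditions described earlier.
2. If max_fails is 0, this health check is disabled. nginx then relies solely on proxy_next_upstream to decide when to fail over.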
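
To make the interaction between max_fails and fail_timeout concrete, here is a minimal Python sketch of the passive check. It is a simplified model under stated assumptions, not nginx's actual bookkeeping: the class PassivePeer and its methods are hypothetical names, and time is passed in explicitly so the behaviour is easy to follow.

```python
from dataclasses import dataclass, field


@dataclass
class PassivePeer:
    """Simplified model of one upstream server (illustrative, not nginx code)."""
    max_fails: int = 1           # documented default
    fail_timeout: float = 10.0   # documented default, in seconds
    failures: list = field(default_factory=list)  # timestamps of recent failures
    down_until: float = 0.0      # server is skipped until this time

    def available(self, now: float) -> bool:
        if self.max_fails == 0:
            return True  # counting disabled: always considered available
        return now >= self.down_until

    def record_failure(self, now: float) -> None:
        if self.max_fails == 0:
            return
        # Only failures inside the fail_timeout window count.
        self.failures = [t for t in self.failures if now - t < self.fail_timeout]
        self.failures.append(now)
        if len(self.failures) >= self.max_fails:
            # Threshold reached: skip this server for the next fail_timeout.
            self.down_until = now + self.fail_timeout
            self.failures.clear()


if __name__ == "__main__":
    peer = PassivePeer(max_fails=2, fail_timeout=10.0)
    peer.record_failure(0.0)
    print(peer.available(1.0))   # one failure so far, still available
    peer.record_failure(5.0)
    print(peer.available(6.0))   # second failure within the window
    print(peer.available(15.0))  # fail_timeout has passed since the second failure
```

Note that a failure is only recorded when a real request has already gone wrong, which is the drawback the summary below points out.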

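Summary

Relying only on the two nginx modules above has the following drawbacks:
1. Failure detection within fail_timeout is driven by the configured timeout values, which is inefficient: waiting for a timeout to expire hurts performance;
2. Once a backend has a problem, outside the fail_timeout period during which it is disabled, nginx still forwards requests to the unhealthy node first and only then passes them on to another server, so one forwarding attempt is wasted.

For this reason, besides the built-in nginx modules described above, there is a more specialized module dedicated to health checking the nodes behind the load balancer: an nginx module developed by the Taobao technical team.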

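The nginx_upstream_check_module module

The nginx_upstream_check_module module, developed by the Taobao technical team, checks the health of the real servers behind nginx. If a backend server becomes unavailable, it is removed from the upstream and no requests are forwarded to it; once it recovers, it is added back to the upstream.

The module ships with Taobao's own Tengine. You can get that build of nginx from the official Tengine website, or from GitHub.
If you are not using Tengine, the module can be added to your own nginx by applying a patch.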

# Apply the patch
# Note: use the patch that matches your nginx version
cd nginx-1.6.0
patch -p1 < ../nginx_upstream_check_module-master/check_1.5.12+.patch
./configure --user=nginx --group=nginx --prefix=/usr/local/nginx1.6 --sbin-path=/usr/local/nginx1.6 --conf-path=/usr/local/nginx1.6/nginx.conf --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --with-http_ssl_module --with-http_stub_status_module --with-http_gzip_static_module --with-http_gunzip_module --with-http_sub_module --with-pcre=/usr/local/src/nginx/pcre-8.36 --with-zlib=/usr/local/src/nginx/zlib-1.2.8 --add-module=/usr/local/src/nginx/ngx_cache_purge-2.1 --add-module=/usr/local/src/nginx/headers-more-nginx-module-master --add-module=/usr/local/src/nginx/nginx_upstream_check_module-master
make
# Do not run make install
cd /usr/local/nginx1.6
# Back up the current binary
cp nginx nginx.bak

nginx -s stop

cp -r /usr/local/src/nginx/nginx-1.6.0/objs/nginx .

Once the patch is applied, nginx can be configured as follows:

  http {
    upstream cluster {

        # simple round-robin
        server 192.168.0.1:80;
        server 192.168.0.2:80;

        check interval=5000 rise=1 fall=3 timeout=4000;

        #check interval=3000 rise=2 fall=5 timeout=1000 type=ssl_hello;

        #check interval=3000 rise=2 fall=5 timeout=1000 type=http;
        #check_http_send "HEAD / HTTP/1.0\r\n\r\n";
        #check_http_expect_alive http_2xx http_3xx;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://cluster;
        }

        location /status {
            check_status;

            access_log   off;
            allow SOME.IP.ADD.RESS;
            deny all;
       }
    }

}

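Where:

Syntax:  check interval=milliseconds [fall=count] [rise=count] [timeout=milliseconds] [default_down=true|false] [type=tcp|http|ssl_hello|mysql|ajp] [port=check_port]
Default: if no parameters are given, the defaults are: interval=30000 fall=5 rise=2 timeout=1000 default_down=true type=tcp
Context: upstream

This directive enables health checking of the backend servers. Its parameters mean:

interval: the interval between health check packets sent to the backend, in milliseconds.

fall (fall_count): if the number of consecutive failures reaches fall_count, the server is considered down.

rise (rise_count): if the number of consecutive successes reaches rise_count, the server is considered up.

timeout: the timeout for a health check request to the backend, in milliseconds.

default_down: sets the initial state of the server. If true, the server starts out down; if false, it starts out up. The default is true, meaning the server is considered unavailable at first and is treated as healthy only after the required number of successful health checks.

type: the type of health check packet. The following types are currently supported:

tcp: a plain TCP connection. If the connection succeeds, the backend is considered healthy.

ssl_hello: sends an initial SSL hello packet and receives the server's SSL hello packet.

http: sends an HTTP request and uses the status of the backend's reply to decide whether the backend is alive.

mysql: connects to the MySQL server and uses the server's greeting packet to decide whether the backend is alive.

ajp: sends an AJP Cping packet to the backend and uses the Cpong reply to decide whether the backend is alive.

port: the port to check on the backend server. It can differ from the port of the real service; for example, if the backend serves an application on port 443, you can check port 80 to judge its health. The default is 0, meaning the same port on which the backend server provides the real service. This option was introduced in Tengine-1.4.0.
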
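
The way rise, fall, and default_down interact can be sketched as a small state machine in Python. This is an illustrative model based on the parameter descriptions above, not the module's source code; the class name ActiveCheckState and the report method are hypothetical.

```python
class ActiveCheckState:
    """Illustrative rise/fall state machine for one backend server."""

    def __init__(self, rise=2, fall=5, default_down=True):
        self.rise = rise              # consecutive successes needed to go up
        self.fall = fall              # consecutive failures needed to go down
        self.up = not default_down    # initial state
        self.successes = 0
        self.failures = 0

    def report(self, ok: bool) -> bool:
        """Record one health check result and return the resulting state."""
        if ok:
            self.successes += 1
            self.failures = 0         # a success breaks the failure streak
            if not self.up and self.successes >= self.rise:
                self.up = True
        else:
            self.failures += 1
            self.successes = 0        # a failure breaks the success streak
            if self.up and self.failures >= self.fall:
                self.up = False
        return self.up


if __name__ == "__main__":
    # Mirrors "check interval=5000 rise=1 fall=3 timeout=4000" with default_down=true.
    state = ActiveCheckState(rise=1, fall=3, default_down=True)
    for result in [True, False, False, True, False, False, False]:
        print(result, "->", "up" if state.report(result) else "down")
```

Because the counters track consecutive results, a single successful probe between failures resets the fall count, so an occasionally flapping backend is not marked down immediately.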
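Syntax:  check_keepalive_requests request_num
Default: 1
Context: upstream

Sets the number of requests sent over one connection. The default of 1 means Tengine closes the connection after completing a single request.

Syntax:  check_http_send http_packet
Default: "GET / HTTP/1.0\r\n\r\n"
Context: upstream

Sets the request content sent by the HTTP health check. To reduce the amount of data transferred, the HEAD method is recommended.
When health checks use keep-alive connections, add a keep-alive request header to this directive, for example: "HEAD / HTTP/1.1\r\nConnection: keep-alive\r\n\r\n". Also, when the GET method is used, the size of the requested URI should be kept small enough that the transfer completes within one interval; otherwise the health check module treats it as a backend server or network failure.

Syntax:  check_http_expect_alive [ http_2xx | http_3xx | http_4xx | http_5xx ]
Default: http_2xx | http_3xx
Context: upstream

Specifies which HTTP response statuses count as success. By default, 2XX and 3XX statuses are considered healthy.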
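
As a rough illustration of what check_http_send and check_http_expect_alive express, the Python sketch below builds a probe request and classifies a response status. The helper names build_check_request, status_class, and is_alive are hypothetical, and the keep-alive packet simply mirrors the example string quoted above; the module itself does this work internally.

```python
def build_check_request(method="HEAD", uri="/", keepalive=False) -> bytes:
    """Build a probe in the same shape as the check_http_send examples."""
    if keepalive:
        # Mirrors the keep-alive example given for check_http_send.
        return f"{method} {uri} HTTP/1.1\r\nConnection: keep-alive\r\n\r\n".encode("ascii")
    return f"{method} {uri} HTTP/1.0\r\n\r\n".encode("ascii")


def status_class(code: int) -> str:
    """Map a status code to the class name used by check_http_expect_alive."""
    return f"http_{code // 100}xx"


def is_alive(code: int, expect=("http_2xx", "http_3xx")) -> bool:
    """Return True if the status code falls in one of the expected classes."""
    return status_class(code) in expect


if __name__ == "__main__":
    print(build_check_request(uri="/test.jsp"))
    for code in (200, 302, 404, 502):
        print(code, status_class(code), is_alive(code))
```

Adding http_4xx to the expected classes would, for instance, keep a backend marked as alive when the check URI itself is missing, which may or may not be what you want.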

An example:

# upstream is only valid in the http context, so it is declared outside server
upstream test {
    server 192.168.3.12:8080 weight=5 max_fails=3 fail_timeout=10s;
    server 192.168.3.13:8080 weight=5 max_fails=3 fail_timeout=10s;

    check interval=5000 rise=1 fall=3 timeout=4000 type=http default_down=false;
    check_http_send "HEAD /test.jsp HTTP/1.0\r\n\r\n";
    check_http_expect_alive http_2xx http_3xx;
}

server {
    listen 80;

    location / {
        proxy_set_header X-Real-IP        $remote_addr;
        proxy_set_header X-Forwarded-For  $proxy_add_x_forwarded_for;
        proxy_pass http://test;
        proxy_next_upstream error timeout http_500 http_502 http_503;
    }

    # Health status page for the backend nodes
    location /status {
        check_status;
        access_log off;
    }
}

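The example above uses both nginx's native health checking and the Taobao health check module. The Taobao module checks at millisecond-level intervals, supports a custom check URL and a status page, responds quickly, and is more sensitive than the native mechanism.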

