解Bug之路-Nginx 502 Bad Gateway

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"解Bug之路-Nginx 502 Bad Gateway"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"前言"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"事實證明,讀過Linux內核源碼確實有很大的好處,尤其在處理問題的時刻。當你看到報錯的那一瞬間,就能把現象/原因/以及解決方案一股腦的在腦中閃現。甚至一些邊邊角角的現象都能很快的反應過來是爲何。筆者讀過一些Linux TCP協議棧的源碼,就在解決下面這個問題的時候有一種非常流暢的感覺。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Bug現場"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"首先,這個問題其實並不難解決,但是這個問題引發的現象倒是挺有意思。先描述一下現象吧,"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"筆者要對自研的dubbo協議隧道網關進行壓測(這個網關的設計也挺有意思,準備放到後面的博客裏面)。先看下壓測的拓撲吧:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/28/28d1254f5c7ec005a39cc45f517db7cc.png","alt":"","title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲了壓測筆者gateway的單機性能,兩端僅僅各保留一臺網關,即gateway1和gateway2。壓到一定程度就開始報錯,導致壓測停止。很自然的就想到,網關扛不住了。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"網關的情況"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"去Gateway2的機器上看了一下,沒有任何報錯。而Gateway1則有大量的502報錯。502是Bad Gateway,Nginx的經典報錯,首先想到的就是Gateway2不堪重負被Nginx在Upstream中踢掉。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/7e/7e8b5a25d748807eda85935fc5f26304.png","alt":"","title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"那麼,就先看看Gateway2的負載情況把,查了下監控,發現Gateway2在4核8G的機器上只用了一個核,完全看不出來有瓶頸的樣子,難道是IO有問題?看了下小的可憐的網卡流量打消了這個猜想。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Nginx所在機器CPU利用率接近100%"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這時候,發現一個有意思的現象,Nginx確用滿了CPU!"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/8c/8c4616c1df8e5e0bb237ce5ae6b0e2f1.png","alt":"","title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"再次壓測,去Nginx所在機器上top了一下,發現Nginx的4個Worker分別佔了一個核把CPU喫滿-_-!"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/bc/bc30b730e722c646d768b740ab9ce56a.png","alt":"","title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"什麼,號稱性能強悍的Nginx竟然這麼弱,說好的事件驅動\\epoll邊沿觸發\\純C打造的呢?一定是用的姿勢不對!"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"去掉Nginx直接通信毫無壓力"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"既然猜測是Nginx的瓶頸,就把Nginx去掉吧。Gateway1和Gateway2直連,壓測TPS裏面就飆升了,而且Gateway2的CPU最多也就吃了2個核,毫無壓力。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/d2/d2ea751ede0b96ba2e6623390b880f06.png","alt":"","title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"去Nginx上看下日誌"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"由於Nginx機器權限並不在筆者手上,所以一開始沒有關注其日誌,現在就聯繫一下對應的運維去看一下吧。在accesslog裏面發現了大量的502報錯,確實是Nginx的。又看了下錯誤日誌,發現有大量的"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"Cannot assign requested address\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"由於筆者讀過TCP源碼,一瞬間就反應過來,是端口號耗盡了!由於Nginx upstream和後端Backend默認是短連接,所以在大量請求流量進來的時候回產生大量TIME_WAIT的連接。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/9d/9d9dcb625c7f5c6c3a8787f8be074a27.png","alt":"","title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"而這些TIME_WAIT是佔據端口號的,而且基本要1分鐘左右才能被Kernel回收。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/08/08c686a100e227f495a69f375ebd6aeb.png","alt":"","title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"cat /proc/sys/net/ipv4/ip_local_port_range\n32768 61000\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"也就是說,只要一分鐘之內產生28232(61000-32768)個TIME_WAIT的socket就會造成端口號耗盡,也即470.5TPS(28232/60),只是一個很容易達到的壓測值。事實上這個限制是Client端的,Server端沒有這樣的限制,因爲Server端口號只有一個8080這樣的有名端口號。而在"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"upstream中Nginx扮演的就是Client,而Gateway2就扮演的是Nginx"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/8e/8ec43a1080e17eb3d390a6f5da7b9c3e.png","alt":"","title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"爲什麼Nginx的CPU是100%"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"而筆者也很快想明白了Nginx爲什麼喫滿了機器的CPU,問題就出來端口號的搜索過程。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/30/30f6c6948f8e3f01664a0650ce3ff98f.png","alt":"","title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"讓我們看下最耗性能的一段函數:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"int __inet_hash_connect(...)\n{\n // 注意,這邊是static變量\n static u32 hint;\n // hint有助於不從0開始搜索,而是從下一個待分配的端口號搜索\n u32 offset = hint + port_offset;\n .....\n inet_get_local_port_range(&low, &high);\n // 這邊remaining就是61000 - 32768\n remaining = (high - low) + 1\n ......\n for (i = 1; i <= remaining; i++) {\n port = low + (i + offset) % remaining;\n /* port是否佔用check */\n ....\n goto ok;\n }\n .......\nok:\n hint += i;\n ......\n}\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"看上面那段代碼,如果一直沒有端口號可用的話,則需要循環remaining次才能宣告端口號耗盡,也就是28232次。而如果按照正常的情況,因爲有hint的存在,所以每次搜索從下一個待分配的端口號開始計算,以個位數的搜索就能找到端口號。如下圖所示:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/f4/f47a55ad2b9e8f1e8f09180378652c46.png","alt":"","title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"所以當端口號耗盡後,Nginx的Worker進程就沉浸在上述for循環中不可自拔,把CPU喫滿。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/78/78cd3c4aec6fc80c4648feb878eae716.png","alt":"","title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"爲什麼Gateway1調用Nginx沒有問題"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"很簡單,因爲筆者在Gateway1調用Nginx的時候設置了Keepalived,所以採用的是長連接,就沒有這個端口號耗盡的限制。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/5b/5bc9409799cce00d828b44e5660ff734.png","alt":"","title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Nginx 後面有多臺機器的話"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"由於是因爲端口號搜索導致CPU 100%,而且但凡有可用端口號,因爲hint的原因,搜索次數可能就是1和28232的區別。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/5d/5d4cc48785a8e63fe6d19775ad5b29c5.png","alt":"","title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"因爲端口號限制是針對某個特定的遠端server:port的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"所以,只要Nginx的Backend有多臺機器,甚至同一個機器上的多個不同端口號,只要不超過臨界點,Nginx就不會有任何壓力。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/31/319e4d9c7d0d724048be18c18e961111.png","alt":"","title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"把端口號範圍調大"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"比較無腦的方案當然是把端口號範圍調大,這樣就能抗更多的TIME_WAIT。同時將tcp_max_tw_bucket調小,tcp_max_tw_bucket是kernel中最多存在的TIME_WAIT數量,只要port範圍 - tcp_max_tw_bucket大於一定的值,那麼就始終有port端口可用,這樣就可以避免再次到調大臨界值得時候繼續擊穿臨界點。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"cat /proc/sys/net/ipv4/ip_local_port_range\n22768 61000\ncat /proc/sys/net/ipv4/tcp_max_tw_buckets\n20000\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"開啓tcp_tw_reuse"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這個問題Linux其實早就有了解決方案,那就是tcp_tw_reuse這個參數。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"echo '1' > /proc/sys/net/ipv4/tcp_tw_reuse\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"事實上TIME_WAIT過多的原因是其回收時間竟然需要1min,這個1min其實是TCP協議中規定的2MSL時間,而Linux中就固定爲1min。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"#define TCP_TIMEWAIT_LEN (60*HZ) /* how long to wait to destroy TIME-WAIT\n * state, about 60 seconds */\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2MSL的原因就是排除網絡上還殘留的包對新的同樣的五元組的Socket產生影響,也就是說在2MSL(1min)之內重用這個五元組會有風險。爲了解決這個問題,Linux就採取了一些列措施防止這樣的情況,使得在大部分情況下1s之內的TIME_WAIT就可以重用。下面這段代碼,就是檢測此TIME_WAIT是否重用。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"__inet_hash_connect\n |->__inet_check_established\nstatic int __inet_check_established(......)\n{\n ...... \n /* Check TIME-WAIT sockets first. */\n sk_nulls_for_each(sk2, node, &head->twchain) {\n tw = inet_twsk(sk2);\n // 如果在time_wait中找到一個match的port,就判斷是否可重用\n if (INET_TW_MATCH(sk2, net, hash, acookie,\n saddr, daddr, ports, dif)) {\n if (twsk_unique(sk, sk2, twp))\n goto unique;\n else\n goto not_unique;\n }\n }\n ......\n}\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"而其中的核心函數就是twsk_unique,它的判斷邏輯如下:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"int tcp_twsk_unique(......)\n{\n ......\n if (tcptw->tw_ts_recent_stamp &&\n (twp == NULL || (sysctl_tcp_tw_reuse &&\n get_seconds() - tcptw->tw_ts_recent_stamp > 1))) {\n // 對write_seq設置爲snd_nxt+65536+2\n // 這樣能夠確保在數據傳輸速率<=80Mbit/s的情況下不會被迴繞 \n tp->write_seq = tcptw->tw_snd_nxt + 65535 + 2\n ......\n return 1;\n }\n return 0; \n}\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上面這段代碼邏輯如下所示:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b9/b91b193895cde2552e38e1bdfd171cc1.png","alt":"","title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在開啓了tcp_timestamp以及tcp_tw_reuse的情況下,在Connect搜索port時只要比之前用這個port的TIME_WAIT狀態的Socket記錄的最近時間戳>1s,就可以重用此port,即將之前的1分鐘縮短到1s。同時爲了防止潛在的序列號衝突,直接將write_seq加上在65537,這樣,在單Socket傳輸速率小於80Mbit/s的情況下,不會造成序列號重疊(衝突)。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"同時這個tw_ts_recent_stamp設置的時機如下圖所示:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/7e/7e294762c8faf9731235fd5da92df590.png","alt":"","title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"所以如果Socket進入TIME_WAIT狀態後,如果一直有對應的包發過來,那麼會影響此TIME_WAIT對應的port是否可用的時間。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"開啓了這個參數之後,由於從1min縮短到1s,那麼Nginx單臺對單Upstream可承受的TPS就從原來的470.5TPS(28232/60)一躍提升爲28232TPS,增長了60倍。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果還嫌性能不夠,可以配上上面的端口號範圍調大以及tcp_max_tw_bucket調小繼續提升tps,不過tcp_max_tw_bucket調小可能會有序列號重疊的風險,畢竟Socket不經過2MSL階段就被重用了。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"不要開啓tcp_tw_recycle"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"開啓tcp_tw_recyle這個參數會在NAT環境下造成很大的影響,建議不開啓,具體見筆者的另一篇博客:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"https://my.oschina.net/alchemystar/blog/3119992\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Nginx upstream改成長連接"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"事實上,上面的一系列問題都是由於Nginx對Backend是短連接導致。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Nginx從 1.1.4 開始,實現了對後端機器的長連接支持功能。在Upstream中這樣配置可以開啓長連接的功能:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"upstream backend {\n server 127.0.0.1:8080;\n# It should be particularly noted that the keepalive directive does not limit the total number of connections to upstream servers that an nginx worker process can open. The connections parameter should be set to a number small enough to let upstream servers process new incoming connections as well.\n keepalive 32; \n keepalive_timeout 30s; # 設置後端連接的最大idle時間爲30s\n}\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這樣前端和後端都是長連接,大家又可以愉快的玩耍了。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/4d/4dd9e95990b7ab6a492d899ae42b256a.png","alt":"","title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"由此產生的風險點"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"由於對單個遠端ip:port耗盡會導致CPU喫滿這種現象。所以在Nginx在配置Upstream時候需要格外小心。假設一種情況,PE擴容了一臺Nginx,爲防止有問題,就先配一臺Backend看看情況,這時候如果量比較大的話擊穿臨界點就會造成大量報錯(而應用本身確毫無壓力,畢竟臨界值是470.5TPS(28232/60)),甚至在同Nginx上的非此域名的請求也會因爲CPU被耗盡而得不到響應。多配幾臺Backend/開啓tcp_tw_reuse或許是不錯的選擇。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"總結"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"應用再強大也還是承載在內核之上,始終逃不出Linux內核的樊籠。所以對於Linux內核本身參數的調優還是非常有意義的。如果讀過一些內核源碼,無疑對我們排查線上問題有着很大的助力,同時也能指導我們避過一些坑!"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"公衆號"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"關注筆者公衆號,獲取更多幹貨文章:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/04/046ecbf6ada994d1dade8690048697a9.jpeg","alt":"","title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章