PostgreSQL: socket read returns -1

Original article

2023-12-27 14:51

The relevant part of secure_read() is shown below:

    {
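        // PostgreSQL reads the socket in non-blocking mode, so a -1 return here is harmless:
        // the read is retried once the socket becomes readable (see the return-value summary below).
        n = secure_raw_read(port, ptr, len);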
        waitfor = WL_SOCKET_READABLE;
    }

    /* In blocking mode, wait until the socket is ready */
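    // The underlying read is non-blocking and errno == EAGAIN. After each SQL statement
    // is processed, the backend tries to read the next one, finds nothing to read, and
    // ends up here, where it waits (through the latch / wait-event set) for the client
    // socket to become readable.
    if (n < 0 && !port->noblock && (errno == EWOULDBLOCK || errno == EAGAIN))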
    {
        WaitEvent    event;

        Assert(waitfor);

        ModifyWaitEvent(FeBeWaitSet, 0, waitfor, NULL);

        WaitEventSetWait(FeBeWaitSet, -1 /* no timeout */ , &event, 1,
                         WAIT_EVENT_CLIENT_READ);

        /*
         * If the postmaster has died, it's not safe to continue running,
         * because it is the postmaster's job to kill us if some other backend
         * exits uncleanly.  Moreover, we won't run very well in this state;
         * helper processes like walwriter and the bgwriter will exit, so
         * performance may be poor.  Finally, if we don't exit, pg_ctl will be
         * unable to restart the postmaster without manual intervention, so no
         * new connections can be accepted.  Exiting clears the deck for a
         * postmaster restart.
         *
         * (Note that we only make this check when we would otherwise sleep on
         * our latch.  We might still continue running for a while if the
         * postmaster is killed in mid-query, or even through multiple queries
         * if we never have to wait for read.  We don't want to burn too many
         * cycles checking for this very rare condition, and this should cause
         * us to exit quickly in most cases.)
         */
        if (event.events & WL_POSTMASTER_DEATH)
            ereport(FATAL,
                    (errcode(ERRCODE_ADMIN_SHUTDOWN),
                     errmsg("terminating connection due to unexpected postmaster exit")));

        /* Handle interrupt. */
        if (event.events & WL_LATCH_SET)
        {
            ResetLatch(MyLatch);
            ProcessClientReadInterrupt(true);

            /*
             * We'll retry the read. Most likely it will return immediately
             * because there's still no data available, and we'll wait for the
             * socket to become ready again.
             */
        }
        goto retry;
    }

Under the hood, on a platform running a Linux 4.x kernel, waiting for the socket to become readable is implemented with epoll_wait.

    #0  0x00007fa94fd880bb in epoll_wait () from /lib64/libc.so.6
    #1  0x00000000007916de in WaitEventSetWaitBlock (nevents=1, occurred_events=0x7fffb042fd40, cur_timeout=-1, set=0x2bf7e58) at latch.c:1295
    #2  WaitEventSetWait (set=0x2bf7e58, timeout=timeout@entry=-1, occurred_events=occurred_events@entry=0x7fffb042fd40, nevents=nevents@entry=1, wait_event_info=wait_event_info@entry=100663296) at latch.c:1247
    #3  0x0000000000688233 in secure_read (port=0x2c5b970, ptr=0xdc7500 <PqRecvBuffer>, len=8192) at be-secure.c:184
    #4  0x000000000068eb7b in pq_recvbuf () at pqcomm.c:947
    #5  pq_recvbuf () at pqcomm.c:923
    #6  0x000000000068f985 in pq_getbyte () at pqcomm.c:990
    #7  0x00000000007b531e in SocketBackend (inBuf=0x7fffb042ff40) at postgres.c:357
    #8  ReadCommand (inBuf=0x7fffb042ff40) at postgres.c:530
    #9  PostgresMain (argc=<optimized out>, argv=argv@entry=0x2c64490, dbname=<optimized out>, username=<optimized out>) at postgres.c:4598
    #10 0x000000000073549d in BackendRun (port=0x2c5b970, port=0x2c5b970) at postmaster.c:5063

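On 3.x kernels I recall seeing poll rather than epoll. Note that latch.c picks the primitive when PostgreSQL is built (epoll where the platform provides it, poll as the fallback), so the difference probably comes from the PostgreSQL version or build rather than from the kernel alone.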
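To make the wait-then-retry pattern concrete, here is a minimal standalone sketch in C. It is not PostgreSQL code: read_when_ready is a hypothetical helper that calls epoll directly, whereas PostgreSQL goes through its WaitEventSet layer and also watches the latch and postmaster death. The sketch assumes Linux and an fd that is already in non-blocking mode.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a PostgreSQL function): read from a non-blocking
 * socket, sleeping in epoll_wait() whenever no data is available yet.
 * Returns the number of bytes read, 0 on EOF, or -1 on a real error.
 */
ssize_t read_when_ready(int fd, void *buf, size_t len)
{
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    struct epoll_event out;
    ssize_t n;
    int saved_errno;
    int epfd = epoll_create1(0);

    if (epfd < 0)
        return -1;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0)
    {
        saved_errno = errno;
        close(epfd);
        errno = saved_errno;
        return -1;
    }

    for (;;)
    {
        n = recv(fd, buf, len, 0);
        if (n >= 0)
            break;              /* got data, or EOF when n == 0 */
        if (errno == EINTR)
            continue;           /* interrupted by a signal: retry the read */
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            break;              /* real error, n is -1 */

        /* Nothing to read yet: sleep until the socket is readable (no timeout). */
        if (epoll_wait(epfd, &out, 1, -1) < 0 && errno != EINTR)
            break;              /* epoll_wait itself failed, n is still -1 */
    }

    saved_errno = errno;        /* close() must not clobber the caller's errno */
    close(epfd);
    errno = saved_errno;
    return n;
}
```

Creating the epoll instance on every call keeps the sketch short. A long-lived server would create it once and reuse it, which is what the excerpt above does with FeBeWaitSet.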

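Summary of socket read/write return values

The socket read and write functions read() and write() always return a value. Handling that value incorrectly can introduce bugs.

1. When read() or write() returns a value greater than 0, it is the number of bytes actually read from or written to the buffer.

2. When read() returns 0, the peer has closed the socket. Close your end as well, otherwise the socket leaks. You can check with netstat: sockets stuck in the CLOSE_WAIT state are leaked sockets.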

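When write() returns 0, nothing was written. On a socket this normally happens only when zero bytes were requested; a full send buffer on a non-blocking socket is reported as -1 with errno == EAGAIN or EWOULDBLOCK instead. Either way this is a normal condition: just try the write again later.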

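3. When read() or write() returns -1, you generally need to check errno.

If errno == EINTR, the call was interrupted by a signal. Ignore the error and retry.

If errno == EAGAIN or EWOULDBLOCK, a non-blocking socket can simply ignore it and try again later. On a blocking socket it usually means the read or write timed out before completing. The timeout here is the one set by the socket options SO_RCVTIMEO and SO_SNDTIMEO, so do not set these too small on a blocking socket. Otherwise, when -1 comes back, you cannot tell whether the connection is really broken or the network merely hiccuped. In general, when a blocking socket returns -1, close it and reconnect.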
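Items 1 to 3 can be folded into one small classification routine. The following is a minimal sketch, not code from PostgreSQL: the names io_status, set_nonblocking and read_once are hypothetical, and it assumes a non-blocking descriptor and a request of more than zero bytes.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes for a single read attempt. */
typedef enum { IO_DATA, IO_EOF, IO_RETRY, IO_ERROR } io_status;

/* Hypothetical helper: switch fd to non-blocking mode. Returns 0 on success. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * Hypothetical helper: perform one read() and classify the outcome.
 * *nread receives the raw return value of read().
 */
io_status read_once(int fd, void *buf, size_t len, ssize_t *nread)
{
    ssize_t n = read(fd, buf, len);

    *nread = n;
    if (n > 0)
        return IO_DATA;         /* item 1: n bytes were read */
    if (n == 0)
        return IO_EOF;          /* item 2: peer closed, caller must close(fd) */
    if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK)
        return IO_RETRY;        /* item 3: not an error on a non-blocking fd */
    return IO_ERROR;            /* anything else: close and reconnect */
}
```

A caller would typically loop on IO_RETRY only after waiting for readability, rather than spinning. The same classification applies to write(), with the difference that a zero return does not mean the peer closed.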

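4. In addition, a non-blocking connect() may return -1. Check errno: EINPROGRESS means the connection attempt is still in progress, and anything else means the connection failed and you should close the socket and reconnect. Later, when select reports the socket as writable, call getsockopt(c->fd, SOL_SOCKET, SO_ERROR, &err, &errlen) to find out whether the socket hit an error. If err is 0 the connect succeeded; otherwise close the socket and reconnect.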
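The sequence in item 4 can be sketched as follows. This is an illustrative helper under stated assumptions, not a definitive implementation: connect_nonblocking is a hypothetical name, it handles IPv4 only, it treats an interrupted select as a failure, and it relies on the fd being below FD_SETSIZE.

```c
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper: start a non-blocking TCP connect and wait with
 * select() until it completes. Returns the connected fd, or -1 on failure.
 */
int connect_nonblocking(const struct sockaddr_in *addr, int timeout_sec)
{
    struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };
    fd_set wfds;
    int err = 0;
    socklen_t errlen = sizeof(err);
    int flags;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        goto fail;

    if (connect(fd, (const struct sockaddr *) addr, sizeof(*addr)) == 0)
        return fd;              /* connected immediately */
    if (errno != EINPROGRESS)
        goto fail;              /* real connect error */

    /* Connection in progress: wait until the socket becomes writable. */
    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    if (select(fd + 1, NULL, &wfds, NULL, &tv) <= 0)
        goto fail;              /* timeout or select error */

    /* Writable does not imply success: ask the socket for the final result. */
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &errlen) < 0 || err != 0)
        goto fail;
    return fd;

fail:
    close(fd);
    return -1;
}
```

The SO_ERROR check is the step that is easy to forget: a refused connection also makes the socket writable, so writability alone says nothing about whether the connect succeeded.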

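5. epoll offers two modes, ET (edge-triggered) and LT (level-triggered). In ET mode you must keep calling read or write on the socket until it returns -1 (with errno == EAGAIN). That is fine for a non-blocking socket, but a blocking socket, as item 3 explains, returns -1 only when it times out, so the last call would simply block. Never use a blocking socket in ET mode. Why is LT mode fine, then? With LT you normally call read or write just once per event; if there is more to read or write, you handle it the next time the event fires. Because a readable or writable event has already been reported, that single read or write call can be expected to return without blocking.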
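The ET rule of reading until -1 looks like this in practice. The sketch below is Linux-only and uses hypothetical helper names (register_edge_triggered, drain_fd); it discards the data it reads, where real code would process each chunk.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: make fd non-blocking and register it edge-triggered. */
int register_edge_triggered(int epfd, int fd)
{
    struct epoll_event ev = { .events = EPOLLIN | EPOLLET, .data.fd = fd };
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/*
 * Hypothetical helper: after an EPOLLIN edge, read until EAGAIN.
 * Returns the total number of bytes consumed, or -1 on a real error.
 * *eof is set to 1 if the peer closed the connection.
 */
ssize_t drain_fd(int fd, int *eof)
{
    char chunk[4096];
    ssize_t total = 0;

    *eof = 0;
    for (;;)
    {
        ssize_t n = read(fd, chunk, sizeof(chunk));

        if (n > 0)
        {
            total += n;         /* real code would process chunk[0..n) here */
            continue;
        }
        if (n == 0)
        {
            *eof = 1;           /* peer closed: caller should close(fd) */
            return total;
        }
        if (errno == EINTR)
            continue;           /* interrupted by a signal: retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;       /* drained: safe to go back to epoll_wait */
        return -1;              /* real error */
    }
}
```

If the loop stopped after one read, any bytes left in the kernel buffer would produce no further ET notification until new data arrived, which is why the drain loop and the non-blocking flag go together.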

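Note: besides read/write, send/recv can also be used to read and write a socket. The difference is one extra argument for option flags. PostgreSQL uses send/recv, but none of the flags in that fourth argument are used. For the meaning of the flags, see https://www.cnblogs.com/haichun/p/3519232.html.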
