[APUE] File I/O

File operation APIs: open, read, write, lseek, close.

APIs for sharing files between processes: dup, dup2, fcntl, sync, fsync, ioctl.

File Operation APIs

open and openat

Function prototypes:

#include <fcntl.h>
int open(const char *path, int oflag, ... /* mode_t mode */ );
int openat(int fd, const char *path, int oflag, ... /* mode_t mode */ );
// Both return: file descriptor if OK, −1 on error

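oflag is formed by OR-ing together (with the | operator) one or more of the following constants:

Required, exactly one of: O_RDONLY, O_WRONLY, O_RDWR
Optional:
- O_APPEND: append to the end of the file on each write.
- O_CLOEXEC: set the FD_CLOEXEC file descriptor flag, so the descriptor is closed automatically on exec.
- O_CREAT: create the file if it does not exist. This option requires the mode argument, which specifies the access permissions of the new file.
- O_DIRECTORY: generate an error if path does not refer to a directory.
- O_EXCL: generate an error if O_CREAT is also specified and the file already exists. The existence check and the creation are performed as one atomic operation.
- O_NOCTTY: if path refers to a terminal device, do not allocate it as the controlling terminal of this process.
- O_NOFOLLOW: generate an error if path refers to a symbolic link.
- O_NONBLOCK: if path refers to a FIFO, a block special file, or a character special file, set nonblocking mode for both this open and subsequent I/O.
- O_SYNC: make each write wait for physical I/O to complete, including the I/O needed to update file attributes modified by the write.
- O_DSYNC: make each write wait for physical I/O to complete, but do not wait for file attributes to be updated if they do not affect the ability to read the data just written.
- O_RSYNC: make each read wait until any pending writes for the same portion of the file are complete.
- O_TRUNC: if the file exists and is opened for writing, truncate its length to 0.
- O_TTYINIT: when opening a terminal device that is not already open, set the nonstandard termios parameters to values that result in behavior conforming to the Single UNIX Specification.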

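The mode argument sets the permission bits of a newly created file. Its possible values and their meanings are shown in the figure below: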

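The fd argument of openat can be a file descriptor for an open directory. A relative path is then resolved relative to that directory. Passing the special value AT_FDCWD resolves it relative to the current working directory instead.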

int dir = open(".", O_RDONLY | O_DIRECTORY);
int fd = openat(dir, "test.c", O_RDONLY);

creat

Function prototype:

int creat(const char *path, mode_t mode);
// Returns: file descriptor opened for write-only if OK, −1 on error

creat is equivalent to: open(path, O_WRONLY | O_CREAT | O_TRUNC, mode);

mode specifies the file's access permissions, which are covered in a later chapter.

close

Function prototype:

int close(int fd);
// Returns: 0 if OK, −1 on error

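close closes the file and releases any record locks that the process holds on it.

When a process terminates, the kernel automatically closes all of its open files, so many programs do not close files explicitly.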

lseek

off_t lseek(int fd, off_t offset, int whence);
// Returns: new file offset if OK, −1 on error

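How the offset argument is interpreted depends on whence:

SEEK_SET: the file's offset is set to offset bytes from the beginning of the file.
SEEK_CUR: the file's offset is set to its current value plus offset; offset can be positive or negative.
SEEK_END: the file's offset is set to the size of the file plus offset; offset can be positive or negative.

On success, lseek returns the new file offset; otherwise it returns -1. If fd refers to a FIFO, pipe or socket, lseek returns -1 and sets errno to ESPIPE (Illegal seek). lseek performs no I/O. It only records the current offset in the kernel, where the next read or write uses it.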

Example 1

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    if (lseek(STDIN_FILENO, 0, SEEK_CUR) != -1) puts("can seek");
    else puts("can not seek");
    return 0;
}

Output:

$ ./a.out < /etc/passwd
can seek
$ cat /etc/passwd | ./a.out 
can not seek

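The < symbol redirects standard input from /etc/passwd, so stdin is a regular file and lseek succeeds. With the pipe (|), stdin is a pipe, so lseek fails.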

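Normally a file's current offset must be non-negative, but some devices (in Linux everything is a file) may allow negative offsets. The offset can also be set beyond the current end of the file. In that case the next write extends the file and creates a "hole" in it: bytes that were never written read back as 0. A hole does not necessarily occupy disk space; that depends on the file system implementation.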

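Example 2: a file with a hole

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

#define FILE_MODE (S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH) /* rw-r--r--, as in apue.h */
static void err_sys(const char *msg) { perror(msg); exit(1); } /* simplified stand-in for APUE's err_sys */

char buf1[] = "abcdefghij";
char buf2[] = "ABCDEFGHIJ";
int main(void)
{
    int fd = -1;
    if ((fd = creat("file.hole", FILE_MODE)) == -1) err_sys("creat error");
    if (write(fd, buf1, 10) != 10)                  err_sys("write error");
    if (lseek(fd, 16384, SEEK_SET) == -1)           err_sys("lseek error");
    if (write(fd, buf2, 10) != 10)                  err_sys("write2 error");
    // now offset is at 16394
    close(fd);
    return 0;
}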

Output:

$ ll file.hole 
-rw-r--r-- 1 sinkinben sinkinben 16394 1月  20 15:20 file.hole
$ od -c file.hole 
0000000   a   b   c   d   e   f   g   h   i   j  \0  \0  \0  \0  \0  \0
0000020  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
0040000   A   B   C   D   E   F   G   H   I   J
0040012

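od -c prints the file contents as characters, and the offsets in the first column are in octal. ABCDEFGHIJ starts at 16384 = 040000 (octal), and the file ends at 16394 = 040012 (octal). The * line means identical lines were omitted.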

Now create a file file.nohole with the same size but without a hole, and compare the two:

$ ls -sl file.*
 8 -rw-r--r-- 1 sinkinben sinkinben 16394 1月  20 15:31 file.hole
20 -rw-r--r-- 1 sinkinben sinkinben 16394 1月  20 15:31 file.nohole

file.nohole occupies 20 blocks on disk, while file.hole occupies only 8, because no blocks are allocated for the hole.

read

Function prototype:

ssize_t read(int fd, void *buf, size_t nbytes);
// Returns: number of bytes read, 0 if end of file, −1 on error

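The return value (the number of bytes actually read) can be smaller than the requested nbytes in several cases:

Reading a regular file: the end of file is reached first. For example, if 30 bytes remain before the end of file and 100 bytes are requested, read returns 30 and the next read returns 0.
Reading from a terminal device: normally at most one line is read at a time.
Reading from a network socket: buffering within the network may cause read to return less than requested.
Reading from a pipe or FIFO: if the pipe contains fewer bytes than requested, read returns only what is available.
Reading is interrupted by a signal after some data has already been read.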

Reads usually benefit from read-ahead. The kernel reads more data than requested into its cache, so later reads can be satisfied without going to the disk.

write

Function prototype:

ssize_t write(int fd, const void *buf, size_t nbytes);
// Returns: number of bytes written if OK, −1 on error

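write behaves much like read. For a regular file, the return value is usually equal to nbytes; otherwise an error has occurred. A common cause of a write error is a full disk or exceeding the file size limit of a process. On pipes, sockets and terminals, or when a signal interrupts the call, write can also return a short count.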

File Sharing

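In Unix, the kernel maintains a file descriptor table for each process (the Process Table Entry in the figure below). The figure below shows the kernel data structures involved when a process opens a file.

Each fd of a process has a file pointer (File Pointer) that points to a File Table Entry. The entry holds the status of the open file (the file status flags and the current file offset) and a v-node pointer. The v-node contains the file type and pointers to the functions that operate on the file. It also contains a pointer to the file's i-node.

As shown in the figure below, if two processes open the same file, the v-node pointers in their two File Table Entries point to the same v-node. It follows that when different processes open the same file, each process has its own file offset and its own file status flags (File Status Flags).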

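This model explains the behavior of the I/O operations described above:

After each write completes, the offset in the File Table Entry is incremented by the number of bytes written. If the offset now exceeds the current file size recorded in the i-node, the current file size is set to the offset, which extends the file.
If a file is opened with O_APPEND, this flag is recorded in the file status flags of the File Table Entry. Before each write, the current file offset is first set to the current file size from the i-node, so every write appends to the end of the file.
If lseek positions to the end of the file (SEEK_END), the offset is set to the current file size from the i-node.
lseek only modifies the offset in the File Table Entry; no I/O takes place.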

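After fork, the child receives a copy of the parent's file descriptor table. The parent's and child's descriptors then point to the same File Table Entry and share one file offset, so several File Pointers can point to the same File Table Entry. Similarly, dup makes two different fds in the same process point to the same File Table Entry.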

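Atomic Operations

Suppose several processes write to the same log file. If each one positions to the end with lseek and then calls write, another process can write between these two calls. Both then write at the same old end-of-file offset, and the data written by process A can be overwritten by process B, because each process has its own file offset.

Therefore positioning and writing must be performed as one atomic operation (either all steps are done or none). Opening the file with O_APPEND achieves this, because the kernel moves the offset to the end of the file as part of each write.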

pread and pwrite

Function prototypes:

ssize_t pread(int fd, void *buf, size_t nbytes, off_t offset);
// Returns: number of bytes read, 0 if end of file, −1 on error
ssize_t pwrite(int fd, const void *buf, size_t nbytes, off_t offset); 
// Returns: number of bytes written if OK, −1 on error

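Purpose: pread reads nbytes bytes starting at offset bytes from the beginning of the file; pwrite writes nbytes bytes starting at that position.

Calling pread is equivalent to calling lseek followed by read, but pread is atomic, which means:

The positioning and the read cannot be interrupted by another operation in between.
The current file offset is not updated.

pwrite works the same way.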

dup and dup2

int dup(int fd);
int dup2(int fd, int fd2);
// Both return: new file descriptor if OK, −1 on error

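Purpose: duplicate fd into a new descriptor. If fd is not a valid descriptor, -1 is returned.

dup always returns the lowest-numbered available file descriptor. This is usually 3 or higher, because 0, 1 and 2 are normally already open as standard input, output and error.

With dup2, fd2 specifies the value of the new descriptor. If fd2 is already open, it is closed first:

If fd equals fd2, dup2 returns fd2 without closing it.
If fd is invalid, dup2 returns -1 and fd2 is not closed.
If fd is valid, fd is duplicated onto fd2 and fd2 is returned.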

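As shown in the figure below, after dup several file pointers point to the same File Table Entry, so the descriptors share the file offset and the file status flags.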

sync, fsync and fdatasync

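Modern computers cache disk data to improve I/O performance. Besides the read-ahead mechanism described in the read section, there is delayed write. When we write data to a file, the kernel normally copies it into one of its buffers first. It then queues the data for writing to disk at some later time.

In some situations the data on disk must be made consistent with the contents of the buffer cache. The sync, fsync and fdatasync functions exist for this purpose.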

Function prototypes:

int fsync(int fd); 
int fdatasync(int fd);
// Returns: 0 if OK, −1 on error
void sync(void);

What sync does:

The sync function simply queues all the modified block buffers for writing and returns; it does not wait for the disk writes to take place.
sync is normally called periodically (usually every 30 seconds) from a system daemon, often called update. The command sync also calls the sync function.

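fsync applies only to the single file referred to by fd, and it waits for the disk writes to complete before returning.

fdatasync is similar to fsync, but it affects only the data portions of the file. fsync also synchronously updates the file's attributes, such as permissions and modification time.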

fcntl and ioctl

Function prototypes:

int fcntl(int fd, int cmd, ... /* int arg */ );
// Returns: depends on cmd if OK (see following), −1 on error
int ioctl(int fd, int request, ...);
// Returns: −1 on error, something else if OK

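fcntl can change the properties of an already open file, such as its file descriptor flags and file status flags. ioctl is a catch-all for I/O operations that don't fit elsewhere. It is mostly used for devices, for example when implementing device drivers.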

Summary

Reading APUE is pretty boring; I kept wanting to fall asleep while going through it.
