對不起,學會這些 Linux 知識後,我有點飄

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Linux 簡介"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"UNIX 是一個交互式系統,用於同時處理多進程和多用戶同時在線。爲什麼要說 UNIX,那是因爲 Linux 是由 UNIX 發展而來的,UNIX 是由程序員設計,它的主要服務對象也是程序員。Linux 繼承了 UNIX 的設計目標。從智能手機到汽車,超級計算機和家用電器,從家用臺式機到企業服務器,Linux 操作系統無處不在。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"大多數程序員都喜歡讓系統儘量簡單,優雅並具有一致性。舉個例子,從最底層的角度來講,一個文件應該只是一個字節集合。爲了實現順序存取、隨機存取、按鍵存取、遠程存取只能是妨礙你的工作。相同的,如果命令"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"shell"},"content":[{"type":"text","text":"ls A*"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"意味着只列出以 A 爲開頭的所有文件,那麼命令"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"shell"},"content":[{"type":"text","text":"rm A*"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"應該會移除所有以 A 爲開頭的文件而不是隻刪除文件名是 "},{"type":"codeinline","content":[{"type":"text","text":"A*"}]},{"type":"text","text":" 的文件。這個特性也是"},{"type":"codeinline","content":[{"type":"text","text":"最小喫驚原則(principle of least surprise)"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":">最小喫驚原則一半常用於用戶界面和軟件設計。它的原型是:該功能或者特徵應該符合用戶的預期,不應該使用戶感到驚訝和震驚。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"一些有經驗的程序員通常希望系統具有較強的功能性和靈活性。設計 Linux 的一個基本目標是每個應用程序只做一件事情並把他做好。所以編譯器只負責編譯的工作,編譯器不會產生列表,因爲有其他應用比編譯器做的更好。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"很多人都不喜歡冗餘,爲什麼在 cp 就能描述清楚你想幹什麼時候還使用 copy?這完全是在浪費寶貴的 "},{"type":"codeinline","content":[{"type":"text","text":"hacking time"}]},{"type":"text","text":"。爲了從文件中提取所有包含字符串 "},{"type":"codeinline","content":[{"type":"text","text":"ard"}]},{"type":"text","text":" 的行,Linux 程序員應該輸入"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"shell"},"content":[{"type":"text","text":"grep ard f"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Linux 接口"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Linux 系統是一種金字塔模型的系統,如下所示"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/2e/2e2ce005662344ad0d4caf7a637cd2c5.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"應用程序發起系統調用把參數放在寄存器中(有時候放在棧中),併發出 "},{"type":"codeinline","content":[{"type":"text","text":"trap"}]},{"type":"text","text":" 系統陷入指令切換用戶態至內核態。因爲不能直接在 C 中編寫 trap 指令,因此 C 提供了一個庫,庫中的函數對應着系統調用。有些函數是使用匯編編寫的,但是能夠從 C 中調用。每個函數首先把參數放在合適的位置然後執行系統調用指令。因此如果你想要執行 read 系統調用的話,C 程序會調用 read 函數庫來執行。這裏順便提一下,是由 POSIX 指定的庫接口而不是系統調用接口。也就是說,POSIX 會告訴一個標準系統應該提供哪些庫過程,它們的參數是什麼,它們必須做什麼以及它們必須返回什麼結果。 "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"除了操作系統和系統調用庫外,Linux 操作系統還要提供一些標準程序,比如文本編輯器、編譯器、文件操作工具等。直接和用戶打交道的是上面這些應用程序。因此我們可以說 Linux 具有三種不同的接口:"},{"type":"text","marks":[{"type":"strong"}],"text":"系統調用接口、庫函數接口和應用程序接口"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Linux 中的 "},{"type":"codeinline","content":[{"type":"text","text":"GUI(Graphical User Interface)"}]},{"type":"text","text":" 和 UNIX 中的非常相似,這種 GUI 創建一個桌面環境,包括窗口、目標和文件夾、工具欄和文件拖拽功能。一個完整的 GUI 還包括窗口管理器以及各種應用程序。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/9f/9f5f567626b329e08dc9b769c178b08d.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Linux 上的 GUI 由 X 窗口支持,主要組成部分是 X 服務器、控制鍵盤、鼠標、顯示器等。當在 Linux 上使用圖形界面時,用戶可以通過鼠標點擊運行程序或者打開文件,通過拖拽將文件進行復制等。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Linux 組成部分"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"事實上,Linux 操作系統可以由下面這幾部分構成"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"引導程序(Bootloader)"}]},{"type":"text","text":":引導程序是管理計算機啓動過程的軟件,對於大多數用戶而言,只是彈出一個屏幕,但其實內部操作系統做了很多事情"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"內核(Kernel)"}]},{"type":"text","text":":內核是操作系統的核心,負責管理 CPU、內存和外圍設備等。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"初始化系統(Init System)"}]},{"type":"text","text":":這是一個引導用戶空間並負責控制守護程序的子系統。一旦從引導加載程序移交了初始引導,它就是用於管理引導過程的初始化系統。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"後臺進程(Daemon)"}]},{"type":"text","text":":後臺進程顧名思義就是在後臺運行的程序,比如打印、聲音、調度等,它們可以在引導過程中啓動,也可以在登錄桌面後啓動"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"圖形服務器(Graphical server)"}]},{"type":"text","text":":這是在監視器上顯示圖形的子系統。通常將其稱爲 X 服務器或 X。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"桌面環境(Desktop environment)"}]},{"type":"text","text":":這是用戶與之實際交互的部分,有很多桌面環境可供選擇,每個桌面環境都包含內置應用程序,比如文件管理器、Web 瀏覽器、遊戲等"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"應用程序(Applications)"}]},{"type":"text","text":":桌面環境不提供完整的應用程序,就像 Windows 和 macOS 一樣,Linux 提供了成千上萬個可以輕鬆找到並安裝的高質量軟件。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Shell"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"儘管 Linux 應用程序提供了 GUI ,但是大部分程序員仍偏好於使用"},{"type":"codeinline","content":[{"type":"text","text":"命令行(command-line interface)"}]},{"type":"text","text":",稱爲"},{"type":"codeinline","content":[{"type":"text","text":"shell"}]},{"type":"text","text":"。用戶通常在 GUI 中啓動一個 shell 窗口然後就在 shell 窗口下進行工作。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e1/e13e93b1bffec0549814bd7bc985a4da.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"shell 命令行使用速度快、功能更強大、而且易於擴展、並且不會帶來"},{"type":"codeinline","content":[{"type":"text","text":"肢體重複性勞損(RSI)"}]},{"type":"text","text":"。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"下面會介紹一些最簡單的 bash shell。當 shell 啓動時,它首先進行初始化,在屏幕上輸出一個 "},{"type":"codeinline","content":[{"type":"text","text":"提示符(prompt)"}]},{"type":"text","text":",通常是一個百分號或者美元符號,等待用戶輸入"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/8d/8d3469b3919197393d9e26625de676c2.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"等用戶輸入一個命令後,shell 提取其中的第一個詞,這裏的詞指的是被空格或製表符分隔開的一連串字符。假定這個詞是將要運行程序的程序名,那麼就會搜索這個程序,如果找到了這個程序就會運行它。然後 shell 會將自己掛起直到程序運行完畢,之後再嘗試讀入下一條指令。shell 也是一個普通的用戶程序。它的主要功能就是讀取用戶的輸入和顯示計算的輸出。shell 命令中可以包含參數,它們作爲字符串傳遞給所調用的程序。比如"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"shell"},"content":[{"type":"text","text":"cp src dest"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"會調用 cp 應用程序幷包含兩個參數 "},{"type":"codeinline","content":[{"type":"text","text":"src"}]},{"type":"text","text":" 和 "},{"type":"codeinline","content":[{"type":"text","text":"dest"}]},{"type":"text","text":"。這個程序會解釋第一個參數是一個已經存在的文件名,然後創建一個該文件的副本,名稱爲 dest。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"並不是所有的參數都是文件名,比如下面"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"shell"},"content":[{"type":"text","text":"head -20 file"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"第一個參數 -20,會告訴 head 應用程序打印文件的前 20 行,而不是默認的 10 行。控制命令操作或者指定可選值的參數稱爲"},{"type":"codeinline","content":[{"type":"text","text":"標誌(flag)"}]},{"type":"text","text":",按照慣例標誌應該使用 "},{"type":"codeinline","content":[{"type":"text","text":"-"}]},{"type":"text","text":" 來表示。這個符號是必要的,比如"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"shell"},"content":[{"type":"text","text":"head 20 file"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是一個完全合法的命令,它會告訴 head 程序輸出文件名爲 20 的文件的前 10 行,然後輸出文件名爲 file 文件的前 10 行。Linux 操作系統可以接受一個或多個參數。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲了更容易的指定多個文件名,shell 支持 "},{"type":"codeinline","content":[{"type":"text","text":"魔法字符(magic character)"}]},{"type":"text","text":",也被稱爲"},{"type":"codeinline","content":[{"type":"text","text":"通配符(wild cards)"}]},{"type":"text","text":"。比如,"},{"type":"codeinline","content":[{"type":"text","text":"*"}]},{"type":"text","text":" 可以匹配一個或者多個可能的字符串"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"shell"},"content":[{"type":"text","text":"ls *.c"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"告訴 ls 列舉出所有文件名以 "},{"type":"codeinline","content":[{"type":"text","text":".c"}]},{"type":"text","text":" 結束的文件。如果同時存在多個文件,則會在後面進行並列。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"另一個通配符是問號,負責匹配任意一個字符。一組在中括號中的字符可以表示其中任意一個,因此 "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"shell"},"content":[{"type":"text","text":"ls [abc]*"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"會列舉出所有以 "},{"type":"codeinline","content":[{"type":"text","text":"a"}]},{"type":"text","text":"、"},{"type":"codeinline","content":[{"type":"text","text":"b"}]},{"type":"text","text":" 或者 "},{"type":"codeinline","content":[{"type":"text","text":"c"}]},{"type":"text","text":" 開頭的文件。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"shell 應用程序不一定通過終端進行輸入和輸出。shell 啓動時,就會獲取 "},{"type":"text","marks":[{"type":"strong"}],"text":"標準輸入、標準輸出、標準錯誤"},{"type":"text","text":"文件進行訪問的能力。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"標準輸出是從鍵盤輸入的,標準輸出或者標準錯誤是輸出到顯示器的。許多 Linux 程序默認是從標準輸入進行輸入並從標準輸出進行輸出。比如"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"shell"},"content":[{"type":"text","text":"sort\t"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"會調用 sort 程序,會從終端讀取數據(直到用戶輸入 ctrl-d 結束),根據字母順序進行排序,然後將結果輸出到屏幕上。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通常還可以重定向標準輸入和標準輸出,重定向標準輸入使用 "},{"type":"codeinline","content":[{"type":"text","text":""}]},{"type":"text","text":" 進行重定向。允許一個命令中重定向標準輸入和輸出。例如命令"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"shell"},"content":[{"type":"text","text":"sort out"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"會使 sort 從文件 in 中得到輸入,並把結果輸出到 out 文件中。由於標準錯誤沒有重定向,所以錯誤信息會直接打印到屏幕上。從標準輸入讀入,對其進行處理並將其寫入到標準輸出的程序稱爲 "},{"type":"codeinline","content":[{"type":"text","text":"過濾器"}]},{"type":"text","text":"。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"考慮下面由三個分開的命令組成的指令"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"shell"},"content":[{"type":"text","text":"sort temp;head -30 f00"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對任意以 "},{"type":"codeinline","content":[{"type":"text","text":".t"}]},{"type":"text","text":" 結尾的文件中包含 "},{"type":"codeinline","content":[{"type":"text","text":"cxuan"}]},{"type":"text","text":" 的行被寫到標準輸出中,然後進行排序。這些內容中的前 30 行被 head 出來並傳給 tail ,它又將最後 5 行傳遞給 foo。這個例子提供了一個管道將多個命令連接起來。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"可以把一系列 shell 命令放在一個文件中,然後將此文件作爲輸入來運行。shell 會按照順序對他們進行處理,就像在鍵盤上鍵入命令一樣。包含 shell 命令的文件被稱爲 "},{"type":"codeinline","content":[{"type":"text","text":"shell 腳本(shell scripts)"}]},{"type":"text","text":"。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":">推薦一個 shell 命令的學習網站:https://www.shellscript.sh/"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"shell 腳本其實也是一段程序,shell 腳本中可以對變量進行賦值,也包含循環控制語句比如 "},{"type":"text","marks":[{"type":"strong"}],"text":"if、for、while"},{"type":"text","text":" 等,shell 的設計目標是讓其看起來和 C 相似(There is no doubt that C is father)。由於 shell 也是一個用戶程序,所以用戶可以選擇不同的 shell。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Linux 應用程序"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Linux 的命令行也就是 shell,它由大量標準應用程序組成。這些應用程序主要有下面六種"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"文件和目錄操作命令"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"過濾器"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"文本程序"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"系統管理"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"程序開發工具,例如編輯器和編譯器"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"其他"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"除了這些標準應用程序外,還有其他應用程序比如 "},{"type":"text","marks":[{"type":"strong"}],"text":"Web 瀏覽器、多媒體播放器、圖片瀏覽器、辦公軟件和遊戲程序等"},{"type":"text","text":"。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們在上面的例子中已經見過了幾個 Linux 的應用程序,比如 sort、cp、ls、head,下面我們再來認識一下其他 Linux 的應用程序。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們先從幾個例子開始講起,比如"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"shell"},"content":[{"type":"text","text":"cp a b"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是將 a 複製一個副本爲 b ,而"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"shell"},"content":[{"type":"text","text":"mv a b"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"是將 a 移動到 b ,但是刪除原文件。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上面這兩個命令有一些區別,"},{"type":"codeinline","content":[{"type":"text","text":"cp"}]},{"type":"text","text":" 是將文件進行復制,複製完成後會有兩個文件 a 和 b;而 "},{"type":"codeinline","content":[{"type":"text","text":"mv"}]},{"type":"text","text":" 相當於是文件的移動,移動完成後就不再有 a 文件。"},{"type":"codeinline","content":[{"type":"text","text":"cat"}]},{"type":"text","text":" 命令可以把多個文件內容進行連接。使用 "},{"type":"codeinline","content":[{"type":"text","text":"rm"}]},{"type":"text","text":" 可以刪除文件;使用 "},{"type":"codeinline","content":[{"type":"text","text":"chmod"}]},{"type":"text","text":" 可以允許所有者改變訪問權限;文件目錄的的創建和刪除可以使用 "},{"type":"codeinline","content":[{"type":"text","text":"mkdir"}]},{"type":"text","text":" 和 "},{"type":"codeinline","content":[{"type":"text","text":"rmdir"}]},{"type":"text","text":" 命令;使用 "},{"type":"codeinline","content":[{"type":"text","text":"ls"}]},{"type":"text","text":" 可以查看目錄文件,ls 可以顯示很多屬性,比如大小、用戶、創建日期等;sort 決定文件的顯示順序"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Linux 應用程序還包括過濾器 grep,"},{"type":"codeinline","content":[{"type":"text","text":"grep"}]},{"type":"text","text":" 從標準輸入或者一個或多個輸入文件中提取特定模式的行;"},{"type":"codeinline","content":[{"type":"text","text":"sort"}]},{"type":"text","text":" 將輸入進行排序並輸出到標準輸出;"},{"type":"codeinline","content":[{"type":"text","text":"head"}]},{"type":"text","text":" 提取輸入的前幾行;tail 提取輸入的後面幾行;除此之外的過濾器還有 "},{"type":"codeinline","content":[{"type":"text","text":"cut"}]},{"type":"text","text":" 和 "},{"type":"codeinline","content":[{"type":"text","text":"paste"}]},{"type":"text","text":",允許對文本行的剪切和複製;"},{"type":"codeinline","content":[{"type":"text","text":"od"}]},{"type":"text","text":" 將輸入轉換爲 ASCII ;"},{"type":"codeinline","content":[{"type":"text","text":"tr"}]},{"type":"text","text":" 實現字符大小寫轉換;"},{"type":"codeinline","content":[{"type":"text","text":"pr"}]},{"type":"text","text":" 爲格式化打印輸出等。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"程序編譯工具使用 "},{"type":"codeinline","content":[{"type":"text","text":"gcc "}]},{"type":"text","text":";"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"make"}]},{"type":"text","text":" 命令用於自動編譯,這是一個很強大的命令,它用於維護一個大的程序,往往這類程序的源碼由許多文件構成。典型的,有一些是 "},{"type":"codeinline","content":[{"type":"text","text":"header files 頭文件"}]},{"type":"text","text":",源文件通常使用 "},{"type":"codeinline","content":[{"type":"text","text":"include"}]},{"type":"text","text":" 指令包含這些文件,make 的作用就是跟蹤哪些文件屬於頭文件,然後安排自動編譯的過程。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"下面列出了 POSIX 的標準應用程序"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| 程序 | 應用 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| ----- | ---------------------- |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| ls | 列出目錄 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| cp | 複製文件 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| head | 顯示文件的前幾行 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| make | 編譯文件生成二進制文件 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| cd | 切換目錄 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| mkdir | 創建目錄 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| chmod | 修改文件訪問權限 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| ps | 列出文件進程 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| pr | 格式化打印 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| rm | 刪除一個文件 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| rmdir | 刪除文件目錄 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| tail | 提取文件最後幾行 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| tr | 字符集轉換 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| grep | 分組 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| cat | 將多個文件連續標準輸出 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| od | 以八進制顯示文件 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| cut | 從文件中剪切 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| paste | 從文件中粘貼 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Linux 內核結構"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在上面我們看到了 Linux 的整體結構,下面我們從整體的角度來看一下 Linux 的內核結構"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e5/e50e0d8ce2c0f283505e89d42fe5a246.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"內核直接坐落在硬件上,內核的主要作用就是 I/O 交互、內存管理和控制 CPU 訪問。上圖中還包括了 "},{"type":"codeinline","content":[{"type":"text","text":"中斷"}]},{"type":"text","text":" 和 "},{"type":"codeinline","content":[{"type":"text","text":"調度器"}]},{"type":"text","text":",中斷是與設備交互的主要方式。中斷出現時調度器就會發揮作用。這裏的低級代碼停止正在運行的進程,將其狀態保存在內核進程結構中,並啓動驅動程序。進程調度也會發生在內核完成一些操作並且啓動用戶進程的時候。圖中的調度器是 dispatcher。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":">注意這裏的調度器是 "},{"type":"codeinline","content":[{"type":"text","text":"dispatcher"}]},{"type":"text","text":" 而不是 "},{"type":"codeinline","content":[{"type":"text","text":"scheduler"}]},{"type":"text","text":",這兩者是有區別的"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":">"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":">scheduler 和 dispatcher 都是和進程調度相關的概念,不同的是 scheduler 會從幾個進程中隨意選取一個進程;而 dispatcher 會給 scheduler 選擇的進程分配 CPU。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"然後,我們把內核系統分爲三部分。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"I/O 部分負責與設備進行交互以及執行網絡和存儲 I/O 操作的所有內核部分。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"從圖中可以看出 I/O 層次的關係,最高層是一個"},{"type":"codeinline","content":[{"type":"text","text":"虛擬文件系統"}]},{"type":"text","text":",也就是說不管文件是來自內存還是磁盤中,都是經過虛擬文件系統中的。從底層看,所有的驅動都是字符驅動或者塊設備驅動。二者的主要區別就是是否允許隨機訪問。網絡驅動設備並不是一種獨立的驅動設備,它實際上是一種字符設備,不過網絡設備的處理方式和字符設備不同。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上面的設備驅動程序中,每個設備類型的內核代碼都不同。字符設備有兩種使用方式,有"},{"type":"codeinline","content":[{"type":"text","text":"一鍵式"}]},{"type":"text","text":"的比如 vi 或者 emacs ,需要每一個鍵盤輸入。其他的比如 shell ,是需要輸入一行按回車鍵將字符串發送給程序進行編輯。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"網絡軟件通常是模塊化的,由不同的設備和協議來支持。大多數 Linux 系統在內核中包含一個完整的硬件路由器的功能,但是這個不能和外部路由器相比,路由器上面是"},{"type":"codeinline","content":[{"type":"text","text":"協議棧"}]},{"type":"text","text":",包括 TCP/IP 協議,協議棧上面是 socket 接口,socket 負責與外部進行通信,充當了門的作用。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"磁盤驅動上面是 I/O 調度器,它負責排序和分配磁盤讀寫操作,以儘可能減少磁頭的無用移動。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"I/O 右邊的是內存部件,程序被裝載進內存,由 CPU 執行,這裏會涉及到虛擬內存的部件,頁面的換入和換出是如何進行的,壞頁面的替換和經常使用的頁面會進行緩存。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"進程模塊負責進程的創建和終止、進程的調度、Linux 把進程和線程看作是可運行的實體,並使用統一的調度策略來進行調度。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在內核最頂層的是系統調用接口,所有的系統調用都是經過這裏,系統調用會觸發一個 trap,將系統從用戶態轉換爲內核態,然後將控制權移交給上面的內核部件。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Linux 進程和線程"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"下面我們就深入理解一下 Linux 內核來理解 Linux 的基本概念之進程和線程。系統調用是操作系統本身的接口,它對於創建進程和線程,內存分配,共享文件和 I/O 來說都很重要。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們將從各個版本的共性出發來進行探討。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"基本概念"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"每個進程都會運行一段獨立的程序,並且在初始化的時候擁有一個獨立的控制線程。換句話說,每個進程都會有一個自己的程序計數器,這個程序計數器用來記錄下一個需要被執行的指令。Linux 允許進程在運行時創建額外的線程。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/eb/eb6b0141659094bed27e75990b3321f2.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Linux 是一個多道程序設計系統,因此係統中存在彼此相互獨立的進程同時運行。此外,每個用戶都會同時有幾個活動的進程。因爲如果是一個大型系統,可能有數百上千的進程在同時運行。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在某些用戶空間中,即使用戶退出登錄,仍然會有一些後臺進程在運行,這些進程被稱爲 "},{"type":"codeinline","content":[{"type":"text","text":"守護進程(daemon)"}]},{"type":"text","text":"。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Linux 中有一種特殊的守護進程被稱爲 "},{"type":"codeinline","content":[{"type":"text","text":"計劃守護進程(Cron daemon)"}]},{"type":"text","text":" ,計劃守護進程可以每分鐘醒來一次檢查是否有工作要做,做完會繼續回到睡眠狀態等待下一次喚醒。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/69/690f186735649b5b8af2511524ccf465.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Cron 是一個守護程序,可以做任何你想做的事情,比如說你可以定期進行系統維護、定期進行系統備份等。在其他操作系統上也有類似的程序,比如 Mac OS X 上 Cron 守護程序被稱爲 "},{"type":"codeinline","content":[{"type":"text","text":"launchd"}]},{"type":"text","text":" 的守護進程。在 Windows 上可以被稱爲 "},{"type":"codeinline","content":[{"type":"text","text":"計劃任務(Task Scheduler)"}]},{"type":"text","text":"。"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在 Linux 系統中,進程通過非常簡單的方式來創建,"},{"type":"codeinline","content":[{"type":"text","text":"fork"}]},{"type":"text","text":" 系統調用會創建一個源進程的"},{"type":"codeinline","content":[{"type":"text","text":"拷貝(副本)"}]},{"type":"text","text":"。調用 fork 函數的進程被稱爲 "},{"type":"codeinline","content":[{"type":"text","text":"父進程(parent process)"}]},{"type":"text","text":",使用 fork 函數創建出來的進程被稱爲 "},{"type":"codeinline","content":[{"type":"text","text":"子進程(child process)"}]},{"type":"text","text":"。父進程和子進程都有自己的內存映像。如果在子進程創建出來後,父進程修改了一些變量等,那麼子進程是看不到這些變化的,也就是 fork 後,父進程和子進程相互獨立。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"雖然父進程和子進程保持相互獨立,但是它們卻能夠共享相同的文件,如果在 fork 之前,父進程已經打開了某個文件,那麼 fork 後,父進程和子進程仍然共享這個打開的文件。對共享文件的修改會對父進程和子進程同時可見。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"那麼該如何區分父進程和子進程呢?子進程只是父進程的拷貝,所以它們幾乎所有的情況都一樣,包括內存映像、變量、寄存器等。區分的關鍵在於 "},{"type":"codeinline","content":[{"type":"text","text":"fork "}]},{"type":"text","text":" 函數調用後的返回值,如果 fork 後返回一個非零值,這個非零值即是子進程的 "},{"type":"codeinline","content":[{"type":"text","text":"進程標識符(Process Identiier, PID)"}]},{"type":"text","text":",而會給子進程返回一個零值,可以用下面代碼來進行表示"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"c"},"content":[{"type":"text","text":"pid = fork(); // 調用 fork 函數創建進程\nif(pid < 0){\n error()\t\t\t\t // pid < 0,創建失敗\n}\nelse if(pid > 0){\n parent_handle() // 父進程代碼\n}\nelse {\n child_handle() // 子進程代碼\n}"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"父進程在 fork 後會得到子進程的 PID,這個 PID 即能代表這個子進程的唯一標識符也就是 PID。如果子進程想要知道自己的 PID,可以調用 "},{"type":"codeinline","content":[{"type":"text","text":"getpid"}]},{"type":"text","text":" 方法。當子進程結束運行時,父進程會得到子進程的 PID,因爲一個進程會 fork 很多子進程,子進程也會 fork 子進程,所以 PID 是非常重要的。我們把第一次調用 fork 後的進程稱爲 "},{"type":"codeinline","content":[{"type":"text","text":"原始進程"}]},{"type":"text","text":",一個原始進程可以生成一顆繼承樹"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/d6/d6c105eee74f73aed0685da80fb5646a.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Linux 進程間通信"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Linux 進程間的通信機制通常被稱爲 "},{"type":"codeinline","content":[{"type":"text","text":"Internel-Process communication,IPC "}]},{"type":"text","text":"下面我們來說一說 Linux 進程間通信的機制,大致來說,Linux 進程間的通信機制可以分爲 6 種"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/8a/8a15382b200baa1614c8a08f20b8bdcb.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"下面我們分別對其進行概述"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"信號 signal"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"信號是 UNIX 系統最先開始使用的進程間通信機制,因爲 Linux 是繼承於 UNIX 的,所以 Linux 也支持信號機制,通過向一個或多個進程發送"},{"type":"codeinline","content":[{"type":"text","text":"異步事件信號"}]},{"type":"text","text":"來實現,信號可以從鍵盤或者訪問不存在的位置等地方產生;信號通過 shell 將任務發送給子進程。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"你可以在 Linux 系統上輸入 "},{"type":"codeinline","content":[{"type":"text","text":"kill -l"}]},{"type":"text","text":" 來列出系統使用的信號,下面是我提供的一些信號"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/96/96c3aaeb43211a61200aef7f5a285a14.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"進程可以選擇忽略發送過來的信號,但是有兩個是不能忽略的:"},{"type":"codeinline","content":[{"type":"text","text":"SIGSTOP"}]},{"type":"text","text":" 和 "},{"type":"codeinline","content":[{"type":"text","text":"SIGKILL"}]},{"type":"text","text":" 信號。SIGSTOP 信號會通知當前正在運行的進程執行關閉操作,SIGKILL 信號會通知當前進程應該被殺死。除此之外,進程可以選擇它想要處理的信號,進程也可以選擇阻止信號,如果不阻止,可以選擇自行處理,也可以選擇進行內核處理。如果選擇交給內核進行處理,那麼就執行默認處理。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"操作系統會中斷目標程序的進程來向其發送信號、在任何非原子指令中,執行都可以中斷,如果進程已經註冊了新號處理程序,那麼就執行進程,如果沒有註冊,將採用默認處理的方式。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"例如:當進程收到 "},{"type":"codeinline","content":[{"type":"text","text":"SIGFPE"}]},{"type":"text","text":" 浮點異常的信號後,默認操作是對其進行 "},{"type":"codeinline","content":[{"type":"text","text":"dump(轉儲)"}]},{"type":"text","text":"和退出。信號沒有優先級的說法。如果同時爲某個進程產生了兩個信號,則可以將它們呈現給進程或者以任意的順序進行處理。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"下面我們就來看一下這些信號是幹什麼用的"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SIGABRT 和 SIGIOT"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SIGABRT 和 SIGIOT 信號發送給進程,告訴其進行終止,這個 信號通常在調用 C標準庫的"},{"type":"codeinline","content":[{"type":"text","text":"abort()"}]},{"type":"text","text":"函數時由進程本身啓動"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SIGALRM 、 SIGVTALRM、SIGPROF"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當設置的時鐘功能超時時會將 SIGALRM 、 SIGVTALRM、SIGPROF 發送給進程。當實際時間或時鐘時間超時時,發送 SIGALRM。 當進程使用的 CPU 時間超時時,將發送 SIGVTALRM。 當進程和系統代表進程使用的CPU 時間超時時,將發送 SIGPROF。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SIGBUS"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SIGBUS 將造成"},{"type":"codeinline","content":[{"type":"text","text":"總線中斷"}]},{"type":"text","text":"錯誤時發送給進程"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SIGCHLD"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當子進程終止、被中斷或者被中斷恢復,將 SIGCHLD 發送給進程。此信號的一種常見用法是指示操作系統在子進程終止後清除其使用的資源。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SIGCONT"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SIGCONT 信號指示操作系統繼續執行先前由 SIGSTOP 或 SIGTSTP 信號暫停的進程。該信號的一個重要用途是在 Unix shell 中的作業控制中。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SIGFPE"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SIGFPE 信號在執行錯誤的算術運算(例如除以零)時將被髮送到進程。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SIGUP"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當 SIGUP 信號控制的終端關閉時,會發送給進程。許多守護程序將重新加載其配置文件並重新打開其日誌文件,而不是在收到此信號時退出。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SIGILL"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SIGILL 信號在嘗試執行非法、格式錯誤、未知或者特權指令時發出"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SIGINT"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當用戶希望中斷進程時,操作系統會向進程發送 SIGINT 信號。用戶輸入 ctrl - c 就是希望中斷進程。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SIGKILL"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SIGKILL 信號發送到進程以使其馬上進行終止。 與 SIGTERM 和 SIGINT 相比,這個信號無法捕獲和忽略執行,並且進程在接收到此信號後無法執行任何清理操作,下面是一些例外情況"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"殭屍進程無法殺死,因爲殭屍進程已經死了,它在等待父進程對其進行捕獲"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"處於阻塞狀態的進程只有再次喚醒後纔會被 kill 掉"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"init"}]},{"type":"text","text":" 進程是 Linux 的初始化進程,這個進程會忽略任何信號。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SIGKILL 通常是作爲最後殺死進程的信號、它通常作用於 SIGTERM 沒有響應時發送給進程。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SIGPIPE"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SIGPIPE 嘗試寫入進程管道時發現管道未連接無法寫入時發送到進程"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SIGPOLL"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當在明確監視的文件描述符上發生事件時,將發送 SIGPOLL 信號。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SIGRTMIN 至 SIGRTMAX"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SIGRTMIN 至 SIGRTMAX 是"},{"type":"codeinline","content":[{"type":"text","text":"實時信號"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SIGQUIT"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當用戶請求退出進程並執行核心轉儲時,SIGQUIT 信號將由其控制終端發送給進程。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SIGSEGV"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當 SIGSEGV 信號做出無效的虛擬內存引用或分段錯誤時,即在執行分段違規時,將其發送到進程。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SIGSTOP"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SIGSTOP 指示操作系統終止以便以後進行恢復時"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SIGSYS"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當 SIGSYS 信號將錯誤參數傳遞給系統調用時,該信號將發送到進程。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SYSTERM"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們上面簡單提到過了 SYSTERM 這個名詞,這個信號發送給進程以請求終止。與 SIGKILL 信號不同,該信號可以被過程捕獲或忽略。這允許進程執行良好的終止,從而釋放資源並在適當時保存狀態。 SIGINT 與SIGTERM 幾乎相同。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SIGTSIP"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SIGTSTP 信號由其控制終端發送到進程,以請求終端停止。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SIGTTIN 和 SIGTTOU"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當 SIGTTIN 和SIGTTOU 信號分別在後臺嘗試從 tty 讀取或寫入時,信號將發送到該進程。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" SIGTRAP"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在發生異常或者 trap 時,將 SIGTRAP 信號發送到進程"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SIGURG"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當套接字具有可讀取的緊急或帶外數據時,將 SIGURG 信號發送到進程。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SIGUSR1 和 SIGUSR2"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SIGUSR1 和 SIGUSR2 信號被髮送到進程以指示用戶定義的條件。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SIGXCPU"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當 SIGXCPU 信號耗盡 CPU 的時間超過某個用戶可設置的預定值時,將其發送到進程"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SIGXFSZ"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當 SIGXFSZ 信號增長超過最大允許大小的文件時,該信號將發送到該進程。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SIGWINCH"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SIGWINCH 信號在其控制終端更改其大小(窗口更改)時發送給進程。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"管道 pipe"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Linux 系統中的進程可以通過建立管道 pipe 進行通信。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在兩個進程之間,可以建立一個通道,一個進程向這個通道里寫入字節流,另一個進程從這個管道中讀取字節流。管道是同步的,當進程嘗試從空管道讀取數據時,該進程會被阻塞,直到有可用數據爲止。shell 中的"},{"type":"codeinline","content":[{"type":"text","text":"管線 pipelines"}]},{"type":"text","text":" 就是用管道實現的,當 shell 發現輸出 "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"shell"},"content":[{"type":"text","text":"sort "},{"type":"codeinline","content":[{"type":"text","text":"進程位於內存"}]},{"type":"text","text":"被稱爲 "},{"type":"codeinline","content":[{"type":"text","text":"PIM(Process In Memory)"}]},{"type":"text","text":" ,這是馮諾伊曼體系架構的一種體現,加載到內存中並執行的程序稱爲進程。簡單來說,一個進程就是正在執行的程序。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"進程描述符可以歸爲下面這幾類"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"調度參數(scheduling parameters)"}]},{"type":"text","text":":進程優先級、最近消耗 CPU 的時間、最近睡眠時間一起決定了下一個需要運行的進程"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"內存映像(memory image)"}]},{"type":"text","text":":我們上面說到,進程映像是執行程序時所需要的可執行文件,它由數據和代碼組成。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"信號(signals)"}]},{"type":"text","text":":顯示哪些信號被捕獲、哪些信號被執行"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"寄存器"}]},{"type":"text","text":":當發生內核陷入 (trap) 時,寄存器的內容會被保存下來。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"系統調用狀態(system call state)"}]},{"type":"text","text":":當前系統調用的信息,包括參數和結果"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"文件描述符表(file descriptor table)"}]},{"type":"text","text":":有關文件描述符的系統被調用時,文件描述符作爲索引在文件描述符表中定位相關文件的 i-node 數據結構"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"統計數據(accounting)"}]},{"type":"text","text":":記錄用戶、進程佔用系統 CPU 時間表的指針,一些操作系統還保存進程最多佔用的 CPU 時間、進程擁有的最大堆棧空間、進程可以消耗的頁面數等。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"內核堆棧(kernel stack)"}]},{"type":"text","text":":進程的內核部分可以使用的固定堆棧"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"其他"}]},{"type":"text","text":": 當前進程狀態、事件等待時間、距離警報的超時時間、PID、父進程的 PID 以及用戶標識符等"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"有了上面這些信息,現在就很容易描述在 Linux 中是如何創建這些進程的了,創建新流程實際上非常簡單。"},{"type":"text","marks":[{"type":"strong"}],"text":"爲子進程開闢一塊新的用戶空間的進程描述符,然後從父進程複製大量的內容。爲這個子進程分配一個 PID,設置其內存映射,賦予它訪問父進程文件的權限,註冊並啓動"},{"type":"text","text":"。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當執行 fork 系統調用時,調用進程會陷入內核並創建一些和任務相關的數據結構,比如"},{"type":"codeinline","content":[{"type":"text","text":"內核堆棧(kernel stack)"}]},{"type":"text","text":" 和 "},{"type":"codeinline","content":[{"type":"text","text":"thread_info"}]},{"type":"text","text":" 結構。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":">關於 thread_info 結構可以參考 "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":">"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":">https://docs.huihoo.com/doxygen/linux/kernel/3.7/arch"},{"type":"text","marks":[{"type":"italic"}],"text":"2avr32"},{"type":"text","text":"2include"},{"type":"text","marks":[{"type":"italic"}],"text":"2asm"},{"type":"text","text":"2thread_"},{"type":"text","marks":[{"type":"italic"}],"text":"info"},{"type":"text","text":"8h_source.html"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這個結構中包含進程描述符,進程描述符位於固定的位置,使得 Linux 系統只需要很小的開銷就可以定位到一個運行中進程的數據結構。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"進程描述符的主要內容是根據"},{"type":"codeinline","content":[{"type":"text","text":"父進程"}]},{"type":"text","text":"的描述符來填充。Linux 操作系統會尋找一個可用的 PID,並且此 PID 沒有被任何進程使用,更新進程標示符使其指向一個新的數據結構即可。爲了減少 hash table 的碰撞,進程描述符會形成"},{"type":"codeinline","content":[{"type":"text","text":"鏈表"}]},{"type":"text","text":"。它還將 task_struct 的字段設置爲指向任務數組上相應的上一個/下一個進程。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":">task_struct : Linux 進程描述符,內部涉及到衆多 C++ 源碼,我們會在後面進行講解。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"從原則上來說,爲子進程開闢內存區域併爲子進程分配數據段、堆棧段,並且對父進程的內容進行復制,但是實際上 fork 完成後,子進程和父進程沒有共享內存,所以需要複製技術來實現同步,但是複製開銷比較大,因此 Linux 操作系統使用了一種 "},{"type":"codeinline","content":[{"type":"text","text":"欺騙"}]},{"type":"text","text":" 方式。即爲子進程分配頁表,然後新分配的頁表指向父進程的頁面,同時這些頁面是隻讀的。當進程向這些頁面進行寫入的時候,會開啓保護錯誤。內核發現寫入操作後,會爲進程分配一個副本,使得寫入時把數據複製到這個副本上,這個副本是共享的,這種方式稱爲 "},{"type":"codeinline","content":[{"type":"text","text":"寫入時複製(copy on write)"}]},{"type":"text","text":",這種方式避免了在同一塊內存區域維護兩個副本的必要,節省內存空間。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在子進程開始運行後,操作系統會調用 exec 系統調用,內核會進行查找驗證可執行文件,把參數和環境變量複製到內核,釋放舊的地址空間。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"現在新的地址空間需要被創建和填充。如果系統支持映射文件,就像 Unix 系統一樣,那麼新的頁表就會創建,表明內存中沒有任何頁,除非所使用的頁面是堆棧頁,其地址空間由磁盤上的可執行文件支持。新進程開始運行時,立刻會收到一個"},{"type":"codeinline","content":[{"type":"text","text":"缺頁異常(page fault)"}]},{"type":"text","text":",這會使具有代碼的頁面加載進入內存。最後,參數和環境變量被複制到新的堆棧中,重置信號,寄存器全部清零。新的命令開始運行。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"下面是一個示例,用戶輸出 ls,shell 會調用 fork 函數複製一個新進程,shell 進程會調用 exec 函數用可執行文件 ls 的內容覆蓋它的內存。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/aa/aa77b70011522baed485137a7da43b02.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"Linux 線程"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"現在我們來討論一下 Linux 中的線程,線程是輕量級的進程,想必這句話你已經聽過很多次了,"},{"type":"codeinline","content":[{"type":"text","text":"輕量級"}]},{"type":"text","text":"體現在所有的進程切換都需要清除所有的表、進程間的共享信息也比較麻煩,一般來說通過管道或者共享內存,如果是 fork 函數後的父子進程則使用共享文件,然而線程切換不需要像進程一樣具有昂貴的開銷,而且線程通信起來也更方便。線程分爲兩種:用戶級線程和內核級線程"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"用戶級線程"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"用戶級線程避免使用內核,通常,每個線程會顯示調用開關,發送信號或者執行某種切換操作來放棄 CPU,同樣,計時器可以強制進行開關,用戶線程的切換速度通常比內核線程快很多。在用戶級別實現線程會有一個問題,即單個線程可能會壟斷 CPU 時間片,導致其他線程無法執行從而 "},{"type":"codeinline","content":[{"type":"text","text":"餓死"}]},{"type":"text","text":"。如果執行一個 I/O 操作,那麼 I/O 會阻塞,其他線程也無法運行。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/72/72006eac12a08d633a8ff905d54f04a9.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"一種解決方案是,一些用戶級的線程包解決了這個問題。可以使用時鐘週期的監視器來控制第一時間時間片獨佔。然後,一些庫通過特殊的包裝來解決系統調用的 I/O 阻塞問題,或者可以爲非阻塞 I/O 編寫任務。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"內核級線程"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"內核級線程通常使用幾個進程表在內核中實現,每個任務都會對應一個進程表。在這種情況下,內核會在每個進程的時間片內調度每個線程。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/84/84ffaff7dd39aa4b4f09a05f9553dab9.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"所有能夠阻塞的調用都會通過系統調用的方式來實現,當一個線程阻塞時,內核可以進行選擇,是運行在同一個進程中的另一個線程(如果有就緒線程的話)還是運行一個另一個進程中的線程。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"從用戶空間 -> 內核空間 -> 用戶空間的開銷比較大,但是線程初始化的時間損耗可以忽略不計。這種實現的好處是由時鐘決定線程切換時間,因此不太可能將時間片與任務中的其他線程佔用時間綁定到一起。同樣,I/O 阻塞也不是問題。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"混合實現"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"結合用戶空間和內核空間的優點,設計人員採用了一種"},{"type":"codeinline","content":[{"type":"text","text":"內核級線程"}]},{"type":"text","text":"的方式,然後將用戶級線程與某些或者全部內核線程多路複用起來"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/9f/9f4a1bb5186811a9b79b1973b61f2fef.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在這種模型中,編程人員可以自由控制用戶線程和內核線程的數量,具有很大的靈活度。採用這種方法,內核只識別內核級線程,並對其進行調度。其中一些內核級線程會被多個用戶級線程多路複用。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Linux 調度"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"下面我們來關注一下 Linux 系統的調度算法,首先需要認識到,Linux 系統的線程是內核線程,所以 Linux 系統是基於線程的,而不是基於進程的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲了進行調度,Linux 系統將線程分爲三類"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"實時先入先出"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"實時輪詢"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"分時"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"實時先入先出線程具有最高優先級,它不會被其他線程所搶佔,除非那是一個剛剛準備好的,擁有更高優先級的線程進入。實時輪轉線程與實時先入先出線程基本相同,只是每個實時輪轉線程都有一個時間量,時間到了之後就可以被搶佔。如果多個實時線程準備完畢,那麼每個線程運行它時間量所規定的時間,然後插入到實時輪轉線程末尾。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":">注意這個實時只是相對的,無法做到絕對的實時,因爲線程的運行時間無法確定。它們相對分時系統來說,更加具有實時性"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Linux 系統會給每個線程分配一個 "},{"type":"codeinline","content":[{"type":"text","text":"nice"}]},{"type":"text","text":" 值,這個值代表了優先級的概念。nice 值默認值是 0 ,但是可以通過系統調用 nice 值來修改。修改值的範圍從 -20 - +19。nice 值決定了線程的靜態優先級。一般系統管理員的 nice 值會比一般線程的優先級高,它的範圍是 -20 - -1。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"下面我們更詳細的討論一下 Linux 系統的兩個調度算法,它們的內部與"},{"type":"codeinline","content":[{"type":"text","text":"調度隊列(runqueue)"}]},{"type":"text","text":" 的設計很相似。運行隊列有一個數據結構用來監視系統中所有可運行的任務並選擇下一個可以運行的任務。每個運行隊列和系統中的每個 CPU 有關。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"Linux O(1)"}]},{"type":"text","text":" 調度器是歷史上很流行的一個調度器。這個名字的由來是因爲它能夠在常數時間內執行任務調度。在 O(1) 調度器裏,調度隊列被組織成兩個數組,一個是任務"},{"type":"text","marks":[{"type":"strong"}],"text":"正在活動"},{"type":"text","text":"的數組,一個是任務"},{"type":"text","marks":[{"type":"strong"}],"text":"過期失效"},{"type":"text","text":"的數組。如下圖所示,每個數組都包含了 140 個鏈表頭,每個鏈表頭具有不同的優先級。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ae/aec2aa1d7eb7988ff8c8158ba5df173b.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"大致流程如下:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"調度器從正在活動數組中選擇一個優先級最高的任務。如果這個任務的時間片過期失效了,就把它移動到過期失效數組中。如果這個任務阻塞了,比如說正在等待 I/O 事件,那麼在它的時間片過期失效之前,一旦 I/O 操作完成,那麼這個任務將會繼續運行,它將被放回到之前正在活動的數組中,因爲這個任務之前已經消耗一部分 CPU 時間片,所以它將運行剩下的時間片。當這個任務運行完它的時間片後,它就會被放到過期失效數組中。一旦正在活動的任務數組中沒有其他任務後,調度器將會交換指針,使得正在活動的數組變爲過期失效數組,過期失效數組變爲正在活動的數組。使用這種方式可以保證每個優先級的任務都能夠得到執行,不會導致線程飢餓。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在這種調度方式中,不同優先級的任務所得到 CPU 分配的時間片也是不同的,高優先級進程往往能得到較長的時間片,低優先級的任務得到較少的時間片。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這種方式爲了保證能夠更好的提供服務,通常會爲 "},{"type":"codeinline","content":[{"type":"text","text":"交互式進程"}]},{"type":"text","text":" 賦予較高的優先級,交互式進程就是"},{"type":"codeinline","content":[{"type":"text","text":"用戶進程"}]},{"type":"text","text":"。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Linux 系統不知道一個任務究竟是 I/O 密集型的還是 CPU 密集型的,它只是依賴於交互式的方式,Linux 系統會區分是"},{"type":"codeinline","content":[{"type":"text","text":"靜態優先級"}]},{"type":"text","text":" 還是 "},{"type":"codeinline","content":[{"type":"text","text":"動態優先級"}]},{"type":"text","text":"。動態優先級是採用一種獎勵機制來實現的。獎勵機制有兩種方式:"},{"type":"text","marks":[{"type":"strong"}],"text":"獎勵交互式線程、懲罰佔用 CPU 的線程"},{"type":"text","text":"。在 Linux O(1) 調度器中,最高的優先級獎勵是 -5,注意這個優先級越低越容易被線程調度器接受,所以最高懲罰的優先級是 +5。具體體現就是操作系統維護一個名爲 "},{"type":"codeinline","content":[{"type":"text","text":"sleep_avg"}]},{"type":"text","text":" 的變量,任務喚醒會增加 sleep_avg 變量的值,當任務被搶佔或者時間量過期會減少這個變量的值,反映在獎勵機制上。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"O(1) 調度算法是 2.6 內核版本的調度器,最初引入這個調度算法的是不穩定的 2.5 版本。早期的調度算法在多處理器環境中說明了通過訪問正在活動數組就可以做出調度的決定。使調度可以在固定的時間 O(1) 完成。"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"O(1) 調度器使用了一種 "},{"type":"codeinline","content":[{"type":"text","text":"啓發式"}]},{"type":"text","text":" 的方式,這是什麼意思?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":">在計算機科學中,啓發式是一種當傳統方式解決問題很慢時用來快速解決問題的方式,或者找到一個在傳統方法無法找到任何精確解的情況下找到近似解。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"O(1) 使用啓發式的這種方式,會使任務的優先級變得複雜並且不完善,從而導致在處理交互任務時性能很糟糕。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲了改進這個缺點,O(1) 調度器的開發者又提出了一個新的方案,即 "},{"type":"codeinline","content":[{"type":"text","text":"公平調度器(Completely Fair Scheduler, CFS)"}]},{"type":"text","text":"。 CFS 的主要思想是使用一顆"},{"type":"codeinline","content":[{"type":"text","text":"紅黑樹"}]},{"type":"text","text":"作爲調度隊列。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":">數據結構太重要了。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"CFS 會根據任務在 CPU 上的運行時間長短而將其有序地排列在樹中,時間精確到納秒級。下面是 CFS 的構造模型"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ed/ed8cf942f744bc4cbfc7ecaff2e6bc2e.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"CFS 的調度過程如下:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"CFS 算法總是優先調度哪些使用 CPU 時間最少的任務。最小的任務一般都是在最左邊的位置。當有一個新的任務需要運行時,CFS 會把這個任務和最左邊的數值進行對比,如果此任務具有最小時間值,那麼它將進行運行,否則它會進行比較,找到合適的位置進行插入。然後 CPU 運行紅黑樹上當前比較的最左邊的任務。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在紅黑樹中選擇一個節點來運行的時間可以是常數時間,但是插入一個任務的時間是 "},{"type":"codeinline","content":[{"type":"text","text":"O(loog(N))"}]},{"type":"text","text":",其中 N 是系統中的任務數。考慮到當前系統的負載水平,這是可以接受的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"調度器只需要考慮可運行的任務即可。這些任務被放在適當的調度隊列中。不可運行的任務和正在等待的各種 I/O 操作或內核事件的任務被放入一個"},{"type":"codeinline","content":[{"type":"text","text":"等待隊列"}]},{"type":"text","text":"中。等待隊列頭包含一個指向任務鏈表的指針和一個自旋鎖。自旋鎖對於併發處理場景下用處很大。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"Linux 系統中的同步"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"下面來聊一下 Linux 中的同步機制。早期的 Linux 內核只有一個 "},{"type":"codeinline","content":[{"type":"text","text":"大內核鎖(Big Kernel Lock,BKL)"}]},{"type":"text","text":" 。它阻止了不同處理器併發處理的能力。因此,需要引入一些粒度更細的鎖機制。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Linux 提供了若干不同類型的同步變量,這些變量既能夠在內核中使用,也能夠在用戶應用程序中使用。在地層中,Linux 通過使用 "},{"type":"codeinline","content":[{"type":"text","text":"atomic_set"}]},{"type":"text","text":" 和 "},{"type":"codeinline","content":[{"type":"text","text":"atomic_read"}]},{"type":"text","text":" 這樣的操作爲硬件支持的原子指令提供封裝。硬件提供內存重排序,這是 Linux 屏障的機制。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"具有高級別的同步像是自旋鎖的描述是這樣的,當兩個進程同時對資源進行訪問,在一個進程獲得資源後,另一個進程不想被阻塞,所以它就會自旋,等待一會兒再對資源進行訪問。Linux 也提供互斥量或信號量這樣的機制,也支持像是 "},{"type":"codeinline","content":[{"type":"text","text":"mutex_tryLock"}]},{"type":"text","text":" 和 "},{"type":"codeinline","content":[{"type":"text","text":"mutex_tryWait"}]},{"type":"text","text":" 這樣的非阻塞調用。也支持中斷處理事務,也可以通過動態禁用和啓用相應的中斷來實現。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Linux 啓動"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"下面來聊一聊 Linux 是如何啓動的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當計算機電源通電後,"},{"type":"codeinline","content":[{"type":"text","text":"BIOS"}]},{"type":"text","text":"會進行"},{"type":"codeinline","content":[{"type":"text","text":"開機自檢(Power-On-Self-Test, POST)"}]},{"type":"text","text":",對硬件進行檢測和初始化。因爲操作系統的啓動會使用到磁盤、屏幕、鍵盤、鼠標等設備。下一步,磁盤中的第一個分區,也被稱爲 "},{"type":"codeinline","content":[{"type":"text","text":"MBR(Master Boot Record)"}]},{"type":"text","text":" 主引導記錄,被讀入到一個固定的內存區域並執行。這個分區中有一個非常小的,只有 512 字節的程序。程序從磁盤中調入 boot 獨立程序,boot 程序將自身複製到高位地址的內存從而爲操作系統釋放低位地址的內存。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"複製完成後,boot 程序讀取啓動設備的根目錄。boot 程序要理解文件系統和目錄格式。然後 boot 程序被調入內核,把控制權移交給內核。直到這裏,boot 完成了它的工作。系統內核開始運行。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"內核啓動代碼是使用"},{"type":"codeinline","content":[{"type":"text","text":"彙編語言"}]},{"type":"text","text":"完成的,主要包括創建內核堆棧、識別 CPU 類型、計算內存、禁用中斷、啓動內存管理單元等,然後調用 C 語言的 main 函數執行操作系統部分。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這部分也會做很多事情,首先會分配一個消息緩衝區來存放調試出現的問題,調試信息會寫入緩衝區。如果調試出現錯誤,這些信息可以通過診斷程序調出來。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"然後操作系統會進行自動配置,檢測設備,加載配置文件,被檢測設備如果做出響應,就會被添加到已鏈接的設備表中,如果沒有相應,就歸爲未連接直接忽略。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"配置完所有硬件後,接下來要做的就是仔細手工處理進程0,設置其堆棧,然後運行它,執行初始化、配置時鐘、掛載文件系統。創建 "},{"type":"codeinline","content":[{"type":"text","text":"init 進程(進程 1 )"}]},{"type":"text","text":" 和 "},{"type":"codeinline","content":[{"type":"text","text":"守護進程(進程 2)"}]},{"type":"text","text":"。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"init 進程會檢測它的標誌以確定它是否爲單用戶還是多用戶服務。在前一種情況中,它會調用 fork 函數創建一個 shell 進程,並且等待這個進程結束。後一種情況調用 fork 函數創建一個運行系統初始化的 shell 腳本(即 /etc/rc)的進程,這個進程可以進行文件系統一致性檢測、掛載文件系統、開啓守護進程等。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"然後 /etc/rc 這個進程會從 /etc/ttys 中讀取數據,/etc/ttys 列出了所有的終端和屬性。對於每一個啓用的終端,這個進程調用 fork 函數創建一個自身的副本,進行內部處理並運行一個名爲 "},{"type":"codeinline","content":[{"type":"text","text":"getty"}]},{"type":"text","text":" 的程序。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"getty 程序會在終端上輸入 "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"shell"},"content":[{"type":"text","text":"login:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"等待用戶輸入用戶名,在輸入用戶名後,getty 程序結束,登陸程序 "},{"type":"codeinline","content":[{"type":"text","text":"/bin/login"}]},{"type":"text","text":" 開始運行。login 程序需要輸入密碼,並與保存在 "},{"type":"codeinline","content":[{"type":"text","text":"/etc/passwd"}]},{"type":"text","text":" 中的密碼進行對比,如果輸入正確,login 程序以用戶 shell 程序替換自身,等待第一個命令。如果不正確,login 程序要求輸入另一個用戶名。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"整個系統啓動過程如下"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e6/e6a9e119d51380df7835f01520c7db70.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Linux 內存管理"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Linux 內存管理模型非常直接明瞭,因爲 Linux 的這種機制使其具有可移植性並且能夠在內存管理單元相差不大的機器下實現 Linux,下面我們就來認識一下 Linux 內存管理是如何實現的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"基本概念"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"每個 Linux 進程都會有地址空間,這些地址空間由三個段區域組成:"},{"type":"text","marks":[{"type":"strong"}],"text":"text 段、data 段、stack 段"},{"type":"text","text":"。下面是進程地址空間的示例。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a6/a640a8683e038f11e5dbe53440f9ffc7.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"數據段(data segment)"}]},{"type":"text","text":" 包含了程序的變量、字符串、數組和其他數據的存儲。數據段分爲兩部分,已經初始化的數據和尚未初始化的數據。其中"},{"type":"codeinline","content":[{"type":"text","text":"尚未初始化的數據"}]},{"type":"text","text":"就是我們說的 BSS。數據段部分的初始化需要編譯就期確定的常量以及程序啓動就需要一個初始值的變量。所有 BSS 部分中的變量在加載後被初始化爲 0 。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"和 "},{"type":"codeinline","content":[{"type":"text","text":"代碼段(Text segment)"}]},{"type":"text","text":" 不一樣,data segment 數據段可以改變。程序總是修改它的變量。而且,許多程序需要在執行時動態分配空間。Linux 允許數據段隨着內存的分配和回收從而增大或者減小。爲了分配內存,程序可以增加數據段的大小。在 C 語言中有一套標準庫 "},{"type":"codeinline","content":[{"type":"text","text":"malloc"}]},{"type":"text","text":" 經常用於分配內存。進程地址空間描述符包含動態分配的內存區域稱爲 "},{"type":"codeinline","content":[{"type":"text","text":"堆(heap)"}]},{"type":"text","text":"。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"第三部分段是 "},{"type":"codeinline","content":[{"type":"text","text":"棧段(stack segment)"}]},{"type":"text","text":"。在大部分機器上,棧段會在虛擬內存地址頂部地址位置處,並向低位置處(向地址空間爲 0 處)拓展。舉個例子來說,在 32 位 x86 架構的機器上,棧開始於 "},{"type":"codeinline","content":[{"type":"text","text":"0xC0000000"}]},{"type":"text","text":",這是用戶模式下進程允許可見的 3GB 虛擬地址限制。如果棧一直增大到超過棧段後,就會發生硬件故障並把頁面下降一個頁面。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當程序啓動時,棧區域並不是空的,相反,它會包含所有的 shell 環境變量以及爲了調用它而向 shell 輸入的命令行。舉個例子,當你輸入"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"shell"},"content":[{"type":"text","text":"cp cxuan lx"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"時,cp 程序會運行並在棧中帶着字符串 "},{"type":"codeinline","content":[{"type":"text","text":"cp cxuan lx"}]},{"type":"text","text":" ,這樣就能夠找出源文件和目標文件的名稱。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當兩個用戶運行在相同程序中,例如"},{"type":"codeinline","content":[{"type":"text","text":"編輯器(editor)"}]},{"type":"text","text":",那麼就會在內存中保持編輯器程序代碼的兩個副本,但是這種方式並不高效。Linux 系統支持"},{"type":"codeinline","content":[{"type":"text","text":"共享文本段作"}]},{"type":"text","text":"爲替代。下面圖中我們會看到 A 和 B 兩個進程,它們有着相同的文本區域。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/33/335308c57f3db79929cfad7353a6351c.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"數據段和棧段只有在 fork 之後纔會共享,共享也是共享未修改過的頁面。如果任何一個都需要變大但是沒有相鄰空間容納的話,也不會有問題,因爲相鄰的虛擬頁面不必映射到相鄰的物理頁面上。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"除了動態分配更多的內存,Linux 中的進程可以通過"},{"type":"codeinline","content":[{"type":"text","text":"內存映射文件"}]},{"type":"text","text":"來訪問文件數據。這個特性可以使我們把一個文件映射到進程空間的一部分而該文件就可以像位於內存中的字節數組一樣被讀寫。把一個文件映射進來使得隨機讀寫比使用 read 和 write 之類的 I/O 系統調用要容易得多。共享庫的訪問就是使用了這種機制。如下所示"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a9/a963cbc033419df2461ee4f6f904da10.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們可以看到兩個相同文件會被映射到相同的物理地址上,但是它們屬於不同的地址空間。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"映射文件的優點是,兩個或多個進程可以同時映射到同一文件中,任意一個進程對文件的寫操作對其他文件可見。通過使用映射臨時文件的方式,可以爲多線程共享內存"},{"type":"codeinline","content":[{"type":"text","text":"提供高帶寬"}]},{"type":"text","text":",臨時文件在進程退出後消失。但是實際上,並沒有兩個相同的地址空間,因爲每個進程維護的打開文件和信號不同。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Linux 內存管理系統調用"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"下面我們探討一下關於內存管理的系統調用方式。事實上,POSIX 並沒有給內存管理指定任何的系統調用。然而,Linux 卻有自己的內存系統調用,主要系統調用如下"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| 系統調用 | 描述 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| --------------------------------------- | -------------- |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| s = brk(addr) | 改變數據段大小 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| a = mmap(addr,len,prot,flags,fd,offset) | 進行映射 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| s = unmap(addr,len) | 取消映射 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果遇到錯誤,那麼 s 的返回值是 -1,a 和 addr 是內存地址,len 表示的是長度,prot 表示的是控制保護位,flags 是其他標誌位,fd 是文件描述符,offset 是文件偏移量。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"brk"}]},{"type":"text","text":" 通過給出超過數據段之外的第一個字節地址來指定數據段的大小。如果新的值要比原來的大,那麼數據區會變得越來越大,反之會越來越小。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"mmap"}]},{"type":"text","text":" 和 "},{"type":"codeinline","content":[{"type":"text","text":"unmap"}]},{"type":"text","text":" 系統調用會控制映射文件。mmp 的第一個參數 addr 決定了文件映射的地址。它必須是頁面大小的倍數。如果參數是 0,系統會分配地址並返回 a。第二個參數是長度,它告訴了需要映射多少字節。它也是頁面大小的倍數。prot 決定了映射文件的保護位,保護位可以標記爲 "},{"type":"text","marks":[{"type":"strong"}],"text":"可讀、可寫、可執行或者這些的結合"},{"type":"text","text":"。第四個參數 flags 能夠控制文件是私有的還是可讀的以及 addr 是必須的還是隻是進行提示。第五個參數 fd 是要映射的文件描述符。只有打開的文件是可以被映射的,因此如果想要進行文件映射,必須打開文件;最後一個參數 offset 會指示文件從什麼時候開始,並不一定每次都要從零開始。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Linux 內存管理實現"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"內存管理系統是操作系統最重要的部分之一。從計算機早期開始,我們實際使用的內存都要比系統中實際存在的內存多。"},{"type":"codeinline","content":[{"type":"text","text":"內存分配策略"}]},{"type":"text","text":"克服了這一限制,並且其中最有名的就是 "},{"type":"codeinline","content":[{"type":"text","text":"虛擬內存(virtual memory)"}]},{"type":"text","text":"。通過在多個競爭的進程之間共享虛擬內存,虛擬內存得以讓系統有更多的內存。虛擬內存子系統主要包括下面這些概念。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"大地址空間"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"操作系統使系統使用起來好像比實際的物理內存要大很多,那是因爲虛擬內存要比物理內存大很多倍。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"保護"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"系統中的每個進程都會有自己的虛擬地址空間。這些虛擬地址空間彼此完全分開,因此運行一個應用程序的進程不會影響另一個。並且,硬件虛擬內存機制允許內存保護關鍵內存區域。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"內存映射"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"內存映射用來向進程地址空間映射圖像和數據文件。在內存映射中,文件的內容直接映射到進程的虛擬空間中。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"公平的物理內存分配"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"內存管理子系統允許系統中的每個正在運行的進程公平分配系統的物理內存。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"共享虛擬內存"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"儘管虛擬內存讓進程有自己的內存空間,但是有的時候你是需要共享內存的。例如幾個進程同時在 shell 中運行,這會涉及到 IPC 的進程間通信問題,這個時候你需要的是共享內存來進行信息傳遞而不是通過拷貝每個進程的副本獨立運行。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"下面我們就正式探討一下什麼是 "},{"type":"codeinline","content":[{"type":"text","text":"虛擬內存"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"虛擬內存的抽象模型"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在考慮 Linux 用於支持虛擬內存的方法之前,考慮一個不會被太多細節困擾的抽象模型是很有用的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"處理器在執行指令時,會從內存中讀取指令並將其"},{"type":"codeinline","content":[{"type":"text","text":"解碼(decode)"}]},{"type":"text","text":",在指令解碼時會獲取某個位置的內容並將他存到內存中。然後處理器繼續執行下一條指令。這樣,處理器總是在訪問存儲器以獲取指令和存儲數據。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在虛擬內存系統中,所有的地址空間都是虛擬的而不是物理的。但是實際存儲和提取指令的是物理地址,所以需要讓處理器根據操作系統維護的一張表將虛擬地址轉換爲物理地址。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲了簡單的完成轉換,虛擬地址和物理地址會被分爲固定大小的塊,稱爲 "},{"type":"codeinline","content":[{"type":"text","text":"頁(page)"}]},{"type":"text","text":"。這些頁有相同大小,如果頁面大小不一樣的話,那麼操作系統將很難管理。Alpha AXP系統上的 Linux 使用 8 KB 頁面,而 Intel x86 系統上的 Linux 使用 4 KB 頁面。每個頁面都有一個唯一的編號,即"},{"type":"codeinline","content":[{"type":"text","text":"頁面框架號(PFN)"}]},{"type":"text","text":"。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ad/adeb020c7a7656dd87f210e3677d1d08.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上面就是 Linux 內存映射模型了,在這個頁模型中,虛擬地址由兩部分組成:"},{"type":"text","marks":[{"type":"strong"}],"text":"偏移量和虛擬頁框號"},{"type":"text","text":"。每次處理器遇到虛擬地址時都會提取偏移量和虛擬頁框號。處理器必須將虛擬頁框號轉換爲物理頁號,然後以正確的偏移量的位置訪問物理頁。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上圖中展示了兩個進程 A 和 B 的虛擬地址空間,每個進程都有自己的頁表。這些頁表將進程中的虛擬頁映射到內存中的物理頁中。頁表中每一項均包含"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"有效標誌(valid flag)"}]},{"type":"text","text":": 表明此頁表條目是否有效"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"該條目描述的物理頁框號"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"訪問控制信息,頁面使用方式,是否可寫以及是否可以執行代碼"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"要將處理器的虛擬地址映射爲內存的物理地址,首先需要計算虛擬地址的頁框號和偏移量。頁面大小爲 2 的次冪,可以通過移位完成操作。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果當前進程嘗試訪問虛擬地址,但是訪問不到的話,這種情況稱爲 "},{"type":"codeinline","content":[{"type":"text","text":"缺頁異常"}]},{"type":"text","text":",此時虛擬操作系統的錯誤地址和頁面錯誤的原因將通知操作系統。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通過以這種方式將虛擬地址映射到物理地址,虛擬內存可以以任何順序映射到系統的物理頁面。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"按需分頁"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"由於物理內存要比虛擬內存少很多,因此操作系統需要注意儘量避免直接使用"},{"type":"codeinline","content":[{"type":"text","text":"低效"}]},{"type":"text","text":"的物理內存。節省物理內存的一種方式是僅加載執行程序當前使用的頁面(這何嘗不是一種懶加載的思想呢?)。例如,可以運行數據庫來查詢數據庫,在這種情況下,不是所有的數據都裝入內存,只裝載需要檢查的數據。這種僅僅在需要時纔將虛擬頁面加載進內中的技術稱爲按需分頁。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"交換"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果某個進程需要將虛擬頁面傳入內存,但是此時沒有可用的物理頁面,那麼操作系統必須丟棄物理內存中的另一個頁面來爲該頁面騰出空間。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果頁面已經修改過,那麼操作系統必須保留該頁面的內容,以便以後可以訪問它。這種類型的頁面被稱爲髒頁,當將其從內存中移除時,它會保存在稱爲"},{"type":"codeinline","content":[{"type":"text","text":"交換文件"}]},{"type":"text","text":"的特殊文件中。相對於處理器和物理內存的速度,對交換文件的訪問非常慢,並且操作系統需要兼顧將頁面寫到磁盤的以及將它們保留在內存中以便再次使用。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Linux 使用"},{"type":"codeinline","content":[{"type":"text","text":"最近最少使用(LRU)"}]},{"type":"text","text":"頁面老化技術來公平的選擇可能會從系統中刪除的頁面,這個方案涉及系統中的每個頁面,頁面的年齡隨着訪問次數的變化而變化,如果某個頁面訪問次數多,那麼該頁就表示越 "},{"type":"codeinline","content":[{"type":"text","text":"年輕"}]},{"type":"text","text":",如果某個呃頁面訪問次數太少,那麼該頁越容易被"},{"type":"codeinline","content":[{"type":"text","text":"換出"}]},{"type":"text","text":"。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"物理和虛擬尋址模式"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"大多數多功能處理器都支持 "},{"type":"codeinline","content":[{"type":"text","text":"物理地址"}]},{"type":"text","text":"模式和"},{"type":"codeinline","content":[{"type":"text","text":"虛擬地址"}]},{"type":"text","text":"模式的概念。物理尋址模式不需要頁表,並且處理器不會在此模式下嘗試執行任何地址轉換。 Linux 內核被鏈接在物理地址空間中運行。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Alpha AXP 處理器沒有物理尋址模式。相反,它將內存空間劃分爲幾個區域,並將其中兩個指定爲物理映射的地址。此內核地址空間稱爲 KSEG 地址空間,它包含從 0xfffffc0000000000 向上的所有地址。爲了從 KSEG 中鏈接的代碼(按照定義,內核代碼)執行或訪問其中的數據,該代碼必須在內核模式下執行。鏈接到 Alpha 上的 Linux內核以從地址 0xfffffc0000310000 執行。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"訪問控制"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"頁面表的每一項還包含訪問控制信息,訪問控制信息主要檢查進程是否應該訪問內存。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"必要時需要對內存進行"},{"type":"codeinline","content":[{"type":"text","text":"訪問限制"}]},{"type":"text","text":"。 例如包含可執行代碼的內存,自然是隻讀內存; 操作系統不應允許進程通過其可執行代碼寫入數據。 相比之下,包含數據的頁面可以被寫入,但是嘗試執行該內存的指令將失敗。 大多數處理器至少具有兩種執行模式:內核態和用戶態。 你不希望訪問用戶執行內核代碼或內核數據結構,除非處理器以內核模式運行。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/63/63e2d3bc2feb24ccb78aff7e71bfb1cd.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"訪問控制信息被保存在上面的 Page Table Entry ,頁表項中,上面這幅圖是 Alpha AXP的 PTE。位字段具有以下含義"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"V"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"表示 valid ,是否有效位"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"FOR"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"讀取時故障,在嘗試讀取此頁面時出現故障"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"FOW"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"寫入時錯誤,在嘗試寫入時發生錯誤"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"FOE"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"執行時發生錯誤,在嘗試執行此頁面中的指令時,處理器都會報告頁面錯誤並將控制權傳遞給操作系統,"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ASM"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"地址空間匹配,當操作系統希望清除轉換緩衝區中的某些條目時,將使用此選項。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"GH"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當在使用"},{"type":"codeinline","content":[{"type":"text","text":"單個轉換緩衝區"}]},{"type":"text","text":"條目而不是"},{"type":"codeinline","content":[{"type":"text","text":"多個轉換緩衝區"}]},{"type":"text","text":"條目映射整個塊時使用的提示。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"KRE"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"內核模式運行下的代碼可以讀取頁面"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"URE"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"用戶模式下的代碼可以讀取頁面"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"KWE"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"以內核模式運行的代碼可以寫入頁面"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"UWE"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"以用戶模式運行的代碼可以寫入頁面"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"頁框號"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對於設置了 V 位的 PTE,此字段包含此 PTE 的物理頁面幀號(頁面幀號)。對於無效的 PTE,如果此字段不爲零,則包含有關頁面在交換文件中的位置的信息。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"除此之外,Linux 還使用了兩個位"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"PAGE"},{"type":"text","text":"DIRTY"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果已設置,則需要將頁面寫出到交換文件中"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"PAGE"},{"type":"text","text":"ACCESSED"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Linux 用來將頁面標記爲已訪問。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"緩存"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上面的虛擬內存抽象模型可以用來實施,但是效率不會太高。操作系統和處理器設計人員都嘗試提高性能。 但是除了提高處理器,內存等的速度之外,最好的方法就是維護有用信息和數據的高速緩存,從而使某些操作更快。在 Linux 中,使用很多和內存管理有關的緩衝區,使用緩衝區來提高效率。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"緩衝區緩存"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"緩衝區高速緩存包含"},{"type":"codeinline","content":[{"type":"text","text":"塊設備"}]},{"type":"text","text":"驅動程序使用的數據緩衝區。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"還記得什麼是塊設備麼?這裏回顧下"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"塊設備是一個能存儲"},{"type":"codeinline","content":[{"type":"text","text":"固定大小塊"}]},{"type":"text","text":"信息的設備,它支持"},{"type":"text","marks":[{"type":"strong"}],"text":"以固定大小的塊,扇區或羣集讀取和(可選)寫入數據"},{"type":"text","text":"。每個塊都有自己的"},{"type":"codeinline","content":[{"type":"text","text":"物理地址"}]},{"type":"text","text":"。通常塊的大小在 512 - 65536 之間。所有傳輸的信息都會以"},{"type":"codeinline","content":[{"type":"text","text":"連續"}]},{"type":"text","text":"的塊爲單位。塊設備的基本特徵是每個塊都較爲對立,能夠獨立的進行讀寫。常見的塊設備有 "},{"type":"text","marks":[{"type":"strong"}],"text":"硬盤、藍光光盤、USB 盤"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"與字符設備相比,塊設備通常需要較少的引腳。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/6f/6f82a3a0015992147a499f6255dfaacc.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"緩衝區高速緩存通過"},{"type":"codeinline","content":[{"type":"text","text":"設備標識符"}]},{"type":"text","text":"和塊編號用於快速查找數據塊。 如果可以在緩衝區高速緩存中找到數據,則無需從物理塊設備中讀取數據,這種訪問方式要快得多。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"頁緩存"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"頁緩存用於加快對磁盤上圖像和數據的訪問"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"它用於一次一頁地緩存文件中的內容,並且可以通過文件和文件中的偏移量進行訪問。當頁面從磁盤讀入內存時,它們被緩存在頁面緩存中。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"交換區緩存"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"僅僅已修改(髒頁)被保存在交換文件中"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"只要這些頁面在寫入交換文件後沒有修改,則下次交換該頁面時,無需將其寫入交換文件,因爲該頁面已在交換文件中。 可以直接丟棄。 在大量交換的系統中,這節省了許多不必要的和昂貴的磁盤操作。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"硬件緩存"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"處理器中通常使用一種硬件緩存。頁表條目的緩存。在這種情況下,處理器並不總是直接讀取頁表,而是根據需要緩存頁的翻譯。 這些是"},{"type":"codeinline","content":[{"type":"text","text":"轉換後備緩衝區"}]},{"type":"text","text":" 也被稱爲 "},{"type":"codeinline","content":[{"type":"text","text":"TLB"}]},{"type":"text","text":",包含來自系統中一個或多個進程的頁表項的緩存副本。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"引用虛擬地址後,處理器將嘗試查找匹配的 TLB 條目。 如果找到,則可以將虛擬地址直接轉換爲物理地址,並對數據執行正確的操作。 如果處理器找不到匹配的 TLB 條目, 它通過向操作系統發信號通知已發生 TLB 丟失獲得操作系統的支持和幫助。系統特定的機制用於將該異常傳遞給可以修復問題的操作系統代碼。 操作系統爲地址映射生成一個新的 TLB 條目。 清除異常後,處理器將再次嘗試轉換虛擬地址。這次能夠執行成功。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"使用緩存也存在缺點,爲了節省精力,Linux 必須使用更多的時間和空間來維護這些緩存,並且如果緩存損壞,系統將會崩潰。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Linux 頁表"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Linux 假定頁表分爲三個級別。訪問的每個頁表都包含下一級頁表"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c6/c6cedaac8a5789ed6fc15dd52c4a234d.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"圖中的 PDG 表示全局頁表,當創建一個新的進程時,都要爲新進程創建一個新的頁面目錄,即 PGD。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"要將虛擬地址轉換爲物理地址,處理器必須獲取每個級別字段的內容,將其轉換爲包含頁表的物理頁的偏移量,並讀取下一級頁表的頁框號。這樣重複三次,直到找到包含虛擬地址的物理頁面的頁框號爲止。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Linux 運行的每個平臺都必須提供翻譯宏,這些宏允許內核遍歷特定進程的頁表。這樣,內核無需知道頁表條目的格式或它們的排列方式。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"頁分配和取消分配"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對系統中物理頁面有很多需求。例如,當圖像加載到內存中時,操作系統需要分配頁面。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"系統中所有物理頁面均由 "},{"type":"codeinline","content":[{"type":"text","text":"mem_map"}]},{"type":"text","text":" 數據結構描述,這個數據結構是 "},{"type":"codeinline","content":[{"type":"text","text":"mem_map_t"}]},{"type":"text","text":" 的列表。它包括一些重要的屬性"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"count :這是頁面的用戶數計數,當頁面在多個進程之間共享時,計數大於 1 "}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"age:這是描述頁面的年齡,用於確定頁面是否適合丟棄或交換"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"map"},{"type":"text","marks":[{"type":"italic"}],"text":"nr :這是此mem"},{"type":"text","text":"map_t描述的物理頁框號。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"頁面分配代碼使用 "},{"type":"codeinline","content":[{"type":"text","text":"free_area"}]},{"type":"text","text":"向量查找和釋放頁面,free_area 的每個元素都包含有關頁面塊的信息。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"頁面分配"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Linux 的頁面分配使用一種著名的夥伴算法來進行頁面的分配和取消分配。頁面以 2 的冪爲單位進行塊分配。這就意味着它可以分配 1頁、2 頁、4頁等等,只要系統中有足夠可用的頁面來滿足需求就可以。判斷的標準是"},{"type":"text","marks":[{"type":"strong"}],"text":"nr_free_pages> min_free_pages"},{"type":"text","text":",如果滿足,就會在 free"},{"type":"text","marks":[{"type":"italic"}],"text":"area 中搜索所需大小的頁面塊完成分配。free"},{"type":"text","text":"area 的每個元素都有該大小的塊的已分配頁面和空閒頁面塊的映射。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"分配算法會搜索請求大小的頁面塊。如果沒有任何請求大小的頁面塊可用的話,會搜尋一個是請求大小二倍的頁面塊,然後重複,直到一直搜尋完 free_area 找到一個頁面塊爲止。如果找到的頁面塊要比請求的頁面塊大,就會對找到的頁面塊進行細分,直到找到合適的大小塊爲止。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"因爲每個塊都是 2 的次冪,所以拆分過程很容易,因爲你只需將塊分成兩半即可。空閒塊在適當的隊列中排隊,分配的頁面塊返回給調用者。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/13/135ca55e4d606736e95145ca549f129c.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果請求一個 2 個頁的塊,則 4 頁的第一個塊(從第 4 頁的框架開始)將被分成兩個 2 頁的塊。第一個頁面(從第 4 頁的幀開始)將作爲分配的頁面返回給調用方,第二個塊(從第 6 頁的頁面開始)將作爲 2 頁的空閒塊排隊到 free_area 數組的元素 1 上。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"頁面取消分配"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上面的這種內存方式最造成一種後果,那就是內存的碎片化,會將較大的空閒頁面分成較小的頁面。頁面解除分配代碼會盡可能將頁面重新組合成爲更大的空閒塊。每釋放一個頁面,都會檢查相同大小的相鄰的塊,以查看是否空閒。如果是,則將其與新釋放的頁面塊組合以形成下一個頁面大小塊的新的自由頁面塊。 每次將兩個頁面塊重新組合爲更大的空閒頁面塊時,頁面釋放代碼就會嘗試將該頁面塊重新組合爲更大的空閒頁面。 通過這種方式,可用頁面的塊將盡可能多地使用內存。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"例如上圖,如果要釋放第 1 頁的頁面,則將其與已經空閒的第 0 頁頁面框架組合在一起,並作爲大小爲 2頁的空閒塊排隊到 free_area 的元素 1 中"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"內存映射"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"內核有兩種類型的內存映射:"},{"type":"codeinline","content":[{"type":"text","text":"共享型(shared)"}]},{"type":"text","text":" 和"},{"type":"codeinline","content":[{"type":"text","text":"私有型(private)"}]},{"type":"text","text":"。私有型是當進程爲了只讀文件,而不寫文件時使用,這時,私有映射更加高效。 但是,任何對私有映射頁的寫操作都會導致內核停止映射該文件中的頁。所以,寫操作既不會改變磁盤上的文件,對訪問該文件的其它進程也是不可見的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"按需分頁"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"一旦可執行映像被內存映射到虛擬內存後,它就可以被執行了。因爲只將映像的開頭部分物理的拉入到內存中,因此它將很快訪問物理內存尚未存在的虛擬內存區域。當進程訪問沒有有效頁表的虛擬地址時,操作系統會報告這項錯誤。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"頁面錯誤描述頁面出錯的虛擬地址和引起的內存訪問(RAM)類型。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Linux 必須找到代表發生頁面錯誤的內存區域的 vm"},{"type":"text","marks":[{"type":"italic"}],"text":"area"},{"type":"text","text":"struct 結構。由於搜索 vm"},{"type":"text","marks":[{"type":"italic"}],"text":"area"},{"type":"text","text":"struct 數據結構對於有效處理頁面錯誤至關重要,因此它們以 "},{"type":"codeinline","content":[{"type":"text","text":"AVL(Adelson-Velskii和Landis)"}]},{"type":"text","text":"樹結構鏈接在一起。如果引起故障的虛擬地址沒有 "},{"type":"codeinline","content":[{"type":"text","text":"vm_area_struct"}]},{"type":"text","text":" 結構,則此進程已經訪問了非法地址,Linux 會向進程發出 "},{"type":"codeinline","content":[{"type":"text","text":"SIGSEGV"}]},{"type":"text","text":" 信號,如果進程沒有用於該信號的處理程序,那麼進程將會終止。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"然後,Linux 會針對此虛擬內存區域所允許的訪問類型,檢查發生的頁面錯誤類型。 如果該進程以非法方式訪問內存,例如寫入僅允許讀的區域,則還會發出內存訪問錯誤信號。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"現在,Linux 已確定頁面錯誤是合法的,因此必須對其進行處理。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"文件系統"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在 Linux 中,最直觀、最可見的部分就是 "},{"type":"codeinline","content":[{"type":"text","text":"文件系統(file system)"}]},{"type":"text","text":"。下面我們就來一起探討一下關於 Linux 中國的文件系統,系統調用以及文件系統實現背後的原理和思想。這些思想中有一些來源於 MULTICS,現在已經被 Windows 等其他操作系統使用。Linux 的設計理念就是 "},{"type":"codeinline","content":[{"type":"text","text":"小的就是好的(Small is Beautiful)"}]},{"type":"text","text":" 。雖然 Linux 只是使用了最簡單的機制和少量的系統調用,但是 Linux 卻提供了強大而優雅的文件系統。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Linux 文件系統基本概念"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Linux 在最初的設計是 MINIX1 文件系統,它只支持 14 字節的文件名,它的最大文件只支持到 64 MB。在 MINIX 1 之後的文件系統是 ext 文件系統。ext 系統相較於 MINIX 1 來說,在支持字節大小和文件大小上均有很大提升,但是 ext 的速度仍沒有 MINIX 1 快,於是,ext 2 被開發出來,它能夠支持長文件名和大文件,而且具有比 MINIX 1 更好的性能。這使他成爲 Linux 的主要文件系統。只不過 Linux 會使用 "},{"type":"codeinline","content":[{"type":"text","text":"VFS"}]},{"type":"text","text":" 曾支持多種文件系統。在 Linux 鏈接時,用戶可以動態的將不同的文件系統掛載倒 VFS 上。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Linux 中的文件是一個任意長度的字節序列,Linux 中的文件可以包含任意信息,比如 ASCII 碼、二進制文件和其他類型的文件是不加區分的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲了方便起見,文件可以被組織在一個目錄中,目錄存儲成文件的形式在很大程度上可以作爲文件處理。目錄可以有子目錄,這樣形成有層次的文件系統,Linux 系統下面的根目錄是 "},{"type":"codeinline","content":[{"type":"text","text":"/"}]},{"type":"text","text":" ,它通常包含了多個子目錄。字符 "},{"type":"codeinline","content":[{"type":"text","text":"/"}]},{"type":"text","text":" 還用於對目錄名進行區分,例如 "},{"type":"text","marks":[{"type":"strong"}],"text":"/usr/cxuan"},{"type":"text","text":" 表示的就是根目錄下面的 usr 目錄,其中有一個叫做 cxuan 的子目錄。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"下面我們介紹一下 Linux 系統根目錄下面的目錄名"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"/bin"}]},{"type":"text","text":",它是重要的二進制應用程序,包含二進制文件,系統的所有用戶使用的命令都在這裏"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"/boot"}]},{"type":"text","text":",啓動包含引導加載程序的相關文件"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"/dev"}]},{"type":"text","text":",包含設備文件,終端文件,USB 或者連接到系統的任何設備"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"/etc"}]},{"type":"text","text":",配置文件,啓動腳本等,包含所有程序所需要的配置文件,也包含了啓動/停止單個應用程序的啓動和關閉 shell 腳本"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"/home"}]},{"type":"text","text":",本地主要路徑,所有用戶用 home 目錄存儲個人信息"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"/lib"}]},{"type":"text","text":",系統庫文件,包含支持位於 /bin 和 /sbin 下的二進制庫文件"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"/lost+found"}]},{"type":"text","text":",在根目錄下提供一個遺失+查找系統,必須在 root 用戶下才能查看當前目錄下的內容"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"/media"}]},{"type":"text","text":",掛載可移動介質"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"/mnt"}]},{"type":"text","text":",掛載文件系統"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"/opt"}]},{"type":"text","text":",提供一個可選的應用程序安裝目錄"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"/proc"}]},{"type":"text","text":",特殊的動態目錄,用於維護系統信息和狀態,包括當前運行中進程信息"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"/root"}]},{"type":"text","text":",root 用戶的主要目錄文件夾"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"/sbin"}]},{"type":"text","text":",重要的二進制系統文件"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"/tmp"}]},{"type":"text","text":", 系統和用戶創建的臨時文件,系統重啓時,這個目錄下的文件都會被刪除"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"/usr"}]},{"type":"text","text":",包含絕大多數用戶都能訪問的應用程序和文件"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"/var"}]},{"type":"text","text":",經常變化的文件,諸如日誌文件或數據庫等"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在 Linux 中,有兩種路徑,一種是 "},{"type":"codeinline","content":[{"type":"text","text":"絕對路徑(absolute path)"}]},{"type":"text","text":" ,絕對路徑告訴你從根目錄下查找文件,絕對路徑的缺點是太長而且不太方便。還有一種是 "},{"type":"codeinline","content":[{"type":"text","text":"相對路徑(relative path)"}]},{"type":"text","text":" ,相對路徑所在的目錄也叫做"},{"type":"codeinline","content":[{"type":"text","text":"工作目錄(working directory)"}]},{"type":"text","text":"。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果 "},{"type":"codeinline","content":[{"type":"text","text":"/usr/local/books"}]},{"type":"text","text":" 是工作目錄,那麼 shell 命令"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"shell"},"content":[{"type":"text","text":"cp books books-replica "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"就表示的是相對路徑,而"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"shell"},"content":[{"type":"text","text":"cp /usr/local/books/books /usr/local/books/books-replica"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"則表示的是絕對路徑。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在 Linux 中經常出現一個用戶使用另一個用戶的文件或者使用文件樹結構中的文件。兩個用戶共享同一個文件,這個文件位於某個用戶的目錄結構中,另一個用戶需要使用這個文件時,必須通過絕對路徑才能引用到他。如果絕對路徑很長,那麼每次輸入起來會變的非常麻煩,所以 Linux 提供了一種 "},{"type":"codeinline","content":[{"type":"text","text":"鏈接(link)"}]},{"type":"text","text":" 機制。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"舉個例子,下面是一個使用鏈接之前的圖"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c2/c29e86aab0b4deb7a19591aeae522e5d.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"以上所示,比如有兩個工作賬戶 jianshe 和 cxuan,jianshe 想要使用 cxuan 賬戶下的 A 目錄,那麼它可能會輸入 "},{"type":"codeinline","content":[{"type":"text","text":"/usr/cxuan/A"}]},{"type":"text","text":" ,這是一種未使用鏈接之後的圖。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"使用鏈接後的示意如下"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/f2/f24a7e139f385095bb9fe684550bd694.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"現在,jianshe 可以創建一個鏈接來使用 cxuan 下面的目錄了。‘"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當一個目錄被創建出來後,有兩個目錄項也同時被創建出來,它們就是 "},{"type":"codeinline","content":[{"type":"text","text":"."}]},{"type":"text","text":" 和 "},{"type":"codeinline","content":[{"type":"text","text":".."}]},{"type":"text","text":" ,前者代表工作目錄自身,後者代表該目錄的父目錄,也就是該目錄所在的目錄。這樣一來,在 /usr/jianshe 中訪問 cxuan 中的目錄就是 "},{"type":"codeinline","content":[{"type":"text","text":"../cxuan/xxx"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Linux 文件系統不區分磁盤的,這是什麼意思呢?一般來說,一個磁盤中的文件系統相互之間保持獨立,如果一個文件系統目錄想要訪問另一個磁盤中的文件系統,在 Windows 中你可以像下面這樣。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ec/ec73ec7985a334055cd05f109a4fa8e8.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"兩個文件系統分別在不同的磁盤中,彼此保持獨立。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"而在 Linux 中,是支持"},{"type":"codeinline","content":[{"type":"text","text":"掛載"}]},{"type":"text","text":"的,它允許一個磁盤掛在到另外一個磁盤上,那麼上面的關係會變成下面這樣"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b0/b095ac5b4cd6144fe4df149331e3af06.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"掛在之後,兩個文件系統就不再需要關心文件系統在哪個磁盤上了,兩個文件系統彼此可見。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Linux 文件系統的另外一個特性是支持 "},{"type":"codeinline","content":[{"type":"text","text":"加鎖(locking)"}]},{"type":"text","text":"。在一些應用中會出現兩個或者更多的進程同時使用同一個文件的情況,這樣很可能會導致"},{"type":"codeinline","content":[{"type":"text","text":"競爭條件(race condition)"}]},{"type":"text","text":"。一種解決方法是對其進行加不同粒度的鎖,就是爲了防止某一個進程只修改某一行記錄從而導致整個文件都不能使用的情況。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"POSIX 提供了一種靈活的、不同粒度級別的鎖機制,允許一個進程使用一個不可分割的操作對一個字節或者整個文件進行加鎖。加鎖機制要求嘗試加鎖的進程指定其 "},{"type":"text","marks":[{"type":"strong"}],"text":"要加鎖的文件,開始位置以及要加鎖的字節"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Linux 系統提供了兩種鎖:"},{"type":"text","marks":[{"type":"strong"}],"text":"共享鎖和互斥鎖"},{"type":"text","text":"。如果文件的一部分已經加上了共享鎖,那麼再加排他鎖是不會成功的;如果文件系統的一部分已經被加了互斥鎖,那麼在互斥鎖解除之前的任何加鎖都不會成功。爲了成功加鎖、請求加鎖的部分的所有字節都必須是可用的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在加鎖階段,進程需要設計好加鎖失敗後的情況,也就是判斷加鎖失敗後是否選擇阻塞,如果選擇阻塞式,那麼當已經加鎖的進程中的鎖被刪除時,這個進程會解除阻塞並替換鎖。如果進程選擇非阻塞式的,那麼就不會替換這個鎖,會立刻從系統調用中返回,標記狀態碼錶示是否加鎖成功,然後進程會選擇下一個時間再次嘗試。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"加鎖區域是可以重疊的。下面我們演示了三種不同條件的加鎖區域。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/50/509110c6e881abb71967f024b867035c.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如上圖所示,A 的共享鎖在第四字節到第八字節進行加鎖"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/32/320ccece503dfcfc7ef6fb7a9890df4b.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如上圖所示,進程在 A 和 B 上同時加了共享鎖,其中 6 - 8 字節是重疊鎖"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a7/a7d55047b2c029057cd9f4b598ef4fc5.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如上圖所示,進程 A 和 B 和 C 同時加了共享鎖,那麼第六字節和第七字節是共享鎖。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果此時一個進程嘗試在第 6 個字節處加鎖,此時會設置失敗並阻塞,由於該區域被 A B C 同時加鎖,那麼只有等到 A B C 都釋放鎖後,進程才能加鎖成功。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Linux 文件系統調用"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"許多系統調用都會和文件與文件系統有關。我們首先先看一下對單個文件的系統調用,然後再來看一下對整個目錄和文件的系統調用。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲了創建一個新的文件,會使用到 "},{"type":"codeinline","content":[{"type":"text","text":"creat"}]},{"type":"text","text":" 方法,注意沒有 "},{"type":"codeinline","content":[{"type":"text","text":"e"}]},{"type":"text","text":"。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":">這裏說一個小插曲,曾經有人問 UNIX 創始人 Ken Thompson,如果有機會重新寫 UNIX ,你會怎麼辦,他回答自己要把 creat 改成 create ,哈哈哈哈。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這個系統調用的兩個參數是文件名和保護模式"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"shell"},"content":[{"type":"text","text":"fd = creat(\"aaa\",mode);"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這段命令會創建一個名爲 aaa 的文件,並根據 mode 設置文件的保護位。這些位決定了哪個用戶可能訪問文件、如何訪問。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"creat 系統調用不僅僅創建了一個名爲 aaa 的文件,還會打開這個文件。爲了允許後續的系統調用訪問這個文件,這個 creat 系統調用會返回一個 "},{"type":"codeinline","content":[{"type":"text","text":"非負整數"}]},{"type":"text","text":", 這個就叫做 "},{"type":"codeinline","content":[{"type":"text","text":"文件描述符(file descriptor)"}]},{"type":"text","text":",也就是上面的 fd。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果在已經存在的文件上調用了 creat 系統調用,那麼該文件中的內容會被清除,從 0 開始。通過設置合適的參數,"},{"type":"codeinline","content":[{"type":"text","text":"open"}]},{"type":"text","text":" 系統調用也能夠創建文件。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"下面讓我們看一看主要的系統調用,如下表所示"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| 系統調用 | 描述 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| ------------------------------------ | ------------------------ |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| fd = creat(name,mode) | 一種創建一個新文件的方式 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| fd = open(file, ...) | 打開文件讀、寫或者讀寫 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| s = close(fd) | 關閉一個打開的文件 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| n = read(fd, buffer, nbytes) | 從文件中向緩存中讀入數據 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| n = write(fd, buffer, nbytes) | 從緩存中向文件中寫入數據 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| position = lseek(fd, offset, whence) | 移動文件指針 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| s = stat(name, &buf) | 獲取文件信息 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| s = fstat(fd, &buf) | 獲取文件信息 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| s = pipe(&fd[0]) | 創建一個管道 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| s = fcntl(fd,...) | 文件加鎖等其他操作 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲了對一個文件進行讀寫的前提是先需要打開文件,必須使用 creat 或者 open 打開,參數是打開文件的方式,是隻讀、可讀寫還是隻寫。open 系統調用也會返回文件描述符。打開文件後,需要使用 "},{"type":"codeinline","content":[{"type":"text","text":"close"}]},{"type":"text","text":" 系統調用進行關閉。close 和 open 返回的 fd 總是未被使用的最小數量。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":">什麼是文件描述符?文件描述符就是一個數字,這個數字標示了計算機操作系統中打開的文件。它描述了數據資源,以及訪問資源的方式。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當程序要求打開一個文件時,內核會進行如下操作"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"授予訪問權限"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在"},{"type":"codeinline","content":[{"type":"text","text":"全局文件表(global file table)"}]},{"type":"text","text":"中創建一個"},{"type":"codeinline","content":[{"type":"text","text":"條目(entry)"}]}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"向軟件提供條目的位置"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"文件描述符由唯一的非負整數組成,系統上每個打開的文件至少存在一個文件描述符。文件描述符最初在 Unix 中使用,並且被包括 Linux,macOS 和 BSD 在內的現代操作系統所使用。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當一個進程成功訪問一個打開的文件時,內核會返回一個文件描述符,這個文件描述符指向全局文件表的 entry 項。這個文件表項包含文件的 inode 信息,字節位移,訪問限制等。例如下圖所示"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/8a/8a1177910594a66defc6b053f27c5e5f.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"默認情況下,前三個文件描述符爲 "},{"type":"codeinline","content":[{"type":"text","text":"STDIN(標準輸入)"}]},{"type":"text","text":"、"},{"type":"codeinline","content":[{"type":"text","text":"STDOUT(標準輸出)"}]},{"type":"text","text":"、"},{"type":"codeinline","content":[{"type":"text","text":"STDERR(標準錯誤)"}]},{"type":"text","text":"。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"標準輸入的文件描述符是 0 ,在終端中,默認爲用戶的鍵盤輸入"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"標準輸出的文件描述符是 1 ,在終端中,默認爲用戶的屏幕"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"與錯誤有關的默認數據流是 2,在終端中,默認爲用戶的屏幕。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在簡單聊了一下文件描述符後,我們繼續回到文件系統調用的探討。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在文件系統調用中,開銷最大的就是 read 和 write 了。read 和 write 都有三個參數"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"文件描述符"}]},{"type":"text","text":":告訴需要對哪一個打開文件進行讀取和寫入"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"緩衝區地址"}]},{"type":"text","text":":告訴數據需要從哪裏讀取和寫入哪裏"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"統計"}]},{"type":"text","text":":告訴需要傳輸多少字節"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這就是所有的參數了,這個設計非常簡單輕巧。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"雖然幾乎所有程序都按順序讀取和寫入文件,但是某些程序需要能夠隨機訪問文件的任何部分。與每個文件相關聯的是一個指針,該指針指示文件中的當前位置。順序讀取(或寫入)時,它通常指向要讀取(寫入)的下一個字節。如果指針在讀取 1024 個字節之前位於 4096 的位置,則它將在成功讀取系統調用後自動移至 5120 的位置。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"Lseek"}]},{"type":"text","text":" 系統調用會更改指針位置的值,以便後續對 read 或 write 的調用可以在文件中的任何位置開始,甚至可以超出文件末尾。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":">lseek = Lseek ,段首大寫。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"lseek 避免叫做 seek 的原因就是 seek 已經在之前 16 位的計算機上用於搜素功能了。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"Lseek"}]},{"type":"text","text":" 有三個參數:第一個是文件的文件描述符,第二個是文件的位置;第三個告訴文件位置是相對於文件的開頭,當前位置還是文件的結尾"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"c"},"content":[{"type":"text","text":"lseek(int fildes, off_t offset, int whence);"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"lseek 的返回值是更改文件指針後文件中的絕對位置。lseek 是唯一從來不會造成真正磁盤查找的系統調用,它只是更新當前的文件位置,這個文件位置就是內存中的數字。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對於每個文件,Linux 都會跟蹤文件模式(常規,目錄,特殊文件),大小,最後修改時間以及其他信息。程序能夠通過 "},{"type":"codeinline","content":[{"type":"text","text":"stat"}]},{"type":"text","text":" 系統調用看到這些信息。第一個參數就是文件名,第二個是指向要放置請求信息結構的指針。這些結構的屬性如下圖所示。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| 存儲文件的設備 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| ------------------------ |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| 存儲文件的設備 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| i-node 編號 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| 文件模式(包括保護位信息) |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| 文件鏈接的數量 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| 文件所有者標識 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| 文件所屬的組 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| 文件大小(字節) |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| 創建時間 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| 最後一個修改/訪問時間 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"fstat"}]},{"type":"text","text":" 調用和 "},{"type":"codeinline","content":[{"type":"text","text":"stat"}]},{"type":"text","text":" 相同,只有一點區別,fstat 可以對打開文件進行操作,而 stat 只能對路徑進行操作。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"pipe"}]},{"type":"text","text":" 文件系統調用被用來創建 shell 管道。它會創建一系列的"},{"type":"codeinline","content":[{"type":"text","text":"僞文件"}]},{"type":"text","text":",來緩衝和管道組件之間的數據,並且返回讀取或者寫入緩衝區的文件描述符。在管道中,像是如下操作"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"shell"},"content":[{"type":"text","text":"sort 在 Linux 中,目錄和設備也表示爲文件,因爲它們具有對應的 i-node"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"超級塊和索引塊所在的文件系統都在磁盤上有對應的結構。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲了便於某些目錄操作和路徑遍歷,比如 /usr/local/cxuan,VFS 支持一個 "},{"type":"codeinline","content":[{"type":"text","text":"dentry"}]},{"type":"text","text":" 數據結構,該數據結構代表着目錄項。這個 dentry 數據結構有很多東西(http://books.gigatux.nl/mirror/kerneldevelopment/0672327201/ch12lev1sec7.html)這個數據結構由文件系統動態創建。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"目錄項被緩存在 "},{"type":"codeinline","content":[{"type":"text","text":"dentry_cache"}]},{"type":"text","text":" 緩存中。例如,緩存條目會緩存 /usr 、 /usr/local 等條目。如果多個進程通過硬連接訪問相同的文件,他們的文件對象將指向此緩存中的相同條目。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"最後,文件數據結構是代表着打開的文件,也代表着內存表示,它根據 open 系統調用創建。它支持 "},{"type":"text","marks":[{"type":"strong"}],"text":"read、write、sendfile、lock"},{"type":"text","text":" 和其他在我們之前描述的系統調用中。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在 VFS 下實現的實際文件系統不需要在內部使用完全相同的抽象和操作。 但是,它們必須在語義上實現與 VFS 對象指定的文件系統操作相同的文件系統操作。 四個 VFS 對象中每個對象的操作數據結構的元素都是指向基礎文件系統中功能的指針。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"Linux Ext2 文件系統"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"現在我們一起看一下 Linux 中最流行的一個磁盤文件系統,那就是 "},{"type":"codeinline","content":[{"type":"text","text":"ext2"}]},{"type":"text","text":" 。Linux 的第一個版本用於 "},{"type":"codeinline","content":[{"type":"text","text":"MINIX1"}]},{"type":"text","text":" 文件系統,它的文件名大小被限制爲最大 64 MB。MINIX 1 文件系統被永遠的被它的擴展系統 ext 取代,因爲 ext 允許更長的文件名和文件大小。由於 ext 的性能低下,ext 被其替代者 ext2 取代,ext2 目前仍在廣泛使用。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"一個 ext2 Linux 磁盤分區包含了一個文件系統,這個文件系統的佈局如下所示"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/f6/f6de51d62897083af706566d2cb7449b.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Boot 塊也就是第 0 塊不是讓 Linux 使用的,而是用來加載和引導計算機啓動代碼的。在塊 0 之後,磁盤分區被分成多個組,這些組與磁盤柱面邊界所處的位置無關。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"第一個塊是 "},{"type":"codeinline","content":[{"type":"text","text":"超級塊(superblock)"}]},{"type":"text","text":"。它包含有關文件系統佈局的信息,包括 i-node、磁盤塊數量和以及空閒磁盤塊列表的開始。下一個是 "},{"type":"codeinline","content":[{"type":"text","text":"組描述符(group descriptor)"}]},{"type":"text","text":",其中包含有關位圖的位置,組中空閒塊和 i-node 的數量以及組中的目錄數量的信息。這些信息很重要,因爲 ext2 會在磁盤上均勻分佈目錄。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"圖中的兩個位圖用來記錄空閒塊和空閒 i-node,這是從 MINIX 1文件系統繼承的選擇,大多數 UNIX 文件系統使用位圖而不是空閒列表。每個位圖的大小是一個塊。如果一個塊的大小是 1 KB,那麼就限制了塊組的數量是 8192 個塊和 8192 個 i-node。塊的大小是一個嚴格的限制,塊組的數量不固定,在 4KB 的塊中,塊組的數量增大四倍。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在超級塊之後分佈的是 "},{"type":"codeinline","content":[{"type":"text","text":"i-node"}]},{"type":"text","text":" 它們自己,i-node 取值範圍是 1 - 某些最大值。每個 i-node 是 128 字節的 "},{"type":"codeinline","content":[{"type":"text","text":"long"}]},{"type":"text","text":" ,這些字節恰好能夠描述一個文件。i-node 包含了統計信息(包含了 "},{"type":"codeinline","content":[{"type":"text","text":"stat"}]},{"type":"text","text":" 系統調用能獲得的所有者信息,實際上 stat 就是從 i-node 中讀取信息的),以及足夠的信息來查找保存文件數據的所有磁盤塊。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在 i-node 之後的是 "},{"type":"codeinline","content":[{"type":"text","text":"數據塊(data blocks)"}]},{"type":"text","text":"。所有的文件和目錄都保存在這。如果一個文件或者目錄包含多個塊,那麼這些塊在磁盤中的分佈不一定是連續的,也有可能不連續。事實上,大文件塊可能會被拆分成很多小塊散佈在整個磁盤上。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對應於目錄的 i-node 分散在整個磁盤組上。如果有足夠的空間,ext2 會把普通文件組織到與父目錄相同的塊組中,而把同一塊上的數據文件組織成初始 "},{"type":"codeinline","content":[{"type":"text","text":"i-node"}]},{"type":"text","text":" 節點。位圖用來快速確定新文件系統數據的分配位置。在分配新的文件塊時,ext2 也會給該文件預分配許多額外的數據塊,這樣可以減少將來向文件寫入數據時產生的文件碎片。這種策略在整個磁盤上實現了文件系統的 "},{"type":"codeinline","content":[{"type":"text","text":"負載"}]},{"type":"text","text":",後續還有對文件碎片的排列和整理,而且性能也比較好。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲了達到訪問的目的,需要首先使用 Linux 系統調用,例如 "},{"type":"codeinline","content":[{"type":"text","text":"open"}]},{"type":"text","text":",這個系統調用會確定打開文件的路徑。路徑分爲兩種,"},{"type":"codeinline","content":[{"type":"text","text":"相對路徑"}]},{"type":"text","text":" 和 "},{"type":"codeinline","content":[{"type":"text","text":"絕對路徑"}]},{"type":"text","text":"。如果使用相對路徑,那麼就會從當前目錄開始查找,否則就會從根目錄進行查找。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"目錄文件的文件名最高不能超過 255 個字符,它的分配如下圖所示"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/0b/0b3478702ee6e5e00590e4e071e57399.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"每一個目錄都由整數個磁盤塊組成,這樣目錄就可以整體的寫入磁盤。在一個目錄中,文件和子目錄的目錄項都是未經排序的,並且一個挨着一個。目錄項不能跨越磁盤塊,所以通常在每個磁盤塊的尾部會有部分未使用的字節。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上圖中每個目錄項都由四個固定長度的屬性和一個長度可變的屬性組成。第一個屬性是 "},{"type":"codeinline","content":[{"type":"text","text":"i-node"}]},{"type":"text","text":" 節點數量,文件 first 的 i-node 編號是 19 ,文件 second 的編號是 42,目錄 third 的 i-node 編號是 88。緊隨其後的是 "},{"type":"codeinline","content":[{"type":"text","text":"rec_len"}]},{"type":"text","text":" 域,表明目錄項大小是多少字節,名稱後面會有一些擴展,當名字以未知長度填充時,這個域被用來尋找下一個目錄項,直至最後的未使用。這也是圖中箭頭的含義。緊隨其後的是 "},{"type":"codeinline","content":[{"type":"text","text":"類型域"}]},{"type":"text","text":":F 表示的是文件,D 表示的是目錄,最後是固定長度的文件名,上面的文件名的長度依次是 5、6、5,最後以文件名結束。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"rec_len 域是如何擴展的呢?如下圖所示"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/d7/d718b6cc0d56253120af43a1d0923cd6.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們可以看到,中間的 "},{"type":"codeinline","content":[{"type":"text","text":"second"}]},{"type":"text","text":" 被移除了,所以將其所在的域變爲第一個目錄項的填充。當然,這個填充可以作爲後續的目錄項。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"由於目錄是按照線性的順序進行查找的,因此可能需要很長時間才能在大文件末尾找到目錄項。因此,系統會爲近期的訪問目錄維護一個緩存。這個緩存用文件名來查找,如果緩存命中,那麼就會避免線程搜索這樣昂貴的開銷。組成路徑的每個部分都在目錄緩存中保存一個 "},{"type":"codeinline","content":[{"type":"text","text":"dentry"}]},{"type":"text","text":" 對象,並且通過 i-node 找到後續的路徑元素的目錄項,直到找到真正的文件 i - node。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"比如說要使用絕對路徑來尋找一個文件,我們暫定這個路徑是 "},{"type":"codeinline","content":[{"type":"text","text":"/usr/local/file"}]},{"type":"text","text":",那麼需要經過如下幾個步驟:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"首先,系統會確定根目錄,它通常使用 2 號 i -node ,也就是索引 2 節點,因爲索引節點 1 是 ext2 /3/4 文件系統上的"},{"type":"codeinline","content":[{"type":"text","text":"壞塊"}]},{"type":"text","text":"索引節點。系統會將一項放在 dentry 緩存中,以應對將來對根目錄的查找。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"然後,在根目錄中查找字符串 "},{"type":"codeinline","content":[{"type":"text","text":"usr"}]},{"type":"text","text":",得到 /usr 目錄的 i - node 節點號。/usr 的 i - node 同樣也進入 dentry 緩存。然後節點被取出,並從中解析出磁盤塊,這樣就可以讀取 /usr 目錄並查找字符串 "},{"type":"codeinline","content":[{"type":"text","text":"local"}]},{"type":"text","text":" 了。一旦找到這個目錄項,目錄 "},{"type":"codeinline","content":[{"type":"text","text":"/usr/local"}]},{"type":"text","text":" 的 i - node 節點就可以從中獲得。有了 /usr/local 的 i - node 節點號,就可以讀取 i - node 並確定目錄所在的磁盤塊。最後,從 /usr/local 目錄查找 file 並確定其 i - node 節點呢號。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果文件存在,那麼系統會提取 i - node 節點號並把它作爲索引在 i - node 節點表中定位相應的 i - node 節點並裝入內存。i - node 被存放在 i - node "},{"type":"codeinline","content":[{"type":"text","text":"節點表(i-node table)"}]},{"type":"text","text":" 中,節點表是一個內核數據結構,它會持有當前打開文件和目錄的 i - node 節點號。下面是一些 Linux 文件系統支持的 i - node 數據結構。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| 屬性 | 字節 | 描述 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| ------ | ---- | ------------------------------------- |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| Mode | 2 | 文件屬性、保護位、setuid 和 setgid 位 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| Nlinks | 2 | 指向 i - node 節點目錄項的數目 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| Uid | 2 | 文件所有者的 UID |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| Gid | 2 | 文件所有者的 GID |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| Size | 4 | 文件字節大小 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| Addr | 60 | 12 個磁盤塊以及後面 3 個間接塊的地址 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| Gen | 1 | 每次重複使用 i - node 時增加的代號 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| Atime | 4 | 最近訪問文件的時間 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| Mtime | 4 | 最近修改文件的時間 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| Ctime | 4 | 最近更改 i - node 的時間 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"現在我們來一起探討一下文件讀取過程,還記得 "},{"type":"codeinline","content":[{"type":"text","text":"read"}]},{"type":"text","text":" 函數是如何調用的嗎?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"c"},"content":[{"type":"text","text":"n = read(fd,buffer,nbytes);"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當內核接管後,它會從這三個參數以及內部表與用戶有關的信息開始。內部表的其中一項是文件描述符數組。文件描述符數組用"},{"type":"codeinline","content":[{"type":"text","text":"文件描述符"}]},{"type":"text","text":" 作爲索引併爲每一個打開文件保存一個表項。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"文件是和 i - node 節點號相關的。那麼如何通過一個文件描述符找到文件對應的 i - node 節點呢?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這裏使用的一種設計思想是在文件描述符表和 i - node 節點表之間插入一個新的表,叫做 "},{"type":"codeinline","content":[{"type":"text","text":"打開文件描述符(open-file-description table)"}]},{"type":"text","text":"。文件的讀寫位置會在打開文件描述符表中存在,如下圖所示"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/06/06601af8d1620e1af2a95baf716cc807.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們使用 shell 、P1 和 P2 來描述一下父進程、子進程、子進程的關係。Shell 首先生成 P1,P1 的數據結構就是 Shell 的一個副本,因此兩者都指向相同的打開文件描述符的表項。當 P1 運行完成後,Shell 的文件描述符仍會指向 P1 文件位置的打開文件描述。然後 Shell 生成了 P2,新的子進程自動繼承文件的讀寫位置,甚至 P2 和 Shell 都不知道文件具體的讀寫位置。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上面描述的是父進程和子進程這兩個 "},{"type":"codeinline","content":[{"type":"text","text":"相關"}]},{"type":"text","text":" 進程,如果是一個不相關進程打開文件時,它將得到自己的打開文件描述符表項,以及自己的文件讀寫位置,這是我們需要的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":">因此,打開文件描述符相當於是給相關進程提供同一個讀寫位置,而給不相關進程提供各自私有的位置。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"i - node 包含三個間接塊的磁盤地址,它們每個指向磁盤塊的地址所能夠存儲的大小不一樣。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"Linux Ext4 文件系統"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲了防止由於系統崩潰和電源故障造成的數據丟失,ext2 系統必須在每個數據塊創建之後立即將其寫入到磁盤上,磁盤磁頭尋道操作導致的延遲是無法讓人忍受的。爲了增強文件系統的健壯性,Linux 依靠"},{"type":"codeinline","content":[{"type":"text","text":"日誌文件系統"}]},{"type":"text","text":",ext3 是一個日誌文件系統,它在 ext2 文件系統的基礎之上做了改進,ext4 也是 ext3 的改進,ext4 也是一個日誌文件系統。ext4 改變了 ext3 的塊尋址方案,從而支持更大的文件和更大的文件系統大小。下面我們就來描述一下 ext4 文件系統的特性。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"具有記錄的文件系統最基本的功能就是"},{"type":"codeinline","content":[{"type":"text","text":"記錄日誌"}]},{"type":"text","text":",這個日誌記錄了按照順序描述所有文件系統的操作。通過順序寫出文件系統數據或元數據的更改,操作不受磁盤訪問期間磁盤頭移動的開銷。最終,這個變更會寫入並提交到合適的磁盤位置上。如果這個變更在提交到磁盤前文件系統宕機了,那麼在重啓期間,系統會檢測到文件系統未正確卸載,那麼就會遍歷日誌並應用日誌的記錄來對文件系統進行更改。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Ext4 文件系統被設計用來高度匹配 ext2 和 ext3 文件系統的,儘管 ext4 文件系統在內核數據結構和磁盤佈局上都做了變更。儘管如此,一個文件系統能夠從 ext2 文件系統上卸載後成功的掛載到 ext4 文件系統上,並提供合適的日誌記錄。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"日誌是作爲循環緩衝區管理的文件。日誌可以存儲在與主文件系統相同或者不同的設備上。日誌記錄的讀寫操作會由單獨的 "},{"type":"codeinline","content":[{"type":"text","text":"JBD(Journaling Block Device)"}]},{"type":"text","text":" 來扮演。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"JBD 中有三個主要的數據結構,分別是 "},{"type":"text","marks":[{"type":"strong"}],"text":"log record(日誌記錄)、原子操作和事務"},{"type":"text","text":"。一個日誌記錄描述了一個低級別的文件系統操作,這個操作通常導致塊內的變化。因爲像是 "},{"type":"codeinline","content":[{"type":"text","text":"write"}]},{"type":"text","text":" 這種系統調用會包含多個地方的改動 --- i - node 節點,現有的文件塊,新的文件塊和空閒列表等。相關的日誌記錄會以原子性的方式分組。ext4 會通知系統調用進程的開始和結束,以此使 JBD 能夠確保原子操作的記錄都能被應用,或者一個也不被應用。最後,主要從效率方面考慮,JBD 會視原子操作的集合爲事務。一個事務中的日誌記錄是連續存儲的。只有在所有的變更一起應用到磁盤後,日誌記錄才能夠被丟棄。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"由於爲每個磁盤寫出日誌的開銷會很大,所以 ext4 可以配置爲保留所有磁盤更改的日誌,或者僅僅保留與文件系統元數據相關的日誌更改。僅僅記錄元數據可以減少系統開銷,提升性能,但不能保證不會損壞文件數據。其他的幾個日誌系統維護着一系列元數據操作的日誌,例如 SGI 的 XFS。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"/proc 文件系統"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"另外一個 Linux 文件系統是 "},{"type":"codeinline","content":[{"type":"text","text":"/proc"}]},{"type":"text","text":" (process) 文件系統"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":">它的主要思想來源於貝爾實驗室開發的第 8 版的 UNIX,後來被 BSD 和 System V 採用。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"然而,Linux 在一些方面上對這個想法進行了擴充。它的基本概念是爲系統中的每個進程在 "},{"type":"codeinline","content":[{"type":"text","text":"/proc"}]},{"type":"text","text":" 中創建一個目錄。目錄的名字就是進程 PID,以十進制數進行表示。例如,"},{"type":"codeinline","content":[{"type":"text","text":"/proc/1024"}]},{"type":"text","text":" 就是一個進程號爲 1024 的目錄。在該目錄下是進程信息相關的文件,比如進程的命令行、環境變量和信號掩碼等。事實上,這些文件在磁盤上並不存在磁盤中。當需要這些信息的時候,系統會按需從進程中讀取,並以標準格式返回給用戶。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"許多 Linux 擴展與 "},{"type":"codeinline","content":[{"type":"text","text":"/proc"}]},{"type":"text","text":" 中的其他文件和目錄有關。它們包含各種各樣的關於 CPU、磁盤分區、設備、中斷向量、內核計數器、文件系統、已加載模塊等信息。非特權用戶可以讀取很多這樣的信息,於是就可以通過一種安全的方式瞭解系統情況。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"NFS 網絡文件系統"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"從一開始,網絡就在 Linux 中扮演了很重要的作用。下面我們會探討一下 "},{"type":"codeinline","content":[{"type":"text","text":"NFS(Network File System)"}]},{"type":"text","text":" 網絡文件系統,它在現代 Linux 操作系統的作用是將不同計算機上的不同文件系統鏈接成一個邏輯整體。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"NFS 架構"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"NFS 最基本的思想是允許任意選定的一些"},{"type":"codeinline","content":[{"type":"text","text":"客戶端"}]},{"type":"text","text":"和"},{"type":"codeinline","content":[{"type":"text","text":"服務器"}]},{"type":"text","text":"共享一個公共文件系統。在許多情況下,所有的客戶端和服務器都會在同一個 "},{"type":"codeinline","content":[{"type":"text","text":"LAN(Local Area Network)"}]},{"type":"text","text":" 局域網內共享,但是這並不是必須的。也可能是下面這樣的情況:如果客戶端和服務器距離較遠,那麼它們也可以在廣域網上運行。客戶端可以是服務器,服務器可以是客戶端,但是爲了簡單起見,我們說的客戶端就是消費服務,而服務器就是提供服務的角度來聊。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"每一個 NFS 服務都會導出一個或者多個目錄供遠程客戶端訪問。當一個目錄可用時,它的所有子目錄也可用。因此,通常整個目錄樹都會作爲一個整體導出。服務器導出的目錄列表會用一個文件來維護,這個文件是 "},{"type":"codeinline","content":[{"type":"text","text":"/etc/exports"}]},{"type":"text","text":",當服務器啓動後,這些目錄可以自動的被導出。客戶端通過掛載這些導出的目錄來訪問它們。當一個客戶端掛載了一個遠程目錄,這個目錄就成爲客戶端目錄層次的一部分,如下圖所示。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/92/92665e77fb78ca55313c89c5ffa65996.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在這個示例中,一號客戶機掛載到服務器的 bin 目錄下,因此它現在可以使用 shell 訪問 /bin/cat 或者其他任何一個目錄。同樣,客戶機 1 也可以掛載到 二號服務器上從而訪問 /usr/local/projects/proj1 或者其他目錄。二號客戶機同樣可以掛載到二號服務器上,訪問路徑是 /mnt/projects/proj2。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"從上面可以看到,由於不同的客戶端將文件掛載到各自目錄樹的不同位置,同一個文件在不同的客戶端有不同的訪問路徑和不同的名字。掛載點一般通常在客戶端本地,服務器不知道任何一個掛載點的存在。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"NFS 協議"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"由於 NFS 的協議之一是支持 "},{"type":"codeinline","content":[{"type":"text","text":"異構"}]},{"type":"text","text":" 系統,客戶端和服務器可能在不同的硬件上運行不同的操作系統,因此有必要在服務器和客戶端之間進行接口定義。這樣才能讓任何寫一個新客戶端能夠和現有的服務器一起正常工作,反之亦然。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"NFS 就通過定義兩個客戶端 - 服務器協議從而實現了這個目標。協議就是客戶端發送給服務器的一連串的請求,以及服務器發送回客戶端的相應答覆。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"第一個 NFS 協議是處理掛載。客戶端可以向服務器發送路徑名並且請求服務器是否能夠將服務器的目錄掛載到自己目錄層次上。因爲服務器不關心掛載到哪裏,因此請求不會包含掛載地址。如果路徑名是合法的並且指定的目錄已經被導出,那麼服務器會將文件 "},{"type":"codeinline","content":[{"type":"text","text":"句柄"}]},{"type":"text","text":" 返回給客戶端。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":">文件句柄包含唯一標識文件系統類型,磁盤,目錄的i節點號和安全性信息的字段。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"隨後調用讀取和寫入已安裝目錄或其任何子目錄中的文件,都將使用文件句柄。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當 Linux 啓動時會在多用戶之前運行 shell 腳本 /etc/rc 。可以將掛載遠程文件系統的命令寫入該腳本中,這樣就可以在允許用戶登陸之前自動掛載必要的遠程文件系統。大部分 Linux 版本是支持"},{"type":"codeinline","content":[{"type":"text","text":"自動掛載"}]},{"type":"text","text":"的。這個特性會支持將遠程目錄和本地目錄進行關聯。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"相對於手動掛載到 /etc/rc 目錄下,自動掛載具有以下優勢"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果列出的 /etc/rc 目錄下出現了某種故障,那麼客戶端將無法啓動,或者啓動會很困難、延遲或者伴隨一些出錯信息,如果客戶根本不需要這個服務器,那麼手動做了這些工作就白費了。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"允許客戶端並行的嘗試一組服務器,可以實現一定程度的容錯率,並且性能也可以得到提高。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"另一方面,我們默認在自動掛載時所有可選的文件系統都是相同的。由於 NFS 不提供對文件或目錄複製的支持,用戶需要自己確保這些所有的文件系統都是相同的。因此,大部分的自動掛載都只應用於二進制文件和很少改動的只讀的文件系統。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"第二個 NFS 協議是爲文件和目錄的訪問而設計的。客戶端能夠通過向服務器發送消息來操作目錄和讀寫文件。客戶端也可以訪問文件屬性,比如文件模式、大小、上次修改時間。NFS 支持大多數的 Linux 系統調用,但是 open 和 close 系統調用卻不支持。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"不支持 open 和 close 並不是一種疏忽,而是一種刻意的設計,完全沒有必要在讀一個文件之前對其進行打開,也沒有必要在讀完時對其進行關閉。"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"NFS 使用了標準的 UNIX 保護機制,使用 "},{"type":"codeinline","content":[{"type":"text","text":"rwx"}]},{"type":"text","text":" 位來標示"},{"type":"codeinline","content":[{"type":"text","text":"所有者(owner)"}]},{"type":"text","text":"、"},{"type":"codeinline","content":[{"type":"text","text":"組(groups)"}]},{"type":"text","text":"、"},{"type":"codeinline","content":[{"type":"text","text":"其他用戶"}]},{"type":"text","text":" 。最初,每個請求消息都會攜帶調用者的 groupId 和 userId,NFS 會對其進行驗證。事實上,它會信任客戶端不會發生欺騙行爲。可以使用公鑰密碼來創建一個安全密鑰,在每次請求和應答中使用它驗證客戶端和服務器。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"NFS 實現"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"即使客戶端和服務器的代碼實現是獨立於 NFS 協議的,大部分的 Linux 系統會使用一個下圖的三層實現,頂層是系統調用層,系統調用層能夠處理 open 、 read 、 close 這類的系統調用。在解析和參數檢查結束後調用第二層,"},{"type":"codeinline","content":[{"type":"text","text":"虛擬文件系統 (VFS)"}]},{"type":"text","text":" 層。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ca/ca59a3cfc47fb8b4c0a478e4748ac213.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"VFS 層的任務是維護一個表,每個已經打開的文件都在表中有一個表項。VFS 層爲每一個打開的文件維護着一個"},{"type":"codeinline","content":[{"type":"text","text":"虛擬i節點 "}]},{"type":"text","text":",簡稱爲 v - node。v 節點用來說明文件是本地文件還是遠程文件。如果是遠程文件的話,那麼 v - node 會提供足夠的信息使客戶端能夠訪問它們。對於本地文件,會記錄其所在的文件系統和文件的 i-node ,因爲現代操作系統能夠支持多文件系統。雖然 VFS 是爲了支持 NFS 而設計的,但是現代操作系統都會使用 VFS,而不管有沒有 NFS。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Linux IO"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們之前瞭解過了 Linux 的進程和線程、Linux 內存管理,那麼下面我們就來認識一下 Linux 中的 I/O 管理。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Linux 系統和其他 UNIX 系統一樣,IO 管理比較直接和簡潔。所有 IO 設備都被當作"},{"type":"codeinline","content":[{"type":"text","text":"文件"}]},{"type":"text","text":",通過在系統內部使用相同的 read 和 write 一樣進行讀寫。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Linux IO 基本概念"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Linux 中也有磁盤、打印機、網絡等 I/O 設備,Linux 把這些設備當作一種 "},{"type":"codeinline","content":[{"type":"text","text":"特殊文件"}]},{"type":"text","text":" 整合到文件系統中,一般通常位於 "},{"type":"codeinline","content":[{"type":"text","text":"/dev"}]},{"type":"text","text":" 目錄下。可以使用與普通文件相同的方式來對待這些特殊文件。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"特殊文件一般分爲兩種:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"塊特殊文件是一個能存儲"},{"type":"codeinline","content":[{"type":"text","text":"固定大小塊"}]},{"type":"text","text":"信息的設備,它支持"},{"type":"text","marks":[{"type":"strong"}],"text":"以固定大小的塊,扇區或羣集讀取和(可選)寫入數據"},{"type":"text","text":"。每個塊都有自己的"},{"type":"codeinline","content":[{"type":"text","text":"物理地址"}]},{"type":"text","text":"。通常塊的大小在 512 - 65536 之間。所有傳輸的信息都會以"},{"type":"codeinline","content":[{"type":"text","text":"連續"}]},{"type":"text","text":"的塊爲單位。塊設備的基本特徵是每個塊都較爲對立,能夠獨立的進行讀寫。常見的塊設備有 "},{"type":"text","marks":[{"type":"strong"}],"text":"硬盤、藍光光盤、USB 盤"},{"type":"text","text":"與字符設備相比,塊設備通常需要較少的引腳。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/51/519fa57ce8422c8d38736c8b87978532.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"塊特殊文件的缺點基於給定固態存儲器的塊設備比基於相同類型的存儲器的字節尋址要慢一些,因爲必須在塊的開頭開始讀取或寫入。所以,要讀取該塊的任何部分,必須尋找到該塊的開始,讀取整個塊,如果不使用該塊,則將其丟棄。要寫入塊的一部分,必須尋找到塊的開始,將整個塊讀入內存,修改數據,再次尋找到塊的開頭處,然後將整個塊寫回設備。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"另一類 I/O 設備是"},{"type":"codeinline","content":[{"type":"text","text":"字符特殊文件"}]},{"type":"text","text":"。字符設備以"},{"type":"codeinline","content":[{"type":"text","text":"字符"}]},{"type":"text","text":"爲單位發送或接收一個字符流,而不考慮任何塊結構。字符設備是不可尋址的,也沒有任何尋道操作。常見的字符設備有 "},{"type":"text","marks":[{"type":"strong"}],"text":"打印機、網絡設備、鼠標、以及大多數與磁盤不同的設備"},{"type":"text","text":"。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c1/c1ef0e8bc51187ee3a167a23b2c29c9b.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"每個設備特殊文件都會和 "},{"type":"codeinline","content":[{"type":"text","text":"設備驅動"}]},{"type":"text","text":" 相關聯。每個驅動程序都通過一個 "},{"type":"codeinline","content":[{"type":"text","text":"主設備號"}]},{"type":"text","text":" 來標識。如果一個驅動支持多個設備的話,此時會在主設備的後面新加一個 "},{"type":"codeinline","content":[{"type":"text","text":"次設備號"}]},{"type":"text","text":" 來標識。主設備號和次設備號共同確定了唯一的驅動設備。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們知道,在計算機系統中,CPU 並不直接和設備打交道,它們中間有一個叫作 "},{"type":"codeinline","content":[{"type":"text","text":"設備控制器(Device Control Unit)"}]},{"type":"text","text":"的組件,例如硬盤有磁盤控制器、USB 有 USB 控制器、顯示器有視頻控制器等。這些控制器就像代理商一樣,它們知道如何應對硬盤、鼠標、鍵盤、顯示器的行爲。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"絕大多數字符特殊文件都不能隨機訪問,因爲他們需要使用和塊特殊文件不同的方式來控制。比如,你在鍵盤上輸入了一些字符,但是你發現輸錯了一個,這時有一些人喜歡使用 "},{"type":"codeinline","content":[{"type":"text","text":"backspace"}]},{"type":"text","text":" 來刪除,有人喜歡用 "},{"type":"codeinline","content":[{"type":"text","text":"del"}]},{"type":"text","text":" 來刪除。爲了中斷正在運行的設備,一些系統使用 "},{"type":"codeinline","content":[{"type":"text","text":"ctrl-u"}]},{"type":"text","text":" 來結束,但是現在一般使用 "},{"type":"codeinline","content":[{"type":"text","text":"ctrl-c"}]},{"type":"text","text":" 來結束。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"網絡"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"I/O 的另外一個概念是"},{"type":"codeinline","content":[{"type":"text","text":"網絡"}]},{"type":"text","text":", 也是由 UNIX 引入,網絡中一個很關鍵的概念就是 "},{"type":"codeinline","content":[{"type":"text","text":"套接字(socket)"}]},{"type":"text","text":"。套接字允許用戶連接到網絡,正如郵筒允許用戶連接到郵政系統,套接字的示意圖如下"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/7d/7dc8388128704e61ce8327e6fbb575fe.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"套接字的位置如上圖所示,套接字可以動態創建和銷燬。成功創建一個套接字後,系統會返回一個"},{"type":"codeinline","content":[{"type":"text","text":"文件描述符(file descriptor)"}]},{"type":"text","text":",在後面的創建鏈接、讀數據、寫數據、解除連接時都需要使用到這個文件描述符。每個套接字都支持一種特定類型的網絡類型,在創建時指定。一般最常用的幾種"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"可靠的面向連接的字節流"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"可靠的面向連接的數據包"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"不可靠的數據包傳輸"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"可靠的面向連接的字節流會使用"},{"type":"codeinline","content":[{"type":"text","text":"管道"}]},{"type":"text","text":" 在兩臺機器之間建立連接。能夠保證字節從一臺機器按照順序到達另一臺機器,系統能夠保證所有字節都能到達。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"除了數據包之間的分界之外,第二種類型和第一種類型是類似的。如果發送了 3 次寫操作,那麼使用第一種方式的接受者會直接接收到所有字節;第二種方式的接受者會分 3 次接受所有字節。除此之外,用戶還可以使用第三種即不可靠的數據包來傳輸,使用這種傳輸方式的優點在於高性能,有的時候它比可靠性更加重要,比如在流媒體中,性能就尤其重要。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"以上涉及兩種形式的傳輸協議,即 "},{"type":"codeinline","content":[{"type":"text","text":"TCP"}]},{"type":"text","text":" 和 "},{"type":"codeinline","content":[{"type":"text","text":"UDP"}]},{"type":"text","text":",TCP 是 "},{"type":"codeinline","content":[{"type":"text","text":"傳輸控制協議"}]},{"type":"text","text":",它能夠傳輸可靠的字節流。"},{"type":"codeinline","content":[{"type":"text","text":"UDP"}]},{"type":"text","text":" 是 "},{"type":"codeinline","content":[{"type":"text","text":"用戶數據報協議"}]},{"type":"text","text":",它只能夠傳輸不可靠的字節流。它們都屬於 TCP/IP 協議簇中的協議,下面是網絡協議分層"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/4c/4ccdc3777099e0aba3bc0aa116f232a9.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"可以看到,TCP 、UDP 都位於網絡層上,可見它們都把 IP 協議 即 "},{"type":"codeinline","content":[{"type":"text","text":"互聯網協議"}]},{"type":"text","text":" 作爲基礎。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"一旦套接字在源計算機和目的計算機建立成功,那麼兩個計算機之間就可以建立一個鏈接。通信一方在本地套接字上使用 "},{"type":"codeinline","content":[{"type":"text","text":"listen"}]},{"type":"text","text":" 系統調用,它就會創建一個緩衝區,然後阻塞直到數據到來。另一方使用 "},{"type":"codeinline","content":[{"type":"text","text":"connect"}]},{"type":"text","text":" 系統調用,如果另一方接受 connect 系統調用後,則系統會在兩個套接字之間建立連接。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"socket 連接建立成功後就像是一個管道,一個進程可以使用本地套接字的文件描述符從中讀寫數據,當連接不再需要的時候使用 "},{"type":"codeinline","content":[{"type":"text","text":"close"}]},{"type":"text","text":" 系統調用來關閉。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Linux I/O 系統調用"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Linux 系統中的每個 I/O 設備都有一個"},{"type":"codeinline","content":[{"type":"text","text":"特殊文件(special file)"}]},{"type":"text","text":"與之關聯,什麼是特殊文件呢?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在操作系統中,特殊文件是一種在文件系統中與硬件設備相關聯的文件。特殊文件也被稱爲 "},{"type":"codeinline","content":[{"type":"text","text":"設備文件(device file)"}]},{"type":"text","text":"。特殊文件的目的是將設備作爲文件系統中的文件進行公開。特殊文件爲硬件設備提供了藉口,用於文件 I/O 的工具可以進行訪問。因爲設備有兩種類型,同樣特殊文件也有兩種,即字符特殊文件和塊特殊文件"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對於大部分 I/O 操作來說,只用合適的文件就可以完成,並不需要特殊的系統調用。然後,有時需要一些設備專用的處理。在 POSIX 之前,大多數 UNIX 系統會有一個叫做 "},{"type":"codeinline","content":[{"type":"text","text":"ioctl"}]},{"type":"text","text":" 的系統調用,它用於執行大量的系統調用。隨着時間的發展,POSIX 對其進行了整理,把 ioctl 的功能劃分爲面向終端設備的獨立功能調用,現在已經變成獨立的系統調用了。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"下面是幾個管理終端的系統調用"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| 系統調用 | 描述 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| ----------- | ------------ |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| tcgetattr | 獲取屬性 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| tcsetattr | 設置屬性 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| cfgetispeed | 獲取輸入速率 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| cfgetospeed | 獲取輸出速率 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| cfsetispeed | 設置輸入速率 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| cfsetospeed | 設置輸出速率 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Linux IO 實現"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Linux 中的 IO 是通過一系列設備驅動實現的,每個設備類型對應一個設備驅動。設備驅動爲操作系統和硬件分別預留接口,通過設備驅動來屏蔽操作系統和硬件的差異。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當用戶訪問一個特殊的文件時,由文件系統提供此特殊文件的主設備號和次設備號,並判斷它是一個塊特殊文件還是字符特殊文件。主設備號用於標識字符設備還是塊設備,次設備號用於參數傳遞。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"每個"},{"type":"codeinline","content":[{"type":"text","text":"驅動程序"}]},{"type":"text","text":" 都有兩部分:這兩部分都是屬於 Linux 內核,也都運行在內核態下。上半部分運行在調用者上下文並且與 Linux 其他部分交互。下半部分運行在內核上下文並且與設備進行交互。驅動程序可以調用內存分配、定時器管理、DMA 控制等內核過程。可被調用的內核功能都位於 "},{"type":"codeinline","content":[{"type":"text","text":"驅動程序 - 內核接口"}]},{"type":"text","text":" 的文檔中。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"I/O 實現指的就是對字符設備和塊設備的實現"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"塊設備實現"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"系統中處理塊特殊文件 I/O 部分的目標是爲了使傳輸次數儘可能的小。爲了實現這個目標,Linux 系統在磁盤驅動程序和文件系統之間設置了一個 "},{"type":"codeinline","content":[{"type":"text","text":"高速緩存(cache)"}]},{"type":"text","text":" ,如下圖所示"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ee/ee239ae3af204448ea5c223908dc70ca.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在 Linux 內核 2.2 之前,Linux 系統維護着兩個緩存:"},{"type":"codeinline","content":[{"type":"text","text":"頁面緩存(page cache)"}]},{"type":"text","text":" 和 "},{"type":"codeinline","content":[{"type":"text","text":"緩衝區緩存(buffer cache)"}]},{"type":"text","text":",因此,存儲在一個磁盤塊中的文件可能會在兩個緩存中。2.2 版本以後 Linux 內核只有一個統一的緩存一個 "},{"type":"codeinline","content":[{"type":"text","text":"通用數據塊層(generic block layer)"}]},{"type":"text","text":" 把這些融合在一起,實現了磁盤、數據塊、緩衝區和數據頁之間必要的轉換。那麼什麼是通用數據塊層?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":">通用數據塊層是一個內核的組成部分,用於處理對系統中所有塊設備的請求。通用數據塊主要有以下幾個功能"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":">"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":">將數據緩衝區放在內存高位處,當 CPU 訪問數據時,頁面纔會映射到內核線性地址中,並且此後取消映射"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":">"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":">實現 "},{"type":"codeinline","content":[{"type":"text","text":"零拷貝"}]},{"type":"text","text":"機制,磁盤數據可以直接放入用戶模式的地址空間,而無需先複製到內核內存中"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":">"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":">管理磁盤卷,會把不同塊設備上的多個磁盤分區視爲一個分區。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":">"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":">利用最新的磁盤控制器的高級功能,例如 DMA 等。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"cache 是提升性能的利器,不管以什麼樣的目的需要一個數據塊,都會先從 cache 中查找,如果找到直接返回,避免一次磁盤訪問,能夠極大的提升系統性能。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果頁面 cache 中沒有這個塊,操作系統就會把頁面從磁盤中調入內存,然後讀入 cache 進行緩存。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"cache 除了支持讀操作外,也支持寫操作,一個程序要寫回一個塊,首先把它寫到 cache 中,而不是直接寫入到磁盤中,等到磁盤中緩存達到一定數量值時再被寫入到 cache 中。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Linux 系統中使用 "},{"type":"codeinline","content":[{"type":"text","text":"IO 調度器"}]},{"type":"text","text":" 來保證減少磁頭的反覆移動從而減少損失。I/O 調度器的作用是對塊設備的讀寫操作進行排序,對讀寫請求進行合併。Linux 有許多調度器的變體,從而滿足不同的工作需要。最基本的 Linux 調度器是基於傳統的 "},{"type":"codeinline","content":[{"type":"text","text":"Linux 電梯調度器(Linux elevator scheduler)"}]},{"type":"text","text":"。Linux 電梯調度器的主要工作流程就是按照磁盤扇區的地址排序並存儲在一個"},{"type":"codeinline","content":[{"type":"text","text":"雙向鏈表"}]},{"type":"text","text":" 中。新的請求將會以鏈表的形式插入。這種方法可以有效的防止磁頭重複移動。因爲電梯調度器會容易產生飢餓現象。因此,Linux 在原基礎上進行了修改,維護了兩個鏈表,在 "},{"type":"codeinline","content":[{"type":"text","text":"最後日期(deadline)"}]},{"type":"text","text":" 內維護了排序後的讀寫操作。默認的讀操作耗時 0.5s,默認寫操作耗時 5s。如果在最後期限內等待時間最長的鏈表沒有獲得服務,那麼它將優先獲得服務。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"字符設備實現"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"和字符設備的交互是比較簡單的。由於字符設備會產生並使用字符流、字節數據,因此對隨機訪問的支持意義不大。一個例外是使用 "},{"type":"codeinline","content":[{"type":"text","text":"行規則(line disciplines)"}]},{"type":"text","text":"。一個行規可以和終端設備相關聯,使用 "},{"type":"codeinline","content":[{"type":"text","text":"tty_struct"}]},{"type":"text","text":" 結構來表示,它表示與終端設備交換數據的解釋器,當然這也屬於內核的一部分。例如:行規可以對行進行編輯,映射回車爲換行等一系列其他操作。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"什麼是行規則?"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":">"}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"行規是某些類 UNIX 系統中的一層,終端子系統通常由三層組成:上層提供字符設備接口,下層硬件驅動程序與硬件或僞終端進行交互,中層規則用於實現終端設備共有的行爲。"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/42/42487756f2a82ce1a2f8049ff68d0531.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"網絡設備實現"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"網絡設備的交互是不一樣的,雖然 "},{"type":"codeinline","content":[{"type":"text","text":"網絡設備(network devices)"}]},{"type":"text","text":" 也會產生字符流,因爲它們的"},{"type":"codeinline","content":[{"type":"text","text":"異步(asynchronous)"}]},{"type":"text","text":" 特性是他們不易與其他字符設備在同一接口下集成。網絡設備驅動程序會產生很多數據包,經由網絡協議到達用戶應用程序中。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Linux 中的模塊"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"UNIX 設備驅動程序是被"},{"type":"codeinline","content":[{"type":"text","text":"靜態加載"}]},{"type":"text","text":"到內核中的。因此,只要系統啓動後,設備驅動程序都會被加載到內存中。隨着個人電腦 Linux 的出現,這種靜態鏈接完成後會使用一段時間的模式被打破。相對於小型機上的 I/O 設備,PC 上可用的 I/O 設備有了數量級的增長。絕大多數用戶沒有能力去添加一個新的應用程序、更新設備驅動、重新連接內核,然後進行安裝。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Linux 爲了解決這個問題,引入了 "},{"type":"codeinline","content":[{"type":"text","text":"可加載(loadable module)"}]},{"type":"text","text":" 機制。可加載是在系統運行時添加到內核中的代碼塊。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當一個模塊被加載到內核時,會發生下面幾件事情:第一,在加載的過程中,模塊會被動態的重新部署。第二,系統會檢查程序程序所需的資源是否可用。如果可用,則把這些資源標記爲正在使用。第三步,設置所需的中斷向量。第四,更新驅動轉換表使其能夠處理新的主設備類型。最後再來運行設備驅動程序。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在完成上述工作後,驅動程序就會安裝完成,其他現代 UNIX 系統也支持可加載機制。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Linux 安全"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Linux 作爲 MINIX 和 UNIX 的衍生操作系統,從一開始就是一個"},{"type":"codeinline","content":[{"type":"text","text":"多用戶"}]},{"type":"text","text":"系統。這意味着 Linux 從早期開始就建立了安全和信息訪問控制機制。下面我們主要探討的就是 Linux 安全性的一些內容"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Linux 安全基本概念"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"一個 Linux 系統的用戶羣裏由一系列註冊用戶組成,他們每一個都有一個唯一的 UID (User ID)。一個 UID 是一個位於 0 到 65535 之間的整數。文件(進程或者是其他資源)都標記了它的所有者的 UID。默認情況下,文件的所有者是創建文件的人,文件的所有者是創建文件的用戶。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"用戶可以被分成許多組,每個組都會由一個 16 位的整數標記,這個組叫做 "},{"type":"codeinline","content":[{"type":"text","text":"GID(組 ID)"}]},{"type":"text","text":"。給用戶分組是手動完成的,它由系統管理員執行,分組就是在數據庫中添加一條記錄指明哪個用戶屬於哪個組。一個用戶可以屬於不同組。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Linux 中的基本安全機制比較容易理解,每個進程都會記錄它所有者的 UID 和 GID。當文件創建後,它會獲取創建進程的 UID 和 GID。當一個文件被創建時,它的 UID 和 GID 就會被標記爲進程的 UID 和 GID。這個文件同時會獲取由該進程決定的一些權限。這些權限會指定所有者、所有者所在組的其他用戶及其他用戶對文件具有什麼樣的訪問權限。對於這三類用戶而言,潛在的訪問權限是 "},{"type":"text","marks":[{"type":"strong"}],"text":"讀、寫和執行"},{"type":"text","text":",分別由 r、w 和 x 標記。當然,執行文件的權限僅當文件時可逆二進制程序時纔有意義。試圖執行一個擁有執行權限的非可執行文件,系統會報錯。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Linux 用戶分爲三種"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"root(超級管理員)"}]},{"type":"text","text":",它的 UID 爲 0,這個用戶有極大的權限,可以直接無視很多的限制 ,包括讀寫執行的權限。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"系統用戶"}]},{"type":"text","text":",UID 爲 1~499。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"普通用戶"}]},{"type":"text","text":",UID 範圍一般是 500~65534。這類用戶的權限會受到基本權限的限制,也會受到來自管理員的限制。不過要注意 nobody 這個特殊的帳號,UID 爲 65534,這個用戶的權限會進一步的受到限制,一般用於實現來賓帳號。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Linux 中的每類用戶由 3 個比特爲來標記,所以 9 個比特位就能夠表示所有的權限。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"下面來看一下一些基本的用戶和權限例子"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| 二進制 | 標記 | 准許的文件訪問權限 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| --------- | --------- | ------------------------------ |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| 111000000 | rwx------ | 所有者可讀、寫和執行 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| 111111000 | rwxrwx--- | 所有者和組可以讀、寫和執行 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| 111111111 | rwxrwxrwx | 所有人可以讀、寫和執行 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| 000000000 | --------- | 任何人不擁有任何權限 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| 000000111 | ------rwx | 只有組以外的其他用戶擁有所有權 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| 110100100 | rw-r--r-- | 所有者可以讀和寫,其他人可以讀 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| 110100100 | rw-r----- | 所有者可以讀和寫,組可以讀 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們上面提到,UID 爲 0 的是一個特殊用戶,稱爲 "},{"type":"codeinline","content":[{"type":"text","text":"超級用戶(或者根用戶)"}]},{"type":"text","text":"。超級用戶能夠讀和寫系統中的任何文件,不管這個文件由誰所有,也不管這個文件的保護模式如何。 UID 爲 0 的進程還具有少數調用受保護系統調用的權限,而普通用戶是不可能有這些功能的。通常情況下,只有系統管理員知道超級用戶的密碼。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在 Linux 系統下,目錄也是一種文件,並且具有和普通文件一樣的保護模式。不同的是,目錄的 x 比特位表示查找權限而不是執行權限。因此,如果一個目錄的保護模式是 "},{"type":"codeinline","content":[{"type":"text","text":"rwxr-xr-x"}]},{"type":"text","text":",那麼它允許所有者讀、寫和查找目錄,而其他人只可以讀和查找,而不允許從中添加或者刪除目錄中的文件。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"與 I/O 有關的特殊文件擁有和普通文件一樣的保護位。這種機制可以用來限制對 I/O 設備的訪問權限。舉個例子,打印機是特殊文件,它的目錄是 "},{"type":"codeinline","content":[{"type":"text","text":"/dev/lp"}]},{"type":"text","text":",它可以被根用戶或者一個叫守護進程的特殊用戶擁有,具有保護模式 rw-------,從而阻止其他所有人對打印機的訪問。畢竟每個人都使用打印機的話會發生混亂。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當然,如果 /dev/lp 的保護模式是 rw-------,那就意味着其他任何人都不能使用打印機。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這個問題通過增加一個保護位 "},{"type":"codeinline","content":[{"type":"text","text":"SETUID"}]},{"type":"text","text":" 到之前的 9 個比特位來解決。當一個進程的 SETUID 位打開,它的 "},{"type":"codeinline","content":[{"type":"text","text":"有效 UID"}]},{"type":"text","text":" 將變成相應可執行文件的所有者 UID,而不是當前使用該進程的用戶的 UID。將訪問打印機的程序設置爲守護進程所有,同時打開 SETUID 位,這樣任何用戶都可以執行此程序,而且擁有守護進程的權限。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"除了 SETUID 之外,還有一個 SETGID 位,SETGID 的工作原理和 SETUID 類似。但是這個位一般很不常用。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Linux 安全相關的系統調用"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Linux 中關於安全的系統調用不是很多,只有幾個,如下列表所示"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| 系統調用 | 描述 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| -------- | ---------------------------------- |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| chmod | 改變文件的保護模式 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| access | 使用真實的 UID 和 GID 測試訪問權限 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| chown | 改變所有者和組 |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| setuid | 設置 UID |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| setgid | 設置 GID |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| getuid | 獲取真實的 UID |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| getgid | 獲取真實的 GID |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| geteuid | 獲取有效的 UID |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"| getegid | 獲取有效的 GID |"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們在日常開發中用到最多的就是 "},{"type":"codeinline","content":[{"type":"text","text":"chmod"}]},{"type":"text","text":"了,沒想到我們日常開發過程中也能用到系統調用啊,chmod 之前我們一直認爲是改變權限,現在專業一點是改變文件的保護模式。它的具體函數如下"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"shell"},"content":[{"type":"text","text":"s = chmod(\"路徑名\",\"值\");"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"例如 "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"shell"},"content":[{"type":"text","text":"s = chmod(\"/usr/local/cxuan\",777);"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"他就是會把 "},{"type":"codeinline","content":[{"type":"text","text":"/usr/local/cxuan"}]},{"type":"text","text":" 這個路徑的保護模式改爲 rwxrwxrwx,任何組和人都可以操作這個路徑。只有該文件的所有者和超級用戶纔有權利更改保護模式。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"access"}]},{"type":"text","text":" 系統調用用來檢驗實際的 UID 和 GID 對某文件是否擁有特定的權限。下面就是四個 getxxx 的系統調用,這些用來獲取 uid 和 gid 的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":">注意:其中的 chown、setuid 和 setgid 是超級用戶才能使用,用來改變所有者進程的 UID 和 GID。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Linux 安全實現"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當用戶登錄時,登錄程序,也被稱爲 "},{"type":"codeinline","content":[{"type":"text","text":"login"}]},{"type":"text","text":",會要求輸入用戶名和密碼。它會對密碼進行哈希處理,然後在 "},{"type":"codeinline","content":[{"type":"text","text":"/etc/passwd"}]},{"type":"text","text":" 中進行查找,看看是否有匹配的項。使用哈希的原因是防止密碼在系統中以非加密的方式存在。如果密碼正確,登錄程序會在 /etc/passwd 中讀取用戶選擇的 shell 程序的名稱,有可能是 "},{"type":"codeinline","content":[{"type":"text","text":"bash"}]},{"type":"text","text":",有可能是 "},{"type":"codeinline","content":[{"type":"text","text":"shell"}]},{"type":"text","text":" 或者其他的 "},{"type":"codeinline","content":[{"type":"text","text":"csh"}]},{"type":"text","text":" 或 "},{"type":"codeinline","content":[{"type":"text","text":"ksh"}]},{"type":"text","text":"。然後登錄程序使用 setuid 和 setgid 這兩個系統調用來把自己的 UID 和 GID 變爲用戶的 UID 和 GID,然後它打開鍵盤作爲標準輸入、標準輸入的文件描述符是 0 ,屏幕作爲標準輸出,文件描述符是 1 ,屏幕也作爲標準錯誤輸出,文件描述符爲 2。最後,執行用戶選擇的 shell 程序,終止。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當任何進程想要打開一個文件,系統首先將文件的 i - node 所記錄的保護位與用戶有效 UID 和 有效 GID 進行對比,來檢查訪問是否允許。如果訪問允許,就打開文件並返回文件描述符;否則不打開文件,返回 - 1。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Linux 安全模型和實現在本質上與大多數傳統的 UNIX 系統相同。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"關注公衆號 程序員cxuan 回覆 cxuan 領取優質資料。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我自己寫了六本 PDF ,非常硬核,鏈接如下"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我自己寫了六本 PDF ,非常硬核,鏈接如下"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我自己寫了六本 PDF ,非常硬核,鏈接如下"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/2e/2eca123a1333a3451f94c0ba133c9ed3.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://mp.weixin.qq.com/s?_biz=MzI0ODk2NDIyMQ==&mid=2247485329&idx=1&sn=673f306bb229e73e8f671443488b42d4&chksm=e999f283deee7b95a3cce247907b6557bf5f228c85434fc6cbadf42b2ec4c64443742a8bea7a&token=581641926&lang=zhCN#rd","title":""},"content":[{"type":"text","text":"cxuan 嘔心瀝血肝了四本 PDF。"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://mp.weixin.qq.com/s?_biz=MzU2NDg0OTgyMA==&mid=2247494165&idx=1&sn=4e0247006bef89701529d765e6ce32a4&chksm=fc4617e6cb319ef0991ff70c8a769b92f59cf92122f27785b848604493653fdcc206d6830a23&token=794467841&lang=zhCN#rd","title":""},"content":[{"type":"text","text":"cxuan 又肝了兩本 PDF。"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章