The Google File System, Part 7: Experiences

7. EXPERIENCES
In the process of building and deploying GFS, we have experienced a variety of issues, some operational and some technical.

Initially, GFS was conceived as the backend file system for our production systems. 
Over time, the usage evolved to include research and development tasks. 
It started with little support for things like permissions and quotas but now includes rudimentary forms of these. 
While production systems are well disciplined and controlled, users sometimes are not. 
More infrastructure is required to keep users from interfering with one another.
Some of our biggest problems were disk and Linux related.
Many of our disks claimed to the Linux driver that they supported a range of IDE protocol versions but in fact responded reliably only to the more recent ones. 
Since the protocol versions are very similar, these drives mostly worked, but occasionally the mismatches would cause the drive and the kernel to disagree about the drive’s state.
This would corrupt data silently due to problems in the kernel. 
This problem motivated our use of checksums to detect data corruption, while concurrently we modified the kernel to handle these protocol mismatches.
Earlier we had some problems with Linux 2.2 kernels due to the cost of fsync(). 
Its cost is proportional to the size of the file rather than the size of the modified portion. 
This was a problem for our large operation logs especially before we implemented checkpointing. 
We worked around this for a time by using synchronous writes and eventually migrated to Linux 2.4.

Another Linux problem was a single reader-writer lock which any thread in an address space must hold when it pages in from disk (reader lock) or modifies the address space in an mmap() call (writer lock). 
We saw transient timeouts in our system under light load and looked hard for resource bottlenecks or sporadic hardware failures. 
Eventually, we found that this single lock blocked the primary network thread from mapping new data into memory while the disk threads were paging in previously mapped data.
Since we are mainly limited by the network interface rather than by memory copy bandwidth, we worked around this by replacing mmap() with pread() at the cost of an extra copy.
Despite occasional problems, the availability of Linux code has helped us time and again to explore and understand system behavior. 
When appropriate, we improve the kernel and share the changes with the open source community.
