The Google File System : part7 EXPERIENCES

7. EXPERIENCES
In the process of building and deploying GFS, we have experienced a variety of issues, some operational and some technical.

Initially, GFS was conceived as the backend file system for our production systems. 
Over time, the usage evolved to include research and development tasks. 
GFS originally provided little support for things like permissions and quotas, but now includes rudimentary forms of these. 
While production systems are well disciplined and controlled, users sometimes are not. 
More infrastructure is required to keep users from interfering with one another.
Some of our biggest problems were disk and Linux related.
Many of our disks claimed to the Linux driver that they supported a range of IDE protocol versions but in fact responded reliably only to the more recent ones. 
Since the protocol versions are very similar, these drives mostly worked, but occasionally the mismatches would cause the drive and the kernel to disagree about the drive’s state. 
This would corrupt data silently due to problems in the kernel. 
This problem motivated our use of checksums to detect data corruption, while concurrently we modified the kernel to handle these protocol mismatches.
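The idea of per-block checksums can be sketched as follows. This is a minimal illustration, not GFS's implementation: the block size, the use of CRC32, and the function names are all assumptions made for the example.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # illustrative block size; not necessarily what GFS uses

def block_checksums(data: bytes) -> list[int]:
    """Compute one CRC32 per fixed-size block so corruption can be localized."""
    return [zlib.crc32(data[i:i + BLOCK_SIZE])
            for i in range(0, len(data), BLOCK_SIZE)]

def find_corrupt_blocks(data: bytes, expected: list[int]) -> list[int]:
    """Return indices of blocks whose checksum no longer matches."""
    return [i for i, c in enumerate(block_checksums(data)) if c != expected[i]]

data = bytes(range(256)) * 1024                        # 256 KB -> 4 blocks
sums = block_checksums(data)
corrupted = data[:100_000] + b"\xff" + data[100_001:]  # silently flip one byte
assert find_corrupt_blocks(data, sums) == []
assert find_corrupt_blocks(corrupted, sums) == [1]     # only block 1 is flagged
```

Checksumming at block granularity means a single-byte corruption, such as the silent kernel-induced corruption described above, is detected and confined to one block rather than invalidating the whole file.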
Earlier we had some problems with Linux 2.2 kernels due to the cost of fsync(). 
Its cost is proportional to the size of the file rather than the size of the modified portion. 
This was a problem for our large operation logs especially before we implemented checkpointing. 
We worked around this for a time by using synchronous writes and eventually migrated to Linux 2.4.
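The synchronous-write workaround can be sketched as follows: instead of calling fsync() after each log append (whose cost on Linux 2.2 grew with file size), the log is opened with O_SYNC so each write() is durable on return, paying only for the bytes actually written. The log records and file handling here are hypothetical.

```python
import os
import tempfile

# Hypothetical operation-log append. Opening with O_SYNC makes each write()
# synchronous, so no separate fsync() over the (possibly huge) file is needed.
fd0, path = tempfile.mkstemp()
os.close(fd0)

fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_SYNC)
for record in (b"op1\n", b"op2\n", b"op3\n"):
    os.write(fd, record)  # durable when write() returns
os.close(fd)

logged = open(path, "rb").read()
assert logged == b"op1\nop2\nop3\n"
os.remove(path)
```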

Another Linux problem was a single reader-writer lock which any thread in an address space must hold when it pages in from disk (reader lock) or modifies the address space in an mmap() call (writer lock). 
We saw transient timeouts in our system under light load and looked hard for resource bottlenecks or sporadic hardware failures. 
Eventually, we found that this single lock blocked the primary network thread from mapping new data into memory while the disk threads were paging in previously mapped data.
Since we are mainly limited by the network interface rather than by memory copy bandwidth, we worked around this by replacing mmap() with pread() at the cost of an extra copy.
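The trade-off between the two read paths can be sketched as below. Faulting pages in through an mmap'ed region contends on the per-address-space lock described above, while pread() copies the requested bytes into a caller-supplied buffer at an explicit offset, costing an extra copy but touching no mappings. The file contents are illustrative.

```python
import mmap
import os
import tempfile

# Create a sample file to read from.
fd0, path = tempfile.mkstemp()
os.write(fd0, b"chunkserver-data" * 4096)
os.close(fd0)

fd = os.open(path, os.O_RDONLY)

# mmap path: reading the slice faults pages in under the address-space lock.
with mmap.mmap(fd, 0, access=mmap.ACCESS_READ) as m:
    via_mmap = bytes(m[1024:1024 + 64])

# pread path: explicit copy at a given offset; no new mappings are created.
via_pread = os.pread(fd, 64, 1024)

os.close(fd)
assert via_mmap == via_pread  # same bytes, different cost model
os.remove(path)
```

When the workload is network-bound rather than memory-bandwidth-bound, as the text notes, the extra copy made by pread() is cheap relative to the contention it avoids.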
Despite occasional problems, the availability of Linux code has helped us time and again to explore and understand system behavior. 
When appropriate, we improve the kernel and share the changes with the open source community.
