Netoops

By Jonathan Corbet
November 10, 2010
A kernel oops produces a fair amount of data which can be useful in tracking down the source of whatever went wrong. But that data is only useful if it can be captured and examined by somebody who knows how to interpret it. Capturing oops output can be hard; it typically will not make it to any logfiles in persistent storage. That's why we still see oops output posted in the form of a photograph taken of the monitor. Using cameras as a debugging tool can work for a desktop system, but it certainly does not scale to a data center containing thousands of systems. Google is thought to operate a site or two meeting that description, so it's not surprising to see an interest in better management of oops information there.

Google has had its own oops collection tool running internally for years; that has recently been posted for merging as netoops. Essentially, netoops is a simple driver which will, in response to a kernel oops, collect the most recent kernel logs and deliver them to a server across the net. The functionality seems useful, but the first version of the patch was questioned: netoops looks somewhat similar to the existing netconsole system, so it wasn't clear that a need for it exists. Why not just add any missing features to netconsole?

Mike Waychison, who posted the patch, responded with a number of reasons which have since found their way into the changelog. Netoops only sends data on an oops, so it is less hard on network bandwidth. The data is packaged in a more structured manner which is easier for machines and people to parse; that has enabled the creation of a vast internal "oops database" at Google. Netoops can cut off output after the first oops, once again saving bandwidth. And so on. There are enough differences that netconsole maintainer Matt Mackall agreed that it made sense for netoops to go in as a separate feature.

That said, there is clear scope for sharing some code between the two and, perhaps, improving netconsole in the process. The current version of the netoops patch includes new work to bring about that sharing. There seems to be no further opposition, but it's worth noting that Mike, in the patch changelog, notes that he's not entirely happy with either the user-space ABI or the data format. So this might be a good time for others interested in this sort of functionality to have a look and offer their suggestions and/or patches.

發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章