Jens Axboe On Kernel Development 68
BlockHead writes "Kerneltrap.org is running an interview with Jens Axboe, 15-year Linux veteran and the maintainer of the Linux kernel block layer, 'the piece of software that sits between the block device drivers (managing your hard drives, cdroms, etc) and the file systems.' The interview examines what's involved in maintaining this complex portion of the Linux kernel, and offers an accessible explanation of how IO schedulers work. Jens details his own CFQ, or Completely Fair Queuing, scheduler, which is the default Linux IO scheduler. Finally, the article examines the current state of Linux kernel development, how it's changed over the years, and what's in store for the future."
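To make the summary's "IO scheduler" idea concrete: at its simplest, a scheduler takes requests as they arrive and sorts them by disk position, merging requests that are contiguous on disk, so the head sweeps in one direction instead of seeking back and forth. This is a toy sketch of that elevator idea, not kernel code; the function names and (sector, length) representation are invented for illustration.

```python
# Toy "elevator": sort pending requests by start sector, then merge
# any that are contiguous on disk into one larger request.
# Purely illustrative; real block-layer code is far more involved.

def elevator_sort(requests):
    """Order (sector, length) requests by start sector."""
    return sorted(requests, key=lambda r: r[0])

def merge_adjacent(requests):
    """Merge requests that are back-to-back on disk."""
    merged = []
    for sector, length in elevator_sort(requests):
        if merged and merged[-1][0] + merged[-1][1] == sector:
            # Previous request ends exactly where this one starts.
            merged[-1] = (merged[-1][0], merged[-1][1] + length)
        else:
            merged.append((sector, length))
    return merged

# Requests submitted out of order by different processes:
incoming = [(100, 8), (900, 8), (108, 8), (50, 4)]
print(merge_adjacent(incoming))  # [(50, 4), (100, 16), (900, 8)]
```

The real schedulers (CFQ, deadline, anticipatory, noop) differ mainly in what they layer on top of this basic sorting and merging: fairness between processes, latency deadlines, and so on.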
Disagree with Mr. Axboe... (Score:5, Interesting)
If core changes of such magnitude are no longer sufficient to merit a dev branch or even a major point release, why bother with the "2.6" designation at all? Just pull a Solaris and call the next release "Linux 20" or "Linux XX."
-Isaac
Where are they now? (Score:4, Interesting)
No block devices = no disk scheduling? (Score:5, Interesting)
At risk of starting a holy war, is there any reason why one approach would be superior? And do they lend themselves to different methods of scheduling? In TFA, Axboe talks about [1] the scheduling mechanism used in later versions of the 2.6 kernel series, which alleviates a problem that I (and most other people, probably) have run into before.
I'm curious because, although I don't use any of the 'real' BSDs very often, I spend most of my time (at home, anyway) using either Mac OS X -- whose Mach/XNU kernel is derived from 4.3BSD, although I don't know whether the I/O scheduler has been rewritten since then -- or Linux with the 2.6 kernel, and it seems to me that OS X's disk I/O leaves something to be desired compared to Linux's.
Does BSD handle I/O differently in some fundamental fashion than Linux? It sounds like, by eliminating block devices, they basically keep the kernel from doing any re-ordering or caching of data, which makes things "safer" (in the event of a crash) but seems like it would carry big performance penalties on drives that aren't very smart and don't do a lot of caching and optimization on their own. Getting rid of I/O scheduling altogether seems like a stiff price to pay for "safety."
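The performance penalty the parent worries about is easy to see with a back-of-the-envelope model: compare total head travel when requests are dispatched in submission (FIFO) order versus a single sorted sweep. The cost model (travel measured in sectors) and the numbers are invented for the example; real seek times are nonlinear, but the direction of the comparison holds.

```python
# Rough illustration of why reordering helps a "dumb" drive:
# total head travel for FIFO dispatch vs. one sorted elevator sweep.

def head_travel(order, start=0):
    """Sum of head movement (in sectors) visiting sectors in order."""
    travel, pos = 0, start
    for sector in order:
        travel += abs(sector - pos)
        pos = sector
    return travel

fifo = [500, 10, 490, 20, 480]   # submission order: ping-pongs across the disk
swept = sorted(fifo)             # elevator order: one sweep

print(head_travel(fifo))   # 2400
print(head_travel(swept))  # 500
```

With a drive that queues and reorders internally (tagged command queueing), the controller can recover much of this on its own, which is exactly the BSD-style argument; with a dumb drive, the kernel is the only place the reordering can happen.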
[1] (quoting because there don't seem to be anchors in TFA)
Missing Question: How do you pronounce your name? (Score:3, Interesting)
Re:Missing Question: How do you pronounce your nam (Score:4, Interesting)
This is what Slashdot is about (Score:4, Interesting)
BTW, does anyone have a good set of benchmarks comparing the performance of the different IO schedulers when running one, two, or three IO-intensive tasks; when running one intensive task alongside many small ones; etc.? That would actually help me decide whether to rebuild my kernel with CFQ.
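For anyone wondering what CFQ's "fairness" actually buys in the mixed-workload case the parent describes: the core idea is to give each process its own request queue and dispatch round-robin between queues, so one IO-hungry task can't starve a small interactive one. This is a toy model of that idea only; the queue contents and function name are made up, and real CFQ dispatches time slices of requests, not single requests.

```python
# Toy model of per-process fair queueing: round-robin dispatch
# between per-process queues, one request at a time.

from collections import deque

def dispatch_fair(queues):
    """Interleave requests from several per-process queues."""
    queues = [deque(q) for q in queues]
    order = []
    while any(queues):
        for q in queues:
            if q:
                order.append(q.popleft())
    return order

backup = ["b1", "b2", "b3", "b4"]  # IO-intensive task floods the queue
editor = ["e1"]                    # small interactive request
print(dispatch_fair([backup, editor]))  # ['b1', 'e1', 'b2', 'b3', 'b4']
```

Under a single shared FIFO queue, the editor's request would sit behind all four backup requests; under the fair scheme it gets serviced second.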
Also, ionice would have made my old machine much more usable when doing backups... Oh well.
Scheduling better than no scheduling? (Score:5, Interesting)
Reading TFA piqued my interest in I/O scheduling, and I've been doing some reading on it. It seems there are several competing schools of thought, of which Axboe (and potentially the Linux kernel developers generally) represent only one.
An alternative view, such as this from Justin Walker (a Darwin developer) on the darwin-kernel mailing list [apple.com], holds that it's not worthwhile for the OS kernel to do much disk scheduling, since "the OS does not have a good idea of the actual disk geometry and other performance characteristics, and so we [kernel developers] leave that level of scheduling up to the controllers in the disk drive itself. I think, for example, that recent IBM drives have some variant of OS/2 running in the controller. Since the OS knows nothing about heads, tracks, cylinders for modern commodity disks, it's futile to try to schedule I/O for them." (written Mar 2003)
Axboe seems to acknowledge that this may sometimes be the case, because Linux does have the 'non-scheduling' noop scheduler, which he recommends only for use with very intelligent hardware. However, it seems some people think that commodity drives are already 'smart enough' to do their own scheduling.
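The 'non-scheduling' approach isn't quite a literal no-op: the idea is to skip reordering entirely and hand requests to the drive in FIFO order, trusting the controller's own queueing to sort things out, while still merging contiguous requests (merging is cheap and helps regardless of how smart the drive is). A toy sketch of that behavior, with invented names and the same (sector, length) representation as a simple illustration:

```python
# Sketch of a noop-style dispatcher: no sorting, FIFO order preserved,
# but back-to-back requests are still merged into larger ones.

def noop_dispatch(requests):
    """FIFO pass-through with back-merging of contiguous requests."""
    out = []
    for sector, length in requests:
        if out and out[-1][0] + out[-1][1] == sector:
            # This request starts exactly where the last one ended.
            out[-1] = (out[-1][0], out[-1][1] + length)
        else:
            out.append((sector, length))
    return out

print(noop_dispatch([(100, 8), (108, 8), (50, 4)]))  # [(100, 16), (50, 4)]
```

Note that (50, 4) stays last even though it's the lowest sector: the kernel makes no attempt to order it, leaving any seek optimization to the drive.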
It seems like determining which approach is superior would be relatively straightforward, and yet I've never seen it done (although maybe I'm just not looking in the right places). Anecdotally, I'm tempted to agree with Axboe, since when several processes are all thrashing the disk simultaneously, my Linux machine feels faster than my OS X one. But this is by no means scientific (the machines don't have the same drives, aren't working with the same datasets, etc.).
On what drives, and under what conditions, is it advantageous to have the OS kernel perform scheduling, and on which ones is it best just to pass stuff to the drive and let the controller do all the thinking?
Re:No block devices = no disk scheduling? (Score:3, Interesting)
However, that is actually one of the benefits of character devices. They're lightweight on the hardware and the software, making "routine" activity extremely fast and efficient, and making it easier to be sure everything is correct and robust. For most "normal" activity, you don't want to do anything particularly complex. Word processors, by and large, are not based on scatter/gather algorithms, and it is rare to find non-sequential MP3s. Also bear in mind that most CPUs outpace memory tens, if not hundreds, of times over -- they are certainly going to outpace any peripherals a person might have. Why accelerate the kernel, if the kernel isn't the bottleneck? That just risks introducing bugs with no obvious gain.
Myself, I believe that it's stupid to design limitations into one component because of limitations in another. The limitations in the other component will be subject to change, but the designed-in limitations will hang around for much longer. I also think it's stupid to look at current typical use. Current typical use is dictated by what is currently practical; if you change what is practical, you will change what is typical. The OS and the users are not independent of one another. What people wanted is unimportant; it's what people want to want that should dictate what OS writers offer. And, yes, I believe that direct data placement has the potential to eliminate the need for both binary-only drivers and heavy-weight kernels.
(Linux contains a huge number of very low-level drivers, and is limited in what it can absorb in the way of new high-level functionality because of the risk of breakage and the difficulty of maintaining such a gigantic tree. If those had all been intelligent peripherals, the same amount of effort and coding would have produced a kernel with staggering capabilities and electronic superpowers. The drivers can't go away, even if intelligent devices replace the dumb ones of today, because people will use legacy stuff. Actually, it's worse. As Microsoft showed with Winmodems and Winprinters, it's possible to sell people dumber-than-dumb devices and even heavier-weight software that does a worse job, slower.)
Re:Reiser4 (Score:2, Interesting)