The New Linux Speed Trick 426
Brainsur quotes a story saying "
Linux kernel 2.6 introduces improved IO scheduling that can increase speed -- "sometimes by 1,000 percent or more, [more] often by 2x" -- for standard desktop workloads, and by as much as 15 percent on many database workloads, according to Andrew Morton of Open Source Development Labs. This increased speed is accomplished by minimizing the disk head movement during concurrent reads.
"
Cool (Score:4, Informative)
"The anticipatory scheduler is so named because it anticipates a process performing several dependent reads. In theory, this should minimize disk head movement. Without anticipation, the heads may have to seek back and forth between several competing loads; with it, the scheduler inserts a small delay after a read, before seeking away, to see whether the process requests another nearby read."
"The deadline scheduler has two additional scheduling queues that were not available to the 2.4 IO scheduler. The two new queues are a FIFO read queue and a FIFO write queue. This new multi-queue method allows for greater interactivity by giving read requests a tighter deadline than write requests, thus ensuring that applications will rarely be delayed waiting on reads."
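As a sketch of how a scheduler is selected in practice (the device name hda and the sysfs path are assumptions, and runtime switching only appeared in later 2.6 kernels; the output line is illustrative):

```shell
# Inspect the elevator in use for a given disk; the bracketed entry is active.
cat /sys/block/hda/queue/scheduler
# e.g.: noop anticipatory [deadline] cfq

# Switch that disk to the deadline scheduler at runtime (root required).
echo deadline > /sys/block/hda/queue/scheduler
```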
Nice, but this is making things more complex. I admit I'll just keep all kernel settings wherever Mandrake puts them. Will other people play about and specialise their systems for the tasks they perform?
Re:SCSI (Score:3, Informative)
Re:1,000 percent? (Score:5, Informative)
Re:SCSI (Score:2, Informative)
I run WebGUI [plainblack.com] on this machine, which receives some three and a quarter million hits per month. Nothing to raise eyebrows at, but check it: this machine's average load is some 0.80. My personal (P3) machine, running a BBS, mail, BitTorrent, and web service, maintains a constant 1.3+.
I've gauged the importance of SCSI drives in the equation via a (sadly) messy, but soon-to-be-SourceForged, Perl program. The result, confirming what I've heard repeatedly, is that SCSI drives truly make the difference.
Re:Why not combine those two methods? (Score:5, Informative)
Re:Anti-MS Patent (Score:1, Informative)
Re:1,000 percent? (Score:5, Informative)
15% = 1.15x
100% = 2x
200% = 3x
300% = 4x
900% = 10x
1000% = 11x
a % = (a+100)/100 x
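The conversion in the last line can be sanity-checked with a short shell helper (the `pct_to_x` name is just for illustration):

```shell
# Convert "a percent faster" into a speed multiplier: (a + 100) / 100.
pct_to_x() { awk -v a="$1" 'BEGIN { printf "%g\n", (a + 100) / 100 }'; }

pct_to_x 15     # prints 1.15
pct_to_x 100    # prints 2
pct_to_x 1000   # prints 11
```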
Re:Amiga Disks (Score:3, Informative)
Which version of Windows are you referring to? At the risk of sounding like a fanboy, I must say that OS load times for XP are quite fast compared to previous versions and to most vanilla Linux distributions I've tried in the past (Mandrake 9.x, Red Hat 8/9). Whether or not this is related to resolving two processes contending for reads from the disk, I don't know. Does anyone have more information on this?
Benchmark (Score:5, Informative)
The benchmark was made before 2.6.0, but I still think it shows the big difference from the 2.4 IO scheduler.
Quote:
Executive summary: the anticipatory scheduler is wiping the others off the map, and 2.4 is a disaster.
CFQ (Score:4, Informative)
With anticipatory or deadline, I'm experiencing awful skips with artsd under KDE 3.2 every time there is a heavy disk access, but it's [almost] completely gone with cfq.
To use it, compile an -mm kernel and add 'elevator=cfq' to the kernel boot parameters through LILO or GRUB.
See this lwn article [lwn.net] for more info.
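For reference, a hypothetical boot-loader stanza might look like this (kernel image name, root device, and paths are placeholders, not taken from the post):

```shell
# GRUB (/boot/grub/menu.lst), hypothetical entry:
#   title   Linux 2.6-mm (CFQ elevator)
#   kernel  /boot/vmlinuz-2.6-mm root=/dev/hda1 elevator=cfq
#
# LILO (/etc/lilo.conf), inside the image stanza; rerun /sbin/lilo afterwards:
#   append="elevator=cfq"

# After rebooting, the parameter should show up here:
cat /proc/cmdline
```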
Re:SCSI (Score:5, Informative)
Yeah, I think so. IIRC it's called tagged command queueing - the drive can have multiple requests pending and instead of doing them first come first served, they're fulfilled in order of estimated latency to that point.
I believe Western Digital's recent Raptor IDE drives have the same feature.
The benefit of this seems contingent upon having multiple requests pending, which AFAIK is hard on Linux, as there's no non-blocking file I/O. To me, this reads like a workaround for that.
Re:SCSI (Score:5, Informative)
Expensive, yes. Aging, no. Ten years ago people said SCSI was the future. Now everyone runs it, they just don't know it.
IDE in its original form has never been able to keep up with a 10k RPM (or higher) disk.
I think what the parent post is alluding to is Tagged Queuing. Tagged Queuing allows you to group blocks together and tell the drive to write them in a specified priority order. That sort of thing is used to guarantee journaling ordering and such. Interestingly, the lack of this mechanism is why many IDE drives torch journalled fs's when they lose power during a write: they buffer writes, but without any ordering guarantee. You can imagine I was pretty torqued the first time I had to fsck an ext3 (or rebuild-tree on reiserfs) after a power failure.
The reason that the kernel helps even with the above technology is that the drive queue is easily filled. Even when you have a multimegabyte drive cache and a fast drive, large amounts of data spread over the disk can take a while to write out.
This scheduler is able to take into account Linux's entire internal disk cache (sometimes gigs of data in RAM) and schedule that before it hits the drives.
You're misunderstanding something... (Score:5, Informative)
Reads typically cause processes to block while waiting for the data (and can thus stall processes for long amounts of time if not scheduled appropriately), whereas writes are typically fire-and-forget. This last bit means that you can usually just queue them up, return control to the user program, and perform the actual write at some more convenient time, i.e. later. Since reads (by the same process) are usually also heavily interdependent, it is also a win to schedule them early from that POV.
That's my understanding of it.
Re:Cache? (Score:5, Informative)
Sure, and both Linux 2.4 and 2.6 do caching and read-ahead (reading more data than requested, hoping that the application will request the data in the future).
The I/O scheduler however lies beneath the cache layer. When it's decided that data must be read from or written to disk, the request is placed in a queue. The scheduler may reorder the queue in order to minimize head movements.
Also, 2.6 has the anticipatory I/O scheduler: after a read, the scheduler simply pauses for a (very) short period. This is done in the assumption that the application will request more data from the same general area on the disk. Even when other requests are in the I/O queue, requests to the area where the disk's heads are hovering will get priority.
While this increases latency (the time it takes for a request to be processed) a bit, throughput (the amount of data transferred in a time period) will also increase.
It did take a fair amount of experimenting and tuning in order to make the I/O scheduler work as well as it does now. However there still may be some corner cases where the new scheduler is much slower than the old.
Re:But how? (Score:2, Informative)
Re:I've found the opposite (Score:5, Informative)
Re:Cool (Score:5, Informative)
Sorry for biting on the troll but I felt like explaining it.
Re:Anti-MS Patent (Score:4, Informative)
Re:Why not combine those two methods? (Score:4, Informative)
Nice of you to point out the mistake like an ass, though. (Yes, just like I'm doing.)
-Rob, a Canadian in Finland
Re:Preemptive and Defragged? (Score:1, Informative)
The only reason a disk would need defragging is if the FS sucks so badly that it causes massive fragmentation. Using better storage methods such as certain RAID levels, LVM, and any of the Linux or Unix filesystems drastically reduces any problems one might have.
As an example, I'm using UFS2 (FreeBSD 5.2.1) with soft updates and have been using the same UFS slices for well over a year (not always 5.2.1) constantly (webserver, fileserver, and I do some light compiling) and have 0.2% fragmentation. You're just used to terrible filesystems.
Re:But how? (Score:3, Informative)
Anyway, simple sorting on LBA address will typically reduce head seeks to a large extent, resulting in most of the potential benefit. It is important however to make sure that multiple requests are available to the driver to sort.
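A toy illustration of why sorting helps (this is not kernel code; the block addresses are made up, and head travel is measured in blocks from a start position of 0):

```shell
# Total head travel for a queue of requests, serviced in the order given.
travel() {
  printf '%s\n' "$@" | awk '
    { d += ($1 > pos) ? $1 - pos : pos - $1; pos = $1 }
    END { print d }'
}

REQS="500 10 520 30 540"
travel $REQS                               # FIFO order: 2500 blocks of travel
travel $(printf '%s\n' $REQS | sort -n)    # sorted by LBA: 540 blocks
```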
Kernel comparison on a SMP system (Score:4, Informative)
Re:When will Linux do direct IO? (Score:1, Informative)
Re:what's old is new again (Score:5, Informative)
The anticipatory scheduler tries to anticipate future requests (who would have guessed that?), and is relatively new [acm.org]
Re:Amiga Disks and CD-ROMs (Score:2, Informative)
Of course, this raises the point that aligning the data on a game CD or DVD for a console is a science in itself. PC game development is easy in comparison! (plonk everything on the hard drive)
phil
Re:Amiga Disks (Score:4, Informative)
Being multi-user complicates things even further. Sure, if you are a single user on a desktop machine and you double-click on two programs in rapid succession, queuing them for loading one after the other may be the right thing to do. But what if those programs are actually being loaded by two different users? Can we completely lock out one user just because they started loading their program slightly later? Again, what if user A runs emacs, and a fraction of a second later, user B runs ls? Under your system, B effectively has to wait as long as it would take to load emacs, plus as long as it would take to load ls.
You can't even realistically separate the queues by user. In many situations, a single Unix user may be running on behalf of many physical users (AKA human beings).
I'm not saying that any of these problems are intractable (Linux is now doing a pretty fine job), just that they aren't even remotely as trivial as queuing loads one after another.
Oh BTW, thanks for bringing back happy Amiga memories. Them were the days!
Re:what's old is new again (Score:1, Informative)
Someone should point out... (Score:2, Informative)
Re:Speed-ups (Score:3, Informative)
You don't mean multiple heads on an arm. Multiple heads on an arm would all move together, and you couldn't use two at the same time - the feedback servo which keeps it on track can only respond to one track. What you mean, I think, is two groups of arms (all the arms move together). Manufacturers have looked at that but decided against it.
The arms and associated actuators are some of the most expensive parts of the drive. If you are going to double this cost, why not throw in a few more platters and an enclosure and have twice the capacity, and twice the throughput?
Putting two actuators in the drive increases power consumption a lot, and heat as well. Both are real problems for current drives. And a "specialist" drive doesn't have the economies of scale, and could cost more than twice as much as two simple drives - which, together, have the same number of heads and twice the capacity.
The real killer is turbulence. If you have two arms on the same surface, each is flying in the wake of the other. And, unlike its own wake, the other's wake changes dynamically, so that a seek by arm 1 can perturb arm 2.
Google has it right: lots of dumb hardware, lots of clever software. What we need is filesystems whose allocation patterns are "RAID aware". Particularly with RAID 0, I can see filesystem allocation patterns which could (in conjunction with the optimisations mentioned here) greatly improve performance.
Re:NOVELL/NETWARE DID THIS IN 1991 (Score:2, Informative)
Re:I second that. (Score:2, Informative)
Re:Amiga Disks (Score:3, Informative)
AFAIK, the reason Windows used to take ages to boot was that drivers and services were started sequentially and no optimization was ever done for the boot process. Windows XP, OTOH, had a goal of less than 30 seconds for a cold boot. In order to achieve this, new BIOS specs were implemented, as well as optimization of the boot process. The main things done to speed up boot were performing driver and service initialization and disk I/O in parallel, and prefetching. MS claims [microsoft.com] a 4-5x increase in speed using a chunked read of all boot files, but others disagree [serverworldmagazine.com] and think that prefetching accounts for most of the increase.
With a new PC and a fresh install of XP, it's very possible to get to the desktop in less than 30 seconds. Even with my aging PIII-500MHz laptop (without the BIOS optimizations called for by MS) and with additional startup software, my PC is usable in less than a minute. To be honest, it's the one reason I switched to XP from 2000.
Re:Disk Transfer QoS (Score:4, Informative)
Two words: IRIX, XFS.
IRIX had some sort of "quality of service applied to disk accesses", as you wrote, thanks to XFS. The filesystem allows defining zones that have a guaranteed minimal throughput configured. I can't say more about it because I only know of it second-hand from other people O:-)
XFS is available for Linux since 2.6.0 and 2.4.24, IIRC, and I think this feature is also available in the latest kernels. Though it's still experimental, IIRC.
2.6.x faster in other ways, too (Score:4, Informative)
I don't know if it's due to SpeedStep support being in the kernel or what, but when I was running 2.4.x with the pre-emptible kernel patches, switching from wall power to battery power meant massive slowdowns, as though I had switched from a PIII-1GHz to a 100MHz Pentium classic. Simple commands like "ps" would take seconds to complete and screen redraws were visible. The whole system would feel like sludge. In spite of this fact, battery life was relatively poor. The combined effect (much slowed system, very short battery life) meant that it was difficult to get anything at all done on battery power.
Now with 2.6.x, when I switch to battery power, there is no perceptible slowdown whatsoever when compared to wall power, and battery life is much improved. Downside: suspending 2.6.x kills USB-uhci, so I've had to compile it as a module and hack up my suspend/resume scripts to reload it each time. But for the speed increase, it's well worth the trouble.
Heh. (Score:2, Informative)
Re:How can I set the boot parameters? (Score:3, Informative)
The anticipatory scheduler is the default for the vanilla 2.6 kernel.
Re:Preemptive and Defragged? (Score:2, Informative)
That way, you don't need to move the head nearly as much as if you responded directly to the other process.
Robert Love has written an excellent article about the new schedulers here: I/O Schedulers [linuxjournal.com]
Standard mistake (Score:2, Informative)
Linux doesn't work like that. The vast majority of people who work to improve linux aren't doing it because they're getting paid, and instead work on or focus on what interests them. If someone is focusing on feature X, that's not necessarily taking any time or energy away from feature Y - if they weren't doing X, they might very possibly not be contributing to Linux at all.
Seriously, complaints like this remind me of a manager coming in and discovering that some developers were talking about the finer points of thread interactions in a specific application and saying: "Who cares how the threading works? I just want something the customers can use!"
If it makes you feel better, you should learn to simply ignore discussion of technical features that upset you - this discussion does not in fact take away from discussions of user friendliness nor does it imply that the user will be forced to follow this discussion in order to use the outcome. If the user wishes, anyway, to follow this discussion then they might glean something interesting from it, but supplying the users with extra optional information can't be a bad thing, can it?
And as for access time, I have to ask: making the computer as a whole more responsive to my actions won't make me like using it better? Maybe 10% isn't going to make much perceived difference most of the time, but when it means the difference between a stutter-free movie playback and the occasional dropped frame, I'm going to notice.
Re:Disk Transfer QoS (Score:2, Informative)
research background for anticipatory scheduling (Score:4, Informative)
another I/O speed trick: mount with noatime (Score:3, Informative)
If you don't care about last access times on your files, then you should consider mounting your filesystems with the noatime mount flag in /etc/fstab.
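The fstab line itself did not survive in this copy of the post; a hypothetical entry (device, mount point, and fs type are placeholders) would look like:

```shell
# /etc/fstab entry with noatime added to the mount options:
#   /dev/hda2   /home   ext3   defaults,noatime   1 2

# Or apply to an already-mounted filesystem without rebooting:
mount -o remount,noatime /home
```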
Reading a file under noatime means that the kernel does not need to go back and update the last access time field of that file's inode. Sure, multiple reads over a span of a few seconds will only cause the in-core inode to be modified, but eventually that modified inode must be flushed out to disk. Why cause an extra write to the disk for a feature that you might not care about?
For example: think about those cron jobs / progs that scan the file tree (tmpwatch, updatedb, etc.). Unless you mount with the noatime option, your kernel must at least update the last access time fields of every directory's inode! Think about those /etc files that are frequently read (hosts, hosts.allow, DIR_COLORS, resolv.conf, etc.) or the dynamic shared libs (libc.so.6, ld-linux.so.2, libdl.so.2, etc.) that are frequently used by progs. Why waste write-ops updating their last access time fields?
Yes, the last access time field has some uses. However, the cost of updating those last access timestamps, IMHO, is seldom worth the extra disk ops.
There are other advantages to using the noatime mount option; however, to wind up this posting I'll just say that I always mount my ext3 filesystems with the noatime mount flag. I recommend that you consider looking into this option if you don't use it already.
Re:I've noticed it... (Score:2, Informative)
I did this on a Compaq Presario 2100 laptop. Look up the ACPI4Linux project [sourceforge.net].
Re:Heh. (Score:2, Informative)
I'm not sure where you'd find it, but you might make some headway searching for "anticipatory scheduler" on kerneltrap.org [kerneltrap.org]. This scheduler was discussed multiple times on that site.
--Joe
FreeBSD runs faster (Score:1, Informative)