The New Linux Speed Trick 426

Brainsur quotes a story saying " Linux kernel 2.6 introduces improved IO scheduling that can increase speed -- "sometimes by 1,000 percent or more, [more] often by 2x" -- for standard desktop workloads, and by as much as 15 percent on many database workloads, according to Andrew Morton of Open Source Development Labs. This increased speed is accomplished by minimizing the disk head movement during concurrent reads. "
This discussion has been archived. No new comments can be posted.

  • I've noticed it... (Score:5, Interesting)

    by Anonymous Coward on Tuesday April 06, 2004 @08:03AM (#8778302)
    I'm having trouble getting ACPI working on my laptop with the 2.6 kernel (it's a bad implementation on the part of my laptop). The 2.4 series used to work (sometimes), so I installed both Mandrake's 2.4 and 2.6 kernels. Using 2.4.x again was like switching from a sports car to a horse and buggy; KDE was that much faster with the 2.6.x kernel running the show.
    • by gazbo ( 517111 )
      Well hold on a minute: this is talking about the speed increases from a particular subsystem - the IO scheduler. Upgrading from 2.4 to 2.6 changes a hell of a lot more than just that, so your speedup could be from any number of things.

      From the sound of it you're talking about perceived speed for a desktop user, as opposed to measured server throughput. If this is the case, I imagine the biggest speed increase comes from the fact that (I believe) 2.6 offers far lower latency in the kernel, allowing it to

  • Cache? (Score:4, Interesting)

    by Anonymous Coward on Tuesday April 06, 2004 @08:04AM (#8778306)
    Whatever happened to cache? If you can anticipate the head movement, surely you have already read the data before and it should be in the cache?
    • by warrax_666 ( 144623 ) on Tuesday April 06, 2004 @08:38AM (#8778490)
      AFAIK the "anticipation" bit is not so much about predicting head movement, but is more about reducing head movement. Reads
      cause processes to block while waiting for the data (and can thus stall processes for long amounts of time if not scheduled appropriately), whereas writes are typically fire-and-forget. This last bit means that you can usually just queue them up, return control to the user program, and perform the actual write at some more convenient time, i.e. later. Since reads (by the same process) are usually also heavily interdependent, it is also a win to schedule them early from that POV.

      That's my understanding of it.
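The read-versus-write asymmetry described above is easy to sketch. Here is a toy Python scheduler (nothing kernel-specific; the request tuples and block numbers are invented for illustration) that lets blocking reads jump ahead of fire-and-forget writes:

```python
# Toy model: reads block a process, so service them first; writes are
# fire-and-forget and can be deferred and batched. This mirrors the
# read-priority idea in spirit only, not the kernel's actual code.
def schedule(requests):
    reads = [r for r in requests if r[0] == "R"]
    writes = [r for r in requests if r[0] == "W"]
    return reads + writes  # reads jump the queue, writes flush later

fifo = [("W", 10), ("R", 200), ("W", 30), ("R", 205)]
order = schedule(fifo)
# The two reads now complete in queue positions 0 and 1 instead of 1 and 3,
# so the processes waiting on them unblock sooner.
read_positions = [i for i, r in enumerate(order) if r[0] == "R"]
print(read_positions)  # [0, 1]
```

The writes still happen, just later, which is exactly why they can be scheduled at a "more convenient time" as the parent says.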
    • IIRC, the OS is supposed to do some caching (in the old days a sync command helped flush buffers onto disk before shutdown), but it's not an explicit kind of thing like this persistent memory-mounted filesystem [], which I've always thought was interesting.

      If you ever think about how inefficient it would be for the system to go read /bin/ls every time you typed the ls command you could see where caching is a damn good idea.

      Doing read-ahead, write-behind and maintaining coherency isn't easy, from what little

    • Re:Cache? (Score:5, Informative)

      by Erik Hensema ( 12898 ) on Tuesday April 06, 2004 @08:44AM (#8778548) Homepage

      Sure, and both Linux 2.4 and 2.6 do caching and read-ahead (reading more data than requested, hoping that the application will request the data in the future).

      The I/O scheduler however lies beneath the cache layer. When it's decided that data must be read from or written to disk, the request is placed in a queue. The scheduler may reorder the queue in order to minimize head movements.

      Also, 2.6 has the anticipatory I/O scheduler: after a read, the scheduler simply pauses for a (very) short period. This is done in the assumption that the application will request more data from the same general area on the disk. Even when other requests are in the I/O queue, requests to the area where the disk's heads are hovering will get priority.

      While this increases latency (the time it takes for a request to be processed) a bit, throughput (the amount of data transferred in a time period) will also increase.

      It did take a fair amount of experimenting and tuning in order to make the I/O scheduler work as well as it does now. However there still may be some corner cases where the new scheduler is much slower than the old.
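The "pause and stay near the head" behaviour described above can be sketched in a toy simulation. This is not the kernel's actual algorithm; the 50-block "same general area" threshold and the block numbers are invented for illustration:

```python
# Toy anticipation: after servicing a request, first look for any pending
# request near the current head position before seeking far away.
NEAR = 50  # hypothetical "same general area" threshold, in blocks

def total_seek(queue, anticipate):
    head, travel, pending = 0, 0, list(queue)
    while pending:
        nxt = pending[0]  # default: strict first-come, first-served
        if anticipate:
            near = [b for b in pending if abs(b - head) <= NEAR]
            if near:
                nxt = min(near, key=lambda b: abs(b - head))
        pending.remove(nxt)
        travel += abs(nxt - head)  # head movement for this request
        head = nxt
    return travel

# Process A reads around block 100, process B around block 9000,
# with their requests interleaved in arrival order:
queue = [100, 9000, 110, 9010, 120, 9020]
print(total_seek(queue, False), total_seek(queue, True))
```

With FIFO servicing the head ping-pongs across the disk on every request; with anticipation it finishes each process's cluster before seeking away, cutting total head travel by roughly 5x in this contrived example.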

  • SCSI (Score:4, Interesting)

    by Zo0ok ( 209803 ) on Tuesday April 06, 2004 @08:04AM (#8778308) Homepage
    Don't SCSI drives do this themselves?
    • Re:SCSI (Score:2, Insightful)

      by NfoCipher ( 161094 )
      SCSI is still expensive, and it's an aging technology. The idea is to get more out of cheap IDE drives.
      • Re:SCSI (Score:5, Informative)

        by KagatoLNX ( 141673 ) <`ten.ajuos' `ta' `otagak'> on Tuesday April 06, 2004 @08:35AM (#8778474) Homepage
        ATA is basically the SCSI protocol (the good part) over IDE. There's a reason why some SATA drives appear as SCSI adapters under Linux.

        Expensive, yes. Aging, no. Ten years ago people said SCSI was the future. Now everyone runs it, they just don't know it.

        IDE in its original form has never been able to keep up with a 10k RPM (or higher) disk.

        I think what the parent post is alluding to is Tagged Queueing. Tagged Queueing allows you to group blocks together and tell the drive to write them in some priority order. That sort of thing is used to guarantee journaling and such. Interestingly, the lack of this mechanism is why many IDE drives torch journalled fs's when they lose power during a write--they do buffering, but without any sort of priority. You can imagine I was pretty torqued the first time I had to fsck an ext3 (or rebuild-tree on reiserfs) after a power failure.

        The reason that the kernel helps even with the above technology is that the drive queue is easily filled. Even when you have a multimegabyte drive cache and a fast drive, large amounts of data spread over the disk can take a while to write out.

        This scheduler is able to take into account Linux's entire internal disk cache (sometimes gigs of data in RAM) and schedule that before it hits the drives.
        • Re:SCSI (Score:3, Insightful)

          by shic ( 309152 )
          A question... I've asked this before without an appropriate answer.

          If I'm writing a user-land program which memory maps a large file, modifies it in memory - then uses msync() to write to disk - what can be safely assumed?

          • Can I assume that the thread (or process?) calling msync() will block until the data is successfully written to stable storage?
          • Can I assume that in a sequence of calls msync()_1 .. msync()_n that the writing of the data associated with msync_i implies that for all j<i that msync()_j
          • Re:SCSI (Score:3, Insightful)

            by Qzukk ( 229616 )
            looks like as long as you use MS_SYNC as a flag, on a file on local hardware, you can trust that the data is at the harddrive, if not on it (thanks to the drive cache). Not sure what happens if you try this on a network file system, whether it forces the hosting computer to flush to disk, or if it only forces the local computer to flush to the host.

            As for order of pending writes, I don't think you get to have a say on any particular writes, but you can sync after writing to commit everything so far.

            See a
            • Re:SCSI (Score:3, Insightful)

              Not sure what happens if you try this on a network file system, whether it forces the hosting computer to flush to disk, or if it only forces the local computer to flush to the host.

              Depends on the server - you can request it, but the server isn't obligated to comply.
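On the msync() question above: CPython's mmap module wraps the same call (mmap.flush() issues msync() with MS_SYNC on POSIX), so the "data is at the hard drive, if not on it" behaviour can be tried from a script. A minimal sketch, assuming a local filesystem (the helper name is mine; whether the data reaches the platter, as opposed to the drive's write cache, is still up to the drive):

```python
import mmap
import os
import tempfile

def write_synced(data: bytes) -> bytes:
    """Write via a shared mapping, msync it, then read it back
    through a fresh descriptor to confirm it reached the file."""
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, 4096)
        mm = mmap.mmap(fd, 4096)      # MAP_SHARED by default
        mm[: len(data)] = data
        mm.flush()                     # msync(MS_SYNC) on POSIX: blocks until
                                       # the kernel has pushed pages to the device
        mm.close()
        with open(path, "rb") as f:
            return f.read(len(data))
    finally:
        os.close(fd)
        os.unlink(path)

print(write_synced(b"hello"))
```

Over a network filesystem the same call only guarantees what the protocol and server guarantee, as the parent notes.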

        • Re:SCSI (Score:5, Insightful)

          by jesup ( 8690 ) * <randellslashdot@jesup. o r g> on Tuesday April 06, 2004 @10:37AM (#8779547) Homepage
          ATA is definitely not SCSI-over-IDE.
          ATAPI is SCSI-over-IDE however.

          I wrote the IDE/ATA drivers for the Amiga. The Amiga SCSI drivers accepted "SCSIDirect" commands from applications. Internally, all IO commands were converted to SCSIDirect commands for execution. To implement ATA, I added a SCSIDirect->ATA translator (which wasn't that hard - about 3 weeks from start to a working, booting system), and I implemented just about all the SCSI commands that were even semi-reasonable (all of CCS, I think, plus quite a bit more).

          Doing it this way made implementing support for ATAPI CDROMs (something I did as a contract after Commodore folded) Very Easy. :-)
        • Re:SCSI (Score:3, Insightful)

          by jesup ( 8690 ) *
          Tagged Queuing in SCSI is a good thing.

          Trying to do all the reordering in the OS (as suggested in several posts here) seems like a good idea, but ignores some issues:
          1. Disks aren't a uniform array of blocks, and even if you have disk geometry it's almost certainly at least simplified, and probably a total lie. (you can query a SCSI drive for the next "slowdown", but what that means is ill-defined, and not that useful anyways.)
          2. Because of (1), you don't know when blocks are on the same track or not.
          3. i
    • Re:SCSI (Score:3, Informative)

      by B1ackDragon ( 543470 )
      Since it mentioned that the OS is keeping "per-process statistics on whether there will be another dependent read 'soon'", I really doubt the drive controller would even be able to do that, much less want to.
    • Re:SCSI (Score:2, Informative)

      by pararox ( 706523 )
      As a college student, I feel proud to say I've access to a quad-Xeon SCSI machine; this bad thing truly burns.

      I run WebGUI [] on this machine, which receives some 3 and a quarter million hits per month. Nothing to raise eyebrows at; but check it: on this machine the average uptime value is some 0.80. My personal (p3) machine, running a BBS, mail, bittorrent, and web service maintains a constant 1.3+.

      I've gauged the importance of SCSI drives in the equation via a (sadly) messy, but soon to be SourceForg
      • Re:SCSI (Score:3, Insightful)

        by awx ( 169546 )
        I'm sure you mean load value, not uptime value. An uptime of 0.8 days isn't really that impressive...
        • Re:SCSI (Score:3, Funny)

          by Anonymous Coward
          An uptime of 0.8 days isn't really that impressive...

          It's obviously been a long time since you used Windows.
    • Re:SCSI (Score:5, Informative)

      by DuSTman31 ( 578936 ) on Tuesday April 06, 2004 @08:35AM (#8778469)

      Yeah, I think so. IIRC it's called tagged command queueing - the drive can have multiple requests pending and instead of doing them first come first served, they're fulfilled in order of estimated latency to that point.

      I believe Western Digital's recent Raptor IDE drives have the same feature.

      The benefit of this seems contingent upon having multiple requests pending, which AFAIK is hard on Linux as there's no non-blocking file IO. To me, this reads like a workaround for that.

  • 1,000 percent? (Score:2, Insightful)

    by lonegd ( 538164 )
    That seems rather high. Either something was broken/badly coded or someone's been adding a couple of zeros ;)

    Linux Devices has an article on the 2.6 network features here []

    • Re:1,000 percent? (Score:4, Insightful)

      by aastanna ( 689180 ) on Tuesday April 06, 2004 @08:09AM (#8778341)
      Seems OK to me, that's a 10x improvement, and that was the theoretical high end example. Since they said it would commonly increase speed by 2x, and 15% for databases, it seems right in line.

      I suppose that since database data is generally grouped together and read in a big chunk there's less room for improvement.
    • 1000% written as a decimal factor is 10.00, or a 10-fold improvement. When dealing with latency times measured in milliseconds, that's not too out of the ordinary. I'm no expert, but look at this situation: (someone correct me if I'm wrong)

      Say, if a block is read on one end of the platter, then 10 subsequent reads are read in close proximity at the other end, followed by an 11th read at the beginning again, a predictive seeker could re-prioritize the 11th seek to be right after the first. That would cut
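Putting toy numbers on the parent's scenario (one unit of head travel per block; purely illustrative):

```python
# One read at block 0, ten reads near block 1000, then an 11th back at 0.
fifo      = [0] + [1000 + i for i in range(10)] + [0]
reordered = [0, 0] + [1000 + i for i in range(10)]  # 11th moved up front

def travel(seq, head=0):
    """Total head movement to service the requests in the given order."""
    total = 0
    for b in seq:
        total += abs(b - head)
        head = b
    return total

print(travel(fifo), travel(reordered))  # 2018 1009
```

Moving the 11th seek up front roughly halves the head travel here; more pathological interleavings (two processes fighting over opposite ends of the platter on every request) are where the 10x numbers come from.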
  • Cool (Score:4, Informative)

    by JaxWeb ( 715417 ) on Tuesday April 06, 2004 @08:05AM (#8778313) Homepage Journal
    It seems there are two IO modes you can choose from, at boot time.

    "The anticipatory scheduling is so named because it anticipates processes doing several dependent reads. In theory, this should minimize the disk head movement. Without anticipation, the heads may have to seek back and forth under several loads, and there is a small delay before the head returns for a seek to see if the process requests another read. "

    "The deadline scheduler has two additional scheduling queues that were not available to the 2.4 IO scheduler. The two new queues are a FIFO read queue and a FIFO write queue. This new multi-queue method allows for greater interactivity by giving the read requests a better deadline than write requests, thus ensuring that applications rarely will be delayed by read requests."

    Nice, but this is making things more complex. I admit I'll just keep all the kernel settings wherever Mandrake sets them. Will other people play about and specialise their systems for the tasks they do?
    • Re:Cool (Score:2, Insightful)

      by Alrescha ( 50745 )
      "and there is a small delay before the head returns for a seek to see if the process requests another read."

      It's early, but did read/write heads suddenly develop intelligence while I was napping?

    • It may be more complex, but it works damn well. Squeezing an extra 30-40% out of my laptop doing disk i/o has been really worth it. Especially since I've been editing and processing 600M files on it lately.
    • "I admit I'll just keep all kernel settings at wherever Mandrake sets them as. Will other people play about and specialise their system for the task that it does?"

      Perhaps that is why the default setting is the one indicated for desktop users.

      And yes, if I were using a Linux box for specific server tasks then I would tweak the settings to get a bit more performance out of it.

    • Re:Cool (Score:2, Insightful)

      by Jeff DeMaagd ( 2015 )
      It sounds a lot like software version of the tagged command queueing that SCSI and high-end ATA drives have. I think having it in the OS would sort of defeat the drive's feature but the OS has more memory and horsepower available to it to reduce average access time.

      I think this would work to minimize the impact of a slow access drive in a heavily multitasking system too.
  • by maxwell demon ( 590494 ) on Tuesday April 06, 2004 @08:06AM (#8778327) Journal
    Is there any reason why the prediction code (anticipatory scheduler) and the extra queues (deadline scheduler) couldn't be combined in a single scheduler to give us the best of both worlds?
    • by mirko ( 198274 ) on Tuesday April 06, 2004 @08:14AM (#8778365) Journal
      what would you have expected the kernel 2.8 to bring you ?

      Basically, I think this is like the Windows system settings: you either prioritize front-end services (GUI) or back-end services (apache, etc.), but you cannot do both, because some would be optimized for reactivity and the others to handle the workload... like a Ferrari and a truck... they don't work or excel in the same way.

    • by Anonymous Coward on Tuesday April 06, 2004 @08:17AM (#8778381)
      I believe that the anticipatory sched uses the model of the deadline sched. See "Linux Kernel Development" by Robert Love.
  • Amiga Disks (Score:5, Interesting)

    by tonywestonuk ( 261622 ) on Tuesday April 06, 2004 @08:12AM (#8778351)

    When I had an Amiga (around '91ish), even though it was fully multitasking, I learnt never to open any app while another was loading. If you did, you could hear the disk head moving back and forth between two sectors on the disk every half second or so, slowing both app launches to a crawl. Waiting until one loaded, then launching the second, was many times faster.

    I've always wondered why there wasn't something in the OS to force this behaviour, i.e. making sure that App 2's access to the disk is queued until App 1 has finished. Isn't this one of the reasons Windows takes ages to boot? (many processes all competing for the one disk resource?)
    • Re:Amiga Disks (Score:3, Informative)

      by jtwJGuevara ( 749094 )
      Isn't this one of the reasons Windows takes ages to boot? (many processes all competing for the one disk resource?).

      Which version of Windows are you referring to? At the risk of sounding like a fanboy, I must say that the OS load times for XP are quite fast compared to previous versions and to most vanilla Linux distributions I've tried in the past (Mandrake 9.x, Redhat8/9). Whether or not this is related to resolving two processes arguing over access to read from the disk, I don't know. Does

    • Re:Amiga Disks (Score:3, Interesting)

      by MrFreshly ( 650369 )
      I think Windows loads slowly because of all the damn spyware, unnecessary default drivers and services, IE preloading... etc.

      IMHO Default windows config is kinda like a Redhat with everything and then some.

      Start in safe mode and watch all the crap that tries to load - a ton of it is not needed.

      If you tighten your install by removing a lot of the extra services, spyware, and a few performance tweaks - you'll see a major speed increase over all.

      I use XP, but I don't like it much anymore...It's s
    • Windows only takes ages to boot if something's wrong. My machines (p3 1ghz, p4 2.4ghz) all boot up in under a minute... I'd try taking a look in your error log :)
    • Yeah, the same thing happens under Windows if you read from CD-ROM. The whole thing just slows to a crawl if you try to read two files at once. I'd assume it's a hardware problem, (long seek times, large error margins) not necessarily Windows' fault, but I don't use CDs much anymore (hooray for ethernet and huge hard drives) so I don't know.

      Of course, this raises the point that aligning the data on a game CD or DVD for a console is a science in itself. PC game development is easy in comparison! (plonk ever
    • Re:Amiga Disks (Score:4, Informative)

      by jarran ( 91204 ) on Tuesday April 06, 2004 @09:00AM (#8778694)
      Because it's a lot more complicated than you suggest. What happens if A gets in first, but is doing an extremely long disk-bound task? B will never get a chance to access the disk. It could even be that B would stop after a very short amount of disk access, in which case it would have to wait until A is done, even though interleaving the reads would have been the "right thing to do".

      Being multi-user complicates things even further. Sure, if you are a single user on a desktop machine and you double-click on two programs in rapid succession, queuing them for loading one after the other may be the right thing to do. But what if those programs are actually being loaded by two different users? Can we completely lock out one user just because they started loading their program slightly later? Again, what if user A runs emacs, and a fraction of a second later, user B runs ls? Under your system, B effectively has to wait as long as it would take to load emacs, plus as long as it would take to load ls.

      You can't even realistically separate the queues by user. In many situations, a single unix user may be running on behalf of many physical users (AKA human beings ;) ), e.g. in the case of any kind of server.

      I'm not saying that any of these problems are intractable (Linux is now doing a pretty fine job), just that they aren't even remotely as trivial as queuing loads one after another.

      Oh BTW, thanks for bringing back happy Amiga memories. Them were the days! :-)
    • Re:Amiga Disks (Score:3, Informative)

      by shyster ( 245228 )

      I've always wondered why there wasn't something in the OS to force this behaviour, Ie, making sure that App 2 access to the disk is queued until app 1 has finished. Isn't this one of the reasons Windows takes ages to boot? (many processes all competing for the one disk resource?).

      AFAIK, the reason Windows used to take ages to boot was that drivers and services were started sequentially and no optimization was ever done for the boot process. Windows XP, OTOH, had a goal of less than 30 seconds for a cold

      • Re:Amiga Disks (Score:3, Interesting)

        by jesup ( 8690 ) *
        Back in the day (Amiga 3000 introduction circa 1990) we could boot an Amiga off a 40MB disk from power-on (including loading a replacement for the ROM image) to fully up (including mounting NFS mounts) in ~15 seconds; 7 seconds from a warm boot. Effectively we chunked the load of much of the OS (including GFX, Workbench (desktop), etc) as part of the softload of the ROM image into RAM. On top of that, on warm boot we checksummed the ROM image in RAM (as well as various major OS library structures), and if
    • Re:Amiga Disks (Score:3, Insightful)

      by Ouroboro ( 10725 ) *

      I've always wondered why there wasn't something in the OS to force this behaviour, Ie, making sure that App 2 access to the disk is queued until app 1 has finished. Isn't this one of the reasons Windows takes ages to boot? (many processes all competing for the one disk resource?).

      You run into a problem where you don't know when app 1 has finished loading in order to start loading app 2. Why? Because a loading app looks no different from a running application. You could possibly get around this by having

    • Re:Amiga Disks (Score:4, Interesting)

      by LWATCDR ( 28044 ) on Tuesday April 06, 2004 @09:55AM (#8779157) Homepage Journal
      Actually, even early versions of Novell used a system called elevator seeking. The system worked like an elevator: the head moved in one direction until it hit the lowest track/sector in the queue, then it changed direction. Not nearly as slick as the new system, but a big improvement over a first-come, first-served system. Now that we have so much more memory and CPU power, Linux and other OSes can use more complex systems.
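Elevator seeking as described is straightforward to sketch (a one-directional SCAN pass; the block numbers are made up):

```python
def elevator_order(requests, head=0):
    """Service everything above the head moving up, then sweep back down -
    the classic elevator / SCAN ordering the parent describes."""
    up = sorted(b for b in requests if b >= head)
    down = sorted((b for b in requests if b < head), reverse=True)
    return up + down

def travel(seq, head=0):
    """Total head movement to service requests in the given order."""
    total = 0
    for b in seq:
        total += abs(b - head)
        head = b
    return total

reqs = [500, 20, 700, 40, 900]
print(travel(reqs, head=100),                         # first-come, first-served
      travel(elevator_order(reqs, head=100), head=100))  # elevator
```

Even this crude sweep cuts head travel roughly in half versus first-come, first-served on a scattered queue, which is why it was such an improvement at the time.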
  • by redcliffe ( 466773 ) on Tuesday April 06, 2004 @08:13AM (#8778357) Homepage Journal
    I've actually found that on my machine, a pretty much standard desktop, response is a lot slower on 2.6.5 than 2.4.22. Not sure if I got something set wrong in the compile, but moving the mouse and stuff like that seems a lot jerkier under load. I use a USB mouse and keyboard, so maybe that's part of it. Anyone else seen similar?
    • Yup. I've seen the same thing with 2.6.3. When there's a CPU intensive process running, the mouse becomes very jerky. It doesn't seem to make a difference whether it's on the mouse port or USB.

      I've also seen a problem where the Web browser (either Mozilla or Firefox) pegs the CPU. Bad Javascript in a Webpage somewhere? Never saw this on my old 2.4.23 kernel.
    • Yes. I found that. Then I realised I had the hard disk in PIO only mode. A recompile with DMA support and it's smooth as silk.

    • by bflong ( 107195 ) on Tuesday April 06, 2004 @08:48AM (#8778590)
      Make sure that you set X's "nice" value to 0. Some distros set it to something like -10 so that X is not disturbed by other procs. Under 2.4, this was a good thing. However, under 2.6, with its superior scheduler, the kernel will keep interrupting X and you will see lagging performance. Google for it to get a better explanation.
    • Desktop Linux needs a scheduling policy specific to interactivity. I guess this may happen the day a decent interface gets slapped on the Linux base. Until then, we dance the same dance - every release is faster than the previous one by the benchmarks, and feels more horrid than the previous one.

      Surprise, the Mac has the same reactivity problem now thanks to its Unix (Mach) kernel, while the previous Mac OS 9 crashed regularly and couldn't multitask, but had a much snappier user experience. Apple has been ad
  • by Anonymous Coward on Tuesday April 06, 2004 @08:14AM (#8778363)
    Obviously, this was stolen from SCO. This was based on their UNIX software and was available in the baseline from 10 years ago. It only shows that Linux, once again, is not an innovator, but just copies code from SCO to achieve its scalability.
  • But how? (Score:2, Insightful)

    is accomplished by minimizing the disk head movement

    I was always under the impression that modern hard drive designs hide the physical disk bits and pieces from the PC. So how can software predict where the heads are?
    • Re:But how? (Score:2, Informative)

      by Anonymous Coward
      Clusters that are close together logically are going to be close together on the disk surface. They're not actually talking about controlling the head movement directly, but about minimising head movement by realising how a hard disk works in relation to sector accesses.
    • Re:But how? (Score:3, Informative)

      by pseudorandom ( 35988 )
      The absolute translation of logical block to head position is unknown to e.g. Linux. While it is possible to reverse engineer the physical disk layout by looking at timings, for general purpose computing this is going way too far. I think the upcoming ATA-7 hard disk standard has some more options to get information about the layout of the disk, but I'm not sure of that.

      Anyway, simple sorting on LBA address will typically reduce head seeks to a large extent, resulting in most of the potential benefit. It i
  • Disk Transfer QoS (Score:4, Interesting)

    by johnhennessy ( 94737 ) on Tuesday April 06, 2004 @08:20AM (#8778394)
    I think Solaris 10 (or maybe a later version, I can't remember) is supposed to support a concept of Quality of Service applied to disk accesses.

    Is anyone in the Linux world considering this?

    This is probably more applicable to the enterprise market, but surely any scheme of informing the scheduler about the expected disk transfer characteristics has to improve performance.

    On the other hand, it might be just Sun trying to re-invent uses of buzz words to sell their products.
    • Re:Disk Transfer QoS (Score:4, Informative)

      by Xouba ( 456926 ) on Tuesday April 06, 2004 @09:44AM (#8779059) Homepage

      Two words: IRIX, XFS.

      IRIX had some sort of "quality of service applied to disk accesses", as you wrote, thanks to XFS. The filesystem allows defining zones that have a configured "minimal throughput". I can't say more about it because I only know of it secondhand from other people O:-)

      XFS is available for Linux since 2.6.0 and 2.4.24, IIRC, and I think this feature is also available in the latest kernels. Though it's still experimental, IIRC.

  • Benchmark (Score:5, Informative)

    by zz99 ( 742545 ) on Tuesday April 06, 2004 @08:22AM (#8778403)
    Here's an older benchmark [] made by Andrew Morton showing the anticipatory scheduler vs the previous one.

    The benchmark was made before 2.6.0, but I still think it shows the big difference from the 2.4 IO scheduler.

    Executive summary: the anticipatory scheduler is wiping the others off the map, and 2.4 is a disaster.
  • by Anonymous Coward on Tuesday April 06, 2004 @08:27AM (#8778428)
    It's great watching the "modern" computer industry discover all the toys and optimisations that were essential engineering for the systems I used to use in the '70s & '80s.

    All the wonderful stuff like disk seek optimisation and interleaved memory (even the MMU came to the modern computer about 15 years after everyone else had it) were technologies that made systems stand out from each other.

    Because of the speed of things these days, lots of that tech has been largely ignored, until now, when we're starting to hit hard performance barriers again. Now we have to invent the technology of the '70s all over again. It's nice to see all this stuff coming back, though.

    • The Renaissance (Score:4, Interesting)

      by EXTomar ( 78739 ) on Tuesday April 06, 2004 @11:58AM (#8780478)
      And we all would have benefited from this if they had simply shared in the first place instead of spending 20-30 years "rediscovering" it.

      One programmer likened the '70s-'80s to the Dark Ages. There were cabals and secret voodoo that people sat on and didn't share, and you ended up with ignorant masses who thought "this is as good as it gets". Hopefully this renaissance sticks, because it doesn't matter how good or cool your technology is if you bury it for 20 years without anyone else knowing.
  • CFQ (Score:4, Informative)

    by kigrwik ( 462930 ) on Tuesday April 06, 2004 @08:31AM (#8778446)
    The cfq scheduler in the -mm (Andrew Morton) trees gives very good results in desktop use.

    With anticipatory or deadline, I'm experiencing awful skips with artsd under KDE 3.2 every time there is a heavy disk access, but it's [almost] completely gone with cfq.

    To use it, compile a -mm kernel and add 'elevator=cfq' to the kernel boot parameters through Lilo or Grub.

    See this lwn article [] for more info.
  • Real benefits... (Score:4, Insightful)

    by greppling ( 601175 ) on Tuesday April 06, 2004 @08:35AM (#8778470)
    ...for the typical desktop workload would come from a better cooperation between applications, glibc, and the kernel.

    Let me start by claiming that optimizing desktop performance is all about optimizing I/O patterns (contrary to what all Gentoo users think :P). My KDE startup is about three times as fast when everything is in the disk cache, so it is clear where the bottleneck is. (Just try logging in to KDE after boot, then logging out and in again.) A concentrated effort of

    • passing on the right hints from KDE via glibc to the kernel (e.g. an madvise() call when loading executables giving the hint that probably most part of the file will be needed later on),
    • trying some anticipatory reading of config files/libraries etc. from startkde where it is known that they will be needed, and that they are hopefully lying contiguously on the disk,
    • optimizing disk layout for the common access patterns
    would IMHO make a far bigger difference for the desktop experience than optimizing compiler flags by using gentoo or using a preemptible kernel.

    There has been a lot of discussion about this on the kde-optimize list (with Andrew Morton participating), so maybe we can hope that KDE 3.3 will offer some improvements.

    As an aside, yes, we all hate the Windows registry, but I think we should admit that for boot time optimization it is the right thing to do (having everything in one file that is laid out in one contiguous block on the disk).
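The madvise()-style hint suggested above has a userspace cousin that can be sketched from Python: posix_fadvise(2) with POSIX_FADV_WILLNEED asks the kernel to start read-ahead on a file before it is actually read. Whether it helps depends on the kernel; treat this as a sketch (the helper name is mine), and note the call is Unix-only:

```python
import os
import tempfile

def prefetch_hint(path):
    """Tell the kernel we expect to read this whole file soon
    (POSIX_FADV_WILLNEED) - the userspace cousin of the madvise()
    hint suggested above. Returns False where unsupported."""
    if not hasattr(os, "posix_fadvise"):  # Unix only
        return False
    fd = os.open(path, os.O_RDONLY)
    try:
        # offset=0, length=0 means "the whole file"
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
    finally:
        os.close(fd)
    return True

# Demo on a throwaway file: hint first, then read.
fd, path = tempfile.mkstemp()
os.write(fd, b"config data " * 1000)
os.close(fd)
prefetch_hint(path)
with open(path, "rb") as f:
    assert f.read(12) == b"config data "
os.unlink(path)
```

A startkde-style launcher could issue such hints for the config files and libraries it knows it will need, overlapping the disk reads with other startup work.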

  • Speed-ups (Score:3, Insightful)

    by jd ( 1658 ) <imipak AT yahoo DOT com> on Tuesday April 06, 2004 @08:39AM (#8778495) Homepage Journal
    I've often wondered what would happen if such I/O speedups were put into hardware. There's plenty of RAM on modern controllers, but caching adjacent tracks is not as efficient as caching distant tracks, so as to minimise the need for moving the read-heads long distances.

    Alternatively, have multiple read-heads on a single arm. 3 would be a good number. The idea here would be that you could pre-seek either side of the disk, before finishing a read by the currently-active arm.

    • Re:Speed-ups (Score:3, Informative)

      by AlecC ( 512609 )
      Effectively, SCSI does I/O speedups. Firmware, not hardware, but so is everything. And the speedups from giving SCSI a lot to do and letting it do it in its preferred order can be significant. But SCSI cannot "see" processes - nor file systems. The OS can work out that a process is reading a file and read the next bit of the file - where SCSI would read the next bit of the disk, if it did so at all. The OS can see when you have reached EOF, or closed the file, and that there is no point pre-reading.

      You don't mean m

  • Doesn't this involve a green marker, and tracing along the edge of the hard drive? Faster and less distortion?
  • I always heard that Linux wasn't preemptive and that this is why embedded developers shy away. Is this the first step towards resolving preemption issues?

    Also, it sounds like if Linux had a defrag utility, it could store data on the disk the way it would be accessed. If the OS watched how the data is being accessed, it could then re-arrange the data dynamically. Example: you access File A, which accesses File B and File C; the OS would recognize this and re-arrange the data in that
    • Firstly, the 2.6 kernel allows pre-emptive scheduling. Supposedly it was introduced because Linus got tired of his MP3s skipping while he compiled things.

      Second, Linux doesn't need a defrag utility. Linux filesystems (Ext2 and Ext3) allocate files properly, using clustering and inodes. The need to defrag comes from the bad design of FAT, which works great on an 8088 processor with tiny files on a 1 MB drive, but is terribly inefficient on anything past a 386.

      Of course, there does exist a 'defrag' utility
  • by hughk ( 248126 ) on Tuesday April 06, 2004 @08:43AM (#8778534) Journal
    One of the big things about databases is the reliable commit where all the crud you have done in a transaction either gets committed or backed out. The writes then become kind of important, and you really do want them to complete before continuing with anything else.

    This messing with the I/O queue may make things interesting for the journalling process which is kind of vital to integrity. File placement could become even more important for this (and also the placing of journal/log files).

    The rest seems to just effectively be a modified elevator (wait a bit before moving).
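
    The "modified elevator" is easy to picture: sort outstanding requests by sector and sweep the head in one direction before wrapping around, instead of chasing requests in arrival order. The following is only a toy sketch of that sweep (C-SCAN style), not the kernel's actual implementation, and the sector numbers are invented:

```python
# A toy C-SCAN "elevator": sweep upward from the current head position,
# then wrap around to the lowest outstanding sector.
def elevator_order(head, requests):
    above = sorted(r for r in requests if r >= head)
    below = sorted(r for r in requests if r < head)
    return above + below

def total_seek(head, order):
    """Total head movement needed to serve requests in the given order."""
    dist = 0
    for sector in order:
        dist += abs(sector - head)
        head = sector
    return dist

pending = [98, 183, 37, 122, 14, 124, 65, 67]  # invented sector numbers
head = 53
fcfs = total_seek(head, pending)
swept = total_seek(head, elevator_order(head, pending))
print("first-come-first-served seek distance:", fcfs)
print("elevator seek distance:", swept)
```

    For this invented workload the sweep roughly halves the total head travel; the anticipatory twist discussed in the article adds the "wait a bit before moving" pause on top of such an ordering.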

  • Would it not be possible to write a very basic adaptive network that "learns" what the best values for these parameters are for each individual machine, based on a history of its workload?
    • Re:Idea... (Score:3, Interesting)

      by cca93014 ( 466820 )
      Doh! Was meant to include this quote in the above post - mods can ignore the above post please...

      You can tune your anticipatory scheduler to improve its functionality. There are five basic parameters you can alter to change the way the wait-before-seek times function: read_expire, read_batch_expire, write_expire, write_batch_expire, and antic_expire.

      Would it not be possible to write a very basic adaptive network that "learns" what the best values for these parameters are for each individual machine, ba
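
      Assuming a 2.6 kernel with the anticipatory scheduler active, those five tunables show up under sysfs, which is where any such self-tuning tool would have to read and write them. A guarded sketch that just lists their current values ("hda" is a placeholder device name, and the path is checked because it varies with kernel version and configuration):

```python
from pathlib import Path

# The five anticipatory-scheduler tunables live under sysfs when the
# scheduler is active; "hda" is a placeholder device name.
iosched = Path("/sys/block/hda/queue/iosched")
tunables = ("read_expire", "read_batch_expire", "write_expire",
            "write_batch_expire", "antic_expire")

results = {}
for name in tunables:
    p = iosched / name
    results[name] = p.read_text().strip() if p.exists() else None

for name, value in results.items():
    print(name, value if value is not None else "(not present on this kernel)")
```

      Writing a new value back (as root) is just a write to the same file, so an adaptive tuner could adjust these on the fly based on observed workload.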

  • by k-hell ( 458178 ) on Tuesday April 06, 2004 @08:57AM (#8778669) has a Linux kernel comparison [] of 2.6.4 and 2.4.25 on an SMP system with interesting results.
  • by aussersterne ( 212916 ) on Tuesday April 06, 2004 @09:59AM (#8779186) Homepage
    Aside from much better I/O performance, 2.6.x also has much better performance on my notebook (IBM T-series ThinkPad).

    I don't know if it's due to SpeedStep support being in the kernel or what, but when I was running 2.4.x with the pre-emptible kernel patches, switching from wall power to battery power meant massive slowdowns, as though I had switched from a PIII-1GHz to a 100MHz Pentium classic. Simple commands like "ps" would take seconds to complete and screen redraws were visible. The whole system would feel like sludge. In spite of this fact, battery life was relatively poor. The combined effect (much slowed system, very short battery life) meant that it was difficult to get anything at all done on battery power.

    Now with 2.6.x, when I switch to battery power, there is no perceptible slowdown whatsoever when compared to wall power, and battery life is much improved. Downside: suspending 2.6.x kills USB-uhci, so I've had to compile it as a module and hack up my suspend/resume scripts to reload it each time. But for the speed increase, it's well worth the trouble.
  • If you compile GLIBC with NPTL support you'll see even more of the new kernel in action. I quote from,

    NPTL brings an eight-fold improvement over its predecessor. Tests conducted by its authors have shown that Linux, with this new threading, can start and stop 100,000 threads simultaneously in about two seconds. This task took 15 minutes on the old threading model.
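
    A much scaled-down sketch of that start/stop benchmark, using Python's threading module rather than raw NPTL calls (on Linux each Python thread is still a native thread, so this does exercise the kernel's thread creation and teardown paths; N here is far smaller than the 100,000 in the quoted test):

```python
import threading
import time

N = 1000  # scaled down from the 100,000 threads in the quoted benchmark

def worker():
    pass  # start, do nothing, and exit immediately

t0 = time.time()
threads = [threading.Thread(target=worker) for _ in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"started and joined {N} threads in {time.time() - t0:.2f}s")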
  • by Sajma ( 78337 ) on Tuesday April 06, 2004 @12:35PM (#8780943) Homepage
    The original research for anticipatory disk scheduling was done at Rice University by Sitaram Iyer and Peter Druschel and is described here [].
  • by chongo ( 113839 ) * on Tuesday April 06, 2004 @01:55PM (#8782092) Homepage Journal
    While you are waiting to install the new kernel code, you might try a filesystem mount option called noatime that has been in many *n*x distributions for a while now.

    If you don't care about last access times on your files, then you should consider mounting your filesystems with the noatime mount flag as in this /etc/fstab line:

    LABEL=/blah /blah ext3 defaults,noatime 1 2

    Reading a file under noatime means that the kernel does not need to go back and update the last access time field of that file's inode. Sure, multiple reads over a span of a few seconds will only cause the in-core inode to be modified, but eventually that modified inode must be flushed out to disk. Why cause an extra write to the disk for a feature that you might not care about?
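
    The timestamp in question is visible from user space, which makes the cost easy to reason about. A small illustration of the three inode times (atime is the one that noatime stops maintaining; the temp file is created just for the demonstration):

```python
import os
import tempfile

# atime is the per-inode "last access" timestamp whose maintenance the
# noatime mount option suppresses.  All three inode times are visible
# via os.stat():
fd, path = tempfile.mkstemp()
os.write(fd, b"some data")
os.close(fd)

with open(path, "rb") as f:
    f.read()  # on a normally-mounted filesystem this refreshes atime

st = os.stat(path)
print("atime (last access):", st.st_atime)
print("mtime (last content change):", st.st_mtime)
print("ctime (last inode change):", st.st_ctime)
os.unlink(path)
```

    Every one of those atime refreshes is a dirty inode that eventually has to be written back, which is exactly the write-op the noatime flag saves.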

    For example: think about those cron jobs / progs that scan the file tree (tmpwatch, updatedb, etc.). Unless you mount with the noatime option, your kernel must at least update the last access time fields of every directory's inode! Think about those /etc files that are frequently read (hosts, hosts.allow, DIR_COLORS, resolv.conf, etc.) or the dynamic shared libs (,,, etc.) that are frequently used by progs. Why waste write-ops updating their last access time fields?

    Yes, the last access time field has some uses. However, the cost of updating those last access timestamps, IMHO, is seldom worth the extra disk ops.

    There are other advantages to using the noatime mount option ... however to wind up this posting I'll just say that I always mount my ext3 filesystems with the noatime mount flag. I recommend that you consider looking into this option if you don't use it already.

  • by ksp ( 203038 ) on Tuesday April 06, 2004 @03:42PM (#8783618) Homepage
    I know there is a boot-time switch for changing the I/O scheduler, but I still believe you are stuck with one for all devices. How about using different algorithms for different partitions? There is quite a lot of difference between a database device, a filesystem holding binaries, shared libraries, /tmp, spool directories etc. etc. etc. When I/O schedulers are so different in their theoretical foundations, why do you have to choose only one?
    This should be a mount option, not a boot option.
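
    For what it's worth, later 2.6 kernels did grow a per-device knob for exactly this: a sysfs attribute that shows the available schedulers and lets you switch the active one per block device at runtime. A guarded sketch assuming such a kernel ("sda" is a placeholder device name):

```python
from pathlib import Path

# On later 2.6 kernels the scheduler is switchable per block device via
# sysfs; the attribute is checked for existence since older kernels
# (and non-Linux systems) do not have it.
sched = Path("/sys/block/sda/queue/scheduler")
if sched.exists():
    # prints e.g. "noop anticipatory deadline [cfq]"; the bracketed name
    # is active, and writing another name to this file switches to it.
    print(sched.read_text().strip())
else:
    print("per-device scheduler attribute not present here")
```

    That makes the per-partition wish only partly granted (the granularity is the device/queue, not the mount), but it is at least no longer a single boot-time choice for the whole machine.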

Money can't buy love, but it improves your bargaining position. -- Christopher Marlowe