Forgot your password?
typodupeerror
Linux Software

The Completely Fair Scheduler 292

Posted by CmdrTaco
from the i'm-always-late dept.
hichetu writes "Kernel trap has a nice summary of what is going on behind the scenes to change the Linux Scheduler. The O(1) Linux scheduler is going to be changed so that it is fair to interactive tasks. You will be surprised to know that O(1) is really too good not to have any side-effects on fairness to all tasks."
This discussion has been archived. No new comments can be posted.

The Completely Fair Scheduler

Comments Filter:
  • by Anonymous Coward on Sunday April 22, 2007 @11:44AM (#18832057)
    Will the new scheduler be more efficient at scheduling single processes across a multi-core system?

    On another not[e], when are we going to be able to build a default kernel capable of low latency I/O for A/V work?
  • I/O prioritisation (Score:5, Interesting)

    by Doug Neal (195160) on Sunday April 22, 2007 @11:47AM (#18832069)
    Linux really doesn't need a new process scheduler. What it could really do with is I/O prioritisation. Windows now has it, so there's no excuse. CPU power is fairly abundant these days so managing its usage is less of an issue than it used to be, but I/O bandwidth is often in short supply and I/O-bound applications can choke a system and make interactive processes a pain in the ass to use. I'd like to see some way of reserving and limiting bandwidth to particular devices for particular processes. And an equivalent of "top" for monitoring processes' I/O activity would also be extremely handy... as far as I know, the system calls don't even exist in the kernel to do this yet.
  • by tepples (727027) <tepples&gmail,com> on Sunday April 22, 2007 @11:54AM (#18832123) Homepage Journal

    And an equivalent of "top" for monitoring processes' I/O activity would also be extremely handy... as far as I know, the system calls don't even exist in the kernel to do this yet.
    Windows has this to an extent. I press Ctrl+Shift+Esc to open Windows Task Manager, then View > Select Columns... > turn on Page Faults Delta, and I see the penalty for sticking with the 6-year-old paid-for PC that I still use.
  • by sphealey (2855) on Sunday April 22, 2007 @12:02PM (#18832197)
    > Can't we just give the processes weapons and
    > let them decide which follows?

    That is actually the kind of question that my Operations Research professor (who also did a lot of work in CPU simulation and performance estimating) used to throw onto final exams as the "separate the B+ from the A" question. If your answer was interesting enough he would send you over to one of his Masters candidates to see if it could be taken any further. So I wouldn't count your suggestion out from the start!

    sPh
  • by Anonymous Coward on Sunday April 22, 2007 @12:03PM (#18832211)
    On Multics, tasks that used up their CPU were put in a lower priority run queue and tasks that didn't use up their CPU time were put into a higher priority run queue.

    So interactive tasks naturally floated to the top and compute bound tasks naturally sank.

    (Anyone remember Multics?)
  • by Stephen Williams (23750) on Sunday April 22, 2007 @12:28PM (#18832379) Journal
    And an equivalent of "top" for monitoring processes' I/O activity would also be extremely handy

    I'd love something like that.

    There's a way of logging I/O; it's pretty rough-and-ready, not really suitable for permanent use, but can be handy for figuring out what keeps causing a laptop HDD to spin up, for example. As root, do:

    echo 1 >/proc/sys/vm/block_dump
    I/O is then logged to the kernel ring buffer, and can be retrieved with dmesg. The entries look like:

    pdflush(138): WRITE block 1161864 on dm-4
    pdflush(138): WRITE block 0 on dm-3
    pdflush(138): WRITE block 524328 on dm-3
    pdflush(138): WRITE block 786952 on dm-3
    pdflush(138): WRITE block 786960 on dm-3
    When you've finished, do

    echo 0 >/proc/sys/vm/block_dump
    as root to turn it off again.

    Like I said, very rough-and-ready, nowhere near as nice as a proper I/O top would be, but there it is.

    -Stephen
  • by Anonymous Coward on Sunday April 22, 2007 @12:57PM (#18832605)

    I read an article somewhere on Sun's site about I/O scheduling in SunOS, but unfortunately I can't find it now. I'll try to summarize it. The basic point was that when a task wants to write data, it can typically hand that data off to the kernel and then keep running. That means that it's making best use of system resources, since it will use the processor (for example) while the kernel writes to the disk in the background. When a task wants to read data, however, it usually can't do anything until the data's read from disk. So what SunOS will do is schedule I/O unfairly, prioritizing reads over writes, to minimize the amount of time that tasks spend blocked. (My experience is that it will even aggressively swap out pages to expand the write cache.) I wish I could find that article, I seem to remember it having some interesting nitty-gritty details about the way this is actually implemented, as well as some benchmarks from a customer.

  • by Animats (122034) on Sunday April 22, 2007 @01:24PM (#18832747) Homepage

    Linux really doesn't need a new process scheduler. What it could really do with is I/O prioritisation.

    QNX has that, which is essential for real-time work.

    QNX has the advantage that I/O, like almost everything else in QNX, is done via inter-process message passing operations. The message passing system uses priority queues, and so requests to file systems and devices get handled in priority order. So resource managers (file systems, device drivers, etc.) don't have to explicitly handle priorities; it's done for them. Some resource managers, like disk handlers, process multiple requests at a time so they can reorder them to optimize access, but network devices and such are FIFO at the resource manager level and priority ordered at the message level.

    The end result is that you can compile or surf the web on a system that's also doing real time work without interfering with the real time tasks.

  • Linux "iostat" (Score:1, Interesting)

    by Anonymous Coward on Sunday April 22, 2007 @01:54PM (#18832939)
  • Re:Well duh (Score:3, Interesting)

    by Anonymous Coward on Sunday April 22, 2007 @02:02PM (#18832977)

    Obviously O(1) is a bad choice. Nice to see Linux learning something from Windows here.

    Actually I find the Windows (XP, haven't tried Vista yet) scheduler to be quite horrible especially with regards to interactivity. One resource hog and things pretty much come to a standstill. IMO it's worse than O(1) and no comparison to Con's staircase.

  • by sphealey (2855) on Sunday April 22, 2007 @02:04PM (#18832997)
    That guy was actually the best test writer and overall course designer that I have ever had among all the academic (through a masters) teachers and corporate trainers I have encountered. When you finished his course you received exactly the grade you deserved according to the formal definitions of the grades; as you indicate one didn't receive an A in that class unless one actually _understood_ the material [for the record I was in the B+ group ;-( - which was a correct evalution]. Not surprisingly it also turned out to be one of the most useful classes I ever took as well.

    sPh
  • by Anonymous Coward on Sunday April 22, 2007 @02:19PM (#18833101)
    VAX/VMS had it way back. The SYSMON application was the bees-knees. You could/can get I/O per process, see the files open per process, I/O to each individual file, I/O to a disk, which processes were doing I/O to a disk and which processes were doing I/O to a file. Some of that required a little system programming, but the important thing - the data structures - were all there, and very nicely designed. UNIX/Linux was/is primitive by comparison. Sigh. No doubt I'll get flamed by someone who has never done system programming under VMS, especially not on VMS clusters, where the I/O came from multiprocessor CPUs clustered together - 4 processors per box, 8 boxes and 8 HSCs. Ahh memories.
  • by Animats (122034) on Sunday April 22, 2007 @03:18PM (#18833499) Homepage

    I'm impressed. I did some work on CPU dispatching on a mainframe system in the distant past, and we were never able to beat O(log n) on an ordered dispatch queue. The obvious implementation is O(n); getting to O(log n) requires a tree, and I want to see how they got to O(1).

    This stuff really matters now. If we're going to have 80-CPU multiprocessors, all the main operating system algorithms have to be near O(1), or the scale-up is a lose.

  • by asninn (1071320) on Sunday April 22, 2007 @03:28PM (#18833567)
    There's some folks working on scaling [lwn.net], too. ;) I wonder where I'll get a 4096 CPU machine, though...
  • by Animats (122034) on Sunday April 22, 2007 @05:33PM (#18834523) Homepage

    Actually, that was the Mars Pathfinder [berkeley.edu]. It was running VxWorks, and the effect of the priority inversion was that the stall timer would trip and reset the whole system. The problem was that VxWorks, like QNX, lets you turn off "priority inheritance" on a mutex. This is usually a bad decision, but that was done on the Mars Pathfinder, and created the possibility of a livelock.

    So they uploaded a patch to change that mutex to "priority inheritance on", and it worked consistently thereafter.

  • by smitty_one_each (243267) * on Sunday April 22, 2007 @07:11PM (#18835211) Homepage Journal
    You mean this was the "old school" where you were graded on the basis of something they used to call "learning".
    "Progress" has saved us all of that stress and ambiguity.
    Now, you just pay a small mountain of cash for tuition, and walk away with your "A".
    It's all about efficiency these days.
  • by kcbrown (7426) <slashdot@sysexperts.com> on Sunday April 22, 2007 @07:32PM (#18835337)

    For the purpose of top and other job monitoring tools we can replace a process's "NICE" score with a "VIOLENCE" score -- an aggregate of their armor, accuracy, rampage tendencies and current ammo supply. We can rename the renice utility to medicate. The important thing about medication is that it eventually wears off, unless you specify the -l (lobotomize) option, which turns the process into a harmless drooling vegetable. Its companion utilities are aim and armor, which tune a job's accuracy and armor class, respectively.

    Of course, with such a scheduler, something like the Doom system administration tool [unm.edu] (perhaps more like Quake where you can aim vertically as well as horizontally) will become the preferred method of managing the processes on a system.

    For one thing, the processes will obviously shoot back, as the process manager itself (which you see as yourself when running it) is a running process, and thus subject to being fired upon by the other processes.

    Secondly, a headshot obviously gets you a "lobotomize" effect. This could pose a problem if one of the other processes hits you with a headshot...

    Finally, the application of a medpack to an injured process invokes the "medicate" action.

    There are a few possible problems with this, of course:

    1. When you have two or more system administrators, all running the process manager, the system itself becomes a warzone with innocent processes being killed by the dozen as the administrators go on rampages in their attempts to kill each other for supreme control of the system.
    2. Certain weapons, such as the BFG, are powerful enough to take out all but the most heavily armored processes, and since some of them are area effect weapons, a lot of innocent processes will bite the dust as a result of their usage.
    3. Lightly-armored processes will need additional protection in the form of fast reflexes to avoid being hit.
    4. Eventually the administrators will begin using aimbots and the like. One can see where the resulting arms race will go. Obviously the aimbots will have to run on a different system since otherwise they'll be potential targets.
    5. "Spawn camping" takes on a whole new meaning. Newly created processes become very vulnerable compared with running under earlier versions of Linux. Normal users will have an increasingly difficult time starting tasks like OpenOffice and will start to migrate back to Windows or other OSes with clearly inferior schedulers.
    6. Due to all of the above, the system will eventually become unusable by anyone but the system administrators. The sysadmins will, of course, say that this is how it should be.

    In short, Linux will quickly become the must-have operating system for gamers, but at the expense of the general purpose desktop.

  • by Jac_no_k (5957) <jsuzuki@spamcop.net> on Sunday April 22, 2007 @11:09PM (#18836607) Homepage Journal

    This reminds me of a movie called Tron.

  • by Anonymous Coward on Monday April 23, 2007 @01:58AM (#18837479)
    > Heh. I don't want my kernel's scheduler to be fair. The hell with "fair". "Fair" is for quiche-eaters.
    > I want it to give ME all the resources I need to get whatever I'm doing done before I even think of it.
    > Any other job can just go jump on its head...

    I recommend DOS!

    Either that or classic Mac OS - you can completely take over the computer from the OS, which is in fact how MkLinux and A/UX are booted.
  • by shaitand (626655) on Monday April 23, 2007 @04:05AM (#18837967) Journal
    'Coming up with an idea (even if totally made up) and the backing it up with arguments is much harder than memorization and regurgitation and actually backing it up with things having to do with that class shows you have learned something, or at least know about the concepts discussed in the class.'

    Sure. It demonstrates that you've learned something and that you are a quick thinking and creative person. Your writing ability also plays a major role.

    Unfortunately, you could learn the material without being creative, quick thinking, or a good writer. If you are this unfortunate sole you just got a lower grade. Grades are supposed to demonstrate understanding of the course material, not how bright you are.

All the evidence concerning the universe has not yet been collected, so there's still hope.

Working...