The Completely Fair Scheduler 292
hichetu writes "Kernel trap has a nice summary of what is going on behind the scenes to change the Linux Scheduler. The O(1) Linux scheduler is going to be changed so that it is fair to interactive tasks. You will be surprised to know that O(1) is really too good not to have any side-effects on fairness to all tasks."
Isnt this called Cron ? (Score:5, Funny)
I thought Linux used Cron as a scheduler ?
Re:Isnt this called Cron ? (Score:5, Informative)
This is for scheduling CPU resouces in real time. To decide if Firefox or Apache is going to be executed the following split second.
Re:Isnt this called Cron ? (Score:5, Funny)
Re:Isnt this called Cron ? (Score:5, Interesting)
> let them decide which follows?
That is actually the kind of question that my Operations Research professor (who also did a lot of work in CPU simulation and performance estimating) used to throw onto final exams as the "separate the B+ from the A" question. If your answer was interesting enough he would send you over to one of his Masters candidates to see if it could be taken any further. So I wouldn't count your suggestion out from the start!
sPh
The Mother of All Comp-Sci Flame Wars (Score:5, Funny)
GP: Can't we just give the processes weapons and let them decide which follows?
P: That is actually the kind of question that my Operations Research professor (who also did a lot of work in CPU simulation and performance estimating) used to throw onto final exams as the "separate the B+ from the A" question. If your answer was interesting enough he would send you over to one of his Masters candidates to see if it could be taken any further. So I wouldn't count your suggestion out from the start!
Behold: The Mother of All Possible Comp Sci Flame Wars: The Darwinistically Selected Genetic Algorithms -vs- the Intelligently Designed Algorithms.
Bumper Stickers $4.95; T-Shirts $19.95:
Re: (Score:2)
Re: (Score:2)
Re:Isnt this called Cron ? (Score:5, Funny)
Want to give each process a weapon? Fine. But they have to earn ammunition.
Every time a process gives up its slot, it's given a round of ammunition. It has the option of "shooting" a process ahead of it in the queue, thereby expending a round of ammunition. A shot process must give up its slot in the next round. Whether it loses all its ammo when it respawns remains a research question.
There are two floating point tunable parameters, "accuracy" and "rampage." "Accuracy" is the likelihood that a given shot will actually hit the process it aims at. "Rampage" is the tendency of a process to save up rounds for a while then go on a spree.
Okay, there's a third parameter, "armor," which is the odds of a hit actually becoming an injury. This is meant to protect system processes against luser jobs, and top-level processes against spawned threads.
Of course, the scheduler itself is a boss job that can't be killed, has perfect armor and has infinite ammo.
For the purpose of top and other job monitoring tools we can replace a process's "NICE" score with a "VIOLENCE" score -- an aggregate of their armor, accuracy, rampage tendencies and current ammo supply. We can rename the renice utility to medicate. The important thing about medication is that it eventually wears off, unless you specify the -l (lobotomize) option, which turns the process into a harmless drooling vegetable. Its companion utilities are aim and armor, which tune a job's accuracy and armor class, respectively.
There are two important things about this approach. First, it's probabilistic instead of purely heirarchical. Second, it should give Jack Thompson the screaming heebie jeebies. In fact, I'm going to call this the JTMS scheduler -- the Jack Thompson Murder Simulator Scheduler.
I'm sure this concept can be explored further, but the bar's about to close.
Re:Isnt this called Cron ? (Score:5, Interesting)
Of course, with such a scheduler, something like the Doom system administration tool [unm.edu] (perhaps more like Quake where you can aim vertically as well as horizontally) will become the preferred method of managing the processes on a system.
For one thing, the processes will obviously shoot back, as the process manager itself (which you see as yourself when running it) is a running process, and thus subject to being fired upon by the other processes.
Secondly, a headshot obviously gets you a "lobotomize" effect. This could pose a problem if one of the other processes hits you with a headshot...
Finally, the application of a medpack to an injured process invokes the "medicate" action.
There are a few possible problems with this, of course:
In short, Linux will quickly become the must-have operating system for gamers, but at the expense of the general purpose desktop.
Re: (Score:3, Funny)
Fascist!
Schedulers these days I dunnu... (Score:3, Funny)
You pesky young folk who think yer flash ram is better than good old reliable ferrite donuts have at with ye! We had disk drives that could pull yer filings out from two feet away on a max seek. Do ya get technology like that nowdays? Do ya? Nuuuuu.....
Mind ye what we had for an OS scheduler was a smiling polish gennelm
Re:Isnt this called Cron ? (Score:5, Insightful)
Re: (Score:2)
Re:Isnt this called Cron ? (Score:5, Interesting)
sPh
Re:Isnt this called Cron ? (Score:4, Interesting)
"Progress" has saved us all of that stress and ambiguity.
Now, you just pay a small mountain of cash for tuition, and walk away with your "A".
It's all about efficiency these days.
Re: (Score:3, Interesting)
Sure. It demonstrates that you've learned something and that you are a quick thinking and creative person. Your writing ability also plays a major role.
Unfortunately, you could learn the material without being cr
Re: (Score:3, Informative)
It's a philosophical question, with no correct answer, but I don't think grades should evaluate the understanding of a course. Grades should evaluate the ability to *use* the material of a course. This is beyond understanding. A student who comple
Re:Isnt this called Cron ? (Score:4, Insightful)
I disagree. That falls back to a measure of the intellect of the individual. That will play a role after school but you don't go to school to demonstrate your abilities or use material, you go to learn material.
'An A-level student is one who grasps the course material so well that he builds on it to produce other conclusions.'
I agree with that. Someone who has a fully grasp of the material understands it well. As I said, the grade should reflect understanding of the material that you took the course to learn. It should not reflect intellect (beyond that required to understand said material), creativity, etc.
'If you lack skills needed to compound your understanding of the material, tough luck. A B is not a poor grade...'
I never said a student with a more thorough understanding of the material shouldn't get an A. I said the A shouldn't be reserved for the quick thinking creative writer who can make up nonsense on the spot for a test question. The slow methodical student may have greater insight into the material but be less creative.
This is the same faulty logic that leads to essays and papers as a measure of understanding. Papers do demonstrate understanding but they aren't the best tool to do so. If papers are primary method used to measure understanding then you aren't ultimately measuring comprehension of the material, you are measuring writing ability.
Re:Isnt this called Cron ? (Score:5, Funny)
PSDoom (Doom process manager) (Score:3, Funny)
Good job sending all those /.ers over there, it will be the mother of all process fights on that poor server now. Sysadmins battling their way through hordes of zombies and monster processes, with ammo (ehm.. mem,cpu) running lower and lower until they're out, just as another wave of uglies comes out of nowhere...
Re: (Score:2)
Re:Isnt this called Cron ? (Score:5, Informative)
http://psdoom.sourceforge.net/ [sourceforge.net]
Re: (Score:2)
Fair? (Score:5, Funny)
Re: (Score:2)
Smells like Communism (Score:5, Funny)
Re:Smells like Communism (Score:5, Funny)
each CPU according to its abilities.
Re: (Score:2)
Re: (Score:2)
Re:Linux is fading away (Score:4, Insightful)
Alternative explanation: People have less problems now using Linux, so they google less for solutions on Linux problems.
Third explanation: Linux documentation got substantially better, so people have less need to use Google as a substitute.
Fourth explanation: The larger density of Linux installations comes with a larger density of Linux experts, so people are more likely to consult their local Linux guru than Google.
Pick your favorite choice or make up yet another explanation.
Yes, those explanations are all completely made up, but so was the explanation you had in mind.
Re: (Score:3, Insightful)
Surprised? (Score:5, Funny)
No I won't, because I don't know what the hell it means.
Hah! In your face, Taco!
Re:Surprised? (Score:5, Informative)
Re:Surprised? (Score:5, Funny)
Meanwhile, outside computer science, Big O faces are used for the completion of a task.
Re: (Score:2)
Re: (Score:3, Funny)
Re: (Score:2)
Re: (Score:3, Informative)
The relation between fairness and algorithmic complexity, is that you may sacrifice fairness to get a lower complexity.
This is just rephrasing GP, I don't know shit enough about schedulers to comment on the actuall algorithms.
Re:Surprised? (Score:5, Informative)
The scheduler decides which process runs when and has to make sure that no process has to wait in the queue forever without getting his share of CPU time (this is what is called "starving").
Since the scheduler is a program by itself, it has a specific runtime characteristic, usually dependent of the number of programs waiting for their CPU share. The special property of the current scheduler in linux is that its runtime is in fact independent of this number. That's expressed in CS by O(1).
Re:Surprised? (Score:5, Informative)
Re: (Score:3)
It's more like: "Our O(1) algorithm gives the optimum performance we tried so much to achieve. But when we made it, we discovered that we were persuing the wrong goal."
SMP and RT capabilities? (Score:2, Interesting)
On another not[e], when are we going to be able to build a default kernel capable of low latency I/O for A/V work?
I/O prioritisation (Score:5, Interesting)
Re:I/O prioritisation (Score:5, Interesting)
Re:I/O prioritisation (Score:4, Informative)
Now, page faults are indeed a form of I/O, but a page fault is technically seen just the fact that some memory required isn't in physical memory. I don't think the parent poster was talking about that. One of the most common reasons for page faults are simply that a block of memory has been swapped to disk, and then suddenly it is required, and as such the block of memory needs to be read into the physical memory.
I'd say: add some memory to that box of yours.
You can read up on it here [wikipedia.org]
Re: (Score:2)
That assumes that the loader is going to try to bring the whole program into memory at load time, which is almost never the case. It completely ignores the notion of working sets [wikipedia.org], or how they change over the lifetime of a process. As an example, why should a program keep its "open all files" code in memory once all the files are opened? Vomit Making System loaded the first page of a program (the entry point) and forced the program to demand in other (512 byte
Re:I/O prioritisation (Score:5, Informative)
ionice [die.net]. Available since 2005-08-28.
> And an equivalent of "top" for monitoring processes' I/O activity would also be extremely handy
I agree, that would be nice.
Re:I/O prioritisation (Score:4, Funny)
Don't you mean "that would be ionice"?
Re: (Score:2)
Re: (Score:2)
nice -n -19 ionice -c 1 kplayer
Yes. Yes. But those of us that don't live in the 1980s, launch media files by double-clicking on them in Konqueror, not dropping to a shell to write an obscure command so start a media player, so I can browse with it and open the file from there. Hell, I don't think I can do that even via the file association menu in KDE, where I can add additional arguments. But this looks like I need to run "nice" instead of "kplayer". Also I wouldn't want to do t
Re: (Score:2)
Re:I/O prioritisation (Score:5, Informative)
#!/bin/sh
nice -n -19 ionice -c 1
Re: (Score:3, Informative)
Quoting the $@ is necessary to ensure that file names with spaces are passed as single words.
Re: (Score:3, Informative)
I dunno. Linux has has some changes recently in the scheduling department, and an O(1) process scheduler can only be "a good thing". Recently, the I/O block layer got a new scheduler (linky http://kerneltrap.org/node/7637 [kerneltrap.org]). Regarding other I/O prioritization, I can't say with authority that this is needed or not.
Maybe all of these things are related, but in my se
Re: (Score:3, Informative)
Which, presumably, is why they're thinking of taking out its current O(1) process scheduler and replacing it with an O(log(n)) one?
Re: (Score:3, Interesting)
Re: (Score:2)
Comment removed (Score:5, Interesting)
Re: (Score:2, Informative)
It still doesn't break it down by task though, so not really the iotop people are looking for.
Re:I/O prioritisation (Score:5, Informative)
process syslogd dirtied inode daemon.log
process Y dirtied inode some_other_file1
process Z dirtied inode some_other_file2
followed by 300 writes by pdflush, which only specify a device and a block number, not a file name. There's no way you can find out for each of the 300 writes whether it was caused by the "syslogd", X or Z process. So there's no way you can count the amount of write I/O that a process has done.
Re: (Score:2, Informative)
In fact, I don't think Linux has ever suffered from the same I/O problems that people often complain about in Windows.
Re: (Score:2)
Re:I/O prioritisation (Score:4, Informative)
http://www.atconsultancy.nl/atop/kernpatch.html [atconsultancy.nl]
Re: (Score:2, Interesting)
I read an article somewhere on Sun's site about I/O scheduling in SunOS, but unfortunately I can't find it now. I'll try to summarize it. The basic point was that when a task wants to write data, it can typically hand that data off to the kernel and then keep running. That means that it's making best use of system resources, since it will use the processor (for example) while the kernel writes to the disk in the background. When a task wants to read data, however, it usually can't do anything until the
Re:I/O prioritisation (Score:5, Interesting)
Linux really doesn't need a new process scheduler. What it could really do with is I/O prioritisation.
QNX has that, which is essential for real-time work.
QNX has the advantage that I/O, like almost everything else in QNX, is done via inter-process message passing operations. The message passing system uses priority queues, and so requests to file systems and devices get handled in priority order. So resource managers (file systems, device drivers, etc.) don't have to explicitly handle priorities; it's done for them. Some resource managers, like disk handlers, process multiple requests at a time so they can reorder them to optimize access, but network devices and such are FIFO at the resource manager level and priority ordered at the message level.
The end result is that you can compile or surf the web on a system that's also doing real time work without interfering with the real time tasks.
Re:I/O prioritisation (Score:4, Interesting)
Actually, that was the Mars Pathfinder [berkeley.edu]. It was running VxWorks, and the effect of the priority inversion was that the stall timer would trip and reset the whole system. The problem was that VxWorks, like QNX, lets you turn off "priority inheritance" on a mutex. This is usually a bad decision, but that was done on the Mars Pathfinder, and created the possibility of a livelock.
So they uploaded a patch to change that mutex to "priority inheritance on", and it worked consistently thereafter.
Re: (Score:3)
credit goes to Con Kolivas (Score:4, Insightful)
Re: (Score:2)
Re:credit goes to Con Kolivas (Score:5, Informative)
Con's scheduler seemed to work better at higher workloads than the mainline, by just trying to distribute load evenly and not trying pretty interactivity tricks. But several ppl did say it didn't perform well for certain X client workloads. That's when Ingo's CFS was posted.
There really is 2 alternatives Ingo's CFS & Con's SDL that's being simultaneously tested by the kernel developers now, and none is accepted into mainline.
So it wouldn't be fair to say that CFS is *the* next Linux scheduler. It could be SDL as well.
Re: (Score:2, Insightful)
I don't think your characterization is fair, I read the entire thread and all I see is someone doing his job.
Re: (Score:2, Insightful)
I don't know any of these guys and never dealt with them on any level.
Optimizations leading to less optimized code (Score:4, Insightful)
After enough number of iterations trying to optimize a software program to do everything very well compared to the base "naive" solution, you end up with an OS that does everything poorly.
It's counterintuitive, but we see it it every day around us.
Re: (Score:2)
I don't know about you but my 2.6 based box runs just fine, certainly no slower than my previous 2.4 boxes, and it packs a lot more features, drivers and other internal changes/improvements.
Re: (Score:3, Insightful)
That's blatantly false. Sure, there are tradeoffs. There are also cases where a better algorithm is an outright win 100% of the time.
How does this work? (Score:2)
The Multics scheduler always seemed very nice (Score:4, Interesting)
So interactive tasks naturally floated to the top and compute bound tasks naturally sank.
(Anyone remember Multics?)
Re:The Multics scheduler always seemed very nice (Score:4, Funny)
Re: (Score:2)
Re: (Score:2)
I guess that explains the slow response I got - I mostly used Multics [multicians.org] from an IBM 2741 [wikipedia.org], so I didn't have any cards.
I love this. (Score:2)
>> [...] the core scheduler got smaller by more than 700 lines:
Someone who I respect tremendously once said
"Perfection is attained when you remove everything that is not essential"
I have to agree
Re: (Score:2)
Il semble que la perfection soit atteinte non quand il n'y a plus rien à ajouter, mais quand il n'y a plus rien à retrancher.
Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.
Completely fair? (Score:5, Funny)
Re: (Score:2)
Interactive tasks (Score:4, Insightful)
Even so, I'd prefer to have IO better scheduled - ionice doesn't really seem to work at least for me.
Wait wait wait! What about fixing disk IO first? (Score:2, Informative)
http://forums.gentoo.org/viewtopic-t-482731.html [gentoo.org]
How about fixing that first?
I've noticed all kinds of new schedulers coming out of the woodwork in 2.6. Optional kernel preemption. And also changes to the default timer (250 instead of 100, etc.). These are choices like I've never had in Linux before. Should be good, right?
But at a higher level there's been such brokenness.
K3B CD burning broke in 2.6.8 and wasn't fixed until 2.6.10.
Firewire
O(1) too good. Surprise? (Score:2, Informative)
Fair schedulers are for the weak (Score:4, Funny)
(from here [sdf-eu.org])
This is very impressive (Score:3, Interesting)
I'm impressed. I did some work on CPU dispatching on a mainframe system in the distant past, and we were never able to beat O(log n) on an ordered dispatch queue. The obvious implementation is O(n); getting to O(log n) requires a tree, and I want to see how they got to O(1).
This stuff really matters now. If we're going to have 80-CPU multiprocessors, all the main operating system algorithms have to be near O(1), or the scale-up is a lose.
Re: (Score:3, Informative)
Internally, the kernel only has five priority levels. Each one is a queue, and it compares among all of them to determine which task to run, but it only compares the head of the queue. So it's O(5), which is of course O(1), but if it supported an arbitrary number of priority levels (which IMO it should) then it would become O(n) again.
Re: (Score:3, Informative)
Re:O(1) - what a huge misnomer (Score:4, Informative)
Re: (Score:2)
Re:O(1) - what a huge misnomer (Score:4, Informative)
Re:O(1) - what a huge misnomer (Score:5, Informative)
Here's how Linux 2.6.x task scheduler works:
Re:O(1) - what a huge misnomer (Score:5, Insightful)
Re:O(1) - what a huge misnomer (Score:4, Informative)
Re: (Score:2)
Re:O(1) - what a huge misnomer - INCORRECT (Score:5, Informative)
The reason this is important, and the reason they are worried about the act of scheduling the next process rather than time complexity over all N processes, is that if scheduling the next process were not constant time, the percentage of time spent scheduling the next process would grow larger as you added more processes. That's fine if you're on a desktop system and you go from 100 to 200, but as the number starts getting large on (say) big servers, you start running into a situation where all your CPUs are perpetually tied up trying to figure out what process to run next.
O(n) time over N processes is not a problem; you've either got the CPUs or you don't. If you don't, then your performance will suck for reasons that are your own fault. If you do have enough CPUs, then the time spent scheduling will remain in step with the time spent running the processes, and this is fine. However, if the time spent scheduling grew, every time a process was scheduled, across all CPUs, then your $50000 server would be worthless because it wouldn't be able to handle the workloads you intended for it. All those expensive CPUs would sit there figuring out what process to run rather than running it.
So, it is NOT a misnomer. It accurately describes the portion of the problem that the developers are concerned with. It's O(n) over n processes, and that's great because it means you can get to n without breaking down.
Re: (Score:3, Interesting)
Actually I find the Windows (XP, haven't tried Vista yet) scheduler to be quite horrible especially with regards to interactivity. One resource hog and things pretty much come to a standstill. IMO it's worse than O(1) and no comparison to Con's staircase.