Linux Software

Preemptible Linux Kernel: Interviews and Info (238 comments)

Posted by michael
from the patch-and-patch-again dept.
An anonymous submitter sends: "MontaVista and Robert Love are developing a patch for the Linux kernel to make it fully preemptible. Lots of users are involved, and tests show huge reductions in latency. Robert's kernel patches are here. Finally, an interview with Robert, on preemption and more."
This discussion has been archived. No new comments can be posted.

  • Wow! (Score:1, Interesting)

    by Free Bird (160885)
    Cool! I really hope this'll make it into the official kernel, hopefully even 2.4 (though I doubt so not that 2.5 has been started).
  • by geekplus (248023) on Sunday October 14, 2001 @01:31PM (#2427740)
    The reductions in latency -- would that include the type of latency that plagues real-time audio applications like sound-on-sound recording?
    • Nope. Decent sound-on-sound recording uses pretty large buffers to get around that sort of problem.
    • No, unfortunately. Professional audio processing requires an extremely special form of real-time processing that is pretty much only good for handling audio, and which actually can cause problems with any other types of software. Therefore, it is unlikely that preemption patches for Linux, which must remain a general-purpose system, will be made. Even most Windows professional audio programs don't use Windows' built-in scheduling; they instead take advantage of Windows' rather loose kernel hooks to preempt the operating system and handle real-time scheduling for themselves.

      Look at BeOS for an example of why this sort of processing can't possibly fit into a normal-use system. BeOS was constructed especially for the handling of low-latency media such as audio, but as anyone who tried to program it can tell you, it was exceptionally difficult to program anything other than media apps with it! The extremely high-resolution threading of the operating system made even the simplest programming tasks near impossible, as mutex locks and thread conditionals needed to be spread throughout the code to ensure proper execution. This is why BeOS ultimately flopped: it was too hard to program for.

      But, of course, this is an area where Linux could shine. Due to its open-source nature, a special media-processing fork of the kernel could be made for those who need to deal with real-time audio, while the general-purpose kernel remains general-purpose. In fact, DeMuDi Linux [xdv.org] is already striving for this goal.

      • by Xoro (201854) on Sunday October 14, 2001 @02:05PM (#2427864)

        I don't want to sound like I'm contradicting you, but did you happen to read this link [gardena.net] from the article? It's specifically about realtime audio. Key paragraph:

        *EXCITING* NEWS: things getting almost perfect ! Ingo's lowlatency-2.2.10-N6 patch with the shm.c part backed out and a modification of filemap.c (thanks to Roger Larsson) performs _REALLY_ well, using my usual latencytest parameters (4.3ms buffer), I got NO DROP-OUTS anymore, with sporadic maximum peaks of ONLY 2.9ms This is really exciting because it opens the doors to a whole new class of Realtime applications for Linux, simply using userspace processes scheduled SCHED_FIFO. I heard of comparable low-latencies only from BEOS, Windows can't simply guarantee these kind of latencies, not even using DirectX. Using a soft-synth on Win98 on my BOX I must use 15-20ms audio buffers to get _SOMEWHAT_ reliable audio. This is actually about more than 3-4times the buffer I used for testing under Linux ( 4.35ms).

        I don't know much about the field, but the page seems to speak to several of the audio-related concerns mentioned above.
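A quick back-of-the-envelope check on those buffer figures, as a minimal Python sketch. The quoted page gives only buffer lengths in milliseconds; the 44.1 kHz sample rate and the 192-frame buffer size are assumptions chosen because they land close to the quoted 4.35 ms, not values taken from the page.

```python
def buffer_latency_ms(frames: int, sample_rate_hz: int) -> float:
    """Time needed to play (or fill) one buffer of `frames` sample frames."""
    return 1000.0 * frames / sample_rate_hz


# Assumed values: 192 frames at 44.1 kHz, which is roughly the quoted test buffer.
linux_ms = buffer_latency_ms(192, 44_100)
print(f"assumed Linux test buffer: {linux_ms:.2f} ms")

# The quoted Win98 soft-synth buffers were 15-20 ms; compare them to the above.
for win_ms in (15.0, 20.0):
    print(f"{win_ms:.0f} ms is {win_ms / linux_ms:.1f}x the assumed Linux buffer")
```

The point of the arithmetic is only that the buffer length sets a floor on audio latency, so the scheduler has to wake the audio process reliably within one buffer period to avoid drop-outs.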

        • by Anonymous Coward
          The last time I looked, the Linux process time slice HZ constant was an anemic 100. When will it finally be defaulted to a respectable 1000 so that programs that act on events are more responsive? You can't even make a smooth scrolling ticker on Linux without chewing up all the CPU if the HZ constant is 100 - too slow. I realize that more time will be spent in context switching - so be it - the time is minuscule.
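To put rough numbers on the HZ complaint, here is a minimal Python sketch. It assumes a deliberately simplified model in which a sleeping process can only be woken on a timer tick, so any requested delay is rounded up to a whole number of ticks. The function names are made up for illustration, and real kernels have additional rounding and scheduling effects that this ignores.

```python
def tick_us(hz: int) -> int:
    """Length of one timer tick in microseconds for a given HZ value."""
    return 1_000_000 // hz


def earliest_wakeup_us(requested_us: int, hz: int) -> int:
    """Simplified model: a timed sleep ends on the first tick at or after the request."""
    tick = tick_us(hz)
    ticks = -(-requested_us // tick)  # ceiling division on integers
    return ticks * tick


# A ticker that wants to redraw every 2 ms under the two HZ values discussed.
for hz in (100, 1000):
    print(f"HZ={hz}: tick={tick_us(hz)} us, "
          f"2000 us sleep wakes after {earliest_wakeup_us(2000, hz)} us")
```

Under this model a program that needs finer timing than one tick has no choice but to busy-wait, which is the "chewing up all the CPU" being described.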
      • I couldn't disagree more. BeOS is far from hard to program for. The APIs are relatively simple; in fact, BeOS is one of the only OSes you can read people calling a joy to code for. How much do you even know about it?
      • Ummm,
        Sorry, just want to note that mutex and semaphore programming is not all that difficult if you do it much. True, Windows has a few kinks, but the concept is pretty basic. Basically I would have to disagree that mutex and thread programming makes programming hard. It's just programming; once you understand it, it's pretty straightforward.

        As for the Windows problem, use startthreadx instead of startthread (yeah, probably not the real API functions, but close enough; I haven't worked on Windows for a while).

        Lando
      • OK... I'm confused... this is extremely off-topic and I've never touched BeOS (besides failing to install it) so I don't know what I'm talking about.

        Are you saying that in a multi-threaded program BeOS was so finely preemptible w/ small time slices that you couldn't be sloppy with resource contention like you can get away with on most UP platforms? I can't see any other way a scheduler would affect a user application.

        Or maybe the application framework you used for BeOS applications allowed events to be handled in parallel? Or...?

        Brian Macy
      • BeOS programming really isn't that hard. You just have to get used to the idea of locking every single thing that could possibly be shared.
      • The extremely high-resolution threading of the operating system made even the simplest programming tasks near impossible, as mutex locks and thread conditionals needed to be spread throughout the code to ensure proper execution.

        Right on! I ran BeOS under VMware [vmware.com] to try developing for it, and the pthreads compatibility was... well let's just be polite and say "extremely non-optimal". The spin locks in the kernel were so tightly placed that any possible race condition you could think of would occur if you didn't mutex lock the hell out of it, and the litany of devices you had to lock to access memory was just unbelievable. I pretty much had to read through the video driver code to get anything done as the documentation got as far as "Hello World" before wishing you luck.

        Anyway, DeMuDi looks to be a step in the right direction - maybe if a Linux distro starts shipping with 2 kernels, a standard kernel and a multi-media enhanced kernel, we'll finally have a workable solution.

        • Anywhere that BeOS highlighted your race condition by causing unwanted behaviour is somewhere that you'd get "random" crash bugs from in another OS if you didn't fix the code.

          Other OSes don't guarantee much about how long your timeslice is, or how often you'll get time, it's sort of haphazard. That randomness means that while those race conditions don't manifest as much, they're still there to bite you.
          Think of it like memory leaks and dangling pointers. Ninety-nine times you can use an element of a linked list after delinking it, one time it will have already been written over. But you don't want to somehow make the bug come up one in a thousand times... you want it to come up EVERY time so that you fix the problem before release.

          It might be a bit of a pain to put locks around everything, but after a while it becomes quick and natural and you still have the power of a fast kernel with a very small timeslice for when you need it.
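The "put locks around everything shared" habit can be sketched in a few lines of Python. This is only an illustration of the general idea, not BeOS code; the class and function names are invented for the example.

```python
import threading


class Counter:
    """Shared counter whose read-modify-write is protected by a lock."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # Without the lock, two threads could both read the old value and
        # one update would be lost, but only under unlucky timing.
        with self._lock:
            self.value += 1


def run(n_threads: int = 4, n_iters: int = 10_000) -> int:
    """Hammer one Counter from several threads and return the final value."""
    counter = Counter()

    def worker() -> None:
        for _ in range(n_iters):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


print(run())
```

With the lock in place the total is always n_threads * n_iters. Remove it and the program may still give the right answer on most runs, which is exactly the kind of rarely-manifesting bug described above.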
      • "Professional audio processing requires an extremely special form of real-time processing that is pretty much only good for handling audio, and which actually can cause problems with any other types of software."

        Special in what way? I'm not really familiar with audio software but I have a hard time picturing what you mean by "special" real-time.

        You say that on Windows, software handles its own scheduling and bypasses the kernel. What exactly does that buy you that you couldn't get more elegantly in Linux by creating a kernel patch (a preemptible kernel patch, for example)? The Windows way strikes me as not very stable, flexible or good.

        Linux lets your program hog the CPU already by setting nice levels. With preemption, even if it gives the CPU to a different process, it can take it back right away.

        The Linux way seems better exactly because it's not a special purpose hack. Why is hogging the cpu for audio processing any different than hogging the cpu for video processing?
      • This is why BeOS ultimately flopped: it was too hard to program for.

        True, but in a very different way. The lack of decent developer support, for a platform running on hardware most people use Windows on, aimed at the market that people buy Macs for, compatible with less actual hardware than either, with no software from vendors anyone had ever heard of, is why it flopped.

        I liked BeOS too. I ultimately wiped it off my system because I just didn't have a use for it.
      • Those locks you're leaving out of your non-BeOS code might make it easier to code, but they mean it'll crash at random years later when some odd combination of load variables causes your program to be yanked out of a critical area before completion and lets the next thread enter too soon. But you'll have run it for months, seen no problems, and shipped it.

        You *need* to lock everything that might ever be an issue, even if it's the tiniest operation. That's why there's a "Test & Set" operation in ASM. It might be the tiniest thing, but you need to guarantee it's atomic.

        I wish more OSes were hard in the way you describe - if race conditions were more easily shaken out they wouldn't plague "release caliber" software.

        And as to BeOS being hard to program for... What?!? It might have enforced better style which could be a pain at first, but it was (is still, I guess) a wonderful OS for programmers.
  • We're sorry, but tonight's "Linux" will not be aired. Normally you would find 2.4.12 or 2.4.13-pre2 on Sunday nights, but not this evening. Now that Linux is fully preemptible, NBC will be airing a four-hour music-and-ice-skating tribute to Bill Gates.

    We apologize for any inconvenience, and for the reduced uptime. Enjoy the show.
  • Finally.. (Score:2, Interesting)

    by Renraku (518261)
    You'd think this would have been one of the first few 'features' of the Linux core. If the latency were high, it would screw programs and things that rely on low latencies to compute. Better late than never.
  • Does anyone think that this will ever make it into the kernel? It seems that Linus does not like this because it cures the symptoms of latency in the kernel instead of the real problems.

    • Why was this modded 'flamebait' ?

      The poster raises a valid point which reflects Linus' attitude pretty well. IIRC Linus himself said that they should rather fix the CAUSE of those latencies instead of the symptoms. This is one of the reasons why Linus is against kernel debuggers. They tend to lure the coder into fixing symptoms on the surface instead of perhaps rethinking the design (off-by-one errors are an example).
  • Hmm (Score:3, Interesting)

    by drinkypoo (153816) <martin.espinoza@gmail.com> on Sunday October 14, 2001 @01:38PM (#2427762) Homepage Journal
    I thought the Slack 2.0 release had a 1.1 kernel.

    I'm wondering about this paragraph:

    We had to modify the interrupt code in entry.S to prevent some situations and to allow preemption on return from an interrupt handler. However, we can't preempt within critical regions for the same reason we can't allow concurrency within them with SMP -- so we prevent preemption while holding a spinlock. The bottom half handler and scheduler were also modified to prevent preemption while they are executing.

    Can anyone give a nice layman's description of what he's talking about here?

    • Re:Hmm (Score:1, Informative)

      by Anonymous Coward
      If the kernel is preventing the same piece of code from running on more than one processor (by acquiring a spinlock), then it is also preventing the code from being preempted.

    • Re:Hmm (Score:5, Informative)

      by selectspec (74651) on Sunday October 14, 2001 @01:53PM (#2427823)
      The interrupt handlers can't allow preemption during the context switch of an interrupt because the registers are in transit. Basically, you can't have an interrupt while you're in the process of any kind of context switch; otherwise you're never sure what registers you were able to flush to and from the CPU to the stack.

      Critical Sections (such as access to the IP stack or I/O queues) have to be protected. With the advent of multi-processor systems under the SMP scheme, there is already considerable locking within the kernel to synchronize access of critical resources between processors. Critical regions also need to be protected from concurrent access by interrupts.

      Bottom Half handlers generally are fast track implementations to quickly deal with the interrupts. To avoid concurrency collisions of resources used within the bottom half handlers, interrupts (for that particular handler) must be disabled during the handler's execution.

      All in all, this is basic non-preemptive stuff. What I don't understand is that this strategy that he is defining is a textbook NON-preemptive approach to kernel design. I'm not too sure where he gets off claiming that the kernel is fully-preemptive here.
      • Re:Hmm (Score:5, Funny)

        by Anonymous Coward on Sunday October 14, 2001 @02:09PM (#2427880)
        Only on slashdot would "IP stack", "I/O queues", "interrupt concurrent access" and "SMP" be considered laymans terms.
      • Re:Hmm (Score:5, Informative)

        by sagei (131421) <[rlove] [at] [rlove.org]> on Sunday October 14, 2001 @02:54PM (#2428019) Homepage
        I originally felt I should stay out of any discussion here, but I want to answer some of these questions and clear some of this stuff up. To be honest, it is a little embarrassing having everyone read and comment on the interview. :)

        Bottom Half handlers generally are fast track implementations to quickly deal with the interrupts. To avoid concurrency collisions of resources used within the bottom half handlers, interrupts (for that particular handler) must be disabled during the handler's execution.

        Interrupts, even just the one in question, are not disabled during a bottom half, at least in general. The reason we can't preempt bottom halves is that they are guaranteed to be serialized w.r.t. CPUs (i.e., a given BH runs on only one CPU at a time). Because of this, the BHs are designed without regard to reentrancy. So we can't preempt them.

        All in all, this is basic non-preemptive stuff. What I don't understand is that this strategy that he is defining is a textbook NON-preemptive approach to kernel design. I'm not too sure where he gets off claiming that the kernel is fully-preemptive here.

        Hardly. Would you say an SMP system is not SMP if it is non-concurrent inside critical sections? No, you wouldn't, and this is the same situation we have here with preemption. We can't preempt inside critical regions. We have concurrency and reentrancy concerns, just like SMP does. We also can't preempt inside interrupt handlers or bh's because they aren't designed to be preempted (nor would you want to interrupt the top half of an interrupt, anyhow).

        The current kernel is not preemptive _anywhere_. The only way, in fact, kernel code ever yields execution is if it explicitly does so or returns. Since with the preempt-kernel patch we can now preempt in 90% of the kernel, I think it's safe to say we have a preemptible kernel now.

          • I agree with Robert here, and give him full kudos on his work, and appreciate his clarifications. We should all support his work with MontaVista. I apologize, as my comments were hastily put together and unfairly characterized this project (which I wholeheartedly think is cool!). I would agree that the kernel with this patch is preemptive (just not "fully" preemptive however). I realize after reading my original comments that I should have chosen some better wording! These guys have done some kick ass work, and I'm sure that it was a considerable amount of work.

          -Pete
    • by Alien54 (180860) on Sunday October 14, 2001 @01:59PM (#2427844) Journal
      There are these links: All-around useful stuff, enough to get you started destro^H^H^H^H^H^H hacking your own kernel
    • The left hand (processor 0) needs to know what the right hand (processor 1) is doing.

      Reverse if necessary.

      Ambidextrous people...just HUSH! :)

      Heh, how about that, computers do have a real life (tm) frame of reference.
      Moose.
    • Re:Hmm (Score:4, Informative)

      by sagei (131421) <[rlove] [at] [rlove.org]> on Sunday October 14, 2001 @03:05PM (#2428038) Homepage
      I thought the Slack 2.0 release had a 1.1 kernel.

      It could have, I just seem to remember a 1.0 kernel.

      Can anyone give a nice layman's description of what he is talking about here?

      Basically I am explaining the modifications to the kernel we made in order to make it preemptible. To try to put it more for the layman, besides just allowing the kernel to preempt itself as needed, we had to prevent some certain situations from being preempted. This is the same situation with SMP. We use SMP's locks to disallow preemption, for concerns of concurrency and reentrancy. We can't preempt during interrupt or BH handling because those things are not designed for concurrency, either.

      To sum it up, we have to prevent preemption in some situations. Those situations are: while locks are held, while handling interrupts and bottom halves, and while inside the scheduler itself.
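The rule summed up above (no preemption while a lock is held, in an interrupt or bottom half, or inside the scheduler) can be modeled with a simple nesting counter. What follows is a toy Python sketch under that assumption. The thread does not say how the patch actually tracks these regions, so the counter, the class and every method name here are hypothetical and chosen only for illustration.

```python
class ToyCPU:
    """Toy model: preemption is allowed only when no protected region is active."""

    def __init__(self) -> None:
        self.preempt_count = 0     # > 0 means "do not preempt right now"
        self.need_resched = False  # a more important task is waiting to run
        self.switches = 0          # how many preemptions actually happened

    def enter_protected(self) -> None:
        """Taking a lock, entering an interrupt/BH handler, or entering the scheduler."""
        self.preempt_count += 1

    def leave_protected(self) -> None:
        """Leaving such a region; a deferred preemption may now take place."""
        self.preempt_count -= 1
        self._maybe_preempt()

    def wake_important_task(self) -> None:
        """Something more important became runnable (e.g. on return from an interrupt)."""
        self.need_resched = True
        self._maybe_preempt()

    def _maybe_preempt(self) -> None:
        if self.preempt_count == 0 and self.need_resched:
            self.need_resched = False
            self.switches += 1


cpu = ToyCPU()
cpu.enter_protected()       # kernel code takes a spinlock
cpu.wake_important_task()   # request arrives, but is deferred
print(cpu.switches)
cpu.leave_protected()       # lock released, preemption happens here
print(cpu.switches)
```

A counter rather than a flag is used so that nested regions (a lock taken inside another lock) only re-enable preemption when the outermost one is left.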
  • by mike_the_kid (58164) on Sunday October 14, 2001 @01:40PM (#2427772) Journal
    JA: What tips and inspiration can you offer aspiring kernel hackers?

    Robert Love: Read the source, play with the source, and bathe regularly.

    All computer science labs should have available eye-wash style emergency showers.
  • Can someone fill me in... Hasn't Microsoft been claiming windows has been preemptive since win95??? Is this some other form of 'preemptiveness'?

    What is this 'preemptive' thing referring to? Task scheduling?
    • by jeffy124 (453342) on Sunday October 14, 2001 @02:13PM (#2427892) Homepage Journal
      pre-emptive is a form of multi-threading. the other form is co-operative.

      Co-operative means that threads relinquish control on their own. This meant that a greedy thread could put a serious stranglehold on the OS and lock-up the system, forcing a reboot.

      Co-operative was used in every Mac prior to and including OS-9, which made it very unstable should a thread crash.

      Pre-empt means the OS decides when the thread loses control. A thread can still voluntarily relinquish control, but the final call still comes down to the OS.

      OS-X is fully pre-empt, meaning a crashed thread doesn't crash the entire system, bettering the stability overall as that will usually only crash the program that thread belonged to, not the entire system.

      I don't know what MS has for their threading model, they seem to have a very bad hybrid system. The threading in Windows 95/98 tends to cause a good number of BSODs. NT/2000 OTOH, had a better model and crash a lot less often, which is why they have traditionally been the more stable MS OS.

      Task scheduling has to do with what thread gets control next. Priority and other factors decide that. Solaris threads have 2^31 possible levels of priority, Windows (all versions, IIRC) has 5 classes and then 5 sub-classes of priority for each (a REALLY screwed up and tough to understand and explain technique, iow not a clear-cut 25 levels), and Java has 10 levels for cross-platform threading. Each model has their plusses and minuses, but that's getting offtopic from preemptive vs. co-operative.
      • Not quite correct.
        It's not preemptive vs. cooperative.

        But preemptible vs. non-preemptible kernel.

        "Pre-empt means the OS decides when the thread loses control."
        Yes, that's preemption.

        But there is another preemption.
        Should a process get a higher priority than the currently running process, then the current process gets preempted.

        E.g.
        You have a low priority CPU-bound process A (e.g. Seti@home) and you have a high priority I/O-bound process B (e.g. XMMS).
        Usually, B does nothing but wait for I/O (e.g. the soundcard and the harddisk). While waiting, the process is not in the run-queue.
        Meanwhile, A hogs the CPU. Usually, when the I/O request is done, the CPU gets an interrupt request (IRQ), which causes the OS to switch into kernel mode and handle the request. B becomes active again and has a higher priority than A, so A gets preempted. Usually that works fine, but now A wants to do some I/O (deliver a packet) and calls the kernel, which handles the request. Just at this moment, the I/O for B becomes ready. In Linux (as in most other OSes) B has to wait until A gets its syscall done, since the kernel is not preemptible. This period of time until B gets the CPU increases the latency.

        Windows 95 is preemptive (at least according to A. Silberschatz) as is Linux.

        The high amount of crashes of the whole system stem from the resource protection (direct hardware access), not the scheduling.
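The A/B scenario above boils down to a small piece of arithmetic, sketched here in Python. It is a minimal model under loud assumptions: times are arbitrary milliseconds, context-switch cost is ignored, and the preemptible case assumes A holds no lock at the moment B becomes ready. The function name and parameters are invented for the example.

```python
def wakeup_latency_ms(syscall_start: float, syscall_len: float,
                      io_ready: float, preemptible_kernel: bool) -> float:
    """How long high-priority B waits for the CPU after its I/O completes.

    Low-priority A is inside a system call during
    [syscall_start, syscall_start + syscall_len).
    """
    syscall_end = syscall_start + syscall_len
    a_in_kernel = syscall_start <= io_ready < syscall_end
    if a_in_kernel and not preemptible_kernel:
        # B must wait until A's system call returns.
        return syscall_end - io_ready
    # Either A was in user space (always preemptible) or the kernel can be preempted.
    return 0.0


# A enters an 8 ms system call at t=0; B's I/O completes at t=3.
print(wakeup_latency_ms(0, 8, 3, preemptible_kernel=False))
print(wakeup_latency_ms(0, 8, 3, preemptible_kernel=True))
```

The worst case for the non-preemptible kernel is therefore the longest system call any process can make, which is why the latency shows up as occasional large spikes rather than a constant delay.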
      • by Anonymous Coward
        As I understand it:
        NT has 32 priority levels.

        The split into idle (p=0), low, below-normal, normal, above-normal, high and realtime (p>=16) (which I assume is what you were referring to) is just a simple way to name different general priority levels. It's the 32 levels that matter.

        Normal priority is 14.

        Anything running at 16 or above ('realtime') will never get interrupted by threads running at lower priorities. The OS will never change these priorities, though the user can.

        Ready to run threads of priority <=14 can be given a temporary priority boost to 15 (lasts for a double timeslice which is 40ms normally) if they have been ready to run for about three seconds. Anything at lower than priority 16 shares what time is available, with higher priorities being favored. At priorities lower than 16, no thread will ever be totally starved of CPU time.

        Priority 0 is for things which should only run when nothing else needs CPU time, like RC5 or SETI@home (though some such apps actually set themselves to priority 4 and hence slow most things down. folding@home used to do this).
        • Also, whenever a thread unblocks on I/O, it gets a priority boost so it can run again, quickly issue another I/O, and go back to sleep. The boost varies depending on the thing that it unblocked from, such as audio or input. Input tends to get large boosts, which is one reason why Windows tends to be "Snappier" than Linux.
    • by sagei (131421) <[rlove] [at] [rlove.org]> on Sunday October 14, 2001 @03:18PM (#2428064) Homepage
      Can someone fill me in... Hasn't Microsoft been claiming windows has been preemptive since win95??? Is this some other form of 'preemptiveness'?

      You are thinking of forms of multitasking. One form is preemptive, in which tasks are given a specific period in which to run (timeslice) and then forcibly preempted by the next runnable task when that quantum ends. Win95, NT, all Unices, and anything decent fit in here.

      The other form is cooperative, in which tasks run until they yield execution. This is how Win 3.1 is. In 3.1, tasks ran until they finished processing their current Windows Message or called yield().

      This article is about a preemptive kernel, where actually the same ideas apply. Inside the kernel, things are currently cooperative in the sense the kernel code runs until it completes or yields control. This patch makes it preemptive -- it will be preempted when something more important needs to happen.

      Win95 does not have a preemptive kernel (it isn't even reentrant). NT might. Solaris does. Linux does with this patch.
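The difference between the two forms of multitasking can be shown with a toy scheduler in Python. This is a minimal sketch and not how any of the systems named above are implemented: tasks are just names with a number of CPU time units they need, the "cooperative" case assumes a task never yields before it is finished, and the function name is made up.

```python
from collections import deque
from typing import Dict, List, Optional


def run_schedule(work: Dict[str, int], quantum: Optional[int] = None) -> List[str]:
    """Return which task holds the CPU in each time unit.

    quantum=None models cooperative tasks that run until they finish;
    an integer quantum models forced preemption after that many units.
    """
    queue = deque(work.items())
    timeline: List[str] = []
    while queue:
        name, remaining = queue.popleft()
        slice_len = remaining if quantum is None else min(quantum, remaining)
        timeline.extend([name] * slice_len)
        if remaining > slice_len:
            # Preempted with work left over: go to the back of the run queue.
            queue.append((name, remaining - slice_len))
    return timeline


tasks = {"hog": 4, "ui": 1}
print(run_schedule(tasks))             # cooperative: "ui" waits for "hog" to finish
print(run_schedule(tasks, quantum=1))  # preemptive: "ui" runs after one time unit
```

Reading "hog" as long-running kernel code and "ui" as the thing that needs to respond quickly gives the in-kernel version of the same argument.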

    • windows has been preemptive since win95??? Is this some other form of 'preemptiveness'?

      Windows' "preemptiveness" refers, as explained somewhere else here, to the Windows kernel being able to jump in and stop any user process executing to give the next one its turn - so (in theory) no user-run program can hog all of the CPU and resources.


      Linux has always done this - it's the standard way to write a unix kernel.


      In relation to the audio discussion, preemptive in a linux kernel means (as far as I understand it) that the kernel attempts to guarantee a minimum time between an interrupt coming in on some device and control being handed to the driver for that device. It does this by preempting its own tasks in order to hand control over to the driver for the device needing the attention (the driver, of course, runs as a kernel process, also).


      Typically, the goal is to get a maximum latency of 10ms or better (less) between the interrupt and the waking up of the driver.


      In a professional audio situation, of course, the user can go a long way by stripping all the unnecessary hardware and tasks out of the configuration of the machine, which will mean that (if done properly) the only thing which can get in the way is linux' internal book-keeping. This is a different situation to playing with audio apps on a networked computer while you print out web pages.. ;)


      Beyond this, there is real-time linux in which (as I recall) a hard maximum latency of 2ms or so is claimed. But the overheads introduced by all the timing and checking which guarantees this impact the performance to the extent that it's quite a different beast, for specialised applications.


      Some audio programmers would like a low-latency patch (either the preemptive one or some other) which has a soft guarantee of "almost all" latencies below 5-10 ms to go into the standard kernel because they would like their userbase not to have to deal with the complexities of kernel recompilation and/or patching, but this is a pretty tall order because Linus will not like having basically ugly fiddly designs with lots of volatile little conditionals which have to be fiddled with every time something changes going into the beautiful kernel.


      Maybe vendors like Mandrake should pick up the baton and provide a low-latency alternative kernel installable with their GUI tools or at install time, which would keep everyone happy at the cost of not too much effort and space.

    • Can someone fill me in... Hasn't Microsoft been claiming windows has been preemptive since win95??? Is this some other form of 'preemptiveness'?

      What you're thinking of here is userspace preemptiveness. A userspace application can be preempted to make way for another process. The other process could be in userspace OR kernelspace. Linux has always been like this.

      The article is describing kernelspace preemptiveness. Basically if the kernel is doing something (eg, reading a block off disk) then the current Linux kernel can't preempt that to do something else in userspace OR kernelspace.

      These patches add kernelspace preemptiveness in addition to the already existing userspace preemptiveness. It makes Linux extremely suitable for low-latency applications (eg, professional audio).

  • I'm not sure... (Score:2, Interesting)

    by TheMMaster (527904)
    I think this is a good short-term solution for the latency problems but I personally wouldn't include it in the main kernel releases. I believe that it *might* be a good idea to fork the kernel releases (temporarily) in two groups: One for servers and one for workstations until the problems have been solved.
    I think that (for now) using this patch on workstations is a pretty good idea. And I think that there should be a better solution for the problem which should THEN be something along the lines of kernel 3.0
    I am not a kernel developer or anything, but I am currently reading up on the source and the mailing lists.
    Basically all I am trying to say is: Make it work NOW and solve the real problem later. Just make sure that it WILL be solved... (no Microsoft coding ways here ;-)). We still need a larger user base...
    • Re:I'm not sure... (Score:3, Informative)

      by STSeer (119553)
      Love said that this patch even if added to the main tree would still be a config option.
    • Re:I'm not sure... (Score:3, Interesting)

      by debrain (29228)
      Actually, the current state of the vm parameters, set almost exclusively for a workstation (since bdflush chokes a server real good), would seem to dictate that you have to tinker with the kernel anyway, and that forking the kernel itself wouldn't necessarily help, since the number of forks for each configuration of properly scalable high intensity server would be enormous. It works well for a workstation, and perhaps preemption should be default on a workstation (I use Love's patch on mine), but splitting the kernel between workstations and servers is probably not the best way to go about making servers customized to their personal best performance level, since the configuration is quite sticky anyway.
    • Re:I'm not sure... (Score:5, Informative)

      by sagei (131421) <[rlove] [at] [rlove.org]> on Sunday October 14, 2001 @04:09PM (#2428171) Homepage
      Disclaimer: It's my patch

      I think this is a good short-term solution for the latency problems but I personally wouldn't include it in the main kernel releases. I believe that it *might* be a good idea to fork the kernel releases (temporarily) in two groups: One for servers and one for workstations until the problems have been solved.

      I tend to look at this more as a long-term solution, and I think people who see it as a short-term solution or hack are missing the point. First, this is a feature. We aren't kludging kernel code so that we can lower latency by stopping it when needed. We are effectively using the SMP code to multitask better within the kernel.

      Second, forking the kernel over this is a terrible idea. Since it is a config setting, this is a non-issue anyhow, but I really don't want to see this thing forked off. In fact, I think the ideal situation is where we can get a preemptible kernel that benefits throughput so that server processes benefit from it as well.

      I think that (for now) using this patch on workstations is a pretty good idea

      Agreed :)

      And I think there should be a better solution for the problem which should THEN be something along the lines of kernel 3.0

      There isn't a better solution that is not a hack. There is a reason Solaris, NT, and all RTOS are preemptible inside the kernel: it is the only way to achieve real-time response. You just _have_ to be able to respond to events when needed.

      The "better" solutions in this case are "simpler" -- if we can hack some conditional schedules into places, perhaps simplify some algorithms, etc. then we can perhaps reduce latency without preemption. This is what Andrew Morton's low-latency patches do. But we need more. The point is not that preempt-kernel is a hack, but that it is a whole new high-tech feature, and some people want to find a simpler solution.

      Personally, I don't think a simpler solution exists, and I believe the preemptive kernel solves other problems (and it is also a neat feature :>). Thus I work on it.
      • There is a reason Solaris, NT, and all RTOS are preemptible inside the kernel: it is the only way to achieve real-time response.

        I thought that what (certain) kernel hackers really objected to is preemption while locks are held. The complications (eg priority inversion) they talked about seem only to arise in that case.

        So, first, does "fully-preemptive" traditionally mean with or without locks? Are Solaris, NT, and RTOS preemptible when locks are held?

        Second, observed results aside, what reason do you have to believe that preempting the lock-less parts of the kernel is "good enough"? All else equal, one would expect the latency distribution to be similar with and without locks, so you would expect plenty of "worst cases" to occur with locks. Of course, there is already a pressure to reduce the time that critical locks are held, but I wouldn't be surprised to see non-contended locks (especially outside the kernel core) held for long times. So is there a good reason that the important "worst cases" happen without locks?

        IANAKH.

        • Re:I'm not sure... (Score:5, Informative)

          by sagei (131421) <[rlove] [at] [rlove.org]> on Sunday October 14, 2001 @07:28PM (#2428838) Homepage
          I thought that what (certain) kernel hackers really objected to is preemption while locks are held. The complications (eg priority inversion) they talked about seem only to arise in that case.

          There are a few reasons other hackers complain, although I didn't know this was one of them. Since MontaVista's original preemptive kernel work, I believe, we have never preempted inside of locks. Note that you can, but then you reach the issues with deadlocks and the priority inversion that you spoke of.

          So, first, does "fully-preemptive" traditionally mean with or without locks? Are Solaris, NT, and RTOS preemptible when locks are held?

          I would say it means sans locks. None of the mentioned OS's are preemptive while holding a lock. You always have to respect the lock. Now, you can preempt during the lock and go do other things. If you do this, you are assuming the lock is going to be held long (or else it is favorable to just spin for a cycle or two). In this situation you want to use semaphores, which we _do_ preempt during.

          When a process hits a semaphore that is in use, it goes to sleep and something else continues. The process awakes when the resource is available. Now we reach the problem you wrote of above: priority inversion. What if task A holds resource Y and sleeps waiting for resource X and task B holds resource X and sleeps waiting for resource Y? You deadlock.
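
          The task A / task B situation just described is a cycle in a "wait-for" graph. As a rough sketch only (the data layout and the name find_deadlock are made up for illustration, and this is not how any kernel detects deadlocks), it can be modelled like this:

```python
# Hypothetical model: each task maps to (set of resources held, resource awaited or None).
def find_deadlock(tasks):
    """Return the list of task names forming a wait-for cycle, or None."""
    # Map each resource to the task that currently holds it.
    holder = {res: name for name, (held, _) in tasks.items() for res in held}
    for start in tasks:
        path, current = [], start
        # Follow "waits for a resource held by" links until they end or repeat.
        while current is not None and current not in path:
            path.append(current)
            wanted = tasks[current][1]
            current = holder.get(wanted)  # None if nobody holds it
        if current is not None:
            return path[path.index(current):]
    return None

# Task A holds Y and waits for X; task B holds X and waits for Y.
tasks = {
    "A": ({"Y"}, "X"),
    "B": ({"X"}, "Y"),
}
print(find_deadlock(tasks))
```

          Neither task can ever be woken, because each is waiting on something only the other can release.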

          Thus we need to use a type of semaphore called a priority-inheriting mutex, which raises the priority of the task holding a resource so it will always complete and release the lock. I know Solaris has these. However, I would consider any kernel that can preempt itself in general a preemptible kernel.
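
          Priority inheritance is usually described as temporarily running the lock holder at the priority of its highest-priority waiter. A toy sketch of why that helps, with made-up priorities and helper names (pick_next and effective_priority are illustrative, not kernel functions):

```python
def pick_next(runnable):
    """runnable maps task name -> priority; a higher number is more important."""
    return max(runnable, key=runnable.get)

def effective_priority(base, waiter_priorities):
    """Priority inheritance: run at least as high as any task waiting on us."""
    return max([base, *waiter_priorities])

low, medium, high = 1, 5, 10

# "high" sleeps on a lock held by "low", so only "low" and "medium" are runnable.
without_pi = pick_next({"low": low, "medium": medium})
with_pi = pick_next({"low": effective_priority(low, [high]), "medium": medium})

print(without_pi, with_pi)  # medium low
```

          Without inheritance the medium-priority task keeps the lock holder off the CPU, so the high-priority task waits for as long as "medium" wants to run. With inheritance the holder runs first and can release the lock.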

          Second, observed results aside, what reason do you have to believe that preempting the lock-less parts of the kernel is "good enough"? All else equal, one would expect the latency distribution to be similar with and without locks, so you would expect plenty of "worst cases" to occur with locks. Of course, there is already a pressure to reduce the time that critical locks are held, but I wouldn't be surprised to see non-contended locks (especially outside the kernel core) held for long times. So is there a good reason that the important "worst cases" happen without locks?

          First, before I cast results aside, let me mention that observations show we are already lowering latency a great amount. But, you are right, periods in which locks are held are a problem. This is why I mentioned in the interview the use of things like Andrew Morton's low-latency patch, the preempt-stats patch (for finding the locks), etc.

          Some of the problems still occur while locks are held, but thankfully the point of a spinlock is that it is held for a VERY short time. A solution to this may be to replace the spinlocks held for a long time with a priority-inheriting mutex.
        • Re:I'm not sure... (Score:3, Informative)

          by be-fan (61476)
          So, first, does "fully-preemptive" traditionally mean with or without locks? Are Solaris, NT, and RTOS preemptible when locks are held?
          >>>>>
          I don't know about those, but BeOS isn't preemptible during a spinlock either. BeOS requires you to disable local interrupts before acquiring a spinlock, which means that the scheduler never even gets to run on that CPU because it won't take the timer interrupt. I'd surmise that almost all preemptible kernels work like this. Judging from this doc [qnx.com] it would appear QNX does it this way as well. This method shouldn't affect latency, because you are only supposed to hold a spinlock for a very short time.
  • needed badly (Score:4, Insightful)

    by xah (448501) on Sunday October 14, 2001 @01:49PM (#2427804) Homepage
    A fully preemptible kernel is important to the future of Linux. Everyone knows that the system will lock up a lot if it's misconfigured, or if a piece of hardware is buggy.

    So long as the console driver and the keyboard driver are alive, root should always be able to open a new shell and kill an offending process that is hanging the rest of the system. Right now, this is too frequently a non-option.

    • Re:needed badly (Score:1, Insightful)

      by Anonymous Coward
      Confusion again about the terms. The only way bad software takes down the system right now is if it runs the machine out of an available resource (such as RAM).

      A preemptive kernel will NOT help in this case.
      • I'm pretty sure that a buggy driver can take down the system. Wouldn't a pre-emptive kernel stop that?
      • Actually, there's some truth to his point. Say a process makes a system call and the kernel code for that call hangs in a loop. Since the scheduler won't preempt the kernel code, that process will run forever and the machine will hang. If the kernel can be preempted, the user can get to a shell and kill the stuck process. I have no idea how often this situation would happen in the real world, though. I'd think that infinite loops would be too much of a newbie bug.
        • Re:needed badly (Score:2, Interesting)

          by Peter La Casse (3992)
          Actually, there's some truth to his point. Say a process makes a system call and the kernel code for that call hangs in a loop. Since the scheduler won't preempt the kernel code, that process will run forever and the machine will hang. If the kernel can be preempted, the user can get to a shell and kill the stuck process. I have no idea how often this situation would happen in the real world, though.

          Will the linux kernel allow a user process to be killed that is blocked in a kernel call? In my experience, Solaris and Tru64 do not: a user program that is blocked in a kernel call will stay blocked until the kernel call returns, regardless of any action (short of rebooting the machine) that a user can take. I assume that there is some well-thought-out reasoning behind this, but sometimes (e.g. during device driver development) I wish it were somehow a configurable behavior.

          I'd think that infinite loops would be too much of a newbie bug.

          Lots of times, the most junior person gets stuck writing device drivers. And even experienced programmers can have brain farts.

    • Re:needed badly (Score:1, Informative)

      by Anonymous Coward
      Wrong, dewd. You're thinking of a microkernel. They're two totally different architectures.
      • Xah is correct except for one detail. As far as the scheduler is concerned, pre-emptable threads running inside the kernel should be pretty much the same as pre-emptable user-space threads in a microkernel system. They should be able to be killed and/or restarted if they've hung.

        The one mistake I think xah made was using the term "process". Linux's current design encourages the confusion between threads and processes by implementing threads as processes that happen to share "process stuff" (address space, file handles, credentials, rlimits etc).

    • Re:needed badly (Score:5, Insightful)

      by chabotc (22496) <chabotc AT gmail DOT com> on Sunday October 14, 2001 @05:57PM (#2428587) Homepage
      Actually, in 9 out of 10 cases when that happens and the hardware is locked up, it will have locked up the PCI bridge as well (they have to in order to communicate), so this won't do anything.

      Also, if the system feels locked up and it's not a hardware lock, there's a good chance it's the tty/console subsystem that's been killed.

      Only in a few cases, where a runaway process deals out so much of a beating to the system, will the better multithreading help in the way you described.

      (ps, telnetting in is always a good work around for a system with a dead keyboard/console :P)
  • I thought giving the Kernel the ability to preempt other programs was important. If you give programs the ability to preempt the kernel, doesn't that just change the system back to cooperative multi-tasking? I could just see programmers abusing the ability to preempt the kernel to squeeze a little more speed out of their app.
    • by naasking (94116)
      No, I think you're misunderstanding. It's not preempting the kernel, it's preempting a lower-priority thread that happens to be in the kernel (i.e. during a system call). If there is a runnable thread with a higher priority, it should be set running. But as things currently stand, if the low-priority thread is in the kernel it can't be preempted, and so the high-priority thread has to wait. That is bad.

    • I thought giving the Kernel the ability to preempt other programs was important. If you give programs the ability to preempt the kernel, doesn't that just change the system back to cooperative multi-tasking?

      Nope, because the kernel is still always in control. In a cooperative multi-tasking environment the userspace programs can choose to hold on to the processor as long as they like (i.e. not cooperate nicely with others). This patch simply allows a lower priority process to be interrupted by a higher priority one even if the low priority one is in the kernel, doing a system call for example. However, this preemption is done by the kernel scheduler.

      -adnans
  • by alewando (854) on Sunday October 14, 2001 @01:55PM (#2427831)
    If you're wondering what the heck a preemptive kernel entails, then here's some background [gatech.edu].

    Also, if you don't like Robert Love's implementation, then Andrew Morton maintains a patch [uow.edu.au] with a similar low-latency goal.
  • by jeffy124 (453342)
    Mac OS-X is fully preemptive already, making it a greatly stable system.

    I can only see a fully preemptible Linux increasing its already solid stability.

    Now if only we could remove co-operative threading from windows....
    • Re:OS-X (Score:2, Informative)

      by JanneM (7445)
      Linux is fully preemptible, and has always been. This is about being preemptible while executing in the kernel. I have no idea if OSX allows this or not - it's BSD based, so probably no, but then Mach is involved someway or other, so maybe. It would be interesting to know.

      /Janne
      • the marketing blitz about OS-X has indicated "fully preemptive multitasking" See http://www.apple.com/macosx/ [apple.com]
        • by JanneM (7445)
          Yes, yes, but does Darwin preempt _running in the kernel_? It's not the same thing.

          "Fully preemptive multitasking" is about preempting userland programs - and Linux (and other Unices) has had this since day one.

          /Janne
          • that i do not know.

            I suspect that the answer would be yes, it does, because Apple had to seriously overhaul the multitasking for the user layer, and knowing the amount of work they did for OS-X, I can't see them leaving preemptiveness out of the kernel. They also used the word "fully." If they left it out of the kernel, they couldn't use it.

            But that's just my educated suspicion; I don't know for sure what the facts are.
            • They also used the word "fully." If they left it out of the kernel, they couldn't use it.

              Remember, they also touted OSX as "The most advanced operating system in the world." Not that it's not a fine OS, but that's a bit of a stretch. Apple is well noted for its marketing speak and as such, saying it's fully preemptible does not mean that the kernel itself is preemptive. It very well may be, I just take what they say/advertise with a grain of salt. I've never been too keen on marketing speak.
            • I can't speak to whether or not OS X is kernel preemptible either, but I assume when Apple talks of "fully preemptible" they are just drawing a contrast with the cooperative multitasking that MacOS has had since day one.

  • by Anonymous Coward
    This has nothing to do with cooperative vs. preemptive multitasking. In that sense, Linux (and every other Unix-like OS on the planet) has been preemptive forever.

    This is about making the kernel preemptible, which means that a process can be preempted if it's in kernel space (i.e. making a system call) as well as when it's executing normal user code.

    Without a preemptible kernel, a process can remain on the cpu during the several milliseconds that a system call can potentially take to return or sleep, even if a higher priority process becomes runnable during that time.
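
    To put rough numbers on that, here is a toy model (not a measurement; the function name, its parameters, and the millisecond figures are all invented for illustration). A low-priority process is inside a system call from time 0 to syscall_len_ms, and a high-priority process becomes runnable at wake_at_ms. The lock parameter reflects the assumption that a preemptible kernel still does not preempt while a lock is held.

```python
def wakeup_latency_ms(syscall_len_ms, wake_at_ms, preemptible_kernel,
                      lock_held_until_ms=0):
    """Delay before the high-priority process gets the CPU in this toy model."""
    if wake_at_ms >= syscall_len_ms:
        # The low-priority process is back in user code, which is always preemptible.
        return 0
    if preemptible_kernel:
        # Preemption happens at once unless a lock is still held.
        return max(0, lock_held_until_ms - wake_at_ms)
    # Non-preemptible kernel: wait for the system call to return.
    return syscall_len_ms - wake_at_ms

print(wakeup_latency_ms(5, 1, preemptible_kernel=False))  # 4
print(wakeup_latency_ms(5, 1, preemptible_kernel=True))   # 0
print(wakeup_latency_ms(5, 1, preemptible_kernel=True, lock_held_until_ms=2))  # 1
```

    The model ignores context-switch cost and sleeping system calls; it only shows where the extra wait comes from.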

  • yes, but why? (Score:3, Informative)

    by markhahn (122033) on Sunday October 14, 2001 @03:35PM (#2428104)
    it's all very well to say that you want to trade 5% of normal performance for a 200% improvement in latency. but why does anyone need better latency? afaict, the latency here is strictly for people who want to do RT audio effects. this has nothing to do with audio playback, which has no latency sensitivity (because of buffering). this also has nothing to do with "feel", since humans are terribly slow, and cannot possibly feel the difference between 5 and 10ms.

    I hope that Linus will look at whether these patches hurt the normal case. "normal" means things like kernel compilation, not just an arbitrary latency measure and dbench (one of the least realistic benchmarks possible!)

    there are good reasons to be skeptical of all-out preemptiveness: it will unavoidably lower throughput in easy-to-define cases. any intro OS text will talk about optimal scheduling, where 'optimal' requires a definition of throughput or some other metric. preemptive kernels will context switch more, and will probably interfere with the natural 'batching' that happens when a big job runs for a while. think about caches: you never want to switch unless you must. this is not an argument against low-latency! it's an argument against lowest latency as an absolute; we need to set a target (5ms would be fine imo) and meet it. going beyond such a goal will hurt the normal case.

    • Re:yes, but why? (Score:4, Insightful)

      by Spy Hunter (317220) on Sunday October 14, 2001 @04:06PM (#2428161) Journal
      this has nothing to do with audio playback, which has no latency sensitivity (because of buffering)

      Unless you're writing a game, where sounds have to happen at specific times synchronized with events on-screen. Or you're in KDE and you want a "minimize" sound effect to happen when you press the button, not a second afterward. Or you're writing a media player and you want to have an EQ that responds immediately rather than a second from now, making it a tedious chore to adjust the settings.

      Large latency is very noticeable in these situations. While it may sound like pointless whining to complain about the "minimize" sound effect being a second late, it really creates a bad perception in the user's mind about the speed of KDE. These things are actually important.

    • Re:yes, but why? (Score:5, Informative)

      by sagei (131421) <[rlove] [at] [rlove.org]> on Sunday October 14, 2001 @04:39PM (#2428292) Homepage
      Disclaimer: It is my patch

      but why does anyone need better latency? afaict, the latency here is strictly for people who want to do RT audio effects. this has nothing to do with audio playback, which has no latency sensitivity (because of buffering). this also has nothing to do with "feel", since humans are terribly slow, and cannot possibly feel the difference between 5 and 10ms.

      You ever have an mp3 skip? Audio become out of sync in a game? That is caused by scheduling latencies becoming greater than the duration of the audio buffer. Ie, audio playback does not just need x units of CPU but it also needs it every y units of time. The preempt-kernel patch helps alleviate this.
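
      The arithmetic behind that, as a small sketch (the 1024-frame buffer and 44.1 kHz rate are assumed figures for illustration, not taken from any particular sound driver):

```python
def buffer_duration_ms(frames, sample_rate_hz):
    """How long a buffer of this many frames lasts during playback."""
    return 1000.0 * frames / sample_rate_hz

def underruns(frames, sample_rate_hz, latency_ms):
    """True if the player is kept off the CPU longer than the buffer lasts."""
    return latency_ms > buffer_duration_ms(frames, sample_rate_hz)

duration = buffer_duration_ms(1024, 44100)
print(round(duration, 1))            # 23.2
print(underruns(1024, 44100, 10))    # False: refilled in time
print(underruns(1024, 44100, 50))    # True: the audio skips
```

      So the player does not need much CPU, but it needs some of it at least once per buffer duration.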

      I hope that Linus will look at whether these patches hurt the normal case. "normal" means things like kernel compilation, not just an arbitrary latency measure and dbench (one of the least realistic benchmarks possible!)

      Not only does preempt not hurt a kernel compile, but it helps it. I and many users have benchmarks. One of my requests from users is to get a lot of benchmarks and "feelings" so I can substantiate the patch. I am _not_ an audio guy. I use my Linux machine to code, go on the net, etc. just like 90% of the people here. Preemption helps me. I don't want to hurt the common case either.

      Even so, it is a configure item. Merging it into the kernel does not equate to you having to use it. But I bet you would want to!

      there are good reasons to be skeptical of all-out preemptiveness: it will unavoidably lower throughput in easy-to-define cases. any intro OS text will talk about optimal scheduling, where 'optimal' requires a definition of throughput or some other metric.

      The cases in which we lower throughput are cases in which file I/O is favored since it runs until completion. In this case, you can extend that argument to be that I/O-intense tasks should just be cooperatively scheduled. An I/O task won't be preempted unless its timeslice has run out (ie, it should be preempted, and it would be if it were in userspace). If the I/O is so critical, run it at a higher priority. Hell, maybe we should look into a higher timeslice.

      Note that a lot of this is a non-issue, since we don't affect throughput (or actually improve it!) In the cases throughput is decreased, it is just a couple of percent, which could be cost-benefited to the increase in response some other application gets.

      we need to set a target (5ms would be fine imo) and meet it. going beyond such a goal will hurt the normal case.

      This is very very true, and an insightful point. One of the problems with this whole latency quest is that eventually we are going to reach some point and have to decide if enough-is-enough. We can always keep doing more and eventually the work _is_ going to be detrimental to the common-case. I agree we need to set a threshold and celebrate when we reach it. The super-special situations needing much lower latency can apply super-special solutions.

  • Options... (Score:2, Informative)

    by Mike McTernan (260224)
    Whether this patch is added or not is surely just a question of whether it is stable enough or not.

    As it says in the interview, the enablement of the patch is an option in the config... For those that want it (i.e. most desktop users I would expect) it's there. For those that don't, it can be disabled.

    It seems that the patch works, as scientifically explored by his benchmarks. If there is a fault in the patch, I'm sure that half of slashdot will email the chap.

    In summary, it works, is probably stable and can be enabled/disabled in config if needed. It already does, and probably can, benefit lots of people.

    Put it in!
    (At worst it can be removed and a new kernel released the day after... hehe)

  • Tried this patch before... it works great and adds a nice option in the kernel config. But the problem is pcmcia-cs doesn't work with it. Says in the changelog it will be fixed on the next release of pcmcia-cs but I want it now!
    It does work nicely... everything is a lot more responsive.
    Great work!
  • by bruns (75399) <bruns&2mbit,com> on Sunday October 14, 2001 @08:09PM (#2428947) Homepage
    After messing with it on several machines, here is what I have found.

    * it doesn't work well on a shell server or anything which might have a lot of disk activity. The changes seem to do everything at the expense of disk IO and network IO. I do see better speed on interactive stuff though. It's not worth the hit in IO.

    * there is no option to turn it off while in operation. Means you have to run different kernels if you want to do some things with the preempt, and other stuff without.
    • * there is no option to turn it off while in operation. Means you have to run different kernels if you want to do some things with the preempt, and other stuff without.

      This is something I haven't seen brought up yet. Windows 2000 can change favor from background to foreground processes on the fly (right click 'My Computer' and check out the properties). Now, while this is not the same thing, it's in the ballpark to most users who understand it as something that speeds up their apps. We really need a /proc switch that lets you turn this thing off and on.
  • Linus is suspicious (Score:3, Informative)

    by steveha (103154) on Sunday October 14, 2001 @11:39PM (#2429480) Homepage
    In his recent interview [osnews.com] on osnews.com [osnews.com], Linus said he was in no hurry to include the kernel preemption patches in the official kernel source. He said:

    Some people have been playing with using the [SMP] locks on UP too, creating a fully preemptible kernel. A lot of people are playing around with the patches, and we'll see when/if I'll integrate them into the standard tree. It's not a high priority for me: they don't add performance (like the SMP scalability does), and if they improve latency noticeably I'd really rather look at why the latency is bad in the first place.

    So right now as far as I'm concerned it's one of those "cool features" things, and it will need some prodding from the real world to show whether it is worth it.

    I was surprised he said this. This isn't a big scary kludge that inserts a bunch of hacks all over the place in the kernel; this is a relatively small patch that simply leverages all the SMP work. It won't make the kernel uglier or harder to maintain, so IMHO it is very worth adding.

    I am confident that Linus will get that prodding from the real world he is waiting for, because my own experiences with this patch are overwhelmingly positive. I'm using kernel 2.4.10 with the preemption patch on my desktop Linux boxes, and I love the snappy feel it gives my system. Playing back MP3 music never skips now, and my K6-III/450 system pops up web pages in Galeon so fast it feels like an Athlon system.

    Kudos to Robert Love and anyone else who worked on this patch.

    steveha
    • Linus: and if they improve latency noticeably I'd really rather look at why the latency is bad in the first place.

      I don't really agree with this. I would imagine that there are many things in the kernel that could be written much cleaner, smaller, and faster if the writer did not have to worry about latency. Since making the kernel pre-emptable would allow this I could see things actually improving as a result.

      PS: I don't know anything about kernel design, so ignore my comment if necessary...

  • I don't know how this might manifest itself, but I could see how some existing, highly tuned programs could have problems with such a patch. If software is developed with the current mainstream kernel in mind, it may make certain internal assumptions about not being preempted during certain operations, and some timing getting messed up because of that. One poster mentioned a potential VMware problem, which could be an effect of something like this.
    I would like to see this patch in there, but I could see some reasons to be hesitant about putting it in now. I would love to see latency on the level of QNX, that seems very responsive.
