Linux Software

2.4, The Kernel of Pain (730 comments)

Posted by jamie
from the my-destiny-to-be dept.
Joshua Drake has written an article called The Kernel of Pain. He seems to think 2.4 is fine for desktop systems, but that it is only now, a year after release, approaching stability for high-end use. Slashdot has had its own issues with 2.4, so I know where he's coming from. What have your experiences been? Is it still too soon for 2.4?
  • 2.4.x has been OK, albeit not totally stable. I've got 2.4.17 running and I like it quite a bit. For me it has probably been beneficial, since it got me reading a bit of the LKML and learning more about how to do things with my kernel.

  • Au contraire (Score:3, Interesting)

    by AntiPasto (168263) on Thursday January 17, 2002 @03:34AM (#2853104) Journal
    I would say it is almost the opposite. I think Linux is very stable for the server end, but what about the desktop?!

    I'll tell ya, I tried the preemptive patches, and all the -ac stuff naturally, and well, the desktop just isn't snappy ... I mean, Windows (follow me here) just feels better. I don't need a force-feedback mouse or anything; it just doesn't show me that it is rendering a window... and that's something that Gnome was doing even on a 450 MHz machine.

    Also, even with the preemptive patches, I could hold down a key in, say, StarOffice or AbiWord, and it would stutter! Hold down the arrow key, and it stutters.

    These are basic interface issues that could use some due attention before Linux is ever ready for the desktop.

    • Re:Au contraire (Score:3, Insightful)

      by PD (9577)
      I hadn't noticed that Linux was any slower feeling than Windows. On my Celery 300A Windows is PAINFUL to use, but Linux is amazingly quick - running 2.4.17. I run Windowmaker, and that's it. No Gnome, no KDE, no funny transparent terms.
      • Re:Au contraire (Score:3, Insightful)

        by roguerez (319598)
        Must have been a problem with your system. I have been running Windows 2000 on a K6-266 with 128 MB of RAM for about a year. It flew. It's important to have good disk access, so I put a 10 GB 7200 rpm disk in it and installed Windows on that, which made it even snappier.

        The only reasons I bought a new machine are that I needed the K6 to act as a FreeBSD box and that I wanted to play DivX video and games, both of which demand more than a K6-266 regardless of the OS used.
      • Re:Au contraire (Score:2, Insightful)

        by bartok (111886)
        Windowmaker is a window manager, not a desktop environment, and that's why it's faster. You don't get half of the features and integration a real desktop provides when you only run a WM.

        What you're saying is equivalent to one of the many posts about using vi when the discussion's topic is IDEs. Fine if it does the job for you, but 90% of people out there want the fully integrated DEs like GNOME and KDE.
      • Re:Au contraire (Score:3, Interesting)

        by Cryptnotic (154382)
        2.4.17 wasn't the problem. 2.4.17 finally fixed the problems that were inherent in all of the 2.4.* kernels before 2.4.17. If you read the article, you'd see that he had problems with 2.4.6 and 2.4.9 and even 2.4.16.

        Also, the problem wasn't that the system was slow, but that when you had many active processes, the system would respond very poorly or lock up.

    • Re:Au contraire (Score:5, Interesting)

      by ElOttoGrande (183478) on Thursday January 17, 2002 @03:49AM (#2853153)
      it just doesn't show me that it is rendering a window... and that's something that Gnome was doing even on a 450 MHz machine.

      The preemptive patches have made my system a lot more responsive under use. Most notably the mouse cursor doesn't slow down during heavy compiles and audio latency is good enough to play with some of the more interesting sound software projects out for linux.

      But it really sounds like your problem isn't with Linux but with XFree86. X has its share of problems, but if you have a good video card that's supported well under it, you should get more than acceptable 2D drawing performance. I use a 3dfx Voodoo3 here and it's about as good as Win2k running KDE (sometimes you can see it rendering when resizing or moving windows quickly, but I like to think of it as a cool effect ;) and it's way faster with lighter WMs like Blackbox.

    • Re:Au contraire (Score:4, Informative)

      by Ace Rimmer (179561) on Thursday January 17, 2002 @04:38AM (#2853263)
      Try the low-latency patches to the 2.4 tree. They have a much better impact than the ones called "preemptive".

      nice -n -10 /usr/bin/X11/X
      helps quite a lot on an average desktop Linux box
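For what it's worth, a sketch of that tip (the path is an example; setting a negative nice value requires root):

```shell
# Run the X server at a higher scheduling priority (a negative nice
# value means "more favored"). Requires root, and the path below is
# just an example -- adjust it for your distribution:
#
#   nice -n -10 /usr/bin/X11/X
#
# Or renice a server that is already running:
#
#   renice -10 -p "$(pidof X)"

# The mechanism itself, demonstrated without root: a child started
# under `nice -n 10` reports a niceness raised by 10.
nice -n 10 sh -c 'ps -o ni= -p $$' | tr -d ' '
```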
    • Re:Au contraire (Score:5, Informative)

      by hansendc (95162) on Thursday January 17, 2002 @05:19AM (#2853309) Homepage
      What are you smoking?!? High end box DOES NOT mean your 1.2 GHz Athlon!! We're talking about machines with >8 processors here. Machines which need to use the PPro PAE so that over 4 GB of memory can be addressed.
      There are serious VM stability issues with these systems. Ever wonder why Red Hat hasn't released a >2.4.9 kernel? It's because 2.4.10 is where the new VM system went in. Red Hat is busily porting Rik van Riel's 2.4.9 VM up to the later kernels so that they can use it.
    • Yes, I've noticed something similar. XP on my 900 MHz Athlon is noticeably snappier than Linux with KDE or Gnome on my 1200 MHz Athlon. Much of this is that XP simply caches the heck out of everything in sight. Simple, but very effective.
    • Re:Au contraire (Score:5, Insightful)

      by captaineo (87164) on Thursday January 17, 2002 @06:46AM (#2853466)
      I definitely see this too... In fact I'm about to start a full crusade against shitty windowing performance. (long visible lags between exposure and repaint, very jerky moving/resizing, etc). These are very real issues on Linux/XFree86. I plan to go as far as shooting my screen with a 100Hz camera to really see what's going on, retrace by retrace.

      There are many links in the GUI chain, any of which can cause a problem. Roughly from top to bottom-

      1. Widget toolkit (GTK, QT, etc)
      2. Client painting library (GDK, QT, etc)
      3. Window manager
      4. X protocol
      5. context switches
      6. X server
      7. 2D video card driver

      The folklore seems to be that 4, 5, and 7 are the major problems - "the X protocol is badly designed, switching between client, server, and window manager processes is too expensive, and XFree86's video drivers are no good."

      In reality though, the problems aren't where most people expect. The X protocol is not generally a bottleneck, especially if the client programmer knows what he's doing (wait until the input queue empties before repainting anything, avoid synchronous behavior, double-buffer windows using server-side pixmaps, etc). The copy-and-context-switch overhead isn't too bad either (keep in mind that context switches are much more expensive on Windows, and Windows is the platform to beat for 2D smoothness!). And finally, many of the 2D drivers really do take advantage of all the hardware offers.

      The real culprits are turning out to be 1 and 3 - the toolkits and window managers. Many of the Linux toolkits (especially GTK) have very advanced widget alignment/constraint systems that bog down when windows are resized. Some toolkits are doing naughty things with the event loop (painting while events are still in the input queue, or trying to "optimize" by pausing for new events), and most of them aren't fully double-buffered yet (though GTK 2.0 and recent KDE/QT are most of the way there). Window managers are some of the most horrific perpetrators of 2D crappiness. Some of them try too hard to snap or quantize window sizes and positions, resulting in jerky motion. KWin seems to prolong expose/repaint cycles much more than necessary. And finally, I will make one criticism of X's overall architecture - I don't think separating the window manager from the X server was a good choice. The asynchronous relationship between X and the wm can cause nasty delays in window moving and resizing. (plus, all widely-used wm's have basically the same features these days; what's the use of having a choice? ;]).

      I used to think that the only way to get perfectly smooth 2D would be to embed the widget toolkit in the X server, so that it could handle repainting all on its own. Now I don't think one needs to go that far; it may just take a well-written window manager, and a similarly carefully-designed widget toolkit. (though it may be helpful for the server to mandatorily double-buffer every window - hey, video RAM is plentiful these days =)

      There are lots of issues I haven't investigated yet - for instance, I think Windows may be doing something interesting with vblank; dragging windows around seems to show a lot less tearing compared to X... Also, 3D OpenGL windows seem to cause much worse artifacting on both X and MS Windows. It's almost possible to bring an animating OpenGL program to a complete halt just by resizing the window, or dragging another window in front of it.

      It's an interesting problem, and I'm glad to see I'm not the only one who cares about it. I find it appalling that (to my knowledge) not one major 2D GUI system has been able to produce 100% correct results - i.e. every window correctly drawn on every single monitor retrace, even while dragging or resizing. Why should we settle for less than 100%?
      • Re:Au contraire (Score:5, Informative)

        by Havoc Pennington (87913) on Thursday January 17, 2002 @11:33AM (#2854426)

        As the author of a window manager and big hunks of GTK, I don't think your analysis is quite right.

        The primary problem is synchronization, not delay. GTK 1.2 is very fast, its geometry code is not causing any slowness. You are confusing slow with flicker. Flicker looks slow but slow is not the problem; no matter how fast code is, if it flickers, you will see it, and it will look slow.

        Similarly with opaque resizing of a window: it has nothing to do with quantization or speed; the problem is that the window manager frame and the client are not resized/drawn at the same time, resulting in a "tearing" effect. This would be visible no matter how fast you make things.

        As you say, putting the toolkit in the server or putting the WM in the toolkit are overly radical ways to fix this. It's not even necessary to give every X window backing store. It could be done with an extension allowing us to push a backing store onto a single X window during a resize, for example. However, fixing it 100% pretty clearly requires server changes, and that's why you haven't seen a fix yet.

        • Re:Au contraire (Score:3, Insightful)

          by spitzak (4019)
          This is correct. The unavoidable flicker is due to the fact that the window frame is drawn by a separate program (the window manager) from the contents.

          Other causes of flicker: multiple visuals (not a problem on most Linux XFree86 systems), and toolkits (fixable with double buffering and can be reduced though not eliminated by the programmer of the toolkit).

          I think the window manager should be put into the toolkit. The window borders are no different from any other widget. In fact, I believe far more code is expended trying to talk to a window manager than would be needed to do this in a toolkit (which already contains code to draw the buttons and borders). This would allow new ideas in window management to be experimented with, such as getting rid of the borders entirely.

          The system might provide a "Task Manager" (using the term taken from Windows) that any program creating a window would talk to. The program would indicate the task that window belonged to and the name of the window itself. The task manager would send commands like "raise this window" or "map this window" or "hide this window" to the program, and by watching the visibility and positions of windows could provide pagers, icons, and taskbar-type interfaces.

          I strongly believe that putting widgets into the server is BAD. If X had done this we would be using Athena widgets right now and X would look laughably bad. The fact that X can emulate Windows and Mac interface designs invented 10 years after X was, is definite proof that keeping UI elements out of it was the best possible design.

  • by SuperDuG (134989) on Thursday January 17, 2002 @03:37AM (#2853107) Homepage Journal
    2.4 was long overdue. Does anyone else remember the "coming soon ... erm wait ..." and how the date kept getting pushed back further and further ...

    I really like using USB, and I like not having to use ALSA for my sound card (not that I have anything against ALSA).

    After playing around with Debian the other day and seeing all of my hardware that WON'T WORK with the 2.2 series, it has basically become clear to me that I am all for the 2.4 series.

    Linux is a continuously developing system, whether it be the kernel, distributions, or software. Linux will always be "In Development". Which is perfect for Linux.

    So yeah ... if you don't like 2.4 ... go back to 2.2 ... yeah ... thought so :-P

    • Call me precautious, but I usually test out everything, including the kernel, by running it on clients and development servers before putting it on any mission-critical servers. As much as I like the improvements, I didn't find it stable enough for heavy usage until recently, so I just never upgraded any major servers to use it until now. No pain at all, because as an admin I did my job. Anyway, it always did pretty well for me unless I put it on a total crap box (of which I have many) and stressed it a lot (which I tend to do), so I don't think it had that big of a problem to begin with. In reality, Netscape was the only program I found that caused consistent problems with the 2.4 kernels. From time to time programs like Xine would too, but that was usually when I did something stupid like trying to run several movies at once on a low-end machine with barely enough RAM to breathe. My development web servers don't get a lot of traffic, but they do some heavy data processing, and I never noticed any problem there.
    • Why Linux? (Score:3, Insightful)

      by evilviper (135110)
      I've always found it incredible how hypocritical SOME people can be.

      The big argument FOR Linux up until recently was stability of the OS. With Windows 2000 (and XP, I assume) it seems that Linux users are now the ones willing to put up with the more problematic OS, and for some of the same reasons Windows users clung to theirs not long ago.

      Now my question... Why use Linux? It's that needlessly complex and clunky operating system in between Windows/OS X.1 and the *BSDs. Windows and the *BSDs are far easier to configure than Linux, with the *BSDs being faster, more secure, more stable, and simply smoother (less clunky) all around.

      The *BSDs are PnP (no need for Kudzu), with no kernel modules to configure manually, and the configuration files are far simpler than ANY Linux distro's (although you CAN use SysV-style scripts instead if you are so inclined; anyone who uses the BSD-style scripts for a while won't want to use anything else, though).
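To illustrate the style difference being described (the entries below are illustrative examples, not a drop-in config, though the variable names are the stock FreeBSD knobs):

```shell
# BSD-style: one flat file of shell-variable knobs, /etc/rc.conf.
# The selection here is just an example.
sshd_enable="YES"
ntpd_enable="YES"
firewall_enable="YES"

# SysV-style Linux equivalent: per-service init scripts plus
# runlevel symlinks, e.g.
#   /etc/init.d/sshd
#   /etc/rc3.d/S55sshd -> ../init.d/sshd
```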

      So I ask politely, hoping to avoid flames and rants... Why choose Linux? It's not the most stable, the most secure, the fastest, the most free, the easiest.
      • Re:Why Linux? (Score:4, Insightful)

        by orangesquid (79734) <orangesquid&yahoo,com> on Thursday January 17, 2002 @09:29AM (#2853796) Homepage Journal
        Actually, there are many reasons I still use Linux rather than BSD, the chief one being:

        Linux is a buzzword, and being one, gets the benefits. When people talk about "non-Windows support," what jumps to mind is "Linux" even before "Mac" for many developers. Thus, there are many precompiled binaries, precompiled kernel modules, etc., that run under Linux. (I know BSD can run many Linux binaries, but what about kernel modules?) Additionally, many people are actively developing hardware drivers for Linux, but not so many for BSD.

        Plus, it's very easy to find support for various Linux-related problems, because so many people use it.
      • Re:Why Linux? (Score:4, Insightful)

        by Codifex Maximus (639) on Thursday January 17, 2002 @11:02AM (#2854221) Homepage
        > So I ask politely, hoping to avoid flames and
        > rants... Why choose Linux? It's not the most
        > stable, the most secure, the fastest, the most
        > free, the easiest.

        "I like Linux" would be the first answer that comes to mind. Linux is very stable, very secure, quite fast, very free, and, once you get to know it, very easy! Linux is all these things and more.

        Linux is stable - OpenBSD may be more stable.
        Linux is secure - NetBSD may be more secure.
        Linux is fast - BeOS may be faster.
        Linux is free - FreeBSD may be freer.
        Linux is easy - OS-X may be easier.

        Linux gives me all these benefits in one package AND the GPL'd codebase keeps getting richer.
    • by pellaeon (547513) on Thursday January 17, 2002 @09:27AM (#2853787) Homepage
      We've been running 2.4 almost from the start (2.4.3 if I remember correctly). We patched XFS into the kernels, plus some other stuff we needed, and the transition to 2.4 was relatively painless using a fresh install (for XFS).

      We ran into some trouble with a number of Athlon systems, but that was due to the 'Athlon bug' and was soon fixed. More worrisome was the performance of pre-2.4.9 kernels on the desktop: sometimes they slowed down to a crawl (and I'm talking about lightly loaded ~750MHz machines here).

      We got over that with the -ac kernels however, and it's been a breeze ever since. We currently use 2.4.14 with XFS patched in (although we're ditching it in favor of ext3 now that it's been integrated and the RH installer supports it) and we're looking at 2.4.17 now.

      Why use 2.4 on servers (as some have asked)? Well, iptables is a good reason, for one. Other security-related things count heavily too. And XFS seemed a good reason to do it at the time too. It can deliver very good performance.
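As a sketch of what that buys you: 2.4's netfilter/iptables can do stateful filtering, which 2.2's stateless ipchains cannot. The rules below are illustrative only (the service and interface choices are assumptions), not a complete firewall, and they require root:

```shell
# Minimal stateful-firewall sketch using 2.4's iptables (netfilter).
iptables -P INPUT DROP                              # default-deny inbound
iptables -A INPUT -i lo -j ACCEPT                   # allow loopback
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
                                                    # stateful: allow replies
iptables -A INPUT -p tcp --dport 22 -j ACCEPT       # allow inbound ssh
```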

      Some stats:
      zuse [1] > uname -a
      Linux zuse 2.4.14-xfs_MI10 #1 Tue Nov 6 17:34:04 MET 2001 i686 unknown
      zuse [2] > uptime
      2:25pm up 61 days, 21:21, 1 user, load average: 1.07, 1.02, 0.93
  • I've seen a lot of poor swap performance on my 2.4.x kernel compared to my 2.2.x kernel. On my dual-processor machine with 1 GB of RAM I haven't had problems, but then I use it so lightly it has never had to swap anything! On anything where normal load causes some swap-out, I get mighty slow response when I go to do something after some idle time: type, change input focus in X, etc.

    I imagine I could suss it out, but it isn't a big issue for me. I'm told later 2.4.x kernels fix this (I'm running 2.4.9).

    Anecdotal, I know. For myself, I'd run 2.2.x still on production systems. But I don't run any big production systems...
  • by jorre (40596)
    We've got a 2.4.7 kernel with RTAI real-time extensions for a house automation system running for several months now without ANY problem. Besides the house automation stuff, this box also acts as a mail, web, FTP, file, whatever server. 2.4 unstable? I don't think so!
  • Alphas (Score:5, Informative)

    by Paul Komarek (794) on Thursday January 17, 2002 @03:38AM (#2853114) Homepage
    And this guy appears to be talking about only x86 machines. My lab has had a horrible time with 2.4 on Alphas. In fact, we've moved back to 2.2.18 on some machines. (2.2.20 for Alpha didn't compile properly, and I didn't want to mess with it -- anyone know which 2.2 kernel is best for number-crunching Alphas right now?) Oh, the pain. The lost time. "Kernel of Pain" is a fine description of our 2.4 experience on Alphas.

    -Paul Komarek
  • I've been using Linux for nearly 7 years, and the 2.4 tree has been pretty buggy for a stable kernel. 2.2 was always pretty rock solid for me, and 2.4 was quite unusable for me until after 2.4.7, when SCSI emulation and loopback filesystems started working for me again. I think 2.4 was a bit rushed, but I'm glad it was; I will start experimenting with the unstable trees now, it's much more exciting!

  • My experience (Score:5, Informative)

    by nzhavok (254960) on Thursday January 17, 2002 @03:39AM (#2853117) Homepage
    What have your experiences been?

    8:33pm up 45 days, 5:49,

    Shameful, I know, but I had to move cities; before that I had 6 months. Should have had a UPS ;-)

    This is pretty much a desktop/development box running postgres, JBoss, tomcat, apache, JBuilder and (occasionally) kylix. No problems so far, touch wood.

    I also used to work at the comp-sci department of a university where we had 40 boxes in the Linux lab; no real problems except they were running ext2, so only the occasional manual fsck. Now the Mac lab, that is another story (OS 9, not OS X).
    • My server (which runs about 10 daemons in addition to many other small things) has an uptime of 72 days and not a single problem (It's been about 80 since I installed slackware on it, and I'm now running 2.4.12-ac6 - I had to reboot to get that in and another time after I reconfigured it for IPv6). I think the most telling thing about my experience is that my kernel was the most recent one available last time I rebooted. I'm going for 256 days, then I'll put in a new kernel :)
    • by Matts (1628)
      The people reporting problems are talking about heavily loaded systems, usually SMP systems. This is the stuff 2.4 promised to deliver, and while it has delivered it, what it delivered is unstable.

      But this is the way of open source - it's obvious this stuff wasn't tested to destruction while still in the 2.3 phase, or we wouldn't be seeing this stuff. However distribution developers should be doing this testing before releasing a new OS, and they're obviously not doing so.
  • by Kenneth Stephen (1950) on Thursday January 17, 2002 @03:45AM (#2853137) Journal

    As a sysadmin, I have to state that the 2.4 kernels have ruined whatever reputation for stability the 2.2 series kernels had established. At least in the 2.0 and 2.2 series, you had islands of stability where really careful distributions could pick a kernel version as their default kernel. One of the main problems with Debian not finalizing a 2.4 kernel has been the lack of any such island of stability so far in the 2.4 series.

    And I've been waiting a long time now. The early 2.4 series didn't really work out on my SMP servers. The 2.4.6-onwards kernels broke Tulip support for me. Then came the VM switch. Then, just when I decide, OK, 2.4.16 seems stable enough, we have the OOM problem. And I also keep hearing statements being made about the new VM being more friendly to desktop systems than servers.....

    Now if only 2.2 offered iptables.....

    • by FreeUser (11483) on Thursday January 17, 2002 @11:48AM (#2854531)
      ... you are absolutely correct in observing that the 2.4 debacle has used up a great deal of Linux's reputation for being stable. I use 2.4.x with SGI's XFS patches both in production systems at work and at home (like others, we need various features of 2.4.x not available in 2.2.x), and while it has never been anything close to as flaky as the most stable of Microsoft systems, it has, in comparison to 2.2.x (and FreeBSD for that matter), been pretty damn unreliable. In comparison to just about everything else it is still quite stable, so happiness is indeed to some degree relative.

      And now for some armchair quarterbacking. All that having been said, I really think Linus needs to exercise some self-discipline and stay away from maintaining even-numbered kernel releases (x.0.x, x.2.x, x.4.x, etc.). By his own admission he isn't good at being a stable-kernel maintainer and prefers the more interesting work done in development kernels; his track record in 2.2 wasn't fantastic (particularly in comparison to 2.0, where he did a fantastic job) and was pretty abysmal in 2.4. As someone who's been using GNU/Linux since the early pre-1.0 days, I hope he'll put his efforts where his talents are (managing changes in odd-numbered development releases) and leave stable maintenance to Cox and Marcelo (who are very good at maintaining and improving stable releases). But enough commentary from the peanut gallery...
  • by renoX (11677) on Thursday January 17, 2002 @03:46AM (#2853143)
    This guy is complaining that he had troubles on a production server with Mandrake 8.1 and its 2.4 kernel.

    But Mandrake 8.1 ships with both kernel 2.4 and 2.2.
    The idea behind it is: if you need all the fancy stuff use 2.4 but if you want stability use 2.2.

    So using 2.4 on a server and then complaining that it isn't stable enough is silly IMHO.

    That said, I agree that 2.4 has been slow to stabilize (the VM mess apparently caused by communication problems between Linus and Rik van Riel).
    • if you need all the fancy stuff use 2.4 but if you want stability use 2.2.
      Yeah, cause Linus was joking when he said that even numbers were "stable".

      2.4 is a supposedly stable tree.
      It's supposed to be: odd versions have fancy (i.e. experimental) stuff, use at your own risk; even versions are stable and suitable for real usage.
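The convention is visible right in the version string: the second component is even for stable trees (2.0, 2.2, 2.4) and odd for development trees (2.1, 2.3). A trivial check, purely for illustration:

```shell
# Classify a kernel version by the even/odd convention: the minor
# number (second component) is even for stable trees, odd for
# development trees.
classify() {
  minor=$(echo "$1" | cut -d. -f2)
  if [ $((minor % 2)) -eq 0 ]; then
    echo "$1: stable tree"
  else
    echo "$1: development tree"
  fi
}

classify 2.4.17   # -> 2.4.17: stable tree
classify 2.3.51   # -> 2.3.51: development tree
```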

      So using 2.4 on a server and then complaining that it isn't stable enough is silly IMHO
      Then Linus should stop saying that the even versions are stable.

      Insert obligatory *BSD advert here

    • But 2.4 is *supposed* to be stable. That's why it's called a stable kernel branch. It's perfectly justified to complain that it's not stable.

      And as far as the VM mess, it wasn't really an issue of communication. It was an issue of Linus arbitrarily accepting some patches from Rik, and ignoring others. Alan Cox at least made a real attempt to incorporate all of Rik's VM patches in the -ac branch. And the -ac branch had a much improved VM as a result. But Linus didn't make the effort for some reason.

      The reason 2.4 has been unstable is because the maintainership has been poor. Usually Linus turns over maintainership to someone else (previously Alan) very early on in the series. I think that happened at 2.2.7 for the 2.2 series. Alan puts out lots of prepatches and gives people enough time to test prerelease kernel patches. Linus is random about it. He'll release a kernel that has changes that weren't in the prepatches. And a bunch of times those changes broke something badly. It probably doesn't help that he has a day job. Alan gets paid to work on Linux full time. The 2.2 series only started getting stable when Alan took over. 2.4 only just recently got handed off.
        man, this type of comment really pisses me off. why is everyone falling on top of linus suddenly? linus receives billions of patches from different people; do you really want him to check each patch carefully and make sure they work???

        he is right to put the responsibility on the maintainer, otherwise IT WOULD NOT BE A DISTRIBUTED FRIGGIN SYSTEM. and a project this large can only work if everyone makes sure their little bit works. now i tell you this much, y'all falling on top of linus, but i reckon that if rik had followed linus's model and made sure the VM patches were all applied in order, at this time we would probably be celebrating the reasonable quality of the 2.4 series. that's how fickle we are, mate.

        nuff said.

  • by Cryptnotic (154382) on Thursday January 17, 2002 @03:47AM (#2853146) Homepage
    The article is a little short on the details, but we had a similar problem here at work with a new Red Hat 7.2 server (kernel 2.4.9) we were setting up. The machine was to be a CVS/file server, running a cvs pserver and Samba. It had 1 GB of main memory and a 180 GB RAID5 array (external, via a Mylex RAID card w/ LVD SCSI U160). The machine would seem to run fine, but then in testing, the machine would block on processes for seemingly no reason. It was something in the [kswapd] kernel process that was blocking things. If you logged in at a terminal or over a network, you'd get extreme "stuttering" in your responsiveness. Basically, it was unresponsive under loads with several running processes. This wasn't even an excessive load.
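For anyone chasing the same symptom: the giveaway is processes stuck in uninterruptible sleep (state "D") while kswapd churns. A quick way to look, using standard procps tools (nothing here is specific to the poster's setup):

```shell
# List processes blocked in uninterruptible sleep ("D" state) --
# the classic signature of the kswapd/VM stalls described above.
ps -eo state,pid,comm | awk '$1 ~ /^D/ { print }'

# Watch swap-in/swap-out activity (the si/so columns) while it happens:
#   vmstat 1
```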

    Oh yeah, and the machine would crash randomly and lose data. We were using ext3, so the file system was (supposedly) still consistent, but whatever was being worked on would be lost.

    Ultimately, we upgraded the kernel to 2.4.17, and the problems have been fixed. But the "even number == stable reliable" rule failed us that time.

    Since then, I've read that "the entire VM system in 2.4 was replaced around 2.4.10". This really scares me. I hope that Linus and Alan Cox have learned to manage things better now. If not, someone else will have to pick up the slack (maybe RedHat) and manage a stable kernel.


    • I hope that Linus and Alan Cox have learned to manage things better now. If not, someone else will have to pick up the slack (maybe RedHat) and manage a stable kernel.

      Neither Linus nor Alan Cox maintains 2.4 at the moment. Marcelo Tosatti does, and from what I read on LKML some people thought that to be a bad move at the beginning, but I think it's working out just great (the first release he made was 2.4.17, IIRC).

  • by mvdwege (243851) on Thursday January 17, 2002 @03:48AM (#2853149) Homepage Journal

    First, he replaces a known working server with something new. Then he keeps piling the newest bleeding-edge kernel, release upon release, onto this box (following his narrative, it sounds as if he installed new kernels upon release).

    Second, nowhere does he mention why he needed a 2.4 kernel in the first place. In fact, he mentions how he finally decided to downgrade to 2.2.

    So, in conclusion: He upgrades to the bleeding edge without proper need, and when trouble ensues, instead of rolling back, he continues upgrading. Tell me why this guy is not a hopelessly incompetent sysadmin who's trying to blame Linux for his shortcomings?

    Hell, even I as a home user waited until 2.4.17 before upgrading my main box from 2.2.19. If I can perceive the weaknesses of the 2.4 kernel, why can't a professional do so?

    • by bakreule (95098) on Thursday January 17, 2002 @05:35AM (#2853330) Homepage
      "Hopelessly incompetent"??? Are you kidding? You think he has shortcomings because he was doing what every single rational person does when encountering a software problem? When a program that I buy/download doesn't work, I immediately search for a patch. VERY reasonable behaviour.

      Far be it from me to criticize Linus et al., as I could never do what they do, but the shortcomings are not with this guy, but with the buggy kernels. These are release kernels; they are not beta kernels. I think, considering the reputation of Linux, that a release kernel should be stable. Yes, bugs happen, and when they do, you would expect a patch to fix these problems.

      If everyone did as you suggested and rolled back to 2.2.x at the first whiff of trouble, who would be out using these "bleeding edge kernels"??

      I think you should cut the boy some slack.....

    • by Lumpy (12016) on Thursday January 17, 2002 @07:50AM (#2853565) Homepage
      EXACTLY!!!! Everyone that has been bitching about the 2.4 series could never give a real reason why they switched their servers to it. I switched to 2.4 and kept using every version because of my FireWire video editing projects, and FireWire is just now getting stable and usable. A server does not need this. Nor does it need USB, or anything else added in the 2.4 series. All of my Linux servers at work are running 2.2 and will continue to do so until they NEED a 2.4 kernel. It's this insane constant "tinkering" that many Linux zealots do that makes management not even consider Linux as an option. My boss mentioned that another division asked him how we kept the Linux boxes running well; I told him that we installed them, configured them, and then LEFT THEM ALONE except for security patches. And that, kiddies, is the key to any server.
    • This is hilarious!

      At my work, just yesterday we were discussing how frustrating it was that users would downgrade when they had a problem instead of reporting it or checking for a newer version! The argument was that since they kept doing that, we could never determine if a new version needed bug fixes or not, and the bug reports we did get were meaningless because they were always on dated versions. I find this to be a common mentality.

      Now I hear the exact opposite. This guy did exactly the right thing. Don't use beta versions, but if you have a problem, upgrade to the NEWEST, don't downgrade to an old version.
    • He mentions in passing that the reasons for wanting to use 2.4 are the 'big iron' features: better support for large memory, large files, and SMP. Notice it's *better* support, so 2.2 will work but may not be exploiting his hardware to its best ability.

      The reason he doesn't downgrade immediately is that it's a big job. Compared to a downgrade - which presumably involves a backup, rebuild and restore (sounds like several hours downtime), an upgrade to the next kernel is basically a reboot.

      The fact is they took a significant decision when they decided to go 2.4 to begin with. Having made that decision - rightly or wrongly - he then has to make decisions about what to do when he hits bugs. The business may prefer (initially at least) to live with the problems rather than face a prolonged downtime for the downgrade. Or live with them until they can schedule such a downtime.

      There may have been things he could have done better but hopelessly incompetent is a bit harsh.
  • 2.4 is hit and miss. (Score:5, Interesting)

    by aussersterne (212916) on Thursday January 17, 2002 @03:48AM (#2853150) Homepage
    We're running the Red Hat 2.4.9-13 kernel on several SMP database servers and they have been perfect (not rebooted since 'rpm -U' of the new kernel) for several weeks. Before that, we were running 2.4.7-something from Red Hat and they were the same -- ran straight from the day we installed the kernel to the day we updated without needing to be restarted.

    On my desktop machine, I've taken more risks (installed pretty much every official 2.4.x-linus release as they have come out) and some have been good, while others have been total dogs.

    I'm running 2.4.17 right now. It seems okay; I've only had a freeze-up once over the last couple of weeks, though it was a total hard freeze (i.e. no ping, no magic SysRq, no nothing), which I haven't had in Linux for several years.

    The obvious issue is the VM; if you keep lots of memory (768MB, or preferably 1.0GB+) in your system, things go much more smoothly, though MP3 playback still skips a little.

    Right now, I'd prefer some work on the RAID and IDE performance issues. One or two of the 2.4 series have had disk performance 100%+ better than the current 2.4 kernels. Why? I'd like to get the disk I/O back to reasonable levels.
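    One crude way to compare sequential throughput across kernel versions, for anyone wanting to quantify the regression: the usual tool is `hdparm -tT` on the device, but a plain dd read works as a rough stand-in without root. Note this is only a sketch — a freshly written file is served largely from the page cache, so treat the numbers as illustrative, not as a real disk benchmark:

```shell
# Rough sequential-throughput check: write an 8MB file, time reading it back.
# Because the file is freshly written, this mostly exercises the page cache;
# `hdparm -tT /dev/hda` (as root) is the proper tool for raw disk numbers.
dd if=/dev/zero of=/tmp/ddtest bs=1M count=8 2>/dev/null
dd if=/tmp/ddtest of=/dev/null bs=1M 2>&1 | tail -n 1   # prints the throughput line
rm -f /tmp/ddtest
```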
    • We're shipping machines with 2.4.9 preinstalled, and have had few problems; these run the gamut of mid- to high-range stuff, Intel and AMD... we did have one nasty problem involving a Mylex card, but after talking to the gentleman who wrote the driver, I went to 2.4.17-0.13 (Rawhide, with whatever AC did to that - it's not a Linus kernel) and pounded the hell out of them and couldn't make it die.... my intent is to make it our standard shipping kernel within a week or two...

      The original article author went off and yelled about this problem and that problem in the Linus kernels, but totally left Red Hat's stuff out in the cold until the very end.... yes, I admit, right now is not a good time to be following 100% pristine Linus code. But the beauty of Linux now is what everybody feared would get really ugly: We have SEVERAL forks in the code, and at least one of them is working quite well....

      I'd still rather run Alan's beta code than the best Bill can possibly offer.

  • Cluestick (Score:4, Insightful)

    by thimo (36102) on Thursday January 17, 2002 @03:59AM (#2853176) Homepage
    OK, so we're talking:

    1. Mandrake 8.0, *the* desktop distribution _and_ a dot-zero release.
    2. A kernel at or below 2.4.6, with known problems and definitely /not proven/.
    3. A large-scale *production* server.

    Somebody hit this guy with a cluestick! Please?

  • by arsaspe (539022)
    I run Linux 2.4.16-pre1 on both my desktop machine and a server and have never had any probs (except for the odd system slowdown due to ext3 sync()'ing, but WinME was much worse). Ironically, I run Windows XP as a NAT server on my dialup box, because it also has to run some Windows-only software that doesn't like wine. It took me HOURS to get the bloody thing set up and working, and I spent another 3 hours downloading all the patches, plus a virus scanner (AVG... very good) and ZoneAlarm, and then had to wrestle with XP's bullshit "user friendly" configuration while it told me that everything I did wasn't a good idea. After all that, XP's built-in 'firewall' (which is on even though I turned it off) conflicts with ZoneAlarm and constantly locks down all internet traffic, requiring a reboot. It also runs like a sloth with 520MB RAM on a 1.5GHz P4. And to top it all off, XP constantly refuses to connect to my ISP... which is running "incompatible" Windows 2000 servers.
  • Worked for me. (Score:5, Interesting)

    by roystgnr (4015) <roystgnr AT ticam DOT utexas DOT edu> on Thursday January 17, 2002 @04:03AM (#2853182) Homepage
    There was a bad period where the Soundblaster Live driver (particularly mixer settings) was broken. That lasted through at least three kernel releases. There was a worse period where the VM had fits, and where performance degraded way too rapidly if the system had to swap. That lasted at least six kernel releases. There were at least one or two releases where I discovered that Alan Cox's (usually more bleeding edge) tree was being better behaved.

    Of course, whenever I'm playing around with this stuff I don't delete my "last known good" kernel, so if after a couple hours or a couple days I noticed a problem, I just booted back to what worked. The default (albeit heavily patched) Red Hat kernels were good, so "last known good" always existed for me.
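    For what it's worth, keeping that fallback around is just a boot loader entry; a minimal lilo.conf sketch (the labels, kernel versions, and root device here are hypothetical):

```
# /etc/lilo.conf fragment -- keep the "last known good" kernel as the default
default=good
prompt
timeout=50

image=/boot/vmlinuz-2.4.17       # the kernel under test
        label=test
        root=/dev/hda1
        read-only

image=/boot/vmlinuz-2.4.9-13     # last known good
        label=good
        root=/dev/hda1
        read-only
```

    Remember to re-run /sbin/lilo after editing, or the new entries never make it into the boot map.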

    To summarize: this hasn't been a source of inconvenience for me, but it has been one of vicarious embarrassment. I've only been using Linux since 2.0.somehighnumber, but this is the worst mess I've seen the "stable" kernel tree go through in that time. Don't get me wrong, I've experienced system-crashing bugs (a tulip driver that freaked at some tulip chipset clones, some really bad OOM behavior a couple years ago) before, and pragmatically I guess that's worse... but those problems were always fixed fast enough that the patches predated my bug reports. Watching even the top kernel developers seem to flounder for months over bugs in a core part of the OS like the virtual memory system just sucked.
  • 2.4.16 + preempt (Score:3, Interesting)

    by mirko (198274) on Thursday January 17, 2002 @04:04AM (#2853185) Journal
    I have a kernel 2.4.16 + the preempt patch.
    It is the most stable config I've ever had with this kernel generation.
    Let me explain:
    Before, with kernel 2.2.1x, I only had "some" performance issues (mostly disk-access related) and what I thought were APM problems (this is a laptop).
    Since I have been using kernel 2.4, I've had some good times but mostly bad surprises.
    pcmcia (I use the pcmcia-cs package) is not quite plug'n'play (the system even hung once), but symptoms vary from version to version.
    So, the big PRO is that, yes, I boot much more quickly.
    The CON is that ever since 2.4.6/7, I have bitterly regretted upgrading this kernel, since the functionality I gained was cancelled out by the new bugs.
    Note that I don't mention APM because, besides the Windowmaker apm applet, I can't even imagine using suspend/resume on this laptop.
    BTW, when I see the difference with and without the preempt kernel, I wonder why this is not implemented in the official tree (radio button: "server or desktop"?).
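    For anyone who hasn't tried it, a kernel patch like the preempt patch is just a unified diff fed to patch(1). A toy demonstration of the mechanism with made-up files (a real kernel patch is applied the same way, typically with `patch -p1` from the top of the source tree):

```shell
# Toy demo of how patch(1) applies a unified diff -- the same mechanism
# used for kernel patches such as the preempt patch.
mkdir -p /tmp/patchdemo && cd /tmp/patchdemo
printf 'CONFIG_PREEMPT=n\n' > config.old     # "before" version
printf 'CONFIG_PREEMPT=y\n' > config.new     # "after" version
diff -u config.old config.new > preempt.diff || true  # diff exits 1 when files differ
cp config.old config
patch config < preempt.diff                  # apply the diff to our copy
cat config                                   # now reads CONFIG_PREEMPT=y
```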
  • by oingoboingo (179159) on Thursday January 17, 2002 @04:05AM (#2853187)
    Interesting what the author was saying about 2.2 versus 2.4 in terms of stability. We have 3 Linux machines which are used quite heavily here at the moment:

    1) A dual PIII-800/Intel 440GX/512MB ECC RAM based server, with a Mylex AcceleRAID 170 adapter, an Adaptec AIC-7896 SCSI adapter, Intel EtherExpress Pro 10/100, and an external 450GB SCSI RAID-5. This box is used for NFS/Samba file serving and an e-mail server for around 100 users.
    It runs kernel 2.2.17

    2) A dual PIII-800/VIA 133 server/1GB PC-133 RAM server, with an Initio A100U2W SCSI adapter, Intel EtherExpress 10/100 and 70GB of external SCSI RAID 1/0. It runs MySQL, Apache, and a collection of internally developed Perl, C and Java server apps, on kernel 2.4.3

    3) A dual PIII-450/Intel 440BX/512MB PC-100 RAM server, with an Adaptec 2940UW adapter, Intel EtherExpress 10/100 and 170GB of external SCSI RAID-5. It is used as a development system, and runs MySQL, Apache, and assorted Perl, C and Java apps, on kernel 2.4.1.

    Systems 2 and 3 have both been up for 197 days as I type this, and would have been up for over 250 days had we not needed to power them down to move them to a new server room.

    System 1 (with the 2.2.17 kernel) has never stayed up for more than 55 days. It hard-crashes without anything informative being written to the logs, and obviously requires the reset button to be pressed.

    Has anyone got any ideas, given the hardware configs and the software running on these machines, why 2.2 is so horrendous yet 2.4 so stable?
    • by WasterDave (20047) <davep&zedkep,com> on Thursday January 17, 2002 @05:14AM (#2853302)
      I know that there certainly were some... ahhh... issues with the Intel 8255x driver for Linux. There was a bunfight a while back when FreeBSD wasn't compatible with the embedded version of the 82559 (82559ER), and the suggestion was made that someone look at the Linux driver to see what command we were missing. This led to a big stream of mails about how bad Linux's 8255x driver was, see.

      Or something like that.

      Anyway, I'd look at the changelogs for the network driver between 2.2.17 and 2.4.1, you may learn something.

    • NFS and 2.2 (Score:3, Informative)

      by ansible (9585)

      There were some lingering problems with NFS (even v2 using UDP) in the 2.2.x kernel series until 2.2.19.

      I recommend that you upgrade the machine that's running 2.2.17, or else apply the NFS patches. If you're using NFS v3 or TCP, you definitely want to upgrade to the latest version, and get the latest NFS utils.
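      A quick way to check whether a box is still on a pre-fix 2.2 kernel; a sketch assuming plain `2.2.N` version strings (the function name is mine, and suffixed versions like `2.2.19-ac1` would need extra handling):

```shell
# needs_nfs_fix VERSION -- succeeds if VERSION is a 2.2 kernel older than
# 2.2.19, i.e. one that still carries the known NFS problems.
needs_nfs_fix() {
    case "$1" in
        2.2.*) [ "${1#2.2.}" -lt 19 ] ;;   # compare the patch level numerically
        *)     return 1 ;;                 # not a 2.2 kernel at all
    esac
}

needs_nfs_fix "$(uname -r)" && echo "this box needs the NFS fixes"
needs_nfs_fix 2.2.17 && echo "2.2.17 needs the NFS fixes"
```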

  • Kernel Panic (Score:4, Insightful)

    by ChaoticCoyote (195677) on Thursday January 17, 2002 @04:07AM (#2853189) Homepage

    Is Linux 2.4 unstable? It depends on your perspective and luck. I'm running 2.4.8 and 2.2.19 (Debian potato) on my systems successfully; 2.4.9 thru .12 have been glitchy for me, especially when it comes to running big jobs that stress the VM. Haven't tried anything above .12 yet; I'm waiting for .18. My old cluster runs 2.2 simply because I have no reason to change.

    Your mileage, of course, may vary.

    I do think that 2.4 has been managed poorly. People complain that Microsoft beta-tests software on their customers -- yet that is precisely what the kernel team does to Linux users when they release a "stable" kernel with an entirely new VM. A couple of months' (weeks'?) testing on developer workstations is not sufficient for an "enterprise"-class operating system. Anyone who understands the least bit about complex systems knows that you don't replace critical architecture (the VM) without jeopardizing stability.

    It's all water under the bridge now; I hope Linus and company have learned from the 2.4 battles. If 2.6 has the same kinds of problems and controversies... well, I prefer not to think about it. For my part, I plan to beat 2.5 beta kernels to death, to help the testing along. Testing is as important as kernel hacking -- even if it isn't as sexy.

  • well, my personal experience with the 2.4.x kernel is a good one, i haven't had any problems since my upgrade. i suppose that you can get a stable kernel if you just spend enough time fiddling with the compiler and its options.

    as an electronics student, i wouldn't dare criticize the kernel programmers: if you ever tried to write a kernel from scratch, you'd know what a damn job that is...

    for all of you interested, there's a great book over at o'reilly's, understanding the linux kernel. it covers the changes from 2.2 to 2.4 and explains in great detail the structures behind all the features you enjoy in your everyday linux life ;-)

    cu, mephinet

  • by JeffL (5070) on Thursday January 17, 2002 @04:11AM (#2853200) Homepage
    I'll start by saying that I find 2.4 to be very stable, and to perform mostly ok on 1 and 2 way machines. My laptop, desktop, and 2-way server stay up until I decide to reboot them. Actually, I brought my 2-way server down for a disk upgrade today for the first time since early November when I installed 2.4.14.

    Having said that, there are some serious issues with 2.4 on some 8-way 8GB machines that I manage. They have been running 2.4.13-ac7 since November, because that is the last kernel that is usable for me (-ac11 would probably be ok). Newer kernels have terrible behavior under the intense IO load these machines go through. They get 14-30 days of uptime, and then hang or get resource starved or something and have to be rebooted.

    I think part of the issue is that there simply aren't that many people running 8-way boxes, so bugs aren't found as easily; this is on top of 8-way SMP being much more complex than a de facto single-user, single-processor desktop machine. To make it even worse, the machines are pushed hard. They move around GBs of data every day, and often run for extended periods with loads over 25.

    Of course, it is still mostly OK; while the machines are up, they mostly work fine. But 20 days of uptime is totally unacceptable. I have an alpha running Tru64 pushing 300 days of uptime, and the last time it was down was due to a drive failure, not an OS problem.

    My only remaining issue with Linux on "small" machines is an oscillation problem in IO. Data will fill up all available memory before being written to disk, and then everything from memory will be written out, and then memory fills up again before anything new is written to disk. This is a bit inefficient, and the machine's responsiveness at the memory-full part of the cycle is poor.

    What are my options though? I guess I could try FreeBSD, but a bit of lurking on their lists and forums reveals plenty of problems there, too. Do I switch and hope things get better, or wait out 2.4 and hope it comes around soon? Aside from a few nasty bugs in some releases, pretty much each successive 2.4 kernel has been better than the previous one, at least on small systems.

    Several years ago I was having a hard lockup problem with Tru64 (Digital Unix, at the time), and that was very scary. It took time to get the problem escalated to the OS engineers, instead of just sending an e-mail to lkml. Even then I could only hope that the issue was being addressed; I had no way to know whether anybody was doing anything about it or not. (It turned out to be a bug in the NFS server that would cause the machine to lock up when serving to AIX.) For all of its problems, though, it is extremely reassuring for me to be able to monitor the development process of Linux through the linux-kernel mailing list and other specialized lists. If I feel that people aren't aware of some problem I am experiencing, I can raise the issue. I am not in the dark about what is happening and what fixes are being made. I know what changes have gone into each kernel update, so I know if there is a chance of it fixing my problems.

    • by _johnnyc (111627) <johnnyc@northern ... g minus caffeine> on Thursday January 17, 2002 @11:59AM (#2854626)
      Same here.

      At about the time the 2.4 kernel was first released, we were building a server for serving out large media files for encoding. We were on a limited budget, so we put together a PC with about 256MB RAM running on a K6-2/500, set up with a combination of RAID 1 and RAID 5 on 2x40GB and 2x80GB IDE drives. While running the stock RH 6.2 kernel we had no problems. But we needed the 2.4 kernel for large files, so we waited until we couldn't wait any longer.

      This turned out to be problematic to say the least. While we had 7 servers running RH 6.2 and never had a crash, the machine serving up the media files would lock up whenever copying large files, or whenever many files were being copied. Kept me working through a few weekends trying the latest kernel and then stress testing the server with large file copies. We wound up reverting back to a 2.2 kernel because the crashes were too frequent.

      I haven't tried the RH kernels for 2.4 on anything other than desktop systems. I can say that, on RH 7.1 at least, the 2.4 kernel in use is rock solid and has never crashed for me at home or on desktop systems at work. I never got the chance to try those kernels on servers, but I suspect Red Hat kernels would probably be more stable; they've got the resources to stress-test and modify kernels for specific needs.

      I liked the article. He's not a kernel hacker, and he writes from his experience of the 2.4 kernel with clients. The only problem I see is: WTH was he thinking, using Mandrake 8.0 for a server? I've found that version of Mandrake, more than any other I've used, to be very unstable on 2.4.
  • Desktop Myth (Score:5, Insightful)

    by ImaLamer (260199) <> on Thursday January 17, 2002 @04:14AM (#2853209) Homepage Journal
    He seems to think 2.4 is fine for desktop systems but is only now, after a year of release, approaching stability for high-end use.

    I don't get it. I use Linux on the desktop. I have to admit that I don't run linux on my main machine. This is only because I've taken my second hard drive out, and put it back into an older machine. [sorry, wine doesn't like Red Alert 2]

    Before I did this though, I ran 2.4 kernels on my desktop. None of the problems I may have had were with the kernel. Problems I had were mainly with certain applications and when I pushed them to their limits. Pan, for instance, crashed a lot on me, but that was because I was downloading gigs per day. A simple Pan upgrade fixed that.

    In my humble opinion, 2.4 is prime for the desktop. Linux is more than ready for the desktop. I know he says it's ready for the desktop, but not ready for high-end systems. To me, 'high-end' is what you ask of a computer. I've got a 333MHz machine running Red Hat 7.2. The computer is running webmin, proftpd, apache, and many mail daemons. I must also mention that SETI runs 24/7, and it only has 64MB of RAM. It never goes down, it never 'crashes', and it is up as long as there is power running to it.

    So... it's ready for the desktop? Sure, 2.4.x is prime. All the drivers I've needed supported are there. Even my >$50 webcam.

    The question of 'desktop' use isn't with the kernel though. Desktop users don't patch or compile the kernel... how many times do they do it in *indows or MacOS X? They install complete distributions. IMHO, again, the only thing that keeps Linux off the desktop is easy program install. RPM has killed itself with dependencies, and apt-get is console based. Apt-get is waaay better, and it has worked wonders on my Red Hat machine [apt-rpm]. The problem is not being able to download an app and install it like *indows.

    Solve this and I will sit outside my local computer store and hand out CDs. I don't know about high end systems, but dammit!, desktop users are ready... format that *indows crap and get a real OS!

    Gimme a good apt-get gui... or have the system run apt-get in the background solving dependencies when needed... my g'ma will have it.

    BTW, I just saw a guy on TV and his name is... get this: Joe Householder
  • by edibleplastic (98111) on Thursday January 17, 2002 @04:23AM (#2853235)
    <troll>And in other news, the Associated Press is reporting that Linus Torvalds has sent out a memo to the core Linux development team telling them to make stability their "highest priority". In his memo he called this strategy "Trustworthy Computing", saying that it should not be the case that people have to use previous versions of the OS in order to find a stable working environment.</troll>
  • by Oestergaard (3005) on Thursday January 17, 2002 @04:28AM (#2853246) Homepage
    jakob@unthought ~> uptime
    9:21am up 181 days, 13:25, 3 users, load average: 3.57, 3.33, 2.79

    jakob@unthought ~> uname -a
    Linux 2.4.0-test4 #1 SMP Fri Jul 14 01:56:30 CEST 2000 i686 unknown

    I suppose that ain't too bad. Other than that, with real 2.4 kernels, on UP and SMP systems, I've been fairly satisfied.

    There was a RAID bug (RAID-1) in 2.4.9 or thereabouts, which the article overlooked. I think that, except for the fs/raid corruption problems (which are horrible when they happen), the 2.4 kernel has been a nice experience.

    Think back for a moment: How would you like *not* to have iptables, reiser, proper software RAID, etc. etc. etc.

    I think I would miss 2.4 if I went back, although the fs/raid corruption bugs made me "almost" do that.
  • I notice that a few people mention they don't have problems with 2.4. I find that holds only under certain conditions.

    For home use, I really don't find a lot of problems with 2.4, except minor driver problems. But at work, things are very different. I run a few high-load critical servers at work that are still on 2.2; our lab attempt to upgrade to 2.4 (at an early stage) failed because of lockups and performance issues (yes, some due to the VM).

    It wasn't until recently, when I tried again with 2.4.16, that I started getting reasonable results with the 2.4 series. For your information, performance is about the same on 2.4 with my application. I cannot confirm high-load stability yet, as I need more time to test, but initial results tell me 2.4.17 is reasonably stable: only one lockup so far (in two weeks).
  • by Anonymous Coward
    The problem is that the "release"-level kernels usually aren't really ready for release. Most hard-core Linux people tend to know this, but those coming in from elsewhere expect that a "release" product is, well, ready for release... maybe with some hesitation about a .1, but by .2 or .3 the thing should be good.

    Maybe they should hold on to "beta" status for a little longer, or have "unstable", "testing", and "stable" branches like Debian, so that when someone wants the latest stable kernel, they don't end up with something the kernel guys merely thought was stable... until they release the next "stable" version a day later...
  • I've noticed that most of the people complaining about the 2.4.x kernels and various stability problems, both in the article and in other comments, are running Red Hat, Mandrake, or even Debian distros. Hell, my best friend even likes Red Hat best. But that's something that we and many Linux people will argue/discuss until they're blue in the face. I have used just about all the distributions from time to time. However, I have preferred Slackware since I started with Linux in late 1994. (Boy, have things changed since then.) What's my point? Many if not most of the problems people are having can be traced to a few causes:
    1. running a distribution with bad compatibility between libs, tools, compilers, and the kernel (e.g. Red Hat 7.0)
    2. upgrading the kernel without regard to upgrading libs, tools, compilers, etc.
    3. upgrading for upgrading's sake - no real reason.
    4. not knowing much about what's really going on in their machine.
    I have set up enterprise-level production servers on Linux for many years and haven't ever <knock on wood - my head> had stability problems. I know of 4 servers running at my most recent employer that have been running 2.4.x kernels since they were put into service over 2 years ago, and they have never crashed or even hiccuped. They are running Oracle, MySQL, Apache with multiple modules, Perl apps, Samba and NFS filesystems, and other stuff I can't think of right now. Why haven't they had problems?
    1. a good, STABLE distribution
    2. conservative upgrades (2.4.1 to 2.4.5 to 2.4.12)
    3. running in test before being placed in production
    Everyone seems to think that with a kernel from the "stable" tree you shouldn't have any problems whatsoever. Keep in mind that the kernel alone doesn't make the OS; the user-space tools are also part of it. Therefore, running a kernel from the 2.4.x tree alongside bleeding-edge alpha versions of user-space software and server daemons != a stable system, necessarily. How many people are running a version of Apache in the 1.3.x tree? Well, if you are, that's a development tree and not necessarily stable. Yes, there are stable versions, but you must test! Also remember to separate device drivers from the kernel: just because many are distributed with the kernel doesn't make them part of it.

    The other problem I've noticed, which started with the 2.4.x tree, is the 'ac' or Alan Cox branch. Don't get me wrong, I think Alan has contributed many good things to the kernel; however, I do think Alan has come to feel a little too self-important to Linux's success. Also keep in mind that he works for Red Hat now, and everything I've seen of Red Hat's politics suggests they want to "control" Linux. So, I say stay away from any '-ac' kernel. But that's just my opinion, and I could be wrong.

    As far as the 'new' VM being put into the 2.4.x kernel - I don't completely agree or disagree with it being done when it was, but there were various reasons to do it then. It was holding up some important things, and the kernel wasn't ready for a 2.5.x tree yet. It was a hard decision on Linus's part, but not one I'm going to second-guess.

    Don't get me wrong, you can get a stable machine with just about any distribution; however, I have found, from experience, that Slackware has a track record of being the most stable 'out-of-the-box'.

    Also keep in mind that with Winblows you'd be rebooting every 14 to 30 days, even with 2000 and XP.

    On the other hand, I still have one box at home running 1.3.72 with an uptime of over 3 years. It's running as my router and is my experiment in just how long a Linux box will run without crashing.
    • by schwap (191462) <beauh@schw o o g> on Thursday January 17, 2002 @06:14AM (#2853405) Homepage
      How many people are running a version of Apache in the 1.3.x tree? Well if you are that's a development tree and not necessarily stable. Yes there are stable versions, but you must test!

      Um... 1.3.x is, indeed, the stable version. From the website:

      The Apache Group is pleased to announce the release of the 1.3.22 version of the Apache HTTP server. Apache 1.3.22 is the best version of Apache currently available.

      2.0.x is the unstable tree at the moment.

  • by johnburton (21870) <> on Thursday January 17, 2002 @05:20AM (#2853310) Homepage
    Some people have already mentioned this in passing, but I think it's an important point to make.

    I've found that the kernel is pretty stable for me. I use my system mostly for code development and as a server for files and web pages.

    I find that the kernel itself is pretty stable, although, as the article says, it does seem less stable than the 2.2 series did. But even so, it's not bad for the use I've made of it.

    The old-style applications are also very good. The command-line tools and the development tools (gcc etc.) are all totally solid, and they are why Linux gained its early reputation for reliability.

    *BUT* I find much of the _new_ software, both GNOME and KDE and other GUI software, to be terribly unreliable. Say what you like about Microsoft Outlook, but it rarely just crashes. On the other hand, every "modern" mail program I've used on Linux tends to end with a crash eventually. And it's not just mail programs. I find that many of the programs I use tend to crash quite a lot. Not all the time, but just once is too much.

    It's rather sad, in my opinion, that such a solid base of reliable code is being let down by the instability of some of the more modern software. Frankly, it doesn't matter how stable the kernel is if the programs that run on it crash.

    This isn't intended to be a complaint, and I realise that before applications can be considered reliable the kernel needs to be, but it does concern me that the overall reliability of Linux systems does seem to be going downwards.
    • So true. I often have to reboot my home machine, because X crashes and locks up the console. The kernel itself is ok, I just can't type anything. On the occasions when I have my laptop available, I can just telnet in and reboot the system cleanly. Unfortunately, most of the time I have to hard-reset it and experience the joy of fsck.
  • by ameoba (173803)
    What I find truly ironic about the 2.4 series is that, before it was released, there was a big delay because they wanted it to be perfect. It was held up again and again, on the theory that the fate of 2.4 was the fate of Linux: it was going to be the kernel that everyone was watching.

    What do we get? Stability problems, kernels with DO NOT USE warnings, massive changes to the core of the OS, the list goes on. All on what was supposed to be the flawless kernel that proved the worth of Linux to the masses.
  • I have used 2.4.* since the very beginning on my firewall (to get iptables). arch/i386/kernel/bluesmoke.c has been broken for my IBM PC Server (dual P90) since 2.4.5 or so, but just replacing it with the old version works fine.
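    (For the curious, the stateful matching below is the sort of thing 2.4's iptables bought you over 2.2's ipchains. A minimal NAT-firewall sketch in iptables-restore(8) format; the interface names are hypothetical, with eth0 external and eth1 on the LAN:)

```
# Hypothetical minimal NAT firewall in iptables-restore(8) format.
# Load with: iptables-restore < rules
*nat
:PREROUTING ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
-A POSTROUTING -o eth0 -j MASQUERADE
COMMIT
*filter
:INPUT DROP [0:0]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A FORWARD -i eth1 -o eth0 -j ACCEPT
-A FORWARD -m state --state ESTABLISHED,RELATED -j ACCEPT
COMMIT
```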

    I am running the entire system in RAM (loading a 96MB, almost empty ramdisk from floppy, populating the filesystem via rcS from /dev/hda, and then shutting down the hard drive). I had problems with this in 2.2 and early 2.4 (up to perhaps 2.4.10) because, for some reason, double the amount of RAM was required: allocating a 20MB ramdisk cost me 20MB of RAM, but when I filled the filesystem another 20MB was used. sync did not help. Unmounting and remounting helped... but this was my root filesystem, so that was no good solution.

    If you run old hardware, put little load on it, and patch the kernel before you compile it, you will be quite satisfied with 2.4.* ;)

    Oh, btw. I run 2.4.* on a laptop (with more than enough RAM). 2.4.* has been stable and satisfactory since the first release, for me.
  • by markj02 (544487) on Thursday January 17, 2002 @06:36AM (#2853450)
    Let me first say that, despite the problems I mention below, I really appreciate the work that has been going into the Linux kernel. Once you get it configured and compiled, it's a reliable and powerful system. But if the kernel is too hard to configure and compile, that severely limits how widely Linux can get adopted.

    Now, what problems am I talking about? The latest 2.4 kernels still have compilation problems in some drivers (2.4.17 has problems in USB, 2.4.18pre4 has problems in one of the sound drivers). Important and mature packages like MOSIX require patching the kernel and aren't integrated into the kernel. Many hardware setups require recompiling the kernel and experimenting endlessly. Every time you recompile the kernel, you need to recompile some kernel modules. Dependencies and recompilation aren't working correctly--some things don't recompile when they should, and lots of things recompile over and over and over again. The kernel itself is a 30Mbyte download. And the list of problems goes on and on.

    People seem to have gotten used to it and think there is nothing wrong. The kernel hackers keep telling us that C and make are just great tools for building kernels. But as a user and sometime driver hacker, I think the kernel is falling apart under its own weight. This is not a system I can recommend to non-technical users--commercial distributions can't cover all the possible kernel configurations (even with fully modularized kernels), and recompilation is out of the question for many users. What is needed?

    • It must be possible to write drivers and other kernel modules that can be compiled separately from the kernel and work across many versions. Binary modules really should keep working across minor version number changes (2.2 to 2.4, for example).
    • As a consequence, it should be possible to package bits and pieces of the kernel separately. If I want the ACPI module, I should be able to install it with "apt-get install kernel-2.x-module-acpi". I should be able to download RPM packages and install them on (at least) any 2.4 kernel without recompiling anything.
    • It must be possible to write kernel modules with more safety in mind. There should also be some way to apply some memory protection to kernel modules when desired.
    • The build system needs to get fixed. There is no reason why adding or removing a module should result in a recompilation of the whole kernel. Maybe it's time to get rid of "make" altogether for the kernel.
    • The configuration system needs to get fixed. The kinds of questions it asks right now just cannot be answered by a normal user. In fact, there really shouldn't be much of any configuration: all the different options should be dynamically loadable. Yes, this even means MMX-optimized versions of some piece of code or other. And most of the drivers and file systems should be distributed in completely separate source packages, independent of the kernel. (The new configuration system treats the symptom but not the root cause.)

    I think, ultimately, if the kernel wants to survive and be able to keep up with the world, it needs some kind of more flexible dynamic binding of functions at runtime. It also must allow people to start writing kernel components in languages other than C, foremost C++. No, C++ isn't the epitome of good language design, and, yes, people can write even more horrible code in C++ than in C, but C++ can really help with safety, security, resource management, and modularity.

    If those things don't happen, I think the Linux kernel will simply fall so far behind that it will get replaced by something else. And that would be a shame because the Linux kernel actually does have a lot of useful functionality, and once compiled and configured, works very well.

    • It sounds a lot like you've described a microkernel. Linux is decidedly not microkernel-esque, though at times it might be hard to tell.

      Being monolithic is pretty much what makes the Linux kernel the Linux kernel.

      If you are looking for Microkernel-esque UNIX, look at HURD.
      • by leandrod (17766)
        Yeah, you nailed it down.

        Now remember that the whole point of microkernels is to enable flexibility -- not only for development's sake but also to be able to adapt to different loads and usage characteristics.

        So the HURD seems to be the answer. That, or Linus (or someone else, or a group of kernel hackers) manages, over a reasonable amount of time, to get better at (1) understanding modifications to Linux and their consequences and (2) releasing "stable" kernels only once they're actually done.

        Given the complexity of the task, I doubt it's doable. The free flavours of BSD never scaled as far as GNU/Linux; the proprietary Unices have basically chosen to scale up and up, forgetting small-systems situations and accepting bloat in order to cater for stability, resilience and other big-iron stuff. Even single-server microkernels like Windows NT and Apple's Darwin haven't much to offer, being bloated, slow and not flexible at all.

        We still have a free software and open systems advantage, because POSIX OSs can cover the whole gamut of computing systems simply by having different kernels with the same APIs; standards bodies and GNU libc are the real heroes here, not Linux or BSD. Proprietary software usually will be even more fragmented, with all those slightly incompatible, underperforming, unstable versions of Microsoft Windows, Mac OS Classic and so on. But still it would be nice to have a free, copylefted common kernel. Unless the Linux situation improves dramatically soon, the only answer on the horizon is the Hurd -- and it still needs to be finished.
    • I think your message is highly misinformed and borders on trolling. Maybe you're just new.

      Many hardware setups require recompiling the kernel and experimenting endlessly.
      This is true. On machines with really exotic hardware, I have had to recompile a great many kernel configurations. Usually, however, I can just rmmod & insmod to test the new configurations without rebooting, so the experimenting phase is not overlong.
      Every time you recompile the kernel, you need to recompile some kernel modules.
      You are in no way forced to compile anything as a module -- the kernel will live quite happily as a solitary ELF executable. So don't tell me 'every time'.
      Dependencies and recompilation aren't working correctly--some things don't recompile when they should, and lots of things recompile over and over and over again.
      That's possible anywhere, and I have seen little evidence for your recompilation loop. It has been some time since I last saw an incorrect dependency in the kernel build. And on an average uniprocessor machine, my full builds complete in under two minutes. So I'm not crying about the time.
      The kernel itself is a 30Mbyte download.
      Cry me a river. Get DSL. Or learn to use the patch command -- that's why all those patch files are on the kernel mirrors. I've been pulling kernel sources off a 33k modem link for the last 6 months, and I'm not hurting for the speed.
      And the list of problems goes on and on.
      All of which are apparently handwaving. Let's watch.
      The kernel hackers keep telling us that C and make are just great tools for building kernels.
      I agree with you that make sucks. Unfortunately, it still sucks less than almost everything else in the field. Please suggest an alternative. I also agree that C sucks. OTOH, C++ sucks even harder, and for its extra demands of space and time and its ability to obfuscate, C++ doesn't deliver any of the benefits that a real language (like LISP) does. C++ has been out for 20 years, and it still hasn't superseded C in close-to-the-metal progging. Figure it out.
      This is not a system I can recommend to non-technical users--commercial distributions can't cover all the possible kernel configurations (even with fully modularized kernels), and recompilation is out of the question for many users.
      I have to agree with you on that, but recent kernels are pretty complete -- most users won't need to recompile.
      It must be possible to write drivers and other kernel modules that can be compiled separately from the kernel and work across many versions. Binary modules really should keep working across minor version number changes (2.2 to 2.4, for example).
      You can do that. Say yes to 'attach version information to modules' in the kernel config.
      It must be possible to write kernel modules with more safety in mind. There should also be some way to apply some memory protection to kernel modules when desired.
      I agree with you, but that's pretty far off. The MIT exokernel is I think the shining example of what you are looking for. In the meantime, most people get the same effect by running your theoretic modules outside of the kernel, in daemons or shared libs or something. The user/kernel protections are usually enough.
      The build system needs to get fixed. There is no reason why adding or removing a module should result in a recompilation of the whole kernel. Maybe it's time to get rid of "make" altogether for the kernel.
      There *is* no reason to recompile the whole kernel to add a module. What are you smoking? "make modules", "cd blah", "cp blah.o /lib/modules/x.y.z/", "depmod". Or just "make modules; make modules_install". As for getting rid of make, what would you use to replace it?

      I saved this one for last:

      Important and mature packages like MOSIX require patching the kernel and aren't integrated into the kernel.
      You see, that's what we call not in the Linus kernel. Your impressions of the importance and maturity of the patch are really something you should take up with Linus himself. I, for one, hope Ingo's TUX subsystem makes it into the Linus tree sometime soon. But you have no basis to say that just because a kernel patch is out, and Linus hasn't integrated it into his stable tree, the Linux process is flawed. Get a clue! Independent patches come out much faster than anyone can pull them into the core; they usually conflict with, and compete against, other patches solving the same problem. So it takes a while. If you want it in the Linus tree sooner, help out. Welcome to open source.
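      For reference, the two workflows this reply alludes to look roughly like the following; the paths and version numbers are examples only, not anyone's actual setup.

```shell
# Add or rebuild modules without recompiling the whole kernel:
cd /usr/src/linux
make modules && make modules_install   # installs under /lib/modules/<kernel version>/
depmod -a                              # rebuild the module dependency map

# Track new releases with incremental patches instead of full ~30 MB tarballs:
cd /usr/src/linux                      # a pristine 2.4.16 tree, say
bzcat /tmp/patch-2.4.17.bz2 | patch -p1
```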
  • I think kernel 2.4 has what I always dreamt of on my Linux firewall: stateful firewalling and NAT. It is great for building inexpensive firewalls that can be as good as those costing thousands.

    Also, the VM system is much improved, when compared to the 2.2.
    The only thing I think was a little too risky was replacing the entire VM (originally built by Rik van Riel) with a new one by Andrea Arcangeli. I believe such drastic changes should be reserved for development kernels. But the important thing is that it's now working wonderfully, and is much improved.

    I don't think 2.4 should be called the Kernel of Pain. We're at what, 2.4.17? Remember 2.2.17 or 2.0.17? Heck, 2.0 had DoS bugs until release 2.0.35.

    I am running 2.4 on some production boxes. They're behaving fine and very stable, thank you, and I think 2.4 is ready for production.
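    A minimal sketch of the stateful filtering and NAT this comment praises; the interface name and open port are placeholders, and a real firewall needs considerably more than this.

```shell
# Drop unsolicited inbound traffic, but let replies to our own
# connections back in via the 2.4 connection-tracking state match.
iptables -P INPUT DROP
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -p tcp --dport 22 -m state --state NEW -j ACCEPT

# Masquerading NAT for a private LAN going out via eth0:
echo 1 > /proc/sys/net/ipv4/ip_forward
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
```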
  • by Spunk (83964) on Thursday January 17, 2002 @06:50AM (#2853472) Homepage
    Linux isn't ready for the -


    Wait a minute...
  • The RAID-0 in 2.4 has so far been between -18% and 0% faster than the raw hdd speeds for those same md partitions on my system. The 2.2 kernels were giving me about a 30% speed increase.

    PS: notice that's negative 18%. I say negative 18% "faster" since RAID-0 is supposed to give a speed increase above all else.
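    One crude way to reproduce this kind of comparison is hdparm's sequential-read timing on the component disks versus the md device; the device names here are examples.

```shell
# Buffered sequential reads, component disks vs. the striped md device:
hdparm -t /dev/hda
hdparm -t /dev/hdc
hdparm -t /dev/md0   # for RAID-0 this ought to beat either disk alone
```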
  • I guess the following makes me unusual for a Linux user--I value stability quite highly, innovation is just a nice-to-have.

    I'm happily running Debian stable (potato) with a 2.2.18pre21 kernel. I'm only vaguely aware that some 2.4 kernel features could be nice to have. The main attraction is iptables, which of course wouldn't be any use on my desktop and laptop machines.

    I'm about to play around with 2.4.17 on my current firewall server using Adrian Bunk's 2.4 kernel package, for that very reason. Would it be wiser to hold off? ipchains does an adequate job under 2.2, after all, and I'm perfectly happy with 2.2 on the desktop machines until Woody (the Debian candidate in testing) is moved to stable, by which time all the major gotchas will have been ironed out.

  • Dude.. (Score:3, Funny)

    by OpCode42 (253084) on Thursday January 17, 2002 @08:05AM (#2853587) Homepage
    Dude, where's my stability?
    Where's your stability dude?
    Dude! Where's my stability?
    Where's your stability dude?
    Oh! There it is! (points to linux-2.2.20.tar.gz)

  • by rsd (194962) on Thursday January 17, 2002 @08:11AM (#2853594) Homepage
    IMHO, the real problem is the stock kernel.

    It is common to see the stock kernel missing lots of patches that would increase stability, and, as Rik van Riel pointed out in a piece posted here on Slashdot, Linus rejects patches seemingly at random, which causes some areas of the kernel to not be "as good as they should be".

    The VM is one part where Linus took some patches from Riel and rejected others seemingly at random, which made the VM suck so hard in the earlier stock 2.4 kernels.

    OTOH, kernels shipped by distributions include (or at least should include) the missing parts, and should be better than the stock kernel.

    I don't use Mandrake, so I can't say how good their kernel is or is not. But I use Conectiva Linux and I know how good their kernel package is. Their kernel includes fixes that haven't made it into the stock kernel. Best of all, their kernel maintainer is Marcelo Tosatti, who now maintains the stable kernel tree.

    I think we will see an improvement in new 2.4 releases.

    The latest 2.4.17 kernels from Conectiva can be found here.
  • by Tack (4642) on Thursday January 17, 2002 @09:17AM (#2853745) Homepage
    I had been running 2.4 on our router for many months. That's not to say those months were consecutive running. :) I had so many problems. I reached the point where I had to reboot the box at least once a week (usually twice) or else it would suddenly become unresponsive. If I had an uptime over 10 days I was doing REALLY well. I tried about 10 different 2.4 kernels (up to 2.4.13), as well as Red Hat's 2.4.7 kernel. (I was forced to use 2.4 because of features I required.) At any rate, after about 6-8 months of this, I was resigned to either putting FreeBSD on the router or recommending we buy a hardware solution next fiscal year (i.e., a Cisco router).

    Well, I put 2.4.14 on the box and I haven't rebooted since. I have 61 days of uptime and that's the most I've seen on that box ever. It is finally stable. The only thing I can conclude is that it's AA's VM that is doing the trick. And in hindsight, it makes sense. The behaviour of the box was that it was thrashing, but at the time it didn't seem that way because I hadn't noticed the HDD light was disconnected from the box and I couldn't hear the disk in the noisy server room.

    So, Linux 2.4 is (knock on wood) stable for my servers, now.

  • STABLE vs STABLE (Score:5, Interesting)

    by ajs (35943) <ajs&ajs,com> on Thursday January 17, 2002 @10:27AM (#2854056) Homepage Journal
    I'm beginning to feel like a broken record, and maybe Linus should just change the terminology so that people stop making the same assumption over and over, but people: wake up and smell the bogomips!

    "Stable", in the context of a kernel release, refers to the interfaces. When Linus releases 2.<even>.0, he is saying that this kernel has reached some arbitrary plateau of development stability, and is now ready for others to begin actual release engineering on.

    You have to understand that the Linux kernel is released by Linus in a state that is very reasonable for a development team, but that will never be "production quality". Debian puts a lot of release engineering work into a kernel. As does Red Hat. As does SuSE, etc., etc.

    If you just grab 2.4.x and install it, you're acting as Linux Q/A, and I applaud your effort, but when it breaks in your environment, you should not be stunned.

    Once again, production release != stable release. A stable release is just one the developers are happy with (and I've yet to see a 2.4 kernel that I can say developers should not be happy with).

    So, maybe next time, 2.6.0 should be called the "post-development" release so that people don't go off half-cocked installing it on production systems.
  • by aphor (99965) on Thursday January 17, 2002 @10:53AM (#2854178) Journal
    If the code is unreadable for most advanced users (people with enough C background to follow execution flow in readable code), then you really can't get the benefit of having a large community debugging the software.

    What you have here is a LARGE body of (l)users and a small cadre of kernel hackers who are separated and out of communication. The three times I've found a problem in FreeBSD-STABLE (to be honest, I think one was in 2.2.6 and two were in the 3.x branches), I sent my bug report in with instructions on how to repeat the problem, plus a patch for the ugly hack I made to make the problem go away. I never beat the committers to the bug. Now, before I do all that work, I watch the WebCVS commits for a couple of days and email the committer who last touched the affected files with a quick question: "hey, I see this problem... have you?"

    I tried to read some Linux (the kernel) code a long while ago. There were some funny comments, error messages, and variable identifiers, but otherwise it gave me a headache. I just felt being a Linux participant was beyond my tolerance. I browsed the FreeBSD source code though, and even in the midst of reorganization, both organising schemes were apparent, well documented, and there is a clean style to all FreeBSD code I've seen that makes for (relatively) easy reading.

    I guess that the difference is culture, but in this case it seems to be a serious problem. I have to wonder why the Linux kernel people haven't broken the Linux VM code out for modularity and borrowed the FreeBSD VM as an option? I mean: FreeBSD is free-as-in-beer and also free-as-in-software. I'm not volunteering, but after all these years wouldn't it make sense to hack out some Linux wrappers for the FreeBSD VM system? I think I remember reading a Matthew Dillon interview where he talks about all the good stuff he's done to FreeBSD VM recently...
  • Why 2.4 was released (Score:3, Interesting)

    by EvlG (24576) on Thursday January 17, 2002 @11:05AM (#2854248)
    If my memory serves me correctly, one reason to release 2.4 a year ago was to get more people to use the damn thing.

    There is a relatively small number of people that use the odd-numbered, experimental kernels. At the end of 2000, it was becoming clear that having the same people running the development kernels on their hardware wasn't fixing many more bugs - I remember a post from Linus to LKML to that effect.

    2.4 was stable enough for mass consumption, and so it was released. However, it is important to remember that this is free software, and frequent incremental updates are the rule. Free software can't work if it is not constantly evaluated by users, bugs reported and fixed, and new versions shipped out.

    Software is an evolutionary process; it is important to remember that free software (especially the Linux kernel) fully embraces this notion.
  • He Almost Had Me (Score:5, Interesting)

    by bwt (68845) on Thursday January 17, 2002 @11:32AM (#2854424) Homepage
    I almost took this guy seriously until this part:

    The kernel seemed to show more stability. Then we hit kernel 2.4.15.

    Linux version 2.4.15 contained a bug that was arguably worse than the VM bug. Essentially, if you unmounted a file system via reboot -- or any other common method -- you would get filesystem corruption. A fix, called kernel 2.4.16, was released 24 hours later.

    Look, anybody who is deploying a kernel on the day it is released on a production server deserves what they get. One day turnaround on a bug fix is phenomenal. Even if these are marked as "stable" kernels, trying to track the new versions in real time is a dumb thing to do.

    This guy has written a moan-and-groan article based on a small set of bugs, some of which he could only have experienced by experimenting on his production system. He obviously requires extreme stability and says he needs this more than the new 2.4 features (SMP, 2G memory, 2G files), which makes me ask: why was he putting new kernels on his production system before empirical evidence of high stability was there?

    Open source will fix bugs faster than proprietary software will, but it can't change reality and make bugs impossible. This is true even in "stable" releases, especially if you are talking about highly stressful production environments.
  • hmm.. (Score:3, Insightful)

    by talks_to_birds (2488) on Thursday January 17, 2002 @02:54PM (#2856278) Homepage Journal
    Methinks there's one faint, common thread here:

    • "...upgrading a customer's server from Red Hat 6.2 to Mandrake 8.0..."

      "...when I upgraded from 2.2 to 2.4 (Mandrake 7.2 to 8.1), I had (still have) many stability problems..."

      "...I don't know what compelled Joshua to choose Mandrake, whose "bleeding-edgeness" usually keeps them a bit unstable and unpolished..."

      "...He's using MANDRAKE on a SERVER. For crying out loud, you don't use Mandrake on a server. Get something realistic like Slackware or Debian, and if you want to be a idiot use redhat, not Mandrake..."

      "...First of all, who would used Mandrake for a server. We are talking about an installation that is meant for a laptop environment..."

      "...i was running a 2.4 on mandrake 8.0 and had nothing but problems...."

      "...I've noticed that most of the comments both in the article and others complaining about the 2.4.x kernels and various stability problems are running RedHat, Mandrake, and even Debian Distros..."


    Perhaps we have a Mandrake issue, here, and not really a 2.4.x kernel issue, and certainly not, as the few M$ trolls have tried to suggest, a Linux issue...


    Food for thought.


  • by kindbud (90044) on Thursday January 17, 2002 @04:03PM (#2857006) Homepage
    It was the kernel of fire... the kernel of destruction... the kernel that took back what was ours. It was the kernel of rebirth... the kernel of great sadness... the kernel of pain... and the kernel of joy. It was a new age. It was the end of history. It was the kernel where everything changed. The year is 2001. The version: Linux 2.4.5
    Cue martial music
