2.4, The Kernel of Pain
Joshua Drake has written an article for LinuxWorld.com called The Kernel of Pain.
He seems to think 2.4 is fine for desktop systems but is only now, after a year of release, approaching stability for high-end use. Slashdot has had its own issues with 2.4, so I know where he's coming from. What have your experiences been? Is it still too soon for 2.4?
Well, from my point of view... (Score:2, Interesting)
--Josh
Au contraire (Score:3, Interesting)
I'll tell ya, I tried the preemptive patches, and all the -ac stuff naturally, and well, the desktop just isn't snappy ... I mean, Windows (follow me here) just feels better. I don't need a force feedback mouse or anything; it just doesn't show me that it is rendering a window... and that's something that Gnome was doing even on a 450 MHz machine.
Also, even with the preemptive patches, I could hold down a key in, say, StarOffice or AbiWord, and it would stutter! Hold down the arrow key, and it stutters.
These are basic interface issues that could use some due attention before Linux is ever ready for the desktop.
2.4 running just fine here (Score:2, Interesting)
Yes, the emperor has no clothes! (Score:3, Interesting)
As a sysadmin, I have to state that the 2.4 kernels have ruined whatever reputation the 2.2.x series kernels had built for stability. At least in the 2.0 and 2.2 series, you had islands of stability where really careful distributions could pick a kernel version as their default kernel. One of the main problems with Debian not finalizing a 2.4 kernel has been that there hasn't been an island of stability so far in the 2.4 series.
And I've been waiting a long time now. The early 2.4 series didn't really work out on my SMP servers. The 2.4.6 onwards kernels broke Tulip support for me. Then came the VM switch. Then, just when I decided that 2.4.16 seemed stable enough, we got the OOM problem. And I also keep hearing statements that the new VM is more friendly to desktop systems than to servers.....
Now if only 2.2 offered iptables.....
2.4 is hit and miss. (Score:5, Interesting)
On my desktop machine, I've taken more risks (installed pretty much every official 2.4.x-linus release as they have come out) and some have been good, while others have been total dogs.
I'm running 2.4.17 right now. It seems okay; I've only had a freeze-up once over the last couple of weeks, though it was a total hard freeze (i.e. no ping, no magic SysRq, no nothing), which I haven't had in Linux for several years.
The obvious issue is VM; if you keep lots of memory (768M, or preferably 1.0G+) in your system, things go much more smoothly, though MP3 playback still skips a little.
Right now, I'd prefer some work on the RAID and IDE performance issues. One or two of the 2.4 series have had disk performance 100%+ better than the current 2.4 kernels. Why? I'd like to get the disk I/O back to reasonable levels.
Re:Au contraire (Score:5, Interesting)
The preemptive patches have made my system a lot more responsive under use. Most notably, the mouse cursor doesn't slow down during heavy compiles, and audio latency is good enough to play with some of the more interesting sound software projects out for Linux.
But it really sounds like your problem isn't with Linux but with XFree86. X has its share of problems, but if you have a good video card that's supported well under it, you should get more than acceptable 2D drawing performance. I use a 3dfx Voodoo3 here and it's about as good as win2k running KDE (sometimes you can see it rendering when resizing or moving windows quickly, but I like to think of it as a cool effect ;) and it's way faster with lighter WMs like Blackbox.
Worked for me. (Score:5, Interesting)
Of course, whenever I'm playing around with this stuff I don't delete my "last known good" kernel, so if after a couple hours or a couple days I noticed a problem, I just booted back to what worked. The default (albeit heavily patched) Red Hat kernels were good, so "last known good" always existed for me.
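For anyone taking the same approach, here is a sketch of what a "last known good" setup might look like in /etc/lilo.conf. The kernel paths and labels are made up for illustration; adjust for your own bootloader and disk layout:

```text
# /etc/lilo.conf (illustrative fragment)
default=linux

image=/boot/vmlinuz-2.4.17          # the kernel being tried out
        label=linux
        root=/dev/hda1
        read-only

image=/boot/vmlinuz-2.4.9-redhat    # "last known good" -- never delete this one
        label=good
        root=/dev/hda1
        read-only
```

After editing, run /sbin/lilo to rewrite the boot map; if the new kernel misbehaves, typing "good" at the LILO prompt boots you back to the working one.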
To summarize: this hasn't been a source of inconvenience for me, but it has been one of vicarious embarrassment. I've only been using Linux since 2.0.somehighnumber, but this is the worst mess I've seen the "stable" kernel tree go through in that time. Don't get me wrong, I've experienced system-crashing bugs (a tulip driver that freaked at some tulip chipset clones, some really bad OOM behavior a couple years ago) before, and pragmatically I guess that's worse... but those problems were always fixed fast enough that the patches predated my bug reports. Watching even the top kernel developers seem to flounder for months over bugs in a core part of the OS like the virtual memory system just sucked.
2.4.16 + preempt (Score:3, Interesting)
It is the most stable config I ever had using this kernel generation.
Let me explain:
Before, with kernel 2.2.1x, I only had "some" performance issues (mostly disk access related) and what I thought were APM problems (this is a laptop).
Since I have been using kernel 2.4, I've had some good times but mostly bad surprises.
pcmcia (I use the pcmcia-cs package [sourceforge.net]) is not quite plug'n play (the system even hung once), but the symptoms vary from version to version.
So, the big PRO is that, yes, I boot much more quickly.
The CON is that since 2.4.6/7, I have bitterly regretted upgrading this kernel, since the functionality I gained was offset by the new bugs.
Note that I don't mention APM because, besides the Window Maker apm applet, I can't even imagine using suspend/resume on this laptop.
BTW, when I see the difference with and without the preempt kernel, I wonder why this is not implemented in the official tree (a radio button: "server or desktop"?).
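For what it's worth, the preempt patch does expose roughly that switch, just at build time rather than as a radio button. A sketch of the relevant .config lines (CONFIG_PREEMPT is the option's real name in the patch; the comments are mine):

```text
# .config fragment with the preempt patch applied

# "desktop" choice: allow rescheduling inside the kernel for lower latency
CONFIG_PREEMPT=y

# "server" choice: leave it off for classic, throughput-favoring behavior
# CONFIG_PREEMPT is not set
```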
Unfortunately I have to agree (Score:5, Interesting)
Having said that, there are some serious issues with 2.4 on some 8-way 8GB machines that I manage. They have been running 2.4.13-ac7 since November, because that is the last kernel that is usable for me (-ac11 would probably be ok). Newer kernels have terrible behavior under the intense IO load these machines go through. They get 14-30 days of uptime, and then hang or get resource starved or something and have to be rebooted.
I think part of the issue is that there simply aren't that many people running 8-way boxes, so bugs aren't found as easily; this is of course on top of 8-way SMP being much more complex than a de facto single-user, single-processor desktop machine. To make it even worse, the machines are pushed hard. They move around GBs of data every day, and often run for extended periods with loads over 25.
Of course, it is still mostly ok. While the machines are working, they mostly work fine. Still, 20 days of uptime is totally unacceptable. I have an Alpha running Tru64 pushing 300 days of uptime, and the last time it was down was due to a drive failure, not an OS problem.
My only remaining issue with Linux on "small" machines is an oscillation problem in IO. Data will fill up all available memory before being written to disk, and then everything from memory will be written out, and then memory fills up again before anything new is written to disk. This is a bit inefficient, and the machine's responsiveness at the memory-full part of the cycle is poor.
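One knob that was commonly pointed at for this fill-then-flush cycle on 2.4 was the bdflush tunable set under /proc/sys/vm. The values below are illustrative only, and the field meanings shifted between 2.4 releases, so check Documentation/sysctl/vm.txt for the kernel you actually run:

```shell
# Show the current bdflush tunables (nine fields on most 2.4 kernels)
cat /proc/sys/vm/bdflush

# Lower the first field (nfract, the percentage of dirty buffers that
# triggers writeback) so writeout starts earlier and each burst is smaller.
# Illustrative values; requires root and a 2.4 kernel.
echo "20 500 64 256 500 3000 60 0 0" > /proc/sys/vm/bdflush
```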
What are my options though? I guess I could try FreeBSD, but a bit of lurking on their lists and forums reveals plenty of problems there, too. Do I switch and hope things get better, or wait out 2.4 and hope it comes around soon? Aside from a few nasty bugs in some releases, pretty much each successive 2.4 kernel has been better than the previous one, at least on small systems.
Several years ago I was having a hard lockup problem with Tru64 (Digital Unix, at the time), and that was very scary. It took time to get the problem escalated to the OS engineers, instead of just sending an e-mail to lkml. Even then I could only hope that the issue was being addressed, but I had no way to know whether anybody was doing anything about it or not. (It turned out to be a bug in the NFS server that would cause the machine to lock up when serving to AIX.) For all of its problems though, it is extremely reassuring for me to be able to monitor the development process of Linux through the linux-kernel mailing list and other specialized lists. If I feel that people aren't aware of some problem I am experiencing, I can raise the issue. I am not in the dark about what is happening and what fixes are being made. I know what changes have gone into each kernel update, so I know if there is a chance of it fixing my problems.
Let's call it a curiosity (Score:4, Interesting)
9:21am up 181 days, 13:25, 3 users, load average: 3.57, 3.33, 2.79
jakob@unthought ~> uname -a
Linux unthought.net 2.4.0-test4 #1 SMP Fri Jul 14 01:56:30 CEST 2000 i686 unknown
I suppose that ain't too bad. Other than that, with real 2.4 kernels, on UP and SMP systems, I've been fairly satisfied.
There was a RAID bug (RAID-1) in 2.4.9 or thereabouts, which the article forgot. I think that, except for the fs/raid corruption problems (which are horrible when they happen), the 2.4 kernel has been a nice experience.
Think back for a moment: How would you like *not* to have iptables, reiser, proper software RAID, etc. etc. etc.
I think I would miss 2.4 if I went back, although the fs/raid corruption bugs made me "almost" do that.
Needs "unstable", "testing", "stable" or something (Score:2, Interesting)
Maybe the kernel should hold on to "beta" status for a little longer, or have an "unstable", "testing" and "stable" like Debian. That way, when someone wants the latest stable kernel, they don't end up with something the kernel guys merely think is stable... till they release the next "stable" version a day later...
Kernel is ok, biggest problem is the applications (Score:3, Interesting)
I've found that the kernel is pretty stable for me. I use my system mostly for code development and as a server for files and web pages.
I find that the kernel itself is pretty stable, although as the article says, it does seem less stable than the 2.2 series did. But even so, it's not bad for the use I've made of it.
The old-style applications are also very good. The command-line tools and the development tools (gcc etc) are all totally solid, and they are why Linux gained its early reputation for reliability.
*BUT* I find much _new_ software (GNOME, KDE and other GUI software) to be terribly unreliable. Say what you like about Microsoft Outlook, but it rarely just crashes. On the other hand, every "modern" mail program I've used on Linux tends to end with a crash eventually. And it's not just mail programs. I find that many of the programs I would use tend to crash quite a lot. Not all the time, but just once is too much.
It's rather sad in my opinion that such a solid base of reliable code is being let down by the stability of some of the more modern software. Frankly it doesn't matter how stable the kernel is if the programs that run on it crash.
This isn't intended to be a complaint, and I realise that before applications can be considered reliable the kernel needs to be, but it does concern me that the overall reliability of Linux systems seems to be going downwards.
Re:Kernel too big? (Score:2, Interesting)
This is my opinion:
optional hardware, devices, peripherals -> modules
hardware found on most x86 machines -> built in
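That policy, expressed as a sketch of a 2.4-era .config fragment (these are real 2.4 option names, but chosen here purely as examples of the split):

```text
CONFIG_EXT2_FS=y      # root filesystem: built in, needed to mount /
CONFIG_BLK_DEV_IDE=y  # IDE disks, found on most x86 machines: built in
CONFIG_USB=m          # optional peripherals: modules, loaded on demand
CONFIG_PCMCIA=m
CONFIG_SOUND=m
```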
Re:Au contraire (Score:3, Interesting)
Also, the problem wasn't that the system was slow, but that when you had many active processes, the system would respond very poorly or lock up.
Cryptnotic
Port the FreeBSD VM (Score:1, Interesting)
Re:Unfortunately I have to agree (Score:2, Interesting)
That's unfortunately the point. 2.4 is stable for desktop use, but obviously it's the 8-processor+RAID heavy use which is problematic. But honestly, I must say I'm not surprised... Having done a bit of kernel developing, I always wondered how developers managed to get SMP right (with spin locks etc...), when most of them probably don't have SMP machines AND considering that fine-grained SMP locking was added as an afterthought. Actually, when I wrote my little personal kernel module, I was amazed at how many things could go wrong in SMP, so much so that my envy of dual Athlon motherboards is now close to zero.
I think Linux is now meeting the exact same problem as the Windows NT kernel, i.e. fine-grained SMP is hell.
Re:large system problems (Score:3, Interesting)
Now remember that the whole point of microkernels is to enable flexibility -- not only for development's sake but also to be able to adapt to different loads and usage characteristics.
So the HURD seems to be the answer. That, or Linus (or someone else, or a group of kernel hackers) over a reasonable amount of time manages to get better at (1) understanding modifications to Linux and their consequences and (2) based on this understanding, only releasing "stable" kernels once they're done.
Given the complexity of the task, I doubt it's doable. The free flavours of BSD never scaled as far as GNU/Linux; the proprietary Unices have basically chosen to scale up and up, forgetting the small-systems situations and accepting bloatedness in order to cater for stability, resilience and other big-iron stuff. Even single-server microkernels like Windows NT and Apple Darwin haven't much to offer, being bloated, slow and not flexible at all.
We still have a free software and open systems advantage, because POSIX OSs can cover the whole gamut of computing systems simply by having different kernels with the same APIs; standards bodies and GNU libc are the real heroes here, not Linux or BSD. Proprietary software will usually be even more fragmented, with all those slightly incompatible, underperforming, unstable versions of Microsoft Windows, Mac OS Classic and so on. But still, it would be nice to have a free, copylefted common kernel. Unless the Linux situation improves dramatically soon, the only answer on the horizon is the Hurd -- and it still needs to be finished.
Kernel series 2.5 (Score:2, Interesting)
Maybe even-numbered kernels should only be declared stable when they have an experimental branch.
That said, I've not had many problems with 2.4, bar terrible IDE performance on my HPT370 controller.
My 2.4 issues. (Score:1, Interesting)
The problem I had with 2.4 was just that: it crashed for me. I managed roughly 6 Linux servers for an ISP, all running 2.2. Would I even consider sticking 2.4 on those after what I witnessed with my personal machine? No. But things are much different now. My box is flawless, and I've since upgraded those boxes from 2.2 to 2.4.
-reid
My hardware is: ASUS P3B - PIII-550 (Slot 1), 2 x 256MB Hitachi RAM, Nvidia TNT2, Maxtor HDs.
Re:Well, from my point of view... (Score:3, Interesting)
We've been using it... (Score:4, Interesting)
We ran into some trouble with a number of Athlon systems, but that was due to the 'Athlon bug' and was soon fixed. More worrisome was the performance of pre-2.4.9 kernels on the desktop: sometimes they slowed down to a crawl (and I'm talking about lightly loaded ~750MHz machines here).
We got over that with the -ac kernels however, and it's been a breeze ever since. We currently use 2.4.14 with XFS patched in (although we're ditching it in favor of ext3 now that it's been integrated and the RH installer supports it) and we're looking at 2.4.17 now.
Why use 2.4 on servers (as some have asked)? Well, iptables is a good reason, for one. Other security-related things count heavily too. And XFS seemed a good reason to do it at the time too. It can deliver very good performance.
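To make the iptables argument concrete, here is a minimal sketch of the kind of stateful filtering 2.4 enables and 2.2's ipchains could not do (the ports are examples, not a recommended policy):

```shell
# Default-deny inbound, then allow loopback, established traffic, and two services.
iptables -P INPUT DROP
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT   # stateful match: new in 2.4
iptables -A INPUT -p tcp --dport 22 -j ACCEPT   # ssh
iptables -A INPUT -p tcp --dport 80 -j ACCEPT   # http
```

The ESTABLISHED,RELATED rule is the part ipchains had no equivalent for: replies are matched by connection state rather than by guessing at port ranges.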
Some stats:
zuse [1] > uname -a
Linux zuse 2.4.14-xfs_MI10 #1 Tue Nov 6 17:34:04 MET 2001 i686 unknown
zuse [2] > uptime
2:25pm up 61 days, 21:21, 1 user, load average: 1.07, 1.02, 0.93
I'll throw my two cents (Score:2, Interesting)
Re:Oh, stop it! (Score:3, Interesting)
However, you are absolutely right that Mandrake focuses on ease of use and new desktop-like features while Red Hat focuses on stability at the expense of "coolness."
Both can be frustrating if they're used in the "wrong" places.
Re:Why Linux? (Score:5, Interesting)
At least for nVidia, it is being worked on: FreeBSD NVIDIA Driver Initiative [netexplorer.org]
STABLE vs STABLE (Score:5, Interesting)
"Stable", in the context of a kernel release, refers to the interfaces. When Linus releases 2.<even>.0, he is saying that this kernel has reached some arbitrary plateau of development stability, and it's now ready for others to begin actual release engineering on.
You have to understand that the Linux kernel is released by Linus in a state that is very reasonable for a development team, but that will never be "production quality". Debian puts a lot of release engineering work into a kernel. As does Red Hat. As does SuSE, etc, etc.
If you just grab 2.4.x and install it, you're acting as Linux Q/A, and I applaud your effort, but when it breaks in your environment, you should not be stunned.
Once again, production release != stable release. A stable release is just one the developers are happy with (and I've yet to see a 2.4 kernel that I can say developers should not be happy with).
So, maybe next time, 2.6.0 should be called the "post-development" release so that people don't go off half-cocked installing it on production systems.
FreeBSD VM, sync(3) OK for 10 YEARS. (Score:4, Interesting)
What you have here is a LARGE body of (l)users and a small cadre of kernel hackers who are separated and out of communication. The three times I've found a problem in FreeBSD-STABLE (to be honest, I think one was in 2.2.6 and two were in the 3.x branches), I sent my bug report in with instructions on how to repeat the problem, and a patch for the ugly hack I made to make the problem go away. I never beat the committers to the bug. Now, before I do all the work, I watch the WebCVS for commits for a couple of days and email the committer who touched the affected files last with a quick question: "hey, I see this problem.. have you?"
I tried to read some Linux (the kernel) code a long while ago. There were some funny comments, error messages, and variable identifiers, but otherwise it gave me a headache. I just felt being a Linux participant was beyond my tolerance. I browsed the FreeBSD source code though, and even in the midst of reorganization, both organising schemes were apparent, well documented, and there is a clean style to all FreeBSD code I've seen that makes for (relatively) easy reading.
I guess that the difference is culture, but in this case it seems to be a serious problem. I have to wonder why the Linux kernel people haven't broken the Linux VM code out for modularity and borrowed the FreeBSD VM as an option? I mean: FreeBSD is free-as-in-beer and also free-as-in-software. I'm not volunteering, but after all these years wouldn't it make sense to hack out some Linux wrappers for the FreeBSD VM system? I think I remember reading a Matthew Dillon interview where he talks about all the good stuff he's done to FreeBSD VM recently...
Why 2.4 was released (Score:3, Interesting)
There is a relatively small number of people that use the odd-numbered, experimental kernels. At the end of 2000, it was becoming clear that having the same people running the development kernels on their hardware wasn't fixing many more bugs - I remember a post from Linus to LKML to that effect.
2.4 was stable enough for mass consumption, and so it was released. However, it is important to remember that this is free software, and frequent incremental updates are the rule. Free software can't work if it is not constantly evaluated by users, bugs reported and fixed, and new versions shipped out.
Software is an evolutionary process; it is important to remember that free software (especially the Linux kernel) fully embraces this notion.
Re:2.4 is hit and miss. (Score:3, Interesting)
The original article author went off and yelled about this problem and that problem in the Linus kernels, but totally left Red Hat's stuff out in the cold until the very end.... yes, I admit, right now is not a good time to be following 100% pristine Linus code. But the beauty of Linux now is what everybody feared would get really ugly: We have SEVERAL forks in the code, and at least one of them is working quite well....
I'd still rather run Alan's beta code than the best Bill can possibly offer.
He Almost Had Me (Score:5, Interesting)
The kernel seemed to show more stability. Then we hit kernel 2.4.15.
Linux version 2.4.15 contained a bug that was arguably worse than the VM bug. Essentially, if you unmounted a file system via reboot -- or any other common method -- you would get filesystem corruption. A fix, called kernel 2.4.16, was released 24 hours later.
Look, anybody who is deploying a kernel on the day it is released on a production server deserves what they get. One day turnaround on a bug fix is phenomenal. Even if these are marked as "stable" kernels, trying to track the new versions in real time is a dumb thing to do.
This guy has written a moan-and-groan article based on a small set of bugs, some of which he could only have experienced if he was experimenting on his production system. He obviously requires extreme stability and says he needs this more than the new 2.4 features (SMP, 2G memory, 2G files), which makes me ask: why was he putting new kernels on his production system before empirical evidence of high stability was there?
Open source will fix bugs faster than proprietary software does. It doesn't change reality to make bugs impossible. This is true even in "stable" releases, especially if you are talking about highly stressful production environments.
Re:We've been using it... (Score:1, Interesting)
Oh, congratulations! I can do that too.
2:25pm up 1030044 days, 21:21, 1 user, load average: 0.07, 0.02, 0.93
Bottom line: you're probably lying.
Re:While Linux remains superior to Windows (Score:2, Interesting)
Excellent point about the difference in maintaining stable vs. development series.
While watching the 2.4 drama unfold, I've been wondering if it wouldn't be a better idea to switch kernel development to a Debian-like model -- a simultaneous Stable, Testing and Unstable (experimental) series. It may just be too much to ask bleeding-edge hackers to wait *years* before trying out their latest ideas. At the same time, those who excel at tweaking, optimizing and clarifying shouldn't have the ground shifting under them all the time.
It would probably take some more resources, but it seems that there is a surplus of good minds drawn to kernel development anyway.
Re:Kernel is ok, biggest problem is the applicatio (Score:3, Interesting)
Point Overstated (Score:2, Interesting)
While the facts in this guy's article are correct, he is way overstating stability problems. My company, a high-traffic dot-com survivor, has been running on 2.4 since the pre-releases, and we have never had a single incident on any of our servers.
Although known problems with 2.4 might account for his troubles, it's equally likely he has been having hardware problems. These "bugs" may very well be a straw man, drawing attention away from faulty hardware, which is usually the problem when a machine just suddenly locks up.
"Mac OS X.i is what Linux-on-desktop People Crave" (Score:2, Interesting)