2.4, The Kernel of Pain
Joshua Drake has written an article for LinuxWorld.com called "The Kernel of Pain".
He seems to think 2.4 is fine for desktop systems but is only now, a year after its release, approaching stability for high-end use. Slashdot has had its own issues with 2.4, so I know where he's coming from. What have your experiences been? Is it still too soon for 2.4?
Alphas (Score:5, Informative)
-Paul Komarek
My experience (Score:5, Informative)
Well:
8:33pm up 45 days, 5:49,
Shameful, I know, but I had to move cities; before that I had 6 months of uptime. Should have had a UPS.
This is pretty much a desktop/development box running Postgres, JBoss, Tomcat, Apache, JBuilder and (occasionally) Kylix. No problems so far, touch wood.
I also used to work at the comp-sci department of a university, where we had 40 boxes in the Linux lab and no real problems, except that they were running ext2, so there was the occasional manual fsck. The Mac lab, now, that is another story (OS 9, not OS X).
Similar problem here... (Score:3, Informative)
Oh yeah, and the machine would crash randomly and lose data. We were using ext3, so the file system was (supposedly) still consistent, but whatever was being worked on would be lost.
Ultimately, we upgraded the kernel to 2.4.17, and the problems have been fixed. But the "even number == stable reliable" rule failed us that time.
Since then, I've read that "the entire VM system in 2.4 was replaced around 2.4.10". This really scares me. I hope that Linus and Alan Cox have learned to manage things better now. If not, someone else (maybe Red Hat) will have to pick up the slack and manage a stable kernel.
Cryptnotic
Problem more serious in Business Computing (Score:2, Informative)
For home use, I really don't find many problems with 2.4 apart from minor driver issues. But at work, things are very different. I run a few high-load critical servers at work that are still on 2.2; our lab attempt to upgrade to 2.4 (at an early stage) failed because of lockups and performance issues (yes, some due to the VM).
It wasn't until recently, when I tried again with 2.4.16, that I started getting reasonable results with the 2.4 series. For your information, performance is about the same on 2.4 with my application. I cannot confirm the high-load stability issue yet, as I need more time to test, but initial results tell me 2.4.17 is reasonably stable: only one lockup so far (in two weeks).
Re:Au contraire (Score:4, Informative)
Also,
nice -n -10
helps quite a lot on an average desktop Linux system.
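To illustrate what nice -n does, here is a minimal Python sketch, assuming a Unix system. run_niced is a hypothetical helper, not part of any tool mentioned in the thread, and the post does not say which program it renices. The sketch starts a child process with its niceness adjusted by the given increment, which is what nice -n does; a negative increment such as -10 raises priority and normally needs root.

```python
import os
import subprocess
import sys


def run_niced(cmd, increment):
    """Run cmd with its niceness adjusted by `increment`, like `nice -n`.

    Hypothetical helper. A negative increment (e.g. -10) raises priority and
    normally requires root; without it the adjustment fails in the child and
    subprocess.run raises an error in the caller.
    """
    # preexec_fn runs in the forked child just before exec, so only the
    # child's niceness changes; the calling process is left alone.
    return subprocess.run(cmd, preexec_fn=lambda: os.nice(increment),
                          capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # A child Python process reports its own niceness (os.nice(0) returns it).
    result = run_niced([sys.executable, "-c", "import os; print(os.nice(0))"], 5)
    print("child niceness:", result.stdout.strip())
```

A positive increment like 5 works as an ordinary user, which makes it a safe way to try the helper before using a negative value as root.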
Re:Alphas (Score:4, Informative)
Re:Au contraire (Score:5, Informative)
There are serious VM stability issues with these systems. Ever wonder why Red Hat hasn't released a kernel newer than 2.4.9? It's because 2.4.10 is where the new VM system went in. Red Hat is busily porting Rik van Riel's 2.4.9 VM up to the later kernels so that they can use it.
Re:Kernel too big? (Score:2, Informative)
Uh... you can compile USB and many other parts as modules.
Re:Oh, stop it! (Score:2, Informative)
No, that's wrong. Red Hat, for instance, which is generally designed to be an industrial-strength server distribution, applies something like 200 patches [lwn.net] to Linus's kernel. Red Hat knows that its customers expect a solid, stable server operating system, so they will do what it takes to build one. Mandrake, on the other hand, knows that its customers are mostly desktop users, so it has other priorities (providing games, etc.) than testing and patching the kernel.
Re:Similar problem here... (Score:3, Informative)
Neither Linus nor Alan Cox maintains 2.4 at the moment; Marcelo Tosatti does. From what I read on LKML, some people thought that was a bad move at the beginning, but I think it is working out just great (the first release he made was 2.4.17, IIRC).
Re:Observations & Experiences (Score:4, Informative)
Um... 1.3.x is, indeed, the stable version. From the website:
2.0.x is the unstable tree at the moment.
Re:Alphas (Score:2, Informative)
2.2.18 worked grand, and I believe 2.2.19 will as well. I'm running 2.4.16 now, which has given me very little trouble (bar a broken network driver)... but the machine is rarely stressed, so I can't say for sure. One day I'll fire up 10 SETI processes and see what happens.
Hope that's of some use.
One year to stability (Score:1, Informative)
Our servers running 2.2.19 and 2.4.16/17 still lock up from time to time, usually every 30-60 days, but with Rik's VM we only got about 16 days of uptime.
We were forced to upgrade to the 2.4.x kernels because 2.2.x does not support the chipsets that our servers use.
Re:Au contraire (Score:2, Informative)
The KDE team and Trolltech have done a great job, and when KDE 3 is released in the next month or so it will definitely be worth checking out.
Linux 2.4 on our router (Score:4, Informative)
Well, I put 2.4.14 on the box and I haven't rebooted since. I have 61 days of uptime, and that's the most I've ever seen on that box. It is finally stable. The only thing I can conclude is that it's AA's (Andrea Arcangeli's) VM that is doing the trick. In hindsight, it makes sense: the box had been thrashing, but at the time it didn't seem that way, because I hadn't noticed that the HDD light was disconnected and I couldn't hear the disk in the noisy server room.
So, Linux 2.4 is (knock on wood) stable for my servers, now.
Jason.
Re:Au contraire, agree (Score:1, Informative)
that large machines are completely different beasts from your desktop. Your x86-based machine is going to behave quite differently from, say, a machine with 2, 4, 8 or more CPUs, gigs and gigs of RAM, and hundreds, maybe thousands, of gigs of drive space. I am a Linux advocate myself, but I would not put it on the IBM F50 I used to work on. This is where the community needs to pay attention. We are on, what, near 0% of the desktop, so are we pushing for this? Linux used to be making inroads into the server community; let's keep it that way and fix these issues...
Re:Why Linux? (Score:2, Informative)
Regardless of your other points, this is simply not true. There is a large amount of driver development happening in the FreeBSD project, and most hardware that people actually use is supported. Even the nVidia binary module for Linux is being ported (in some obscure way) to FreeBSD.
Also, why would you want a linux kernel module running on a FreeBSD kernel?
--xPhase
NFS and 2.2 (Score:3, Informative)
There were some lingering problems with NFS (even v2 using UDP) in the 2.2.x kernel series until 2.2.19.
I recommend that you upgrade the machine that's running 2.2.17, or else apply the NFS patches. If you're using NFS v3 or TCP, you definitely want to upgrade to the latest version, and get the latest NFS utils.
Turn on your hard drives (Score:2, Informative)
hdparm -u1 -ci -d1
I can't believe I was running without it. Does anyone know why this is not turned on by default?
Use "man hdparm" to learn what these settings do.
However, your problems sound more like X Window System problems than kernel problems.
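As an illustration of the flags above, here is a minimal Python sketch that only assembles an hdparm command line and touches no hardware. hdparm_command is a hypothetical helper, and the device path is a placeholder you must supply yourself. The sketch uses the numeric -c form (hdparm accepts 0, 1 or 3) for 32-bit I/O rather than the post's -ci. As the man page excerpt quoted in a reply warns, -u1 can cause filesystem corruption on some drive/controller combinations, so review the command before running it as root.

```python
import shlex


def hdparm_command(device, unmask_irq=True, dma=True, io32=None):
    """Return an argv list for hdparm; nothing is executed here.

    device: placeholder such as "/dev/hda" (supply your own drive).
    io32:   optional 32-bit I/O mode passed to -c (hdparm accepts 0, 1 or 3).
    """
    args = ["hdparm", "-u1" if unmask_irq else "-u0"]  # interrupt-unmask flag
    if io32 is not None:
        if io32 not in (0, 1, 3):
            raise ValueError("io32 must be 0, 1 or 3")
        args.append(f"-c{io32}")  # 32-bit I/O support
    args.append("-d1" if dma else "-d0")  # using_dma flag
    args.append(device)
    return args


if __name__ == "__main__":
    # Print the command so it can be reviewed before running it as root.
    print(shlex.join(hdparm_command("/dev/hda", io32=1)))
```

Building an argument list instead of a shell string keeps the device path from being reinterpreted by a shell; shlex.join (Python 3.8+) is used only to display it.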
Re:Au contraire (Score:2, Informative)
Re:Au contraire (Score:5, Informative)
As the author of a window manager and big hunks of GTK, I don't think your analysis is quite right.
The primary problem is synchronization, not delay. GTK 1.2 is very fast, its geometry code is not causing any slowness. You are confusing slow with flicker. Flicker looks slow but slow is not the problem; no matter how fast code is, if it flickers, you will see it, and it will look slow.
Similarly when opaque resizing a window; it has nothing to do with quantization or speed, the problem is that the window manager frame and the client are not resized/drawn at the same time resulting in a "tearing" effect. This would be visible no matter how fast you make things.
As you say, putting the toolkit in the server or putting the WM in the toolkit are overly radical ways to fix this. It's not even necessary to use backing store for all X windows; it could be done with an extension that lets us push a backing store onto a single X window during a resize, for example. However, fixing it 100% pretty clearly requires server changes, and that's why you haven't seen a fix yet.
While Linux remains superior to Windows (Score:5, Informative)
And now for some armchair quarterbacking. All that having been said, I really think Linus needs to exercise some self-discipline and stay away from maintaining even-numbered kernel releases (x.0.x, x.2.x, x.4.x, etc.). By his own admission he isn't good at being a stable kernel maintainer and prefers the more interesting work done in development kernels. His track record in 2.2 wasn't fantastic (particularly in comparison to 2.0, where he did a fantastic job), and it was pretty abysmal in 2.4. As someone who's been using GNU/Linux since the early pre-1.0 days, I hope he'll put his efforts where his talents are (managing changes in odd-numbered development releases) and leave stable maintenance to Cox and Marcelo (who are very good at maintaining and improving stable releases). But enough commentary from the peanut gallery...
Re:Why didn't he downgrade immediately? (Score:1, Informative)
As the owner of the machine in question, let me answer that question directly.
It's been virtually impossible to downgrade from 2.4 to 2.2 for a variety of reasons, including file system incompatibility and inability of 2.2 to install with the raid controller in that box (lack of compatible drivers).
Believe me, if downgrading were as easy to do as to say, we'd have done it long long ago.
We're fed up with getting up at 3 AM, or 5 AM or whatever to check and see if the server has choked and whether or not someone will have to go and reset it.
- 2.4Insomniac
Re:Unfortunately I have to agree (Score:4, Informative)
At about the time the 2.4 kernel was first released, we were building a server for serving out large media files for encoding. We were on a limited budget, so we put together a K6-2/500 PC with about 256 MB of RAM and set it up with a combination of RAID 1 and RAID 5 across 2x40 GB and 2x80 GB IDE drives. While running the stock RH 6.2 kernel we had no problems. But we needed the 2.4 kernel for large files, so we held out until we couldn't wait any longer.
This turned out to be problematic to say the least. While we had 7 servers running RH 6.2 and never had a crash, the machine serving up the media files would lock up whenever copying large files, or whenever many files were being copied. Kept me working through a few weekends trying the latest kernel and then stress testing the server with large file copies. We wound up reverting back to a 2.2 kernel because the crashes were too frequent.
I haven't tried the RH kernels for 2.4 on anything other than desktop systems. I can say that, on RH 7.1 at least, the 2.4 kernel in use is rock solid and has never crashed for me at home or on desktop systems at work. I never got the chance to try the RH 7.1 kernels on that server, but I suspect Red Hat's kernels would have been more stable. They've got the resources to stress-test and modify kernels for specific needs.
I liked the article. He's not a kernel hacker and writes from his experience of the 2.4 kernel with clients. Only problem I see is WTH was he thinking using Mandrake 8.0 for a server? That version of Mandrake, more than any other I've used, I've found to be very unstable on 2.4.
Re:Guess I've been lucky (Score:2, Informative)
the above message may be a troll (Score:3, Informative)
I saved this one for last:
You see, that's what we call not in the Linus kernel. Your impressions of the importance and maturity of the patch are really something you should take up with Linus himself. I, for one, wish Ingo's TUX subsystem would make it into the Linus tree sometime soon. But you have no basis to say that just because a kernel patch is out and Linus hasn't integrated it into his stable tree, the Linux process is flawed. Get a clue! Independent patches come out much faster than anyone can pull them into the core; they often conflict and compete with other patches that solve the same problem. So it takes a while. If you want it in the Linus tree sooner, help out. Welcome to open source.
2.4, fb, usb and cd write (Score:2, Informative)
2.4.5: usb-storage was flaky.
2.4.6: USB stabilized.
2.4.15: broken.
2.4.16: first stable version with ext3.
2.4.16: ide-scsi hangs with some CD recorders.
2.4.17: seems stable for me.
Re:Why Linux? (Score:1, Informative)
FreeBSD is more secure.
FreeBSD is as fast, or faster depending on the task.
FreeBSD is more free.
FreeBSD is much easier.
There's your solution, and with FreeBSD the BSD'd codebase keeps getting richer!
Re:Turn on your hard drives (Score:2, Informative)
-u     Get/set interrupt-unmask flag for the drive. A setting of 1 permits the driver to unmask other interrupts during processing of a disk interrupt, which greatly improves Linux's responsiveness and eliminates "serial port overrun" errors. Use this feature with caution: some drive/controller combinations do not tolerate the increased I/O latencies possible when this feature is enabled, resulting in massive filesystem corruption. In particular, CMD-640B and RZ1000 (E)IDE interfaces can be unreliable (due to a hardware flaw) when this option is used with kernel versions earlier than 2.0.13. Disabling the IDE prefetch feature of these interfaces (usually a BIOS/CMOS setting) provides a safe fix for the problem for use with earlier kernels.
"My experience" (Score:2, Informative)
Although I'm sure this is already lost in the flood of comments :)
I started out running Debian 2.2r2 (or was it r3?), which had a 2.2 kernel. I never had any problems with it, though I didn't have very fancy hardware. I did have a fun ride getting the integrated i815 audio/video to work (that was all I had then). Upgrading to 2.4.4 didn't really help with that issue... I had impeccable stability, but I never really pushed the envelope :)
Things got more interesting around 2.4.10, partly because my HD crashed (the Deathstar effect) and I rebuilt my system. Lack of MIDI for the SB Live! finally drove me to use the ALSA drivers (SB Live! MIDI hasn't worked for me with the OSS drivers). The Radeon-based card I had gave me no kernel-related support problems (kernel DRI, mainly). USB support was nice, except for the cheap Visioneer 4400 scanner I had (which isn't the kernel's fault). Stability and performance under medium loads have been generally good. I flirted with the preemptive and lock-breaking patches for a while, but things got really messy under 2.4.17 (stuff happened that I had stopped using Windows to get away from, i.e., sudden hard system freezes), so I dropped back to 2.4.16 with no patches. I'll probably try again with .18 or something. The only speed complaint I've really had is that gmc takes a long time to scan directories with lots of stuff in them :)
Now, I haven't really had many problems. 2.4.13-ac8 oopsed, but since I was panicking about midterms at the time I didn't really look at it that closely, and never got around to actually reading the oops screen (I took a pic of it with someone else's camera ... ). I haven't taken time to find out why yet, but XMMS' memory footprint has this habit of growing unbounded ^^; The ALSA output plugins make it worse.
For the curious: my system is a P3 866 with 128 megs of RAM to start. I upgraded to 512 megs in July when I was running 2.4.4; that's the most memory my motherboard can take, alas. Heaviest loads include running XMMS, apache, xinetd, xchat, and compiles at the same time, so not too bad. Apache was mainly serving stuff to one or two people at maybe 20-30k/sec. I've been generally happy with the 2.4 series, but since I haven't really pushed my system hard yet I may experience problems later when I do :3