Linux Kernel Gets Fully Automated Test

Posted by CmdrTaco
from the just-like-a-real-project dept.
An anonymous reader writes "The Linux Kernel is now getting automatically tested within 15 minutes of a new version being released, across a variety of hardware and the results are being published for all to see. Martin Bligh announced this yesterday, running on top of IBM's internal test automation system. Maybe this will enable the kernel developers to keep up with the 2.6 kernel's rapid pace of change. Looks like it caught one new problem with last night's build already ..."

  • by 3seas (184403) on Sunday June 05, 2005 @11:48AM (#12729387) Journal
    code generation...
    • Actually, that could be done, could it not? Throw in some random functions and if/while/do loops, return random variables, etc. It could create some funky new software. :)
      • I already got 1 million monkeys in my basement working on it.
      • by Curtman (556920) on Sunday June 05, 2005 @12:26PM (#12729592)
        Actually, that could be done, could it not?

        Apparently it works for Samba [samba.org]. :)
        • code generation is good for repetitive stuff especially if your language doesn't have much in the way of a built in preprocessor

          say for example producing similar load on demand wrappers for a load of functions in a dynamic library.

          p.s. /. seems to be restricting me to one post every 15 mins right now dunno why (the error says Slashdot requires you to wait 2 minutes between each successful posting of a comment to allow everyone a fair chance at posting a comment.

          It's been 14 minutes since you last success
          • code generation is good for repetitive stuff especially if your language doesn't have much in the way of a built in preprocessor

            There's a fair bit of repetitive code in the kernel. I had to do some hacking to make some RS-422 cards we had work properly, and found that a lot of the char drivers especially contain very similar code, and structure. Code generation might help with older drivers that nobody cares about until they break. They tend to rot from the looks of things.
    • No problem. The following is an automated code generator. It generates a hello world program in C and writes it to stdout. (untested)

      #include <stdio.h>
      int main()
      {
      char const* program_pattern = "%s%s";
      char const* include_pattern = "#include <%s>\n";
      char const* function_declaration_pattern = "int %s(%s)";
      char const* function_definition_pattern = "%s\n{\n %s;\n}\n";
      char const* print_pattern = "printf(%s)\n";
      char const* string_pattern = "\"%s\"";

      char const* stdio_header_name = "stdi

    • by Anonymous Coward
      It's called Lisp ;-)
    • What's so funny about it? Code generation is used left and right in modern projects. This stuff is great for shifting the gruntwork away from the developers without having to go into outsourcing hell.
    • Why hasn't anyone mentioned lex or yacc yet?
    • I think the kick in the butt humor found in this is that of the computer auto generating the code, auto compiling it and auto testing it and regenerating code for improvement based on test results, ... loop it Johnny Mnemonic... uhhh ertrrr Neo..

      Being one fully aware of the possibilities of auto coding or using code generators, both of which exist today in one form or another, just not so completely available wide scope on much of any user/consumer platform..

      I was being serious but certainly found the hum
    • May not be 100% or even hardcore, but you can go from use case to code if you put in some time. It will also write Java code using your UML diagrams.

      It's based off of Eclipse. Check it out if you can.
  • Why has it taken so long?
  • by LCookie (685814)
    "The Linux Kernel is now getting automatically tested within 15 minutes of a new version being released"

    Would be much better to test it BEFORE a new version is being released, otherwise this is completely useless...
    • Great idea. You should ask IBM to integrate their test platform into Linus' processes. He might be dubious after BitKeeper (that idiot) about another company helping him, but in this case I think it's a great idea.

      There may be (and probably are) other test beds out there, testing releases. It would be better for Linus (and the world) if he could release already-tested code to the world, instead of having the world duplicate all the testing effort, and IBM seems like a perfect solution.

      • by oxfletch (108699) on Sunday June 05, 2005 @12:02PM (#12729463)
        I automatically test every nightly -git snapshot release, so it's fairly well tied in anyway. This also means my heaviest usage of our machines is at night, when most of the (US) developers are asleep.

        So it's fairly well tied in already ... and the whole -rc cycle should enable us to catch a lot of stuff.

      • In any case, most people, especially in mission-critical processes, don't compile a new kernel as soon as it's released. Myself, I try kernels after a while, when no major issues are found. Even then, I test them out first in different test machines. So 15 minutes before, 15 minutes after, it's all the same.
        • But it's not all the same, though. Once it's "blessed" by Linus, it's released. If he had access to the test machines prior to releasing it, he could release higher-quality code.

          And since the entire test run only takes 15 minutes, IBM (and the world) would benefit from allowing him multiple tests per release.

    • by DigiShaman (671371) on Sunday June 05, 2005 @11:57AM (#12729432) Homepage
      Sounds like the solution to this problem is clear. Always use the second to latest kernel released. Stay away from the new one until it's fully tested to your satisfaction.
    • "Release" in the open source world has a broader sense than in commercial software. In open source not all "released" versions are meant for general public consumption; they include unstable versions targeted mostly at developers, so that severe issues can be detected and patched quickly.

      Taking this into account, I believe this is meant to catch bugs mainly in nightly (unstable) builds and release candidates, not in "final" versions (those should, at least in theory, have no serious bugs left around as th
    • by Metteyya (790458) on Sunday June 05, 2005 @12:06PM (#12729489)
      because they are nightly builds, that is - versions with applied patch, but untested yet.
    • So let me summarize whether I understood it right:

      You say it's "completely useless" because you have to wait 15 minutes when a kernel is released.

      And this is modded "insightful".

  • Question: (Score:5, Interesting)

    by bogaboga (793279) on Sunday June 05, 2005 @11:50AM (#12729396)
    How were the previous kernels being tested? Were sources for improvement/change/modification, bugs and areas requiring refactoring being discovered by chance?
  • This is good, and long overdue (I'm surprised it hasn't been around for years), but just how much testing is being done? Compiling? Booting? Or are there actual functional and reliability tests which are being performed?
    • Re:How much testing? (Score:5, Informative)

      by oxfletch (108699) on Sunday June 05, 2005 @12:06PM (#12729483)
      Compiles, boots, runs dbench, tbench, kernbench, reaim, fsx. If one test fails, it'll highlight it in yellow, rather than green or red. I have a few of those in the internal tests, but not the external set.

      This is only the tip of the iceberg as to what can be done. We're already running LTP, etc internally, and several other tests. Some have licensing restrictions on results release (SPEC) ... LTP is a pain because some tests always fail, and I have to work out the differential against baseline. Will come later.
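The colouring and the "differential against baseline" described above could be sketched roughly as follows. This is a minimal illustration only: the function names, the status rules (red when a kernel fails to build or boot, yellow when it runs but a test fails, green otherwise), and the sample test names are assumptions made for the example, not details of the real harness.

```python
# Hypothetical sketch: the colour rules, function names, and sample data are
# assumptions based on the comment above, not the real test harness.

def new_failures(results, baseline_failures):
    """Return tests that fail now but are not known failures in the baseline."""
    return sorted(
        name for name, passed in results.items()
        if not passed and name not in baseline_failures
    )


def cell_colour(built, booted, results, baseline_failures=frozenset()):
    """Pick a colour for one kernel/machine cell of a results matrix."""
    if not (built and booted):
        return "red"      # assumed: the kernel never got as far as running tests
    if new_failures(results, baseline_failures):
        return "yellow"   # built and booted, but at least one test newly fails
    return "green"


# One made-up run: fsx fails, and one LTP-style test is a known baseline failure.
run = {"dbench": True, "kernbench": True, "fsx": False, "ltp-known-bad": False}
print(new_failures(run, {"ltp-known-bad"}))
print(cell_colour(True, True, run, {"ltp-known-bad"}))
```

Subtracting the known failures first is what keeps a suite with "some tests always fail" from turning every cell yellow.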
  • What took so long (Score:4, Interesting)

    by Timesprout (579035) on Sunday June 05, 2005 @11:53AM (#12729405)
    Most projects of any complexity use automated continuous build and testing as a standard development practise.
    • Presumably... (Score:5, Insightful)

      by Kjella (173770) on Sunday June 05, 2005 @11:57AM (#12729433) Homepage
      ...the cross-platform, cross-hardware part? Setting up one machine to build automatically is easy. Setting up a whole bunch of them (all unique; read: administration nightmare) and tying them together into a system, that's quite a bit of work.

      Kjella
      • Re:Presumably... (Score:5, Informative)

        by oxfletch (108699) on Sunday June 05, 2005 @12:10PM (#12729514)
        Indeed. The automation system I wrote is just a wrapper around an internal harness called ABAT that has a massive amount of work behind it. If systems crash it can detect that, power cycle them, etc.

        Going from 90% working to 99.9% working is frigging hard. I had all this working 3-6 months ago, but the results weren't good enough quality to be published. Several people internally put a massive amount of work into improving the quality and stability of the harness.
        • Re:Presumably... (Score:3, Insightful)

          by Bob_Robertson (454888)
          I don't remember who said it first:

          The first 90% takes 10% of the time.

          The last 10% takes 90% of the time.

          I expect one could substitute "money", "labor", "effort" for "time" in the above.

          Bob-
          • It's generally known as the 80/20 rule. 80% takes 20% of the effort, while the other 20% takes 80% of the effort.

            The idea is the same though.
      • ...the cross-platform, cross-hardware part?

        It's magic [netbsd.org]! A single script and I can build a complete operating system for a big-endian 64bit architecture on a 32bit little-endian architecture, or any of the other 48 supported archs. More than that, I can build a complete NetBSD for any arch on any halfway POSIXish system.

        build.sh bootstraps its own contained build utils (compiler, binutils et al) and builds the system with that. You can even build the complete system as non-root and get tarballs that you ca
      • We've been playing with some IBM tools at work that automate server setup and provisioning... it's pretty amazing stuff.

        You can basically retask servers in something like 10-60 minutes depending on what you are doing, and it's a completely automatic process.
      • by nietsch (112711)
        http://aegis.sf.net/ [sf.net]
        and it can do a lot of other things too, like making sure that each change has an accompanying test and that all tests pass before anybody else is bothered with that change.

        The biggest downside for aegis (as I see it) is that it needs to run on a central development server, it is not server based like CVS or the others(it has a cvs-like interface for reading). But OTOH, would it be so hard to have the kernel developers log into a central compile farm where the linux kernel i
    • They've been doing that the whole time, they call them "users".
  • Maybe... (Score:2, Interesting)

    by ratta (760424)
    automated performance regression tests may be useful too.
    • Re:Maybe... (Score:5, Informative)

      by oxfletch (108699) on Sunday June 05, 2005 @12:19PM (#12729552)
      The results are all there if anyone wants to play with them. Go to the results matrix, and click on the numerical part of the green box. Pick a test, and drill down to the results directory.

      The numbers are there, it's just a question of drawing graphs, etc. I have some for kernbench already, but I'm not finished automating them. If anyone wants to email me code to generate them from the directory structure published there, feel free ;-) Preferably python or perl into gnuplot.
      • Instead of just reading a bunch of complaints, let me be 1 Slashdotter to thank you for your efforts.

        It's too bad the Stanford Checker can't be integrated into your system.
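For the request above ("python or perl into gnuplot"), a starting point might look like the sketch below. It assumes the timings have already been collected into a mapping of kernel label to elapsed seconds; the published directory layout is not described in this thread, so the parsing step is left out, and the labels, numbers, and file names are placeholders.

```python
# Hypothetical sketch: the input mapping, labels, and file names are
# placeholders; reading the real results directories is not shown.
from statistics import mean


def gnuplot_data(timings):
    """Build a whitespace-separated table: index, kernel label, mean seconds."""
    lines = ["# index kernel mean_elapsed_seconds"]
    for index, (kernel, samples) in enumerate(timings.items()):
        lines.append(f"{index} {kernel} {mean(samples):.2f}")
    return "\n".join(lines) + "\n"


def gnuplot_script(data_file, png_file):
    """Build a gnuplot script that plots mean elapsed time per kernel."""
    return "\n".join([
        "set terminal png",
        f'set output "{png_file}"',
        'set ylabel "elapsed time (s)"',
        "set xtics rotate",
        # xticlabels(2) takes the x-axis tick text from the kernel label column.
        f'plot "{data_file}" using 1:3:xticlabels(2) with linespoints title "kernbench"',
    ]) + "\n"


# Made-up sample values, for illustration only.
timings = {"kernel-A": [101.2, 100.8], "kernel-B": [103.0, 102.6]}
print(gnuplot_data(timings), end="")
print(gnuplot_script("kernbench.dat", "kernbench.png"), end="")
```

Writing the two strings to `kernbench.dat` and a script file, then running gnuplot on the script, would produce the graph; keeping the generators free of file I/O makes them easy to check on their own.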
  • This is awesome (Score:5, Insightful)

    by jnelson4765 (845296) on Sunday June 05, 2005 @11:54AM (#12729410) Journal
    But it can't catch everything - the 1394 bus was screwed in 2.6.11. There are a lot of regressions that show up - and even that healthy cluster of systems will not show every problem.

    Sound issues? Older network and SCSI cards? There are a lot of drivers that break, and no one notices it because there is nobody with the hardware testing the -rc or -mm kernels.

    Wouldn't it make more sense to package these tools for someone to install on their collection of oddball equipment, and assist in the debugging/testing?

    Where's the ARM, MIPS, and SH?
    • Re:This is awesome (Score:5, Insightful)

      by Meshach (578918) on Sunday June 05, 2005 @12:12PM (#12729520)
      But it can't catch everything...
      But that is not the point of automated testing. As a member of a qa team who is developing automated tests I get comments like that every day.

      Automated tests are not intended to catch everything or test strange permutations of pre-conditions. Their purpose is to provide a mechanism for verifying that a build satisfies the basic requirements of the project.

      More exotic configs need to be tested manually as usual but automated tests can provide a "failsafe" just in case a basic part of the build is broken.
      • by xant (99438) on Sunday June 05, 2005 @03:01PM (#12730389) Homepage
        Reliable, repeatable testing is a great way to prevent fixes in one area from causing bugs in another. When I fix A, I generally only test A manually. I don't test every other conceivable code path, even though my fix for A might well impact them.

        An automated test for B will catch regressions caused by my fix in A, making it harder to backslide. Backsliding is very expensive because bugs are far removed from their cause. If an automated test sees that changes in A caused a regression in B, the cause is immediately obvious.
      • Automated tests are not intended to catch everything or test strange permutations of pre-conditions. Their purpose is to provide a mechanism for verifying that a build satisfies the basic requirements of the project.

        Isn't that what a compiler is for? ;)
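The point made earlier in this sub-thread, that an automated test for B catches a regression introduced by a fix for A, can be shown with a toy example. Everything here is invented for illustration: `parse_size`, `buffer_bytes`, and `block_count` are hypothetical names and have nothing to do with the kernel.

```python
# Hypothetical example: all functions here are invented for illustration.
import unittest


def parse_size(text):
    """Shared helper. The 'fix for A' added support for a trailing 'k'."""
    text = text.strip().lower()
    if text.endswith("k"):
        return int(text[:-1]) * 1024
    return int(text)


def buffer_bytes(setting):
    """Code path A: the one that was just fixed."""
    return parse_size(setting)


def block_count(setting):
    """Code path B: untouched by the fix, but it shares the helper."""
    return parse_size(setting) // 512


class RegressionTests(unittest.TestCase):
    def test_a_accepts_kilobyte_suffix(self):
        self.assertEqual(buffer_bytes("4k"), 4096)

    def test_b_still_counts_plain_bytes(self):
        # Fails right away if a change made for A alters plain-number parsing.
        self.assertEqual(block_count("2048"), 4)


# Run the suite directly so the result object can be inspected afterwards.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(RegressionTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Someone fixing A would only try `buffer_bytes` by hand; the second test is what notices if the shared helper changed behaviour for B.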

    • I agree with jnelson4765, new builds would be well served to be tested on a great many machines with a wide variety of hardware setups.

      Who should map the hardware testing platforms? I don't know, but I do know that if the new kernel builds are tested for a generic group of hardware and released, then other testers report on their tests using hardware X, you would end up with a relatively quick listing of a new build against many variants of hardware. Published correctly, it would allow people to search for
    • Unfortunately, organizing that kind of oddball testing would be a management nightmare unless you want to go out and collect all of the hardware. Remember, some people do post patches and whole driver releases without stepping inside of the kernel team's realm.

      The only real way to automate something like that would be a dummy load facility. Some software which would emulate the hardware being in place. Something conceptually similar to that effect anyway.

      So then, for every driver for a device, you have a
    • Where's the ARM, MIPS, and SH?

      IBM doesn't sell any ARM, MIPS or SH-based systems. So, they don't test them.

      The Debian buildd system is an automatic building and semi-testing system for, of course, all the archs that Debian supports, and that includes ARM, MIPS, and SH.
    • Wouldn't it make more sense to package these tools for someone to install on their collection of oddball equipment, and assist in the debugging/testing?

      That's how the PostgreSQL build farm [pgbuildfarm.org] works. People with weird hardware [onlamp.com] apply to be added to the automated test farm. ARM, MIPS, PARISC, Alpha, PowerPC, Sparc, etc. are all represented well in the postgresql automated tests.

  • by kyllikki (88559) on Sunday June 05, 2005 @11:54AM (#12729411) Homepage

    ARM Linux has had something similar in Kautobuild [simtec.co.uk] for some time.

    Although the testing and building is limited to the ARM platform.

    The site also has a who's who that's worth looking at ;-)

  • News Flash (Score:5, Informative)

    by sirReal.83. (671912) on Sunday June 05, 2005 @12:02PM (#12729464) Homepage
    Red Hat (and probably Novell/SuSe, since they use over one thousand kernel patches) runs a myriad of tests on each of its own kernel builds nightly - and has been doing so for years. On more than just the 3 architectures covered by this test.

    That said, pushing tests upstream is a great idea. Just not revolutionary or anything.
    • News Flash #2:

      Redhat has several engineers that *are* upstream.
    • Man, I wish they'd test Fedora kernel releases on their test farm. Of a dozen different machines I've run 2.6 Fedora kernel releases on, I've lost 1394 on one, USB on another, the hardware clock on a third, parallel port probing on a fourth, serial ports on a fifth, and the Compaq Smart Array on the sixth.

      The other six machines seem OK. But that's a 50% buggered rate from various flavors of 2.6 upgrades, mostly from nightly 'yum update's. These are all IBM, Compaq, HP, and Dell machines, so somebody's
  • Long uptimes (Score:5, Interesting)

    by rice_burners_suck (243660) on Sunday June 05, 2005 @12:02PM (#12729465)
    This is a very smart system. The Samba team uses something very similar. The key to finding regressions with this method is to create tests for every piece of functionality, and to integrate it with the rest of the testing suite, so that each function of the kernel will be continuously tested. For new features, it is preferable to create these tests as the features are being coded. For existing millions of lines of code, it is necessary for some brave souls to go in and create these tests.

    I hope they are using code from the Linux testing suite. That piece of work has already formed a nice set of tests. Also, I hope that the kernel is automatically built with many different combinations of options. And with time, I hope this will become better. The more tests, with the more hardware configurations, with the more kernel configurations, with the more types of input data (including many imaginative forms of incorrect input data to test that the kernel handles it gracefully and thwarts attacks based on such methods), the better quality we will have in the kernel, and it is likely that Linux will be unmatched in quality, stability, efficiency (well, maybe not efficiency necessarily), and long uptimes.

  • by moviepig.com (745183) on Sunday June 05, 2005 @12:06PM (#12729485) Homepage
    With an automated test suite, what happens when a class of bug is discovered to be untested-for? Presumably, the suite is modified to detect it. Then, is the resulting new suite itself subjected to an automated test suite? And, then...[divide-by-zero error...]
  • by blixel (158224) on Sunday June 05, 2005 @12:09PM (#12729506)
    Does this mean we'll get back to 2.6.x releases? Instead of a new version of 2.6.x being released as 2.6.x.x every third day?
  • by DruggedBunny (703795) on Sunday June 05, 2005 @12:23PM (#12729569) Homepage

    Martin Bligh announced this yesterday, running on top of IBM's internal test automation system.

    Hope he doesn't fall off and hurt himself.

  • I got to work on part of this system, which IBM calls Autobench, for my senior project at PSU. The system is a highly configurable framework which can download, compile, and run various benchmarks and profilers (for example while compiling a kernel). It's all centrally administered, so IBM can run a battery of tests on a variety of different machines at once.

    I think Martin Bligh said that IBM has been using this for a while now, automatically downloading kernels upon release and testing them. The new thin
  • needs work! The latest builds all failed!
  • Years later and finally it is getting some *basic* QA testing done! What will they think of next!
    • I'd expect the community to start advocating unit testing, an agile development practice, at some point to increase the reliability of code before it is even merged into the nightly builds.

      I realize that this is not the same as testing the entire package on dissimilar hardware like he is doing here. For instance, there are bound to be a few issues when developers of code and its underlying code base both submit updates the same evening. IMHO, it'd especially help new developers if there existed unit tests

    • Individual distros have been doing this for years. Red Hat is one company that is known for its extensive testing of the kernel (as well as many other OSS projects). Don't use a vanilla kernel if you're running a production environment.
      Regards,
      Steve
  • One of the main goals appears to be whether the kernel builds or not. I shouldn't have to tell slashdot that build errors are among the most trivial of OS programming errors. They certainly exist, as the chart shows, but whoever is in charge of this project has a long way to go, by adding real tests of functionality. Consider it job security ;)
    • For one, did you actually bother to look at the results at all, and what tests are being run, and published?

      For another, this is only the tip of the iceberg as to what can be done, but I'm not going to lock whatever I have now in some dingy dungeon until it's "finished". What's there is useful, albeit incomplete. Testing is *never* complete.

      The main goal, as you put it, is to improve the quality of the linux kernel. If we can ensure the kernel builds, boots, and runs basic tests ... in a fully automated wa
  • I think the PostgreSQL buildfarm [pgbuildfarm.org] is one of the coolest ones I've seen. It's distributed across a bunch of volunteer-run machines representing a broader selection of architectures than most any other automated-test projects I'm aware of. A nice article on it can be found here [onlamp.com]

    Any other projects out there with similar transparency in their automated testing?

  • NetBSD has about the same thing - compiling of the whole operating system (kernel, userland, X) for ~50 platforms. Logs are available [netbsd.org] for developers to fix things.

    - Hubert
