
Linux Kernel Surpasses 10 Million Lines of Code

javipas writes "A simple analysis of the latest version (a Git checkout) of the Linux kernel reveals that the total number of lines of its source code surpasses 10 million, but note: this number includes blank lines, comments, and text files. A deeper analysis with the SLOCCount tool gives the real number of pure code lines: 6,399,191, with 96.4% of them written in C and 3.3% in assembler. The count grows steadily with each new version of the kernel, which seems to be released roughly every 90 days."
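
For a sense of what separating "pure code lines" from the rest involves, here is a minimal C sketch of a per-line classifier. It is a simplified, hypothetical illustration; the real SLOCCount applies per-language rules and handles corner cases (such as comment markers inside string literals) that this toy version ignores.

    #include <ctype.h>
    #include <stdio.h>
    #include <string.h>

    /* Toy line classifier for C source read on stdin: counts blank,
     * comment-only, and code lines. Tracks block comments across
     * lines, but ignores comment markers inside string literals. */
    int main(void)
    {
        char line[4096];
        long blank = 0, comment = 0, code = 0;
        int in_block = 0; /* currently inside a block comment */

        while (fgets(line, sizeof line, stdin)) {
            char *p = line;
            while (isspace((unsigned char)*p))
                p++;
            if (in_block) {
                comment++;
                if (strstr(p, "*/"))
                    in_block = 0;
            } else if (*p == '\0') {
                blank++;
            } else if (p[0] == '/' && p[1] == '/') {
                comment++;
            } else if (p[0] == '/' && p[1] == '*') {
                comment++;
                if (!strstr(p + 2, "*/"))
                    in_block = 1;
            } else {
                code++;
            }
        }
        printf("blank=%ld comment=%ld code=%ld\n", blank, comment, code);
        return 0;
    }
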
This discussion has been archived. No new comments can be posted.

  • Isn't that normal? (Score:4, Interesting)

    by arizwebfoot ( 1228544 ) * on Wednesday October 22, 2008 @01:34PM (#25471019)
    That the line count increases with each new version unless you are starting from scratch?

    --
    Oh Well, Bad Karma and all . . .
  • by bubulubugoth ( 896803 ) on Wednesday October 22, 2008 @01:34PM (#25471025) Homepage

    And how many of these lines are for core functions (memory management, scheduler, etc.) and how many for drivers (USB, filesystems)?

  • Re:Lines of Code (Score:5, Interesting)

    by theaveng ( 1243528 ) on Wednesday October 22, 2008 @01:40PM (#25471145)

    I used to have GEOS on my Commodore 64. I have absolutely no idea how many lines of code it used, but it could squeeze itself into just 20 kilobytes of RAM, and yet it had lots of functionality (as good as an 80s-era Mac). I consider "how much RAM it occupies" to be a FAR more useful metric.

    I would love to see someone develop an OS that followed a similar philosophy of using as little RAM as possible.

  • by EraserMouseMan ( 847479 ) on Wednesday October 22, 2008 @01:40PM (#25471153)
    I'm a developer and was wondering what kind of testing is done to verify the code. Do they use unit testing? Regression testing?

    I'm just curious because keeping 6+ million lines of code almost completely bug free is pretty amazing.
  • by linuxmeepster ( 1383107 ) * on Wednesday October 22, 2008 @01:43PM (#25471193) Homepage
    It's significantly easier to hide a malicious backdoor inside a huge software project than a small one. Linux has already had a near miss [theregister.co.uk] back in 2003, when the CVS repository was compromised. Considering how many mission-critical applications run under Linux, there's a huge financial incentive to hide a backdoor somewhere in those 10 million lines.
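
    For reference, the 2003 near miss was tiny. As reported at the time, the attacker slipped roughly the following two lines into sys_wait4(); the snippet below is reconstructed from contemporary reports, not copied from the repository:

        /* "current->uid = 0" is an assignment, not a comparison: the
         * branch body never runs (the expression evaluates to 0), but
         * the calling process has been silently given root. */
        if ((options == (__WCLONE|__WALL)) && (current->uid = 0))
                retval = -EINVAL;

    A reviewer skimming 10 million lines can easily read that as an ordinary flag check with a typo.
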
  • by apathy maybe ( 922212 ) on Wednesday October 22, 2008 @01:47PM (#25471245) Homepage Journal

    May I suggest that large parts of this shouldn't be in the kernel at all? That there should be independent sub-systems so that in the event of a crash or panic, the entire OS doesn't come tumbling down?

    So that badly written drivers (especially graphic card drivers) don't affect the stability of the entire system?

    May I suggest that flame-wars are good and that EMACS is also bloated?

    (And lots of other folks have already talked about what a bad metric lines of code is...)

  • by vally_manea ( 911530 ) on Wednesday October 22, 2008 @01:52PM (#25471353) Homepage
    There is at least the Linux Test Project http://ltp.sourceforge.net/ [sourceforge.net] ; I see a lot of unit testing, regression testing, and stress tests.
  • I Wonder? (Score:4, Interesting)

    by TheNecromancer ( 179644 ) on Wednesday October 22, 2008 @01:56PM (#25471417)

    I wonder what the breakdown is of the almost 4 million lines that were omitted from the count: blank lines, comments, etc.? I've always said that commenting your code is a very good thing to do, so it would be interesting to see what percentage of this is comments, as opposed to blank lines (which aren't a bad thing for readability either).

  • by Anonymous Coward on Wednesday October 22, 2008 @01:57PM (#25471443)

    If you think that's amazing, check out one of the BSDs sometime. In particular, look at NetBSD's codebase. Compared to the hodge-podge that is the Linux kernel in which it's very obvious sometimes that it's being thrown together by multiple developers, the BSDs' cohesive source code is like looking at the Mona Lisa. No wonder Linux has a reputation for being used by sugar-charged 14-year-olds who want to appear cool to their Windows-using friends by installing EZ-mode Ubuntu and suddenly thinking they're sysadmins because of it. Meanwhile, the BSDs have seasoned UNIX developers with experience spanning decades, working on a codebase with roots in academia in which solid algorithms and peer review rule the day.

    Benchmarks show FreeBSD 7 is faster than the Linux 2.6 kernel. It's pretty obvious why when comparing their source code.

  • Re:Um (Score:3, Interesting)

    by Yfrwlf ( 998822 ) on Wednesday October 22, 2008 @02:06PM (#25471605)
    What would be better is a kernel where you could simply include or exclude certain modules without any need for compilation, making the kernel truly modular, and hot-swap them in or out based on your needs. That would make the kernel much more powerful, and also useful for "normal" users/admins who might not want to mess with compiling. But I'm sure my argument will be slapped at by some leave-things-be, get-off-my-lawn fanboy who hates the idea of scary new features like true/better modularity.

    Save a tree. Let the actual devs do the compiling unless someone really, actually wants to see the code.
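
    For what it's worth, the kernel already supports runtime-loadable modules; inserting or removing one requires no recompilation of the kernel itself. Here is a minimal, hypothetical "hello" module sketch (it builds against the kernel headers with a standard kbuild Makefile containing "obj-m += hello.o", and loads with insmod/rmmod):

        #include <linux/init.h>
        #include <linux/kernel.h>
        #include <linux/module.h>

        MODULE_LICENSE("GPL");

        /* Runs when the module is inserted (insmod hello.ko). */
        static int __init hello_init(void)
        {
                printk(KERN_INFO "hello: loaded\n");
                return 0;
        }

        /* Runs when the module is removed (rmmod hello). */
        static void __exit hello_exit(void)
        {
                printk(KERN_INFO "hello: unloaded\n");
        }

        module_init(hello_init);
        module_exit(hello_exit);
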
  • by Kjella ( 173770 ) on Wednesday October 22, 2008 @02:09PM (#25471655) Homepage

    From what I've gathered, pretty damn near "all of the above". One of the nicer things about being a high-profile open source project is that a lot of people are interested in trying out automated code analysis on it, be it unit testing, regression testing, static analysis, dynamic analysis, or whatever. And having a quality nazi on top helps. Here's what happened a few days ago on the dri-devel list, from Linus:

    "Grr.

    This whole merge series has been full of people sending me UNTESTED CRAP.

    So what's the excuse _this_ time for adding all these stupid warnings to
    my build log? Did nobody test this? (...)"

    In many places, you can do a pretty lousy job and still collect a paycheck. Something tells me you won't get many patches in the kernel that way.

  • by jd ( 1658 ) <imipak@yahoGINSBERGo.com minus poet> on Wednesday October 22, 2008 @02:13PM (#25471721) Homepage Journal

    Yes, but it can go down with optimizations and refactoring (finding duplicated code and pushing it into a function or macro, for example) and with eliminating dead code. Ideally, code size should be asymptotic to an optimal size. As you approach the optimal size, more and more of what you need to do is already available to you.

    As you approach the limit, the amount of special-case logic and hardcoding approaches zero, and the amount of data-driven logic approaches 100%. Unfortunately, as you approach the limit, the performance must drop, as you've now abstracted so far that your code becomes essentially a virtual machine on which your data runs. Simulating a computer is always going to be slower than actually using the real computer directly. In most cases, this is considered "acceptable" because your virtual machine is simply too advanced for any physical hardware to support at this time.

    (There is also the consideration of code changes, but as you approach the limit, your changes will largely be to the data and not to the codebase. At the limit, you will change the codebase only when changing the hardware, so if you could hardwire the code, it would not impact maintenance at all. All the maintenance you could want to do would be at the data level, given this level of abstraction.)

    Linux is clearly nowhere near the point of being that abstract, although some components are probably getting close. It would be interesting to see, even if it could only be done by simulation, what would happen if you moved Linux's VMM into an enlarged MMU, or what would happen if an intelligent hard drive supported Linux's current filesystem selection and parts of the VFS layer, not as software running on a CPU but as actual hard-wired logic. Software is just a simulation of wiring, so you can logically always reverse the process. Given that Linux has a decent chunk of the server market, and the server market is less concerned with cost than with high performance, high reliability, and minimal physical space, it is possible (unlikely, but possible) that there will eventually be lines of servers that use chips specially designed to accelerate Linux by this method.
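
    To make the "data-driven" idea concrete, here is a hypothetical miniature of that refactoring in C: a chain of special cases collapses into a table plus one lookup loop, and from then on maintenance means editing data rather than logic.

        #include <stdio.h>
        #include <string.h>

        /* Hardcoded style (the "before") would be one branch per device:
         *     if (strcmp(dev, "ttyS0") == 0)      return 9600;
         *     else if (strcmp(dev, "ttyS1") == 0) return 115200;
         *     ...
         * Data-driven style (the "after"): the code below never changes;
         * supporting a new device means adding a row to the table. */
        struct dev_conf {
                const char *name;
                int baud;
        };

        static const struct dev_conf table[] = {
                { "ttyS0",   9600   },
                { "ttyS1",   115200 },
                { "ttyUSB0", 57600  },
        };

        static int lookup_baud(const char *dev)
        {
                size_t i;

                for (i = 0; i < sizeof table / sizeof table[0]; i++)
                        if (strcmp(table[i].name, dev) == 0)
                                return table[i].baud;
                return -1; /* unknown device */
        }

        int main(void)
        {
                printf("%d\n", lookup_baud("ttyS1")); /* prints 115200 */
                return 0;
        }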

  • Re:Lines of Code (Score:5, Interesting)

    by rumblin'rabbit ( 711865 ) on Wednesday October 22, 2008 @02:14PM (#25471727) Journal
    A better metric is the number of semicolons. Thus this

    for (int i = 0; i < n; i++) a[i] = b[i];

    is the same length as this...

    for (int i = 0;
    i < n;
    i++)
    {
    a[i] = b[i];
    }
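
    A counter in that spirit is nearly a one-liner; here is a naive C sketch (it knowingly counts the semicolons inside strings, comments, and for-loop headers too):

        #include <stdio.h>

        /* Count semicolons on stdin as a rough "statement count".
         * Naive on purpose: no handling of string literals, comments,
         * or the two extra semicolons in every for(;;) header. */
        int main(void)
        {
                long n = 0;
                int c;

                while ((c = getchar()) != EOF)
                        if (c == ';')
                                n++;
                printf("%ld\n", n);
                return 0;
        }

    Fed either version of the loop above, it prints 3, which is exactly the point.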

  • by Anonymous Coward on Wednesday October 22, 2008 @02:17PM (#25471765)
    If something has worked flawlessly for you, you might consider it to be "bug free", even if it has tons of bugs that affect plenty of other people.
  • by Ungrounded Lightning ( 62228 ) on Wednesday October 22, 2008 @02:23PM (#25471853) Journal

    The better metric would be how many Libraries of Congress the kernel is.

    Perhaps better would be the number of times the size of the Version 6 (V6) Unix kernel.

    That's the one that the University of Waterloo printed as a textbook, half of a two-book set. (The other book was the OS course text using it as the example.) They printed it at 50 lines per page column, added (lots of) whitespace, and adjusted comments so routines fell on nice page boundaries. Even padded this way, it came out to a total of ten thousand lines (of which I think two thousand were still in assembly code). Just right for one person to maintain full-time, by the then-current rule of thumb.

    So the linux kernel is a thousand times the size of that (whitespace-padded) version of the Unix kernel.

  • by PeterPlan ( 1304775 ) on Wednesday October 22, 2008 @02:23PM (#25471863)
    http://www.youtube.com/watch?v=L2SED6sewRw [youtube.com] Your question is answered in this talk. Briefly: The only way to test a kernel is to actually use it.
  • by hierophanta ( 1345511 ) on Wednesday October 22, 2008 @02:32PM (#25472009)
    I believe a more appropriate measure of the "bloat" (i.e., useless functions) or the size of any software package is function point analysis:

    http://en.wikipedia.org/wiki/Function_point [wikipedia.org]

    http://www.softwaremetrics.com/fpafund.html [softwaremetrics.com]

    The lines-of-code metric has long been considered an inadequate measure of software cost, complexity, or size. Here is an article on why:
    http://www.creativyst.com/Doc/Articles/Mgt/LOCMonster/LOCMonster.htm [creativyst.com]

    But LOC is without question one of the easiest measurements (aside from total package size in bytes, which is nearly as uninformative).
  • Re:Tell us Bill (Score:3, Interesting)

    by Poltras ( 680608 ) on Wednesday October 22, 2008 @02:36PM (#25472075) Homepage

    Kernel only, or with the included software counted too? Because if you count included software, it doesn't make for a fair comparison. And note that the whole graphics subsystem is in there as well, so add X11 to the lot... but whatever; comparing the number of lines of code is akin to comparing the number of bolts in a car.

    It's interesting information nonetheless. Divide the number of bugs by the number of LoC and you get a better-than-industry ratio in both cases. Which says a lot.

  • by Anonymous Coward on Wednesday October 22, 2008 @03:12PM (#25472595)
    I downloaded the latest 2.6.27.2 tarball, untarred it, removed all except the "x86" folder from under the "arch" folder and ran this in the source root:

    find . -exec grep -v "^$\|^\*\|^#" {} \; | wc -l

    to exclude blank lines, lines starting with "#" for comments, and lines starting with "*", again for comments. I realize that this excludes the "#include" statements, but their number should be negligible in the overall count.
    The result is 6,022,957.
  • Re:"Actual" code? (Score:3, Interesting)

    by bonch ( 38532 ) on Wednesday October 22, 2008 @04:02PM (#25473383)

    Source code is meant to be read by a compiler. Comments are not code; they're documentation ignored by the compiler. By your standards, anything that makes source code human-readable should be counted as source code, including white space or even external documentation files!

  • Re:Meh (Score:4, Interesting)

    by Dutch Gun ( 899105 ) on Wednesday October 22, 2008 @05:28PM (#25474801)

    That reminds me of a story about my early programming attempts:

    My first computer was an Apple II+, and I learned AppleBASIC from a book that appeared to be written to teach kids how to program*. I was writing a graphical maze-crawler fantasy game (a bit like Wizardry, but much more primitive, of course). I knew nothing of data-driven programming; everything was hard-coded, every room its own function, and so on. AppleBASIC used line numbers, and in laying out the dungeon I started incrementing rooms by 1000 to make sure I had enough space.

    Sure enough, I ran into a strange issue when I tried to create a room at line number 66000. Through trial and error, I eventually determined that the maximum line number was 65535. I couldn't figure out why they would use such a crazy number as the maximum limit.

    Years later, when learning about the binary nature of computers, I saw that number again, and *click*. So, I'm not sure if 640K lines are enough, but 64K lines certainly were not for me!

    * If anyone remembers what the name of that book was, I'd be in your debt. I think it had a red cover, and it had great little illustrations of a robot that made it very kid-friendly. That book launched me on my current career path. I now program games for a living, and would love to find an old copy.

  • Re:"Actual" code? (Score:2, Interesting)

    by joeman3429 ( 1288786 ) on Wednesday October 22, 2008 @06:43PM (#25475745)
    Then you might as well take the lines of assembled code as the real count.
  • Re:Lines of Code (Score:3, Interesting)

    by smellotron ( 1039250 ) on Wednesday October 22, 2008 @07:52PM (#25476523)
    Check out cyclomatic complexity [wikipedia.org]. It basically measures the number of different execution paths you can go through in a given function. It's not quite what you're looking at, but it's close. It's also closely related to the nesting depth of conditionals/loops, which is a good way to eyeball conceptual "size".
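
    To make that concrete, here is a hypothetical C function with its decision points marked; by the common shortcut, cyclomatic complexity is the number of decision points plus one:

        #include <stdio.h>

        /* Decisions: the for condition, the if, and the else if, so
         * cyclomatic complexity = 3 + 1 = 4. Note that it tracks the
         * branching you can eyeball, not the raw line count. */
        static int classify(const int *a, int n)
        {
                int i, pos = 0, neg = 0;

                for (i = 0; i < n; i++) {       /* decision 1 */
                        if (a[i] > 0)           /* decision 2 */
                                pos++;
                        else if (a[i] < 0)      /* decision 3 */
                                neg++;
                }
                return pos - neg;
        }

        int main(void)
        {
                int v[] = { 3, -1, 4, -1, -5 };

                printf("%d\n", classify(v, 5)); /* prints -1 */
                return 0;
        }
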
  • by symbolset ( 646467 ) on Wednesday October 22, 2008 @11:49PM (#25478179) Journal

    /* 3k lines of workaround for 8 lines of code. WTF were they thinking? */

    //This might work.

    //Blocks undocumented interface used only by WordPerfect.

    //Passes test. Ship it. I'm done. <Allchin>

Say "twenty-three-skiddoo" to logout.

Working...