Linux Kernel Surpasses 10 Million Lines of Code

Posted by timothy
from the nice-round-figures dept.
javipas writes "A simple analysis of the latest version of the Linux kernel (a Git checkout) reveals that its source tree surpasses 10 million lines, but note: this number includes blank lines, comments, and text files. A deeper analysis with the SLOCCount tool gives the real number of pure code lines: 6,399,191, with 96.4% of them written in C and 3.3% in assembler. The number clearly grows with each new version of the kernel, and new versions seem to be released roughly every 90 days."
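
The gap between "10 million lines" and "6,399,191 pure code lines" comes from how lines are classified. SLOCCount counts a physical source line as code only if it contains at least one character that is neither whitespace nor part of a comment. The sketch below is a simplified, hypothetical counter in that spirit, not SLOCCount's actual implementation: it assumes C-style `/* */` and `//` comments and does not recognize comment markers inside string or character literals.

```python
def classify_c_lines(source):
    """Classify each physical line of C source as blank, comment-only, or code.

    Simplified: comment markers inside string or character literals are not
    handled, so a line such as  puts("/*");  will be misclassified.
    """
    counts = {"physical": 0, "blank": 0, "comment": 0, "code": 0}
    in_block_comment = False  # carries over between lines for /* ... */ spans
    for line in source.splitlines():
        counts["physical"] += 1
        has_code = False
        i = 0
        while i < len(line):
            if in_block_comment:
                end = line.find("*/", i)
                if end == -1:
                    break  # the block comment continues onto the next line
                in_block_comment = False
                i = end + 2
            elif line.startswith("/*", i):
                in_block_comment = True
                i += 2
            elif line.startswith("//", i):
                break  # the rest of the line is a comment
            else:
                if not line[i].isspace():
                    has_code = True
                i += 1
        if has_code:
            counts["code"] += 1  # counts toward physical SLOC
        elif not line.strip():
            counts["blank"] += 1
        else:
            counts["comment"] += 1
    return counts


sample = """#include <stdio.h>

/* A block comment
   spanning two lines */
int main(void) {  // trailing comment
    return 0; /* inline */
}
"""

print(classify_c_lines(sample))
# {'physical': 7, 'blank': 1, 'comment': 2, 'code': 4}
```

Lines that mix code and a trailing comment count as code, which is why the comment-only total understates how much commentary a tree actually contains.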

  • Isn't that normal? (Score:4, Interesting)

    by arizwebfoot (1228544) * on Wednesday October 22, 2008 @12:34PM (#25471019)
    That the line count increases with each new version unless you are starting from scratch?

    --
    Oh Well, Bad Karma and all . . .
    • by jd (1658) <<moc.oohay> <ta> <kapimi>> on Wednesday October 22, 2008 @01:13PM (#25471721) Homepage Journal

      Yes, but it can go down with optimizations and refactoring (finding duplicated code and pushing it into a function or macro, for example) and with eliminating dead code. Ideally, code size should be asymptotic to an optimal size. As you approach the optimal size, more and more of what you need to do is already available to you. As you approach the limit, the amount of special-case logic and hardcoding approaches zero, and the amount of data-driven logic approaches 100%. Unfortunately, as you approach the limit, the performance must drop as you've now abstracted so far that your code becomes essentially a virtual machine on which your data runs. Simulating a computer is always going to be slower than actually using the real computer directly. In most cases, this is considered "acceptable" because your virtual machine is simply too advanced for any physical hardware to support at this time. (There is also the consideration of code changes, but as you approach the limit, your changes will largely be to the data and not to the codebase. At the limit, you will change the codebase only when changing the hardware, so if you could hardwire the code, it would not impact maintenance at all. All the maintenance you could want to do would be at the data level, given this level of abstraction.)

      Linux is clearly nowhere near the point of being that abstract, although some components are probably getting close. It would be interesting to see, even if it could only be done by simulation, what would happen if you moved Linux' VMM into an enlarged MMU, or what would happen if an intelligent hard drive supported Linux' current filesystem selection and parts of the VFS layer. Not as software running on a CPU, but as actual hard-wired logic. Software is just a simulation of wiring, so you can logically always reverse the process. Given that Linux has a decent chunk of the server market, and the server market is less concerned with cost than with high performance, high reliability and minimal physical space, it is possible (unlikely but possible) that there will eventually be lines of servers that use chips specially designed to accelerate Linux by this method.
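
The shift jd describes, from special-case logic toward data-driven logic, can be illustrated with a small before-and-after pair. The device types and sizes below are hypothetical placeholders, and the sketch shows only the structural change (new cases become table rows rather than new branches), not the performance trade-off discussed in the comment.

```python
# Hypothetical device types and block sizes, chosen only for illustration.

def block_size_hardcoded(device_type):
    # Special-case logic: supporting a new device means editing this function.
    if device_type == "type_a":
        return 512
    elif device_type == "type_b":
        return 2048
    elif device_type == "type_c":
        return 4096
    raise ValueError("unknown device type: " + device_type)


# Data-driven logic: the function stays fixed; a new device is a new table row.
BLOCK_SIZES = {"type_a": 512, "type_b": 2048, "type_c": 4096}


def block_size_table(device_type):
    try:
        return BLOCK_SIZES[device_type]
    except KeyError:
        raise ValueError("unknown device type: " + device_type) from None


print(block_size_table("type_b"))  # 2048
```

Both functions behave identically; the difference is where future changes land, which is the maintenance point the comment makes.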

      • Re: (Score:3, Insightful)

        by Abreu (173023)

        ...still, we should think about adding Asimov's three laws before we reach such an event horizon, no?

      • Re: (Score:3, Insightful)

        by RAMMS+EIN (578166)

        ``Unfortunately, as you approach the limit, the performance must drop as you've now abstracted so far that your code becomes essentially a virtual machine on which your data runs.''

        I don't see that. Not all abstraction makes things slower. In many cases, abstraction lets you write code at a higher level, while still compiling down to the code you would have written if working at a lower level.

  • by bubulubugoth (896803) on Wednesday October 22, 2008 @12:34PM (#25471025) Homepage

    And how many of these lines are for core functions (memory management, scheduler, etc.) and how many for drivers (USB, filesystems)?
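
One rough way to answer this is to tally lines per top-level directory of a kernel checkout, since the tree is organized by subsystem: `drivers/` for device drivers, `mm/` for memory management, `kernel/` for core code including the scheduler, and `fs/` for filesystems (which live outside `drivers/`). The sketch below assumes a local checkout at a path you supply and counts physical lines in `.c`, `.h`, and `.S` files; it does not strip blanks or comments the way SLOCCount does.

```python
import os
from collections import Counter


def lines_per_top_dir(root, extensions=(".c", ".h", ".S")):
    """Count physical lines in matching files, grouped by top-level directory under root."""
    totals = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        # Files directly under root are grouped under ".".
        top = "." if rel == os.curdir else rel.split(os.sep)[0]
        for name in filenames:
            if name.endswith(extensions):
                with open(os.path.join(dirpath, name), "rb") as f:
                    totals[top] += sum(1 for _ in f)
    return totals
```

Usage, assuming a hypothetical checkout in a directory named `linux`:

```python
for top, n in lines_per_top_dir("linux").most_common():
    print(top, n)
```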

  • Meh (Score:4, Funny)

    by alexborges (313924) on Wednesday October 22, 2008 @12:35PM (#25471053)

    AND???

    In other news, trees tend to grow up unless they tend to grow down or sideways. Sharks tend to eat anything they can, unless they are not hungry.

    Anonymous will beat me to FP for sure, unless they don't.

    • Re:Meh (Score:5, Funny)

      by V!NCENT (1105021) on Wednesday October 22, 2008 @01:06PM (#25471595)

      Yeah so!? Cars are also getting bigger and more complex over time, so Linux must be heading in the right direction!

      Did I just... ? Oh sh-

    • by bonch (38532)

      What happened is some bored tech author didn't have anything to write about, so they decided to do a Git checkout and count the lines of the Linux kernel, which would likely be over 10 million lines at this point if you include blank lines, comments, and text files. It's a completely meaningless story, especially since the actual code is just under 6.4 million lines, but it got them a Slashdot post and some ad views on their site.

      • "Actual" code? (Score:5, Insightful)

        by TuringTest (533084) on Wednesday October 22, 2008 @01:51PM (#25472295) Journal

        Comments are also code.

        If you only count as code what can be fed to the machine, you should look at the size of the compiled binary. Source code is meant to be read by *humans*, so comments do count. That's why the GPL requires them to be left in the files (the "preferred form" to edit), otherwise it wouldn't be source code.

        • Re: (Score:3, Interesting)

          by bonch (38532)

          Source code is meant to be read by a compiler. Comments are not code; they're documentation ignored by the compiler. By your standards, anything that makes source code human-readable should be counted as source code, including white space or even external documentation files!

  • Stolen code (Score:5, Funny)

    by CRCulver (715279) <crculver@christopherculver.com> on Wednesday October 22, 2008 @12:35PM (#25471055) Homepage
    Too bad 9,999,999 lines of that code were ripped off from SCO.
  • assembler? (Score:5, Informative)

    by TheRealMindChild (743925) on Wednesday October 22, 2008 @12:37PM (#25471077) Homepage Journal
    *cough*assembly*cough*

    "assembler" is the tool, not the language.
    • by fm6 (162816)

      As long as you're getting all usage Nazi, it's "assembly language", 'cause "assembly" is an adjective. But in informal usage, it's OK to leave off the noun and use the adjective as a noun. (I prefer to say "noun the adjective" just to piss off POS Nazis.) And as for confusing the language with the tool: WTFC? This is Slashdot, where lose lips sink looser ships!

  • by OrangeTide (124937) on Wednesday October 22, 2008 @12:38PM (#25471121) Homepage Journal

    Because we'd all like to know how many man-months something as big as the Linux kernel should take to implement. And laugh at the huge price tag SLOCCount will put on it.
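
SLOCCount's price tag comes from the Basic COCOMO model: person-months = 2.4 × KSLOC^1.05, schedule in months = 2.5 × person-months^0.38, and cost = effort × salary × overhead. The sketch below reproduces that arithmetic. The default salary ($56,286/year) and overhead (2.4) are assumed to match SLOCCount's defaults; check them against an actual SLOCCount report before relying on the numbers.

```python
def sloccount_style_estimate(sloc, annual_salary=56286.0, overhead=2.4):
    """Basic COCOMO effort, schedule, and cost (salary/overhead defaults are assumptions)."""
    ksloc = sloc / 1000.0
    effort_pm = 2.4 * ksloc ** 1.05                       # person-months
    schedule_months = 2.5 * effort_pm ** 0.38             # calendar months
    cost = effort_pm * (annual_salary / 12.0) * overhead  # dollars
    return {"person_months": effort_pm,
            "schedule_months": schedule_months,
            "cost": cost}


# Plug in the summary's SLOC figure.
estimate = sloccount_style_estimate(6_399_191)
print(round(estimate["person_months"] / 12), "person-years")
```

Because the exponent on KSLOC is slightly above 1, the estimated effort grows a little faster than the code size itself.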

  • by EraserMouseMan (847479) on Wednesday October 22, 2008 @12:40PM (#25471153)
    I'm a developer and was wondering what kind of testing is done to verify the code. Do they use unit testing? Regression testing?

    I'm just curious because keeping 6+ million lines of code almost completely bug free is pretty amazing.
  • by linuxmeepster (1383107) * on Wednesday October 22, 2008 @12:43PM (#25471193) Homepage
    It's significantly easier to hide a malicious backdoor inside a huge software project than a small one. Linux has already had a near miss [theregister.co.uk] back in 2003, when the CVS repository was compromised. Considering how many mission-critical applications run under Linux, there's a huge financial incentive to hide a backdoor somewhere in those 10 million lines.
    • by Microlith (54737) on Wednesday October 22, 2008 @01:08PM (#25471629)

      While Linux is huge, for a backdoor to be successful it would need to hit a huge number of systems. The majority of the kernel at this point tends to be drivers, not all of which are used in a given kernel.

      For it to be even remotely worthwhile, it'd have to be placed into something that was both heavily used AND given little attention. These two positions are almost mutually exclusive.

      Can anyone think of a place that would fall into these two categories? Even the more seemingly obscure parts of the kernel get attention fairly often and malicious changes wouldn't go unnoticed for long.

  • by Drakkenmensch (1255800) on Wednesday October 22, 2008 @12:44PM (#25471199)
    Now, where do we find a birthday cake with ten million candles?
  • by damn_registrars (1103043) <damn.registrars@gmail.com> on Wednesday October 22, 2008 @12:47PM (#25471235) Homepage Journal

    96.4% of them developed in C, and 3.3% using assembler

    That leaves .3% that is unaccounted for. What was it written in?

    • Re: (Score:2, Funny)

      by jd (1658)
      0.1% was written in APL, and the remaining 0.2% was in SNOBOL.
    • by glavenoid (636808) on Wednesday October 22, 2008 @01:02PM (#25471533) Journal
      Makefiles, build scripts, etc., perhaps?
    • Re: (Score:2, Funny)

      by mx119 (1374607)

      That leaves .3% that is unaccounted for. What was it written in?

      Asgard?

    • It's a GUI interface in Visual Basic to see if they can track an IP address. Yes, that does take approximately 3000 lines of code; granted, 2982 of them are comments explaining why VB was chosen.
    • Re: (Score:3, Informative)

      by gravis777 (123605)

      From Wikipedia:

      Programming languages
      Linux is written in the version of the C programming language supported by GCC (which has introduced a number of extensions and changes to standard C), together with a number of short sections of code written in the assembly language (in GCC's "AT&T-style" syntax) of the target architecture. Because of the extensions to C it supports, GCC was for a long time the only compiler capable of correctly building Linux. In 2004, Intel claimed to have modified the kernel so that its C compiler also was capable of compiling it.[24]
      Many other languages are used in some way, primarily in connection with the kernel build process (the methods whereby the bootable image is created from the sources). These include Perl, Python, and various shell scripting languages. Some drivers may also be written in C++, Fortran, or other languages, but this is strongly discouraged. Linux's build system only officially supports GCC as a kernel and driver compiler.

      So I am assuming that the answer is Perl, Python, various scripting languages, and Fortran.

  • by apathy maybe (922212) on Wednesday October 22, 2008 @12:47PM (#25471245) Homepage Journal

    May I suggest that large parts of this shouldn't be in the kernel at all? That there should be independent sub-systems so that in the event of a crash or panic, the entire OS doesn't come tumbling down?

    So that badly written drivers (especially graphic card drivers) don't affect the stability of the entire system?

    May I suggest that flame wars are good and that EMACS is also bloated?

    (And lots of other folks have already talked about the bad metric that lines of code is...)
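
The isolation asked for here is the idea behind microkernel designs, where drivers run outside the kernel so a faulty one can be restarted instead of taking the system down. As a rough user-space analogy only (not how Linux or any particular microkernel works), a supervisor can run a hypothetical component in its own process, where a crash shows up as a nonzero exit status rather than an exception in the supervisor.

```python
import subprocess
import sys


def run_isolated(component_code):
    """Run a component in its own interpreter process and return its exit status.

    If the component crashes, only the child process dies; the caller just
    sees a nonzero return code.
    """
    result = subprocess.run([sys.executable, "-c", component_code],
                            capture_output=True)
    return result.returncode


def supervise(component_code, max_restarts=2):
    """Start a component, restarting it after each failure up to max_restarts times."""
    statuses = []
    for _ in range(1 + max_restarts):
        status = run_isolated(component_code)
        statuses.append(status)
        if status == 0:
            break  # the component ran cleanly; no restart needed
    return statuses


print(supervise("1 / 0"))  # three nonzero statuses; the supervisor keeps running
```

The cost of this design is the process boundary itself: every interaction with an isolated component has to cross it, which is the usual performance argument against moving drivers out of the kernel.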

  • This raises the question: will Linus run out of magic powder?

  • I Wonder? (Score:4, Interesting)

    by TheNecromancer (179644) on Wednesday October 22, 2008 @12:56PM (#25471417)

    I wonder what the breakdown is of the almost 4 million lines that were omitted from the count as blank lines, comments, etc. I've always said that commenting your code is a very good thing to do, so it would be interesting to see what percentage of this is comments, as opposed to blank lines (which aren't a bad thing for readability).

  • This story (Score:3, Funny)

    by bonch (38532) on Wednesday October 22, 2008 @01:01PM (#25471503)

    Basically, this story is "Linux kernel surpasses 10 million lines of code! Just kidding."

    • I thought the same thing.
      Headline: Linux Kernel Surpasses 10 Million Lines of Code
      Summary: Actually, it's just over 6 Million
  • by qoncept (599709) on Wednesday October 22, 2008 @01:01PM (#25471511) Homepage
    Funny that the summary calls attention to the fact that the number of lines includes comments and whitespace without any mention of how worthless lines of code are as a metric. Someone could easily go in and add or remove newlines wherever they wanted and, without changing a bit of code, make it 50 million or 50 thousand.
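
The point about newlines is easy to demonstrate: reformatting the same function changes its physical line count but not its token sequence. The crude tokenizer below (words and numbers, or any single non-space character) is an assumption for illustration, not a real C lexer; counting tokens is less sensitive to layout but is still a crude size metric.

```python
import re

# Rough tokenizer: runs of word characters, or any single non-space character.
TOKEN = re.compile(r"\w+|\S")


def physical_lines(src):
    return len(src.splitlines())


def tokens(src):
    return TOKEN.findall(src)


compact = "int add(int a, int b) { return a + b; }\n"
spread = """int
add(int a,
    int b)
{

    return a + b;
}
"""

print(physical_lines(compact), physical_lines(spread))  # 1 7
print(tokens(compact) == tokens(spread))                # True
print(len(tokens(compact)))                             # 16
```
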
  • Sorry everyone, that was me! Silly push %ebp ... Apologies to all...

  • by hey! (33014) on Wednesday October 22, 2008 @01:22PM (#25471847) Homepage Journal

    This article summary is not very informative. The very least they could do is tell us which ten million lines of code Linux has surpassed.