Linux Kernel Surpasses 10 Million Lines of Code
Posted by
timothy
on Wed Oct 22, 2008 12:32 PM
from the nice-round-figures dept.
javipas writes "A simple analysis of the latest version (a Git checkout) of the Linux kernel reveals that the total number of lines across all its source files surpasses 10 million, but note that this number includes blank lines, comments, and text files. With a deeper analysis thanks to the SLOCCount tool, you can get the real number of pure code lines: 6.399.191, with 96.4% of them developed in C, and 3.3% using assembler. The number clearly grows with each new version of the kernel, which seems to be released approximately every 90 days."
Isn't that normal? (Score:4, Interesting)
--
Oh Well, Bad Karma and all . . .
Re:Isn't that normal? (Score:5, Interesting)
Yes, but it can go down with optimizations and refactoring (finding duplicated code and pushing it into a function or macro, for example) and with eliminating dead code. Ideally, code size should be asymptotic to an optimal size. As you approach the optimal size, more and more of what you need to do is already available to you. As you approach the limit, the amount of special-case logic and hardcoding approaches zero, and the amount of data-driven logic approaches 100%. Unfortunately, as you approach the limit, the performance must drop as you've now abstracted so far that your code becomes essentially a virtual machine on which your data runs. Simulating a computer is always going to be slower than actually using the real computer directly. In most cases, this is considered "acceptable" because your virtual machine is simply too advanced for any physical hardware to support at this time. (There is also the consideration of code changes, but as you approach the limit, your changes will largely be to the data and not to the codebase. At the limit, you will change the codebase only when changing the hardware, so if you could hardwire the code, it would not impact maintenance at all. All the maintenance you could want to do would be at the data level, given this level of abstraction.)
Linux is clearly nowhere near the point of being that abstract, although some components are probably getting close. It would be interesting to see, even if it could only be done by simulation, what would happen if you moved Linux's VMM into an enlarged MMU, or what would happen if an intelligent hard drive supported Linux's current filesystem selection and parts of the VFS layer. Not as software running on a CPU, but as actual hard-wired logic. Software is just a simulation of wiring, so you can logically always reverse the process. Given that Linux has a decent chunk of the server market, and the server market is less concerned with cost than with high performance, high reliability and minimal physical space, it is possible (unlikely but possible) that there will eventually be lines of servers that use chips specially designed to accelerate Linux by this method.
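To make the "data-driven" end of that spectrum concrete, here is a minimal sketch in Python. The room names, the `ROOMS` table and the `move`/`walk` helpers are all made up for illustration. The point is only that the layout lives in data while the code is one small generic "machine" that interprets it.

```python
# Hypothetical three-room map: the layout is data, not code.
ROOMS = {
    "entrance": {"north": "hall"},
    "hall": {"south": "entrance", "east": "vault"},
    "vault": {"west": "hall"},
}


def move(room: str, direction: str) -> str:
    """Return the room reached by going `direction`, or stay put if there is no exit."""
    return ROOMS[room].get(direction, room)


def walk(start: str, directions: list) -> str:
    """Apply a sequence of moves and return the final room."""
    room = start
    for direction in directions:
        room = move(room, direction)
    return room
```

Adding a room means adding a table entry; `move` and `walk` never change. The cost is the extra dictionary lookup on every step, which is the (tiny) version of the "virtual machine" overhead mentioned above.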
Parent
Re: so freaking what? (Score:5, Funny)
the real number of pure code lines: 6.399.191, with 96.4% of them developed in C, and 3.3% using assembler.
Personally I thought the news was that no one knows what 0.3% of the linux kernel is written in. THAT'S news! (I'm betting it's BASIC).
Parent
Re: so freaking what? (Score:5, Funny)
It's COBOL, that crap is still just everywhere.
Parent
Core functions vs Drivers? (Score:4, Interesting)
And how many of these lines are for core functions (memory management, scheduler, etc.) and how many for drivers (USB, filesystems)?
Re:Can this be converted into kloc ? (Score:5, Funny)
You could try:
DIVIDE SLOC BY 1000 GIVING KLOC.
Parent
Meh (Score:4, Funny)
AND???
In other news, trees tend to grow up unless they tend to grow down or sideways. Sharks tend to eat anything they can, unless they are not hungry.
Anonymous will beat me to FP for sure, unless they don't.
Re:Meh (Score:5, Funny)
Yeah so!? Cars are also getting bigger and more complex over time, so Linux must be heading in the right direction!
Did I just... ? Oh sh-
Parent
"Actual" code? (Score:5, Insightful)
Comments are also code.
If you only count as code what can be fed to the machine, you should look at the size of the compiled binary. Source code is meant to be read by *humans*, so comments do count. That's why the GPL requires them to be left in the files (the "preferred form" to edit), otherwise it wouldn't be source code.
Parent
Re:Meh (Score:4, Interesting)
That reminds me of a story about my early programming attempts:
My first computer was an Apple II+, and I learned AppleBASIC from a book that appeared to be written to teach kids how to program*. I was writing a graphical maze-crawler fantasy game (a bit like Wizardry, but much more primitive, of course). I knew nothing of data-driven programming, of course. Everything was hard-coded, every room a function, etc. AppleBASIC used line numbers, of course, and in laying out the dungeon, I started incrementing rooms by 1000 to make sure I had enough space.
Sure enough, I ran into a strange issue when I tried to create a room at line number 66000. Through trial and error, I eventually determined that the maximum line number was 65535. I couldn't figure out why they would use such a crazy number as the maximum limit.
Years later, when learning about the binary nature of computers, I saw that number again, and *click*. So, I'm not sure if 640K lines are enough, but 64K lines certainly were not for me!
* If anyone remembers what the name of that book was, I'd be in your debt. I think it had a red cover, and it had great little illustrations of a robot that made it very kid-friendly. That book launched me on my current career path. I now program games for a living, and would love to find an old copy.
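For the *click*: 65535 is the largest value that fits in an unsigned 16-bit integer (2^16 - 1). A small sketch, assuming the interpreter kept each line number in two bytes (the post doesn't say how it was actually stored); `fits_in_u16` and `pack_line_number` are illustrative names.

```python
import struct

MAX_U16 = 2**16 - 1  # largest unsigned 16-bit value: 65535


def fits_in_u16(line_number: int) -> bool:
    """True if the line number can be stored in two unsigned bytes."""
    return 0 <= line_number <= MAX_U16


def pack_line_number(line_number: int) -> bytes:
    """Pack a line number into two little-endian bytes.

    Raises struct.error if the value does not fit in 16 bits.
    """
    return struct.pack("<H", line_number)
```

Under that assumption, `pack_line_number(65535)` succeeds while `pack_line_number(66000)` raises `struct.error`, which is the same wall the room at line 66000 ran into.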
Parent
Stolen code (Score:5, Funny)
Re:Stolen code (Score:5, Funny)
Parent
Re:Stolen code (Score:5, Funny)
only in the Debian version
Parent
Re:Stolen code (Score:5, Funny)
Take one down, pass it around, 9,999,998 lines of code from SCO
Parent
assembler? (Score:5, Informative)
"assembler" is the tool, not the language.
Re:assembler? (Score:5, Funny)
I realize English is hard for you, but you can usually use verbs as nouns, and nouns as verbs.
It's better if you don't. Verbing weirds language.
Parent
Re:assembler? (Score:5, Funny)
Sure it is, why, I was assembly some assembler code just the other day. I was using my assemble to do it.
Parent
Re:assembler? (Score:5, Informative)
Parent
What did sloccount say the kernel was worth? (Score:3, Insightful)
Because we'd all like to know how many man-months something as big as the Linux kernel should take to implement. And laugh at the huge price tag sloccount will put on it.
Re:What did sloccount say the kernel was worth? (Score:5, Informative)
Ohloh has a COCOMO calculator, which spits out ~$181M if you pay coders $55,000 a year.
http://www.ohloh.net/projects/linux [ohloh.net]
http://en.wikipedia.org/wiki/COCOMO [wikipedia.org]
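For anyone curious how such a figure is produced, here is a minimal sketch of the basic COCOMO "organic mode" estimate, where effort in person-months is 2.4 × KLOC^1.05. Whether Ohloh uses exactly these coefficients is an assumption, the function names are made up, and the sketch is not expected to reproduce the ~$181M number, which depends on Ohloh's own line count and settings.

```python
# Basic COCOMO, organic mode. These coefficients are the textbook values;
# that Ohloh's calculator uses the same ones is an assumption.
ORGANIC_A = 2.4
ORGANIC_B = 1.05


def cocomo_effort_pm(kloc: float) -> float:
    """Estimated effort in person-months for `kloc` thousand lines of code."""
    return ORGANIC_A * kloc ** ORGANIC_B


def cocomo_cost(kloc: float, salary_per_year: float) -> float:
    """Estimated cost: person-months converted to person-years, times salary."""
    return cocomo_effort_pm(kloc) / 12.0 * salary_per_year
```

Usage would look like `cocomo_cost(6399.191, 55000)` for the SLOCCount figure in the summary. Because the exponent is greater than 1, doubling the code size more than doubles the estimate.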
Parent
Reply from actual kernel developer please . . . (Score:5, Interesting)
I'm just curious because keeping 6+ million lines of code almost completely bug free is pretty amazing.
Re:Reply from actual kernel developer please . . . (Score:5, Funny)
Almost completely bug free? What are you smoking?
Parent
Re:Reply from actual kernel developer please . . . (Score:4, Interesting)
From what I've gathered, pretty damn near "all of the above". One of the nicer things about being a high-profile open source tool is that a lot of people are interested in researching automated code analysis on it, be it unit testing, regression testing, static analysis, dynamic analysis or whatever. And having a quality nazi on top helps. Here's what happened a few days ago on the dri-devel list from Linus:
"Grr.
This whole merge series has been full of people sending me UNTESTED CRAP.
So what's the excuse _this_ time for adding all these stupid warnings to
my build log? Did nobody test this? (...)"
In many places, you can do a pretty lousy job and still collect a paycheck. Something tells me you won't get many patches in the kernel that way.
Parent
Re:Reply from actual kernel developer please . . . (Score:5, Insightful)
I'm a developer and was wondering what kind of testing is done to verify the code.
Guinea pigs. Millions of us.
Parent
Re:Reply from actual kernel developer please . . . (Score:5, Funny)
>>There are literally thousands of men running the code on even more setups regularly
Plus upwards of 7 women!
Parent
Line Count Not Always a Good Thing? (Score:5, Interesting)
Re:Line Count Not Always a Good Thing? (Score:4, Insightful)
While Linux is huge, for a backdoor to be successful it would need to hit a huge number of systems. The majority of the kernel at this point tends to be drivers, not all of which are used in a given kernel.
For it to be even remotely worthwhile, it'd have to be placed into something that was both heavily used AND given little attention. These two positions are almost mutually exclusive.
Can anyone think of a place that would fall into these two categories? Even the more seemingly obscure parts of the kernel get attention fairly often and malicious changes wouldn't go unnoticed for long.
Parent
Happy Ten Million, Linux! (Score:5, Funny)
Re:Happy Ten Million, Linux! (Score:5, Funny)
Now, where do we find a birthday cake with ten million candles?
At John McCain's Birthday Party?
Parent
What about the other .3% ? (Score:5, Funny)
96,4% of them developed in C, and 3,3% using assembler
That leaves .3% that is unaccounted for. What was it written in?
Re:What about the other .3% ? (Score:5, Funny)
Visual Basic 6.
Parent
Re:What about the other .3% ? (Score:4, Insightful)
Parent
Micro-kernel vs massive kernel? (Score:3, Interesting)
May I suggest that large parts of this shouldn't be in the kernel at all? That there should be independent sub-systems so that in the event of a crash or panic, the entire OS doesn't come tumbling down?
So that badly written drivers (especially graphic card drivers) don't affect the stability of the entire system?
May I suggest that flame-wars are good and that EMACS is also bloated?
(And lots of other folks have already talked about the bad metric that lines of code is...)
Re: (Score:3, Insightful)
Re:Micro-kernel vs massive kernel? (Score:5, Funny)
Tanenbaum, is that you? If so, give it up! It's been 16 years and you're not fooling anybody!
Parent
I Wonder? (Score:4, Interesting)
I wonder what the breakdown is of the almost 4 million lines that were omitted from the count as blank lines, comments, etc. I've always said that commenting your code is a very good thing to do, so it would be interesting to see what percentage of these lines is comments, as opposed to blank lines (which aren't a bad thing for readability).
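A rough way to get that breakdown for a C file is to classify every line as blank, comment-only, or code. The sketch below is not SLOCCount's algorithm, just a simplified stand-in: it ignores comment markers inside string literals, and it counts a blank line inside a `/* ... */` block as a comment line. `classify_c_lines` is an illustrative name.

```python
def classify_c_lines(source: str) -> dict:
    """Count blank, comment-only, and code lines in C source text."""
    counts = {"blank": 0, "comment": 0, "code": 0}
    in_block = False  # True while inside a /* ... */ comment
    for raw in source.splitlines():
        line = raw.strip()
        if not in_block and not line:
            counts["blank"] += 1
            continue
        has_code = False
        i = 0
        while i < len(line):
            if in_block:
                end = line.find("*/", i)
                if end == -1:
                    i = len(line)  # comment continues on the next line
                else:
                    in_block = False
                    i = end + 2
            elif line.startswith("/*", i):
                in_block = True
                i += 2
            elif line.startswith("//", i):
                break  # rest of the line is a comment
            else:
                if not line[i].isspace():
                    has_code = True
                i += 1
        counts["code" if has_code else "comment"] += 1
    return counts
```

A line with both code and a trailing comment counts as code, so the "comment" bucket only holds lines that are nothing but comment.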
Lines of code as a metric (Score:5, Insightful)
Not very informative. (Score:5, Funny)
This article summary is not very informative. The very least they could do is tell us which ten million lines of code Linux has surpassed.
Re:Lines of Code (Score:5, Interesting)
I used to have GEOS on my Commodore 64. I have absolutely no idea how many lines of code it used, but it could squeeze itself into just 20 kilobytes of RAM, and yet had lots of functionality (as good as an 80s-era Mac). I consider "how much RAM occupied" to be a FAR more useful metric.
I would love to see someone develop an OS that followed a similar philosophy of using as little RAM as possible.
Parent
Re:Lines of Code (Score:5, Insightful)
Parent
Re:Lines of Code (Score:5, Funny)
Exactly. The better metric would be how many Libraries of Congress the kernel is.
Parent
Not as much as you'd think (Score:4, Informative)
Parent
A thousand Unix System 6 kernels. (Score:5, Interesting)
The better metric would be how many Libraries of Congress the kernel is.
Perhaps better would be number of times the size of the Unix System 6 kernel.
That's the one that the University of Waterloo printed as a textbook, half of a two book set. (The other book was the OS course text using it as the example.) They printed it at 50 lines per page column and added (lots of) whitespace and adjusted comments so routines fell on nice page boundaries. Even padded this way it came out to a total of ten thousand lines (of which I think 2 thousand were still in assembly code). Just right for one person to maintain full-time by the then-current rule-of-thumb.
So the linux kernel is a thousand times the size of that (whitespace-padded) version of the Unix kernel.
Parent
Re:Lines of Code (Score:5, Insightful)
Parent
Re:Lines of Code (Score:5, Funny)
I'm in a software engineering class listening to how to use metrics on code.
No, you're in a software engineering class posting on Slashdot.
Parent
Re:Lines of Code (Score:5, Funny)
I'm in a software engineering class listening to how to use metrics on code.
No, you're in a software engineering class posting on Slashdot.
You are likely to be eaten by a GNU.
Parent
Re:Lines of Code (Score:5, Interesting)
is the same length as this...
Parent
Re:Um (Score:5, Informative)
Yeah but you can customize the Linux kernel. If you don't want features, just don't compile them in.
It's easy, there's even a gui interface.
Good luck compiling a custom NT kernel. :)
Parent
Re:Kernel Modules (Score:4, Informative)
Uh, you don't compile modules. The distribution vendor does.
If you want a stable kernel module ABI, that only matters for binary-only modules (which are a bad idea). See vmware for how source-distributed modules can work fairly painlessly.
What are you talking about?
Most vendors compile generic kernels with just about all functionality put into kernel modules. What more do you want than modprobe, rmmod? Pretty buttons?
If you want a micro-kernel, go use QNX, hack on Hurd, or watch as Linux slowly steps in that direction. Maybe read some of the various flame wars on the topic and consider why Hurd hasn't made any significant progress in 15 years.
Yeah...[/sarcasm]
Parent
Re: (Score:3, Funny)
15. The Residents - Not Available [wikipedia.org]
If Obama is missing that record, I'd be glad to lend him my copy.