Linux Kernel Surpasses 10 Million Lines of Code
javipas writes "A simple analysis of the most recent version (a Git checkout) of the Linux kernel reveals that the total number of lines in its source tree surpasses 10 million, but note that this number includes blank lines, comments, and text files. A deeper analysis with the SLOCCount tool gives the real number of pure code lines: 6.399.191, with 96.4% of them developed in C, and 3.3% using assembler. The number grows clearly with each new version of the kernel, which seems to be released approximately every 90 days."
Isn't that normal? (Score:4, Interesting)
--
Oh Well, Bad Karma and all . . .
Re:Isn't that normal? (Score:5, Interesting)
Yes, but it can go down with optimizations and refactoring (finding duplicated code and pushing it into a function or macro, for example) and with eliminating dead code. Ideally, code size should be asymptotic to an optimal size. As you approach the optimal size, more and more of what you need to do is already available to you. As you approach the limit, the amount of special-case logic and hardcoding approaches zero, and the amount of data-driven logic approaches 100%. Unfortunately, as you approach the limit, the performance must drop as you've now abstracted so far that your code becomes essentially a virtual machine on which your data runs. Simulating a computer is always going to be slower than actually using the real computer directly. In most cases, this is considered "acceptable" because your virtual machine is simply too advanced for any physical hardware to support at this time. (There is also the consideration of code changes, but as you approach the limit, your changes will largely be to the data and not to the codebase. At the limit, you will change the codebase only when changing the hardware, so if you could hardwire the code, it would not impact maintenance at all. All the maintenance you could want to do would be at the data level, given this level of abstraction.)
Linux is clearly nowhere near the point of being that abstract, although some components are probably getting close. It would be interesting to see, even if it could only be done by simulation, what would happen if you moved Linux's VMM into an enlarged MMU, or what would happen if an intelligent hard drive supported Linux's current filesystem selection and parts of the VFS layer. Not as software running on a CPU, but as actual hard-wired logic. Software is just a simulation of wiring, so you can logically always reverse the process. Given that Linux has a decent chunk of the server market, and the server market is less concerned with cost than it is with high performance, high reliability and minimal physical space, it is possible (unlikely but possible) that there will eventually be lines of servers that use chips specially designed to accelerate Linux by this method.
Re: (Score:3, Insightful)
...still, we should think about adding Asimov's three laws before we reach such an event horizon, no?
Re: (Score:3, Funny)
Well! FOR.GIVE.ME for not having read your previous one-sentence interpretation of an article based on someone's opinion of a piece of literature which was authored based on a decades-old view of technology. We should all now proceed to read all of AnyoneEB's comments and be enlightened by his genius insights into our world.
Re: (Score:3, Insightful)
``Unfortunately, as you approach the limit, the performance must drop as you've now abstracted so far that your code becomes essentially a virtual machine on which your data runs.''
I don't see that. Not all abstraction makes things slower. In many cases, abstraction lets you write code at a higher level, while still compiling down to the code you would have written if working at a lower level.
Re: so freaking what? (Score:5, Funny)
the real number of pure code lines: 6.399.191, with 96.4% of them developed in C, and 3.3% using assembler.
Personally I thought the news was that no one knows what 0.3% of the linux kernel is written in. THAT'S news! (I'm betting it's BASIC).
Re: so freaking what? (Score:5, Funny)
It's COBOL, that crap is still just everywhere.
Selected comments from the Windows Kernel (Score:3, Interesting)
/* 3k lines of workaround for 8 lines of code. WTF were they thinking? */
//This might work.
//Blocks undocumented interface used only by WordPerfect.
//Passes test. Ship it. I'm done. <Allchin>
Core functions vs Drivers? (Score:4, Interesting)
And how many of these lines are for core functions (memory management, scheduler, etc.) and how many for drivers (USB, filesystems)?
Re:Can this be converted into kloc ? (Score:5, Funny)
You could try:
DIVIDE SLOC BY 1000 GIVING KLOC.
Meh (Score:4, Funny)
AND???
In other news, trees tend to grow up unless they tend to grow down or sideways. Sharks tend to eat anything they can, unless they are not hungry.
Anonymous will beat me to FP for sure, unless they don't.
Re:Meh (Score:5, Funny)
Yeah so!? Cars are also getting bigger and more complex over time, so Linux must be heading in the right direction!
Did I just... ? Oh sh-
Re: (Score:2)
What happened is some bored tech author didn't have anything to write about, so they decided to do a Git checkout and count the lines of the Linux kernel, which would likely be over 10 million lines at this point if you include blank lines, comments, and text files. It's a completely meaningless story, especially because of the fact that actual code is almost 6.5 million lines, but it got them a Slashdot post and some ad views on their site.
"Actual" code? (Score:5, Insightful)
Comments are also code.
If you only count as code what can be fed to the machine, you should look at the size of the compiled binary. Source code is meant to be read by *humans*, so comments do count. That's why the GPL requires them to be left in the files (the "preferred form" to edit), otherwise it wouldn't be source code.
Re: (Score:3, Interesting)
Source code is meant to be read by a compiler. Comments are not code; they're documentation ignored by the compiler. By your standards, anything that makes source code human-readable should be counted as source code, including white space or even external documentation files!
Re: (Score:3, Insightful)
I don't really care much about theoretical programming paradigms. "Code" refers to the instruction statements written in a programming language for a compiler to interpret, not the comments written off to the side that the compiler ignores.
Re:Meh (Score:4, Interesting)
That reminds me of a story about my early programming attempts:
My first computer was an Apple II+, and I learned AppleBASIC from a book that appeared to be written to teach kids how to program*. I was writing a graphical maze-crawler fantasy game (a bit like Wizardry, but much more primitive, of course). I knew nothing of data-driven programming, of course. Everything was hard-coded, every room a function, etc. AppleBASIC used line numbers, of course, and in laying out the dungeon, I started incrementing rooms by 1000 to make sure I had enough space.
Sure enough, I ran into a strange issue when I tried to create a room at line number 66000. Through trial and error, I eventually determined that the maximum line number was 65535. I couldn't figure out why they would use such a crazy number as the maximum limit.
Years later, when learning about the binary nature of computers, I saw that number again, and *click*. So, I'm not sure if 640K lines are enough, but 64K lines certainly were not for me!
* If anyone remembers what the name of that book was, I'd be in your debt. I think it had a red cover, and it had great little illustrations of a robot that made it very kid-friendly. That book launched me on my current career path. I now program games for a living, and would love to find an old copy.
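The 65535 limit recalled above is what you get from an unsigned 16-bit field: 2^16 - 1. The sketch below assumes, as the comment infers, that the line number was stored in two bytes; it does not claim to reproduce how any particular BASIC interpreter worked, and `pack_line_number` is a made-up helper name.

```python
import struct

# Largest value an unsigned 16-bit integer can hold.
MAX_LINE = 2**16 - 1  # 65535


def pack_line_number(n):
    """Pack n as an unsigned 16-bit little-endian integer.

    Returns None when n does not fit in two bytes.
    """
    try:
        return struct.pack("<H", n)
    except struct.error:
        return None


print(MAX_LINE)
print(pack_line_number(65535))  # the largest value that still fits
print(pack_line_number(66000))  # the room that would not fit
```

Under that assumption, a room at line 65000 fits and a room at line 66000 cannot be represented at all.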
Comment removed (Score:5, Funny)
Re:Stolen code (Score:5, Funny)
only in the Debian version
Re:Stolen code (Score:5, Funny)
Take one down, pass it around, 9,999,998 lines of code from SCO
assembler? (Score:5, Informative)
"assembler" is the tool, not the language.
Re: (Score:2)
As long as you're getting all usage Nazi, it's "assembly language", 'cause "assembly" is an adjective. But in informal usage, it's OK to leave off the noun and use the adjective as a noun. (I prefer to say "noun the adjective" just to piss off POS Nazis.) And as for confusing the language with the tool: WTFC? This is Slashdot, where lose lips sink looser ships!
Re: (Score:2)
Sure, you are right, but that has nothing to do with the softness of my toilet paper. "Assembly" is a proper noun, specifying a specific language. "Assembler" is a generic noun, indicating any number of tools that can convert "Assembly" source code into compiled machine code. Both are nouns, regardless.
Re:assembler? (Score:5, Informative)
Re: (Score:3, Funny)
I agree. That's why I always write "anal-retentive" as a single word, with a hyphen.
Re: (Score:2)
Actually, it's the people who consider themselves English wonks that get all bent out of shape when you verb nouns and noun verbs.
Re:assembler? (Score:5, Funny)
I realize English is hard for you, but you can usually use verbs as nouns, and nouns as verbs.
It's better if you don't. Verbing weirds language.
Re:assembler? (Score:5, Funny)
Sure it is, why, I was assembly some assembler code just the other day. I was using my assemble to do it.
Re: (Score:2)
Agreed. For what it's worth, I meant to say Assemble. It's easy to get words switched around if they are typed nearly the same and you have been typing one form repeatedly for most of your life. C'est la vie.
What did sloccount say the kernel was worth? (Score:3, Insightful)
Because we'd all like to know how many man-months something as big as the Linux kernel should take to implement. And laugh at the huge price tag sloccount will put on it.
Re:What did sloccount say the kernel was worth? (Score:5, Informative)
Ohloh has a COCOMO calculator, which spits out ~$181M if you pay coders $55,000 a year.
http://www.ohloh.net/projects/linux [ohloh.net]
http://en.wikipedia.org/wiki/COCOMO [wikipedia.org]
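For anyone who wants to play with the numbers, here is a rough sketch of the Basic COCOMO "organic" formula (effort = 2.4 × KSLOC^1.05 person-months), which is also SLOCCount's default model. The function names and the `overhead` parameter are made up for illustration, and Ohloh's own settings are not given above, so don't expect this to reproduce the ~$181M figure.

```python
def cocomo_effort_person_months(ksloc, a=2.4, b=1.05):
    """Basic COCOMO effort estimate in person-months (organic mode by default)."""
    return a * ksloc ** b


def estimate_cost(ksloc, annual_salary, overhead=1.0):
    """Convert the effort estimate to a cost.

    overhead is a hypothetical multiplier for non-salary costs; 1.0 means
    salary only.
    """
    person_years = cocomo_effort_person_months(ksloc) / 12.0
    return person_years * annual_salary * overhead


# 6,399,191 SLOC from the story summary, $55,000 per year from the comment.
kernel_ksloc = 6399.191
print(round(cocomo_effort_person_months(kernel_ksloc)), "person-months")
print(round(estimate_cost(kernel_ksloc, 55_000)), "dollars, salary only")
```

The exponent above 1 is the whole point of the model: doubling the code more than doubles the estimated effort.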
Reply from actual kernel developer please . . . (Score:5, Interesting)
I'm just curious because keeping 6+ million lines of code almost completely bug free is pretty amazing.
Re:Reply from actual kernel developer please . . . (Score:5, Funny)
Almost completely bug free? What are you smoking?
Re:Reply from actual kernel developer please . . . (Score:5, Funny)
>>There are literally thousands of men running the code on even more setups regularly
Plus upwards of 7 women!
Re:Reply from actual kernel developer please . . . (Score:2)
Oh, was this a troll? I'm sorry. Seriously though, I believe the strategy for testing is 'if you make a change, you are responsible for making sure it works'. Most of the code is driver code, which means it is modularized, so a change in one place won't break something in a different place. The core code is actually significantly smaller. It does run on 4MB devices, after all.
Re:Reply from actual kernel developer please . . . (Score:4, Interesting)
From what I've gathered, pretty damn near "all of the above". One of the nicer things about being a high-profile open source tool is that a lot of people are interested in researching automated code analysis on it, be it unit testing, regression testing, static analysis, dynamic analysis or whatever. And having a quality nazi on top helps. Here's what happened a few days ago on the dri-devel list from Linus:
"Grr.
This whole merge series has been full of people sending me UNTESTED CRAP.
So what's the excuse _this_ time for adding all these stupid warnings to
my build log? Did nobody test this? (...)"
In many places, you can do a pretty lousy job and still collect a paycheck. Something tells me you won't get many patches in the kernel that way.
Re: (Score:3, Funny)
I'm just curious because keeping 6+ million lines of code almost completely bug free is pretty amazing.
Yes, that would be amazing.
Re:Reply from actual kernel developer please . . . (Score:5, Insightful)
I'm a developer and was wondering what kind of testing is done to verify the code.
Guinea pigs. Millions of us.
Line Count Not Always a Good Thing? (Score:5, Interesting)
Re:Line Count Not Always a Good Thing? (Score:4, Insightful)
While Linux is huge, for a backdoor to be successful it would need to hit a huge number of systems. The majority of the kernel at this point tends to be drivers, not all of which are used in a given kernel.
For it to be even remotely worthwhile, it'd have to be placed into something that was both heavily used AND given little attention. These two positions are almost mutually exclusive.
Can anyone think of a place that would fall into these two categories? Even the more seemingly obscure parts of the kernel get attention fairly often and malicious changes wouldn't go unnoticed for long.
Happy Ten Million, Linux! (Score:5, Funny)
Re: (Score:2)
The cake is a LIE!
Re:Happy Ten Million, Linux! (Score:5, Funny)
Now, where do we find a birthday cake with ten million candles?
At John McCain's Birthday Party?
What about the other .3% ? (Score:5, Funny)
96,4% of them developed in C, and 3,3% using assembler
That leaves .3% that is unaccounted for. What was it written in?
Re:What about the other .3% ? (Score:5, Funny)
Visual Basic 6.
Re:What about the other .3% ? (Score:4, Insightful)
Re: (Score:2, Funny)
That leaves .3% that is unaccounted for. What was it written in?
Asgard?
Re: (Score:3, Informative)
From Wikipedia:
Programming languages
Linux is written in the version of the C programming language supported by GCC (which has introduced a number of extensions and changes to standard C), together with a number of short sections of code written in the assembly language (in GCC's "AT&T-style" syntax) of the target architecture. Because of the extensions to C it supports, GCC was for a long time the only compiler capable of correctly building Linux. In 2004, Intel claimed to have modified the kernel so that its C compiler also was capable of compiling it.[24]
Many other languages are used in some way, primarily in connection with the kernel build process (the methods whereby the bootable image is created from the sources). These include Perl, Python, and various shell scripting languages. Some drivers may also be written in C++, Fortran, or other languages, but this is strongly discouraged. Linux's build system only officially supports GCC as a kernel and driver compiler.
So I am assuming that the answer is Perl, Python, various scripting languages, and Fortran
Re: (Score:2)
Rewrite it in Lisp, that way it'll work in emacs.
Then re-write emacs in Javascript so that it'll work as a Firefox extension.
Re: (Score:2)
And then run FF on Linux.
WARNING INFINITE RECURSION DETECTED!!!
P.S. I hate the lameness filter.
Micro-kernel vs massive kernel? (Score:3, Interesting)
May I suggest that large parts of this shouldn't be in the kernel at all? That there should be independent sub-systems so that in the event of a crash or panic, the entire OS doesn't come tumbling down?
So that badly written drivers (especially graphic card drivers) don't affect the stability of the entire system?
May I suggest that flame-wars are good and the EMACS is also bloated?
(And lots of other folks have already talked about the bad metric that lines of code is...)
Re:Micro-kernel vs massive kernel? (Score:5, Funny)
Tanenbaum, is that you? If so, give it up! It's been 16 years and you're not fooling anybody!
Obvious (Score:2)
This raises the question - will Linus run out of magic powder?
Re: (Score:2)
No, he uses Torgo's Executive Powder [wikipedia.org]. There's a near-infinite supply of that.
I Wonder? (Score:4, Interesting)
I wonder what the breakdown is of the almost 4 million lines that were omitted in the count, for blank lines, comments, etc.? I've always said that commenting your code is a very good thing to do, so it would be interesting to see what the percentage of this is comments, as opposed to blank lines (which isn't a bad thing for readability).
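One crude way to get that breakdown for a single C file is sketched below. This is not SLOCCount's algorithm; it is a naive classifier that only understands `/* ... */` and `//` comments and ignores string literals, so a `//` inside a string would be miscounted. The name `classify_c_lines` is made up.

```python
def classify_c_lines(text):
    """Count blank, comment-only, and code lines in C source text.

    A line counts as code if any non-comment, non-whitespace character
    appears on it; a blank line inside a block comment counts as blank.
    """
    counts = {"blank": 0, "comment": 0, "code": 0}
    in_block = False  # True while inside an unterminated /* ... */ comment
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            counts["blank"] += 1
            continue
        has_code = False
        i = 0
        while i < len(line):
            if in_block:
                end = line.find("*/", i)
                if end == -1:
                    i = len(line)  # comment continues on the next line
                else:
                    in_block = False
                    i = end + 2
            elif line.startswith("//", i):
                break  # rest of the line is a comment
            elif line.startswith("/*", i):
                in_block = True
                i += 2
            else:
                if not line[i].isspace():
                    has_code = True
                i += 1
        counts["code" if has_code else "comment"] += 1
    return counts


sample = "/* header\n * comment */\n\nint main(void) {\n    return 0; // done\n}\n// trailing\n"
print(classify_c_lines(sample))
```

Run over every `.c` and `.h` file in a checkout and summed, this would give a ballpark split of the 10 million into blank, comment, and code lines, though a real counter handles many more cases.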
This story (Score:3, Funny)
Basically, this story is "Linux kernel surpasses 10 million lines of code! Just kidding."
Re: (Score:2)
Headline: Linux Kernel Surpasses 10 Million Lines of Code
Summary: Actually, it's just over 6 Million
Lines of code as a metric (Score:5, Insightful)
Re: (Score:2)
Rating programmers on Lines of Code in a commercial environment seems like a way to promote bloat and slow, inefficient code.
Re: (Score:2)
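More LoC does not always mean less-efficient code. For example, in Java using Apache Commons Logging:
log.debug("state: " + obj); // obj is a placeholder; the string is built even when debug is off
Is slower than:
if (log.isDebugEnabled()) {
    log.debug("state: " + obj); // the string is built only when debug is on
}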
http://twit88.com/blog/2008/03/07/why-you-should-use-isdebugenabled/ [twit88.com]
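The same guard pattern can be sketched with Python's standard `logging` module, where `Logger.isEnabledFor` plays the role of `isDebugEnabled()`. The logger name and the `expensive_state` helper are made up for illustration; the counter just shows how many times the costly message gets built.

```python
import logging

log = logging.getLogger("loc_demo")  # hypothetical logger name
log.setLevel(logging.INFO)           # debug output is disabled

build_count = 0


def expensive_state():
    """Stand-in for a costly message build; counts how often it runs."""
    global build_count
    build_count += 1
    return "state=" + ",".join(str(i) for i in range(1000))


# One line: the argument is evaluated even though nothing is logged.
log.debug("dump: " + expensive_state())

# Three lines: the guard skips the expensive build entirely.
if log.isEnabledFor(logging.DEBUG):
    log.debug("dump: " + expensive_state())

print(build_count)
```

With debug logging off, only the unguarded call pays for building the string, which is the whole argument for the longer version.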
the straw that breaks... (Score:2)
Sorry everyone, that was me! Silly push %ebp ... Apologies to all...
Not very informative. (Score:5, Funny)
This article summary is not very informative. The very least they could do is tell us which ten million lines of code Linux has surpassed.
Re:Lines of Code (Score:5, Interesting)
I used to have GEOS on my Commodore 64. I have absolutely no idea how many lines of code it used, but it could squeeze itself into just 20 kilobytes of RAM, and yet had lots of functionality (as good as an 80s-era Mac). I consider "how much RAM occupied" to be a FAR more useful metric.
I would love to see someone develop an OS that followed a similar philosophy of using as little RAM as possible.
Re:Lines of Code (Score:5, Insightful)
Re: (Score:3, Insightful)
No but a modern PC running windows uses 1000 times more RAM than GEOS Commodore 64, but doesn't really do anything extra. The OS needs to go on a diet.
GEOS supported thousands of printers, hundreds of hard drive adapters, hundreds of video cards, streaming network video, 3d gaming, virtual memory, several CPU vendors, hundreds of mice, and all that in 20KB of memory? Impressive!
Less sarcastic answer: modern computers do a whole awful lot more than GEOS did.
Re:Lines of Code (Score:5, Funny)
Exactly. The better metric would be how many Libraries of Congress the kernel is.
Not as much as you'd think (Score:4, Informative)
Re: (Score:2)
Lines of Code
Libraries of Congress
What's the difference?
Re:Lines of Code (Score:5, Insightful)
A thousand Unix System 6 kernels. (Score:5, Interesting)
The better metric would be how many Libraries of Congress the kernel is.
Perhaps better would be number of times the size of the Unix System 6 kernel.
That's the one that the University of Waterloo printed as a textbook, half of a two book set. (The other book was the OS course text using it as the example.) They printed it at 50 lines per page column and added (lots of) whitespace and adjusted comments so routines fell on nice page boundaries. Even padded this way it came out to a total of ten thousand lines (of which I think 2 thousand were still in assembly code). Just right for one person to maintain full-time by the then-current rule-of-thumb.
So the linux kernel is a thousand times the size of that (whitespace-padded) version of the Unix kernel.
Re: (Score:2)
Lines of code is not a good metric for performance.
True, but it is a good indication of bloat. Ten million lines of code at 100 characters per line is a gig (unless I got decimal places wrong); that's a lot of source. Maybe somebody should be working to pare it down some?
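The decimal places check out, for what it's worth. A quick sketch of the arithmetic, with a made-up helper name and the 100-characters-per-line figure taken as the poster's assumption rather than a measured average:

```python
def estimated_source_bytes(line_count, avg_chars_per_line):
    """Rough source size, assuming one byte per character."""
    return line_count * avg_chars_per_line


size = estimated_source_bytes(10_000_000, 100)
print(size)          # 1000000000
print(size / 10**9)  # 1.0, i.e. one decimal gigabyte
```

The result scales linearly with the assumed line length, so a shorter real-world average shrinks the estimate proportionally.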
Re: (Score:2)
Sure, but you'll find very few 100-character lines in Linux. The average line is maybe ten or fifteen characters.
Re: (Score:3, Funny)
>at 100 characters per line
No no, you are thinking of Java. Linux is written in C
Re:Lines of Code (Score:5, Interesting)
is the same length as this...
Re: (Score:3, Interesting)
Re:Lines of Code (Score:5, Funny)
I'm in a software engineering class listening to how to use metrics on code.
No, you're in a software engineering class posting on Slashdot.
Re:Lines of Code (Score:5, Funny)
I'm in a software engineering class listening to how to use metrics on code.
No, you're in a software engineering class posting on Slashdot.
You are likely to be eaten by a GNU.
Function Point Analysis (Score:3, Interesting)
http://en.wikipedia.org/wiki/Function_point [wikipedia.org]
http://www.softwaremetrics.com/fpafund.html [softwaremetrics.com]
the lines of code metric has long been considered an inadequate measure of software cost, complexity, or size - here is an article on why:
http://www.creativyst.com/Doc/Articles/Mgt/LOCMonster/LOCMonster.htm [creativyst.com]
but LOC is without question one of the easiest measuremen
Re: (Score:3, Insightful)
i believe a more appropriate measure of the 'bloat' (i.e. useless functions) or the size of any software package is through function point analysis
I recall many years ago, a PHB (this is long enough ago that nobody called them that yet) was talking about developer productivity metrics; he announced that the powers that be were considering either KLoC or Function Points. The guy sitting next to me said "I have no idea what function points are, but they've got to be better than KLoC". The remark made one of those wonderful whooshing sounds as it sailed straight over the PHB's head...
LOC is without question one of the easiest measurement (aside from total package size in bytes, which is nearly as uninformative)
+1 - Fundamental Law Of Physics.
LoC's only redeeming feature as a metri
Re:Um (Score:5, Informative)
Yeah but you can customize the Linux kernel. If you don't want features, just don't compile them in.
It's easy, there's even a gui interface.
Good luck compiling a custom NT kernel. :)
Kernel Modules (Score:3, Informative)
If you're actually serious, (sarcasm is kind of hard to detect in plain text): man modprobe. Since Linux 2.0.
Re:Kernel Modules (Score:4, Informative)
Uh, you don't compile modules. The distribution vendor does.
If you want a stable kernel module ABI, that only matters for binary-only modules (which are a bad idea). See vmware for how source-distributed modules can work fairly painlessly.
What are you talking about?
Most vendors compile generic kernels with just about all functionality put into kernel modules. What more do you want than modprobe, rmmod? Pretty buttons?
If you want a micro-kernel, go use QNX, hack on Hurd, or watch as Linux slowly steps in that direction. Maybe read some of the various flame wars on the topic and consider why Hurd hasn't made any significant progress in 15 years.
Yeah...[/sarcasm]
Re: (Score:3, Informative)
In addition there is also ksplice, to swap the actual kernel too.
Re: (Score:3, Funny)
15. The Residents - Not Available [wikipedia.org]
If Obama is missing that record, I'd be glad to lend him my copy.
Re: (Score:2)
It's called a lameness filter because it's pretty lame. Try pasting the definition of a word from reference.com, or the lyrics to a longish song. Or a joke that relies on caps to be funny.
The lame mess filter won't let you.
Re: (Score:3, Interesting)
Kernel only, or with the bundled software included? Because if you count the bundled software, it isn't a fair comparison. And note that the whole graphics subsystem is included in there also, so add X11 to the lot... but whatever, comparing the number of lines of code is akin to comparing the number of bolts in a car.
It's interesting information nonetheless. Divide the number of bugs by the number of LoC and you get a better-than-industry ratio in both cases. Which says a lot.
Kernel or Kernel + Userspace? (Score:2)
Remember, the 10M lines is just the kernel in Linux, not an entire distro (i.e., kernel + GNU stuff + X + apps + all the other stuff), so a total count of Windows LOC would be comparing apples and oranges.
I.e., how many LOC are in NTOSKRNL + drivers would be a better comparison.
Re: (Score:2, Informative)
Ship Date | Product | Dev Team Size | Test Team Size | Lines of code (LoC)
Jul-93 | NT 1.0 (released as 3.1) | 200 | 140 | 4-5 million
Sep-94 | NT 2.0 (released as 3.5) | 300 | 230 | 7-8 million
May-95 | NT 3.0 (released as 3.51) | 450 | 325 | 9-10 million
Jul-96 | NT 4.0 (released as 4.0) | 800 | 700 | 11-12 million
Dec-99 | NT 5.0 (Windows 2000) | 1,400 | 1,700 | 29+ million
Oct-01 | NT 5.1 (Windows XP) | 1,800 | 2,200 | 40 million
Apr-03 | NT 5.2 (Window
Re: (Score:2)
The implied point of your original comment is that Windows is a closed-source, proprietary system. You're presumably an opponent of MS, which is perfectly fine; I totally get it. You presumably have philosophical objections to MS. Again, totally understandable. But the story is about the linux kernel, not MS. It took all of, what, 10-15 comments -- including the usual "first!" post crap -- for there to be some sort of anti-MS comment posted. If the story is about MS or something that MS somehow affects, then go ahead and rip into them. But it's a little tiring to see these MS digs in stories that, when you get right down to it, are about something else entirely.
Phew. You sure did read a lot into one little sentence. Deep breath in... hold it... let it out. Relax. Deep breath in...