The Linux Kernel Has Grown By 225,000 Lines of Code This Year, With Contributions From About 3,300 Developers (phoronix.com)

Here's an analysis of the Linux kernel repository that attempts to put some fresh numbers on current kernel development trends. Phoronix writes: The kernel repository stands at 782,487 commits in total from around 19,009 different authors. The repository is made up of 61,725 files containing around 25,584,633 lines -- keep in mind this also includes documentation, Kconfig build files, various helpers/utilities, etc. So far this year there have been 49,647 commits that added 2,229,836 lines of code while dropping 2,004,759 lines, for a net gain of just 225,077 lines. Keep in mind that some old CPU architectures and other code were removed from kernels this year, so while a lot of new functionality was added, thanks to that cleaning the kernel didn't bloat up as much as one might have otherwise expected. In 2017 there were 80,603 commits with 3,911,061 additions and 1,385,507 deletions. With just over one quarter of the year to go, on both commit count and line count 2018 might come in lower than the two previous years.
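
For the curious, numbers like these can be reproduced from a local clone of the kernel repository. A minimal Python sketch, assuming git is on the PATH and the working directory is a kernel checkout (exact totals will vary with the branch and the date you run it):

    import subprocess

    def git(*args):
        # Run a git command and return its stdout as text.
        return subprocess.run(["git", *args], capture_output=True,
                              text=True, check=True).stdout

    # Total commits reachable from HEAD.
    total_commits = int(git("rev-list", "--count", "HEAD"))

    # Distinct author email addresses this year.
    authors = set(git("log", "--since=2018-01-01", "--format=%ae").splitlines())

    # Lines added/removed this year, summed from --numstat output.
    added = removed = 0
    for line in git("log", "--since=2018-01-01", "--numstat", "--format=").splitlines():
        parts = line.split("\t")
        if len(parts) == 3 and parts[0].isdigit():   # skips binary files ("-")
            added += int(parts[0])
            removed += int(parts[1])

    print(total_commits, len(authors), added, removed, added - removed)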

Linus Torvalds remains the most frequent committer at just over 3% of commits, while the other top contributors to the kernel this year are the usual suspects: David S. Miller, Arnd Bergmann, Colin Ian King, Chris Wilson, and Christoph Hellwig. So far in 2018 there have been commits from 3,320 different email addresses, which is actually significantly lower than in previous years.

Comments:
  • This doesn't seem surprising, given that new hardware architectures come out all the time. What's great is that this amazing piece of work is still FLOSS, powering my MacBook as I type this comment.

  • by SCVonSteroids ( 2816091 ) on Sunday September 16, 2018 @12:26PM (#57323744)

    Why do we measure in lines of code? Serious question.

    • What's a good alternative? Serious answer.

      Function points?

      At least LOC gives us a rough estimate of how much work has been done writing source code. Of course it may be bad code, or completely wasted, and of course it must be calibrated by the language used.

      The problem IMHO is not the use of LOC but the foolish assumptions some people make based on that metric.

      • The question isn't why measure in lines of code, but why not put those numbers in context. It is necessary to understand how a LOC increase does, and perhaps more importantly doesn't, increase overall complexity, vulnerability/attack surface, etc. I don't think there is any intention to deceive here ... I just think the author forgot how much they know that the target audience doesn't.
      • by arth1 ( 260657 )

        What's a good alternative? Serious answer.

        Function points?

        At least LOC gives us a rough estimate of how much work has been done writing source code.

        Z factor is a better one - the size of the code after being fed through compression (like compress, giving .Z files). Large amounts of extra spacing or unnecessary line breaks won't be factored in, while code originality gives a higher score than copy/paste jobs.
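
        A minimal sketch of that idea in Python, using zlib in place of the old compress(1); the file paths here are hypothetical:

            import zlib
            from pathlib import Path

            def z_factor(paths):
                # Compressed size, in bytes, of the concatenated sources.
                # Padding and copy/paste compress well, so they add little;
                # original code adds more.
                blob = b"".join(Path(p).read_bytes() for p in paths)
                return len(zlib.compress(blob, 9))

            print(z_factor(["drivers/net/foo.c", "drivers/net/bar.c"]))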

      • Re: Lines of code (Score:5, Informative)

        by jd ( 1658 ) <imipakNO@SPAMyahoo.com> on Sunday September 16, 2018 @03:09PM (#57324334) Homepage Journal

        Lines of code tells you how much work was put in.

        The ratio of lines of code to code blocks tells you how maintainable the code is.

        Defect density tells you the quality of the code.

        A triple of these would give you a reasonable analysis.
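
        A rough sketch of that triple in Python; the defect count would have to come from a bug tracker, and the block count here is a naive brace-counting proxy for C:

            def triple(source: str, known_defects: int):
                loc = len([l for l in source.splitlines() if l.strip()])
                blocks = max(source.count("{"), 1)   # crude proxy for code blocks
                return {
                    "loc": loc,                                 # effort
                    "loc_per_block": loc / blocks,              # maintainability
                    "defects_per_kloc": 1000 * known_defects / max(loc, 1),  # quality
                }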

    • by Anonymous Coward

      Because we are lacking better ways to measure code.

      Sure, lines of code isn't necessarily a good metric and neither is binary size, but both are easy to quantify.
      Would you rather that we used electro-psychometry to measure it?

    • Re:Lines of code (Score:5, Insightful)

      by ShanghaiBill ( 739463 ) on Sunday September 16, 2018 @12:43PM (#57323800)

      Why do we measure in lines of code? Serious question.

      LOC is an important metric, for quantifying both progress and complexity.

      The mistake is assuming that more is better.

      • Re:Lines of code (Score:4, Insightful)

        by fahrbot-bot ( 874524 ) on Sunday September 16, 2018 @01:23PM (#57323940)

        Why do we measure in lines of code? Serious question.

        LOC is an important metric, for quantifying both progress and complexity.

        And yet, LOC doesn't necessarily quantify either progress or complexity.

      • by gTsiros ( 205624 )

        Considering that the best code is the code you don't write, I'd say LOC is a horrible, horrible metric.

        At work, our main product is roughly 2 million LOC (not counting blank lines, lines with only comments, preprocessor statements...).

        Considering that so far every class I've tried to simplify shrank by *at least* 95% (not exaggerating; in some cases I straight up *removed* classes because they were duplicated), I'd say the actual code is less than 100k. Possibly on the order of 10k.

    • by Kjella ( 173770 )

      Why do we measure in lines of code? Serious question.

      Lack of a better metric? Though at this level of abstraction I think classifying a project as small = 1 kLOC, medium = 10 kLOC, big = 100 kLOC and huge = 1000 kLOC project works just fine. I would think the number of maintainers you need scales pretty linearly with LOC. But it doesn't mean it's a useful measure of productivity....
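
      That bucketing is trivial to write down; a throwaway sketch:

          def size_class(loc):
              for limit, name in [(1_000, "small"), (10_000, "medium"),
                                  (100_000, "big"), (1_000_000, "huge")]:
                  if loc <= limit:
                      return name
              return "kernel-sized"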

    • ... Why do we measure in lines of code? Serious question. ...

      Some metrics have as their base the number of lines of code. In and of itself, LoC is pretty meaningless. Other views which use LoC, however, can be quite useful.

    • Why do we measure in lines of code? Serious question.

      I would like to know a decent alternative. For the past 20 years whenever I was involved in a contractual transfer of intellectual property of software I would invariably get asked "how many lines of code?"

      Generally the question is not too hard to answer as long as you don't get too picky. After all, we are talking about a "find" command piped into "wc" on various checked-out directory trees. Who knows what percentage of that is actual compile-active source code versus everything else. The legal and ac
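
      That find-piped-into-wc amounts to something like this in Python (counting only .c/.h files, and still saying nothing about what actually gets compiled):

          import os

          def count_lines(root=".", exts=(".c", ".h")):
              total = 0
              for dirpath, _, files in os.walk(root):
                  for name in files:
                      if name.endswith(exts):
                          with open(os.path.join(dirpath, name), "rb") as f:
                              total += sum(1 for _ in f)
              return total

          print(count_lines())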

    • by AHuxley ( 892839 )
    The idea was for a small amount of code to do DOTADIW, or "Do One Thing and Do It Well."
      That would see a lot of really well understood code working hard to make a really great OS.
    • Because that's what diff measures.

  • LOC != kernel bloat (Score:5, Informative)

    by Zero__Kelvin ( 151819 ) on Sunday September 16, 2018 @12:26PM (#57323746) Homepage
    An important caveat here is that an increase in LOC count does not mean a linear increase in loaded kernel memory usage. For example, every line of a new driver is counted, but that driver may or may not be compiled in or loaded as a module. If a driver for wireless card X is 1,200 lines of code but your system doesn't have that card and the driver was built as a module, then none of the machine code generated from those added lines gets loaded at runtime.

    There are more than 1000 .config options, and over 30 supported hardware architectures, so your code mileage *will* vary.
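
    One way to see that split for your own kernel is to tally the .config file, where CONFIG_FOO=y means built in, =m means module, and "# ... is not set" means off. A sketch:

        from collections import Counter

        def config_summary(path=".config"):
            counts = Counter()
            for line in open(path):
                line = line.strip()
                if line.endswith("=y"):
                    counts["built-in"] += 1
                elif line.endswith("=m"):
                    counts["module"] += 1
                elif line.endswith("is not set"):
                    counts["off"] += 1
            return counts

        print(config_summary())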
    • by Anonymous Coward

      The kernel is bloat. Most of the new code should be outside of, and independent of, the kernel. Look at ZFS: it lives outside the kernel and proves that NO file system code is needed in the kernel except for the boot file system, so the kernel can load the first needed modules (like a FS) and the rest will follow.

      Break the kernel up and we can get to the point of truly replaceable parts and modules. You could even have 2 or more of the "same" FS so you do not have to do big-bang upgrades.

    • The Linux kernel is a mature piece of code. Changes to the core architecture should be limited to planned events and take years to mature to the point of inclusion. What we are looking at is mostly a measure of interest: the Linux kernel is being maintained at a prodigious rate, so that we can wake up knowing that our bugs are being fixed and our security patches are on time. If there are people out there thinking that any amount of "lines of code" is some kind of development
  • by Anonymous Coward

    Linux never had the critical mass to win over desktop users (real workstation desktops, not hobbyists). Now it's just a bunch of change for the sake of change, plus a ton of Meltdown/Spectre bloat.

    It's too bad; it could have been a contender had the chips fallen its way and had a few critical decisions been made differently. Now with the systemd malware being stuffed down everyone's throat, it's assuredly game over.

  • by Anonymous Coward

    If/when you look at the embedded Linux scene, you'll notice that there is way too much duplication of functionality going on.
    Then you have to look at the embedded kernel forks which haven't been integrated into the mainline kernel, and you find even more duplication.
    It is horrible, and it is getting worse every day.

    Also, the newly integrated Cryptocoin miner for the x86 SMM EC is a real LOC hog, but that's what you get when you need to hide its functionality.

  • by Anonymous Coward

    Is it all written in C by men? That's horrible!

  • by QuietLagoon ( 813062 ) on Sunday September 16, 2018 @01:11PM (#57323886)
    What's the rate of bug occurrence per 10k LoC in the Linux kernel? I'm less concerned about additional kernel bloat than I am about additional kernel bugs.
    • Re: (Score:3, Informative)

      by Anonymous Coward

      For example, this: https://scan.coverity.com/proj... [coverity.com]

      They state 0.45 defects/kLOC. Of course, they won't "find" all defects... and there might be some false positives in there. But you get the ballpark.

      Use your favourite search engine (hopefully not Google and its ilk).

      Kids, these days. When I was young, I queried Altavista with telnet. Tsk, tsk.
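
      Taking that rate at face value against the summary's total line count gives the ballpark directly:

          total_loc = 25_584_633          # total lines in the tree, per the summary
          rate_per_kloc = 0.45            # Coverity's reported defect density
          print(total_loc / 1000 * rate_per_kloc)   # ~11,513 detectable defects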

      • They state 0.45 defects/kLOC. Of course, they won't "find" all defects...

        They won't find most defects.

        But you get the ballpark

        Not really because the kernel developers know how to avoid the kind of bugs Coverity scans for (Coverity has been haranguing them over it for nearly two decades now).

    • If kernel code is like most code, and it probably is, there is about 1 bug per 20 lines, so this year's 225,000 new lines bring 10,000 or so.

  • by Gravis Zero ( 934156 ) on Sunday September 16, 2018 @04:12PM (#57324552)

    What really matters here is whether we are talking about code that has a direct effect on the functionality of every kernel build, or code for specific drivers. Last I read, the kernel core was something like 2M LOC, while drivers made up the large majority of the rest.
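
    That split is easy enough to check on a kernel checkout; a sketch that tallies lines per top-level directory (the path is a placeholder):

        import os
        from collections import Counter

        def loc_by_topdir(root):
            counts = Counter()
            for dirpath, _, files in os.walk(root):
                rel = os.path.relpath(dirpath, root)
                top = rel.split(os.sep)[0] if rel != "." else "(top)"
                for name in files:
                    if name.endswith((".c", ".h")):
                        with open(os.path.join(dirpath, name), "rb") as f:
                            counts[top] += sum(1 for _ in f)
            return counts

        print(loc_by_topdir("/path/to/linux").most_common(10))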

  • Is Sievers still banned?

  • by aberglas ( 991072 ) on Sunday September 16, 2018 @07:12PM (#57325076)

    If those statistics are really true, over 200,000 NEW lines in one year in an existing, complex system is as unmanageable as 3,000 contributors. As products age, the rate of additions normally goes down, because things have to be integrated into a complex system.

    Sure, Linux is not actually as monolithic as described. But a bug in any one of those lines could bring down the whole kernel.

    It is a credit to the skill of the maintainers that they can make this work. And a debit that they try.

    • You're one of those people that compiles kernels with every available option enabled, regardless of the use case, aren't you?
    • But a bug in any one of those lines could bring down the whole kernel.

      Most of it is drivers, and most of those drivers are for devices not running on your computer, so if there is a bug, it will be in a code path that is impossible to reach on your system (are you using JFS?). The core kernel is a lot smaller.

  • Linus Torvalds averages about 5 commits per day: 3% of this year's 49,647 commits is roughly 1,490, spread over the ~260 days of 2018 so far.

  • As a person who's been doing UNIX since 1984, AT&T SVR3.2 was my favorite, although I've used other variants. I'm tired of Linux crap. I'm tired of systemd and the systemd wars. I'm tired of having to learn nuanced differences between various distros just to do basic, common tasks. I'm tired of package repositories that suck when it comes to good maintenance (although they are still better than RPM hell). I'm tired of half-baked security measures that are badly designed and beyond human understanding. (IMO, admi

  • And how many lines did they remove? :)
  • 25 million lines of code is inexcusable at any level.

    The amount of code required to make Linux work is not even a million. Let's assume you can get every feature of interest into a LOT less than that and then depend on modules for everything else.

    Consider that the Linux Kernel as it stands today is one massive repository of trash on trash on trash on something less trashy.

    Linus made Linux, and he made Git, and Git has things like submodules, and there are things like GVFS as well in some environments. Why not s
  • The summary seems to indicate that some of those changes are in documentation, which is not code.
