The Linux Kernel Archives

Jeremy Andrews writes "KernelTrap offers an interesting look at the history behind the Linux Kernel Archives, home of the Linux kernel. It traces the site from its beginnings in 1997, when kernel.org ran on a generic "white box PC" with a shared T1, to the present, where it runs on multiple quad Opterons, each with 24 gigabytes of RAM, 10 terabytes of disk space, and a gigabit link to the internet. Much of the article is based on an interview with Peter Anvin, and it also includes quotes from Linus Torvalds, Paul Vixie of Internet Systems Consortium, Inc., which donates the bandwidth, and Matt Taggart of Hewlett-Packard, which donated the hardware."
  • Hmm... (Score:2, Funny)

    by Anonymous Coward
    Boy what I could do with that and BitTorrent... *rubs palms together*
    • Re:Hmm... (Score:1, Funny)

      by Anonymous Coward
      If you are like the typical, ethical P2P advocate on Slashdot, you will probably download the free-as-in-beer-and-free-as-in-free-sex Linux distro of the day or share your newest scientific discoveries, not steal movies.

      We all know that.
  • ...especially having dealt with something like this (on a much smaller scale) recently.

    We were having bandwidth limitations on RubyForge [rubyforge.org]; it was getting up to 80 GB per month [rubyforge.org] at the end of 2004. Mirroring out releases helped get usage back down to 15 GB per month. Many thanks to our mirror [rubyforge.org] providers!
  • Good Grief that's a lot of pipe! Saturating a PAIR of gig links? Certainly tends to make one stop and consider how many people are actually USING linux nowadays. Good to see!

    • Meh, it sounds like a lot until you consider the number of people with broadband. For example, 2000 people with 1Mbit connections could saturate a pair of gigabit links.
      • Yes, but for how long?

        With a 1 MBit/s link, you can download 1 GB in roughly 2.5 hours. Thus, 2000 people with 1Mbit connections could saturate a pair of gigabit links for maybe 5 hours when downloading FC.
        (Sidenote: I have no idea how large FC 4 is actually going to be.)
        The article says they expect the links to be saturated for about 3 to 4 days.
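
A quick back-of-the-envelope check of that estimate, as a shell sketch; the download size per user is an assumption, since (as noted above) nobody knows yet how big FC4 will be:

```sh
ISO_MB=2200        # assumed total download per user, in megabytes (a guess)
LINK_KBIT=1000     # one 1 Mbit/s home connection, in kbit/s

# Hours for one user to pull the whole release over a 1 Mbit/s link:
# MB -> KB -> kbit, divided by the link speed, divided into hours.
echo "$(( ISO_MB * 1024 * 8 / LINK_KBIT / 3600 )) hours"   # ~5 hours

# Aggregate demand if 2000 such users download at full speed simultaneously:
echo "$(( 2000 * LINK_KBIT / 1000 )) Mbit/s"               # 2000 Mbit/s, i.e. a pair of gig links
```

On those assumptions, sustaining saturation for the 3-4 days the article mentions implies a continuous stream of new downloaders, not just one fixed group of 2000.
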
  • by Anonymous Coward
    Way too slow.

    Mod +5 funny, -5 irreverent
  • Yes but... (Score:5, Funny)

    by PR_Alistair ( 819738 ) on Tuesday May 03, 2005 @12:07PM (#12421103)
    multiple quad Opterons each with 24 gigabytes of RAM, 10 terabytes of disk space, and a gigabit link to the internet
    ...does it run Linux?
  • Slashdot history! (Score:5, Interesting)

    by Dante ( 3418 ) * on Tuesday May 03, 2005 @12:08PM (#12421114) Journal

    This was a great article! I can attest that there is quite a difference with the new hardware; I got a 500 KB/s download last night while pulling rc3-mm2.

    Can we please have the same kind of article about slashdot hardware?

  • Interesting quote (Score:5, Interesting)

    by LiENUS ( 207736 ) <slashdot&vetmanage,com> on Tuesday May 03, 2005 @12:09PM (#12421119) Homepage
    The 'kernel.org' domain name was picked because by that time in 1997 the more logical seeming Linux dot names were already taken. The Transmeta domain was intentionally not used to avoid creating the false perception that Transmeta owned Linux.

    I wonder what would have happened with Transmeta and Linux if they had used the Transmeta domain to host the kernel archives. Would IBM have gotten involved with Linux? Would SCO have sued Transmeta instead of IBM? Would Linus have left Transmeta?
  • When Linus Torvalds purchased his first computer, on which he began writing the Linux kernel, the state-of-the-art PC with 4 megabytes of RAM and running at 33 megahertz was too expensive for him to buy outright.

    Oh my god, it's a diesel!
  • by TheKubrix ( 585297 ) on Tuesday May 03, 2005 @12:23PM (#12421249) Homepage
    ...it runs on multiple quad Opterons each with 24 gigabytes of RAM, 10 terabytes of disk space, and a gigabit link to the internet...

    Do I smell a challenge?
    • Do I smell a challenge?
      No, you smell your average home machine in about 4 years.

      After all, 4 years ago a pimped-out machine had less than 20 GB of hard drive space, 128 MB of RAM, and a 500-800 MHz CPU. DVD burner? Forget it, not at $3000 each.

      Nowadays even entry-level boxes are better than that.

      • I seriously doubt the average home machine is going to cost > $50K in 4 years. While a future home PC might have that kind of compute power, it won't cost ANYWHERE near that much.

        A quick check at HP.com shows that a ProLiant DL585 with two 800-series dual-core Opteron processors starts at $12K. Add in 24 GB of RAM and 10 TB of disk (two MSA30s, each with 14 drives) and you're north of $50K in a hurry...
        • Costs are coming down way quicker than Moore's Law would have predicted.

          How much would a terabyte of storage have cost you 4 years ago? You would have needed to stack 50 x 20 GB drives, plus the cases, controllers, power supplies, etc. Today? Under a grand, as an off-the-shelf item. Terabyte drives will be in people's boxes by Christmas of next year, if not this year. You can buy 2 TB of storage today for less than what I paid for an 80 MB hard drive 15 years ago.

          Same thing for CPUs. You'll have a hard time

        • I remember my PC of 4 years ago: a brand spanking new Athlon Thunderbird with an awe-inspiring 512 MB of RAM, a great GF2 Pro, and huge twin 30 GB HDDs in a RAID array (that and my iMac G3; guess which one I still use as a media server).
          It cost around 3 grand in total. Today I could build a system with an 80 GB HDD, an Athlon XP 2500+, 512 MB of DDR RAM, and a better graphics card, all for around 1/15th the price. So what cost me 3000 then would today (for something a lot better) cost 150-200 (self-built).
      • "you smell your average home machine in about 4 years"

        Multiple 4-way Opterons? 24 GB of memory? At least 6 years away, assuming that the average home machine does not simply drop in cost at the expense of specs.
      • Actually, I bought my wife's machine 4 years ago and it came with an 80 GB HD, a 1.6 GHz AMD CPU, 128 MB of RAM, a CD-RW drive, and a DVD drive. It cost about $1500. Except for the RAM, it's actually quite comparable to today's entry-level boxes. Still runs like a champ, too.
    • No, it's the smoke from their server.
  • I'm in awe of that box. It just pushes so much data, all the time. And 1000Mb/s of bandwidth?! That's more bandwidth than Google!*


    * I strongly suspect this not to be true.
    • Re:Awesome (Score:3, Interesting)

      by moz25 ( 262020 )
      Seeing as Google has thousands of boxes, my estimate would be that the combined Google services pump out over 10 Gb/s, rather than just 1 Gb/s.
      • What amazes me is that on the new Cisco ultra-router, the CRS-1, the slowest port it has is 10GigE. That's silly.

        --
        lds
        • "What amazes me is that on the new cisco ultra-router, the CRS-1, the slowest port it has is 10GigE. That's silly."

          How so? That thing is designed for serious backbone duty, and should thus be built to handle insane amounts of data. That thing makes a business-class router on a GigE line look like a Linksys on a DSL line (not knocking Linksys, I love my WRTs).
          • I meant silly as a synonym for "awesome" :).

            I'm very excited by having gotten to touch an engineering model of the CRS-1!

            --
            Phil
  • by teeker ( 623861 ) on Tuesday May 03, 2005 @12:37PM (#12421381)
    Referring to 32-bit systems, Peter noted, "we learned that the Linux load average rolls over at 1024. And we actually found this out empirically."

    Can you even get the server to TELL you what the load is when it's that high?? That's INSANE!
    • by Dante ( 3418 ) *

      Sure; the processors could actually be doing very little other than waiting on the disks. I'd be more interested in the output from vmstat.
      • Yup, they should put up some pretty graphs for kicks. I wonder whether that would even be doable - maybe some of the counters in /proc are just overflowing all the time?

        It would definitely look funny. The scale alone would be good for some jaw-dropping at second glance...
    • It depends on how quickly the processes get through the run queue. If you have 1000 jobs that will each take a few msec, but new ones are created at that same rate, you will still get responsiveness out of the system. If they each take a few seconds, on the other hand, you have a problem... Oh, how I wish everyone understood the difference between load average and load on the CPU.
    • Can you even get the server to TELL you what the load is when it's that high??

      Of course you can. "Normal people" usually see high load values when the box runs out of memory and starts swapping, and then you really can't control it. But try running a few thousand instances of "cp /dev/zero /dev/null" or something like that to drive the load up: you will still be able to control the box, especially with a reniced root shell.
    • I've seen load averages well over 1500 on Solaris boxen. The CS dept at my college has some issues with giving us sane limits, so programming assignments occasionally go a little crazy.

      Uptime usually works well into 2k, but "ps" and even ls in the /proc folder crap out around 500.
    • Can you even get the server to TELL you what the load is when it's that high?? That's INSANE!

      Someone needs to check up on what the load average actually means :).
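
For anyone who does want to check: on Linux the load average counts tasks that are runnable or in uninterruptible (disk-wait) sleep, averaged over 1, 5, and 15 minutes; it is not CPU utilisation, which is why a box doing heavy disk I/O can report four-digit loads and still respond. A rough sketch of the experiment suggested a few comments up (run it in a throwaway VM, not on a machine you care about):

```sh
# Spawn a few hundred copies of the parent's command; each one busily copies
# an endless stream of zeroes to /dev/null, so they pile up in the run queue
# without touching swap, and the box stays controllable.
for i in $(seq 1 500); do
    cp /dev/zero /dev/null &
done

# Watch the 1/5/15-minute averages climb.  In /proc/loadavg the fourth field
# is runnable/total tasks and the fifth is the most recently assigned PID.
watch -n 2 cat /proc/loadavg

# Clean up afterwards.
killall cp
```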

  • What does the increase in traffic say about the level of Linux adoption out there?

    Since we have some figures for the number of machines in the early days, and surely we have the traffic figures from then as well...

    We should be able to make a reasonable guess at the number of machines out there with Linux on them...

  • Hat's off to HP (Score:5, Insightful)

    by oni ( 41625 ) on Tuesday May 03, 2005 @12:40PM (#12421411) Homepage
    I'm really happy to see HP giving so much support. I'll definitely remember this the next time someone asks my opinion about what server hardware to buy.
    • Second that here; their generosity to Linux users should be recognized. We all know that IBM contributes heavily to Linux, but they are far from the only ones.

      BTW-- I can attest that the dual Opteron DL145's from HP [hp.com] are rocking boxes for Linux.
    • I'll second this. Their support for many things is outstanding. HP really does seem to care about the end users.

      I have seen them support networking equipment purchased second (third?) hand off of eBay without asking "Where did you buy it?" or anything like that.

      In another instance, one of their support people searched through several dozen models of laptops and found one in a different product line that had compatible drivers, so that an OS besides Windows XP could be used. No other company has come close.

    • I'm really happy to see HP giving so much support. I'll definitely remember this the next time someone asks my opinion about what server hardware to buy.

      I believe that was the reaction HP's marketing department also expected. Admittedly, providing the hardware was a very nice gesture, but in reality, it's a brilliant marketing move.

      That said, I hope you will take other factors and data points into consideration when someone asks for your advice. The servers donated were relatively high-end -
  • by moz25 ( 262020 ) on Tuesday May 03, 2005 @12:40PM (#12421421) Homepage
    It doesn't surprise me that being linked from Slashdot is just a minor effect. A kernel package is tens of megabytes, while a single visit will likely consume less than 100 KB.
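
A rough ratio, purely for scale; both figures below are assumptions rather than measurements:

```sh
TARBALL_KB=$(( 35 * 1024 ))   # assume a ~35 MB compressed kernel tarball
VISIT_KB=100                  # assume ~100 KB per Slashdot-driven page view
echo $(( TARBALL_KB / VISIT_KB ))   # ~358 page views cost about as much as one tarball download
```
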
  • With the increased bandwidth generated by the git repositories, distributed sharing (better than plain FTP mirroring) would come in handy, relieving both the feed pipe and the average load.

    However, in this post [iu.edu] from hpa, it looks like the tools are not ready [yahoo.com].

  • noatime interesting (Score:5, Interesting)

    by redelm ( 54142 ) on Tuesday May 03, 2005 @01:47PM (#12422210) Homepage
    More people should look into `noatime` for file-intensive systems. Peter said all the access-time updates doubled his load average, and I've seen worse. Try running `updatedb` to freshen the locate database: it takes minutes. Remount the FS noatime, flush the buffers with a grepbomb, and it takes seconds. Remount with atime, and it's back to minutes.
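
A minimal sketch of that comparison, assuming root access and a hypothetical mount point (/srv); on a production box you would simply add noatime to the filesystem's options in /etc/fstab:

```sh
# Baseline: default atime behaviour, so every file read also dirties its inode.
mount -o remount,atime /srv
time updatedb            # full tree walk: minutes on a big, busy filesystem

# Turn off access-time updates and push cached data out of memory first
# (the "grepbomb": reading a pile of unrelated files evicts the cache,
# which keeps the two runs comparable).
mount -o remount,noatime /srv
grep -r some-unlikely-string /usr > /dev/null 2>&1
time updatedb            # same walk, but no inode write-back: seconds

# Remount with atime again and the walk slows right back down.
mount -o remount,atime /srv
```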

  • I worked at Globix when we offered free bandwidth to kernel.org. In the beginning, when things were going well and we had hundreds of millions of dollars to spend, we used this to leverage our position in the open source community. Of course, when the bubble burst, Globix tried to get rid of all the free riders first. It was done very selectively, though. While some were cut loose as fast as possible (like kernel.org), others were kept because they had better connections to some of the executives. I don't
  • "where it runs on multiple quad Opterons each with 24 gigabytes of RAM, 10 terabytes of disk space, and a gigabit link to the internet"
    Can you imagine a cluster of these... no, wait...
  • by trb ( 8509 ) on Tuesday May 03, 2005 @02:26PM (#12422732)
    Serving data with http and ftp is not very CPU intensive, but over time the amount of rsync traffic being fed by the kernel.org server continued to increase, and rsync is CPU intensive. "That's what rsync does," Peter said, "it trades bandwidth for CPU horsepower...

    I don't have occasion to use rsync, and I'm not too familiar with its design, but I think it synchs directories by checksumming the files in them to see if they differ. So Peter is saying above that the server's bottleneck is checksumming. I would think that on a server like this, checksums could be cached - why checksum a stable file more than once? Once you have a checksum for linux-2.6.0.tar.bz2, why calculate it again?

    This would require a bit of bookkeeping when files change, but wouldn't it be worth it on such a busy system? (Or am I confused?)

    • There's also the overhead of compression, which admittedly doesn't make much sense on a file that's already bzipped. What they need is a caching rsync proxy that checks the file to see if it's changed since the last access. :)
    • Rsync uses rolling hashes for segments of the file. IIRC the segment boundaries are decided on by the client (I could be wrong on that part).

    • Once you have a checksum for linux-2.6.0.tar.bz2, why calculate it again?

      What's different about rsync [anu.edu.au] is that it does not ordinarily use a single whole-file checksum (which would mean copying whole files whenever they change). Instead, to save bandwidth, it uses a more sophisticated system to ensure that only the changed parts of a file are transmitted - and it detects changed parts by comparing (many) checksums, I believe. The report [anu.edu.au] sums it up like this:

      The algorithm identifies parts of the source file which are identical
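
The bandwidth-for-CPU trade Peter describes also has a user-visible knob: rsync's -W/--whole-file option skips the delta-transfer algorithm entirely, so no block checksums are computed and changed files are simply re-sent in full. A sketch; the module path is illustrative, not copied from kernel.org's documentation:

```sh
# Default remote behaviour: rolling + strong checksums on both ends, and only
# the blocks that differ from the copy you already have are transmitted
# (cheap on bandwidth, expensive on CPU).
rsync -av rsync://rsync.kernel.org/pub/linux/kernel/v2.6/ ./v2.6/

# With --whole-file the delta algorithm is skipped and any file that differs
# is streamed in full (cheap on CPU, expensive on bandwidth).
rsync -avW rsync://rsync.kernel.org/pub/linux/kernel/v2.6/ ./v2.6/
```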

  • That's where we need some kind of link between CVS and a peer-to-peer system.
