
Linux Kernel Archives Struggles With Git

NewsFiend writes "In May, Slashdot discussed KernelTrap's interesting feature about the Linux Kernel Archives, which had recently upgraded to multiple 4-way dual-core Opterons with 24 gigabytes of RAM and 10 terabytes of disk space. KernelTrap has now followed up with kernel.org to learn how the new hardware has been working. Evidently the new servers have been performing flawlessly, but the addition of Linus Torvalds' new source control system, git, is causing some heartache: it has increased the number of files being archived sevenfold."

  • This is normal. (Score:5, Insightful)

    by A beautiful mind ( 821714 ) on Monday June 20, 2005 @01:21PM (#12864631)
    GIT is focused on trading more file space for less bandwidth. This is important for a lot of scattered developers who can afford 1-2 GB more on a hard drive, but for whom an extra 200-300 MB would hurt on a DSL or dialup connection.
    • Also, another requirement was to have files that can be handled without a lot of binary hacking, for example when doing merges, recovery, rollbacks, etc. This is one of the reasons why there are a lot of files rather than one big binary blob (a rough sketch follows this thread).
    • Which is why they have 10 TB of space. Is the server only for kernel development/source code, or is this also a mirror for downloading snapshots/compiled sources?
      • Well, I was talking about GIT as a developer's tool, not about the git services offered by kernel.org, but the reasoning above is the cause of the increased file count.

        Answering your question: kernel.org holds a lot of stuff, not only kernel-related things but everything from distributions to various utilities, so yes.
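
    A minimal sketch of the trade-off described above, assuming a recent git install (repository and file names here are purely illustrative): every blob, tree, and commit becomes its own small loose object file, which is where the file-count explosion comes from.

    mkdir demo && cd demo && git init
    echo 'hello' > greeting.txt
    git add greeting.txt && git commit -m 'first version'
    # every blob, tree and commit is stored as its own small file:
    find .git/objects -type f
    # confirm the file's content hash maps to a loose blob object
    git cat-file -t $(git hash-object greeting.txt)   # prints: blob
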
  • Then he would be able to <comic name="Larry the Cable Guy">Git-R-Done</comic>
  • `grep -r`ing source code under Subversion takes much longer than with CVS, due to all the .svn files.
    • by Anonymous Coward
      Just exclude .svn from your grepped files. For further details, look at 'man grep' or 'info grep'.

      -d ACTION, --directories=ACTION If an input file is a directory, use ACTION to process it. By default, ACTION is read, which means that directories are read just as if they were ordinary files. If ACTION is skip, directories are silently skipped. If ACTION is recurse, grep reads all files under each directory, recursively; this is equivalent to the -

      • How does that work? I said `grep -r`. If I set the ACTION to not read directories, then it won't grep recursively. I do `| grep -v .svn` to filter the .svn hits out of the output, but grep still reads those directories, which takes a long time.
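
    A rough sketch of how to keep grep out of the .svn directories entirely, rather than filtering its output afterwards; the --exclude-dir option only exists in newer GNU grep versions, so treat that line as optional:

    # prune .svn before grep ever descends into it
    find . -name .svn -prune -o -type f -print0 | xargs -0 grep -n 'pattern'
    # newer GNU grep can do the same thing directly
    grep -rn --exclude-dir=.svn 'pattern' .
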
  • reiser4 + VCS? (Score:3, Interesting)

    by OmniVector ( 569062 ) <see my homepage> on Monday June 20, 2005 @01:52PM (#12864935) Homepage
    (Slightly) offtopic: wasn't reiser4 supposed to have 'plugin' support, so things like version control could be built directly into the file system? The prospect of being able to, say, type:

    touch bar
    echo 'foo' > bar
    revisions bar
    # (prints the revision history of bar)
    cp bar/revision/1 bar-version-1.0.backup

    Granted, the storage requirements and CPU usage might be horrific, but I think something like this is inevitable in file systems, and I certainly welcome the day it becomes a reality.
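
    No filesystem offers a 'revisions' command like that today, but a rough approximation of the same workflow with plain git (file names are just illustrative) might look like:

    git init
    echo 'foo' > bar
    git add bar && git commit -m 'version 1'
    git log -- bar                              # revision history of bar
    git show HEAD:bar > bar-version-1.0.backup  # pull a stored revision back out
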
  • Really? (Score:2, Funny)

    by TheAngryMob ( 49125 )
    I've been struggling with stupid gits for years now. (Da-dum-dum). Thank you! I'll be here all week.
  • Aren't file system scalability issues why people start using databases?

    Sounds like a software engineering issue.

    • I think someone suggested on the LKML, in the early development talks, using an SQL database. According to my foggy memory, Linus replied something along the lines of that solution being much worse in terms of productivity and speed than using simple files; basically it would be adding another (unnecessary) layer.
      • Linus's ideas on CM are really, really bad. While I think he's a great leader in terms of the kernel, I really hope he doesn't end up having much influence on the CM side. Having used database-driven CM systems (Rational, Borland's StarTeam), I find them far and away better at just about everything than file-based systems. There is simply no comparing the level of complexity of what you can pull out and how you can configure merges and changes.
        • Except that you're ignoring speed, the need to be decentralised (I cannot stress this enough; it is very much needed in an environment like the one the kernel is developed in), and low system requirements. Currently git needs only a few basic C libraries and bash.

          Actually, I spent hours grasping his ideas about GIT; it clearly shows that he gave it a lot of thought. I also think another SCM project has already started integrating GIT code into their system.
          • I agree he's given this a lot of thought. Linus wouldn't have such non-mainstream views if he didn't care. Bad ideas can be well thought out.

            Next, I'm not ignoring speed: you can scale a database system up infinitely large. Since database systems support ACID transactions (i.e. line/file source-code locking during a transaction), you can have multiple merges going on at once, and thus effective speed is much, much better. For example, Amazon.com uses Oracle as their backend. Think about the number of users
  • If kernel.org is running over ext2 or ext3, then it would seem to be a format problem, not a Git problem. These are not designed to be high-performance filesystems.

    On the flip-side, if kernel.org is using XFS, JFS, Reiserfs (I doubt they'd risk Reiser4 yet) or any other very high-performance filesystem, then maybe the problem is one of organization.

    It is rare that you actually need large numbers of files holding very small amounts of data or metadata. What is probably wanted is a virtual layer that all

    • http://kerneltrap.org/node/5070 [kerneltrap.org]

      This interview with the maintainers has a comment from somebody who claims he asked by email and got the reply that ext3 is used.

      If that's not good enough, one can guess: since "At this time, the servers run Fedora Core and use the 2.6 kernel provided by RedHat," they are probably using ext3, which is the default.
      • by jd ( 1658 )
        Actually, I believe Fedora Core 3 has most of the other filesystems compiled in; you just won't get the main partition formatted with them.

        Since the "smart" way to run such a server is to have the main FS on one disk and the data on another (this avoids tracking the head back and forth), the data partition can be just about anything.

        Now, the fact that the maintainers have said they are using Ext3 is rather more convincing to me. Foolish beyond belief, but convincing. I would rather use a "less reliable
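
      For what it's worth, checking which filesystem a given volume is actually using is a one-liner; the mount point below is just a placeholder:

      df -T /home/ftp              # prints the filesystem type of the volume
      mount | grep ' /home/ftp '   # or read it straight from the mount table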

    • Gripes about ext3 performance are probably outdated.

      We did some tests comparing reiser3, xfs, and ext3 with the dir_index option on 2.6 kernels. We were writing thousands (OK, tens of thousands) of small files into a couple of directories (specialized app, you don't want to know).

      When directories got large, ext3 with the hash lookups (between 800 and 1500 creations per second on newish hardware) ran much faster than xfs, and several orders of magnitude faster than ext3 without the directory hashing.
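
      For reference, the directory hashing mentioned above is ext3's dir_index feature; on an existing (unmounted) filesystem it can be enabled and the directories re-indexed roughly like this, with a hypothetical device name:

      tune2fs -O dir_index /dev/sdb1        # turn on hashed b-tree directories
      e2fsck -fD /dev/sdb1                  # rebuild and optimize existing directories
      tune2fs -l /dev/sdb1 | grep features  # verify that dir_index is now listed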

  • by Larry McVoy ( 893563 ) on Monday June 20, 2005 @02:02PM (#12865032)
    Bow before my might, l1nux l00s3rs!
  • Filesystem? (Score:5, Interesting)

    by RealBorg ( 549538 ) <thomaszNO@SPAMhostmaster.org> on Monday June 20, 2005 @02:19PM (#12865232) Homepage
    Maybe kernel.org should finally consider moving to a more appropriate filesystem than ext3, preferably reiserfs, since it is optimized to handle a lot of small files. Tail packing not only saves disk space but, more importantly, a lot of memory in the block cache.
    • Re:Filesystem? (Score:5, Informative)

      by Yenya ( 12004 ) on Monday June 20, 2005 @04:06PM (#12866229) Homepage Journal
      Disclaimer: I run one [linux.cz] of the kernel.org mirrors.

      Ext3 vs. Reiser is not an issue here. FWIW, I use XFS on my mirror volume, and I have also noticed how the git repository increases the load on my server. See the CPU usage graph [linux.cz] of ftp.linux.cz - look especially at the yearly graph and see how the CPU system time has been increasing for the last two months.

      The problem is in rsync: when mirroring the remote repository, it has to stat(2) every local and remote file, so the directory trees have to be read into RAM. Hashed or tree-based directories (reiserfs or xfs) can even be slower here than plain linear ext3 directories, because you have to read the whole directory anyway, and a linear read is faster.
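
      A quick way to see the metadata cost described above, with purely illustrative paths and URLs: even a do-nothing run has to walk and stat(2) the entire tree on both ends just to build the file list.

      # stat every entry in the local git portion of the mirror
      time find /pub/mirror/pub/scm -printf '%s\n' | wc -l
      # an rsync dry run still pays the full file-list cost
      time rsync -an --stats rsync://rsync.kernel.org/pub/scm/ /pub/mirror/pub/scm/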

  • 10 TB (Score:3, Funny)

    by jo42 ( 227475 ) on Monday June 20, 2005 @02:29PM (#12865325) Homepage
    That's a pretty decent sized pr0n collection they gots there...

    Kernel sources take up, what, only a handful of gigabytes?

  • So the rate of files being archived was multiplied by 128?
  • Are you reading this, man?

    You're responsible for all the world's problems! The Linux kernel, bitrot on my CDs, the war in Iraq, Guantanamo Bay, and now git!

    Come on Linus, clean up your act!

    (Sorry if this offends *anyone*)
