Debian, Open Source, Software, IT, Linux

APT Speed For Incremental Updates Gets a Massive Performance Boost 162

jones_supa writes: Developer Julian Andres Klode has this week made improvements that significantly increase the speed of incremental updates in Debian GNU/Linux's APT update system. His optimizations give apt-get roughly 10x the performance of the old code, and they also make APT with PDiffs faster than the default, non-incremental behavior. Beyond the improvements that landed this week, Julian is still exploring other areas for improving APT update performance. More details are in his blog post.
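For readers who want to try the incremental path themselves: PDiff behavior is controlled by the Acquire::PDiffs option documented in apt.conf(5). A sketch of how to toggle it per-run or persistently (the config file name below is an arbitrary example):

```shell
# Force incremental (PDiff) index downloads for a single run:
apt-get update -o Acquire::PDiffs=true

# Or disable them, to compare against the non-incremental behavior:
apt-get update -o Acquire::PDiffs=false

# To make the choice persistent, drop a line like this into a file
# such as /etc/apt/apt.conf.d/90pdiffs:
#   Acquire::PDiffs "true";
```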
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward

    The speed got a performance boost?

    It's not too late to change the title.

    • by Anonymous Coward on Saturday December 26, 2015 @05:24PM (#51187327)

      Holy fuck, I just read the blog article and this is what it said:

      The reason for this is that our I/O is unbuffered, and we were reading one byte at a time in order to read lines.

      Writes are still unbuffered, and account for about 75% to 80% of our runtime.

      How the fuck did that all happen to begin with?! Who the fuck wrote the code like that initially?! How fucking long has this been the case?!

      I understand that bugs will happen. I really do. But this is like a total breakdown in process. Why the fuck weren't these problems detected sooner, like when the code was committed and reviewed? The code was reviewed, right?!

      We aren't talking about some minor software project here. This is apt, for fuck's sake! This is one of the core pieces of Debian's (and Ubuntu's, and many other distros') basic infrastructure. This is the kind of shit that has to be done properly, yet it clearly wasn't in this case.

      The Debian project needs to address how this shit even happened to begin with. This is fucking unbelievable. The entire Debian community deserves a full explanation as to how this debacle happened.

      I thank God every day that I moved all of my servers from Debian to OpenBSD after the systemd decision. I know that the OpenBSD devs take code reviews and code quality extremely seriously, not just with the core OS itself, but even with software written by others. The OpenBSD project will even create and maintain their own custom forks of third party software if the original developers can't get their shit together!

      • Re: (Score:3, Interesting)

        by zopper ( 4044367 )

IMO, this originates in the early days of APT. And because it worked, nobody wanted to touch it later when speed became an issue, because nobody wanted to risk breaking it, since, as you said, it is a critical part of the infrastructure.

And now, after you've explained to us how BSD is better, go back and pretend there are no such old lines hacked together a long time ago, with nobody dusting them off in years. (By the way, how long was, for example, HeartBleed in BSD? Until discovered and fixed also on e

        • This is basically right. The reads were always unbuffered; we just added a ReadLine() in 2011 - that one was actually more efficient, reading entire blocks at once and then seeking back if it read too much - but this broke pipes, so it was switched to read a single byte at a time. Anyway, the code was a real disaster due to its historic growth of adding various compression formats - but it is now nicely refactored, so we actually know what's going on, and can easily add new formats, buffers or anything we like.
      • I understand that bugs will happen. I really do. But this is like a total breakdown in process. Why the fuck weren't these problems detected sooner, like when the code was committed and reviewed? The code was reviewed, right?!

        That's what's great about Open Source and all those millions of eyeballs!

      • by Kjella ( 173770 )

        The reason is that while apt is a critical piece of infrastructure, incremental apt really isn't. The use case is that you're supposed to download a smaller diff, apply the diff and install the updated version either faster or cheaper if you're on a metered line. If you don't have a compatible earlier version, you have to use the traditional package anyway. So you can always fall back to that, it's only nice-to-have. The people on broadband don't care, the ones on really slow lines probably want their updat

      • Indeed. I actually read TFA, expecting to see some neat algorithmic improvements in how they did incremental updates. Instead, the post read like a first-year undergraduate student assignment writeup ('we tried doing the really naive thing, it turned out to be really slow, so we did the obvious thing').
        • Buffered reading may seem obvious to you, but the code in question had some users outside the class that worked directly on the file descriptor, and thus would not work correctly with buffered reading. Also, it's used in other code that implements its own buffering or does not need any, because it simply copies page-size blocks from one file to another. As another example, I have not figured out how to go completely buffered without messing everything up yet: the test suite fails all over the place if I swit
          • Which sounds like it's really poorly structured code, with the kind of layering that would make me fail a student project, and bounce back in code review anything that was intended to go anywhere near production.
      • by ZosX ( 517789 )

        That's truly shocking. Even a moron like me knows that is horribly awful design that sounds like it came straight from writing code on a COCO3.

      • Nobody knows what systemd really means, or where it came from.

  • by mveloso ( 325617 ) on Saturday December 26, 2015 @04:50PM (#51187173)

    Wow, reading one byte at a time unbuffered? Who does that in real life? It's been well-known for like 30 years that buffered reading is an order of magnitude faster than byte-at-a-time - which matches the above result. The standard C library does buffered reads, unless you turn them off explicitly.

    Did someone really turn that off explicitly? Why?

    Jesus, someone should check the XML parsers. Maybe the same guy wrote an XML parser and it's doing byte reads.

    • ... it's not even a prefix-code "per document", so only using buffered reads when parsing XML wouldn't even allow you to avoid reading into the data following the end of the XML document you actually want to parse.

If something is XML-based and time-critical, I wouldn't bother to optimize the XML parsing, but rather exchange XML for a non-braindead format to start with.

      • by hey! ( 33014 )

        There's nothing braindead about XML, it's just not the right tool for many jobs it is used for. It's over-engineering a solution to a simple problem because everyone else is doing it that way that's braindead.

        When XML became hot around the end of the 90s, people did what they always do with a hot technology; they used it whether or not it made sense just to have it on their resume. It never made sense for things like an over-the-network data serialization format; or as a configuration file format where som

    • by guruevi ( 827432 ) <evi AT smokingcube DOT be> on Saturday December 26, 2015 @05:39PM (#51187403) Homepage

a) Back in the day we did, because memory was expensive and these things had to run on 386es and other platforms that might not have room for a sizable buffer, and memory/bus/CPU were all equally fast. (You only need a buffer if your machine is busy doing other things.)

b) It might be a benefit on certain platforms, but in certain situations it feels (without looking at the rest of the code) like the code might introduce a buffer overflow issue (he explicitly removes the zero-buffer option if the file read returns a null pointer as its buffer).

      c) Ask the original developer or do a blame-search for that code before 'fixing' things.

    • by jaklode ( 3930925 ) on Saturday December 26, 2015 @06:04PM (#51187515)
      It's not using standard I/O functions, but pure syscalls, which are obviously unbuffered. And the same code paths are also used for other stuff that maps files. Performance-critical code implemented a buffer on top of that, and the ReadLine() function experiencing the main issue was only added as a convenience function and not used for anything critical until a few months ago (and we forgot that it was not optimized). Anyway, we implement buffering for ReadLine() now. I'll try to make it generic for both reads (all reads) and writes, but so far I have not succeeded, probably because some code depends on unbuffered reads or writes.
    • by jabuzz ( 182671 )

Don't know, but Office 2012 on the Mac does single-byte writes when you save a file. That causes a file save over SMB to take 15 minutes. Do a "save as" and the same file takes a few seconds. Of course, saving to local disk you don't notice the difference.

So if Microsoft, with a team of paid full-time programmers, can make such stupid mistakes, why do you expect better of a little-used piece of code (because for a large section of the Debian user base, bandwidth long ago ceased to be an issue) that is maintained

  • Many thanks (Score:4, Informative)

    by NotInHere ( 3654617 ) on Saturday December 26, 2015 @05:00PM (#51187231)

Really looking forward to apt-get speeds that can be compared with pacman's. Julian Andres Klode, if you read this, please continue the great work!

  • by Anonymous Coward

    That's good.

Now there remain two enhancements I'd really like to see:
low free space on HDD management
parallel installation

low free space: walk the dependencies to find packages with no dependencies, install, clean [repeat]

parallel installation: right now apt downloads all files and then installs them; it would be better to (download/install) each package

    • by Anonymous Coward

low free space: walk the dependencies to find packages with no dependencies, install, clean [repeat]

      "sudo apt-get autoremove" will remove all packages that were not explicitly installed and are not depended on by any explicitly installed packages.

parallel installation: right now apt downloads all files and then installs them; it would be better to (download/install) each package

      This would probably not actually cause a significant improvement in performance. If you're pointed at a fast mirror, downloading eight packages at once that max out your bandwidth will not be any faster than one package that maxes out your bandwidth. Similarly, when actually installing them, you'd need to determine what order they could be installed in to ensur

      • by jaklode ( 3930925 ) on Saturday December 26, 2015 @07:18PM (#51187747)
        Well, we do download in parallel if you use httpredir.debian.org and httpredir.debian.org returns different mirrors for different packages (which it does not do all the time, but reasonably often). I don't like installing in parallel, or downloading and installing at the same time, as they just make the error handling harder, for modest speedup.
  • by Anonymous Coward

    in windows land, windows updates slow to a friggin crawl, can take hours, and with the new windows 10{dot}shit, lacks any sort of user control... the exact opposite of debian and apt.

  • andre@atlas:~# time apt &> /dev/null

    real 0m0.002s
    user 0m0.001s
    sys 0m0.000s

    WOW fast as hell!
