APT Speed For Incremental Updates Gets a Massive Performance Boost
jones_supa writes: Developer Julian Andres Klode has this week made some improvements that significantly increase the speed of incremental updates with Debian GNU/Linux's APT update system. His optimizations have suddenly given the apt-get program 10x the performance of the old code. These improvements also make APT with PDiff faster than the default, non-incremental behavior. Beyond the improvements that landed this week, Julian is still exploring other areas for improving APT update performance. More details via his blog post.
not too late (Score:1)
The speed got a performance boost?
It's not too late to change the title.
How the fuck did this slowness even happen?! (Score:4, Interesting)
Holy fuck, I just read the blog article and this is what it said:
How the fuck did that all happen to begin with?! Who the fuck wrote the code like that initially?! How fucking long has this been the case?!
I understand that bugs will happen. I really do. But this is like a total breakdown in process. Why the fuck weren't these problems detected sooner, like when the code was committed and reviewed? The code was reviewed, right?!
We aren't talking about some minor software project here. This is apt, for fuck's sake! This is one of the core pieces of Debian's (and Ubuntu's, and many other distros') basic infrastructure. This is the kind of shit that has to be done properly, yet it clearly wasn't in this case.
The Debian project needs to address how this shit even happened to begin with. This is fucking unbelievable. The entire Debian community deserves a full explanation as to how this debacle happened.
I thank God every day that I moved all of my servers from Debian to OpenBSD after the systemd decision. I know that the OpenBSD devs take code reviews and code quality extremely seriously, not just with the core OS itself, but even with software written by others. The OpenBSD project will even create and maintain their own custom forks of third party software if the original developers can't get their shit together!
Re: (Score:3, Interesting)
IMO, this originates in the early days of APT. And because it worked, nobody wanted to touch it later when the speed became an issue, because nobody wanted to risk breaking it, because, as you said, it is a critical part of the infrastructure.
And now that you have explained to us how BSD is better, go back and pretend there are no such old lines hacked together a long time ago, with nobody dusting them off in years. (By the way, how long was, for example, Heartbleed in BSD? Until discovered and fixed also on e
Re: (Score:1)
Re: (Score:2)
well the lol seems right and the well written wrong.
in early days you would certainly have cared. you know, installing debian in a day or 4 days would have mattered.
Re:How the fuck did this slowness even happen?! (Score:5, Informative)
Re: (Score:2, Interesting)
Gosh it was slow code. Not so much bad code.
I fail to see the distinction. This is painfully bad code. I, for one, would not enjoy working with people who think this is not really bad code.
Re: (Score:1)
I understand that bugs will happen. I really do. But this is like a total breakdown in process. Why the fuck weren't these problems detected sooner, like when the code was committed and reviewed? The code was reviewed, right?!
That's what's great about Open Source and all those millions of eyeballs!
Re: (Score:3)
The reason is that while apt is a critical piece of infrastructure, incremental apt really isn't. The use case is that you're supposed to download a smaller diff, apply the diff and install the updated version either faster or cheaper if you're on a metered line. If you don't have a compatible earlier version, you have to use the traditional package anyway. So you can always fall back to that, it's only nice-to-have. The people on broadband don't care, the ones on really slow lines probably want their updat
Re: (Score:2)
That's truly shocking. Even a moron like me knows that is horribly awful design that sounds like it came straight from writing code on a COCO3.
Re: How the fuck did this slowness even happen?! (Score:2)
Nobody knows what systemd really means, or where it came from.
Re: (Score:1)
Re: (Score:3)
I'm missing something here. With all the flap about systemd, why the rush of all the distros to adopt it?
There has hardly been a "rush" to speak of. Fedora was first and switched in 2011, that's almost five years ago and we still have distros that haven't switched yet. This is one of the absolute slowest tech migrations ever in the Linux community.
I'm on Mint, but even that is slated to go to systemd at the next major release. Binary logs, etc.? No thanks.
If by "binary logs" you mean regular text logs with a bit of metadata attached so that you can actually find stuff then yes, that's a good thing.
Re: (Score:2)
These logs need a completely different set of tools to manipulate the text.
Re:Now only... (Score:4, Interesting)
Re: (Score:1)
"Everyone who has ever had to clean up a filled up /var/log partition knows how broken logrotate is."
Eh?
I've been happily using logrotate for more than a decade on all of my Linux systems. Logrotate is dead-simple and reliable. (And it handles logs /anywhere/, not just in /var/log.)
I've seen systemd proponents use fabricated slander against logrotate to try to make journalctl look good. Thing is, experienced sysadmins see through the slander. :)
"It also makes it possible to actually search for things wit
Re: (Score:2)
What are you even talking about "rotate based on size"?
Re: (Score:2)
logrotate runs periodically. It doesn't continuously poll for size making it trivial for an "event" to fill up a log file. I've seen a system with logrotate that worked perfectly for years keeping logs under control go down during brute force attack (why there was no brute force prevention is a different discussion) all because the log directory (which shared a partition with more critical parts of the system but that's a different discussion) filled up the entire partition.
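To put rough numbers on that, here is a toy simulation, not logrotate's actual logic. The write rates, the daily check interval and the 500 kB threshold are all made-up assumptions; the point is only that a size limit checked once a day does nothing between checks.

```python
def peak_log_size(write_rate, check_interval, rotate_threshold, duration):
    """Simulate a log that grows by write_rate bytes per second and is
    rotated only when a periodic check finds it over rotate_threshold.
    Returns the largest size (in bytes) the log reached."""
    size = 0
    peak = 0
    for second in range(1, duration + 1):
        size += write_rate
        peak = max(peak, size)
        # The size is only looked at when the periodic job fires.
        if second % check_interval == 0 and size > rotate_threshold:
            size = 0  # rotation empties the live log
    return peak

DAY = 24 * 60 * 60
# Assumed normal traffic: 10 bytes/s, checked daily, rotate above 500 kB.
normal = peak_log_size(10, DAY, 500_000, 3 * DAY)
# Assumed attack traffic: 100 kB/s with the same daily check.
attack = peak_log_size(100_000, DAY, 500_000, 3 * DAY)
print(normal, attack)
```

With these assumed numbers the quiet log never gets past about 0.9 MB, while the attack log reaches several GB before the next check ever runs.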
MOD UP (Score:1)
Re-inventing the wheel again (poorly) because Poettering can't be bothered to read a man page.
Re: (Score:2, Interesting)
[I apologize to the masses for responding to yet another offtopic systemd propagandafest thread.]
But could I grab any old 32 or 64 bit ubuntu live install usb stick from the last 5 years to access an unbootable Debian system? (from there installing raid admin tools is a one line apt-get)
Or before I can view the logs do I need to prepare and maintain a custom rescue boot disc for every debian version, with the same 32/64 bitness, same libc6 and other .so library versions, and same exact version of systemd?
Re: (Score:1)
So if this is actually a mission critical set up, 1) you will already be exporting logs to an actual log server which is still up, 2) you can turn on plain text logs to support your offline troubleshooting, but still have the superior logging with meta data while online and 3) if you're fixing production with bootable CD you really need to work on your version control and release procedures.
So I'm guessing you've never managed a server making/losing thousands of dollars and/or actually used or configured journ
Re: (Score:1)
The story was about apt, not systemd. Therefore I was preemptively modding myself offtopic exactly so people like you who would be uninterested could quickly move on and not waste your time on it. Due to your wildly emotive response I guess I failed in that attempt.
Re: (Score:2)
If you're that attached to your text logs just set journald to output them in the classic way.
Is there journalctl for other operating systems? (Score:2)
You use journalctl to read them, or any other program that can read the well-defined on-disk format.
Say you're trying to troubleshoot problems with a machine by booting to a different operating system in order to read the logs of the machine's primary operating system. This works for plain text logs because support for ASCII text is ubiquitous, modulo some line ending weirdness in Windows. Do such "other programs" exist for all PC operating systems that can read ext2/3/4 file systems?
Re: (Score:3)
Is this a requirement for you? Set journald to output classic plain text logs as well. You get all the benefits of a nice utility to sort through your logs and the ASCII text files which you find so critical to your use-case.
Re: (Score:2)
Do try harder.
http://www.freedesktop.org/wiki/Software/systemd/journal-files/
Re: (Score:2)
Except stderr and syslog messages are not dropped. Better still they are recorded which is a step up from the old way and you can set them to not only replicate the old behaviour but keep logging the new way at the same time.
Talking out of one's ass and not reading the fucking manual is the problem most people have with systemd.
Re:Now only... (Score:5, Interesting)
Mainly because that flap involved very few of the actual technical people and people got upset long after the decision was already made. It also seems to be the case that some trolls went off on a misinformation campaign complete with fake bug reports. Quite frankly, I was terrified of systemd from what I was reading here, but then I actually read up on the subject and realized most of what people have been saying about it is false.
Now that I've actually tried it, my desktops boot faster and I have had a much easier time customizing the boot sequence of some of the servers I maintain.
Re: (Score:3)
I have no idea why people insist on boot times. You are aware that a machine spends little time booting and a hell of a lot more running things?
Sure, except when you do reboot it might be important to do it quickly some of the time.
It's absolutely true that it's more important to optimize the common case over the uncommon case, but that doesn't mean the uncommon case is necessarily insignificant.
Re:Now only... (Score:5, Insightful)
Text is a terrible format for efficient storage of and access to structured data
Access to binary logs is O(1) instead of O(n)
journalctl outputs a pixel-perfect copy of what
You can query more effectively and precisely than with awk, sed and grep
You can still use awk, sed and grep if you want
You can run syslogd in parallel and have your text file as well
The binary format is well documented
Traditional logs are binary as well as soon as they are rotated and compressed
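For anyone wondering what the O(1) vs O(n) point means in practice, here is a rough sketch. It is not journald's on-disk format, and the function names are made up for illustration; it only contrasts scanning a text log line by line with jumping to an entry through a precomputed table of byte offsets.

```python
import io

def build_offset_index(log_bytes):
    """Record the byte offset where each line starts (one pass, done once)."""
    offsets = []
    pos = 0
    for line in io.BytesIO(log_bytes):
        offsets.append(pos)
        pos += len(line)
    return offsets

def nth_entry_scan(log_bytes, n):
    """No index: read lines from the start until the n-th one, O(n)."""
    for i, line in enumerate(io.BytesIO(log_bytes)):
        if i == n:
            return line.rstrip(b"\n")
    raise IndexError(n)

def nth_entry_indexed(log_bytes, offsets, n):
    """With an index: seek straight to the entry and read just that line."""
    stream = io.BytesIO(log_bytes)
    stream.seek(offsets[n])
    return stream.readline().rstrip(b"\n")

# Made-up log with 1000 entries.
log = b"".join(b"entry %d\n" % i for i in range(1000))
index = build_offset_index(log)
print(nth_entry_scan(log, 742), nth_entry_indexed(log, index, 742))
```

Both calls return the same entry; the difference is how much of the file has to be read to get there.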
For fuck's sake already, can we not have a single Linux related discussion that has nothing to do with systemd without it spiraling into a systemd flame fest? Systemd is not the devil. All I read here from detractors are people who are regurgitating bullshit they overheard while riding the bandwagon they blindly jumped on without actually having a single clue what they are talking about. Talk about the blind leading the blind. Meanwhile anyone with a clue who tries to chime in with a voice of reason is simply drowned out. Does using the word binary in a sentence where you also refer to logs make you feel like some kind of super hacker? Sometimes I really think that's what all this never ending bullshit is about.
Re: (Score:1)
What's wrong with binary logs?
Nothing, if they're well-designed and ACID compliant, which journald is not:
The only way to deal with journal corruptions, currently, is to ignore them: when a corruption is detected, journald will rename the file to .journal~, and journalctl will try to do its best reading it. Actually fixing journal corruptions is a hard job, and it seems unlikely that it will be implemented in the near future. [...]
Lennart Poettering 2014-06-25 09:51:01 UTC
Yupp, journal corruptions result in rotation, and when reading we try to make the best of it. they are nothing we really need to fix hence.
* https://bugs.freedesktop.org/show_bug.cgi?id=64116
I think if they had used SQLite or OpenLDAP's LMDB, then it would be less of a big deal. From Howard Chu of OpenLDAP:
* https://www.linkedin.com/pulse/20140924071300-170035-why-you-cannot-trust-lennart-poettering-systemd
Re:Now only... (Score:4, Interesting)
What's wrong with binary logs?
Nothing, if they're well-designed and ACID compliant, which journald is not:
It's rare that I choose to defend systemd, but ACID compliance does not mean you never have to deal with a corrupt database. It is a software technique to make sure transactions complete in an atomic, consistent, isolated and durable way, but it still presumes a "perfect" system, and if a bit flip happens in memory or on disk outside the programming logic then ACID will fail. That is why you have ECC, RAID1/5/6 and ZFS, but even they fail, and sometimes you have genuinely "impossible" results like you've added $2+$2 to the account and the result is $5 (bit flip from 0b0100 to 0b0101). If you're using plain text and UTF-8 this can happen there as well; there are byte combinations that are simply illegal. You expect the parser to ignore the "impossible" and carry on, and apparently that's what journald is doing too.
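A small sketch of both halves of that argument, with made-up data. The first part replays the $2+$2=$5 bit flip and shows that a stored checksum (CRC32 here, picked only as an example) can notice it even though no transaction logic was involved. The second part shows a text parser skipping an illegal UTF-8 sequence and carrying on. None of this is journald's code.

```python
import zlib

balance = 2 + 2                                   # 4 == 0b0100
checksum = zlib.crc32(balance.to_bytes(2, "big"))

corrupted = balance ^ 0b0001                      # lowest bit flips: 0b0101 == 5
still_matches = zlib.crc32(corrupted.to_bytes(2, "big")) == checksum

# Text has the same problem: the bytes 0xC3 0x28 are not valid UTF-8.
damaged_line = b"login ok \xc3\x28 user=root\n"
try:
    damaged_line.decode("utf-8")                  # strict decoding refuses
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# "Ignore the impossible and carry on": swap the bad byte for U+FFFD.
recovered = damaged_line.decode("utf-8", errors="replace")
print(corrupted, still_matches, strict_failed, repr(recovered))
```

The checksum tells you the record is bad but not what it used to be, which is the same position journald is in when it rotates a corrupt file away and reads what it can.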
Bad code is everywhere (Score:4, Funny)
Wow, reading one byte at a time unbuffered? Who does that in real life? It's been well-known for like 30 years that buffered reading is an order of magnitude faster than byte-at-a-time - which matches the above result. The standard C library does buffered reads, unless you turn them off explicitly.
Did someone really turn that off explicitly? Why?
Jesus, someone should check the XML parsers. Maybe the same guy wrote an XML parser and it's doing byte reads.
XML is broken by design performance-wise... (Score:3, Informative)
If something is XML based and time critical, I wouldn't bother to optimize the XML parsing, but rather exchange XML for a non braindead format to start with.
Re: (Score:3)
There's nothing braindead about XML, it's just not the right tool for many jobs it is used for. It's over-engineering a solution to a simple problem because everyone else is doing it that way that's braindead.
When XML became hot around the end of the 90s, people did what they always do with a hot technology; they used it whether or not it made sense just to have it on their resume. It never made sense for things like an over-the-network data serialization format; or as a configuration file format where som
SGML was there before XML... (Score:2)
Re: (Score:2)
I was doing buffered reads 20 years ago. You only need 1k of buffer to get a substantial performance improvement...and that was with floppies and tape.
I mean, don't they teach this stuff in school? The disk travels at x RPM, so every byte you read means you have to wait for the sector to come around again. It doesn't really matter what x is (unless it's an SSD), because it's slow. It's like forever slow. You might as well get coffee and go to the bathroom waiting.
This is like I/O 101.
Re: Bad code is everywhere (Score:2)
The buffering in question here is user-space buffering. The lousy unbuffered code still had the benefit of the kernel's page cache, but had to make a system call for each read. If it went to disk for each read, the performance hit would have been many orders of magnitude, not just one.
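A quick way to see the per-read system call cost is to count the calls. This sketch is only an illustration, not APT's code: it uses Python's os.read(), which maps to one read(2) call each time with no user-space buffer in between, on a temporary file of made-up data.

```python
import os
import tempfile

def count_reads(path, chunk_size):
    """Read the whole file with os.read() in chunk_size pieces and
    return (bytes_read, number_of_read_calls)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        total = calls = 0
        while True:
            data = os.read(fd, chunk_size)   # one system call per iteration
            calls += 1
            if not data:                     # empty result means end of file
                break
            total += len(data)
        return total, calls
    finally:
        os.close(fd)

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"Package: example\n" * 4096)  # 68 KiB of made-up index data
    path = tmp.name

byte_at_a_time = count_reads(path, 1)
buffered = count_reads(path, 4096)
os.unlink(path)
print(byte_at_a_time, buffered)
```

Both runs read the same 69632 bytes, and the data comes out of the page cache either way. The byte-at-a-time run just crosses into the kernel tens of thousands of times to do it, while the 4096-byte run needs a couple of dozen calls at most.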
Re: (Score:1)
Re:Bad code is everywhere (Score:5, Informative)
Re: (Score:1)
The standard library doesn't come with a ReadLine function. It has the ability to read I/O, and to buffer it, but it is up to the implementation to do it how they see fit. In the case of APT (I cannot say I am surprised. I would really love to hear the reason they made it unbuffered to begin with,) since it is in C and a core function, it will be using ONLY the C library. So they have to implement the ReadLine() themselves.
C was designed to be barebones. It's why you can easily shoot yourself in the foot (l
Re: Bad code is everywhere (Score:2)
C has had a buffered line reading function approximately forever. It is called fgets().
Re: (Score:1)
Yes, we have fgets(), but it's not exclusively a ReadLine() function, so to speak. Sane people would write a ReadLine() function using fgets() wrapped in some nice loops to read in buffered input. APT -- apparently not so much. I can only assume they were using getc or some such. I may dig in to apt-get source here in a few to see exactly what they were doing.
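For the curious, this is roughly what a buffered ReadLine() looks like. It is a hypothetical sketch in Python rather than C, and not what APT does: the LineReader class and its 4096-byte default are my own assumptions. The idea is the same as wrapping fgets() in a loop: refill a user-space buffer with large reads, then hand out lines from memory.

```python
import os

class LineReader:
    """Hypothetical ReadLine helper: refill a user-space buffer with
    large read() calls, then hand out one line at a time from it."""

    def __init__(self, fd, buffer_size=4096):
        self.fd = fd
        self.buffer_size = buffer_size
        self.buffer = b""
        self.eof = False

    def read_line(self):
        """Return the next line without its newline, or None at EOF."""
        # Only go back to the kernel when the buffer holds no complete line.
        while b"\n" not in self.buffer and not self.eof:
            chunk = os.read(self.fd, self.buffer_size)
            if chunk:
                self.buffer += chunk
            else:
                self.eof = True
        if not self.buffer:
            return None
        line, _, rest = self.buffer.partition(b"\n")
        self.buffer = rest
        return line

# Demo on a pipe with made-up data, so no files on disk are needed.
read_fd, write_fd = os.pipe()
os.write(write_fd, b"Package: apt\nVersion: 1.1\nno trailing newline")
os.close(write_fd)

reader = LineReader(read_fd)
lines = []
while (line := reader.read_line()) is not None:
    lines.append(line)
os.close(read_fd)
print(lines)
```

A final line without a trailing newline is still returned, which is one of the edge cases a hand-rolled version has to get right.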
I've been looking for a project to dedicate some spare time to and, seeing someone earlier in the discussion saying it only has four people working on
Re: (Score:1)
Re: (Score:1)
I agree, 100% and, quite frankly, yes. That is the level of some engineering in linux.
I am die hard linux; I love the idea, and I use tons of distros every day. I love the freedom. But they reinvent the wheel so often that sometimes people forget the wheel is meant to be round.
Re: (Score:1)
I hate to admit it, but a lot of "Linux Fans" lack common sense; they simply go where their friends/peers go. That whole hipster thing. Meh. I see things for their value, and their cons. I use windows AND linux, and I like both.
I am NOT a fan of Java. But I see its worth quite well. It has its use. A tool for every job. I, personally, prefer C, but that's because I've always used C (it's what I started on all those years ago) and it's what I always come back to. It's like that abusive Ex that you just cann
Re: (Score:1)
You're fucking kidding me, right? Go away, Troll.
Just because there is a 'coding standard' does not mean it has good naming standards. I'm not -downing- C -- I've already stated several times that I -love- C and prefer it over anything else. I've been writing C applications since 1990. Both professionally ( as in, for PAY on major projects ) and on my own projects. I think I have earned a right to not be 'inexperienced.'
Re: (Score:2)
Go is targeted at this application area.
Re: (Score:2)
But that Linux, or Debian, doesn't have a core shared library somewhere that is able to read lines is, quite frankly, astonishing to me.
The question is whether there was such a library when apt was written, in the early 90s. And whether it was guaranteed to be present on a base install, or whether it was something that apt was going to be installing. If not, then apt couldn't use a shared library version so it would have to statically link it, which may have argued for implementing something very simple instead.
Remember that when apt was written installation was generally done from floppy disks so a little inefficiency in the parsing code
Re: (Score:2)
Don't know, but Office 2012 on the Mac does single-byte writes when you save a file. That causes a file save over SMB to take 15 minutes. Do a "save as" and the same file takes a few seconds. Of course, when saving to local disk you don't notice the difference.
So if Microsoft with a team of paid full time programmers can make such stupid mistakes, why do you expect better of a little used piece of code (because for a large section of the Debian user base bandwidth long ago ceased to be an issue) that is maintained
Re: (Score:3)
Does DNF still use the same RPM handling code? (Score:2)
Re: (Score:1)
The production server that I work with runs CentOS and its YUM updates are fairly quick. Granted, it has a super, super fat pipe and some pretty nice hardware. But on a production system, I'm not looking for speed in updating, I am looking for consistency and "don't fuck my shit up, yo!"
Re: (Score:1)
It has nothing to do with apt. I was just replying to the original A/C who said YUM was slow.
Many thanks (Score:4, Informative)
Really looking forward to having apt-get speeds that can be compared with pacman. Julian Andres Klode, if you read this, please continue the great work!
Re:Many thanks (Score:5, Funny)
Re: (Score:1)
can you make portage faster? ;)
only two enhancements remaining (Score:1)
That's good.
Now there remain 2 enhancements I'd really like to see:
low free space on hdd management.
parallel installation
low free space: walk the dependencies to find packages with no dependencies, install, clean [repeat]
parallel installation: right now apt downloads all files and then installs them; it would be better to download and install each package in turn
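A rough sketch of the low free space idea, under loud assumptions: the package graph and sizes are invented, the function names are hypothetical, and real apt/dpkg ordering (pre-depends, dependency loops, configure steps) is far more involved. It only shows walking the dependencies so that packages with nothing left to wait for go first, and compares the disk space needed for downloaded archives in the two strategies.

```python
def install_order(dependencies):
    """Return packages ordered so each comes after everything it depends on.
    dependencies maps a package name to the set of packages it needs;
    every dependency must itself be a key in the mapping."""
    remaining = {pkg: set(deps) for pkg, deps in dependencies.items()}
    order = []
    while remaining:
        # Packages whose dependencies have all been installed already.
        ready = sorted(pkg for pkg, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        for pkg in ready:
            order.append(pkg)
            del remaining[pkg]
        for deps in remaining.values():
            deps.difference_update(ready)
    return order

def peak_cache_usage(order, sizes, clean_after_each):
    """Largest amount of downloaded archive data on disk at any moment."""
    if clean_after_each:
        return max(sizes[pkg] for pkg in order)  # download, install, clean, repeat
    return sum(sizes[pkg] for pkg in order)      # download everything first

# Made-up package graph and archive sizes in MB.
deps = {"app": {"libfoo", "libbar"}, "libfoo": {"libc"},
        "libbar": {"libc"}, "libc": set()}
sizes = {"app": 40, "libfoo": 5, "libbar": 8, "libc": 12}

order = install_order(deps)
print(order, peak_cache_usage(order, sizes, True),
      peak_cache_usage(order, sizes, False))
```

In this toy case the one-at-a-time strategy needs room for the largest archive only (40 MB) instead of all of them together (65 MB).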
Re: (Score:1)
low free space: walk the dependencies to find packages with no dependencies, install, clean [repeat]
"sudo apt-get autoremove" will remove all packages that were not explicitly installed and are not depended on by any explicitly installed packages.
parallel installation: right now apt downloads all files and then installs them; it would be better to download and install each package in turn
This would probably not actually cause a significant improvement in performance. If you're pointed at a fast mirror, downloading eight packages at once that max out your bandwidth will not be any faster than one package that maxes out your bandwidth. Similarly, when actually installing them, you'd need to determine what order they could be installed in to ensur
Re:only two enhancements remaining (Score:5, Informative)
Re: (Score:1)
meanwhile.... (Score:1)
in windows land, windows updates slow to a friggin crawl, can take hours, and with the new windows 10{dot}shit, lacks any sort of user control... the exact opposite of debian and apt.
A lot faster! (Score:1)
andre@atlas:~# time apt &> /dev/null
real 0m0.002s
user 0m0.001s
sys 0m0.000s
WOW fast as hell!
Re: (Score:2)
We need a FREE INTERNET for every human beign
How about we just start with a free spell checker?
Re: (Score:1)
Re: (Score:2)
but what will it take for a single format to emerge ?
A use case?