Slashdot
Multi-Threaded SSH/SCP
Posted by
kdawson
on Wed Feb 13, 2008 05:19 AM
from the recovering-wasted-bandwidth dept.
neo writes "Chris Rapier has presented a paper describing how to dramatically increase the speed of SCP networks. It appears that because SCP relies on a single thread in SSH, the crypto can sometimes be the bottleneck instead of the wire speed. Their new implementation (HPN-SSH) takes advantage of multi-thread-capable systems, dramatically increasing the speed of securely copying files. They are currently looking for potential users with very high bandwidth to test the upper limits of the system."
A likely story (Score:5, Funny)
Re:A likely story (Score:5, Insightful)
VHS v Betamax comes to mind.
Re:A likely story (Score:5, Insightful)
Re:A likely story (Score:5, Insightful)
Re: (Score:3, Interesting)
Re:A likely story (Score:4, Insightful)
It is rare that you can completely separate every context of every step of your processing. There is always some data that needs to be shared between the threads, and that shared data becomes a bottleneck. The faster you serve your requests, the worse the contention (waiting for a resource) and thus the inefficiency.
It depends on the task at hand and on your architecture. A file or web server is less likely to encounter contention than, for example, an IRC server. The first requires some authentication and resource resolving through configuration data, but the actual data can be sent without interference from other requests. An IRC server requires constant lookups in the user database for routing information, and this is likely to take longer than actually sending the messages (even without multi-threading). In these cases, you really have to think your locking scheme through or you will lose more time waiting for a lock than doing actual work - defeating much of the purpose of going MT.
When it comes to architecture, multi threading is an option in your architecture, not an architecture in itself. There is no problem doing a multi-threaded event-driven architecture or a MT message passing architecture -- these are actually very effective. For some interesting reading about this, I would suggest you check out the SEDA white paper [harvard.edu] for a pretty in depth list of options and their goals.
Why is it bad for programmers? Because locking is hard to do in itself, and if your locking scheme is suboptimal it often requires a lot of work to change it afterwards.
Re: (Score:3, Informative)
Re:A likely story (Score:5, Insightful)
Which explains why football players and movie stars will get paid more than the innovators that carried out the research to develop the broadcast technology that helped to make those stars famous.
Re: (Score:3, Funny)
Re:FUNNY?! That's not funny, try for TRUE (Score:5, Insightful)
When your child first looks up at your face and you see actual recognition in her eyes... when you see all the blocks fall into place as she figures out how to do something for the first time... look, I know it sounds really sappy and smarmy, but seriously (srsly) it is absolutely indescribable. This thing started out as a bit of genetic code from two people, and now it is actually self-aware and sentient. How cool is that? What geek can't be astonished at these emergent properties, derived from a program more complicated than you can possibly imagine -- a program that has spontaneously evolved over time?
And you get to see her mental map evolve. You watch branches get added to her decision tree. You observe as she learns how to acquire information, process it, and decide how to act upon it. And all the while, you mold her view of the world based on your interactions with her. I don't know about you, but I find that not only fascinating, but incredibly rewarding.
Before my daughter was born, I was terrified too, and if somebody had said these things to me, I would've said, "Yeah, okay, I'm sure it's great and all, but I'm sure you're exaggerating somewhat." That's because there is something that happens to you when it's your kid. There's some very ancient, very basic code that gets turned on in your brain that says "this life is your responsibility, and you must do everything you can to ensure its safety, survival, and growth". I can't explain it because I honestly believe it's something buried deep beneath the conscious mind.
Whatever the case, if you honestly don't want the baby, for its sake, put it up for adoption. Don't make it live a life with a father who doesn't care for it. I'm being absolutely serious here. Find a loving couple who are unable to have kids of their own.
(Posting AC because this is way offtopic, and because there are a lot of single, selfish, bitter child-haters out there with mod points to burn... but I had to say something.)
Re:FUNNY?! That's not funny, try for TRUE (Score:5, Interesting)
Don't mistake my badly crafted joke for being completely ignorant of what's ahead of me; before the final decision came, I had consulted with friends who are also parents (carefully not discussing this with any of my single, singlemindedly free-roaming friends), and I am in no way in doubt that I will make this child a net benefit for the human race. There are simply too many rotten parents, spoilt children, miserable families and bad genes in the world for me to actually fail in that respect.
Plus, living in Denmark*, the baby will have pretty good odds for a good life, my involvement notwithstanding.
I am going to have a lot of fun making tech projects for my little one when that time comes, including audio books with his/her favourite bed time stories, video diaries of how the child evolves, and of course, teaching how to solder before the age of 5. How I survived until 15 without that knowledge eludes me to this day.
*: Studies have shown that there is a tie for Country With Best Quality of Life; Denmark and Iceland. I've been to Iceland, and it smelled like rotten eggs. Denmark takes the lead.
Re: (Score:3, Informative)
Re:A likely story (Score:4, Funny)
Oblig applicable Dilbert (Score:5, Funny)
(PHB rejects suggestion)
(later)
Wally: "I was this close to making it my job to download naughty pictures."
Dilbert : "It's just as well; I would have had to kill you."
( http://books.google.com/books?id=dCeVfKrZ-3MC&pg=PA77&source=gbs_selected_pages&cad=0_1&sig=xD5tmMhG1RcspLch8gCIJu8ro2U#PPA79,M1 [google.com] )
this is just what i've been looking for (Score:2, Funny)
Must be why rsync over ssh is much faster (Score:5, Insightful)
Big scp copies through my wifi router used to cause kernel panics under NetBSD-current of about a year ago. I never had that problem running rsync inside ssh.
Re: (Score:3, Informative)
Re:Must be why rsync over ssh is much faster (Score:5, Interesting)
And in my experience rsync is faster.
Re:Must be why rsync over ssh is much faster (Score:5, Interesting)
If you just want to copy some files from system to system in an encrypted fashion, then the BEST option by far is to use tar, and pipe it through ssh like so:
tar cvfpz - * | ssh user@host '( cd
This example will compress and encrypt your data before sending it; on the other end, the file is streamed to tar. This example requires GNU tar or a close facsimile.
Now, if you want to UPDATE a directory, use rsync:
rsync -av -e ssh * user@host:/destination/
Because rsync will do partial checksums and send parts even of BINARY files if the whole file has not changed, and doesn't re-send unchanged files, rsync makes sense when updating a directory. But it provides no speedup benefit over using tar, and in fact the directory scans it does before the sync mean that it may actually be slower.
Use scp only for copying single files, because you're right, scp chokes between each file.
Re: (Score:3, Informative)
tar cfpz - . | ssh user@host '( cd /destination ; tar xfpvz - )'
I'd use a "." instead of *; it avoids shell line length problems, and will also copy hidden files... speaking as someone who has learned this the hard way. Also, in my experience, on anything faster than 10MB, don't bother with compression (it's really a CPU to network speed ratio; on transfers I did regularly that was the rule of thumb with P4 2.2GHz Xeons). Also, I removed the "v" from the source tar, as it duplicates every file name twice and can
Re: (Score:3, Informative)
ssh user@host.com tar -C /remote/path -cpzf - remotefile1 remotefile2 | tar -C /local/path -xvzpf -
Re:Must be why rsync over ssh is much faster (Score:4, Informative)
Can't we all just get along? (Score:3, Interesting)
Encrypted and tunneled over SSH, rsync is spawned by a login shell at the other side:
rsync
Not encrypted; the rsync daemon must be running at the other end:
rsync
rsync
Alternative solution for a trusted LAN (Score:5, Interesting)
Re:Alternative solution for a trusted LAN (Score:5, Informative)
-c blowfish|3des|des
Selects the cipher to use for encrypting the session. 3des is
used by default. It is believed to be secure. 3des (triple-des)
is an encrypt-decrypt-encrypt triple with three different keys.
blowfish is a fast block cipher, it appears very secure and is
much faster than 3des. des is only supported in the ssh client
for interoperability with legacy protocol 1 implementations that
do not support the 3des cipher. Its use is strongly discouraged
due to cryptographic weaknesses.
Re:Alternative solution for a trusted LAN (Score:5, Informative)
Copying 100 MB of data over 100 Mbit Ethernet to a P2 350 MHz box (the slowest I've got) gives:
* 3des 1.9MB/s
* AES 4.8MB/s
* blowfish 4.4MB/s
Re: (Score:3, Informative)
-c cipher_spec
Selects the cipher specification for encrypting the session.
Protocol version 1 allows specification of a single cipher. The
Re: (Score:3, Interesting)
Re: (Score:3, Interesting)
SSH is one of those uberutilities that has a surprising amount of usefulness once you dig a bit. Sure, secure telnet functionality is great, and I use it a lot. But, I still use ssh on my own LAN where I don't really care about security. I use sshfs because it is easier and more convenient for me than bothering with Samba. SCP/SFTP to avoid bothering with ftp. I use it for forwarding ports between various machines, and I use i
Re: (Score:2)
Server: nc -l 1234 | tar -x
Client: tar -c file_list_here | nc localhost 1234
Re: (Score:3, Informative)
How will this affect application deployment (Score:2)
This is one of the most useful aspects of Mandriva, but as the number of nodes I have to manage increases, I find RPMs being SCPed to other nodes taking longer and longer. I think this is because, even though authentication with Kerberos is much faster, urpmi waits until one node finishes copying the files before it starts copying to the next node in the domain.
Thoughts?
Sweet! (Score:5, Insightful)
Re:Sweet! (Score:4, Informative)
To *have* such problems... (Score:5, Interesting)
Between two devices on my gigabit home LAN, the CPU barely even registers while SCP'ing a large file (and that with every CPU-expensive protocol option turned on, including compression). What sort of connection do these guys have, that the CPU overhead of en/decryption throttles the transfer???
Coming next week: SSH compromised via a thread injection attack, thanks to a "feature" that only benefits those of us running our own undersea fiber.
Re:To *have* such problems... (Score:5, Informative)
Re:To *have* such problems... (Score:5, Informative)
Re: (Score:3)
Many apps set fixed window sizes (incl. apparently standard SSH - the webpage implies 64K.)
Linux can "autotune" window sizes, but most OSes don't, hence the need for an app to be able to specify a larger window.
Even with larger window sizes, TCP congestion control starts breaking on networks wit
Re:To *have* such problems... (Score:5, Interesting)
Have you measured your actual throughput on the file transfer? It tends to take a crapload of tuning to get anywhere near saturating gigabit, even if you're not using encrypted transfers.
I wrote the bit below, which I'll keep because it might be interesting to someone, but dm(Hannu) already mentioned the flaw in the logic behind the PP and article summary: if the CPU is the bottleneck, how could adding more threads possibly help?
Just for a laugh I used scp to copy a 512 MB file from my file server to itself, an Athlon 3700+ running at 2.2 GHz. I got about 18 megabytes / second out of it. I took a snapshot of top's output right at the end (97% complete) and the CPU usage was as follows:
ssh: 48.6%
sshd: 44.9%
scp: 3.7%
scp: 1.3%
pdflush: 0.7%
So this system was pretty much pegged by this copy operation, and it achieved less than a fifth the capacity of a gigabit network link. Obviously the system is capable of transferring data much faster than this; the source was a RAID-5 set of 5 new 500 GB drives, and the destination was a stripe across two old 40 GB drives. I'd also repeated the experiment a few times (and this was the fastest transfer I got) so it's likely the source file was cached, too.
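A quick sanity check of that "less than a fifth" figure, as a minimal sketch. It assumes decimal units (1 gigabit = 10^9 bits, 1 MB = 10^6 bytes) and ignores Ethernet and TCP framing overhead; the variable names are just for illustration.

```python
# Figures taken from the comment above; unit conventions are an assumption.
GIGABIT_LINK_BITS_PER_S = 1_000_000_000
observed_mb_per_s = 18  # scp throughput reported for the 512 MB copy

# Raw link capacity in megabytes per second (8 bits per byte).
link_mb_per_s = GIGABIT_LINK_BITS_PER_S / 8 / 1_000_000

# Fraction of the raw link capacity that the copy actually achieved.
utilisation = observed_mb_per_s / link_mb_per_s

print(f"link capacity: {link_mb_per_s:.0f} MB/s")
print(f"utilisation:   {utilisation:.1%}")
```

That works out to roughly a seventh of the raw link rate, which is consistent with the claim.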
I do agree that there are probably more interesting and useful things to optimise (and make easy to optimise) than scp's speed, but I know for sure that scp'ing ISO images to our ESX servers is a crapload slower than using Veeam's copy utility or the upload facility in the new version of Infrastructure Client (at least I think it's new, never noticed it before).
Re:To *have* such problems... (Score:5, Interesting)
A possible problem source here is that you're also doing disk I/O. When transferring data on my home network, I've noticed that rsyncing things for redundancy purposes ends up using a lot more CPU (even when reading from a RAID5 via a hardware controller) than if I just pump random data from one machine to another. I recommend you try just transferring random data and piping it directly to /dev/null on the receiving machine to see if there's any difference in CPU usage.
/Mikael
Linux kernel version 2.6.17 to 2.6.24.1 (Score:3, Funny)
Pretty much totally incorrect summary (Score:5, Informative)
By the way, does anybody else think "the ability to switch to a NONE cipher post authentication" is pretty dodgy?
Re: (Score:3, Informative)
Not really. For some of the stuff I do via SSH, e.g. logging into my webhost to untar a patch and apply it, the only part of the transaction I want to be secure is my initial password/key exchange. Post authentication, I really don't give a stuff who sees me type
or any of the other commands I type in. However it should be down to the admin of the system in the first
Re: (Score:3, Insightful)
By the way, does anybody else think "the ability to switch to a NONE cipher post authentication" is pretty dodgy?
I'd like it when I tunnel a new SSH or scp through another SSH tunnel. We call it a sleeve. I've had to sleeve within a sleeve's sleeve before to get through multiple SSH gateways and firewalls to an inner system. You can tell ssh to use XOR but I'm not sure you can in scp.
Of course, if speed is paramount, you can use netcat inside the sleeve(s) to copy files. No encryption of the netcat
Re:Pretty much totally incorrect summary (Score:4, Informative)
Hardware acceleration (Score:5, Interesting)
News just in! (Score:5, Funny)
Some comments from one of the authors (Score:5, Informative)
A couple of notes about the multi-threading: The main goal was to allow SSH to make use of multiple processing cores. The stock OpenSSH is, by design, limited to using one core. As such, a user can encounter situations where they have more network capacity and more compute capacity but will be unable to exploit them. The goal of this patch was to allow users to make full use of the resources available to them. The upshot of this is that it's best suited for high performance network and compute environments (the HPN in HPN-SSH stands for High Performance Networking). This doesn't mean it won't be useful to home users - only that they might not see the dramatic performance gains someone in a higher capacity environment might see. It's really going to depend on the specifics of their environment.
Based on our research we decided the most effective way to do this would be to make the AES-CTR mode cipher multi-threaded. The CTR mode is well suited to threading because there is no inter-block dependency and, even better, the resulting cipher stream is indistinguishable from a single-threaded CTR mode cipher stream. As a result, we retain full compatibility with other implementations of SSH - you don't need to have HPN-SSH on both sides of the connection. Of course, you won't see the same improvements unless you do.
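To illustrate why CTR mode threads so cleanly, here is a minimal sketch and not the HPN-SSH code. It assumes a stand-in block function built from SHA-256 (the Python standard library has no AES), so it is not real AES-CTR and must not be used for actual encryption; the function names, key, and nonce are all hypothetical. The point is only structural: each keystream block depends on nothing but the key, nonce, and counter, so blocks can be produced in any order by any worker and the output is byte-for-byte identical to the sequential version.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 32  # SHA-256 digest size; real AES-CTR uses 16-byte blocks


def keystream_block(key: bytes, nonce: bytes, counter: int) -> bytes:
    """Stand-in for AES_encrypt(key, nonce || counter). NOT real AES."""
    return hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()


def xor(a: bytes, b: bytes) -> bytes:
    # zip stops at the shorter input, which handles a short final block.
    return bytes(x ^ y for x, y in zip(a, b))


def split_blocks(data: bytes) -> list:
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


def ctr_sequential(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """One keystream block after another, in a single thread."""
    out = []
    for counter, block in enumerate(split_blocks(data)):
        out.append(xor(block, keystream_block(key, nonce, counter)))
    return b"".join(out)


def ctr_threaded(key: bytes, nonce: bytes, data: bytes, workers: int = 4) -> bytes:
    """Keystream blocks computed by a thread pool; no block needs another."""
    blocks = split_blocks(data)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, whichever worker ran them.
        stream = pool.map(lambda c: keystream_block(key, nonce, c),
                          range(len(blocks)))
        return b"".join(xor(b, ks) for b, ks in zip(blocks, stream))


key = b"k" * 16      # hypothetical demo key
nonce = b"n" * 8     # hypothetical demo nonce
plaintext = b"The quick brown fox jumps over the lazy dog. " * 20

ct_seq = ctr_sequential(key, nonce, plaintext)
ct_thr = ctr_threaded(key, nonce, plaintext)

print(ct_seq == ct_thr)                              # same cipher stream
print(ctr_threaded(key, nonce, ct_thr) == plaintext) # CTR decrypt = encrypt
```

Python threads will not actually make this faster because of the interpreter lock; the sketch only shows that the threaded and single-threaded streams match, which is why the other end of the connection cannot tell the difference.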
We still see this as somewhat experimental because we've not yet implemented a way to allow users to choose between single-threaded and multi-threaded AES-CTR modes. As such, users on single-core machines may see a decrease in performance if using AES-CTR. We suggest those users just make use of the AES-CBC mode instead (which is the default anyway). Also, your system needs to support POSIX threads.
Future work will involve pipelining the MAC routine and that should provide us with another 30% or so improvement in throughput.
Also, it's important to keep in mind that these improvements are *not* just for SCP but for SSH as a whole. People using HPN-SSH as a transport mechanism for rsync, tunnels, pipes, and so forth may also see considerable performance improvements. Additionally, the windowing patches don't necessarily require HPN-SSH to be installed on both ends of the connection. As long as the patch is installed on the receiving side (the data sink) you may (assuming you were previously window limited) see a performance gain.
We welcome any comments, suggestions, ideas, or problem reports you might have regarding the HPN-SSH patch. Go to the website mentioned above and use the email address there to get in touch with us. This is a work in progress and we are doing what we can to enable line-rate, easy-to-use, fully encrypted communications. We've a lot more to do but I hope what we've done so far is of use and value to the community.
Re: (Score:3, Insightful)
Re: (Score:3, Informative)
BDP is the bandwidth-delay product. BDP is one of the main things these patches address. Loopback has very, very little delay. You could, I suppose, add artificial delay over loopback, but now you're diverging further from the actual deployment scenario.
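For anyone who wants numbers, a rough sketch of the BDP arithmetic follows. The 64K window is the figure mentioned elsewhere in this thread for stock SSH; the two round-trip times are assumptions picked for illustration, not measurements, and the helper names are hypothetical.

```python
def bdp_bytes(bandwidth_bits_per_s: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the pipe."""
    return bandwidth_bits_per_s * rtt_s / 8


def window_limited_throughput(window_bytes: float, rtt_s: float) -> float:
    """Throughput ceiling in bytes/s: at most one window per round trip."""
    return window_bytes / rtt_s


GIGABIT = 1_000_000_000   # link speed in bits per second
WINDOW = 64 * 1024        # fixed 64K window, as mentioned in the thread

# RTT values are assumed examples: a local switch and a long-haul path.
for label, rtt in [("LAN, 0.2 ms RTT", 0.0002), ("WAN, 50 ms RTT", 0.050)]:
    needed = bdp_bytes(GIGABIT, rtt)
    ceiling = window_limited_throughput(WINDOW, rtt)
    link_bytes_per_s = GIGABIT / 8
    limited = ceiling < link_bytes_per_s
    print(f"{label}: BDP {needed:,.0f} bytes, "
          f"64K window ceiling {ceiling / 1e6:,.2f} MB/s, "
          f"window limited: {limited}")
```

With these assumed delays, the 64K window is no obstacle on the LAN but caps the long-haul transfer at a small fraction of the link, and on loopback the delay is so small that the window never comes into play.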
The other thing is that when sender and receiver are the same host, you don't engage the full network stack (no ethernet queuing, for example, no dropped packets, etc. etc.), so you don't find out all the curve balls that TCP/IP will throw you.
And yet a