Multi-Threaded SSH/SCP
neo writes "Chris Rapier has presented a paper describing how to dramatically increase the speed of SCP transfers. It appears that because SCP relies on a single thread in SSH, the crypto can sometimes be the bottleneck instead of the wire speed. Their new implementation (HPN-SSH) takes advantage of multi-threading-capable systems, dramatically increasing the speed of securely copying files. They are currently looking for potential users with very high bandwidth to test the upper limits of the system."
A likely story (Score:5, Funny)
Re:A likely story (Score:5, Insightful)
VHS v Betamax comes to mind.
Re:A likely story (Score:5, Insightful)
Re:A likely story (Score:5, Insightful)
Re: (Score:3, Interesting)
Re:A likely story (Score:4, Insightful)
It is rare that you can completely separate every context of every step of your processing. There is always some data that needs to be shared between the threads and they become bottlenecks. The faster you serve your requests, the worse the contention (waiting for a resource) and thus the inefficiency.
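One standard way to put numbers on this is Amdahl's law (a model the poster doesn't cite, swapped in here for illustration). Treat the shared-data work as a serial fraction s. With n threads, the speedup is then at most 1 / (s + (1 - s) / n). Below is a minimal Python sketch. It assumes everything outside the serial part scales perfectly, so it gives a best case; real lock contention usually makes things worse:

```python
def amdahl_speedup(serial_fraction: float, n_threads: int) -> float:
    """Best-case speedup when `serial_fraction` of the work is serialized
    (for example, spent holding a shared lock) and the rest scales perfectly."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_threads)


# With just 10% serialized work, even 64 threads stay under a 10x speedup.
for n in (1, 2, 4, 8, 64):
    print(f"{n:3d} threads -> {amdahl_speedup(0.10, n):.2f}x")
```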
It depends on the task at hand and on your architecture. A file or web server is less likely to encounter contention than, for example, an IRC server. The former requires some authentication and resource resolution through configuration data, but the actual data can be sent without interference from other requests. An IRC server requires constant lookups in the user database for routing information, and this is likely to take longer than actually sending the messages (even without multi-threading). In these cases, you really have to think your locking scheme through or you will lose more time waiting for a lock than doing actual work - defeating much of the purpose of going MT.
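Lock striping is one common way to make a locking scheme less contended. The shared table is split into shards, and each shard gets its own lock, so lookups for unrelated keys (say, different IRC users) rarely wait on each other. Here is a minimal Python sketch. The class and method names are hypothetical. In CPython the GIL limits true parallelism, so this shows the locking scheme rather than a measured speedup:

```python
import threading
import zlib


class StripedDict:
    """A dict guarded by several locks instead of one: each key hashes to a
    stripe, so threads working on different stripes don't block each other."""

    def __init__(self, n_stripes: int = 16):
        self._locks = [threading.Lock() for _ in range(n_stripes)]
        self._shards = [{} for _ in range(n_stripes)]

    def _stripe(self, key: str) -> int:
        # crc32 is stable across runs, unlike the salted built-in hash() for str
        return zlib.crc32(key.encode()) % len(self._locks)

    def get(self, key: str, default=None):
        i = self._stripe(key)
        with self._locks[i]:
            return self._shards[i].get(key, default)

    def put(self, key: str, value) -> None:
        i = self._stripe(key)
        with self._locks[i]:
            self._shards[i][key] = value

    def increment(self, key: str, delta: int = 1) -> int:
        # Read-modify-write under the stripe's lock, so updates are not lost
        i = self._stripe(key)
        with self._locks[i]:
            shard = self._shards[i]
            shard[key] = shard.get(key, 0) + delta
            return shard[key]
```

Choosing the number of stripes is the design knob. One stripe is the "one big lock" case, and more stripes lower contention at the cost of some memory.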
When it comes to architecture, multi-threading is an option in your architecture, not an architecture in itself. There is no problem doing a multi-threaded event-driven architecture or an MT message-passing architecture -- these are actually very effective. For some interesting reading about this, I would suggest you check out the SEDA white paper [harvard.edu] for a pretty in-depth list of options and their goals.
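The staged, message-passing idea can be sketched as stages connected by queues. Each stage runs its own thread, and the stages share nothing except the queues. This is a minimal illustration under that assumption, not SEDA's actual API. A real staged server would likely give each stage a thread pool and some form of admission control:

```python
import queue
import threading

_DONE = object()  # sentinel that tells a stage to shut down


def stage(fn, inbox: queue.Queue, outbox: queue.Queue) -> threading.Thread:
    """Apply fn to every item from inbox and push the result to outbox."""
    def run():
        while True:
            item = inbox.get()
            if item is _DONE:
                outbox.put(_DONE)  # propagate shutdown to the next stage
                return
            outbox.put(fn(item))
    t = threading.Thread(target=run, daemon=True)
    t.start()
    return t


def run_pipeline(items, *fns):
    """Feed items through the stages in order and collect the final outputs."""
    queues = [queue.Queue() for _ in range(len(fns) + 1)]
    threads = [stage(fn, queues[i], queues[i + 1]) for i, fn in enumerate(fns)]
    for item in items:
        queues[0].put(item)
    queues[0].put(_DONE)
    results = []
    while (out := queues[-1].get()) is not _DONE:
        results.append(out)
    for t in threads:
        t.join()
    return results
```

Because each stage here has a single thread, output order matches input order. Adding more threads per stage would raise throughput but give up that ordering.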
Why is it bad for programmers? Because locking is hard to do in itself, and if your locking scheme is suboptimal it often requires a lot of work to change it afterwards.
Re: (Score:3, Informative)
Re:A likely story (Score:5, Insightful)
Which explains why football players and movie stars will get paid more than the innovators that carried out the research to develop the broadcast technology that helped to make those stars famous.
Re: (Score:2)
Porn powers computer science.
Re: (Score:2, Insightful)
Re: (Score:2, Funny)
Re: (Score:3, Funny)
Re: (Score:2)
Re:FUNNY?! That's not funny, try for TRUE (Score:5, Insightful)
When your child first looks up at your face and you see actual recognition in her eyes... when you see all the blocks fall into place as she figures out how to do something for the first time... look, I know it sounds really sappy and smarmy, but seriously (srsly) it is absolutely indescribable. This thing started out as a bit of genetic code from two people, and now it is actually self-aware and sentient. How cool is that? What geek can't be astonished at these emergent properties, derived from a program more complicated than you can possibly imagine -- a program that has spontaneously evolved over time?
And you get to see her mental map evolve. You watch branches get added to her decision tree. You observe as she learns how to acquire information, process it, and decide how to act upon it. And all the while, you mold her view of the world based on your interactions with her. I don't know about you, but I find that not only fascinating, but incredibly rewarding.
Before my daughter was born, I was terrified too, and if somebody had said these things to me, I would've said, "Yeah, okay, I'm sure it's great and all, but I'm sure you're exaggerating somewhat." That's because there is something that happens to you when it's your kid. There's some very ancient, very basic code that gets turned on in your brain that says "this life is your responsibility, and you must do everything you can to ensure its safety, survival, and growth". I can't explain it because I honestly believe it's something buried deep beneath the conscious mind.
Whatever the case, if you honestly don't want the baby, for its sake, put it up for adoption. Don't make it live a life with a father who doesn't care for it. I'm being absolutely serious here. Find a loving couple who are unable to have kids of their own.
(Posting AC because this is way offtopic, and because there are a lot of single, selfish, bitter child-haters out there with mod points to burn... but I had to say something.)
Re:FUNNY?! That's not funny, try for TRUE (Score:5, Interesting)
Don't mistake my badly crafted joke for being completely ignorant of what's ahead of me; before the final decision came, I had consulted with friends who are also parents (carefully not discussing this with any of my single, singlemindedly free-roaming friends), and I am in no way in doubt that I will make this child a net benefit for the human race. There are simply too many rotten parents, spoilt children, miserable families and bad genes in the world for me to actually fail in that respect.
Plus, living in Denmark*, the baby will have pretty good odds for a good life, my involvement notwithstanding.
I am going to have a lot of fun making tech projects for my little one when that time comes, including audio books with his/her favourite bed time stories, video diaries of how the child evolves, and of course, teaching how to solder before the age of 5. How I survived until 15 without that knowledge eludes me to this day.
*: Studies have shown that there is a tie for Country With Best Quality of Life; Denmark and Iceland. I've been to Iceland, and it smelled like rotten eggs. Denmark takes the lead.
Re: (Score:2)
Get the girl pregnant, get married, get to know the girl.
The "conventional" approach is the other way around, of course.
Of course for us slashdotters there's step 0) - find a girl...
Re: (Score:3, Informative)
Re: (Score:2, Interesting)
Re:A likely story (Score:4, Funny)
Oblig applicable Dilbert (Score:5, Funny)
(PHB rejects suggestion)
(later)
Wally: "I was this close to making it my job to download naughty pictures."
Dilbert : "It's just as well; I would have had to kill you."
( http://books.google.com/books?id=dCeVfKrZ-3MC&pg=PA77&source=gbs_selected_pages&cad=0_1&sig=xD5tmMhG1RcspLch8gCIJu8ro2U#PPA79,M1 [google.com] )
this is just what i've been looking for (Score:2, Funny)
Must be why rsync over ssh is much faster (Score:5, Insightful)
Big scp copies through my wifi router used to cause kernel panics under NetBSD-current of about a year ago. I never had that problem running rsync inside ssh.
Re: (Score:2, Insightful)
Rsync is only really useful as a synchronizing method between a source and an out-of-date copy.
That's where its real benefits show.
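The reason rsync shines with an out-of-date copy is its rolling checksum. A cheap "weak" checksum can slide along the file one byte at a time, which lets the sender find blocks the receiver already has; candidate matches are then confirmed with a strong hash. Below is a Python sketch of the weak checksum as described in the rsync technical report (a = sum of bytes, b = position-weighted sum, both mod 2^16). Block-size selection, the strong hash, and the wire protocol are left out, and real rsync details may differ:

```python
M = 1 << 16  # modulus for both halves of the weak checksum


def weak_checksum(block: bytes) -> tuple[int, int]:
    """Compute (a, b) for a block from scratch: O(len(block))."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, block_len: int) -> tuple[int, int]:
    """Slide the window forward one byte (drop out_byte, append in_byte): O(1)."""
    a = (a - out_byte + in_byte) % M
    b = (b - block_len * out_byte + a) % M
    return a, b


def digest(a: int, b: int) -> int:
    """Pack both halves into one 32-bit value."""
    return a + (b << 16)
```

The O(1) `roll` step is what makes it affordable to test every byte offset of the new file against the receiver's block list.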
Re:Must be why rsync over ssh is much faster (Score:4, Informative)
Re: (Score:2)
You can run it from inetd or as a daemon, but it's unrelated to rsh.
That connection may or may not be encrypted depending on the route it takes. VPNs tend to be encrypted, for example, but LAN connections not.
Re: (Score:2)
There are eight different ways of using rsync. They are:
* for copying from the local machine to a remote machine using a remote shell program as the transport (such as ssh or rsh). This is invoked when the destination path contains a single : separator.
* (same as the above, but copy from remote to local machine)
Note that the remote machine could alternately have an rsync server. However, this is not required -- if the remote machine does not have an rsync server, transport is done vi
Can't we all just get along? (Score:3, Interesting)
Encrypted and tunneled over SSH, rsync is spawned by a login shell at the other side:
rsync
Not encrypted, the rsync daemon must be running at the other end:
rsync
rsync
Re: (Score:2)
Re: (Score:3, Informative)
Re:Must be why rsync over ssh is much faster (Score:5, Interesting)
And in my experience rsync is faster.
Re:Must be why rsync over ssh is much faster (Score:5, Interesting)
If you just want to copy some files from system to system in an encrypted fashion, then the BEST option by far is to use tar, and pipe it through ssh like so:
tar cvfpz - * | ssh user@host '( cd
This example will compress and encrypt your data before sending it; on the other end, the stream is fed straight into tar. This example requires GNU tar or a close facsimile.
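This pipeline works because tar treats the archive as a sequential stream. Neither side ever needs to seek, so the archive can go straight through the ssh pipe without touching disk. Here is a Python sketch of the same idea using the `tarfile` module's streaming modes (`w|gz` / `r|gz`). An in-memory buffer stands in for the ssh connection, which is an assumption made only so the example is self-contained. The helper names are hypothetical:

```python
import io
import os
import tarfile
from typing import BinaryIO


def pack_to_stream(src_dir: str, out: BinaryIO) -> None:
    """Roughly `tar czf - .` inside src_dir: write forward only, never seek."""
    with tarfile.open(fileobj=out, mode="w|gz") as tar:
        for name in sorted(os.listdir(src_dir)):  # listdir includes dotfiles
            tar.add(os.path.join(src_dir, name), arcname=name)


def unpack_from_stream(inp: BinaryIO, dest_dir: str) -> None:
    """Roughly `cd dest_dir; tar xzf -`: read the archive sequentially."""
    with tarfile.open(fileobj=inp, mode="r|gz") as tar:
        if hasattr(tarfile, "data_filter"):  # newer Pythons: refuse unsafe paths
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)


def copy_tree_via_stream(src_dir: str, dest_dir: str) -> None:
    pipe = io.BytesIO()  # stand-in for the ssh pipe
    pack_to_stream(src_dir, pipe)
    pipe.seek(0)  # only the in-memory stand-in needs rewinding; a real pipe does not
    unpack_from_stream(pipe, dest_dir)
```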
Now, if you want to UPDATE a directory, use rsync:
rsync -av -e ssh * user@host:/destination/
Because rsync does partial checksums and, even for BINARY files, sends only the changed parts rather than the whole file, and because it doesn't re-send unchanged files, rsync makes sense when updating a directory. But for a plain copy it provides no speedup over tar, and in fact the directory scans it does before the sync mean that it may actually be slower.
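The "doesn't re-send unchanged files" part comes from rsync's default quick check. Per the rsync man page, this looks for files whose size or last-modified time has changed. Below is a hedged Python sketch of that decision only. Real rsync also handles deletions, permissions, and timestamp tolerances (`--modify-window`), and it compares times at its own granularity. `FileInfo`, `scan`, and `files_to_send` are hypothetical names:

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    size: int
    mtime_ns: int


def scan(root: str) -> dict[str, FileInfo]:
    """Walk root and record size and mtime for every file, keyed by relative path."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            result[os.path.relpath(path, root)] = FileInfo(st.st_size, st.st_mtime_ns)
    return result


def files_to_send(src: dict[str, FileInfo], dst: dict[str, FileInfo]) -> list[str]:
    """Quick check: send files that are new, or whose size or mtime differ."""
    return sorted(rel for rel, info in src.items() if dst.get(rel) != info)
```

The `scan` pass is the "directory scan" cost mentioned above. For a first-time copy, every file ends up in the send list anyway.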
Use scp only for copying single files, because you're right, scp chokes between each file.
Re: (Score:2, Informative)
Re: (Score:3, Informative)
tar cfpz - . | ssh user@host '( cd /destination ; tar xfpvz - )'
I'd use a "." instead of *; it avoids shell line length problems and will also copy hidden files... as someone who has learned this the hard way. Also, in my experience, on anything faster than 10MB, don't bother with compression (it's really a CPU to network speed ratio; on transfers I did regularly, that was the rule of thumb with P4 2.2GHz Xeons). Also, I removed the "v" from the source tar, as it duplicates every file name twice and can
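The CPU-to-network ratio rule of thumb can be written as a back-of-envelope model. The compressor and the link form a pipeline, and the slower stage sets the pace. Measured in original (uncompressed) data, the link can carry `link * ratio` per second, but only if the CPU can compress that fast. This sketch assumes perfect overlap, a constant compression ratio, and free decompression on the receiver. The numbers in the usage lines are hypothetical:

```python
def effective_rate(link_mb_s: float, compress_mb_s: float, ratio: float) -> float:
    """Throughput, in MB/s of original data, when compressing before sending.

    link_mb_s     -- raw network throughput
    compress_mb_s -- how fast the CPU compresses (input MB/s)
    ratio         -- original size / compressed size (2.0 halves the data)
    """
    return min(compress_mb_s, link_mb_s * ratio)


def should_compress(link_mb_s: float, compress_mb_s: float, ratio: float) -> bool:
    """Compression only pays off if it beats sending the raw bytes."""
    return effective_rate(link_mb_s, compress_mb_s, ratio) > link_mb_s


print(should_compress(1.0, 20.0, 2.0))    # slow link, fast CPU: True
print(should_compress(100.0, 20.0, 2.0))  # fast link, CPU-bound compressor: False
```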
Re: (Score:2)
Re: (Score:3, Informative)
ssh user@host.com tar -C /remote/path -cpzf - remotefile1 remotefile2 | tar -C /local/path -xvzp -
Alternative solution for a trusted LAN (Score:5, Interesting)
Re:Alternative solution for a trusted LAN (Score:5, Informative)
-c blowfish|3des|des
Selects the cipher to use for encrypting the session. 3des is
used by default. It is believed to be secure. 3des (triple-des)
is an encrypt-decrypt-encrypt triple with three different keys.
blowfish is a fast block cipher, it appears very secure and is
much faster than 3des. des is only supported in the ssh client
for interoperability with legacy protocol 1 implementations that
do not support the 3des cipher. Its use is strongly discouraged
due to cryptographic weaknesses.
Re: (Score:2)
Re: (Score:3, Interesting)
Re: (Score:2)
Re: (Score:3, Interesting)
SSH is one of those uberutilities that has a surprising amount of usefulness once you dig a bit. Sure, secure telnet functionality is great, and I use it a lot. But, I still use ssh on my own LAN where I don't really care about security. I use sshfs because it is easier and more convenient for me than bothering with Samba. SCP/SFTP to avoid bothering with ftp. I use it for forwarding ports between various machines, and I use i
Re:Alternative solution for a trusted LAN (Score:5, Informative)
Copying 100MB of data over 100Mbit Ethernet to a P2 350MHz box (the slowest I've got) gives:
* 3des 1.9MB/s
* AES 4.8MB/s
* blowfish 4.4MB/s
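Converting these rates into wall-clock time for the same 100MB copy makes the gap easier to see. This is just arithmetic on the numbers above. The poster doesn't say which AES key size or mode was used:

```python
SIZE_MB = 100  # the amount copied in the benchmark above

# MB/s figures as reported in the benchmark above
rates_mb_s = {"3des": 1.9, "AES": 4.8, "blowfish": 4.4}


def transfer_seconds(size_mb: float, rate_mb_s: float) -> float:
    return size_mb / rate_mb_s


for cipher, rate in rates_mb_s.items():
    secs = transfer_seconds(SIZE_MB, rate)
    speedup = rate / rates_mb_s["3des"]
    print(f"{cipher:9s} {secs:5.1f} s  ({speedup:.1f}x 3des)")
```

On this box, switching away from the 3des default saves roughly half a minute per 100MB.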
Re: (Score:3, Informative)
-c cipher_spec
Selects the cipher specification for encrypting the session.
Protocol version 1 allows specification of a single cipher. The
Re: (Score:2)
$ cat
Protocol 2
Re: (Score:2)
Or just compile from source and enable the 'none' "cipher".
I surely missed having that option when copying files between hosts on my LAN. I don't need to hide data from myself. If someone else connects and encrypting data is a concern, I'll simply not use the 'none' "cipher".
-1 redundant, -1 fail at slashdot
from the linked article:
Dynamic Windows and None Cipher
This is a basis of the HPN-SSH patch set. It provides dynamic window in SSH and the ability to switch to a NONE cipher post authentication. Based on the HPN12 v20 patch.
Re: (Score:2)
Server: nc -l 1234 | tar -xf -
Client: tar -cf - file_list_here | nc server_host 1234
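Stripped of tar, the netcat trick is just a plain TCP stream. The sender writes until done and then signals EOF, and the receiver reads until it sees EOF. Like nc, there is no encryption or authentication. Here is a Python sketch of that pattern. It uses the loopback address only so it runs on one machine; on a LAN the receiver would bind to its own address and the sender would connect to it. All function names are hypothetical:

```python
import socket
import threading


def receive_all(listener: socket.socket, sink: list) -> None:
    """Accept one connection and read until the sender closes (like `nc -l`)."""
    conn, _addr = listener.accept()
    chunks = []
    with conn:
        while True:
            data = conn.recv(65536)
            if not data:  # empty read means the peer shut down its side
                break
            chunks.append(data)
    sink.append(b"".join(chunks))


def send_all(host: str, port: int, payload: bytes) -> None:
    """Connect, stream the payload, then signal EOF (like `... | nc host port`)."""
    with socket.create_connection((host, port)) as s:
        s.sendall(payload)
        s.shutdown(socket.SHUT_WR)


def copy_over_tcp(payload: bytes) -> bytes:
    with socket.socket() as listener:
        listener.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
        listener.listen(1)
        port = listener.getsockname()[1]
        received = []
        t = threading.Thread(target=receive_all, args=(listener, received))
        t.start()
        send_all("127.0.0.1", port, payload)
        t.join()
    return received[0]
```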
Re: (Score:3, Informative)
Re: (Score:2)
How will this affect application deployment (Score:2)
This is one of the most useful aspects of Mandriva, but as the number of nodes I have to manage increases, I find RPMs being SCPed to other nodes taking longer and longer. I think this is because, even though authentication with Kerberos is much faster, urpmi waits until one node finishes copying the files before it starts copying to the next node in the domain.
Thoughts?
Sweet! (Score:5, Insightful)
Re: (Score:2)
I also wonder if one can't get similar performance gains with normal ssh and, for example, forwarded X-windows.
Probably not. The X11 protocol is very latency sensitive, so the bottleneck tends to be round-trip times rather than raw throughput.
I haven't read the article, so I don't know what it says about per-packet set-up times, but I wouldn't be surprised if latency was actually increased due to the overhead of having to at least decide to distribute encryption work across multiple CPUs.
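A crude model shows why a latency-bound protocol gains little from faster crypto or a fatter pipe. If a client must wait for a reply before continuing, each synchronous request costs one round trip, in addition to the time needed to move the bytes. The workload below is hypothetical. Real X11 clients also batch many requests without waiting, which this model ignores:

```python
def request_time(round_trips: int, rtt_s: float,
                 payload_bytes: int, bandwidth_bytes_s: float) -> float:
    """Time spent waiting on round trips plus time to push the bytes."""
    return round_trips * rtt_s + payload_bytes / bandwidth_bytes_s


# Hypothetical workload: 200 synchronous round trips moving 2 MB in total.
base = request_time(200, 0.050, 2_000_000, 125_000_000)      # 50 ms RTT, ~1 Gbit/s
more_bw = request_time(200, 0.050, 2_000_000, 250_000_000)   # double the bandwidth
less_rtt = request_time(200, 0.025, 2_000_000, 125_000_000)  # halve the RTT
print(f"base {base:.3f}s, 2x bandwidth {more_bw:.3f}s, half RTT {less_rtt:.3f}s")
```

In this model, doubling bandwidth saves milliseconds while halving the RTT saves seconds.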
Re:Sweet! (Score:4, Informative)
Are threading and security mutually exclusive? (Score:2)
AFAIK, the OpenBSD kernel has adopted the SMP approach of the Linux 2.2 kernel (i.e. one great big kernel lock), and threads are implemented in a userland library. I assume that there will be less of a performance benefit on OpenBSD.
Given this stance, is it very likely that either the core maintainers, or the maintainers for the portable releases, will integrate this code?
Given how hard it is to protect critical sections of code from race conditions and other exploits, should we keep things simple?
p.s.
Re: (Score:2)
...assuming you do like seeing gcc lines go by for extended periods of time, no?
To *have* such problems... (Score:5, Interesting)
Between two devices on my gigabit home LAN, the CPU barely even registers while SCP'ing a large file (and that with every CPU-expensive protocol option turned on, including compression). What sort of connection do these guys have, that the CPU overhead of en/decryption throttles the transfer???
Coming next week: SSH compromised via a thread injection attack, thanks to a "feature" that only benefits those of us running our own undersea fiber.
Re:To *have* such problems... (Score:5, Informative)
Re:To *have* such problems... (Score:5, Informative)
Re: (Score:2)
The limiting factor for scp transfer rates is often the round-trip time spent waiting for confirmation of received packets. This is a serious issue for transfers from Europe to the US West Coast (around 200 ms) or to Australia (around 400 ms).
Huh. I'm surprised TCP sliding window protocol doesn't take care of that. Shouldn't it account for filling up the pipeline between sender and receiver?
Re: (Score:3)
Many apps set fixed window sizes (incl. apparently standard SSH - the webpage implies 64K.)
Linux can "autotune" window sizes, but most OSes don't, hence the need for an app to be able to specify a larger window.
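The arithmetic behind the fixed-window problem is simple. At most one window of unacknowledged data can be in flight per round trip, so throughput is capped at window / RTT. Keeping the path full needs a window at least as large as the bandwidth-delay product (BDP). The same arithmetic applies whether the limiting window is TCP's or SSH's own channel buffer. The 64K and 200 ms figures come from the comments above; the 1 Gbit/s target is hypothetical:

```python
def window_limited_bytes_s(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on throughput: at most one window in flight per round trip."""
    return window_bytes / rtt_s


def bdp_bytes(bandwidth_bits_s: float, rtt_s: float) -> float:
    """Bandwidth-delay product: the window needed to keep the path full."""
    return bandwidth_bits_s / 8 * rtt_s


# 64 KiB window across a ~200 ms Europe-to-US-West-Coast path
rate = window_limited_bytes_s(64 * 1024, 0.200)
print(f"{rate * 8 / 1e6:.1f} Mbit/s max")  # about 2.6 Mbit/s
print(f"{bdp_bytes(1e9, 0.200) / 1e6:.0f} MB window to fill 1 Gbit/s")  # 25 MB
```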
Even with larger window sizes, TCP congestion control starts breaking on networks wit
Re:To *have* such problems... (Score:5, Interesting)
Have you measured your actual throughput on the file transfer? It tends to take a crapload of tuning to get anywhere near saturating gigabit, even if you're not using encrypted transfers.
I wrote the bit below, which I'll keep because it might be interesting to someone, but dm(Hannu) already mentioned the flaw in the logic behind the PP and article summary: if the CPU is the bottleneck, how could adding more threads possibly help?
Just for a laugh I used scp to copy a 512 MB file from my file server to itself, an Athlon 3700+ running at 2.2GHz. I got about 18 megabytes/second out of it. I took a snapshot of top's output right at the end (97% complete), and the CPU usage was as follows:
ssh: 48.6%
sshd: 44.9%
scp: 3.7%
scp: 1.3%
pdflush: 0.7%
So this system was pretty much pegged by this copy operation, and it achieved less than a fifth the capacity of a gigabit network link. Obviously the system is capable of transferring data much faster than this; the source was a RAID-5 set of 5 new 500 GB drives, and the destination was a stripe across two old 40 GB drives. I'd also repeated the experiment a few times (and this was the fastest transfer I got) so it's likely the source file was cached, too.
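The "less than a fifth" figure follows directly from the units. A gigabit link carries 125 MB/s in decimal units. This quick check covers both readings of "megabytes", since the comment doesn't say which unit the progress meter used:

```python
def link_utilization(rate_mb_s: float, link_gbit_s: float = 1.0,
                     mb_bytes: int = 10**6) -> float:
    """Fraction of a link used by a transfer reported in megabytes per second.
    mb_bytes: 10**6 for decimal MB, 2**20 if the tool actually reports MiB."""
    link_bytes_s = link_gbit_s * 1e9 / 8
    return rate_mb_s * mb_bytes / link_bytes_s


print(f"{link_utilization(18):.1%}")                  # 14.4% with decimal MB
print(f"{link_utilization(18, mb_bytes=2**20):.1%}")  # 15.1% if MB means MiB
```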
I do agree that there are probably more interesting and useful things to optimise (and make easy to optimise) than scp's speed, but I know for sure that scp'ing ISO images to our ESX servers is a crapload slower than using Veeam's copy utility or the upload facility in the new version of Infrastructure Client (at least I think it's new, never noticed it before).
Re:To *have* such problems... (Score:5, Interesting)
A possible problem source here is that you're also doing disk I/O. When transferring data on my home network, I've noticed that rsyncing things for redundancy purposes uses a lot more CPU (even when reading from a RAID5 via a hardware controller) than just pumping random data from one machine to another. I recommend you try transferring random data and piping it directly to /dev/null on the receiving machine to see if there's any difference in CPU usage.
/Mikael
Re: (Score:2)
Re: (Score:2, Interesting)
True enough, but my main point was that getting to actual gigabit speeds in the first place is pretty difficult. Plus, I couldn't find an easy way to copy only X amount of "random" data via scp, which was the point of the article. Regardless, copying random data is rarely if ever a useful thing to do with scp anyway.
Re: (Score:2)
Re: (Score:2)
Pretty sure the article summary covered this - it is intended for multicore/multiprocessor systems.
i.e. a single CPU is a bottleneck, but multithreading allows the load to be distributed over multiple CPUs, removing the bottleneck a single CPU might provide.
Re: (Score:2, Informative)
Re: (Score:2)
My home server/internet gateway is a Pentium MMX at 200MHz, with a 100 Mb/s NIC.
With SCP (default options, server sending), I can transfer at 8Mb/s.
With RCP, at 25 Mb/s
Re: (Score:2)
Re: (Score:2)
They are a node on the Teragrid which has throughput over some segments of around 100Gb/s
Re: (Score:2)
the crypto can sometimes be the bottleneck instead of the wire speed.
Between two devices on my gigabit home LAN, the CPU barely even registers while SCP'ing a large file (and that with every CPU-expensive protocol option turned on, including compression). What sort of connection do these guys have, that the CPU overhead of en/decryption throttles the transfer???
Coming next week: SSH compromised via a thread injection attack, thanks to a "feature" that only benefits those of us running our own undersea fiber.
I worked on a program that is similar to what the summary describes. For various legacy reasons (legacy code-base) we only supported Triple-DES encryption. The bottlenecks break down as follows:
Network bandwidth is easily overcome to a degree: in the manner the summary describes, it will easily fill the pipe if you can get past the other two issues.
CPU/encryption - this is really more the encryption and how much it affects you will depend on wha
Re: (Score:2)
Re: (Score:2)
Linux kernel version 2.6.17 to 2.6.24.1 (Score:3, Funny)
Re: (Score:2)
Re: (Score:2)
Pretty much totaly incorrect summary (Score:5, Informative)
By the way, does anybody else think "the ability to switch to a NONE cipher post authentication" is pretty dodgy?
Re: (Score:3, Informative)
Not really, for some of the stuff I do via SSH. For example, when logging into my webhost to untar a patch and apply it, the only part of the transaction I want to be secure is my initial password/key exchange. Post authentication, I really don't give a stuff who sees me type
or any of the other commands I type in. However it should be down to the admin of the system in the first
Re: (Score:2)
By the way, does anybody else think "the ability to switch to a NONE cipher post authentication" is pretty dodgy?
Yes, you almost might as well just use telnet or rlogin.
The only advantage of ssh with no cipher is that an attacker will not see the authentication details (password or key) you use to log in to the remote machine.
Unfortunately, just like telnet, using ssh with the none cipher opens the connection up to TCP hijacking and packet injection, so the attacker doesn't really need your password anymore; they can just execute commands as you on the server once you are authenticated.
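Whether an attacker can inject commands depends on integrity protection, not only on encryption. In the SSH transport protocol (RFC 4253), each packet carries a MAC computed over the packet sequence number and the unencrypted packet, using a key derived from the key exchange. Altered or injected packets then fail verification. This comment doesn't say whether HPN-SSH's NONE cipher mode keeps the MAC; check the HPN-SSH documentation before relying on it. Here is a loose Python sketch of per-packet MAC checking with the stdlib `hmac` module. It is not SSH's actual packet format:

```python
import hashlib
import hmac

TAG_LEN = 32  # bytes in an HMAC-SHA256 tag


def seal(key: bytes, seq: int, payload: bytes) -> bytes:
    """Append HMAC(key, seq || payload), loosely modeled on SSH's per-packet MAC."""
    tag = hmac.new(key, seq.to_bytes(4, "big") + payload, hashlib.sha256).digest()
    return payload + tag


def verify(key: bytes, seq: int, packet: bytes) -> bytes:
    """Return the payload, or raise if it was altered, injected, or reordered."""
    payload, tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
    expected = hmac.new(key, seq.to_bytes(4, "big") + payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):  # constant-time comparison
        raise ValueError("MAC mismatch: packet rejected")
    return payload
```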
My guess is with the dynamic tcp window s
Re: (Score:2)
Yes, you almost might as well just use telnet or rlogin.
Sorta. If you're on a private network sending a 4Gig ISO (or other large file/files) why do you need the data to be encrypted? Encrypting credentials is sufficient.
Re: (Score:2)
Sorta. If you're on a private network sending a 4Gig ISO (or other large file/files) why do you need the data to be encrypted? Encrypting credentials is sufficient.
Exactly. As long as you can trust (or, at worst, assume) your LAN is secure.
Much easier to do on a home network where it is either just you, or you and family.
A rather safe assumption to make if your client machines and 'servers' are secured from each other, limiting the potential damage from an infected wi^H^H^H client machine.
The main use I see is that the scp command is damn handy compared to most any other command-line method of transferring files. Especially so with RSA/DSA keys.
As far as people that u
Re:Pretty much totaly incorrect summary (Score:4, Informative)
Re: (Score:3, Insightful)
By the way, does anybody else think "the ability to switch to a NONE cipher post authentication" is pretty dodgy?
I'd like it when I tunnel a new SSH or scp through another SSH tunnel. We call it a sleeve. I've had to sleeve within a sleeve's sleeve before to get through multiple SSH gateways and firewalls to an inner system. You can tell ssh to use XOR but I'm not sure you can in scp.
Of course, if speed is paramount, you can use netcat inside the sleeve(s) to copy files. No encryption of the netcat
Re: (Score:2, Informative)
Hardware acceleration (Score:5, Interesting)
Re: (Score:2, Informative)
So I guess they disappeared..
Re: (Score:2)
Re: (Score:2)
I've been wondering, are there hardware accelerators usable by OpenSSL or GnuTLS? I work in embedded systems, and our chip includes a crypto and hash processor. I'm surprised nothing equivalent exists on modern PCs, or have I just not been looking in the right places?
The VIA C7 processor has hardware crypto acceleration (AES and some helper functions) that's supported by OpenSSL out of the box. Applications still require some patching, for example OpenSSH. The reason seems to be that the application has to choose the encryption engine used by OpenSSL.
http://bugs.gentoo.org/show_bug.cgi?id=162967 [gentoo.org]
Re: (Score:2)
I've used a Hifn Crypto Accelerator [hifn.com] a year or three ago. Worked with OpenSSL for the most part.
Re: (Score:2, Interesting)
The Sun T1 and T2 processors in the T2000 and T5000 also have on-chip crypto units (1 on the T2000 and 8 on the T5000), which accelerate OpenSSL traffic by offloading DES, AES, MD5, etc.
Re: (Score:2)
As others have mentioned, the VIA CPUs have built-in accelerators which avoid those memory copies.
News just in! (Score:5, Funny)
why look for high bandwidth only ? (Score:2)
Of course, if they want to test how well it scales, that's a different matter.
Comcast (Score:2, Funny)
You comcast users can forget about it, then.
Old news? (Score:2, Interesting)
Not that I'm that surprised to see this is old news, since they're apparently on major revision 13...
Multithreading waaay to go... (Score:2, Insightful)
WE NEED MULTITHREADING NOW BIG AND EVERYWHERE.
Multithreading is maybe the biggest change in software development. In contrast to advanced instruction sets like MMX, SSE and so on, it is not about some peephole optimization, about replacing a bunch of x86_32 instructions with some SSE instructions; it is about changing the whole approach, finding new algorith
Some comments from one of the authors (Score:5, Informative)
A couple notes about the multi-threading: The main goal was to allow SSH to make use of multiple processing cores. The stock OpenSSH is, by design, limited to using one core. As such, a user can encounter situations where they have more network capacity and more compute capacity but will be unable to exploit them. The goal of this patch was to allow users to make full use of the resources available to them. The upshot of this is that it's best suited for high-performance network and compute environments (the HPN in HPN-SSH stands for High Performance Networking). This doesn't mean it won't be useful to home users, only that they might not see the dramatic performance gains someone in a higher-capacity environment might see. It's really going to depend on the specifics of their environment.
Based on our research, we decided the most effective way to do this would be to make the AES-CTR mode cipher multi-threaded. CTR mode is well suited to threading because there is no inter-block dependency and, even better, the resulting cipher stream is indistinguishable from a single-threaded CTR mode cipher stream. As a result, we retain full compatibility with other implementations of SSH: you don't need to have HPN-SSH on both sides of the connection. Of course, you won't see the same improvements unless you do.
We still see this as somewhat experimental because we've not yet implemented a way to allow users to choose between single-threaded and multi-threaded AES-CTR mode. As such, users on single-core machines who use AES-CTR may see a decrease in performance. We suggest those users just use AES-CBC mode instead (which is the default anyway). Also, your system needs to support POSIX threads.
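The property described here is easy to demonstrate. In CTR mode, keystream block i depends only on the key, the nonce, and counter i. Any split of the counter range across workers therefore produces the same bytes as a single thread. The Python sketch below illustrates that property only and does not show how HPN-SSH is implemented. Because the standard library has no AES, it uses a TOY block function (SHA-256 of key, nonce, and counter) in place of AES, so it must not be used for real encryption. Python threads won't make it faster in practice; the point is that the output doesn't depend on how the counters are divided:

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

BLOCK = 32  # keystream bytes per counter value (SHA-256 digest size)


def keystream_block(key: bytes, nonce: bytes, counter: int) -> bytes:
    """TOY stand-in for AES(key, nonce || counter); not for real use."""
    return hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()


def keystream_range(key: bytes, nonce: bytes, start: int, count: int) -> bytes:
    return b"".join(keystream_block(key, nonce, c) for c in range(start, start + count))


def ctr_xor(key: bytes, nonce: bytes, data: bytes, workers: int = 1) -> bytes:
    """Encrypt or decrypt (same operation) in CTR mode. Block i depends only on
    counter i, so the counter range can be split across workers freely."""
    n_blocks = -(-len(data) // BLOCK)          # ceiling division
    chunk = max(1, -(-n_blocks // workers))    # blocks handled per worker
    starts = range(0, n_blocks, chunk)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(
            lambda s: keystream_range(key, nonce, s, min(chunk, n_blocks - s)), starts
        )
        stream = b"".join(parts)               # map() preserves counter order
    return bytes(d ^ k for d, k in zip(data, stream))
```

Because the single-threaded and multi-threaded streams are byte-for-byte identical, a peer running an ordinary CTR implementation can decrypt the output. That is the compatibility point made above.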
Future work will involve pipelining the MAC routine and that should provide us with another 30% or so improvement in throughput.
Also, it's important to keep in mind that these improvements are *not* just for SCP but for SSH as a whole. People using HPN-SSH as a transport mechanism for rsync, tunnels, pipes, and so forth may also see considerable performance improvements. Additionally, the windowing patches don't necessarily require HPN-SSH to be installed on both ends of the connection. As long as the patch is installed on the receiving side (the data sink), you may (assuming you were previously window limited) see a performance gain.
We welcome any comments, suggestions, ideas, or problem reports you might have regarding the HPN-SSH patch. Go to the website mentioned above and use the email address there to get in touch with us. This is a work in progress and we are doing what we can to enable line-rate, easy-to-use, fully encrypted communications. We've a lot more to do, but I hope what we've done so far is of use and value to the community.
Re: (Score:3, Insightful)
Re: (Score:2)
Re: (Score:3, Informative)
BDP is the bandwidth-delay product. BDP is one of the main things these patches address. Loopback has very, very little delay. You could, I suppose, add artificial delay over loopback, but now you're diverging further from the actual deployment scenario.
The other thing is that when sender and receiver are the same host, you don't engage the full network stack (no ethernet queuing, for example, no dropped packets, etc. etc.), so you don't find out all the curve balls that TCP/IP will throw you.
And yet a
Re: (Score:2)
So all you need are three machines: client, "pipe", server.
Do a netcat or something to see how fast things are without scp.
The trouble is that some Gbps NICs don't actually do Gbps speeds; there are a fair number out there that can't sustain 1Gbps. Fortunately, nowadays most onboard NICs aren't too crappy.