London Stock Exchange Tackles System Problem
DMandPenfold writes "The London Stock Exchange has taken steps to resolve a system problem that occurred at 4.30pm Tuesday, which delayed the start of the closing auction and knocked out automatic trades for a 42-second period. The problem occurred a day after the high-profile launch of its new matching engine on the main equities market, based on the SUSE Linux system from Novell."
42 seconds, that's eternity! (Score:5, Insightful)
Re:First posters are lame (Score:5, Insightful)
No test can ever be a 100% accurate representation of real use.
It's probably somewhere in the 0.1% mismatch where this problem occurred.
Re:Well, well, well... (Score:5, Insightful)
This just shows that it's hard to build these highly available, low latency, massive usergroup systems. Previously there was a lot of chatter about the platforms (.NET, MSSQL 2003, etc...)
Yes. And let us not forget that a lot of that chatter came from Microsoft's PR department.
Re:"Oh well I guess Linux sucks then (Score:5, Insightful)
To be fair, if this was Windows you *know* the Linux fans (myself included) would be berating them for choosing such a crappy platform. And it was Windows, and we did berate them for it...
Of course, the truth is somewhere in the middle - isn't it always? The most important part of a trading computer system setup is, well, the trading computer system software. The system's biggest problems aren't going to be due to Coolwebsearch or a bluescreen, they'll be with the trading software itself. The OS doesn't really matter. Buggy software is buggy software.
That being said, Linux is just a better platform to build something like this on. Sure, you can do it with Windows and make it work, but it's just unnecessarily more difficult.
Re:First posters are lame (Score:3, Insightful)
Re:First posters are lame (Score:5, Insightful)
This works for a component with one set of inputs and one set of outputs.
A trading system is essentially chaotic in the way it processes data because it gets so many inputs and their relative arrival times determine the system's behavior. You'd have to be replacing the old system with an identical new one, and then add heavy and slow synchronization to all the inputs going to both systems (so e.g. a trade A hitting the old system one microsecond before trade B also hits the new system the same way).
So yes, it comes down to running a whole lot of offline tests using real data and then bringing it online.
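To make the arrival-order point concrete, here is a toy sketch in Python. It is not how the LSE engine works; the function name `match_in_arrival_order` and the single resting sell order are assumptions made up for illustration. It only shows that the same two buy orders produce different trades depending on which one arrives first.

```python
def match_in_arrival_order(resting_sell_qty, buys):
    """Fill incoming buy orders against one resting sell order, first come first served.

    `buys` is a list of (trader, qty) tuples in arrival order (hypothetical format).
    Returns a list of (trader, filled_qty) for every buy that traded.
    """
    fills = []
    remaining = resting_sell_qty
    for trader, qty in buys:
        if remaining == 0:
            break  # nothing left to sell; later arrivals get no fill
        filled = min(qty, remaining)
        fills.append((trader, filled))
        remaining -= filled
    return fills


# Same two orders, opposite arrival order.
a_first = match_in_arrival_order(100, [("A", 100), ("B", 100)])
b_first = match_in_arrival_order(100, [("B", 100), ("A", 100)])

print(a_first)  # [('A', 100)]
print(b_first)  # [('B', 100)]
```

If A and B hit the old and new systems in a different order, even by a microsecond, the two systems report different fills, so a side-by-side comparison flags a mismatch that is not actually a bug.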
Someone skimped on testing (Score:5, Insightful)
I've seen this too often in my 25+ professional years in IT. The system test manager produces an excellent plan that fully simulates the anticipated workload. But it requires X testers, Y test case developers and Z machines. The program manager rejects the plan, "because he is under pressure to reduce costs." The program manager says, "The testing that the developers do should be enough." He then moves on, before the system goes into production.
The result? It always ends in tears.
Teething problems (Score:5, Insightful)
Previously there was a lot of chatter about the platforms (.NET, MSSQL 2003, etc...)
It's one thing to have a 42-second glitch on the first day a totally new system is powered up. That's perfectly normal, and had been predicted [computerworlduk.com]:
"Observers watching today's Linux-based launch will likely note that such a large change could bring about some teething problems, as with any technology overhaul."
It's a totally different thing to have it stop for a whole day after having been in operation for three months.
So, in conclusion, yes, it's about the platform. .NET, MSSQL 2003, etc. aren't robust enough for this kind of job.