Linux at Supercomputing '98

John A. Turner writes: "Haven't seen anything on /. about how much Linux-related stuff there was at Supercomputing '98, so I thought I'd mention it. One of the best things was a panel discussion titled "Clusters, Extreme Linux, and NT". There's a nice summary of the Linux-related events at SC '98 at the Extreme Linux site." One area in which Linux is far ahead of the pack is clustering. Has any participant written up a summary we could post? Update: Rahul Dave has written a report for us.
I operate a Beowulf cluster at the Univ of Pennsylvania. I went to SC98 to attend the tutorials, see the exhibits, and learn more stuff. It was a good experience. I'll have pictures soon (Wednesday). (see http://reno.cis.upenn.edu/~rahul/linuxatsc98.txt)

My cluster is here and here, if you are interested.

Linux at SC98

Beowulf BOF

The BOF had more than 30 people; some had to stand.

There was a Beowulf BOF, at which an emotional speech was given by Thomas Sterling, one of the original pioneers. He claimed that we have already won the battle, since we have forced a change in the mindset of people doing computing today as to the benefits of open source. He said the interest in Beowulf at SC98 is amazing, and that the supercomputing community can now harness the same distributed creative energy that's driven Linux.

The point came up--what's a Beowulf? The answer, at the 0th level, was: a cluster of COTS (commodity off-the-shelf) technology using open source software for scientific computation. Most Beowulves use commodity interconnects and have one point of entry, with each branch of a job having a processor to itself. People do use them for databases (we do), web serving, and so on.
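
To give an idea of the programming model, here is a minimal MPI sketch -- each process (rank) of a job gets a processor to itself and the ranks cooperate by passing messages. This assumes some MPI implementation such as MPICH is installed on the cluster; it is illustrative only, not tied to any particular Beowulf:

    /* minimal MPI "hello" sketch: each rank runs on its own processor */
    #include <stdio.h>
    #include <unistd.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        char host[256];

        MPI_Init(&argc, &argv);                   /* join the parallel job */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);     /* which process am I?   */
        MPI_Comm_size(MPI_COMM_WORLD, &size);     /* how many of us?       */
        gethostname(host, sizeof(host));
        printf("rank %d of %d on %s\n", rank, size, host);
        MPI_Finalize();
        return 0;
    }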

With the advent of SMPs and the cheapness of Intel-based machines, bigger installations with multiple departmental users are appearing, so utilization matters, and job scheduling was one of the most talked-about topics. Some kind of scheduler will probably be on the next Extreme Linux CD, to be burnt around the end of February or so. There was a paper on scaling in Beowulves, concluding that software routing has some scalability problems but a tree of switches provides good scalability at greater cost. They have made a synthetic load generator available.

The most wanted things are rollout and system management tools. The idea is to give as much of a single-system-image notion as is useful. Job migration was pinpointed as being particularly important as a bridge to full-fledged parallel programming. Unfortunately, no open-source implementation exists (job migration source is not available from Mosix).

If you have rollout, cluster administration, round-robin web serving, etc. prepackaged for Beowulves, contact me (rdave@central.cis.upenn.edu). Currently we all use our own rollout and administration mechanisms, and the Extreme Linux CD folks would like to have some offering on the CD so that there is an everything-in-one-place solution.

Robert Hart from Red Hat made the point that the Extreme Linux CD was thought of by a lot of the press as a high-availability clustering solution, and that we need to make it clear that it's a scientific computing solution.

There was a lot of discussion about the next edition of the CD, to be based on Red Hat 5.2. Someone is planning to provide debs of the add-on software too. Only open-source and non-export-controlled (write your idiotic government!) software will be on the CD.

There was some discussion of what happened in the "loss of web site" crisis. The upshot of it was: consult your organization's software release policies before releasing. Export reviews will probably happen in the future.

Products

Paralogic was demoing Bert.

The Legion folks were showing off their object-based "metasystem" for authentication, seamless filesystem access, scheduling, etc. Essentially, in the words of Greg Lindahl, it allows you to concentrate, for example, on your plugin scheduler while it takes care of the authentication, filesystem transparency, and so on, instead of spreading yourself thin and doing a lousy job on all of these things which are not your forte. Go and download it if you are interested.

Some company was demoing parallel Linda for Linux.

Objectivity was plugging their databases, LSF their cluster management software, and the Portland Group their compilers.

TotalView is considering porting their parallel debugger to Linux. It's a nice product. If you want it, holler at them; they are looking for customers. They were at the BOF and there was considerable demand.

Other groups

Ames Lab had a booth with posters on a new network layer called Bobnet, which provides 97 Mbps (megabits per second) on Fast Ethernet ping-pong tests, with lower latencies than TCP/IP. They also have a lite version of MPI which provides a large amount of MPI's functionality with much more bandwidth than MPICH. It runs on both TCP and Bobnet, which has a VIA compatibility layer.
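
For reference, ping-pong bandwidth numbers like that are measured by bouncing a message back and forth between two nodes and dividing the bytes moved by the elapsed time. Here is a rough sketch of such a test in MPI -- illustrative only, not Ames Lab's actual benchmark; the message size and repetition count are arbitrary choices:

    /* ping-pong bandwidth sketch between ranks 0 and 1 */
    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    #define NBYTES (1 << 20)   /* 1 MB message (arbitrary) */
    #define REPS   100         /* round trips (arbitrary)  */

    int main(int argc, char **argv)
    {
        int rank, i;
        char *buf = malloc(NBYTES);
        double t0, t1;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (i = 0; i < REPS; i++) {
            if (rank == 0) {        /* send, then wait for the echo */
                MPI_Send(buf, NBYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, NBYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) { /* echo it straight back */
                MPI_Recv(buf, NBYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, NBYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        t1 = MPI_Wtime();

        if (rank == 0)              /* 2*REPS messages of NBYTES bytes crossed the wire */
            printf("bandwidth: %.1f Mbit/s\n",
                   2.0 * REPS * NBYTES * 8.0 / (t1 - t0) / 1e6);

        MPI_Finalize();
        free(buf);
        return 0;
    }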

Legion (see above). Fermilab had posters about their analysis farms.

The High Performance Debugging Forum of the Ptools consortium was interested in gdb's thread support for their parallel debugger. What's the status of kernel thread debugging on Linux? I believe one has to use SmartGDB for user-level thread support. Their reference implementations are going to be on the SP-2 and SGI Origin2000. I believe there will be source. They will be using the debugging infrastructure in NASA's p2d2 debugger, which uses dbx and gdb to do the real work.

Clusters

Compaq demonstrated a 4-way Alpha Beowulf cluster at their booth, running XaoS. This in itself, I thought, was pretty important. They said that there was active consideration of porting the Digital Unix compilers to Linux, and that we ought to watch for Fibre Channel drivers from them.

Dell was trying to convince folks to use NT with Interix--SC98 being a Unix-vendor-dominated conference. I walked up to them and said that we'd like pre-installed Linux machines. They aren't doing that for servers on a per-server basis as yet, but I think they want feedback on this issue. So if you use Dells in any measure, write to them.

Paralogic and Alta were demonstrating commercial Beowulves. Paralogic has a nice Fortran-based parallelizing tool called Bert. Douglas Eadline was there from Paralogic, and they hosted Robert Hart from Red Hat, who made the prediction that robust fail-over (Wolfpack-style) clustering is a year away.

The Real World Computing Partnership (from Japan) had multiple Linux clusters and were giving away their MPICH-PM and SCore-D clustering software. It's not redistributable, but source is available. The SCore-D cluster operating system layer implements monitoring and other such things, and provides gang scheduling using the SIGSTOP and SIGCONT signals. The cluster uses a Myrinet interconnect and boasts up to 100 megabytes per second of bandwidth using their active messages layer (PM). They had a gorgeous 3D load meter on their monitors. Very slick booth and stuff.
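
The gang-scheduling trick is simple at heart: a daemon on each node stops one job's process group and resumes another's, so all of a parallel job's processes run (and wait on each other) at the same time. A toy sketch of the idea -- not SCore-D's actual implementation, and the process-group ids here are made up -- would look like this:

    /* toy gang-scheduler sketch: round-robin two jobs on one node.
     * Not SCore-D; the process-group ids below are hypothetical. */
    #include <signal.h>
    #include <unistd.h>

    static pid_t jobs[] = { 1234, 5678 };  /* process groups of two parallel jobs */
    #define NJOBS   2
    #define QUANTUM 2                      /* seconds each job runs per turn */

    int main(void)
    {
        int current = 0;

        for (;;) {
            kill(-jobs[current], SIGCONT);   /* resume every process in this job's group */
            sleep(QUANTUM);
            kill(-jobs[current], SIGSTOP);   /* freeze it again ...                      */
            current = (current + 1) % NJOBS; /* ... and give the next job a turn         */
        }
        return 0;
    }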

SPADE is an industry-academia partnership from Brazil making commercial Beowulves. They use Myrinet and Fast Ethernet interfaces, and a PAPERS network (see the next paragraph) for synchronization. They are writing weather forecasting software and selling the machines commercially to weather stations. They expect to make some of their tools available open-source. They had a beautiful Java console for their network, involving SNMP, ping, and proprietary monitoring backends.

PAPERS from Purdue demonstrated their parallel-port low-latency interconnect (you can construct one from Radio Shack parts!). They have an API which does shared-memory barrier synchronization in 1.5 microseconds, as opposed to the roughly 4 microseconds of overhead for an OS lock. This API is extended to their low-latency interconnect. They were using their interconnect for a video wall--a set of projectors displaying different parts of an image computed in parallel. You could use mice to move little Tuxes around on the background image, and the positions would be recomputed in parallel, with the edge communications going over their interconnect. Cool stuff.
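
For anyone who hasn't used one: a barrier simply makes every participating processor wait at a point until all of them have arrived. A minimal shared-memory sense-reversing barrier looks roughly like the sketch below -- this is illustrative only, not the PAPERS API, and a software spin barrier like this has nowhere near their 1.5 microsecond hardware-assisted speed:

    /* minimal sense-reversing spin barrier (illustrative, using GCC atomics) */
    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4
    #define STEPS    3

    static volatile int count = NTHREADS;   /* threads yet to arrive        */
    static volatile int sense = 0;          /* flips when the barrier opens */

    static void barrier(int *local_sense)
    {
        *local_sense = !*local_sense;                /* the value we wait for    */
        if (__sync_sub_and_fetch(&count, 1) == 0) {  /* last thread to arrive... */
            count = NTHREADS;
            sense = *local_sense;                    /* ...releases everyone     */
        } else {
            while (sense != *local_sense)            /* spin until released      */
                ;
        }
    }

    static void *worker(void *arg)
    {
        int id = *(int *)arg, local_sense = 0, step;

        for (step = 0; step < STEPS; step++) {
            printf("thread %d finished step %d\n", id, step);
            barrier(&local_sense);                   /* nobody proceeds until all arrive */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t[NTHREADS];
        int ids[NTHREADS], i;

        for (i = 0; i < NTHREADS; i++) {
            ids[i] = i;
            pthread_create(&t[i], NULL, worker, &ids[i]);
        }
        for (i = 0; i < NTHREADS; i++)
            pthread_join(t[i], NULL);
        return 0;
    }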

Panel

I left before the panel.

Other

There were Cray, IBM, Sun, HP, Compaq, Fujitsu and others. The only interesting booth was Tera's. Their machine is $1 million a processor, with no cache, very good parallelizing compilers, and very good programming tools. Since there is no cache, the compilers are very important: each processor can spawn 128 threads, each thread with its own registers and counters. While one thread is out fetching from memory, another thread computes, thus masking the latency--and that's why each thread needs its own registers. A very interesting architecture.
