A New Approach To Linux Clusters 143
rkischuk writes: "InformationWeek has an article about a group of ex-Cray engineers working on a new architecture for clustering Linux systems. 'It's not easy trying to build scalable systems from commodity hardware designed for assembling desktop computers and small servers.' Per the article, 'As the number of CPUs in a Beowulf-style cluster (a group of PCs linked via Ethernet) increases and memory is distributed instead of shared, the efficiency of each processor drops as more are added,' but 'Unlimited's solution involves tailoring Linux running on each node in a cluster, rather than treating all the nodes as peers.'" Looks like Cray engineers think about clustering even when they're not at Cray.
Re:Request for help (Score:2, Informative)
I would imagine you would use this as follows: first, you'd fetch from a server the data points to calculate with. Then, you'd also get the name of a shared library that lives on a server (probably NFS-mounted). This library has a function named 'calc' or some such that does the calculation. You can then call that function and post the results somewhere.
I would avoid using MPI or PVM, since those are not designed for farming out data the way you are. You should probably use your own job control protocol. Also, you might want to allow for multiple architectures, naming the library foo-0.0.1.i386.so, foo-0.0.1.alpha.so, and so forth.
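The per-architecture naming scheme above can be sketched like this (Python here standing in for what would be a C dlopen() call on each node; the base name, version, and the 'calc' entry point are the poster's hypothetical examples):

```python
import platform

def library_name(base="foo", version="0.0.1"):
    """Build an architecture-specific shared-library name like
    foo-0.0.1.i386.so, so each node loads a binary built for its CPU."""
    arch = platform.machine()  # e.g. 'x86_64', 'i386', 'alpha'
    return f"{base}-{version}.{arch}.so"

# A node would then load the library and call the shared 'calc' entry
# point, e.g. with ctypes: ctypes.CDLL(library_name()).calc(...)
print(library_name())
```

This keeps one name on the job server while letting i386 and Alpha nodes each pick up the right binary.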
Re:Request for help (Score:5, Informative)
The normal way to operate a cluster is to have a shared (NFS) filesystem across all the systems, which solves the data-distribution problem. (Note, though, that NFS is too slow for heavy file-based I/O; you might want to make a local /scratch directory on each node.)
Besides the NFS share, you'll need some kind of parallel programming library like MPI or PVM, and a job scheduler of some sort. The libraries you can find on the web (maybe as precompiled RPMs; look for the mpich MPI implementation for a start), and they provide a programming framework that handles all the networking and sets up the topology. The scheduler can be as simple as the one provided with MPI/PVM (i.e., you name a few hosts and your job gets run on those), or, if a number of people access the cluster at the same time, you might want to try a real queueing system (like gridware [sun.com]).
The parallelization itself is something you'll have to do yourself, and it's the hardest part of clustering.
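The scatter/compute/gather pattern that MPI or PVM gives you can be sketched with Python's multiprocessing module standing in for the message-passing library; the work function is a placeholder for your real per-node computation:

```python
from multiprocessing import Pool

def work(chunk):
    # placeholder for the per-node computation
    return sum(x * x for x in chunk)

def scatter(data, n):
    """Split data into n roughly equal chunks (what MPI_Scatter does)."""
    k, m = divmod(len(data), n)
    return [data[i * k + min(i, m):(i + 1) * k + min(i + 1, m)]
            for i in range(n)]

if __name__ == "__main__":
    chunks = scatter(list(range(100)), 4)
    with Pool(4) as pool:
        partial = pool.map(work, chunks)  # each "node" works on its chunk
    total = sum(partial)                  # the gather/reduce step
    print(total)
```

With real MPI the chunks would travel over the network instead of between local processes, but the decomposition problem (how to split the data and recombine the answers) is exactly the part you have to design yourself.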
Hope this helps :-)
Re:Request for help (Score:1, Informative)
Re:this brings up something.. (Score:2, Informative)
Re:Request for help (Score:2, Informative)
Use PVM or MPI. Both exist prepackaged for most major Linux distributions.
I'm trying to design a specialized data-fitting program to be used for accelerator-based condensed matter physics.
I think this may be a case where a bit more thinking and literature research about the problem would help a great deal. People solve extremely complex data fitting problems on modern PCs without the need for parallel processing, and there are very sophisticated algorithms for doing this. You should probably talk to local experts in statistics, computer science, and pattern recognition.
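To illustrate the point that a single modern PC handles routine fitting fine, here is a closed-form least-squares line fit in plain Python (the data is hypothetical; a real analysis would reach for an established library rather than hand-rolled code):

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b, via the closed-form
    normal equations for the one-variable case."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

# Exact data on the line y = 2x + 1 recovers the coefficients.
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # 2.0 1.0
```

If your model is nonlinear or high-dimensional, the answer is usually a better algorithm (Levenberg-Marquardt, simulated annealing, etc.), not more machines.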
Re:Request for help (Score:1, Informative)
Set up NFS and then write a bash script that executes whatever gets sent there.
When you wanted to run something, you would just copy it to the server's NFS directory, and all the nodes would process their respective data points automatically.
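The drop-directory scheme above can be sketched as a polling loop (Python here in place of the bash script; the directory layout and ".done" marker-file convention are assumptions):

```python
import os

def process_pending(shared_dir, handler, done_suffix=".done"):
    """Scan a shared (e.g. NFS-mounted) directory once and run the
    handler on each job file not yet marked as processed."""
    handled = 0
    for name in sorted(os.listdir(shared_dir)):
        path = os.path.join(shared_dir, name)
        if name.endswith(done_suffix) or not os.path.isfile(path):
            continue
        if os.path.exists(path + done_suffix):
            continue  # this job was already claimed on an earlier pass
        handler(path)
        open(path + done_suffix, "w").close()  # mark as processed
        handled += 1
    return handled

# A real node would loop forever:
#   while True: process_pending("/mnt/jobs", run_job); time.sleep(5)
```

Note that marker files like this are not a safe lock between concurrent nodes (NFS makes atomicity tricky); a real setup would give each node its own slice of the work or use a proper queue.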
Re:this brings up something.. (Score:2, Informative)
The best solution for any distributed computing problem depends on that problem. How CPU-intensive is the job? How much data will need to be distributed to the nodes? Do intermediate processing steps require intermediate answers from other nodes? How fast is the CPU? How fast is the network?
Basically, if you have a lot of processing that could be manually distributed to a bunch of hosts via rsh or rlogin, then MOSIX can easily manage/monitor that work with no coding. For harder problems that can't be manually distributed, you might need MPI or PVM, with special code, to do the equivalent of "threaded" distributed computing.
Manual distribution is often easiest when you have network-shared filesystems (NFS, etc.), and so is MOSIX.
MPI is short for Message Passing Interface. You can use MPI libraries to do interprocess/interhost messaging and I/O on heterogeneous networks. MPI does not require shared filesystems, though your own project might use them; MPI is certainly easier to manage when you have shared filesystems. One must be careful to consider the I/O time involved in network reads/writes. Also note that multiple network nodes will clobber each other's data if they all try to write to the same file over the network.
MPI or PVM is often used when breaking the job into pieces, or putting the results back together, is non-trivial. For instance, if you were doing very processor-intensive image processing on a large file, you might need to break the image into pieces, or tiles, then distribute the processing of the tiles (with some processors sharing intermediate results), then stitch the results back into one file and finally write that file.
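The tile-and-stitch workflow can be sketched with a 2D list standing in for the image; splitting into horizontal bands and the inversion "filter" are placeholders for a real decomposition and processing step:

```python
def split_rows(image, n_tiles):
    """Break the image into horizontal bands, one per worker."""
    k, m = divmod(len(image), n_tiles)
    bounds = [(i * k + min(i, m), (i + 1) * k + min(i + 1, m))
              for i in range(n_tiles)]
    return [image[a:b] for a, b in bounds]

def process_tile(tile):
    # placeholder for the CPU-intensive per-tile step (here: invert)
    return [[255 - px for px in row] for row in tile]

def stitch(tiles):
    """Reassemble the processed bands, in order, into one image."""
    return [row for tile in tiles for row in tile]

image = [[0, 64], [128, 192], [255, 32], [16, 8]]
result = stitch(process_tile(t) for t in split_rows(image, 2))
```

In a real MPI job the tiles would be sent to remote ranks and gathered back, and tiles would usually overlap by a border so neighbors can exchange intermediate results; this sketch only shows the decomposition structure.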
The approach that Unlimited Scale is using only makes sense in limited cases, i.e., when computers are "getting bogged down in processing interrupt requests from peripherals." In general, multithreaded or even distributed processing only makes sense when I/O time is dwarfed by CPU time. SMP machines have an advantage in that communication between processors is really fast compared to Ethernet; Beowulf clusters have relatively slow communication between nodes, so they can only really be effective on more CPU-bound, less I/O-intensive jobs. But if you have a job that can run faster on a Linux cluster, you can save big bucks on the initial hardware purchase. The more CPU-intensive the processing, the slower the network you can put up with. SETI is a good example of this: if it takes 12 hours to process a packet of data, then I/O over a 56K modem is OK.
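The compute-versus-transfer tradeoff can be put in rough numbers; the 12-hour figure comes from the post, while the work-unit size used here (350 KB, in the ballpark of a SETI@home work unit) is an assumption for illustration:

```python
def transfer_seconds(size_bytes, bits_per_second):
    """Time to move the data over the link."""
    return size_bytes * 8 / bits_per_second

compute_s = 12 * 3600                       # 12 hours of CPU per packet
io_s = transfer_seconds(350_000, 56_000)    # ~350 KB over a 56 kbit/s modem
ratio = compute_s / io_s                    # how badly compute dwarfs I/O
print(round(io_s), round(ratio))  # 50 864
```

About 50 seconds of modem time against 12 hours of computation, a ratio of roughly 864:1, which is why even a dial-up link is fine for that workload.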
Re:actually it shows why Cray always does so well. (Score:3, Informative)
Not a single one of these is made in the U.S.
Cray today is a name. It's a brand. It's not a manufacturer of high performance computers.
Just for the record: IBM produced the two fastest computers on the current list, Intel the third, IBM the fourth, Hitachi the fifth, then SGI, IBM, NEC, IBM, IBM, and finally, at number *11*, a Cray based on the Alpha processor (the T3E).
So, tell me again, who was playing catch-up with whom?
old ideas come back around, if they are good (Score:2, Informative)
Guess who the CDC6400 designer was? Seymour Cray.
Re:this brings up something.. (Score:5, Informative)