A New Approach To Linux Clusters 143
rkischuk writes: "InformationWeek has an article about a group of ex-Cray engineers working on a new architecture for clustering Linux systems. 'It's not easy trying to build scalable systems from commodity hardware designed for assembling desktop computers and small servers.' Per the article, 'As the number of CPUs in a Beowulf-style cluster (a group of PCs linked via Ethernet) increases and memory is distributed instead of shared, the efficiency of each processor drops as more are added,' but 'Unlimited's solution involves tailoring Linux running on each node in a cluster, rather than treating all the nodes as peers.'" Looks like Cray engineers think about clustering even when they're not at Cray.
Re:Request for help (Score:2, Informative)
I would imagine you would use this as follows: first, you'd fetch from a server the data points to calculate with. Then, you'd also get the name of a shared library that lives on a server (probably NFS-mounted). This library has a function named 'calc' or some such that does the calculation. You can then call that function and post the results somewhere.
I would avoid using MPI or PVM, since those are not designed for farming out data the way you are. You should probably use your own job control protocol. Also, you might want to allow for multiple architectures, naming the library foo-0.0.1.i386.so, foo-0.0.1.alpha.so, and so forth.
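The per-architecture naming scheme above can be sketched like this (Python here standing in for what would be a C dlopen() call on each node; the base name, version, and the 'calc' entry point are the poster's hypothetical examples):

```python
import platform

def library_name(base="foo", version="0.0.1"):
    """Build an architecture-specific shared-library name like
    foo-0.0.1.i386.so, so each node loads a binary built for its CPU."""
    arch = platform.machine()  # e.g. 'x86_64', 'i386', 'alpha'
    return f"{base}-{version}.{arch}.so"

# A node would then load the library and call the shared 'calc' entry
# point, e.g. with ctypes: ctypes.CDLL(library_name()).calc(...)
print(library_name())
```

This keeps one name on the job server while letting i386 and Alpha nodes each pick up the right binary.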
Re:Request for help (Score:5, Informative)
The normal way to operate a cluster is to have a shared (NFS) filesystem across all the systems, which solves the data-distribution problem. (Note, though, that NFS is too slow for heavy file-based I/O; you might want to make a local /scratch directory on each node.)
Besides the NFS share, you'll need some kind of parallel programming library like MPI or PVM, and a job scheduler of some sort. The libraries you can find on the web (maybe as precompiled RPMs; look for the mpich MPI implementation for a start), and they provide a programming framework that handles all the networking and sets up the topology. The scheduler can be as simple as the one provided with MPI/PVM (i.e., you name a few hosts and your job gets run on those), or, if a number of people access the cluster at the same time, you might want to try a real queueing system (like gridware [sun.com]).
The parallelization itself is something you'll have to do yourself, and it's the hardest part of clustering.
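The scatter/compute/gather pattern that MPI or PVM gives you can be sketched with Python's multiprocessing module standing in for the message-passing library; the work function is a placeholder for your real per-node computation:

```python
from multiprocessing import Pool

def work(chunk):
    # placeholder for the per-node computation
    return sum(x * x for x in chunk)

def scatter(data, n):
    """Split data into n roughly equal chunks (what MPI_Scatter does)."""
    k, m = divmod(len(data), n)
    return [data[i * k + min(i, m):(i + 1) * k + min(i + 1, m)]
            for i in range(n)]

if __name__ == "__main__":
    chunks = scatter(list(range(100)), 4)
    with Pool(4) as pool:
        partial = pool.map(work, chunks)  # each "node" works on its chunk
    total = sum(partial)                  # the gather/reduce step
    print(total)
```

With real MPI the chunks would travel over the network instead of between local processes, but the decomposition problem (how to split the data and recombine the answers) is exactly the part you have to design yourself.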
Hope this helps :-)
Re:Request for help (Score:1, Informative)
Re:this brings up something.. (Score:2, Informative)
Re:Request for help (Score:2, Informative)
Use PVM or MPI. Both exist prepackaged for most major Linux distributions.
I'm trying to design a specialized data-fitting program to be used for accelerator-based condensed matter physics.
I think this may be a case where a bit more thinking and literature research about the problem would help a great deal. People solve extremely complex data fitting problems on modern PCs without the need for parallel processing, and there are very sophisticated algorithms for doing this. You should probably talk to local experts in statistics, computer science, and pattern recognition.
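To illustrate the point that a single modern PC handles routine fitting fine, here is a closed-form least-squares line fit in plain Python (the data is hypothetical; a real analysis would reach for an established library rather than hand-rolled code):

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b, via the closed-form
    normal equations for the one-variable case."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

# Exact data on the line y = 2x + 1 recovers the coefficients.
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # 2.0 1.0
```

If your model is nonlinear or high-dimensional, the answer is usually a better algorithm (Levenberg-Marquardt, simulated annealing, etc.), not more machines.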
Re:Request for help (Score:1, Informative)
Set up NFS and then write a bash script that executes whatever gets sent there.
When you wanted to run something, you would just copy it to the server's NFS directory, and all the nodes would process their respective data points automatically.
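The drop-directory scheme above can be sketched as a polling loop (Python here in place of the bash script; the directory layout and ".done" marker-file convention are assumptions):

```python
import os

def process_pending(shared_dir, handler, done_suffix=".done"):
    """Scan a shared (e.g. NFS-mounted) directory once and run the
    handler on each job file not yet marked as processed."""
    handled = 0
    for name in sorted(os.listdir(shared_dir)):
        path = os.path.join(shared_dir, name)
        if name.endswith(done_suffix) or not os.path.isfile(path):
            continue
        if os.path.exists(path + done_suffix):
            continue  # this job was already claimed on an earlier pass
        handler(path)
        open(path + done_suffix, "w").close()  # mark as processed
        handled += 1
    return handled

# A real node would loop forever:
#   while True: process_pending("/mnt/jobs", run_job); time.sleep(5)
```

Note that marker files like this are not a safe lock between concurrent nodes (NFS makes atomicity tricky); a real setup would give each node its own slice of the work or use a proper queue.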
Re:this brings up something.. (Score:2, Informative)
The best solution for any distributed computing problem depends on that problem. How CPU-intensive is the job? How much data will need to be distributed to the nodes? Do intermediate processing steps require intermediate answers from other nodes? How fast is the CPU? How fast is the network?
Basically, if you have a lot of processing that could be manually distributed to a bunch of hosts via rsh or rlogin, then MOSIX can easily manage/monitor that work with no coding. For harder problems that can't be manually distributed, you might need MPI or PVM, with special code, to do the equivalent of "threaded" distributed computing.
Manual distribution is often easiest when you have network-shared filesystems (NFS, etc.), and so is MOSIX.
MPI is short for Message Passing Interface. You can use MPI libraries to do interprocess/interhost messaging and I/O on heterogeneous networks. MPI does not require shared filesystems, though your own project might use them; MPI is certainly easier to manage when you have shared filesystems. One must be careful to consider the I/O time involved in network reads/writes. Also note that multiple network nodes will clobber each other's data if they all try to write to the same file over the network.
MPI or PVM is often used when breaking the job into pieces, or putting the results back together, is non-trivial. For instance, if you were doing very processor-intensive image processing on a large file, you might need to break the image into pieces, or tiles, then distribute the processing of the tiles (with some processors sharing intermediate results), then stitch the results back into one file and finally write that file.
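The tile-and-stitch workflow can be sketched with a 2D list standing in for the image; splitting into horizontal bands and the inversion "filter" are placeholders for a real decomposition and processing step:

```python
def split_rows(image, n_tiles):
    """Break the image into horizontal bands, one per worker."""
    k, m = divmod(len(image), n_tiles)
    bounds = [(i * k + min(i, m), (i + 1) * k + min(i + 1, m))
              for i in range(n_tiles)]
    return [image[a:b] for a, b in bounds]

def process_tile(tile):
    # placeholder for the CPU-intensive per-tile step (here: invert)
    return [[255 - px for px in row] for row in tile]

def stitch(tiles):
    """Reassemble the processed bands, in order, into one image."""
    return [row for tile in tiles for row in tile]

image = [[0, 64], [128, 192], [255, 32], [16, 8]]
result = stitch(process_tile(t) for t in split_rows(image, 2))
```

In a real MPI job the tiles would be sent to remote ranks and gathered back, and tiles would usually overlap by a border so neighbors can exchange intermediate results; this sketch only shows the decomposition structure.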
The approach that Unlimited Scale is using only makes sense in limited cases, i.e., when computers are "getting bogged down in processing interrupt requests from peripherals." In general, multithreaded or even distributed processing only makes sense when I/O time is dwarfed by CPU time. SMP machines have an advantage in that communication between processors is really fast compared to Ethernet; Beowulf clusters have relatively slow communication between nodes, so they can only really be effective on more CPU-bound, less I/O-intensive jobs. But if you have a job that can run faster on a Linux cluster, you can save big bucks on the initial hardware purchase. The more CPU-intensive the processing, the slower the network you can put up with. SETI is a good example of this: if it takes 12 hours to process a packet of data, then I/O over a 56K modem is OK.
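The compute-versus-transfer tradeoff can be put in rough numbers; the 12-hour figure comes from the post, while the work-unit size used here (350 KB, in the ballpark of a SETI@home work unit) is an assumption for illustration:

```python
def transfer_seconds(size_bytes, bits_per_second):
    """Time to move the data over the link."""
    return size_bytes * 8 / bits_per_second

compute_s = 12 * 3600                       # 12 hours of CPU per packet
io_s = transfer_seconds(350_000, 56_000)    # ~350 KB over a 56 kbit/s modem
ratio = compute_s / io_s                    # how badly compute dwarfs I/O
print(round(io_s), round(ratio))  # 50 864
```

About 50 seconds of modem time against 12 hours of computation, a ratio of roughly 864:1, which is why even a dial-up link is fine for that workload.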
Re:actually it shows why Cray always does so well. (Score:3, Informative)
Not a single one of these is made in the U.S.
Cray today is a name. It's a brand. It's not a manufacturer of high performance computers.
Just for the record: IBM produced the two fastest computers on the current list, Intel the third, IBM the fourth, Hitachi the fifth, then SGI, IBM, NEC, IBM, IBM, and finally, at number *11*, a Cray based on the Alpha processor (the T3E).
So, tell me again, who was playing catch-up with whom?
old ideas come back around, if they are good (Score:2, Informative)
Guess who the CDC6400 designer was? Seymour Cray.
Re:this brings up something.. (Score:5, Informative)