
 



Linux Software

Google's 4000 Node Linux Cluster

Check out the Red Hat press release running at LWN, or the news article at TechWeb about Google's 4000 Node Linux Box. Both articles are basically Red Hat commercials, but there are some interesting bits, like the fact that they have a terabyte index of 300 million Web pages, and that they might expand their cluster to 6000 nodes in the future.

Behind The Scenes At Google

  • by Anonymous Coward
    You can say whatever you want. That's what the anonymous button is for. Say anything you fear punishment for anonymously, and karma-whore with your name to keep your karma!!!
  • by Anonymous Coward
    This pisses me off. Google is just some lame proprietary software company. This is not good for Linux at all. You don't think they actually paid Red Hat for 4000 copies of their distro, do you? From what the article says, they didn't pay for anything else from RH either.

    What are they doing for Linux? Exploiting it.

  • Check out the Oracle of Bacon- when a friend of mine was at UVa [he] set up this web interface to it

    That's me [oracleofbacon.org].

    Building a graph is quite straightforward

    If you've taken an algorithms course (and passed), you, too, could probably write an Oracle of Bacon [oracleofbacon.org].

    I believe that this was done on a single computer. Pretty sure it wasn't a cluster of 4,000 ;)

    The Oracle takes up about 10% of the CPU time on a single Sun Ultra 5/300. (I didn't pick the machine. The Oracle also runs on my Linux 2xP2/350 at home.) It takes around 80 MB of memory -- 25 for the actors and movies and the rest for a cache of recent queries [oracleofbacon.org]. Each query consumes 0.6 seconds of CPU time, or 0.02 seconds if it comes from the cache. 90-95% of queries get served from the cache, so the Oracle [oracleofbacon.org] should withstand 10+ queries per second, sustained.

    The task is trivially parallelizable across big clusters (UVA has a 256-node cluster [virginia.edu] that would do the trick), but the need for that has never arisen... :)

    --Patrick

  • Because I am a hacker of the old school... I don't throw away hardware... my old house router was a 386sx 16 with two 40 Meg MFM drives. I was running Slackware, of course. I don't think there are any other full-featured distros that will fit in that small a space.

    ttyl
    Farrell
  • To start with, I would guess price. If you are running 3K systems, the cost of buying in batches of 100+ x86 clones is pretty cheap, and parts are interchangeable. If you had bought DEC, oops, Compaq Alphas, the design of the case, etc., changes between generations of the product, and you lose hardware interchangeability. Same goes for Suns, or PPC. On the other hand, I can *still* put an old MFM hard drive controller in a dual PIII 800MHz system and boot off of it. Try getting that level of compatibility with 15-year-old equipment on any other platform that is current today!

    ttyl
    Farrell
  • Hello? Where did you buy your brain? We are talking about 4000 PCs here. Do the maths.

    If a PC fails, on average, more frequently than once every 10 years and 11 months, then you are going to be replacing one machine each day, and three on Mondays, unless you also work Saturdays and Sundays.

    Talk about job satisfaction.
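
    In case the arithmetic isn't obvious, here is a rough check (nothing vendor-specific, just the numbers in this thread):

        # Failure-rate arithmetic for a 4000-node cluster (rough figures, not Google's real MTBF).
        nodes = 4000
        mtbf_days = 10 * 365.25 + 11 * 30.44             # roughly 10 years 11 months per machine
        failures_per_day = nodes / mtbf_days              # ~1.0 replacement per calendar day
        failures_per_workday = failures_per_day * 7 / 5   # ~1.4 if you only work weekdays
        print(round(failures_per_day, 2), round(failures_per_workday, 2))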

  • On the other hand, I can *still* put an old MFM hard drive controller in a dual PIII 800MHz system and boot off of it

    Now why would you want to do a thing like that?

  • Of course not.

    Then again, I have yet to see my first dead Sun.

    On the other hand, every place I've ever worked has had a room set aside named the cemetery, where old PCs are left to rot.

  • I'd be fascinated to read about the problems the Google team has had to overcome in managing their thousands of boxes -- perhaps an Ask Slashdot article?
  • If your server has time left over to get good SETI@Home stats, then you probably spent too much money on it. ;)
    I guess there's something to be said for "headroom."
  • (In practice, the ratio would probably be closer to 12-15 Intel boxes per Sun 6500, I would guess, as a PIII doing this kind of integer work would likely outperform a SPARC II)

    There is more to a computer than the CPU. I have never needed to test or specify a system for a specific use, but I thought that Suns had better memory-system bandwidth, which seems to me more likely to be the bottleneck here than computation speed.

    For the price, x86 is probably better, though I sure hope they selected solid, quality components; I would hate to be on a crew trying to maintain 4000 computers. Other things to consider are power use and the need for climate control, or at minimum A/C. So, in your support, there are many factors that need to be taken into account when considering these things; maybe x86 won out in a serious shootout.
  • I think RAIS is the term you're looking for -- "Servers" -- and moderation at "Funny"? Heck, it's the way of the future. The best thing about WebObjects [apple.com] is you get RAIS for free with your application. I wondered for a while why they don't make redundant Macs, before I realized they weren't necessary.
  • Nothing more exciting than seeing rows and rows of rack-mounted PCs.
  • by luge ( 4808 )
    Actually, computing the Kevin Bacon problem from IMDB info is not that computing intensive. Building a graph is quite straightforward, and traversing it (at least for specified names, as opposed to all names, which I guess might take a while) is also reasonably straightforward. Check out the Oracle of Bacon [virginia.edu]- when a friend of mine was at UVa, they did the conversion to a graph and set up this web interface to it. In particular, you might be interested to note the Bacon Numbers [virginia.edu], which indicate that of the 390,027 actors who can be linked to Kevin Bacon, 390,023 can be linked to in 7 steps or less. The other 4 can be linked to in eight steps. I believe that this was done on a single computer. Pretty sure it wasn't a cluster of 4,000 ;) ~luge(ahh, the continuing quest to confuse the moderators... is this OT or "interesting?" Only time will tell :)
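
    For the curious, here is a minimal sketch of the graph-and-traversal idea in Python (toy data and invented names; it has nothing to do with the Oracle's actual code):

        from collections import deque

        # Toy cast lists; the real thing would be built from the IMDB data dumps.
        movies = {
            "Apollo 13": ["Kevin Bacon", "Tom Hanks"],
            "Forrest Gump": ["Tom Hanks", "Sally Field"],
        }

        # Actor -> co-star adjacency list.
        graph = {}
        for cast in movies.values():
            for actor in cast:
                graph.setdefault(actor, set()).update(c for c in cast if c != actor)

        def bacon_number(actor, source="Kevin Bacon"):
            """Breadth-first search gives the shortest link count in O(V + E)."""
            seen, queue = {source: 0}, deque([source])
            while queue:
                cur = queue.popleft()
                if cur == actor:
                    return seen[cur]
                for nxt in graph.get(cur, ()):
                    if nxt not in seen:
                        seen[nxt] = seen[cur] + 1
                        queue.append(nxt)
            return None  # not linked at all

        print(bacon_number("Sally Field"))  # -> 2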
  • by luge ( 4808 )
    Actually, I was looking through some articles linked to from their site, and the original claim (from the inventors of the game, I think) was that no actor has a Bacon number > 4. The UVa guys a) disproved it for the set of all actors (as I mentioned) and b) actually proved it for American actors -- i.e., everyone who has a Bacon number > 4 is from IMDB's big foreign film section. BTW, there are actually actors who satisfy the seven-degree rule -- Christopher Lee [virginia.edu] and Anthony Quinn [virginia.edu], for example.
    ~luge(I'm asking the friend who owns oracleofbacon.org what kind of hardware they used... no answer yet)
  • BTW, if this post goes through, it means i've managed to moderate and post to the same thread...whoops.

    Offtopic, but that is possible. When you post to an article where you have moderated posts, your moderations are canceled. You won't get any points back, though.

  • Looking at information on the web about the new Compaq Wildfire series, surely they would be a strong contender.

    You can look at the specs at:

    Benchmark performance of GS320 [compaq.com]

    Which says Sun's E10000 (64 processors) is not as fast as the 32-processor GS320....

    And the price of the GS320 is estimated at around $600,000, from:

    The Register [theregister.co.uk]

    So one would have thought this system to be a real contender!!! Considering the PCs are $1000 each, if we use 6000 of them we would be able to afford 10 of these GS320 beasts (with a total of 320 Alpha processors)...

  • I like Google, but here is an even better search engine: http://www.hotbot.com/text/ [hotbot.com]. No images, except for the banner on the search results page. Nothing extra, but a ton of options.

    --

  • A large number of machines in a cluster does not directly imply that the wiring and maintenance are going to be a mess. Perhaps if _you_ did it, but not if I did. I've worked on several large installations and there are right ways to do this stuff. It's not brain surgery.

    There are very clean, and tested, methods to install large cabling installations, to handle large power requirements, etc... Certainly, the setup is complex, but that's the biz.

    Think telco...go check out one of UUnet, Globalcenter, or Exodus' datacenters when you have a chance.

    -Buffy
  • Read the original Google paper [scu.edu.au]. It includes some description of Google's architecture.
  • I am not really sure why the leading distribution company out there really needed this publicity.

    First, I thought, as another user did, that it was obvious that Google was using Linux. Also, the whole clustering capability has been known for a while.

    Do we need every site to have a THIS SITE RUNS ON... statement at the bottom? Come on, we all have our reasons for running Linux. It just seems like fluff, or worse, geek bragging.

    I can hear the slashdotters now:

    "You can't be a real geek site -- you run Red Hat and not Debian!"

    "Well, little boy, you aren't a real man till you have gotten Slackware working on a 486 33MHz machine with a bad BIOS."

    "You are all full of crap because I run BSD and it has REAL security."

    Yeah, yeah, yeah. Whatever. They can run a cluster and that is really neat, and I'd love to Quake from their server, and I bet my projects would compile really quickly, and wouldn't it be neat if... I think you know the rest.
  • Actually, many of the larger Sun systems now have the ability to 'partition' the processor boards in the system into separate 'virtual machines', so that if one goes down the rest of the system can keep running while you replace that one processor board and then bring it up.

    Used to have a link but lost it...

  • God forbid that Sun should have competent people getting in the way of product shipping dates...
  • Evil fake Bruce (.) strikes again.
  • Of course there are several manufacturers of x86 hardware, so a comparison between a high-quality Sun box and a low-end x86 box is meaningless. How about a comparison of an x86 server with much higher quality components, like a VA or Compaq, against a Sun? Apples to apples.

    Administration doesn't have to be too difficult either; there are several tools to help in managing large numbers of UNIX-type systems, like PIKT and rdist, that can replicate files and configurations throughout a mass of machines.

    Maybe it would be more efficient to have a few very large boxes (E10K, S80, S/390, SGI O2K), but I don't think they started with megabucks to burn; this gives them a cheap, scalable paradigm (solution, I meant solution!) that seems to work for them.

  • Who is "The Islamic Faith", and can I hire him as a wedding singer? This statement makes about as much sense as yours, the Islamic faith isn't just one person, it would be like stating that all Americans are Timothy McVeigh.
  • Does anyone know the specs of the machines that were used in the cluster?
  • How about creating a low powered Tiny PC cluster for handling all the pages?
  • Gee, man, thanks for the support! :)
  • Unfortunately, you can't do it immediately when you go to Google. Type in your search, let it return the first set of results, then change the drop-down box that says "10 results" to "30 results" or "100 results" and re-run your search. It's nice when you're running an obscure search; you can just scan 100 possible hits quickly.

    Even if the actual search is slower than other engines, it's a user interface design that makes the overall searching much quicker. The only thing that I would like to see changed on Google is to be able to display 100 or more results from the front page. Then again, that would take away from the "streamlined interface" that I just got done praising, so I'll just shut up now.

    -sk

  • In addition to the streamlined, efficient page design, I use Google for two reasons:
    1. 100 results per page; most other engines only do 20, and/or changing the number of results per page is hard to find/do.
    2. Cached results. When I was looking for paintings by an artist, Google found several hits on past eBay auctions. The auctions were no longer on eBay, so I tried Google's cached page, and found pictures of stunning paintings.
    By not overloading my bandwidth with crap ads and layout, letting me see tons of results on a page, and getting me information that has been removed from the web, Google has built tremendous user loyalty. Other web companies might want to note how Google has become so popular and built such loyalty. They're doing it right.
  • Anyone know what database is used by either Google or Fast? Did they stay open source (Postgres), or is the database commercial (Oracle, Sybase...)?
  • There's no indication in those that Google is using Beowulf technology. Beowulf is only one way to cluster, although the others don't generally have such an enchanting name.
  • You're requesting bloated replies?

    Any OS can get bloated; it's just a matter of what you consider excessive software. There's no question that the 300 little tools that come with Unix systems are useful for scripting, but to someone who only wants to run a web browser they're bloat. However, Unix is modular enough that you can run without many pieces -- it's the monolithic systems where bloat becomes really painful.

  • If you read the instructions, Google tells you [google.com] that you can put phrases in quotation marks. Or maybe you should use Metacrawler [metacrawler.com], as it has a "phrase" button.
  • Maintenance of a large number of machines comes down to managing differences between them. If they're the same, then handling 200 is no worse than handling 2000. E10000's are far more tricky beasts than a simple Linux box, especially if you're wanting to do domaining (the only reason you would choose an E10000 over an E6500).
  • What does this article really offer that's new? It's been known right from the beginning that they run a Linux cluster. See here: [slashdot.org]

    I find it more interesting that in fact they use Python [deja.com].

  • I love perensdot. (For those of you who missed it, the parent article was authored by one "Bruce Perens." which doesn't == "Bruce Perens".) I will be sad when this account disappears into the murk of "posts start at score -2".

    I think Google is making an enormous mistake by using Linux as their OS-of-choice. Something more robust, such as Java or even Python,

    Sure, Java already does many OS-like things. Python doesn't, though. Strangely enough, Google is widely known for using Python already....

  • They do no advertising on the site,

    Sure they do. Try searching for "linux server" [google.com] and you get back a text-only ad for DigitalNation at the top.

  • FWIW, moderators, this guy is a fake. I'm not sure if it's someone trying to impersonate and defame a real individual, or if it's just someone trying to see how misinformation can get moderated up if it's written in a certain fashion, but either way, there's a lesson here: don't mod something "informative" because of its tone.

    --
    Michael Sims-michael at slashdot.org
  • I have been a big fan of Google since they first came out and have been consistently impressed, not only for the quality of results but for their clean interface. I hope they keep it that way.

    My favorite searches: out of curiosity, I typed in Onion, and the top site was "The Onion -- America's finest news source". My all-time favorite happened the other day when I was searching for information on the statistics program "R". I typed in R, and the R FAQ was number 4 on the list! Rock on, guys!

    "He looks like he got in a fight with the 70's and got his ass kicked"
    -Sherman Alexie
  • Is it the first stage in a 12-step recovery plan for VBlusers?

    Molly.
  • Was this article posted in an effort to see the /. effect on the cluster?

    "Well, it just got posted to /. ten seconds ago. There! Look at that little bump on CPU utilization. Wow!"

    My $.01

  • Or even better, use a program like cfengine (http://www.iu.hioslo.no/cfengine/ [hioslo.no]) and automate all your sysadmin tasks. We use a similar system at work written by our developers, and maintaining our 150+ servers spread out in 3 countries is *easy*. Need to upgrade Apache? Do it in one place, and it'll be distributed out during the night or on demand. Need to apply a patch of some kind? Yet again, it's done in one place and pushed out to the servers.

    There's no end to the possibilities!
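
    A stripped-down illustration of the idea (a hypothetical push script with made-up host names and package paths, not cfengine itself and not what we actually run):

        import subprocess

        # Hypothetical inventory and package; a real setup would pull these from a central config.
        HOSTS = ["web%02d.example.com" % n for n in range(1, 151)]
        PACKAGE = "/srv/dist/apache-1.3.12-2.i386.rpm"

        for host in HOSTS:
            # Push the package out, then apply it; a failed host is just reported for follow-up.
            if subprocess.call(["scp", PACKAGE, host + ":/tmp/"]) != 0:
                print(host, "copy failed")
                continue
            if subprocess.call(["ssh", host, "rpm -Uvh /tmp/" + PACKAGE.split("/")[-1]]) != 0:
                print(host, "upgrade failed")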
  • Ugh, it is more than that. You are obviously talking about what you read in the paper; why don't you go talk to the Google guys directly? A lot more than the web crawlers and the servers that feed the crawlers is implemented in Python. ...And when you do talk to them, don't ask how many modules are implemented in Python, but rather how big a role Python plays in the work.

  • I was agreeing with you till you mentioned "Arab terrorists". I would have agreed with "terrorists", but your having to add "Arab" there is very wrong. I am not Arab by any means, but I would like those with moderator points to flag you down. We need to protect people from your likes, not just the NSA -- people who stereotype and make very harmful statements.

  • 2.Cached results. When I was looking for paintings by an artist, Google found several hits on past eBay auctions. The auctions were no longer on eBay, so I tried Google's cached page, and found pictures of stunning paintings.

    Though this is rather beside the tech specs of Google, I wonder how they can cache websites and still not have been sued yet.
    I remember that in the Slashdot FAQ there is a remark about caching websites to circumvent the Slashdot effect. Among other (more technical) reasons there was also a point about possible copyright infringement.

  • Hey, that was interesting. :)

    But has anyone considered that those 6000 Linux boxes might actually be VMs on a big S/390?? :-p

    Just a thought.

    Ok, I'm starting my own cluster now!! Let me see, a 386/4M, two 486/16M, a Pentium 75/32M, a Pentium II 400/128M and one AMD 750/256M. Sure I can beat them. I have my own cluster!! What can I crunch now? I will test SETI@Home. Maybe I can find an alien life form sending email to Bill Gates:
    BORG: Resistance is futile!
    BG: Hey! That's my MOTTO! I'm gonna sue you!

    Regards
  • As far as I know, techweb has a (I think very nice) script running that puts links on all technical terms in papers anyway.

    Nothing wrong with that, I'd say. If you know it, fine; if you don't and accidentally end up on one of their pages, it's only a help. It's not like you have to actively do anything with them.

    Mad.
  • When you buy a Sun the damned thing just doesn't fall down unless you have a system mangler who keeps dicking around with it. And if a single Sun could not address the problem, then maybe it's time to buy some real iron, like a maxed out S/390. When you have a terabyte of data to process, you have to start paying a little more attention to things like I/O.

    I can assure you, as one with experience, that Suns most certainly *DO* fall over. The E10K is a nice box with quite a few redundant features. The ability to remove a system board on the fly ranks pretty high up there. Assuming you lose a processor or memory, and assuming (this is a big one) that your system doesn't fall over immediately, you can most likely replace that processor hot. However... it's been my experience that failing processors or memory bring the box (or domain) down more often than not. And let me assure you, Sun processors *DO* fail periodically. In a large shop, you can expect to replace at least one or two processors per year.
  • Not to start a flame war over languages, but I'd imagine that they just like Python better for large projects. Personally, I think Perl has many cool features, but it certainly can be difficult for some people (such as myself) to make it scale to large projects. Other people seem to have no such problem, so I'm sure that it is just a matter of personal taste.
  • I won't be happy until they have
    10^100 nodes...
  • With x86 you're only going to get about 4 procs per board, though. So with a four-node cluster you get 16 procs max for the cluster. Higher-end Suns can handle 64 procs per system, and with 2 or 4 nodes of that... That's without getting into the whole pre-2.4 kernel issue of scaling beyond 4 procs per system. Sure, if you've got a rinky-dink operation, use those. I'm talking about a REAL cluster, not something out of the closet of your house. I'm talking about if you have the need for a small footprint and still need 100% uptime, then do RISC. Besides, who in their right mind would make a cluster that's so small? And who load-balances and MPPs such a small cluster?

    Nonetheless, I see your point. I just wasn't referring to such a small setup...

  • And you have to pay per installation of Solaris, and per support query. You can't have a team of qualified Solaris people ready to nail down any bizarre bug in the OS. You can with Linux. Also, it is cheaper to replace PCs than it is to replace SPARC servers. The electricity cost will be high, but SPARC servers aren't exactly environmentally conscious either, what with their multiple PSUs, processors, etc. So was the initial investment, but the rest of the costs are negligible and it works. The Solaris server we had at uni was constantly needing reboots, and various tools had to be removed as they just didn't work properly.
  • Did you read what was on that page??
    http://www7.scu.edu.au/programme/fullpapers/1921/com1921.htm

    Go to Section 4.1 and read it! Here is what you missed:
    "Most of Google is implemented in C or C++ for efficiency and can run in either Solaris or Linux."

    Only the web crawlers are implemented in Python. (See Section 4.3.)
  • And I'm sorry, but PCs don't cost $1000 apiece, ESPECIALLY when a large company buys them in quantity. Add the fact that they have processors, not full machines (no CD, monitor, etc...).

    Google makes a nice search engine.

    good for them, and RedHat.

    Fook
  • You're a little late on number 6. 1+1 has been shown to be equal to 2, but the proof took 211 pages in Principia Mathematica [stanford.edu]. You can also find the proof for 2+2=4 here [shore.net]. 2+2=4 is obviously related to 1+1=2, with a few extra steps.


    ...phil
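
    (For what it's worth, in a modern proof assistant the statement is a one-liner, since both sides reduce to the same numeral -- e.g. in Lean:)

        -- 1 + 1 = 2 holds by computation once the numerals are unfolded.
        example : 1 + 1 = 2 := rfl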
  • Um, why should machines making up a search engine backend need access to Real Audio or Napster? Are you suggesting that the cluster is also used as people's workstations?

    --

  • I doubt if they would "patch it back." In case you haven't noticed, most companies haven't given a rat's ass about impressing the OSS crowd until very recently. Something tells me they really are running FreeBSD.

    You could still be right, though... this only means they are running their frontend web servers on BSD. As to what powers their database is anyone's guess, since you'll never see that server from the outside world if they have half a brain. So, they could be running NT for their backend. This would be the totally wrong thing to do, though. Usually companies use NT to serve the HTML, because it has better applications for interactivity available than Unix, and a *nix for the real meaty, hardcore database queries, etc. I believe this is what eBay does, but I could be mistaken.

    --
  • How long did it take you to make this up?



    This presented some unique problems, tho. Using 300 nodes meant that, potentially, you could have 300 connections to EACH CLIENT. We needed to make a transparent single point of entry and use a 10.X.X.X -> legal NAT translation. The problem with that, of course, is that NAT often breaks apps like Real Audio or Napster or anything that embeds source/destination within the packet to be routed through the routing level of the requestor.


    Using NAT as a front-end to a server farm returning straight HTML documents won't cause any problems.


    Go away!

  • I forwarded a copy of the above post to Rob Malda, and he sent me a concise reply describing his view of how it's supposed to work. I think it's worthwhile to share his insights with the whole crew. With his permission, here's what he had to say:

    Date: Wed, 31 May 2000 13:25:01 -0500 (EST)
    From: Rob Malda
    To: Joe Zbiciak
    Subject: Re: Moderator collision

    I don't think it's a problem. I think moderators should moderate without even seeing the score of the comments they are moderating!

    A + means someone thought it was valid. Score:2 means 2 people. Score:3 means 3 people.

    It's not an absolute 'This comment is Score:2'; it's more like '2 people thought it was a valid comment'.

    So there you have it. Of course, that does raise the question of why we have the Overrated and Underrated moderation categories, but otherwise, I think I see his point.

    --Joe
    --
  • At this time, it appears to have been rated back down to a 3. I think what happens is that moderators scan / read through posts, selecting particular posts to be moderated up or down. When they finally get to the end of the page, they click [Moderate]. When several moderators are actively viewing a story, you end up with multiple moderations pending for the same article. So, what should've received a +1 might get +2 or more if multiple moderators agreed that it deserved +1.

    The problem is that the moderators don't get to see the other moderations being performed in parallel to their own moderation. Perhaps there's a solution. Slashdot could ask for confirmation in cases of "moderator collision."

    For example, consider the following sequence of events:

    • Moderator A views comments
    • Moderator B views comments
    • Moderator A selects post #39 and post #42 to be moderated up.
    • Meanwhile, Moderator B selects post #42 and post #69 to be moderated up.
    • Moderator A clicks [Moderate], and both moderations are applied.
    • Moderator B clicks [Moderate]. What happens?

    Currently, Slashdot will apply both moderations immediately. This results in article #42 receiving +2, when it may only deserve +1. It's neither Moderator's fault -- they've moderated past each other. Alternately, I propose that Slashdot, in this case, only apply the unique moderation immediately, and then ask for confirmation on Moderator B's moderation of #42. This is because Moderator B had no way of knowing that Moderator A moderated #42 up while he was still reading the posts. Let's assume all moderations are applied, and continue the example:

    • Moderator C now views the comments page, and sees all of Moderator A and Moderator B's moderations.
    • Moderator C selects #69 to be moderated up.
    • Moderator C now clicks [Moderate]. What happens?

    At this point, Slashdot will apply the moderation. Under my proposal, this would not change, as Moderator C did already see that #69 was moderated up before he selected it for moderation.

    What I'm guessing would be necessary is an additional bit of state which says "This was the score that the post was viewed with at the time the Moderator selected it for moderation." If the article's current score is different than the score it was viewed with, ask for confirmation that the moderation be applied for that specific moderation. A series of radio buttons could be displayed for the affected articles: "Apply Moderation? [_] Yes [X] No".

    Thoughts?

    --Joe
    --
  • "PCs will work OK in any heat and humidity that people will"

    I pictured myself framing a house in the Texas summer heat, and repairing a barbed wire fence in a snowstorm.
  • " the guy who kicked your fucken ass "

    If I had moderator points, I'd deal you down accordingly. Since I don't, I'll mention this:

    I think that "fucken" is becoming a word. I'm glad it is, because it rhymes with "Turducken" [gumbopages.com]. I also think it would work in a subjunctive mood usage context.
  • This would probably be quite efficient - it's really the same as 'optimistic concurrency control', in which you read a last-changed timestamp for every object/record just before you do the update, and flag a concurrency issue to the user if this timestamp changed since you read that object/record.

    The overhead is an extra piece of state for each article - but since the score for each article is already in the web page, the only real impact is on the CGI script that does the update.
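
    A rough sketch of what that check could look like on the update path (invented field names, not Slashdot's actual schema):

        class ScoreStore:
            """Tiny in-memory stand-in for the comment-score table."""
            def __init__(self):
                self.scores = {}

            def apply_moderation(self, comment_id, delta, score_seen):
                """Apply delta only if the score is still what the moderator saw."""
                current = self.scores.get(comment_id, 1)
                if current != score_seen:
                    # Moderator collision: ask for confirmation instead of silently stacking.
                    return ("confirm?", current)
                self.scores[comment_id] = current + delta
                return ("applied", current + delta)

        store = ScoreStore()
        print(store.apply_moderation(42, +1, score_seen=1))  # ('applied', 2)
        print(store.apply_moderation(42, +1, score_seen=1))  # ('confirm?', 2) -- collision detected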
  • For just a 2 or 4 node cluster, you buy a high-quality PC from VA or some other reputable shop that supports Linux well. Once things start to grow, you use those for database, load-balancing monitors and things like that, and you grab el cheapo clones for the gruntwork of running httpds.
  • I knew there had to be some heavy-duty equipment back there -- nothing else but a 4,000 node Beowulf cluster could power the awesome "Mentalplex" search engine. It's unfortunate that the search also requires the combined mental powers of 4,000 users, which might be why I can't seem to get the Mentalplex to find anything but pr0n and mp3s. :)

    "Must... Concentrate! .....oooh, swirly..."
  • Although I use Google mostly, it is not the fastest engine around; that has got to be Fast (http://www.alltheweb.com)!

    It is so damn fast, that it just keeps amazing me.

    If you haven't tried it - you should!

    Here is an example of a search for "linux":

    "3810249 documents found - 0.0051 seconds search time".
  • I would like to put one of these in my basement and finally disprove the "7 steps to Kevin Bacon" theory everyone seems to buy into.
    It's usually 6 steps to Kevin Bacon, and it's an NP-complete problem. If you do find a way to solve it in polynomial time, please share your algorithm. You'll probably get a Nobel prize.
    --Shoeboy
    (former microserf)
  • There are ways to reduce the impact of the clustering, but it will never be better than a parallel computer.

    That's complete bunk. Whether a centralized multiprocessor machine or a massively-parallel distributed cluster would be faster depends completely on the task at hand. Specifically: How parallel is the task?

    If the task can be broken up into many completely self-contained pieces, then a cluster will generally win. You can buy lots of low-end hardware cheaper than you can buy even very good high-end hardware.

    If the task contains contention points or data access is very random, then you're better off with a single multiprocessor machine. An example of a contention point would be the locks in a database. An example of random data access would be logins to Slashdot.

    Finally, it is worth pointing out that, after a certain point, most large machines have to move to a NUMA design, at which point you start to resemble a massively parallel cluster anyway.
  • This is definitely the easiest way in terms of coding, and because the moderator only intended to boost the post by one (or drop it by one) it is likely to be the most accurate.

    The problem is that if the moderator takes a long time to read the post, and two people moderate it up from a starting point of 1 to 3, and this moderator had selected to mod it down to 0, the CGI needs to be smart enough to use that as a relative -1, instead of moving the post to an absolute score of zero.

    Otherwise someone could start reading immediately, mark one of Signal11's posts (for an example of someone with a +5 in nearly every thread) as a -1 (to 1) comment, then wait till he's been modded to 5 in the initial rush (by viewing the thread from a non-logged-in browser) and then submit, effectively making their -1 worth -4....

    But, otherwise, your method seems the easiest and the least error prone.

    The only problem is that without overlapping simultaneous moderation, the scores likely wouldn't be so high anymore, so people browsing at +4 and +5 would see fewer messages... But I always browse at 0 anyway, so it wouldn't bother me.
  • Take a look at your user history. All your posts eventually get looked at by moderators not smoking crack and get modded down to 0 or -1. At best you are entertaining yourself for a few minutes with a single temporarily, high-modded post at a time.

    Others might respect your trolling, but the only thing that matters in the end is high-karma--and you ain't got it.

    BTW, don't bother responding with a "what are you talking about, I'm not a troll" response: I don't intend to read it.
    --
    Have Exchange users? Want to run Linux? Can't afford OpenMail?
    • But damn, that takes a staff of 200 people to manage the security/connectivity/accounts/space and other duties just for the cluster.
      That would mean 1 person per 200 nodes. Does not sound that much to me.
    • The Power bill has to be outrageous!
      Are you sure that a few Suns would have much less power consumption per MIPS?
    • The Cabling/switching/routing mess has to be totally unmanageable
      I do not think it has to be a mess. After all, most of the boxes will have an identical setup. Just connect 16 of them to a switch and then interconnect those in groups again, etc. I guess it can be (and probably is) done in a clean and structured way.
    • What happens when you hit a bug in the hardware or have to patch the system or replace a kernel because of a hack that came about? It is costly and hellish to work on 4-6,000 PCs
      First, I do not think that most of the boxes are directly connected to the net; most of them probably are backend search engines that deliver their results to frontend machines. Furthermore, if the setup is done smart (and I assume it is), then you would, e.g., boot identical boxes via ethernet and NFS-root, and then the thing downloads the latest software from a central server, etc. So changing software on all nodes would not be much work -- only change it on a single machine...
    • I would have thought it to be wiser to set up Sun E10000's or something like that.. having 4 32-proc E10000's in a cluster is a hell of a lot easier to manage and cheaper.
      Only if 4 of these are enough, and I do not think that 4 are enough. Someone above wrote that one probably needs more than one SPARC CPU to replace one high-end Pentium, but even if it is 1:1 then you need 125 of your 32-proc boxes, with lots of floor space and cooling as well. Maybe less maintenance, but more expensive.
  • Still an Open Source victory!

    Alltheweb.com is running "Apache/1.3.6 (Unix) PHP/3.0.11" on FreeBSD...

    When I first saw the "powered by Dell Poweredge" sticker on their page, I briefly worried that it was going to be an NT site. Nope!
  • It's all well and good that there's PR out there about this.

    As someone who is building a large portal with Redhat, it'd be nice to have some kind of technical reference as to how they've built it. What are they using to handle the clustering? Are they using the Piranha stuff that comes with Redhat 6.2, or are they using hardware, or maybe something they've written themselves? Are they using sessions, and if so how are they handling them?

    Are any parts of the cluster sharing processing power, or are they all just individual boxes clustered to appear as one?

    I think it's great that they're getting press, I'm just hoping that one of these days there will be something published on how it all went down.
  • No, I wasn't trying to say that Google makes a good poster boy for open source, but it is a great example of large organizations embracing the fruits of open-source labor. Linux has gone through a lot of media exposure in recent months due to its current 'fashionable' status; what is actually needed to maintain Linux's spot in the media world is examples of Linux doing real-world jobs. Large companies like Google making public statements that they use a massive Linux installation to solve a problem because it's the best tool for the job are not going to hurt.

    I would be interested to hear more about "The troubles that go on at Google behind the scenes are bound to become public knowledge very very soon." Without further information I'd like to think that Linux would not get a 'black eye' over any problems within Google, but you seem to know more about this than me.
  • Only 2 of the modules for the entire system are implemented in Python, specifically the web crawlers and the server that feeds the crawlers URLs. The rest of the system is implemented in C or C++.
  • Heh, it could still be an NT/IIS cluster, but patched to report back as FreeBSD/Apache to make it more respectable.

    BTW, if this post goes through, it means I've managed to moderate and post to the same thread... whoops.
  • >a googleplex is 10^google

    No it's not. A google is the verb form of googly (a cricket term) - an off-breaking ball with an apparent leg-break action on the part of a right-arm bowler to a right-handed batsman, or conversely for a left-arm bowler.

    A googol is 1 followed by a hundred zeros, 10^100.

    A googolplex is 1 followed by a googol of zeros, 10^googol.

  • raging.com

    (seems to come up with slightly diff hits than Altavista itself, but works plenty good for me!)

  • Yes - in fact they claim an interesting demographic to potential advertisers:

    Google advertisers will benefit from marketing to a web audience with these distinct demographics:

    Male (65%), female (35%)
    High education (65% have at least a BA/BS)
    Professional (73%)
    High income (average income is $71,000)
    Highly technical (71% report high/very high computer skills)
    Online experience of 4+ years (58%)
    Accessing the Internet from work (48%)
    Using the web for work purposes (31%)
  • a Beowul....Oh, wait a minute, never mind ;)
  • He's a bigger prick than Stallman (who notoriously used to stink out the MIT Law Library when consulting Lessig on the GPL). My firm did some speculative litigation work for a bunch of college kids from Tennessee who reckoned he'd ripped off their "Polygon Management Architecture", back in the days when J-J-J-J-Julius Systems (BTW, that's four J's, not three; the prick has a typo in his .sig) was marketing its engine without designing any games. He would not settle, choosing instead to nearly bankrupt these college students by forcing us to take him on a ludicrously expensive round of litigation, which we lost on a technicality at huge expense to our clients. A bigger asshole there isn't.

    John Saul Montoya (Yeah, Wosten, that Johnny Montoya, the guy who kicked your fucken ass over the KKW second-stage funding. Don't fuck with Wall Street).

  • No, you got the details wrong! Google works so well because it is a 4000 nerd cluster. Yes, they are each sitting at a linux box, but the powerful searching comes from the fact that 4000 nerds typing aimlessly (but furiously) can produce results that are easily superior to what Altavista can turn up.

    BTW, have you looked at the http://www.hotsheet.com/ [hotsheet.com] portal? It's a portal, yeah, but it's really "clean" looking and has a ton of useful links. That's why they host my email. (no, I'm not affiliated)

    ----

  • by grinder ( 825 ) on Wednesday May 31, 2000 @04:06AM (#1036776) Homepage
    But it is a cluster of 4000 PCs which means if one goes down the whole system keeps working. If you have one big Sun and it goes down you have no redundancy and no backup. Reliability and up time for websites is make or break.

    Did you say that with a straight face?

    Assuming you depreciate a machine over three years (and that's really stretching things in the Real World), you're replacing a machine just over every six and a half hours. Plus, all the effort gets skewed down toward the end of the three years. It would almost be economical to throw the door key away and start afresh.

    When you buy a Sun the damned thing just doesn't fall down unless you have a system mangler who keeps dicking around with it. And if a single Sun could not address the problem, then maybe it's time to buy some real iron, like a maxed out S/390. When you have a terabyte of data to process, you have to start paying a little more attention to things like I/O.

    4000 PCs cannot be a viable economic replacement. That amount of hardware would require as highly specialised an environment as that of a mainframe (cooling and electricity), and certainly much more real estate. And they have really shitty I/O. If Google has money and space to piss away, well, good for them, but it's hardly a wise business practice that anyone over 30 would recommend.

    If you want to play with Linux, by all means invent some statistics that show that your MIPS/$ is better than the competition. Statistics can say anything you want them to. I, however, would like to know how they derived such figures. Ignorant readers of the article might otherwise be misled into pursuing foolish choices in computing platforms.

    Oh, and BTW, your regex is suboptimal, the split is entirely redundant and you shouldn't use double-quoted strings in Perl if you're not interpolating anything.

  • by ChrisRijk ( 1818 ) on Wednesday May 31, 2000 @03:16AM (#1036777)
    In this story at EETimes [eetimes.com], a guy from Sun talks about the pre-configured "server farm" solutions Sun announced yesterday.

    An interesting quote is this:

    • While it's debatable whether buying a preconfigured compute farm is cheaper than stringing together a few PCs and running Linux, Tallman said the latter scenario "would work well in university and government research centers where there is a lot of free labor, but not in a company that needs to get products out the door and can't spend time developing core competencies in compute farms."
  • by stevelinton ( 4044 ) <sal@dcs.st-and.ac.uk> on Wednesday May 31, 2000 @05:44AM (#1036778) Homepage
    I think the situation at Google is quite special. Although they have a TB of data, it is very slow-changing (once per month, so about 300KB/sec), and what they have to do with it is very (integer) CPU intensive. They remark that it distributes really well, so presumably network latency between the PCs isn't a problem, and locality of access to the data is good. Given that (see SPEC CPU2000, for instance), Intel processors on cheap motherboards really are a big win for performance per purchase price.

    This leaves the management questions. Presumably most of these PCs are configured exactly identically, apart from the ethernet card numbers, and the work is controlled by some central servers (for which big Suns might well be appropriate). So, if I were setting this up, here is how I would handle hardware failures:

    1. A PC blows up.
    2. The central server notices some timeout on a parcel of work or a heartbeat, and takes that node out of the active list.
    3. The central server (or another one specialized for the job) makes a more intensive effort to sort out the problem. If it can get in, it can probably trigger a reboot, or even a re-install, remotely.
    4. If it can't get in at all, then human assistance is needed. Add a task "reset node 1234" to the next hourly jobs printout for the operator.
    5. On the next pass through that part of the warehouse, the operator hits reset. The node tries to reboot, goes through health tests, possibly does an auto reinstall.
    6. If no life, then add it to the daily list for the operator with the electric handcart to pull and replace, and send it in the daily shipment to the supplier.

    I don't know for sure that this is how they do it, but it's how I would do it. Failure is a nuisance when it happens every few weeks. If it happens every few hours, then you can make it routine and pain-free. In a cluster of 4000 identical machines, hardware failures are part of life.
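
    Steps 2 and 3 are easy to picture in code -- something along these lines (a sketch with made-up timeouts; I have no idea what Google actually runs):

        import time

        HEARTBEAT_TIMEOUT = 120   # seconds of silence before a node is suspect (made-up value)

        last_seen = {}            # node id -> time of last heartbeat
        active = set()            # nodes currently eligible for work
        operator_queue = []       # "reset node 1234" style tasks for the hourly printout

        def record_heartbeat(node, now=None):
            now = time.time() if now is None else now
            last_seen[node] = now
            active.add(node)

        def sweep(now=None, try_remote_reboot=lambda node: False):
            """Drop silent nodes from the active list; escalate if a remote reboot fails."""
            now = time.time() if now is None else now
            for node in list(active):
                if now - last_seen.get(node, 0) > HEARTBEAT_TIMEOUT:
                    active.discard(node)                               # step 2: stop handing it work
                    if not try_remote_reboot(node):                    # step 3: try to recover remotely
                        operator_queue.append("reset node %s" % node)  # step 4: human assistance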

    You mention other things: power -- a bare PC (processor, mobo and hard drive) draws about 90W, so the whole cluster is about 360KW. This is a lot of power to get in, and heat to get out, but well within the normal range of, for instance, small factories, and the people who supply kit for that should be able to cope easily. PCs will work OK in any heat and humidity that people will, so ordinary office-grade air-conditioning will be fine.

    So, in their very unusual circumstances, this probably is the right call for Google. They can routinize hardware failures to the point where they just cause a statistically predictable amount of work that must be budgeted for. The central servers that control all this, store the TB database, etc., are another story. There, the more conventional rules apply, and I would bet that those are normal server hardware -- Sun, IBM or high-end Intel servers.
  • by Sun Tzu ( 41522 ) on Wednesday May 31, 2000 @03:16AM (#1036779) Homepage Journal
    The Sun solution would be much more expensive because it wouldn't be only one Sun. It would require many, many, Sun 6500's or 10000's. Since their application distributes quite nicely, the price/performance of Intel boxes running Linux would be very hard to beat.

    Try substituting Sun 6500's with 20 CPU's for each set of 20 Intel boxes and see what that does to the pricing. ;) (In practice, the ratio would probably be closer to 12-15 Intel boxes per Sun 6500, I would guess, as a PIII doing this kind of integer work would likely outperform a SPARC II)
  • by xinu ( 64069 ) on Wednesday May 31, 2000 @03:13AM (#1036780) Homepage Journal
    I'll tell yah, I'm not a fan of the PC at all, being a Solaris admin. The hardware in general sucks and is unreliable.

    But in this case I think Google is on the right track. The MIPS/$ ratio is definitely in favor of the PC. And with sooo many PCs, if one goes down it really wouldn't make a huge difference. If it were just a 2 or 4 node cluster then I would lean towards a RISC-based architecture for reliability. But in this case the cost is just too staggering to imagine a Sun cluster for this.

    Kudos to Google, my new search engine of choice! Long live Linux!

  • by LMacG ( 118321 ) on Wednesday May 31, 2000 @03:39AM (#1036781) Journal
    Google does offer phrase searches, and a few other advanced features. Just click on the Search Tips [google.com] link from the main page. I'm not sure I'd classify their implementation as "intuitive," but it's no worse than learning, say, REXX [ibm.com]. You are correct though, in that full Boolean searching is not available -- as stated on the Tips page, Google does not support the logical or operator at all.

  • by Animats ( 122034 ) on Wednesday May 31, 2000 @07:53AM (#1036782) Homepage
    Assuming you depreciate a machine over three years (and that's really stretching things in the Real World), you're replacing a machine just over every six and a half hours. Plus, all the effort gets skewed down toward the end of the three years. It would almost be economical to throw the door key away and start afresh.

    I heard the CTO of Inktomi talk on this issue. Their basic approach to cluster buying is to buy midrange PCs in units of 100. Each cluster then consists of 100 identical PCs. Clusters are replaced as a unit, never upgraded. A site may have multiple clusters of different hardware. Every few months, they do evaluations to pick the machine with the best price/performance, which is usually a machine in the middle of the pack, not a top-end machine.

  • by SuiteSisterMary ( 123932 ) <slebrunNO@SPAMgmail.com> on Wednesday May 31, 2000 @04:13AM (#1036783) Journal
    What happens when you hit a bug in the hardware or have to patch the system or replace a kernel because of a hack that came about? It is costly and hellish to work on 4-6,000 PCs
    Not with Linux. For patching and whatnot, one can easily create a single script that will do it all. Or, even better, assuming it's a closed network, make an NFS share. On each machine, put a cron job that takes anything in that directory (RPMs, generally) and applies it. You're probably on identical hardware and software, so that sort of thing works. Hell, write a daemon that monitors a port and then start broadcasting commands, and they'll all pick up on it. Lots of ways.
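
    The cron-job half of that might be no more than this (a sketch; the paths and the "applied" marker file are invented):

        import glob, os, subprocess

        SHARE = "/mnt/updates"             # NFS share where new RPMs get dropped (hypothetical path)
        APPLIED = "/var/lib/applied.list"  # local record of packages already installed

        done = set(open(APPLIED).read().split()) if os.path.exists(APPLIED) else set()

        for rpm in sorted(glob.glob(SHARE + "/*.rpm")):
            name = os.path.basename(rpm)
            if name in done:
                continue
            # -U upgrades or installs; a non-zero exit leaves the package to be retried next run.
            if subprocess.call(["rpm", "-Uvh", rpm]) == 0:
                done.add(name)

        open(APPLIED, "w").write("\n".join(sorted(done)) + "\n")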
  • by JamesSharman ( 91225 ) on Wednesday May 31, 2000 @02:57AM (#1036784)
    It's nice to see some good Linux publicity happening. Google is fast becoming the most respected search engine around; their clean and uncluttered interface is drawing people away from the more traditional search engines, where it seems you have to download more portal c$&p every day. It seems poetic that Google is becoming an ambassador for Linux by showing up their bloat-laden competitors in the search engine market, while Linux does the same in the OS market.
  • I can just see it now. A manager at Google walking over to a developer's PC, seeing this sticker [thinkgeek.com] and saying, "Why not?"

    Now all that's needed is for ThinkGeek to claim responsibility for this action. :)

  • by Gurlia ( 110988 ) on Wednesday May 31, 2000 @03:12AM (#1036786)

    Yeah, all the other popular search engines nowadays seem to be ridden with banner ads, promotions, and all kinds of useless fluff on their pages. Google is nice and simple, doesn't clutter the screen, and in general makes everything easier on the eyes. I think this is part of the attractiveness of Google -- you're not flooded with irrelevant info and pictures, but just the stuff you're looking for.

    One thing I have against Google though -- I wish they had an advanced search where you can specify to search for exact phrases, etc., or perhaps even a full boolean search. I don't know how Google works, so I can't tell if these features are left out because of design issues. But, being the "hacker's search engine" and everything, it really should support more advanced searches. If they can find a way to implement this well, it may even become a deciding factor against other search engines. (I hardly know any search engine out there that can handle full boolean search, and certainly Google's speed will be a great advantage.)


    ---
  • by jbarnett ( 127033 ) on Wednesday May 31, 2000 @03:03AM (#1036787) Homepage

    So this "super computer" will be used for Total World Domination? Oh, can we use it atleast to take over some small thrid world countries? I promise to have it back by six tonight.

    The Google crew must have some killer Seti@home stats.

    I would like to put one of these in my basement and finally disprove the "7 steps to Kevin Bacon" theory everyone seems to buy into.

  • by aozilla ( 133143 ) on Wednesday May 31, 2000 @03:16AM (#1036788) Homepage
    redundant array of inexpensive processors
  • by cybrthng ( 22291 ) on Wednesday May 31, 2000 @03:54AM (#1036789) Homepage Journal
    Well, as you are all well aware, dot-coms are going through money like nothing. Sure, it is *great* publicity to have 4,000 servers with another 2,000 coming online.

    But damn, that takes a staff of 200 people to manage the security/connectivity/accounts/space and other duties just for the cluster.

    The Power bill has to be outrageous!

    The Cabling/switching/routing mess has to be totally unmanageable

    What happens when you hit a bug in the hardware or have to patch the system or replace a kernel because of a hack that came about? It is costly and hellish to work on 4-6,000 PCs.

    I would have thought it to be wiser to set up Sun E10000's or something like that.. having 4 32-proc E10000's in a cluster is a hell of a lot easier to manage and cheaper. Sure, your upfront bill may be more, but only having to worry about 8-16 power connections (redundancy) is a lot easier than 6,000 power cords/strips/racks/floor space/cooling/maintenance.

    Sure, it is one hell of a beast to be proud of, but one helluva costly beast to work with.

    Just my 2 cents

  • by segmond ( 34052 ) on Wednesday May 31, 2000 @03:10AM (#1036790)
    Just my own 10 cents: the Google guys use Python over Perl, hrmmm, I wonder why. :D By the way, their paper is a good read: http://www7.scu.edu.au/programme/fullpapers/1921/com1921.htm

  • by heimdall ( 44846 ) on Wednesday May 31, 2000 @04:30AM (#1036791)
    I would have thought it to be wiser to set up Sun E10000's or something like that.. having 4 32-proc E10000's in a cluster is a hell of a lot easier to manage and cheaper.

    Last I checked (this was about a year or so ago) a fully loaded (64/64) E10K ran around $12M and the base (2-processor) system was running around $800,000. Even if that's off by a factor of 3 or 4, you're still talking $3-$4M apiece... at three of them, you're looking at between $12-$48M. On the other hand, the typical white-box PC will run between $800-$1500. That amounts to $3.68M-$6.9M for 4600 nodes. This doesn't include the network infrastructure or administration costs; however, as someone who has administered large clusters (the largest was an 80-node SP/2), it actually becomes easier to administer that many nodes in a cluster than it would that many servers. Keep in mind that there most certainly are groupings of nodes which are kept identical except for IP.

    Another significant expense is the hardware support cost associated with such systems. If you have 4600 nodes, it's trivial to simply keep (MANY) spare systems floating around. Also, you can disable a node with negligible impact. Even if you're subdomaining an E10K, there are (a small few) single points of failure on the platform (regardless of what Sun's documentation says). If you're not subdomaining it, you're simply talking about a 32-way SMP box (might as well just use a 6500 for that configuration). If you were to lose the backplane for whatever reason, you've lost a significant portion of your compute resources.
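
    Plugging the rough numbers above in (estimates only, not vendor quotes):

        # Back-of-the-envelope check on the white-box arithmetic.
        nodes, pc_prices = 4600, (800, 1500)
        print([nodes * p / 1e6 for p in pc_prices])  # [3.68, 6.9] -- millions of dollars for 4600 PCs
        e10k_base = 800000
        print([e10k_base // p for p in pc_prices])   # [1000, 533] -- white boxes per base-price E10K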
