The Internet Software Linux IT

Wikimedia Simplifies By Moving To Ubuntu

Posted by kdawson
from the all-eggs-one-basket dept.
David Gerard writes "Wikimedia, the organization that runs Wikipedia and associated sites, has moved its server infrastructure entirely to Ubuntu 8.04 from a hodge-podge of Ubuntu, Red Hat, and various Fedora versions. 400 servers were involved and the project has been going on for 2 years. (There's also a small amount of OpenSolaris on the backend. All open source!)"
This discussion has been archived. No new comments can be posted.

  • by ACK!! (10229) on Friday October 10, 2008 @11:54AM (#25328531) Journal

    For such a large effort, it seems wild they had so many different distros running in their environment.

    What do you guys think?

    • by Anonymous Coward on Friday October 10, 2008 @12:07PM (#25328687)

      I think that it's good to standardize on the best OS for your needs, but to find out which one is best you should first try running a bunch of them.

    • by somersault (912633) on Friday October 10, 2008 @12:22PM (#25328877) Homepage Journal

      I think it likely that Wikipedia started out as a small pet project, and just happened to grow piecemeal as they needed more and more resources as they grew in popularity. They wouldn't have been sure to start with just how popular they were going to become, how could they? Also take into account that perhaps they had been using different OSes in a consistent way (though I don't expect that to be likely), like some were just for webserving, some held a quick database of current articles, some machines held compressed archives, some were intended for virtualisation and testing out of new designs, that kind of thing?

      Anyone who has written a small well planned (or perhaps not so well planned) application but then been asked to make many, many, many changes over the years will be able to sympathise I expect. It's much easier to design a large coherent system than grow one out of a smaller system..

    • by Tango42 (662363) on Friday October 10, 2008 @12:37PM (#25329055)

      Your mistake is in thinking it's a large effort - they started with just volunteers and then had only one or two full time staff for a while with the technical stuff still being done by volunteers. The first technical person wasn't hired until August 2005, four and a half years after the launch of Wikipedia (which, by that point, was already a top 50 website according to Alexa), they only have around 5 technical staff now. It's a very small project from that point of view, it's just a hell of a lot of servers!

      • The first technical person was Brion, who'd done the job as a volunteer for quite a while before that.

        I started editing Wikipedia in early 2004. I believe they'd just made the radical jump from one box to three boxes.

        Now stuff is structured in a horizontally-expandable fashion. "Add some more Squids." "Add some more Apache servers." So a single platform is an obvious win, and picking one platform to standardise on is actually more important than which of various near-indistinguishable free Unix-like operating systems that could all do the job they pick.

        • by Tango42 (662363)

          I know, David, I'm Tango (Thomas Dalton). If you want to play timestamps, my first recorded edit was December 2002 (it was very small back then!) ;).

    • by jlarocco (851450)

      Yeah, the headline plays up Ubuntu a bit too much. Sounds like they could have simplified by moving to almost anything with consistency.

    • For such a large effort, it seems wild they had so many different distros running in their environment.

      Yeah, much better now, that all of their servers can be taken over at once through a single exploit...

  • Cos a general purpose distribution isn't exactly ideal for providing scalability, particularly when your machines pretty much all provide the same service.

    The network is the machine.

     

    • Re: (Score:2, Insightful)

      by Kuj0317 (856656)

      You are wrong there. A homogenous environment (up to a certain point) is MUCH better for scalability. Need more power? Get a new box, apply the standard customizations, throw it in the mix.

      I agree that a cookie cutter approach like this does not yield the greatest performance per box, but it does allow for a better performance/administration ratio.

      • Re:How many admins? (Score:5, Informative)

        by brion (1316) on Friday October 10, 2008 @01:10PM (#25329519) Homepage

        Mass installation of a customized distro can do better than mass installation of a general distro (eg, the kernel and software can be optimized for your use case).

        And indeed, we use a slightly customized Ubuntu, in that we have our own patched versions of some packages (PHP, Squid, MySQL, some custom PHP extensions, etc) tweaked for performance or features we need, plus custom meta-packages to install the configurations we require on different server sub-types.

        This is pretty easy to do on any distro with a decent package manager. I still like apt better than yum, though!
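A minimal sketch of the "custom meta-packages per server sub-type" idea described above. It renders the control stanza for an empty package whose only job is to depend on everything a given role needs. The `example-role-` prefix, the maintainer address, and the role-to-package mapping are all made up for illustration; nothing here is Wikimedia's actual package layout.

```python
# Hypothetical role -> package mapping; the names are illustrative only.
ROLES = {
    "apache": ["apache2", "php5"],
    "squid": ["squid"],
}


def render_control(role, depends, version="1.0"):
    """Return a minimal Debian control stanza for a role meta-package."""
    if not depends:
        raise ValueError("a meta-package needs at least one dependency")
    fields = [
        ("Package", f"example-role-{role}"),  # hypothetical naming scheme
        ("Version", version),
        ("Architecture", "all"),  # no compiled code, so not arch-specific
        ("Maintainer", "Ops Team <ops@example.org>"),  # placeholder
        ("Depends", ", ".join(depends)),
        ("Description", f"meta-package pulling in what a {role} server needs"),
    ]
    return "\n".join(f"{key}: {value}" for key, value in fields) + "\n"


if __name__ == "__main__":
    for role, deps in ROLES.items():
        print(render_control(role, deps))
```

Installing one such package on a fresh box then pulls in the whole role through the package manager's normal dependency resolution, which is why this works on any distro with a decent package manager.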

        • by denttford (579202) *
          This past week, I installed Fedora, CentOS, and RHEL (for a friend). First time I'd used a RH linux in years. I'm taking that machine back to Solaris, Open Solaris, or Debian. I haven't wiped CentOS* yet, but I have to ask - does anyone prefer yum/rpms to apt/deb?

          Maybe it is a bit of a flamebait, but I am curious...

          *Of the three, CentOS was the best, but when SELinux is on by default and refuses to play nice with FreeNX out of the box - a major package offered by the installer - meh. All this to be
          • Re: (Score:3, Insightful)

            by David Gerard (12369)

            It's not apt vs yum or rpm vs deb - it's how well the repository's maintained. apt has a good reputation because Debian's repository is superbly well maintained. But Fedora's yum repos are much better maintained than Fink's apt repos.

            It's not the software, it's the repository quality. Actual humans making sure everything plays nicely.

        • by stevo3232 (794498)

          apt can run on fedora/centos/etc.

          See: http://en.wikipedia.org/wiki/Apt-rpm [wikipedia.org]

        • Re:How many admins? (Score:4, Informative)

          by Colin Smith (2679) on Friday October 10, 2008 @02:49PM (#25330859)

          Try this for an idea... The whole concept of "installation" is wrong.

          Build your own distributions. One per purpose.

          Use something like RockLinux [rocklinux.org]

          to build a ramdisk image which contains all of the software and configuration required for a particular application. By "all" I mean "only". You end up with a single file which you put on a tftp server, you boot your servers over dhcp, they pick up the OS image and boot to the image on a ramdisk.

          e.g. You might have one squid image, one PHP app server image, one Mysql rdbms server image etc. When the image boots it does whatever is required to run the app successfully. e.g. putting a filesystem on the hard disk.

          The benefits:

          • Zero server configuration. (or close to it) this means no need for YUM, no RPM, no APT. No dependencies.
          • Massive scalability because of above.
          • Only tested images reach production. You know it is going to work because the production image is the same single file, you know exactly how it is going to perform because you tested exactly the same file already.
          • Everything is version controlled and completely repeatable as part of the build process.

          2 admins can run 500-1000 systems in a site easily because there is really only one machine; the network. Logarithmic increase in effort with the number of systems.
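One way to picture the "only tested images reach production" point above: since each role is a single file, you can record its digest when it passes testing and refuse to publish anything else to the tftp server. This is only a sketch under that assumption; the function names and the idea of keeping a set of approved digests are illustrative, not something described in the comment.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large ramdisk images need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as image:
        for chunk in iter(lambda: image.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_tested_image(path, tested_digests):
    """True only if the file is byte-for-byte one of the images that passed testing."""
    return sha256_of(path) in tested_digests
```

A deploy script could call `is_tested_image()` before copying an image into the tftp directory and stop if it returns False.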

      • by Colin Smith (2679)

        And a custom distribution designed for the purpose scales orders of magnitude better still. Build a system which does only the job you want, boot it over the network on 1, 10, 100, 1000, 10000 systems just as easily. Just a few people are required.

        I agree that a cookie cutter approach like this does not yield the greatest performance per box, but it does allow for a better performance/administration ratio.

        Nope. You have too much state on the machine. You have binary versions, library versions, config files all to manage and distribute. Over time the individual machines diverge in their configuration, even with tools like cfengine and puppet. Which means that errors

  • CentOS is free RHEL (Score:4, Interesting)

    by Hero Zzyzzx (525153) <dan@@@geekuprising...com> on Friday October 10, 2008 @12:00PM (#25328603) Homepage

    So it's unlikely the decisions were influenced heavily from a budgetary standpoint. If they wanted to stay with a free RHEL derivative linux that's essentially identical to the one you pay for, they'd be using CentOS. [centos.org]

    They chose Ubuntu. Maybe they just like it better? I think you can factor cost out of the equation.

    • *whistles nonchalantly*

      Oh, hello! I couldn't help but overhearing you, and I feel I must expound some smug knowledge I have gained by actually R'ingTFA..

      Behold the quote!

      Wikipedia could just as easily have made the switchover to all Red Hat, but that would have cost more money, he said. "It would seem to me that if money weren't an issue here, there wouldn't be anything keeping them from upgrading everything to Red Hat."

      • Note that's an analyst quote, not a Wikimedia quote. I'm not sure they actually bothered asking Brion.

        • Re: (Score:3, Interesting)

          by somersault (912633)

          Indeed. I've tried Ubuntu a few times over the years and they do seem to have done a great job at making everything feel well put together. The first version of Ubuntu I used kind of had the same problem as some of the other distros I have used where you didn't feel like all the toolbars on the desktop were really meant to be used side by side, but they started modding everything to fit together and improved pretty quickly.. if I wasn't using OSX right now I'd probably be using Ubuntu.

          I recently set up a Wi

          • Apologies for that - sometimes I go off on a completely unjustifiable rant without noticing.. :/ I was just trying to point out that I would have done the same thing, as I think Ubuntu is better integrated and more exciting than a lot of the main distros, while at the same time still being professional quality and easy to use.

      • by pembo13 (770295)
        So the OP talked about CentOS, you rebut by talking about Red Hat?
    • You'd think that, but consider this:

      If you install Redhat, it costs money, because they support it.

      If you install CentOS, it's free, but if you need support, there is none. You can get support from third parties, but not Red Hat. To get support from RedHat, they'd need to move from CentOS to RHEL.

      If you install Ubuntu, it's free. If you need commercial support, you can pay Canonical. They could, for example, pay Canonical for a year, and, if they can handle it on their own, not renew their support
      • by pembo13 (770295)
        Seriously? How are you on Slashdot? You sound like the typical manager. Are you saying that there are no commercial entities which provide support for Centos for a fee? If you want to make the argument that there is no first party support, fine. But don't say that there is no support for Centos for those who want to pay.
        • by joe_cot (1011355)
          Please, offer some suggestions for commercial support for CentOS -- because the companies that I see when I google for centos commercial support are small and extremely shady (including the overused stock photo of a smiling female support tech). I wouldn't be convinced any of those companies could manage servers for an operation as big as Wikipedia.
    • Re: (Score:3, Interesting)

      by jc42 (318812)

      They chose Ubuntu. Maybe they just like it better? I think you can factor cost out of the equation.

      There might have been other motivations. For example wikimedia does lots of stuff in mixtures of languages, and probably uses UTF-8 encoding for (nearly) everything. I've been trying to get a good feel for how different distros (and OSs) actually handle mixed-language UTF-8-encoded text. It's been slow going. Everyone claims to support it. But it never takes long to find serious problems.

      The biggest probl

      • by jc42 (318812)

        Oops; I just realized that it's really unicode.org, not unicode.com, of course. ;-) To be more specific, the URL is "http://www.unicode.org/charts/unihan.html". Type in 2EA88 and press the Lookup button, to see if your browser can display the char correctly. I'd be most interested in ubuntu systems that display it correctly. Something's wrong here, and I'm not finding any useful clues.

        (I wonder if there's a way to ask firefox or other browsers "What font are you using to render the selected text?" And
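To help separate an encoding problem from a font problem for the character mentioned above, here is a small sketch that shows how a code point is encoded in UTF-8 and confirms it survives a round trip. It says nothing about which font a browser picks; it only rules out the byte-level encoding as the culprit. The function name is made up for this example.

```python
def describe_code_point(hex_value):
    """Show the UTF-8 bytes for a code point given as a hex string, e.g. '2EA88'."""
    code_point = int(hex_value, 16)
    char = chr(code_point)
    encoded = char.encode("utf-8")
    return {
        "code_point": f"U+{code_point:04X}",
        "utf8_hex": " ".join(f"{byte:02x}" for byte in encoded),
        # If this is True, the bytes are fine and any blank box is a rendering issue.
        "round_trips": encoded.decode("utf-8") == char,
    }


if __name__ == "__main__":
    print(describe_code_point("2EA88"))
```

Code points above U+FFFF, like this one, take four bytes in UTF-8, so software that assumes at most three bytes per character is one more place to look.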

  • I love Ubuntu, I've been playing with different distros since early 2000 and when I tried Ubuntu in 2006, I got hooked. I've been using it as my OS ever since. I've switched my parents to Ubuntu because I find it easy to administer and it makes it easy for me to help them. Plus, I can SSH into their box to solve problems remotely. Bottom line, as a desktop distribution I love Ubuntu. It may not work for everyone, but for me it's a perfect fit.

    But as a server distro, I'm not so sure. I'm surprised that

    • Re: (Score:3, Interesting)

      by pak9rabid (1011935)

      But as a server distro, I'm not so sure. I'm surprised that Wikimedia didn't go with a distribution that's more established for server needs.

      As a server distro, it rocks. I've migrated from Gentoo to Ubuntu Server for my home server and I've never looked back. As for enterprise-level distros, I'd have to go with Debian. There's not a whole ton of differences between Debian and Ubuntu Server, but I would trust Debian's 'stable' repositories over Ubuntu's repositories in a mission-critical setting, as the packages in Debian's repositories seem to be more hardened as opposed to Ubuntu's packages, which tend to be more cutting-edge.

    • by JeepFanatic (993244) on Friday October 10, 2008 @12:23PM (#25328883)
      I'll probably get modded Troll for this but whatever ...

      But as a server distro, I'm not so sure. I'm surprised that Wikimedia didn't go with a distribution that's more established for server needs.

      If you have an argument to make about the OS's merits as a server then make it based on facts. Tell us why you don't think it's a perfect fit on the server. Don't just say "I'm not so sure" and leave it hanging there. Support your position with something that can be argued.

      • Re: (Score:3, Informative)

        by Drew M. (5831)

        Just another person who's dealt with Ubuntu in a large enterprise setting. I don't mean for these comments to be flamebait, but it may come off that way. I'd just like to see more attention put toward them.

        1. Incomplete automated installer. You can do nearly anything from Redhat's kickstart, but working with d-i doing partitioning, especially more advanced lvm and software raid setup is nearly impossible without some custom scripting hacks outside of d-i. Also, don't even ask what happens when you have a us

        • Re: (Score:3, Informative)

          by petermgreen (876956)

          Guess how difficult it is to mirror the "pool" directory without also getting the packages from every other version of Ubuntu.
          Not too hard, you just have to use the right tool: https://help.ubuntu.com/community/Debmirror [ubuntu.com]

          Why can't I just have a single directory I can rsync?
          IIRC the main reason Debian introduced the pool structure is to allow packages to be shared between versions (particularly testing and unstable) and therefore reduce the archive size.

  • My workplace was running 6 different OSes. Right now all the point-of-sale systems are XP-based, the laptops are a mix of Dells and Apples, the router/firewall runs off Gentoo, and they have a couple of OpenSuSE workstations.

    On the server side, the webservers were a mix of Debian, the application server and database server were both OpenSuSE. They remote monitor a number of digital signage/interactive kiosks using another Linux package (Debian-based I believe). At the end of the day each system had

    • Re:Simple is good (Score:5, Informative)

      by moosesocks (264553) on Friday October 10, 2008 @02:22PM (#25330533) Homepage

      I need to overwhelmingly emphasize that OS X Server is *barely* suitable for a production environment.

      I'm a big fan of Apple, and do appreciate the nice GUIs that they provided with OS X Server. However, it's not particularly stable, tends to break at odd intervals, and ignores many common Unix conventions, making it a huge pain to perform certain tasks, or do things not supported by the GUI.

      It's a nice start, but I'd be very cautious about adopting it across your entire server infrastructure. Using it to host certain Apple-y apps might be fine, though I'd rely upon Linux/BSD for serious server tasks, especially if you already have the staff/experience to do so.

  • OK, now I'm curious. The summary mentions a touch of Open Solaris, but the article doesn't. What did they decide to use it for and, more importantly, why did they make the exception?

    • by johnjones (14274)

      bet its the DB servers

      but also would like to know...

      regards

      John Jones
      http://www.johnjones.me.uk [johnjones.me.uk]

    • by brion (1316) on Friday October 10, 2008 @01:25PM (#25329749) Homepage

      These are on our new image/media-upload fileservers. We're trying out the wonders of ZFS (snapshotting for consistent backups and "rm -rf oops" protection, potentially filesystem-level replication, etc).

      Since they're an isolated service type it's not a *huge* burden to have them be a little funky (eg, we don't randomly have an OpenSolaris box in the middle of the Apache/PHP cluster), though if we could do ZFS on Linux without jumping through scary hoops we'd happily do that instead!

      We'll try it out for a while, and if we're happy with it we'll keep using it, if not we'll migrate to something else eventually (the machines should as happily run Ubuntu as they do OpenSolaris)

      • Re: (Score:2, Interesting)

        by eric2hill (33085)

        FTR, make sure your ZFS pools don't get above 80-85% full. Our 24T pool went from "pretty good" to "abysmal" when we jumped to 91% capacity. I freed up a bunch of snapshots and got us back to 81% and the performance came back.
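A rough sketch of a check for the capacity warning above. It parses the tab-separated output of `zpool list -H -o name,capacity` and reports pools at or above a threshold. The 80% default is taken from the low end of the range in the comment and is an assumption, not an official limit; the function names are made up for this example.

```python
import subprocess


def read_zpool_capacity():
    """Run zpool in scripted mode: no header, tab-separated name and capacity."""
    result = subprocess.run(
        ["zpool", "list", "-H", "-o", "name,capacity"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def pools_over(output, threshold=80):
    """Return (pool, percent) pairs whose used capacity is >= threshold."""
    flagged = []
    for line in output.splitlines():
        parts = line.split("\t")
        # Skip blank lines and pools that report no capacity (shown as "-").
        if len(parts) != 2 or not parts[1].endswith("%"):
            continue
        name, percent = parts[0], int(parts[1].rstrip("%"))
        if percent >= threshold:
            flagged.append((name, percent))
    return flagged


if __name__ == "__main__":
    for name, percent in pools_over(read_zpool_capacity()):
        print(f"{name} is {percent}% full; consider freeing snapshots")
```

Run from cron, something like this would have flagged the jump to 91% before performance fell off.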

  • by nick graham (1132955) on Friday October 10, 2008 @01:35PM (#25329871)
    [citation needed]
  • Wow, not Debian? (Score:3, Interesting)

    by TheDarkener (198348) on Friday October 10, 2008 @02:06PM (#25330281)

    I'm actually pretty surprised. I know Ubuntu == Debian in a lot of aspects, but... To go to a distro that is *mainly* geared toward the desktop market (I know they have a server version, blah) for something as huge as Wikimedia, I'd think they'd rather go to Debian since it's considered more stable (although maybe more outdated as well). I have been a Debian zealot since the mid 90's and moved my DESKTOP to Ubuntu later on - but still think Debian is a best fit for servers.

    Of course, there's always the whole "Ubuntu offers real support contracts" thing. That, in itself, is enough for any larger company to make the choice, right there.

  • by mnslinky (1105103) * on Friday October 10, 2008 @06:00PM (#25333125) Homepage

    A lot of folks seem to fail to realize that Linux has distributions. The kernel is the core of every linux system. From there, various organizations, Canonical being one of them, package the userland, a package manager, and an update service together, and call it their own. It's how Linux has worked for many years.

    That being said, what you're really shopping for when seeking a Linux distribution is all the stuff around the Linux kernel. That is where Wikimedia found the benefit. Regardless of the timeline, Canonical offered them a pro-bono support contract, there is evidence of long-term update availability, and an overall 'good' package set.

    Also, for the record, Canonical does offer a server-edition of Ubuntu. See their website for more information.

