How Facebook Runs Its LAMP Stack
prostoalex writes "At QCon San Francisco, Aditya Agarwal of Facebook described how his employer runs its software stack (video and slides). Facebook runs a typical LAMP setup where P stands for PHP with certain customizations, and back-end services that are written in C++ and Java. Facebook has released some of the infrastructure components into the open source community, including the Thrift RPC framework and Scribe distributed logging server."
One question: (Score:5, Interesting)
About how much has Facebook saved by using Open Source Software? I ask because I am not familiar with licensing costs from competing solutions. Thanks!
Mark Zuckerberg is a GEGAWNTIC DOUCHE w peach fuzz (Score:2, Interesting)
Ah, the sweet ironies (and hypocrisies) in life. There's something beautifully creepy about a person fighting so hard against the same thing they fought so hard to create. In today's case, the culprit is Mark Zuckerberg, the young man more responsible than perhaps any other for his generation's obsession with displaying itself publicly on the internet. The New York Times has reported that a judge turned down Facebook's request to have "unflattering documents" about Zuckerberg removed from the website of Harvard magazine 02138.
At the center of the issue is an article in 02138 about Facebook's evolution and the subsequent lawsuit from classmates asserting Zuckerberg stole the idea and computer source code to begin his own project. The New York Times calls the article "sympathetic to the plaintiffs's account and questions the validity of Mr. Zuckerberg's claims."
The 02138 article also contains Zuckerberg's handwritten application to Harvard, and a journal that "contains biting comments about himself and others."
Perhaps Gawker summarized it best, saying, "This is the same dude who made billions from a website that allows you to let everyone in your friend network know when you are peeing."
And now he's mad that a private persona he would like to keep that way has entered the public domain. Yes, the sweet ironies and hypocrisies in life: why do we love them so much?
Re:A hodge podge mess (Score:1, Interesting)
http://pastebin.com/f23937796 [pastebin.com]
Looks like shit to me
Re:Come on guys (Score:2, Interesting)
While I think Facebook is nothing more than one big popularity contest, I have to agree.
At least most of the stuff on Facebook's website works.
With slashdot, half the time clicking on a comment to expand it doesn't work unless I refresh several times or copy and paste the link into a new browser.
The right hand sidebar will say 'freshmeat' and show stuff from linux.com and vice versa.
At first I thought this was because I still used IE and that was the problem, being that slashdot doesn't cater to IE users, fine. So after I switched to Chrome I figured it wouldn't be an issue, yet it's not any different.
I still can't expect expanding a comment to work, I still get crap listed as fossfor.us showing freshmeat entries, 'get more comments' doesn't do shit half the time.
As I've said countless times, programming in PHP and using MySQL 99% of the time means you don't know what you are doing. There are, however, those few large sites that use it that can actually justify its usage because it fits, but only if you actually know what you're doing.
I have websites powered by PHP, ASP.NET, ASP, Java, and C. Some of those are good fits for what they do, some of them aren't and I've learned that the hard way. I've also learned that in most cases things are written because a developer 'knows' a specific language. My personal opinion is, if you only 'know' one language, you aren't a programmer. A real programmer can use just about any language given a good reference manual, and can be proficient in that language rather quickly after starting to work with it.
Unfortunately, most people who call themselves programmers aren't. They just happen to be able to get by with a language they've been spoon fed in the past long enough to hack out some POS that barely manages to get the job done and will drive any sane programmer absolutely mad when they get stuck taking over after the original devs are found to be incompetent.
Makes you wonder how many online services have failed because of arrogance and ignorance of the developers.
Re:Related /. article (Score:3, Interesting)
It takes pretty much 0 work to make LAMP continue to function. It's, for all practical purposes, set it up once (properly) and forget it.
It takes work to make the applications on top of it function continually, as that's where the change occurs. LAMP isn't going down on its own; it'll appear to 'go down' because the 'mostly useless modules' that work along with it fail, not because LAMP does.
I would expect the admin(s) that care for 'the core LAMP platform' spend most of their time doing other stuff. In reality, there's probably more than one only to avoid any single person holding too much knowledge and to maintain coverage while that person isn't at work. I just can't imagine they do a whole lot of work 'keeping it running', with the exception of handling database growth and performance, which is more likely handled by the people who design and work with the applications that use that database.
Re:Not very well (Score:5, Interesting)
PHP, as a language, is more than capable of handling four requests per second (which can be said of pretty much anything other than punch cards).
Writing bad code in PHP, however, will of course slow things way down. Just like not having indexes on your databases, or doing stupid/unnecessary JOINs. Or not caching properly (see: Wordpress). Writing fast and efficient code in any language is easy enough provided you're a skilled programmer. Facebook, unfortunately, started off as Zuckerberg paying a friend with some web skills to build out a system, and it grew so quickly that replacing the code (or, rather, the DB schema) with something that doesn't suck probably became near-impossible. If you write code with scalability in mind, it's not a tremendous problem.
Of course, nothing is going to cope well with the sheer volume that Facebook deals with. There's plenty you can do along the way to help yourself out, which Facebook may or may not have done. You can bet that nobody thought the site would ever have 200MM users when the first lines of code were written; they probably never expected 1% of that. Writing intelligent code is the most important part of scalability - writing smart DB queries and minimizing the number required probably being the biggest part of that. Having your MySQL servers instead of PHP do some calculations in queries (hashes, query-related math, etc) usually doesn't hurt since you're generally offloading CPU-intensive operations to a disk-bound machine (i.e., one that has spare cycles).
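To make the points about indexes and minimizing the number of queries concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than PHP and MySQL. The schema, the table names, and the index name idx_posts_user are all invented for illustration, and the exact wording of the query plan varies between SQLite versions.

```python
import sqlite3

def build_db():
    """Create a tiny in-memory schema; every name here is made up."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT);
    """)
    db.executemany("INSERT INTO users VALUES (?, ?)",
                   [(i, f"user{i}") for i in range(1, 51)])
    db.executemany("INSERT INTO posts (user_id, body) VALUES (?, ?)",
                   [(1 + n % 50, f"post {n}") for n in range(1000)])
    return db

def plan(db, sql, params=()):
    """Return SQLite's query plan for a statement as one string."""
    rows = db.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)

def counts_n_plus_one(db):
    """One query for the user list, then one more query per user."""
    result, queries = {}, 1
    for user_id, name in db.execute("SELECT id, name FROM users").fetchall():
        (n,) = db.execute("SELECT COUNT(*) FROM posts WHERE user_id = ?",
                          (user_id,)).fetchone()
        result[name] = n
        queries += 1
    return result, queries

def counts_single_query(db):
    """Same answer from one JOIN, with the counting done by the database."""
    rows = db.execute("""
        SELECT u.name, COUNT(p.id)
        FROM users u LEFT JOIN posts p ON p.user_id = u.id
        GROUP BY u.id
    """).fetchall()
    return dict(rows), 1

db = build_db()
lookup = "SELECT body FROM posts WHERE user_id = ?"
plan_before = plan(db, lookup, (7,))   # no index yet: a full table scan
db.execute("CREATE INDEX idx_posts_user ON posts (user_id)")
plan_after = plan(db, lookup, (7,))    # the plan now mentions the index

slow, slow_queries = counts_n_plus_one(db)
fast, fast_queries = counts_single_query(db)
print(plan_before)
print(plan_after)
print(slow == fast, slow_queries, fast_queries)
```

Both counting functions return the same data. The difference is the number of round trips to the database, and that is the part that hurts once the database sits on another machine.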
There's all sorts of tricks and optimizations. Some are language-specific, and some aren't. But making bad decisions early on is a lot harder to fix than an inefficient foreach loop.
Re:Not very well (Score:5, Interesting)
They have somewhere in the region of 5,000 servers in their main datacenter and (I believe) others scattered around the world, but restricting it to just that main center, that means each server is handling around 4 requests per second
I somewhat doubt every single one of them is a dynamically driven webserver. Probably at least half are databases, search servers, caching servers, backend appservers, file servers, CDN type stuff, backup servers, hot spares, admin servers, staging machines, etc.
For example: Newzbin has 5 webservers in main rotation; it also has 7 search servers (plus one development machine with similar specs), 6 database machines, 2 backend systems running most of our cronjobs, 2 admin servers, 1 web development server, and 2 systems for building and deploying OS's from. As far as load is concerned, the backend stuff is far more important than the frontend. Sure, we could rewrite the main site in Java or Scala or C++ and get away with 3 webservers and still be N+2, but trust me, those extra two or three webservers are not a significant cost next to that of development.
I can either spend £5k on extra equipment (plus occasionally boosting our space and bandwidth costs, but those are dominated by other systems already), or I can spend £70k a year on another developer, who *still* won't allow us to match our development speed with PHP, and then rewrite tens of thousands of lines of code, likely into much more.
Much of our backend is written in C. That's where the big payoffs for efficient languages are, not a bit of database-limited HTML rendering. Judging by how many big sites are still running PHP, Python and Ruby for their frontends, this would seem to be the case elsewhere, too.
Re:Not very well (Score:1, Interesting)
I know someone who works at Facebook.
FUD-o-riffic.
According to my contact, PHP is a serious problem there. It scales poorly, requiring Facebook to throw more hardware at it.
They're not generating the whole page every time, unless they're big fucking idiots.
On the contrary, they hire lots of smart people. But they have legacy code that was not well planned for the size of operation they have now, and it has been painful to try to clean things up after the fact.
Re:Not very well (Score:4, Interesting)
True. But writing cache code is not easy and makes your code more brittle. It increases the likelihood that a user will interact with the website and do something, say "update my profile", only to find when they click "save" that their profile hasn't updated yet because your cache sucks. Then you have to plaster your site with bullshit messages about "please allow 30 seconds to see the change".
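The stale-profile scenario can be sketched in a few lines. This is an illustration only, written in Python rather than PHP; ProfileStore and TtlCache are hypothetical names, the 30-second TTL simply mirrors the message quoted above, and the clock is faked so the example runs instantly.

```python
import time

class ProfileStore:
    """Stand-in for the database; a hypothetical class, not a real API."""
    def __init__(self):
        self.rows = {}
        self.reads = 0

    def load(self, user_id):
        self.reads += 1
        return self.rows.get(user_id)

    def save(self, user_id, profile):
        self.rows[user_id] = profile

class TtlCache:
    """Read-through cache: an entry is trusted until its TTL expires."""
    def __init__(self, store, ttl_seconds, clock=time.monotonic):
        self.store, self.ttl, self.clock = store, ttl_seconds, clock
        self.entries = {}  # user_id -> (expires_at, profile)

    def get(self, user_id):
        hit = self.entries.get(user_id)
        if hit is not None and hit[0] > self.clock():
            return hit[1]                      # served from cache
        profile = self.store.load(user_id)     # miss or expired: hit the store
        self.entries[user_id] = (self.clock() + self.ttl, profile)
        return profile

    def save_without_invalidation(self, user_id, profile):
        # The brittle version: the store changes, the cached copy does not.
        self.store.save(user_id, profile)

    def save_with_invalidation(self, user_id, profile):
        # Drop the cached copy on write so the next read sees the change.
        self.store.save(user_id, profile)
        self.entries.pop(user_id, None)

now = [0.0]                                    # fake clock, in seconds
store = ProfileStore()
cache = TtlCache(store, ttl_seconds=30, clock=lambda: now[0])

store.save(1, "old bio")
first = cache.get(1)                           # loads and caches "old bio"
cache.save_without_invalidation(1, "new bio")
stale = cache.get(1)                           # still the cached "old bio"
now[0] = 31.0
after_ttl = cache.get(1)                       # TTL expired, reloads "new bio"
cache.save_with_invalidation(1, "newer bio")
fresh = cache.get(1)                           # reloaded immediately
print(first, stale, after_ttl, fresh, store.reads)
```

Invalidating on write fixes this one case, but every code path that writes has to remember to do it, which is the brittleness being complained about.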
But what is far, far, far worse is you are allocating programming resources to non-features. Caching is a non-feature that adds zero value to your website. Your users don't interact with your cache. They interact with your website--and I bet if you are like any moderately complex site, you've got all kinds of bugs that annoy the hell out of them. So rather than allocate your developer time to fixing those annoying bugs (thus adding value) or adding new features (thus adding value), you are stuck pissing away time optimizing bullshit your users never see.
So yeah. You can cache the fuck-all out of your website. But only by stealing developer time away from working on features that make your users happy. Of course if you wrote the thing in C instead of PHP, you'd have a different set of development problems about which I could only have nightmares.
In other words, engineering is always a tradeoff. Use PHP (and MySQL) and piss away developer time on caching the fuck around their weakness. Use a compiled language like C and piss away developer time doing fuck-if-I-know because you didn't free mallocs or had to write a template language from scratch or some insane shit like that. Pick your poison!
Re:Not very well (Score:5, Interesting)
As you say, there is a tradeoff. It doesn't matter if you're fighting the need to cache intelligently in PHP, or the need to get everything right because you're developing a complete solution in C (or whatever) or the need to interface to someone else's system for serving pages if you're using something in between. It also doesn't matter if you're using a servlet technology, or you're punching bits out on a paper tape and feeding it into a machine which converts it into EBCDIC and... you get the idea: don't fuck up.
In any case the whole argument is fucking stupid because: PHP is not implemented in PHP. And Facebook is not implemented in pure PHP. See summary: Facebook runs a typical LAMP setup where P stands for PHP with certain customizations. At some point you have to ask yourself how many wheels you want to reinvent. If you extend PHP you can reinvent fewer wheels. I'm not sure it's the right answer, but I'm sure it's not a horribly wrong one. I'm also absolutely certain that barring some massive development in processing the future is only going to involve more parallelism and more clustering, and that if you expect PHP to scale on a single machine you're a bozo.
What I have personally noticed about using PHP is that a single page load can consume an absolutely insane amount of memory. This problem, too, is mitigated or eliminated by aggressive use of caching. In order to cache properly you need to do something intelligent with your data store, which I think is where most people fall down. Looking into the mishmash that most CMSes produce in the db is enough to make you weep. I long for an elegant object-oriented CMS based on practically anything, but the simple truth is that PHP is by far the easiest thing to get going without spending any money and that has probably done more than anything else to propel it to the head of the FOSS class, at least in terms of popularity. A staggering number of quite excellent websites seem to be built with it as well.
In summary, I reject the notion that PHP is a serious limiting factor for the majority of websites, and I suspect that most of those for whom it is have failed to understand PHP. (Not that I'm any PHP guru.) It's true that a clustered web application is significantly more complex than something which is not clustered. However, it's also [potentially] far more scalable. At some point you simply run out of machine. When you can't get anything better from Sun (AFAICT they make the single machines which can handle the most threads today) you're going to have to cluster, even if it's only to two machines. At that point you'll have far more complexity invested in having a single system image to work with and the pain of moving to a cluster will be magnified that much more as well. If you accept the notion that clustering is today and for the foreseeable future the best way to handle scalability (which I admit is at this point not a proven notion, but is at least a well-supported theory) then the idea that PHP is a major limiting factor is just plain silly. Sun is circling the drain, and everyone else is concentrating on clustering. Your call...
Re:Not very well (Score:1, Interesting)
You might note that the presentation covers this to some extent. They mention some customizations they've made to PHP in the area of caching the bytecode from the PHP source files. They mention that PHP, by default, will stat the file system every 2 minutes to see if files have changed. From the sound of the presentation, they've probably customized it to check for updated files only when explicitly instructed to since they don't change the code that often.
Don't get me wrong...I still think PHP is completely unsuited to a site the size of Facebook, but it's not reparsing the PHP source file for every request.
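For anyone who wants to see the general idea, here is a rough sketch of a compiled-code cache that only stats the source file once per revalidation window, or never if told not to. It is written in Python, not PHP, and it is not Facebook's or PHP's actual implementation; the CompiledCache class and its revalidate_every parameter are invented for this example, with the 120-second default taken from the "every 2 minutes" figure above.

```python
import os
import tempfile
import time

class CompiledCache:
    """Cache compiled code per path; stat the file only after a window passes.

    revalidate_every=None means never re-check until invalidate() is called.
    """
    def __init__(self, revalidate_every=120.0, clock=time.monotonic):
        self.revalidate_every = revalidate_every
        self.clock = clock
        self.entries = {}   # path -> [code, mtime_ns, next_check]
        self.compiles = 0

    def get(self, path):
        now = self.clock()
        entry = self.entries.get(path)
        if entry is not None:
            code, mtime_ns, next_check = entry
            if next_check is None or now < next_check:
                return code                       # no stat, no parse
            if os.stat(path).st_mtime_ns == mtime_ns:
                entry[2] = now + self.revalidate_every
                return code                       # stat only, file unchanged
        return self._compile(path, now)

    def _compile(self, path, now):
        mtime_ns = os.stat(path).st_mtime_ns
        with open(path, encoding="utf-8") as f:
            code = compile(f.read(), path, "exec")
        self.compiles += 1
        next_check = (None if self.revalidate_every is None
                      else now + self.revalidate_every)
        self.entries[path] = [code, mtime_ns, next_check]
        return code

    def invalidate(self, path):
        """Explicit 'the code changed' signal, e.g. issued after a deploy."""
        self.entries.pop(path, None)

def run(code):
    scope = {}
    exec(code, scope)
    return scope["answer"]

def write(path, text):
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    st = os.stat(path)  # bump mtime so the change is visible on coarse clocks
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns + 10_000_000_000))

now = [0.0]             # fake clock so the example does not need to sleep
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "page.py")
    write(path, "answer = 1\n")
    cache = CompiledCache(revalidate_every=120.0, clock=lambda: now[0])
    first = run(cache.get(path))
    write(path, "answer = 2\n")
    within_window = run(cache.get(path))   # file changed, cache not yet looking
    now[0] = 121.0
    after_window = run(cache.get(path))    # window passed: stat, then recompile

    pinned = CompiledCache(revalidate_every=None, clock=lambda: now[0])
    pinned_first = run(pinned.get(path))
    write(path, "answer = 3\n")
    now[0] = 10_000.0
    pinned_stale = run(pinned.get(path))   # never re-checked on its own
    pinned.invalidate(path)
    pinned_fresh = run(pinned.get(path))
print(first, within_window, after_window, pinned_stale, pinned_fresh)
```

The pinned variant trades freshness for zero filesystem checks per request, which only makes sense if something in the deploy process reliably calls invalidate.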