Ask Slashdot: Optimizing Apache/MySQL for a Production Environment

treilly asks: "In the coming weeks, the startup company I work for will be rolling out a couple of Linux boxes as production webservers running Apache and MySQL. Management was quick to realize the benefits of Linux, but I was recently asked: 'Now that we're rolling out these servers, how do we optimize out-of-the-box RedHat 6.0 machines as high-performance web and database servers in a hosting environment?'"
  • by Anonymous Coward
    Haven't done it with mySQL, but with Postgres/PHP/Apache I allocate 1Mb of RAM for each httpd child, and then leave enough over to hold the entire db in RAM. I also run a separate server to feed images only, off a RAID stack. Make sure that the apache htdocs is on a separate partition and controller from the db files.
  • by Anonymous Coward
    My mail server was showing some strange performance problems under high loads which were caused by mysqld (used for authentication) running at nice +5. Apparently safe_mysqld did that behind my back. Under high loads mysqld would be put on the backburner.
  • by Anonymous Coward
    Apache/MySQL/RH is a common combination that will continue to grow in popularity for small and medium-size installations. Judging from the responses I have read, no one has performed a serious analysis of the performance issues of this trio. Most of the suggestions are useful first-pass ideas, but there are likely other specific tuning approaches that can be used.

    I ask that you document your development, specifically focusing on any novel solutions you found that increased performance. A faster CPU doesn't count (sheesh). Also, put this info together in a concise, readable format and provide it to the Apache site or the Linux Tuning site (I forget the URL at this moment.) It's very important that work of this type be formally documented and accessible.
  • by Anonymous Coward
    You'll probably need to tune your configuration on several levels: hardware, OS, application and SQL/other.
    Hardware Tuning:
    - Use a caching RAID controller fully populated with cache, configured for RAID level 0+1.
    - Use IBM or Seagate 10000 rpm SCSI drives with lots of cache.
    - Consider multiple SCSI cards (or channels) to separate the OS + logs, indexes and data files on to separate RAID arrays.
    - Also strongly consider using separate web and database servers, so each can be fully optimized for its job.
    - Obviously use as much RAM as you can afford (preferably 100 or 133 MHz).
    - Use multi-processor computers for the database and web servers.
    - Connect the web and database servers using a back end network, separate from the internet connection.

    OS Tuning:
    I'm not terribly familiar with tuning the Linux OS, but I suspect that there are many resources already available.
    In general you'll want to:
    - Optimize the block size on your RAID arrays for maximum performance (trial and error using bonnie or the like)
    - Optimize the amount of memory used for cache.

    Application Tuning:
    - Look at http://www.mysql.org/Manual_chapter/manual_Performance.html#Performance for a list of parameters to tune MySQL with. You'll either need to talk with TCX about using the available optimizations, consult with someone else who does, or spend a lot of time on trial and error, since these optimizations are very dependent on your hardware configuration and what type of work you'll be doing with the databases (i.e. write-intensive, read-intensive, or both)
    - Look at http://www.apache.org/docs/misc/perf-tuning.html for information on tuning apache.
    I would first suggest using PHP, but barring that, I would definitely use mod_perl. There are probably a lot of other sources for tuning apache available on the Internet.
    A suggestion: match the number of server children to the number of available processors.


    SQL/other Tuning:
    Understanding how to properly build tables and indexes is somewhat of an art, but you can really make or break the whole site with proper use of SQL and indexes. I'd either spend some time learning table/index design and coding SQL for performance, or consult someone who knows.

    Hope this helps a little bit.

    Jerry
    jerry@bellnetworks.net
  • by Anonymous Coward on Sunday August 29, 1999 @04:32AM (#1718673)
    At the Apache.org web site there is a guide to optimizing Apache's performance.

    Also Dan Kegel wrote an interesting web page in response to the whole Mindcraft NT/IIS vs. Apache/Linux fiasco and on that page are several detailed measures to improve Apache's performance under Linux:

    Dan Kegel's Mindcraft Redux page [kegel.com]
    Apache Week 'zine [apacheweek.com]

    ...as for my own personal experience with Apache: I learned that when compiling Apache, removing any modules you won't be needing saves plenty of RAM. In httpd.conf you want to set StartServers, MaxClients, and MaxRequestsPerChild so that Apache does not spawn new children too often. The trick: before you start Apache, look at "top" and count the number of processes; now start Apache under normal traffic conditions and look at the number of processes again to see how many httpd children are running. Whatever that number is, add 10, and that should be your StartServers setting. The MaxRequestsPerChild default is 30, but I like to crank it up to 300 or more so that httpd children are not being killed and recreated too often (the reason for that setting was to avoid possible memory leaks from sucking up all your RAM, which hasn't been a problem with the httpd's I've worked with)
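
    A minimal httpd.conf sketch of the above (the numbers are illustrative; derive StartServers from your own "top" counts):

      # httpd.conf process-management settings (Apache 1.3)
      StartServers         30     # observed busy children + 10
      MinSpareServers      10
      MaxSpareServers      30
      MaxClients          150     # hard cap on simultaneous children
      MaxRequestsPerChild 300     # recycle children slowly; still guards against leaks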

  • by Anonymous Coward on Sunday August 29, 1999 @03:51AM (#1718674)
    Basically, I just winged it. My site started out with maybe ten thousand hits per day, but quickly (over the course of two years) ramped up to about 5 million hits a day. I just hacked together some Perl scripts, and when I need to make changes, I just try 'em out on the production server. Who needs beta testing? If there are performance problems, I just buy faster hardware. If there are stability problems, people are understanding, after all, I *am* using Linux.

    Sincerely,

    Rob Malda
  • Eliminate the use of directory overrides (via .htaccess) wherever possible. They're usually not worth it.

    Not only that, turn them off. (AllowOverride None, IIRC) If you simply don't use them but have them enabled anyway, you pay the price WRT all the stat(2) calls the server does looking for them.

    This is all IIRC, but I usually have a good memory. Then again, I did just wake up.
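
    For the record, the stanza is just this (the directory path is illustrative; re-enable per-directory only where you truly need it):

      # httpd.conf: stop the per-request .htaccess stat() storm
      <Directory />
          AllowOverride None
      </Directory>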

  • Basically, it comes down to: Postgres is much more complete (it has more of the SQL spec implemented -- transactions, etc.). MySQL is much faster. It all comes down to how you expect to use it. If you are going to be doing complex joins and transactions and such, MySQL probably won't cut it (yet), otherwise, MySQL (most definitely!) makes up in speed what it lacks in features.

    There's obviously more to it than that, but I'm not aware of any specific comparisons...
  • ---plug alert---plug alert---plug alert---

    My fhttpd [fhttpd.org] in combination with MySQL and PHP can be considered, too -- it allows some configuration options and optimizations that Apache doesn't provide -- you can limit the number of connections to the database, use separate userids for sets of scripts, etc. If you want even more performance, a program in C or C++ can be written as an fhttpd module, and the API [fhttpd.org] is much easier to use than Apache's.

    By LIKE I was referring to full substring matching using LIKE "%foo%". "foo%" will use the index but "%foo%" won't, and on my server running 4 of the %% type would bring it to a crawl. I finally had to go with exporting the table to a flat file after updates and using an awk script to search. It can handle 40 concurrent searches with awk.
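
    EXPLAIN makes the difference easy to see (table and column names here are hypothetical):

      -- anchored prefix: can use the index on name
      EXPLAIN SELECT id FROM items WHERE name LIKE 'foo%';
      -- leading wildcard: full table scan every time
      EXPLAIN SELECT id FROM items WHERE name LIKE '%foo%';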

  • by rodent ( 550 ) on Sunday August 29, 1999 @05:03AM (#1718679)
    Personally, I designed and currently admin a site that gets about 1 mil hits/day. Over a week's time it averages about 50 queries/second with peaks at 500 queries/sec. The setup is dual p2/400's with 512 megs ECC (soon to be a gig) and the db's on a lvd scsi drive. The db's run a total of 2 gigs. There's typically 50 apache processes running at a time.

    As for optimization, definitely check your queries and always use keyed fields and == queries. Doing LIKE queries will kill your performance to the point of being unusable on decently large tables (>100k records). Definitely read the MySQL docs concerning RAM usage and the various switches to optimize its RAM usage. That is extremely important.

    As for Apache, avoid .htaccess at all costs and only compile in required modules. Also check the tuning FAQ mentioned above.

  • "HTML Programmer", eh? Talk about a skill that will be obsolete in 20 years, when we're all using XML and have WYSIWYG XML editors...

    BTW, programmers write programs, not text. So "HTML Programmer" is a misnomer in the first place -- that should be "HTML page creator".

    -E
  • FreeBSD is Apache's main development platform? I don't know where you got that from. It's not like the Apache developers sit huddled around a single box and develop in a dark room. I don't think fbsd is the prevalent platform among Apache developers.
  • You should consider availability in your plan. I recommend having a hot spare box. This is quite easy to do for the web server side. Just make sure the content on the two servers is synced up. Perhaps using rsync( or rsync through ssh for a little more security ).

    On the database side there's a feature of mysqld called --log-update. Call it using mysqld --log-update=/usr/mysql/update_logs/update. This will create a log of everything that changes in your DB which can be reinserted back into the mysql monitor. To go along with this, every time you flush the logs (`mysqladmin flush-logs`) a new update file will be created as update.# - where # increases for each call. At this point there are quite a few scripts written to insert this log file into another DB - most of them use perl DBI.
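
    A minimal sketch of that flow (paths and the database name are illustrative):

      # master: start mysqld with update logging
      safe_mysqld --log-update=/usr/mysql/update_logs/update &
      # rotate to a fresh update.# file
      mysqladmin flush-logs
      # replay a completed log against the hot spare
      mysql -h spare mydb < /usr/mysql/update_logs/update.001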

    To increase the performance of your setup there are several options noted in the mysql manual. But none of them will do a whole lot of good if the queries and tables you construct are poorly designed and indexed.

    Depending on what scripting language you use, there's probably a way to compile it into apache, whether it be mod_perl, pyapache or PHP. I would plan on doing this. A good way to speed up your system after this is to run 2 httpd servers.

    For the first server, compile plain apache with mod_rewrite and proxy support; for the second, compile in your application support. If you put all your applications in one directory you can easily proxy to them with ProxyPass.

    e.g. ProxyPass /perl http://server:88/perl

    This way things like images and html will be served by a webserver that only takes up 400-500K instead of one that could take up 10M-20M, depending on how many scripts and libraries are in memory (of course some of that is shared). When your server gets hit hard you'll probably find you run maybe 5-10 times as many regular servers as application servers this way.
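
    A sketch of the front-end config described above (the hostname and port are illustrative):

      # front-end httpd.conf: lightweight server, serves static files itself
      ProxyPass        /perl http://appserver:88/perl
      ProxyPassReverse /perl http://appserver:88/perl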

    A more advanced thing with this setup is to utilize your backup server. It will take a little work, but you could have apache proxy to a list of application servers that exist in a config file, and have this config file altered based on system availability. At this point, though, it may just be easier to get LocalDirector, unless your organization is really strapped for cash.

    I've thought of this approach too. It should be very fast once you're done - maybe faster than any other way. But it seems to me the going will be very slow. C/C++ is much more work to write than Perl, PHP and the like. And you'll need to learn a lot about the Apache module API. I looked at this API in some detail and realized how complicated it might get.

    You could of course start it out as a module - forgetting the CGI version. If you're leaking memory during development, keep 'MaxRequestsPerChild' at a very low value - 10 or maybe 1 even. Then increase it to 100 or whatever when your leaks are under control.
  • *cough*bullshit*cough*

    Think about it... do you want to do 64k XOR's 32-bits at a time on a single strongarm 233 (at best) or half/full cache lines at a time using SIMD on an SMP?

    The hardware solution. Those cache lines are filled with useless data that should never have been competing with my process data. Hell it should have been off of my local bus by now, instead of filling my cache. Yeah your SIMD can take care of entire cache lines but it's gotta shove it back out to the bus, not to mention compete with DMA and system interrupts in doing so.

    I bet you think that WinModems and WinPrinters are better for the same reason, because your P3/500 has more power than the little microcontroller and/or DSP that are present on full hardware-based solutions.

    A carefully designed hardware solution will beat out a software solution every time in terms of speed. If my processor (or any number of them) is busy looking after the drive array, that's one less thing that it could be doing for me.

    Your comment on XORs is bullshit. A RAID controller could be designed with an integer DSP which could blow your P3/500 Xeon out of the water in terms of integer operations, especially something as mundane as XOR and parity checking.

    Actually I think that's exactly what my DPT RC4040 cache/RAID module does. The host processor on the PM2044UW is a Motorola 68k, but that's only for busmastering and host functions. All the RAID is done on custom silicon. I've got 64 megs of cache on the module, meaning it could very likely chew through more data faster than your cache subsystem could keep refilling a dozen times over, keeping in mind that you use your moderately-loaded system for something other than a RAID controller.

    In a similar vein, when one of my drives finally opens up a black hole and all the data on it disappears into oblivion, the hardware controller works on keeping the system stable while rebuilding the array, where your processor would now be spending even more processor resources doing the same.

    Remember: every time you do something in software it's using CPU time that could be used to actually run the computer, rather than run the peripherals. And the raw speed calculations are bullshit because the CPU has many, many other things to do than just fill its cache lines with SCSI data.
  • So? I've known people who consider LOGO, or even COBOL a programming language.
  • by jd ( 1658 ) <imipakNO@SPAMyahoo.com> on Sunday August 29, 1999 @05:18AM (#1718686) Homepage Journal
    Here's my quick list of things I do, when tuning the webservers I've set up in the past. Note: I offer NO guarantees to the usefulness of this information. For all I know, it'll turn your pet hamster into a frog.

    0) If you have LOTS of RAM, compile Apache, MySQL and optionally Squid with EGCS+PGCC at -O6. The extra speed helps.

    1) Guesstimate the number of simultaneous connections I'm likely to have.

    2) Guesstimate how much of the data is going to be dynamic, and how much static.

    3) IF (static > dynamic) THEN install Squid and configure it as an accelerator on the same machine. Give most of the memory over to Squid, and configure a minimal number of httpd servers. You'll only need them for accesses of new data, or data that's expired from the cache.

    4) IF (static < dynamic) THEN ...

    5) If you've plenty of spare memory, after all of this, compile the kernel with EGCS+PGCC at -O6, but check its reliability. It's not really designed for such heavy optimisation, but if it works ok, the speed will come in handy.

    NOTE: Ramping up the compiler optimiser flag to -O6 does improve performance, but it also costs memory. If you've the RAM to spare, it is sometimes worth it.
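
    For step 3 above, a minimal squid.conf accelerator sketch (Squid 2.x directives; the port split is illustrative - Apache is moved off port 80):

      http_port 80
      httpd_accel_host 127.0.0.1    # where Apache now listens
      httpd_accel_port 8080
      httpd_accel_with_proxy off
      cache_mem 64 MB               # give most of the RAM to Squid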

  • Is this for real? Rob Malda posting as an Anonymous Coward? What's up with that Rob?
  • Yeah, I posted the Rob Malda as Anonymous Coward question - mostly because I wanted clarification on it. It seemed very odd, and yes, I knew it was probably just some anonymous lame-oid. Now ... to the rest of you (mostly A-Cowards as well) who choose to harsh on me because I merely question this ... Screw you.

    Trust me ... I would put my IQ points up against any of yours any day of the week. And yes, I did list HTML on my resume as a programming language, because in the positions I would be interested in ... there's very little reason to treat it otherwise. By profession, I'm a researcher, not a programmer.

    So, as I said, to everyone who has so little better to do than scan Slashdot waiting for opportunities to flame others (under Anonymous Coward status), screw off.

    Cordially yours,
    David
  • Mount the /home/httpd filesystem with the noatime mount option, then there will be no writes generated from read-accesses, and your ext2fs will effectively work as a ram-disk, if you have enough memory. Only, it's a helluwalot easier to just change stuff where it resides, and know it will be written to disk, instead of back-forth-copying tar images...

    Second benefit: if you really get low on memory, you're fucked with a 1 Gig RAM disk, whereas the disk-cache will quickly be thrown away and used for whatever memory hog you have running.

    ram-disks are good for booting over-modularized kernels _only_.
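
    A sketch of the noatime setup (the device and mount point are illustrative):

      # /etc/fstab: no atime updates on the docroot filesystem
      /dev/sda5   /home/httpd   ext2   defaults,noatime   1 2

      # or on a live system:
      mount -o remount,noatime /home/httpd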
  • Software RAID is usually faster than hardware RAID.

    It's also more flexible (think RAID of RAID of network block-devices).

    It's also cheaper.

    It has features that some hardware controllers don't even have (like background initialization).

    What kind of idiot talks about software RAID without knowing jack about it ? :)
  • In 20 years from now, there will be something else that we're using. XML won't be it.
  • When I set up a production environment (regardless of operating system chosen), the first step is always to have policy and change control.

    Change Control

    You must have change control or you will suffer downtime. Downtime represents a transaction rate of 0 trans/sec, which is clearly unacceptable.

    Development and Acceptance Test

    You must have a separate machine for development, and another machine for acceptance test. Of the two machines, only the A.T. machine must be identical to your production server. Otherwise A.T. simply cannot replicate the environment you're going to test, and thus any testing is at best misleading, or at worst, completely invalid.

    You must create a set of repeatable build instructions that takes you from a fresh blank machine to a stable, reliable, working production system. And you should have a set of tests that thoroughly gives the resulting systems a complete workout, including sustained load, boundary conditions (such as empty rows), and attacks against the system whilst trying to continue to process transactions.

    Finally, the best advice I can give you is don't skimp on reliability and availability. Buy RAID with hot rebuild. Buy a server with redundant PSU's, and not a handmade machine. Buy an additional NIC per machine, and put that on a different switch - dual path everything.

    In terms of SQL and web based stuff, from a security standpoint, it's always advisable to have your SQL server behind a firewall (or at least on a separate private network).

    In terms of speed, I've always found that having enough RAM to allow several outer joins to complete in RAM really helps. As someone else mentioned, it's a good idea to index columns you select on a regular basis.

    Make sure you can dump the database online - stopping the dbms whilst a dump takes place is unacceptable; if it takes 30 minutes a day, that's reduced your availability from near 100% to 97.9%. That's bad.
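
    A minimal online-dump sketch (the database name and paths are illustrative; --lock-tables takes a brief read lock for a consistent snapshot, which is far cheaper than stopping the dbms):

      # crontab: nightly online dump at 03:30
      30 3 * * * /usr/bin/mysqldump --quick --lock-tables mydb > /backup/mydb.sql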

    Good luck!

    I've seen it mentioned several times in this thread not to use .htaccess, and I can see the validity of this argument for allowing/denying access via IP/host or something else.

    Is this truly less efficient than a database query for username/password type access?
  • I can't agree more on the Database.

    I have no experience in Cold Fusion so I'll decline comment on that.

    I'd caution anyone trying to run Netscape's web servers on HP, though. In my experience it likes to run away a lot. I've been away from the environment with that configuration for a while and maybe they've upgraded the OS or servers, but we were seeing it hog 90% of our CPU several times a day.
    Actually, last I heard, they couldn't even handle the load of Hotmail with NT. They scrapped it right after buying it and went back to FreeBSD.

    I guess, if that's still true, it just shows what kind of Unix admins/programmers work for MS :)
  • Ya know, I truly find it hilarious that people believe hardware raid has some huge benefit...

    I've been using software raid, both over normal SCSI and hardware raid, in production servers for quite some time... hardware raid, even fast ones like DAC1164P, are going to get smoked in something like RAID5...

    Think about it... do you want to do 64k XOR's 32-bits at a time on a single strongarm 233 (at best) or half/full cache lines at a time using SIMD on an SMP?

    The best approach so far has been to allow hardware to handle raid0 for simple striping and disk management, and leave the XORs and large chunks to your main processors (after all, this is all streaming, so prefetching helps a good bit) if you can afford the cycles.

    Refer to linux-raid archives and my performance postings there with any questions

  • Nothing could be further from correct.

    If read-only, raid 1+0 allows striping reads across all physical drives, so you get the performance benefit of raid0 with the mirroring (and drive death survival) of raid1. If you don't care about data redundancy (you might want to care about making sure your site is available), a pure raid0 will still get data off drives faster than a single drive.

    Of course, I'm a strong software raid advocate, with a switch to hardware when it's cheaper to offload those cycles to other chips rather than speed up (or increase the number of) main processor(s).

    Sites that have a lot of writes, OTOH, have to balance data amount available vs. performance (etc) wrt raid1, 5, or 10.

  • Was it an indexed field? If so, the search should have been binary on == and possibly binary on LIKE (not sure though)... Any int fields you search on should definitely be indexed :)
    First, PLEASE don't point people to that horrible howto... at least not until Linus accepts the real software raid code. The current versions (and howto) are available over at:

    http://metalab.unc.edu/pub/Linux/kernel.org/pub/linux/daemons/raid/alpha/

    and

    http://ostenfeld.dk/~jakob/Software-RAID.HOWTO/

    Second, realize 0+1 (typically 1+0, or RAID 10) only gives you half of total physical space in effective space.... sometimes you can afford that, sometimes you can't... and you still generate the scsi bus loads of the full drive set :)

    In the very typical (especially in these situations) case of reading the databases, it's worth agreeing that 1+0 becomes 0+0 (since you can split reads across a raid1, assuming no failed drives)

    Last, as a side note to the mysql part, try to use isamchk (if the db server can have any down time) for pre-sorting your database instead of doing the sorting as part of your SQL
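
    A sketch of the isamchk pre-sort (the table path and index number are illustrative; take mysqld down, or at least make sure it has the table closed, first):

      isamchk --sort-index /var/lib/mysql/mydb/mytable.ISM
      isamchk --sort-records=1 /var/lib/mysql/mydb/mytable.ISM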

  • We have successfully covered the topic if you are going to try and use fewer dollars. But if money were no object would you consider other technologies besides Apache/PHP/MySQL/Linux?

    Such as Netscape/Cold Fusion/Oracle/Sun?

    Besides not being able to call on the experience of all of you guys when the going gets tough, what are the other drawbacks besides the obvious (MONEY)?

    MySQL is not a solution for me. It lacks many features that Sybase or Oracle provide (can you say TRANSACTIONS?). Netscape and Cold Fusion have better integration of security. Has a benchmark been done on PHP vs Cold Fusion? PHP seems to be able to handle Cold Fusion's role pretty well according to PHP's site.

    Is the answer truly a mishmash of both? Pay for Netscape for the SSL and Oracle for the STUD (I still like Sybase better) of a database that it really is, but go freeware where you can?

    Just looking for a couple of good opinions.

  • by felix ( 7014 ) on Sunday August 29, 1999 @05:54AM (#1718701) Homepage Journal
    So some of the best things you can do have already been mentioned - split out your database from your front end webservers, let the backend have its own machine, and run raid 0+1 on the db server. The frontends won't need the raid since they'll be serving a lot of the static stuff out of cache.

    Some other ideas are to split image serving onto its own apache, not necessarily its own box. This apache can be completely pared down to the absolute minimum modules, since all it will be doing is serving up static images. It also lets the cache be used efficiently, since mostly the common images will be stored, as opposed to common images contending with common text files for cache space if images and content are served from the same apache.

    Also, what are you using in apache to create dynamic pages and connect to the db? Use long running processes where possible, which means pick mod_perl, php, fastCGI, servlets, etc... over plain cgi scripts. This will save you lots of cycles and also let you have persistent db connections. Always a very good thing.

    Taking the splitting out of machines to the next level, you could also try splitting all of your dynamic content onto its own machine, mod_proxied through your front end apaches. This makes the front ends very small since they barely need any modules installed at all. It also gets some extra performance out of your dynamic content apaches. Of course you're running a lot of boxes now. :)

    Read this [apache.org] if you're running mod_perl. And read this [mysql.org] to optimize your db.

    I work in a research lab that does a lot of databases on Linux. We started off with msql and then graduated to mysql. We were initially running redhat with msql and slowly moved to Debian, since we felt it was a more stable server distribution. Also it was more configurable, and we were able to tweak almost anything in the system to its limit. Recently, we moved to Oracle 8i, but we kept our mysql around.

    Some of the things you might need to know: if you're going to do some serious databases, I recommend you spend more money on faster harddisks (SCSI preferable, and multiple disks - oracle runs very nicely with the database spanned over 3-4 disks and the program running on another disk; partitions won't do). Have a generous amount of RAM and swap. If you're making this a database box, don't use it for anything else. Even hosting a web server is not a good idea (as far as I'm concerned). Use WebDB if you like and host the database box separately with just the database running as the main application.

    Make sure you have a stable kernel. Make sure you have a secure system. Use ipchains to block out anything but local, and remove telnet and all other daemons. Security is something a lot of people forget when building large databases.

    Make sure you make daily, if not hourly, backups (based on how sensitive your data is). RAID is a good way to keep your system running. Also, if your database is web based, you might need to have 2 or 3 boxes set up identically with database queries distributed over all of them.

    With Oracle, read everything; they have a lot of tweaks listed in the pdf files and documents that come with the dist. Read all of them. Some tweaks are to the kernel, so pick a good stable kernel and stick to it. Forget about monthly kernel upgrades; I recommend kernel upgrades yearly or every 6 months. Software wise, if you're doing Oracle 8i, make sure it's a glibc2.1 system (RH6 or Debian potato - we use potato; even though it's unstable, it lets us tweak the system and gives us the most familiar interface).

    On mysql, it might help to read some of the online tweaks. Also, it might be a good idea to compile the server yourself instead of using the one that came with your dist - or compile it and copy it over what came with your distribution. Don't use msql unless there is no other way to do it.


    And good luck.
    --
  • Not to be heretical or anything but I was doing some benchmarking on MySQL's LIKE versus == matching on int's. It was actually faster using LIKE. I don't know why but I suspect it's because LIKE uses some sort of binary tree to find the int and the == tries to walk through them. This is not the case when you're using like to match a string or substring, in that case == seemed to work better.

    -Evan
    Apache was written with reliability as a higher priority than raw speed, hence it's multiprocess rather than multithreaded. Threaded servers will tend to be faster, because there is a definite overhead for starting new server processes rather than starting a new thread, but on the other hand, it's way, way harder to write a solid multithreaded server, so if your threading server has a problem, the likelihood is the failing thread will take out the rest of the server. If an Apache process fails, it'll just quietly die, while the server will continue serving.

    As for multiprocessor hardware, Linux works just fine for me. I'm writing this on a dual P3, and my other workstation is a dual PPro. I haven't tried it on boxes with > 2 processors though. For a web server, more processors are unlikely to get you any benefit, however. I'm pretty sure that apache on a single processor will easily saturate your network bandwidth, no matter what it is. Now if you're doing really complex CGIs, like, for example, some kind of real-time stock calculations that require a lot of processing, then multiple processors might help. But if this is the case, I'd probably advocate hooking up several boxes in parallel (Mosix [huji.ac.il] is designed for this) and farming your CGIs out to idle processors on separate machines. Your database might also benefit from multiple processors, but (for a properly indexed DB) probably only in extreme cases (very, very large DBs), and if so, you should have it on a separate machine too. In general, spend the extra money on RAM instead of another processor. Your clients will thank you :-)

    Depends on what you want your DB to do, really. I can't speak specifically to Sybase (I've also heard good things about it) but I know why we use mySQL. It's fast, and very low overhead for queries.

    The caveat to this, of course, is that you must know how to set up your database right. I recently had an opportunity to play around with a fairly large db (upwards of 400,000 records) on mySQL. The records represent people, and some of the fields are birth month, birth date, last name and first name. I wanted to select last and first names for people who were born today. So, with no indexes, the query selected about 600 records, and took 11.8 seconds. Yes, that's right, 11.8 seconds. I was floored! Here's me thinking "mySQL's fast! It'll work great!" Well.

    So then I went back through and indexed (birth month, birth date), checked that I had done it right with EXPLAIN, and ran the exact same query again. This time it took 0.8 seconds. A total time savings of 11 seconds. I learned an important lesson that day... Always index everything you're going to use as a key! With this in mind, mySQL is indeed damn fast, and low overhead.
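
    For anyone following along, the fix looked something like this (table and column names are made up for illustration):

      -- composite index on the columns used in the WHERE clause
      ALTER TABLE people ADD INDEX idx_birth (birth_month, birth_day);

      -- EXPLAIN confirms the index is used before you pay for the query
      EXPLAIN SELECT last_name, first_name
      FROM people
      WHERE birth_month = 8 AND birth_day = 29;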

    Now, the other thing I can't really speak to is reliability. mySQL doesn't really support referential integrity, and I guess it's up to you whether you need it or not. I've seen my share of M$-trained database folks who use CASCADE as a cheap crutch to paper over their bad code. Rather than write queries that do what they really want them to do, they just spend the extra overhead to have CASCADEs do it for them. I've also seen times where this was crucial to a db's function. Either way, it's something to consider. I've also never seen mySQL handle failure, or had to rebuild it after one. Whatever you use, your strategy should account for this possibility, in any case.

    HTML is only a hypertext markup language - you don't do logic operations with it (till now) - it's not a PROGRAMMING language. Just a text formatting language!
  • ((HTML == text_formatting_language)!=programming_language)

    You can't put logic on it. Not without JavaScript, etc....
    well of course PHP will run faster than perl *as CGI*. use mod_perl and be happy. PHP is a pretty close imitation of perl, but perl has a much more complete, mature and flexible programming environment. the main advantage of PHP is that, by being simpler and smaller, it's easier to start working with. for a complex site, where you want a fair amount of real programming on the backend, I'll take perl over PHP anyday. mod_perl has many modes of operation; the simplest (Apache::Registry) emulates CGI scripting without much of the fork/compile overhead. Embperl lets you put perl inside the html (this is the way PHP does it too), and you can write complete handlers if you want too.
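
    For reference, the classic Apache::Registry stanza looks something like this (paths are illustrative):

      # httpd.conf: run CGI-style perl scripts persistently under mod_perl
      Alias /perl/ /home/httpd/perl/
      <Location /perl>
          SetHandler perl-script
          PerlHandler Apache::Registry
          Options ExecCGI
      </Location>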
  • What about other (free) HTTP servers?
    While it is acceptable for some minor stuff, there is no point in using it once you hit something larger. I wonder why people don't use Sybase - it is free for production, a very nice database that can handle a lot of stuff, and extremely easy to use and program - there is excellent PHP3 support (well, as good as PHP3 offers - generally the database interface in PHP3 is one of the most idiotic designs I've ever seen)
  • with linux caching, this isn't really necessary. with enough memory, the whole thing will be in memory anyway
  • By way of general-purpose web-db tuning advice:
    • There's no way to have a well tuned system here if the db isn't well tuned, especially as the db grows in size. Make sure that all of your queries are as efficient as possible. Check to make sure that queries are against indexed columns whenever possible. Use the "explain" feature of the server to check the complexity of the queries you're passing.
    • Compared to static page delivery, just about any parsed HTML is Evil and Bad for your performance. Limit db lookups to pages that truly need it. IIRC, Slashdot handles this problem by having the front page re-generated by a cron job every so often. Once created by the script, it's just a plain ol' static page.
    • Eliminate the use of directory overrides (via .htaccess) wherever possible. They're usually not worth it.
    At this stage, you've probably heard all this advice before, but one repetition of the obvious never hurt anyone. Hope it's helpful.
  • Eliminate the use of directory overrides (via .htaccess) wherever possible. They're usually not worth it.
    Not only that, turn them off. (AllowOverride None, IIRC)
    Er. Yeah. That's what I meant. :)
  • Or just change webserver.
    Some (I use Roxen Challenger [roxen.com]) use a single-process approach.
    They "compile" and then embed your scripts into the main process, and so you save time because you don't need to fire up the interpreter.
    Also, because of the long-lived, single-process approach, you can share the DB connections among your scripts, and most of all cache
    Anybody know anything here? It seems to me that mSQL is a bit faster for simpler queries (esp. no joins), but that's just an observation.

    --Andrew Grossman
    grossdog@dartmouth.edu
  • I'm sorry, what do you write your programs in? Do you just type in the hex codes for the binary executable?

    If not, you're probably writing it in text. And before you start some "it's not real programming unless it's compiled" rant, tell it to a perl hacker...
  • by wintahmoot ( 17043 ) on Sunday August 29, 1999 @05:39AM (#1718718) Homepage
    I haven't read all of the previous comments, so it may well be that this has been posted before.

    Okay, this is how I generally do it. First of all, I suppose that you're using Perl, so these tips are for a Perl/Apache/MySql environment.

    1) Use mod_perl so that your script doesn't need a whole perl interpreter for each separate instance in memory. The performance boost is just incredible...

    2) Use Apache::DBI. It will prevent your script from connecting to and disconnecting from your DB each time it's called, and use a persistent database connection instead. Great for performance. (A wiring sketch follows.)
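
    The sketch (the startup-file path is illustrative; Apache::DBI has to be loaded before DBI so it can take over connect()):

      # httpd.conf
      PerlModule Apache::DBI                  # persistent DB connections
      PerlRequire /etc/httpd/perl/startup.pl  # pre-load your modules here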

    There are some other tweaks that you can do. If you're interested, just let me know [mailto]...

    Wintermute
  • by rde ( 17364 ) on Sunday August 29, 1999 @03:34AM (#1718719)
    There are ready-made solutions out there such as E-smith [e-smith.net]; you can download a cd image (or even buy the cd), and it'll install the system with extras built in; it's designed to be an 'out-of-the-box' sorta thing.
  • Yeah, sure.. damn stupid retard spammers.

    "Only one thing is for sure in the Universe: me"
    --Corndog
  • If Apache is using mod_perl to serve dynamic content, how does Apache pass the page to Squid? Does the original HTTP request go to Squid on port 80 and then Squid hands the request off to Apache? Should Squid be on a firewall-like machine in front of the Apache machine?

    User computer --> Squid server --> Apache server --> DB server?
  • If your data is so important that you need RAID 5 on your servers, why are you using MySQL, which does not support database transactions? I admit I have not used MySQL, but the impression I got was that it was a little "fast and loose" with regards to reliability. Did you consider any databases that support transactions like PostgreSQL or Oracle? You seem to have really done your homework and I'm very interested to read what you've found out!
  • I've got an application with a fairly small number of web pages, a number of which require very simple database access. I expect the database access to be fairly minor (it's just a couple of small sections of the website). We've decided to use Red Hat 6.0, Apache, PHP3 and MySQL.

    Is it faster to put Apache and MySQL on separate Linux boxes, connected via 100Base-T? What sort of performance hit would we get if we put it all on one box? What about one box with double the RAM? Thanks in advance for your help.

    Ryan

  • And how many NT servers do they need to handle the load?
    Yes, I forgot about www.apache.org [apache.org]. Indeed, their servers run FreeBSD. In addition, FreeBSD is (or at least was the last time I checked) their main development platform. As for that M$-owned company, I did not include it due to flamers/ac's/etc.
  • Yes, you can get them precompiled from TcX. I was referring to the binaries precompiled from Redhat (I have no idea if they are TcX's or not). My sole purpose was to emphasize the performance gains you can achieve using pgcc and having the server statically linked. In addition, you may get even better optimization using a newer version of pgcc than TcX used.
    OTOH, I'm sure that the binaries from TcX are probably fully optimized and would be the best source if you didn't want to or are unable to compile them yourself.
  • Even though I have very little respect for AC's on Slashdot, I will take that as a compliment. It's almost fun to watch the flame (read: dick-size) wars...though it tends to get old quick.

    Through simple (and more complex) testing, we found what works best. Personally, I wanted both servers to be FreeBSD, but we found that Linux had a significant advantage when used as the SQL server (see some basic test results here [fxp.org]).

    You don't need a PhD to figure out that you should use what works best. I trust FreeBSD implicitly with the web-serving because of its stability and speed WRT web-serving. OTOH, I trust Linux for the SQL server due to its stability and speed WRT the SQL server. As stupid as it may sound... use the best tool for the job. Both OS's have their strengths; people should be emphasizing what each *CAN* do instead of bickering over what the other can't.
  • I'm pretty sure I read somewhere that FreeBSD was their main development platform, but I sure as hell cannot find it now (though I will continue to look). OTOH, according to Netcraft's [netcraft.com] survey [netcraft.com], the majority of their systems are FreeBSD.
  • by platinum ( 20276 ) on Sunday August 29, 1999 @06:33AM (#1718729) Homepage
    First of all: IMO, if you have to ask how to optimize your company's equipment in a forum such as this, you need some real help (perhaps of the mental variety). There are a plethora of web sites on optimizing systems. OTOH, I might as well share our experiences.

    Our company uses Apache, MySQL, and PHP extensively (and exclusively). You can't beat the price/performance ($0.00 / excellent == great value). Through our research, we settled on the following combination:
    • Web Server: FreeBSD 3.2-STABLE with Apache 1.3.9 / PHP 3.0.9 on a PII-400 w/128 Meg RAM, IBM 4.55G U2W Drive. Due to FreeBSD's proven track record for Web/Network performance, stability, and security (e.g. Yahoo [yahoo.com], wcarchive [cdrom.com], and others), it's a natural.
    • SQL Server: Linux 2.2.x with MySQL 3.22.25 on a PII-400 w/256 Meg RAM, IBM 4.55G U2W System Drive and a Mylex AcceleRAID 250 w/4 IBM 4.55G U2W Drives in a RAID-5 configuration. Linux was the obvious choice when considering MySQL performance and driver availability wrt RAID controllers.
    Optimization suggestions:
    • Apache: Ensure you have adequate spare servers to handle the connections (StartServers, MaxSpareServers, MaxClients, and MaxRequestsPerChild in the config); nothing sucks more than clients not being able to connect. Also, if you are using embedded script of some sort (PHP, Perl, etc.), use modules compiled into Apache (mod_perl, etc.); this should significantly increase speed and decrease the overhead of reloading the module for each access.
    • MySQL: Tweak the applicable settings as appropriate. We increased (usually doubled, in most cases) the following: Join Buffer, Key Buffer, Max Connections, Max Join Size, Max Sort Length, and Sort Buffer. If possible, depending on the amount of data, get as much memory in the system as possible. If the OS can keep frequently used data cached, disk access won't be required, which significantly increases the speed of queries, etc. In addition, get rid of that pre-compiled MySQL and compile it yourself. If possible, optimize using egcs/pgcc for your platform. Also, compile mysqld statically; this will increase its memory overhead a bit but can increase its speed by 5-10% by not using shared libraries. (A startup sketch follows this list.)
    • Storage: For optimum speed, use SCSI (of course). For our data, we require RAID 5 for redundancy. If that is not required, RAID 0 (striping) can be used for increased speed. The optimal way is to use hardware RAID (external RAID or RAID controller). Luckily, Linux has drivers for quite a few different RAID controllers that are available for a reasonable price.
    • Linux: Beware of Redhat's security problems, disable all unnecessary services, etc. Seek out security-oriented and Linux performance-tuning sites for more suggestions.
    • General: Don't skimp on hardware. A cheap component, be it a drive, network card, motherboard, or whatever, if it fails, will cause unrecoverable downtime. We decided on Intel NL440BX boards (serial console/BIOS support is nice), PII-400's, and IBM SCSI drives in both boxes. If one box were to have a catastrophic failure, the other is able to perform both webserver and SQL server functions if necessary. We can also simply replace a failed component with one pulled from a similarly-configured non-production (test) box, or just swap boxes altogether.
    Both Apache [apache.org] and MySQL [mysql.com] have good sections on performance tuning. Do not be afraid to RTFM.
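
    The startup sketch promised above - the values are illustrative, using the 3.22-era -O flags to set server variables:

      # safe_mysqld with enlarged buffers (tune against your RAM and workload)
      safe_mysqld -O key_buffer=64M -O sort_buffer=8M \
                  -O join_buffer=4M -O max_connections=200 &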

    Any questions/comments can be directed to me. Flames directed to /dev/null.
  • ...or a STACK of drives, in the old school modular sense.
  • by amon ( 24507 )
    RAID 0+1 is an absolute must if you want to seriously serve anything, Linux or not, RedHat or anything else.

    Nice starting point if you are on a budget:

    Software RAID mini-HOWTO [unc.edu]

    Also take a look at:

    Linux High-Availability HOWTO [metalabhttp]

  • Is there any advantage in Linux to RAID 1+0 over a three or four drive mirror set? (Linux supports more than 2 drives in a mirror, and reads round robin from them.) So, would 3 or 4 drives in a single mirror set generally be faster but have less storage than the 1+0 setup? Especially for a database with a majority (>90%) of reads.
  • Agreed. Using Apache modules via mod_perl, you can avoid the overhead involved in forking a new process, and you can maintain persistent database connections via Apache::DBI. Definitely get the well written O'Reilly Book "Writing Apache Modules with Perl and C". You can use as much or as little of the Apache API as you want.

    If you don't want to learn how to construct a module, but would rather stick to the CGI protocol, mod_perl can still help you through Apache::Registry. It keeps your cgi scripts precompiled and ready to go, and you can still take advantage of persistent database connections. The downsides? Increased memory consumption for each httpd process, and more attention must be paid to initializing variables that are no longer wiped clean between requests.

    Some good suggestions out there, but one thing hasn't been mentioned. It's good to test the site with some kind of load-generation tool (I think there's one at apache.org) when you're trying out different configurations *before* you go live. Every site is likely to be optimally tuned a bit differently.
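
    The tool the parent is probably thinking of is ab (ApacheBench), which ships with Apache itself. A quick sketch (URL and numbers are illustrative):

      # 1000 requests, 50 concurrent, against the page under test
      ab -n 1000 -c 50 http://localhost/index.html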

    Also overlooked is possibly tuning the filesystem for caching and the like (file descriptors) and networking (maximum connections).

    Possibly most of all, when I've seen performance problems, it's been due to how the code was written :).
    Agreed.. I think the only flamebait posted on that post was the moderator's comments.
    --

    A mind is a terrible thing to taste.

    Why compile MySQL yourself? You did not suggest anything that TcX does not already do.

    The binaries (Intel) you can get from www.mysql.com are already compiled with pgcc -O6 and statically linked.

  • Memory is pretty damn cheap -- I've been running my web server off a ramdisk. Archive your web server in a tar ball then just expand it onto the ram disk... just don't put your db there :-)
  • Postgres is totally free and supports transactions. It might not have the performance of Oracle, but it doesn't have the cost of Oracle either. :-)
  • MySQL and PostgreSQL seem to be the two main backend db engines discussed here. Does anyone know if any comparisons exist between the two that cover their use (by a commercial organisation)?
  • Since when is PHP a close imitation of Perl? It may be similar in syntax, but that's about it. PHP has one purpose: server side scripting for web pages. There are a bunch of functions and modules which might not be so "web-related" (yet they are still useful in certain situations), but the main focus is still on server side scripting. Therefore, it is not merely a close imitation of Perl.
    Well, if dynamic > static, also use squid. Why? What's the point of a cache for dynamic content? Well, simple: slow readers keep memory-hungry mod_perl or apache children sitting on the socket dribbling bytes. Squid, which multiplexes many slow clients in a single process, can take the dynamic content, leaving the children to get on with some other user, and dribble it out at whatever speed the client can cope with, without taking all your RAM. It's really worthwhile.
    The database-driven websites that I run - the Houston Northwest Bar Association website (with an attorney finder) [hnba.org] and The C Bookstore [cbookstore.com] (plug!plug!plug!) - are based on PostgreSQL [postgresql.org] rather than mySQL, so I don't know how well these lessons apply, but I've learned that PostgreSQL has considerable overhead on each query, so one big query is better than lots of little queries.

    Of course, neither of those sites is particularly busy and I'm more proud of the management utilities than the sites themselves, but that's par for this course.

    The thing I did learn was that using perl and CGI is quite clumsy for this sort of thing. I eventually switched to PHP3 because everything goes together much faster. I don't know what it does to the performance, but since both sites are being served from the world's slowest Web server hardware (the database server is a 486dx2-80 that also carries the HNBA website; the C Bookstore Web server is the 5x86-120 that I use for most of the four dozen or so domains that I host) and performance is not that big an issue, I'm not all that worried. It'd be nice if it got some hits, though.

  • It figures an eponymously named anonymous loser like yourself would have the balls to criticize someone else. I think it's fair to say that you don't post your online information because you haven't accomplished jackshit.

    You cynical bastards are quite amusing. It'll be interesting to see how cynical you are 20 years into your dead end careers. I'm sure the "HTML programmer" will be doing quite fine.
  • microsoft dot com
    guess what site gets more traffic :))

    (www.mediametrix.com says microsoft dot com does :)
  • I'm replacing some perl code with Java servlets using mySQL, Apache JServ, XML and XSL.

    Apache JServ allows load balancing (basically doing a round robin over each of your servlet engines). I've found performance goes up about 30% for each PC you throw into the mix (I've only been able to test this up to 3 PCs).

    FYI: I've found I needed servlet engines running on 2 PC's connected to 1 mySQL database to reach the performance of the perl app which stores its data as | delimited files.

    While this may seem pretty poor, using a database means that scalability (for size of data) is going to be a lot better than the file solution, and the servlet solution uses XSL, which gives us a lot more flexibility over the HTML that we generate (basically each of our users can have a completely different-looking site while running the same app as all the others).

    I'll be posting some benchmarks at http://objexcel.com in a few days if anyone is interested.

    Peter
  • >welcome to the commercialization of linux pal, it's downhill from here.

    That's rather cynical. As we port various Linux applications to QNX, it's in our favor to document what we've found to improve performance. The fact that this documentation also helps Linux is just part of the benefit of open source itself.

    If a company has gone through its paces to approve using Linux it's only logical that the people looking for all that free support will also contribute to it.

    -From Up North
    Doesn't inlining of code occur at such a high optimization level? Mind you, I noticed that MySQL does this anyway... so at least that's half the problem solved - time for Apache - heh heh.

    Interesting idea...
    This talk of HTML programming reminds me of the bright light who suggested, in response to a "rewrite the browser in Java" thread on the mozilla.general newsgroup, that it would be better to rewrite the browser in XML...
  • ...that talked to two replicated RAID arrays way on the back end...


    How did you implement this, may I ask? Particularly, how were the two RAID arrays mirrored, and how did the Web Servers/Database servers do I/O with them?

    Cheers,

    -NiS
    No, they went back to Solaris, which is very interesting when thinking about the disputes between Microsoft and Sun :-). By the way, it's no wonder that NT can't handle the load, considering the some 40 million users they have.

    I guess it sure was a pain for the people at Microsoft to choose Solaris as their "OS of choice". Hmm, it might be worth asking why they didn't stick to their great OS? :-)
  • Why not use multiple small servers and load-balance them with Eddie? Check out the Eddieware project. Cool thing is, it runs on FreeBSD AND Linux and it's open source :)
    http://www.eddieware.org [eddieware.org]
