Linux Business

30+ GB Databases On Unix?

CaptainZapp asks: "A customer of mine runs a ~30 GB data warehouse on a Sybase SQL Server database. Now their business requires a mirror of this database in a different location. An offer by a reputable Unix vendor for the hardware turns out to be about five times as expensive as a reasonable x86 box with the necessary amount of disk space and, say, 1 gig of memory. What does the esteemed Slashdot community think: is Unix capable of handling a database of this size, and what other terrible pitfalls do you foresee?" He's not worried about "mission-critical" here, he's just wondering if it's possible.

"Now, the database is not mission critical (which doesn't mean it's not a major pain to reload it), so the issue if raw devices are supported is not too relevant. Further, and even more important, this is a major chance to convince a global player of the capabilities of Linux.

All that said, I'm aware that some of you readers have a quarter terabyte of disk space at your disposal. But that's also not the issue at hand. The question is whether it is feasible to run an industrial-strength database of 30-40 GB with all its consequences (uptime, maintainability, dumps, etc.) in a Linux/Intel environment."



  • Win 2K doesn't have a 2GB file limit, so it is infinitely more scalable than Linux. Also, you get nice GUI tools to help you if you don't know a heck of a lot about databases.
  • by Anonymous Coward
    The new Oracle pricing model isn't based on users anymore. You basically pay $15 per MHz of CPU speed you have, at least for the Standard Edition. The Enterprise Edition comes in at quite a bit more. For instance, if you have Oracle running on a dual PIII at 700MHz, that's 1400 total CPU points.

    1400($15) = $21,000

    Support cost is an additional 22% of the total list price.

    $21,000(1.22) = $25,620 total price.

    That's per machine. It is ironic that once you license a machine under this model, you can run as many instances on it as you like (barring performance). Seeing that this is a ~30G database, I wouldn't see that happening.

    The low-end single-user licenses you're talking about are time-based licenses on a single 500 MHz processor (if I remember correctly). So, with one of those, you could run as many instances as you can squeeze onto a single PIII 500 MHz for two years. I do reserve the right to be wrong on this, though. The large-sum pricing above is for permanent licenses.

    Cheers
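
    As a sanity check of the arithmetic above, here is a trivial C sketch of the commenter's formula; the $15/MHz and 22% support rates are the commenter's figures, not official Oracle pricing:

    /* The commenter's formula: list = cpus * MHz * $15; support adds 22%. */
    #include <stdio.h>

    int main(void)
    {
        int cpus = 2, mhz = 700;             /* dual PIII-700 from the example */
        double list = cpus * mhz * 15.0;     /* 1400 points * $15 = $21,000 */
        double total = list * 1.22;          /* with 22% support = $25,620 */
        printf("list = $%.0f, with support = $%.0f\n", list, total);
        return 0;
    }
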
  • by Anonymous Coward
    We've been running ASE on linux for almost a year now. We had some initial issues with our RAID array and performance issues that more memory solved. Since then, it has been rock solid.

    See the ase-linux-list for more info on large DBs and raw I/O: mailing list archive [mathforum.com].

    However, Replication Server is not supported yet. I think this is going to be a showstopper for you, eh?

    Again see the list for more info.

    http://www.sybase.com/linux/ [sybase.com]

    michael peppler's home page [mbay.net]

  • Uhm, actually, seek times are dramatically improved in most (if not all) RAID levels due to the fact that redundant data + multiple drive heads = faster concurrent access. Write times on the other hand suffer terribly under RAID 5 (since you have to write multiple copies of the data).

    I've never seen a production DB not run on at least a RAID 5 array (most run on something more serious, like netapp drive arrays, which are basically a RAID 5 type of system).

    Who moderated this misinformation up?

  • by Anonymous Coward
    Linux and 30GB+ databases work fine. In my company we are using both Oracle on Linux and Oracle on Sun. We have databases of more than 200GB running on Linux and Oracle 8i, on a dual PII, and it really works flawlessly. We are not using raw partitions but a regular filesystem mounted from a RAID 5 volume on a Mylex adapter. We are extremely happy with that, and we are using it on an extremely important production system for the company. We are not using Sybase for Linux, but I have some friends that are using it and they are very happy with it, so my guess is that it should work fine for you. As for the kind of hardware you need, well, it all depends on what kind of traffic you have, but if I were you, I'd definitely go with at least a dual-PIII-something, a RAID 5 card like a Mylex, and a couple of 9 or 18GB SCSI hard drives.
  • If at all possible, I'd recommend getting some 64-bit hardware. Probably an Alpha-based system. Next, get a decent filesystem like ReiserFS [devlinux.com] or Global Filesystem [globalfilesystem.org].

    If you are running on x86 hardware, there's no telling whether the accesses will be capable of reading large files (>2GB).
    --
    Ski-U-Mah!
  • Linux/IA32 probably not, at least under ext2, as you'll likely hit the 2GB file size limit, depending on how the database engine involved implements storage (Oracle using its own data partition in "raw iron" style?). Linux on other architectures, specifically the 64-bit ones (Alpha, SPARC, Sledgehammer and IA64 before long), would probably be fine.

    I hate x86 as much as the next guy, but wouldn't file size limitations be an issue with the operating system or filesystem, rather than the CPU architecture?

    --

  • BTW: Before I get flamed, the Hotmail/FreeBSD thing I remember from somewhere, but I can't remember where. I do know it's NOT on an NT box, which basically leaves UNIX.

    My understanding is, shortly after Microsoft bought Hotmail, they sent in their engineers and tried to convert it to NT. After a while they gave up and left. They tried again several months later, with similar results. NT won't do it.

    Here [netcraft.com] is the NetCraft query.

    --

  • Linux uses the native word size of the machine for file offsets, so on 32-bit architectures, file sizes are limited to 2GB, while on 64-bit archs, Linux can handle files up to 8EB (1 exabyte ~= 1 million terabytes).

    Shortly after posting, something like this occurred to me. Not something I know much about though; thanks.

    I want an exabyte of something.

    --

  • You'd also want RAID 5, preferably hardware which is supported by Linux

    What kind of advice is this that people are giving???

    RAID 5 is real slow for the small writes common with a database. You first have to read the whole stripe (much bigger than the oftentimes single block to write) from all disks, calculate parity, and write the small changes back (data + parity). What you want is mirroring, RAID 1, which won't decrease write performance to a crawl but will keep your data safe.
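
    For what it's worth, a controller updating a single block doesn't strictly have to read the whole stripe: it can read just the old data and old parity, recompute, and write both back (still four I/Os where RAID 1 needs two writes). A minimal C sketch of that read-modify-write parity math, with a hypothetical 512-byte block size:

    /* RAID 5 read-modify-write for a one-block update:
     * read old data + old parity, write new data + new parity.
     * new_parity = old_parity XOR old_data XOR new_data.
     */
    #include <stdio.h>
    #include <stddef.h>

    #define BLOCK 512  /* hypothetical block size */

    static void update_parity(unsigned char *parity,
                              const unsigned char *old_data,
                              const unsigned char *new_data)
    {
        for (size_t i = 0; i < BLOCK; i++)
            parity[i] ^= old_data[i] ^ new_data[i];  /* cancel old, fold in new */
    }

    int main(void)
    {
        unsigned char parity[BLOCK] = {0}, old_data[BLOCK] = {0}, new_data[BLOCK];
        for (size_t i = 0; i < BLOCK; i++)
            new_data[i] = (unsigned char)i;
        update_parity(parity, old_data, new_data);
        printf("parity[3] = %u\n", parity[3]);  /* 3, since the old data was all zero */
        return 0;
    }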

  • I've had great luck with 11.0.3.3 on Linux, but I'm not doing anything serious with it yet.

    The original 11.0.3.3 could use both files in the filesystem and raw hard drive partitions to store data. Do not use files in the filesystem for your large database, because the limit is still 2GB for a single file. Use fdisk to allocate a large partition, then format it with Sybase.

    A fresh release of 11.0.3.3 is available at linux.sybase.com that addresses all sorts of bugs and caching issues. I suggest that you start with this product. If you can make it work with a free product, you have lots more options - you can have two backups on the free version, and one on the 11.9 supported version.

  • When you have Sybase configured properly, you can have only a single UNIX process acting as your database server (if you don't run the backupserver when you aren't running backups). If you have SMP, you run one more Sybase server process per processor and they communicate with standard IPC. Installation is tricky for a novice, but the tools work as advertised. Sybase uses the RPM format for their Linux installations.

    The Oracle installer (written in Java so you must have the blackdown JRE - it is just sick and wrong) commonly fails when configuring a database instance. Yes, there are workarounds available, but why not write an installer that works properly? When you have Oracle running, it lights up your process table like a Christmas tree - at least 4 server processes, plus some sundry rubbish.

    A UNIX admin who admires efficiency will be happier with Sybase.

  • Your history is so far out of whack that I can't really address it... suffice it to say:

    • BSD and AT&T UNIX shared stuff from Version 6 through Version 8, IIRC, but SysV was developed afterwards by AT&T in conjunction with Sun
    • SCO now handles SysV, and the Open Group handles UNIX
    • BSD and SysV are the two major strands of UNIX, not SysV and "SrV"
    • Linux is not UNIX and neither are {Net|Open|Free}BSD, but they might as well be
    This [google.com] might help you out a bit...
  • I worked at Nielsen for a bit, and (IIRC) they had around 10-12 TB databases, using Sybase on Sun hardware (E10000s and others).

    Because I had a half-dozen Sybase SA's within a stone's throw of my desk, I used Sybase for my personal database at home (it was only about 160M).

    Sybase works really well under Linux. I'm pretty sure you won't need to worry about file size, because I ended up having several files for the database (my database was initially too small, and I just initialized a new disk and added it to the current database to add data and log space).

    lance
  • ...so I'm not sure why your jaw is on the floor. What are the benefits of local storage?

    In this scenario, I'd think a gig interconnect or two (on a dedicated storage network) to a NetApp might be fine. Probably faster than local storage (assuming you have a better subsystem on the NetApp), and you contain your storage concerns into a dedicated machine, rather than having to deal with host storage. Adding extra scsi host adapters while the db machine is active is a dance any sysadmin would surely avoid.

    And while you're at it, why not have two NetApps clustered?

    cheers,
    -o
  • Of course that will run on Unix. Most large databases do, and only a few clueless DBAs run that kind of DB on NT.

    I'm sure they'll also run on Linux x86, but I'd be concerned about I/O... and other architectures like IBM or Sun can handle that in better ways (and, oh yes, you can still use Linux there).

  • Machines designed to deal with very large databases tend to be more expensive than your average desktop, even with the same amount of disk. They tend to be built for reliability and stability and speed, all of which cost money.

    My advice: don't skimp on buying the box. You will probably lose anything you save in admin costs on a cheap and not very good box.

    The Cure of the ills of Democracy is more Democracy.

  • It's only "Silly" until your UPS dies (or the card fails or your SCSI bus resets) while there are cached writes.

    I would be under the impression that if your UPS is going to die it would let you know through the power control protocols. Unless you mean if the UPS explodes unexpectedly, in which case I thought the battery kept cache data, not state data, which was the reason for my "silly" comment. If it keeps state data (a transaction log, if you will) then I'm all for it. :-)

    Cache will alleviate the performance problem for brief, small transactions.

    Which was exactly the context in which I was speaking. The parent to my reply had stated that the bulk of DB transactions were small and that the multiple-write nature of RAID5 made it a performance bottleneck. I had said that a large write cache would alleviate that.

    If you're moving more than 256MB through the controller (in either direction, remember that reads consume that cache, too) in less time than the disks can service it, then your I/O's become as slow as the disks. This is unavoidable and unfixable.

    Agreed. But then you're back to square one anyway, with the system (usually) being faster than the bulk storage, which is why you have a small but fast disk cache, a slower but bigger controller cache, and a slower yet but bigger filesystem cache in the OS. Each time you step back from the hardware you get a larger cache. System memory is slower than the fast SRAM of the disk cache, but if the memory has the data, it's a ton faster than actually waiting for the drive to give it to you (and waiting to get it over a 16/32-bit bus).

    RAID5 is best-suited for read-intensive environments, or cost-sensitive customers. It is not a high-performance solution. As others have said, RAID0+1 (striped mirrors) are the answer if you want fast and safe instead of cheap and safe.

    I'll state again that it depends on your situation. No need to spend a pile on 30G SCSI-II UW disks for a database when you're doing many small transactions. Better to get a few smaller SCSI-II UW disks and RAID-5 with a large cache. There's the ultimate, then there's the practical. :-) The lines between which depend on the pocketbook and the application.

  • RAID 5 is real slow for the small writes common with a database. You first have to read the whole stripe (much bigger than the oftentimes single block to write) from all disks, calculate parity, and write the small changes back (data + parity). What you want is mirroring, RAID 1, which won't decrease write performance to a crawl but will keep your data safe.

    Personally I don't like having two very large disks around. Give me a half dozen or so smaller ones.

    Also, Most hardware RAID controllers have a decent amount of cache with them. The DPT [dpt.com] controllers I use can have up to (I think) 256M of ECC cache RAM and optionally battery back it up (silly IMO). That'll fix your performance issues on RAID5.

    I think that RAID5 is a good idea, but YMMV.

  • by BadlandZ ( 1725 ) on Wednesday July 26, 2000 @02:00AM (#904995) Journal
    I have absolutely NO experience with Sybase on Linux, but Sybase has claimed they support Linux, and they are planning on being at LinuxWorld, so it's worth calling them about it. (They seem to be trying to hire Linux techs pretty aggressively!)

    An SQL database at 30G, sure. I would say call Sybase Inc. first, then VA Linux second, and get the answers straight from the people who are most likely to give you a usable product. Get your prices, then compare.

    I'd be more worried about the differences in _how_ you're going to mirror the data (connection speeds, transfer methods, how frequently) and about Sybase garbling things when going from a database on one OS to another (unlikely, but possible).

    I'm sure Oracle for Linux will be mentioned, because there are many claims that it will handle such a situation. But your problem there is going from Sybase to Oracle, not from another OS to Linux. Keep in mind, not all "SQL" databases are identical; the SQL may be, but the extensions provided by the manufacturer won't be.

  • As many other people have said, 30GB is nothing special volume-wise on 'big' machines, like Sun's.

    30GB is probably at the top end of what you could expect to put on an x86 box, and so the question is: what do you want to do with it? If you're just storing the data and doing a few simple queries, you should be okay, although you'll probably want more than a gig of memory.

    If you're doing heavy duty processing with many users then forget it. It's not a problem with Linux, but the hardware. (Yes, Linux will run on a mainframe but you can't get Oracle/Sybase/Informix on it.)

    The software is less of an issue. Any of the big commercial databases would do the trick (I prefer Oracle, but then I wrote the Oracle on Linux Installation HOWTO -- URL above). MySQL has no transactions or referential integrity, so even if it could handle the volume it wouldn't be appropriate. Don't think I'd trust PostgreSQL, either.

    Bottom line, I think you'd come out cheaper with the expensive hardware in the long term.
  • Take a look at Deja.com.
    All of that is run off of an Oracle database.

    The Database is HUGE!
    /dev/rd/c0d0p1 71706488 41278452 29710576 58% /v/10

    41GIG
    As long as you have the right indexes... you're all set..

    ChiefArcher
  • I think this really comes down to what you choose as your true goals: cost, speed, redundancy, scalability, or any other factor.

    You need to decide what things are important to you and start making choices.

    In the end, you will need to trade off all of these things to find your solution.

    Personally, I do a lot of work with banks at the moment. They want brand-name, proven tech, not necessarily the latest and greatest. On top of which, they are willing to pay for brand names. As a result, I would spring for a RAID tower coupled with a Sun box running Oracle. But if I wanted low cost, I would probably pick up a VA box or a custom build with a RAID tower and run Linux with Oracle, or maybe try PostgreSQL.

    It all ends up being a trade off.
  • Are you seriously suggesting running Oracle on an NFS filesystem?

    *jaw drops*

    I would have to recommend against this. By buying hardware RAID and an appropriate filesystem add-on (e.g. Veritas File System) you can get all the benefits of the filer with all the benefits of local disk.

    --

  • While I agree that lately a lot of questions have been pretty brain dead, I think a lot of these questions have been pretty good discussion starters and I get a lot out of the responses from people who either know what they are talking about or know how to look things up and provide a reference URL when they respond. There are usually at least a few of these people replying to most questions.

    I do ask that people avoid posting replies if they don't know for sure, or are too lazy to check their facts before posting. There are far too many people writing uninformed opinions and using phrases like AFAIK and IIRC to forgive themselves for not checking their facts before posting.

    Sorry this post turned into a rant.
  • My advice: don't skimp on buying the box. You will probably lose anything you save in admin costs on a cheap and not very good box.

    Yep, couldn't agree more. Don't even think of using anything other than hardware RAID for something like that, too. You won't regret it.

  • At work, a group implemented an Oracle data warehouse on Sun equipment running Solaris. The database sizes are expected to scale to 1 TB. But if I am remembering correctly, the cost for the equipment/Oracle was about $2.0 million.
  • Using raw disk partitions is not the same thing as Linux's "raw block device" support, which lets you access block devices without going through the buffer-cache layer. Some database programs want to use this so they can do their own caching, etc. However, lots of things access block devices such as /dev/hda1: mkfs and fdisk, for example. The raw block device support is still very new, and was developed by Stephen Tweedie. It uses kiobufs to do zero-copy I/O. You would know about this if you were at the second memory management talk, given by Ben LaHaise, at the Ottawa Linux Symposium last week :)

    So unless the database's release notes say not to use whole disk partitions, there should be no problem. The kernel lets you access a disk partition as a big file very easily.
    #define X(x,y) x##y
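
    To illustrate that last point, from userspace a partition really is just a big file. A minimal C sketch (the device node is an example; substitute your own, and you'll need root):

    /* Read the first sector of a disk partition as if it were a plain file. */
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        unsigned char sector[512];
        int fd = open("/dev/hda1", O_RDONLY);   /* needs root */
        if (fd < 0) { perror("open"); return 1; }
        ssize_t n = read(fd, sector, sizeof sector);
        if (n != (ssize_t)sizeof sector) { perror("read"); close(fd); return 1; }
        printf("first byte of /dev/hda1: 0x%02x\n", sector[0]);
        close(fd);
        return 0;
    }
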
  • Linux uses the native word size of the machine for file offsets, so on 32-bit architectures, file sizes are limited to 2GB, while on 64-bit archs, Linux can handle files up to 8EB (1 exabyte ~= 1 million terabytes).

    Recently, large file support on 32bit archs has been developed, but it isn't in the main kernel yet, AFAIK.
    #define X(x,y) x##y
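
    For the curious, the 2GB figure is just the largest signed 32-bit off_t (2^31 - 1 bytes). A sketch using the glibc large-file interfaces (lseek64/off64_t), which on 32-bit x86 need the kernel large-file support mentioned above:

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("bigfile.dat", O_CREAT | O_WRONLY | O_LARGEFILE, 0644);
        if (fd < 0) { perror("open"); return 1; }

        off64_t three_gb = 3LL * 1024 * 1024 * 1024;  /* > 2^31 - 1 */
        if (lseek64(fd, three_gb, SEEK_SET) < 0) {
            perror("lseek64");
            close(fd);
            return 1;
        }
        /* Without kernel large-file support this write fails (EFBIG);
         * with it, this creates a sparse 3GB file. */
        if (write(fd, "x", 1) != 1)
            perror("write");
        close(fd);
        return 0;
    }
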
  • If only the US Navy had your attitude. The whole "Gunship dead in the water" episode could have been avoided with a simple "Well, don't enter a zero there!"

    I know it must be hard to be a non-windows hater (that's a non-"windows hater", not a "non-windows" hater) and listen to the crap that's flung about around here, but you've reached an absurd level of defensiveness. You're defending software (an application AND an OS) that crashes (according to one report) when the user makes a simple mistake, and placing the blame on the user. An application should not crash when given invalid input. It should notify the user. An OS should not go down when an application misbehaves. It should kill the app, perhaps generate a core file, and keep on chugging.

    Now, Linux and other UNIXes are not without their own problems [cix.co.uk] in this regard, but at least the people responsible don't respond with "don't do that" when told about it. Neither does Oracle, I bet, but you do. You and Microsoft.
    --

  • You must be smoking crack. RAID 5 is optimized for throughput at the expense of seek time, and DBs don't give a fuck about throughput but, on the contrary, live and die by seek time. NO RAID 5 FOR DATABASES. Repeat 100 times.
  • The kind of database you're talking about is a far cry from what most people here will need, including the original poster. I don't doubt that very specific cases require exceptions to the rule; the rule being: no RAID-5 for databases. That's not just for simple web-type databases; it's actually stated in the O'Reilly Oracle DBA book, which addresses much wider needs.
  • In the Linux install notes, they claim that for optimal performance you have to split the Oracle install across 4 disks, which implies no RAID.

    But hey, am I supposed to have higher journalistic standards than the slashdot editors? Eh eh eh eh.

  • And one of the discs could also be a tape drive. Yeah.
  • seek times are dramatically improved in most (if not all) RAID levels

    Seek time is not going to be any better with mirroring, for one. The two heads reading the same data won't go faster than one head, will they?

    Then for striping, this usually won't make any kind of difference since data access will be randomly spread over the disk. So there you go.

    NOW, smartly organizing the database WITHOUT striping among several disks *will* make seek times faster; actually, it will require less seeking. A typical Oracle installation (as recommended by Oracle) will have, for example, the software on one disk, the indexes on another, and the actual data on a third.

    One DB transaction typically requires at least one index lookup and one data retrieval, which are unlikely to reside close to each other on one disk. When they're separated onto two disks, subsequent queries will have less seek time.

    Now, since I was right, will you give me my karma back? ;)

  • ...[S]ybase will NOT load database dumps made on different platforms...

    BTW - This is the case for most (all?) RDBMSes. The short answer is that binary database files (they store the data) are platform dependent.

    Some RDBMSes, like Oracle, allow you to specify the block size of your binary database files. But even if you have 2 files with the same block size, they may not be transferable between databases.

    --

  • Well, when you're at the 30 gigabyte size, your options as far as remote replication are a little limited. For example, it's not enough to warrant the sort of large-scale storage array that an EMC Symmetrix would offer, which comes with built-in remote replication (SRDF).

    What you could use is Veritas Volume Replicator. It runs as a service/daemon on your box and mirrors every write over IP to another box. It can be configured to do it synchronously (the I/O blocks until the remote I/O completes) or asynchronously (higher performance because there's no delay, but you run the risk of data loss when the db server goes down).

    Unfortunately, Veritas VR is not available under Linux - I think you said it wasn't under Linux anyway, but a lot of people are offering Linux solutions.

    Also, given that your database is only 30 gigabytes, do you actually do a lot of writes? Realistically, if you only do a couple of hundred inserts an hour, you could just insert the changed records into the remote DB every hour. Heck, do it every 5 minutes. I'm not familiar with Sybase, but with Oracle you can just replay the redo logs at the remote data center. That's going to be the cheapest option, and the most Linux-compatible.

    Anyway, if this is really enterprise-level, spring for Veritas - their stuff is expensive but really good.

    Cheers,
    Matt
    Matthew J Zito, CCNA

  • At my work, we run two large databases on Linux, one that's about 95 gigs and the other 160 gigs. Now, we run our production DB on Solaris, but the data warehouse and the ticketing DB are on Linux with Oracle. We've had great results with Linux for the most part - the biggest problems are that the documentation by Oracle is not as good for Linux as it is for Solaris, and it's harder to find DBAs with Linux experience.

    I can't vouch for Sybase's stability under Linux, but Oracle will do you just fine. Get a dual or quad-CPU box, depending on how much processing you need to do, and 2 gigs of RAM either way.

    Matt

    Matthew J Zito, CCNA
  • burton:~ # ps aux | grep mysql
    root 232 0.0 0.0 1904 0 ? SW Jul20 0:00 [safe_mysqld]
    root 246 0.0 0.1 11244 48 ? SN Jul20 0:00 [mysqld]
    root 254 0.0 0.1 11244 48 ? SN Jul20 0:00 [mysqld]
    root 255 0.0 0.1 11244 48 ? SN Jul20 0:00 [mysqld]

    What's your point again?

  • I would stick with Sybase
    Absolutely. I spent several months last year adding Oracle support to an application designed around SQL Server and cross-vendor development is really something you should avoid if you can. Leverage your existing DBA knowledge and you can probably use one DBA for both sites. If you do go with another vendor, you'll wind up with another DBA either on salary or on retainer.
  • yes, you can get Oracle for x86 Solaris.
  • I agree with you. RAID 10 will give a nice combination of safety and performance. If you're crazy (well, for just 30 gigs, perhaps not *that* crazy) there's the new Adaptec UDMA 66 RAID card, which I think may support RAID 10. It definitely supports RAID 5. Actually, I wonder how bad RAID 5 would actually be with one of those cards and five UATA 66 Maxtors with the 2MB cache on them. They are pretty fast drives....

    I don't know if the Adaptec card has its own caching, but it would be very cool if it did!
  • BSD is not 'derived from System V' - it forked off from Unix earlier, maybe version 7 Unix.

    Also, Solaris 2.x is based on SVR4 (System V Release 4) - SVR4 is quite upward compatible with SVR3.x.

    And Solaris is not spelt with a 'u'...

    Linux was not 'built on Posix' (not a meaningful term, Posix is an API spec) but I believe Linus tried quite hard to conform to the 1003.1 specs, and the bash people have tried to conform to the POSIX shell specs.
  • Your subject is patently false. I defy you to show me where Oracle officially recommends against RAID.
  • by ansible ( 9585 ) on Wednesday July 26, 2000 @04:47AM (#905020) Journal

    The problem with a caching controller is that unless it's well engineered (with its own battery backup), you're more likely to run into filesystem corruption in the case of a power failure or OS crash.

    A standard filesystem (such as ext2) on top of RAID5 will never be fast for small writes.

    NetApps get around this because the WAFL filesystem is explicitly designed to sit atop a RAID4 drive array.

    And there is a difference between RAID10 and RAID0+1.

    RAID10 is a stripe of mirrors. Each pair of disks stores the same information (RAID1), and a stripe is created over those mirrors. This can tolerate multiple drive failures as long as at least one drive from each mirror is working.

    RAID0+1 is a mirror of stripes. Two stripes are created(RAID0), each with half the total of disks. These stripes are then mirrored(RAID1). The problem here is that if a drive goes out, it takes out the entire stripe. If a drive in the other stripe goes out before the rebuild is complete, you're hosed.

    Normally RAID systems (like RAID5) can't tolerate more than 1 drive failing at the same time. However, RAID10 provides more protection than RAID0+1, at the same price.
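
    To make that concrete, here is a quick C sketch that counts which two-disk failures are survivable for eight disks under each layout (a toy model: independent failures, no rebuild in between):

    /* RAID10: mirrored pairs (0,1)(2,3)(4,5)(6,7), striped across pairs;
     *         fatal only when both failures hit the same mirror pair.
     * RAID0+1: stripes {0..3} and {4..7}, mirrored; one failure kills its
     *          whole stripe, so only same-stripe double failures survive.
     */
    #include <stdio.h>
    #include <stdbool.h>

    #define N 8

    static bool raid10_survives(int a, int b) { return a / 2 != b / 2; }
    static bool raid01_survives(int a, int b) { return a / 4 == b / 4; }

    int main(void)
    {
        int pairs = 0, ok10 = 0, ok01 = 0;
        for (int a = 0; a < N; a++)
            for (int b = a + 1; b < N; b++) {
                pairs++;
                ok10 += raid10_survives(a, b);
                ok01 += raid01_survives(a, b);
            }
        /* prints: pairs=28 raid10=24 raid0+1=12 */
        printf("pairs=%d raid10=%d raid0+1=%d\n", pairs, ok10, ok01);
        return 0;
    }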

  • ``With Intel, isn't the limit still 4CPU for the latest generation?''

    I don't think so. Didn't the recent benchmark comparing IIS vs. the new webserver from Red Hat run on an 8-CPU SMP system? You can get more CPUs... you just don't see them in the advertising aimed at Joe Sixpack. They tend to be just a bit on the pricey side.

    Cheers...
    --

  • ``Alpha is cheap. A reasonably good alpha is under 5000$.''

    I think you're talking about an Alpha-based workstation. No one's going to be hosting a 30+GB database on a workstation. They would be looking at a DS10 or DS20 at a minimum. Expect to pay something in the area of US$20K for a smallishly configured DS20.

    ``Storage will be a 1000$ more.''

    A whopping $1000 for disk space to host a database? Only if you plan on sticking the entire thing on a single 36GB drive which would be an inexcusable performance hit. And that would leave no money for any kind of mirroring.

    I guess this $6000 configuration isn't intended for a production system.
    --

  • ``Remember, raw devices also means only one file per disk.''

    Depends on your UNIX. Under Tru64 and some other Unices, you have storage management tools (under Tru64 there's Logical Storage Manager, for example) that'll let you slice up a disk into as many pieces as you like. You then access the disks through either /dev/vol/... or /dev/rvol/... (if you really want to use raw data partitions). Striping across SCSI adapters for better I/O performance is quite easy.

    ``P.S. $2K is way low for both the hardware and the database!''

    Agreed. Some of the dollar estimates that people are throwing around are fairly humorous.
    --

  • by rnturn ( 11092 ) on Wednesday July 26, 2000 @04:46AM (#905024)
    ``oracle uses its own raw partitions/filesystem to store its data. this speeds up oracle''

    It doesn't have to manage its own disk space. And it may, under certain conditions, provide better performance. We have been moving away from raw data partitions, after running some benchmarks of a large table residing on raw partitions vs. the same data residing in tables in a filesystem. The performance was actually better when accessing the data in the filesystem; we're talking 10+% better performance, not just a few percent. Our experience, based on our benchmarks and discussions with Oracle technical people, is that the preference for using raw data partitions was based on performance tests using older versions of UNIX and less capable filesystems. Of course, your mileage may vary.

    Aside from performance, if your database changes frequently, adding and deleting tablespaces is a major pain (with long downtime) when you're using raw data partitions, but is a snap when you're using filesystems for data. If your database is fairly static, raw partitions might buy you some little bit of performance but, again, at the expense of manageability. IMHO, raw data partitions just aren't worth it. Even if comparative performance were a wash, the easier means of managing the database weighs in favor of filesystems.


    --

  • Yes, using a Network Appliance Filer means using Oracle with NFS - but this is a solution developed by NetApp in conjunction with Oracle, and is AFAIK the only NFS solution Oracle recommends.

    (No I don't work for NetApp.)

    We're considering Filers to replace local disk on some of our Sun 450s (running Oracle 8.1.6) at the place I work.
  • We here at Giganto Communications (name concealed for my protection) use Intel-based multi-processor machines exclusively for all our important database needs. You obviously are in the minority when the largest communication companies in the world will do what you will not.

    As for Linux, no. We suffer with NT crashes, but we are sneaking Linux in the door, one server at a time, until.... well, you get the picture :-)
  • I think you're talking about an Alpha-based workstation. No one's going to be hosting a 30+GB database on a workstation. They would be looking at a DS10 or DS20 at a minimum. Expect to pay something in the area of US$20K for a smallishly configured DS20.

    That is the UK price for a DS10L. You do not need the expandability of a DS10 or DS20. Also, AXP has even cheaper machines, sold in the UK by Evolution.

    A whopping $1000 for disk space to host a database? Only if you plan on sticking the entire thing on a single 36GB drive which would be an inexcusable performance hit. And that would leave no money for any kind of mirroring.

    You are right; I was off by 2-3 times. I was thinking of an external IDE-RAID-to-SCSI box. It works fast enough and is cheap enough. If necessary, mirror two or more at RAID 0.

  • by arivanov ( 12034 ) on Wednesday July 26, 2000 @03:30AM (#905028) Homepage
    Very bad idea. Or maybe even "Stupidity is limitless"

    1. In case you haven't noticed, Oracle legal has walked around to every single site that had Oracle vs. X benchmarks (X = MySQL, Sybase, Informix) and made them drop them. This is actually possible under the 8.0x EULA. Actually, just read the EULA. It is a masterpiece in itself. You are not allowed to benchmark the product and not allowed to question the fact that it is fscking slow and not ANSI compliant. Besides which, if I were you I would not buy something whose manufacturer intentionally disallows fair comparison with other products. That alone is enough to say fsck this, at least for me...

    2. The original database is on Sybase. Sybase is at least more or less syntactically ANSI SQL compliant. Oracle is as far from ANSI as it gets. It is a good guess that it will take you ages to port the bloody thing, and porting it will be more expensive than the "expensive" hardware.

    3. I would see if the database design is implementable under PostgreSQL or MySQL on an Alpha. Alpha is cheap. A reasonably good Alpha is under $5000. Storage will be $1000 more. This is as much as an appropriate x86 box. Postgres does not have a 2GB database limit anyway, as it splits database files. MySQL does not have this limit on Alpha because the platform is 64-bit. Your problems are the key limitation/LOB interface for Postgres and transactions for MySQL.

    4. If neither of the solutions in 3 is implementable, you have to open your wallet wide and buy Informix for Intel or DB2 for Intel. Both of them work and are ANSI compliant. BTW, the DB2 for Intel Linux developer edition is free. Free, period. No expiration. So you can actually see if the database will work. And they match Oracle on some benchmarks, and DB2 beats the crap out of it when it comes to real scalability and clustering.

  • Oracle on Linux may be the best choice. DO NOT, I repeat, DO NOT skimp on hardware. Some of the Linux platforms advertised in Linux Journal would be your best bet for industrial-strength hardware.

    Much Success!!!!
  • by muchandr ( 12588 )
    ...will do your mirror asynchronously, with no impact on performance. Will your obese monster of a database do it? You can download from www.kx.com
  • I would skip the filesystem layer and run off partitions (or /dev/md devices?) directly. Although I've only had 10GB at a time going under it, and on actual partition devices, not RAID devices (I am resource-underprivileged, unfortunately), Adabas D handles this very well. I'm sure other commercial databases would as well, if they are truly supporting Linux rather than just paying it lip service =)

    I see absolutely no issues with pushing Adabas D farther than I have; it has not had any issues whatsoever. Of course, if you used files on ext2 or ReiserFS or whatever, you would have unnecessary slowdown and potential instability, so use disk partitions.
    --
    Paranoid
  • by jetson123 ( 13128 ) on Wednesday July 26, 2000 @02:54AM (#905032)
    Of course, UNIX can handle it, probably better than just about anything else out there. Linux isn't UNIX, of course, and whether Linux can handle it is a different question. It probably can if you find the right software (I'd give DB2 a try).

    But why ever would you replicate a database to a different kind of server? If the original database runs on Sybase SQL on whatever, then the obvious answer is to replicate it to an identical setup. Anything else, whether mission critical or not, is just going to be a lot more work, training, and maintenance.

  • Is Unix capable of handling a database of this size and what other terrible pitfalls do you foresee?

    MacOS and Filemaker.

    Honestly, though, of course "Unix" can handle a database this size... it all depends on what hardware your "unix" is running on. Obviously Linux or *BSD on a 386, 486, or Pentium system won't cut it, but if you up your ante to a dual P-II or P-III system, or even a quad P-II Xeon system (which should be relatively "cheap" compared to offerings from Sun and Compaq), you'll be well on your way...
  • In an interview with Linus himself he stated that he has an 8 CPU Intel....
  • True :)
    Above that... Linux is really a Unix clone. (People who call it a "work-alike" are just being cute... it's a clone, that's all.)
    Sun entered the market and really got its fame with SunOS (a BSD-based Unix clone).

    Linux is usually called a *nix, not a Unix, because it is not licensed from AT&T or SCO (or anyone else who held the trademark).

    It should be noted that BSD and Solarus are Unix forks. The BSD dev group and Sun must maintain compatibility by relying on documented standards, just like Linux.
    [Noted because some Unix people who dislike Linux will attack Linux because it is built on standards, not on the actual code. The idea being that Solarus and BSD are the same code and compatible by default. This is false, for the reason above. The below is just to extend the point, nothing more.]

    Solarus and BSD are forked from different code. BSD is from the original AT&T code, later known as SysV. Solarus is from a total rewrite in the 1990s known as SrV. SysV and SrV are not compatible.

    So in reality Linux, BSD and Solarus are three totally unique (and mutually compatible) operating systems, Linux being the only one of the three with no license to the name Unix.

    Over time many Unix clones were incorrectly called Unix. However, this fact was less than noticeable, as forks and clones had no standards to follow and ended up pretty much being mutually incompatible.

    Linux was built on Posix, the first effort to correct this issue.

    On a side note... Linux disoriented me because I learned Unix on an AT&T 3B2/300. But Linux didn't throw me much.

    One gripe people have about Linux is that it is possible to write Linux-only code that does not work on BSD or Solarus.
    While true, it is equally possible to write BSD-only or Solarus-only code.
    It is an effort of the programmer to maintain portability. Failing that, it does not matter what operating system the code was made on.
  • BSD is licensed from AT&T and thus is a Unix.

    Small issue :)
    I believe BSD was never certified... it simply is, by age alone.
  • As to the original question, can Linux handle a 30 GB database, my answer would be "Yes, but it will hurt". Ever try staging more than 2GB of data on ext2? Ever try moving more than 1GB of data on ext2 with less than a 4KB block size? It hurts!

    My understanding is that Oracle can now use its own filesystem on Linux... I don't know very much about Oracle's proprietary FS... but my thinking is that it would make life easier. I dunno. Anyone else know?

  • by FascDot Killed My Pr ( 24021 ) on Wednesday July 26, 2000 @02:04AM (#905051)
    The title and summary say "Can Unix handle it?" while the "below the fold" area asks "Can Linux/Intel handle it?".

    I'd say the answer to the first question is a resounding "duh!". The answer to the second is a resounding "probably".

    I found Oracle on Linux to be quite usable and nice (except for lame non-readline-enabled interactive tools) and fairly fast. But there is something...incongruous about spending $2000 on hardware, $2000 on Oracle and then using a free OS (that you WILL have to tweak to optimize).

    Other tidbits:
    1) Do NOT, I repeat NOT NOT NOT use Oracle on NT. The (evaluation) version I tried sucked BIG TIME. The bulk loader didn't properly support all the file formats it was supposed to and I was able to repeatedly crash the box by mistyping field names into the table creator GUI. Add all the problems of NT (no real remote management, etc) and you have yourselves the makings of a nightmare.

    2) Raw devices are for more than recovery. They also help in the speed department. If you are going to be loading 30+ GB of data multiple times (this is a backup, right?) you are going to want speed. IIRC, ~100MB took about 5 minutes to bulk load (raw, not insert) on Oracle for Linux. That's 25 hours of load time for 30 GB.

    3) Can't you take the backups from your primary DB and load them as restores to the backup DB? That would save tons of time and effort (up front AND ongoing).
    --
    Give us our karma back! Punish Karma Whores through meta-mod!
  • by Dacta ( 24628 )

    It's not really a platform problem. You might have to partition the DB into multiple files to get around the 2GB file size limit on Linux (I think Sybase can do that), but I doubt there would be any other real problem.

    Sybase runs on Linux, of course, so there is no problem there.

    I'd ask in the Sybase newsgroups about the biggest database they have seen on Linux - they have a good reputation for quick answers. (About the only good thing I have to say about Sybase, but still....)

    I'd be surprised if there aren't quite a few Linux DB's bigger than 30GB anyway.

  • I looked at this sort of thing over a year ago, and previous posters are right: 30 GB is kinda puny. Here are the headaches I had that I hope you can avoid.

    1. First and foremost, stack the box with as many SCSI adapters as you can. I/O quickly becomes a bottleneck on large DB systems. Also, if you're doing Linux, go with Linux's built-in RAID; I hear it's faster than the hardware RAID cards you can buy out there. That said, be sure to get more than one hot processor; you're going to be using a goodly portion of one of them to do your RAID.

    2. A journaling filesystem would be good. I don't know of any available for Linux (except maybe XFS; what's the status of that?), and you really, really don't want to fsck your RAID 5+1 (yes, I said 5+1).

    3. Unless you have the funds to implement a slightly lower performance box, expect to be developing on a separate instance on this same server. That means, worst case, another 30 GB of space for the new instance, which will also require a kernel recompile to get the shared memory and semaphore settings right. (You are using Oracle, aren't you? ;0)

    4. Better yet, get your requirements up front for the number of instances and design the hardware for that number + 2, and tune the kernel appropriately. Whatever Oracle gives you for kernel parameters, multiply them by the number of instances (see the sketch after this list).

    5. Don't sweat the raw devices stuff. It's generally more trouble than it's worth. It makes backups harder, makes restores harder, and makes RAID harder. It's just not worth the headache.

    6. Invest in a nice DLT library that is supported, up front. Get your backup scheme in place, even if it's just your DBAs writing dump files nightly. A good DBA can restore from a dump in a few hours, AND they can restore a dump of production to your development database, making those refreshes from production a fairly painless task (and management/developers/DBAs *WILL* ask for refreshes from production).

    7. DON'T consider RAID 5, unless it's 5+1. RAID 5 can be murder on DB performance, especially in a VLDB where you perform inserts (it's a little less bad on data warehouses). Think 1+0 or 0+1, and span the + across multiple controllers/disk arrays.

    8. Don't skimp on your DBA. In reality most any competent SA can administer a DB *system*; sink your payroll money into a very good DBA. It will save you in downtime and calls to Oracle later. (You are using Oracle, aren't you? ;0)

    g:wq
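
    Regarding point 4, the scaling itself is trivial; a C sketch with hypothetical per-instance numbers (use whatever your RDBMS install guide actually specifies):

    /* Scale per-instance IPC requirements by the planned instance count.
     * The per-instance values below are made up for illustration only. */
    #include <stdio.h>

    int main(void)
    {
        long per_instance_shmmax = 128L * 1024 * 1024;  /* hypothetical */
        int  per_instance_semmns = 200;                 /* hypothetical */
        int  instances = 4 + 2;                         /* planned + 2, per point 4 */

        printf("kernel shmmax >= %ld bytes\n", per_instance_shmmax * instances);
        printf("kernel semmns >= %d\n", per_instance_semmns * instances);
        return 0;
    }
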
  • Do the indices fit into main memory? "Not close" won't do - a good fit is needed. So you need at least 50% more memory than the indices, to be on the safe side (you wanna do joins, etc...).

    Are there big tables, each one > memory/2? Or are there 1000 small ones? (We're talking about real memory here, not virtual.)

    The rules are:

    1) design
    2) choose hardware and software based on the details of 1.

    Sometimes a little redesign makes it possible to have more freedom in hardware/software...

    (The 50% I mention above is a value from experience. The more flexibility you need, the larger the real memory needs to be. Having the indices in RAM and 50% of the memory free to work with gives you a fair amount of flexibility. Driven by the needs of the application, the % can be 20% too; it depends on how often you create entries, what and how often you look up fields, and what joins are necessary to do that... this gets really complex... ;-)
  • You are totally right; I wanted to head off your answer with my last addendum. I personally use very big tables with a few indices, because the data is structured like that, therefore the 50%.

    If that's the RARE application, I'm sorry. I thought mine was pretty common ;-)

    My mail was mainly about thinking about the design, and how you could use the memory you have for faster access, not about the exact %. (Leaving everything to Oracle is a non-hacker's choice :-)

    But I stay with the rule that the indices used by common queries should, if possible, stay in memory... (complex situations make this impossible, of course)

    Ask Slashdot: does anybody here have >1000 tables with >3 indices each?

    :-)
  • by DamageBoy ( 28870 ) on Wednesday July 26, 2000 @02:03AM (#905057)
    Unix systems handle the largest databases known to mankind as we speak. Databases managed by Unix systems have been known to be in the vicinity of 2-6TB.

    Your question seems to refer to Unix-on-x86 databases of that size. Of course, running Unix on x86 systems usually boils down to running Linux...

    Linux is officially supported by both Oracle and Informix, and I think even Sybase, although I'm not completely sure about that.

    Obviously, staying on the same RDBMS would be easier to accomplish, so you'd probably want Sybase to support Linux.

    You'd also want RAID 5, preferably hardware which is supported by Linux.

    You'd probably want to use some sort of journaling filesystem. I myself have no problem trusting the beta versions of ReiserFS. I've also run Oracle on them without any problem.

    If you feel reluctant about using bleeding-edge kernel patches for a production environment, I can only recommend that you use SMALL ext2 partitions to avoid catastrophic fsck times, and let Oracle / the RDBMS do its magic in managing a single 30GB database over smaller files...

  • Oracle over NFS to a NetApp Filer would work fine on a Sun or such. But despite HJ's valiant efforts, Linux's NFS isn't there yet. Linux is getting there, but there are still real good reasons to go buy those Suns if you've got a big, mission-critical problem.
  • You can absolutely do this. Now, depending on which version of Sybase ASE you use, you may run into some dumb limitations. For instance, version 11.9.2 has a 2GB limit for the size of a device, so you have to partition your disks into 2GB slices, and distribute your database across multiple devices. I think they increased the size in ASE 12, but I haven't worked with that yet so I don't know what the limit is there.
  • I'm using Sybase ASE 11.9.2 as my company's database, and running it on Red Hat Linux 6.2. We've found that using raw partitions can work, but with this version of Sybase the largest you can get a partition is 2GB so you have to distribute your database across several devices. That's no big deal, though.

    Now, if you wanna talk about performance...get yourself a RAID and use a multiprocessor system. Sybase understands SMP systems and the RAID will help you on your I/O.

  • Are you seriously suggesting running Oracle on an NFS filesystem?
    Not only am I suggesting it, but in timing tests, Oracle is performing a little better over the Filer! We were previously using a locally-attached diff-SCSI Sun A1000.

    Oh, also, you can write your Oracle redo logs to the Filer, even though they recommend against doing so to anything other than flat disk or RAID 0/1. Why? Because the Filer uses a journaling filesystem in NVRAM, so the writes happen as fast as the wire (GigE in our case) can run.

    I would have to recommend against this. By buying hardware RAID and an appropriate filesystem add-on (e.g. Veritas File System) you can get all the benefits of the filer with all the benefits of local disk.
    How long does it take to back up that local disk? For us, it takes about 2 seconds, and takes up almost NO STORAGE SPACE! The Filer has a feature called "snapshot" which is basically a copy-on-write filesystem. You tell it to snap and it comes back after a second or two. After that, you can always go back to that point in time and recover files on-line, without any sort of programmatic interface (just filesystem access). There is even an add-on package called SnapRestore that will instantly restore the entire filesystem to that previous state....

    So, get this, our Oracle backup is: put all of our Oracle tablespaces in locked/suspended mode; tell the Filer to snap; unlock the tablespaces. Now if we ever need to restore, we just bring Oracle down, swap in the old data files, and bring Oracle up. We can also do tape backups this way, as the Filer backup program uses snapshots. Thus, as soon as a backup is started, you can start writing to the data again safely!

  • by ajs ( 35943 ) <ajs.ajs@com> on Wednesday July 26, 2000 @02:14AM (#905064) Homepage Journal
    I run a database of this size, and it's not a challenge. Cost is very high, but that's mostly because a database of that size is one that you cannot afford to have to restore.

    I currently use a Sun architecture, but I know of sites that use Intel/Linux, HP PA-RISC and even (may all the little gods help you) Intel/MS SQL Server, which does have its place in non-mission-critical settings where you're never going to have a good DBA.

    I can seriously recommend the Network Appliance Filer for back-end storage. Their claim that their network-attached storage array is faster than local disk sounds silly on the face of it, but there are good and valid reasons that it's true (mostly due to their journaling and caching strategy which is highly optimized for NFS). The Filer makes databases a lot easier to manage. For example, the Filer can make an online backup in less than 5 seconds, no matter how much data you have!

    Back to your original point: 30GB is small, don't sweat it. But, don't cut corners either!
  • Is that 2 GB limit a 32-bit limitation or is that limitation also present on 64-bit machines?
  • I work for France Telecom, as the SysAdmin for Voila.com

    We use Linux exclusively on our servers. (Well, except for one lil box running NT to interface with Reuters, because they refuse to make their proprietary client for Linux)

    Our current database is around 4 Terabytes. It sits on about 80 servers all running Linux.

    Admittedly, we use a custom database package, developed in house, and not an RDBMS, but when you're dealing with such a specific dataset (we index web pages... that's it...) you don't need the flexibility of Sybase.

    Then there's Google... How many thousands of Linux boxes are they running? How huge is their database?

    So yes, Linux is more than capable of handling a puny 30 gig database. Heck, I have more than 30 gigs of data indexed on my HOME machine (30 gigs of MP3s, all indexed and cataloged with Postgres). Not quite the same as a "30 gig database", but similar.
  • Whenever you build a database, you must look at how it will be used before you make physical layout decisions. The asker here specified a data warehouse, which to me implies a DB which will be written once, then read hundreds of times afterwards. With an R/W ratio that high, write performance is only a minor consideration compared to read performance. While RAID-10 would give great all-around performance, for read access it won't do an awful lot better than RAID-5, at just over half the hardware cost.

    So for a data warehouse I would not hesitate to do RAID-5.

    As for mirroring, I can't speak for Sybase, but Oracle supports a wide variety of mirroring and networked DB options. I would look into something akin to snapshots, which are read-only copies of a master database. Designate one copy of the DB as the write-to master, and snapshot it over. Of course, this all depends on why you're mirroring. If you are doing this for redundancy in the event of catastrophe, look at your loss tolerance and acceptable downtime. You could do something as simple as making a copy of the database remotely, then copying over your redo logs at every log switch. Then if your database fails, use the redo logs to roll your remote database forward, and bring it online.

    World of possibilities.

    --

  • That's not what "not mission critical" means. Not mission critical means that it is okay if you have to do a reboot and the server is down for 10 minutes. It means that you don't have to have a redundant cluster to make sure that if one node goes down you don't need to take the database offline. Losing an entire 30GB database and having to reload it is unacceptable under any circumstances. Treating it as mission critical usually means cost takes a back seat to having 100% uptime. I don't think that's what he needs.
  • You can throw up to a 64 MB EDO DIMM on them.
  • Just FYI: a 30GB database doesn't imply one file. I have a 10GB+ Oracle on Linux database right now; Oracle organizes data into tablespaces which contain one or more data files. The data files can be spread over any number of partitions; in fact for performance it's better to spread them over multiple disks.

    Now what I'm doing isn't mission critical, so I can't comment on that aspect of it, but I will say this: a 30GB database will certainly require more than 1GB of memory.

    --jb
  • Wal-Mart's Teradata data warehouse is one of the biggest (if not the biggest) data warehouses in the world. You can read about it at NCR's website [ncr.com]. In the article, they say it is 7.5 terabytes, but from what I have heard, they now have two warehouses that total 110 terabytes.

    It runs on NCR's 5200 system [ncr.com], which is based on Intel architecture. It scales up to 512 nodes, with 1-4 Intel processors per node.

    The operating system is NCR's MP-RAS (a flavor of UNIX that runs on Intel architecture). I'm not sure if it runs Linux ;-)

    *disclaimer*
    I _do_ work for NCR, but I just thought this was some neat information. I don't work in our data warehousing department. The system above would cost many millions of dollars, so it's out of the range of the average /. reader, and if you are going to spend that money on a data warehouse, chances are you are talking with NCR anyway.

  • by JonK ( 82641 ) on Wednesday July 26, 2000 @02:23AM (#905112)
    Bad Idea.

    Changing RDBMSs is a Really Painful Experience and one to be avoided at all costs if possible: it makes changing OSes look trivial (hell, even upgrading from one point release to the next can be a world of pain). If the data's already on Sybase then for god's sake keep it on Sybase. Go for Sybase on Linux, Sybase on SCO, Sybase on NT or whatever but remember: it's a RDBMS and the underlying platform is effectively irrelevant (pauses for flames as thousands of enraged Slashdotters start to spout off and steam at the ears)
    --
    Cheers

  • This was my thought when I saw the "Linux = x86 UNIX" posts.

    For those people, a lot of *NIXes are available for the x86 platform:

    • Most Linux distro's
    • Free/Open/NetBSD, and BSDi
    • Solaris
    • SCO
    • etc...

    Linux may be the most publicized of the x86 *NIXes, but there are others. In fact, I would recommend that the DB mentioned in this question be run on a BSD. If FreeBSD can handle Hotmail, it can handle almost anything, IMHO.

    Dave

    BTW: Before I get flamed, the Hotmail/FreeBSD thing I remember from somewhere, but I can't remember where. I do know it's NOT on an NT box, which basically leaves UNIX.

  • by bero-rh ( 98815 ) <bero@reCOMMAdhat.com minus punct> on Wednesday July 26, 2000 @02:07AM (#905125) Homepage
    You might have to partition the DB into multiple files to get around the 2GB file size limit on Linux

    Or patch the kernel so it doesn't have the limit.
    Patches for this are available; if you don't want to build your own kernel, get the Red Hat Linux Enterprise Edition, which has this patch by default.
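    For Sybase itself, the same idea works at the device level: build the database from several sub-2GB devices instead of one big file. A rough sketch with hypothetical names, paths, and sizes (ASE's disk init takes its size in 2K pages, so 1024000 pages = 2000MB):

    disk init name = "wh_dev1",
        physname = "/sybase/data/wh_dev1.dat",
        vdevno = 2, size = 1024000
    go
    disk init name = "wh_dev2",
        physname = "/sybase/data/wh_dev2.dat",
        vdevno = 3, size = 1024000
    go
    create database warehouse on wh_dev1 = 2000, wh_dev2 = 2000
    go

    On 11.9.2 the physnames could point at raw partitions instead of filesystem files.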
  • by Rebar ( 110559 ) on Wednesday July 26, 2000 @04:42AM (#905136)
    Yes, RAID 5, in hardware, thankyouverymuch.

    Like most everyone else, you are assuming all databases are OLTP systems. Data warehousing or data analysis, on the other hand, requires MASSIVE data transfer rates (mostly read activity), and RAID 5 with large stripe sizes and multiple arrays works really well for this type of database. Most queries against the roughly 3TB database I currently work on run in several minutes, passing somewhere under 100GB of data each, and if we had used OLTP tactics (indexes to join everything, small block size for low latency reads, etc.) to tune the database, they would run in hours or days instead of minutes. Aggregate I/O rates on this monster can exceed 500MBytes/second.

    As to the original question, can Linux handle a 30 GB database, my answer would be "Yes, but it will hurt". Ever try staging more than 2GB of data on ext2? Ever try moving more than 1GB of data on ext2 with less than a 4KB block size? It hurts!

    Someone please tell me that I will be able to use large files painlessly on Linux sometime. Until then, run large databases on name brand UNIX servers with name brand UNIX. Linux on x86 is good at a lot of things, but a large database isn't one of them YET.

    SQL> select sum(bytes) from dba_data_files;

    SUM(BYTES)
    ----------
    2.9003E+12

    And every byte is on RAID 5.

  • Oracle for Linux.

    Should easily be able to handle this. And considering the size, anything less would probably not be a good idea.

  • None of the free Unixes (GNU/Linux, NetBSD, FreeBSD, GNU/Hurd, OpenBSD) is "real" Unix, because none of them is certified as UNIX(R). A general misunderstanding among *BSD people is that *BSD is Unix while Linux is not.

    BTW, I think (not sure) that there is some group working on certifying GNU/Linux as UNIX(R). I believe it costs at least $10,000 to get this certification...
  • by tjwhaynes ( 114792 ) on Wednesday July 26, 2000 @02:32AM (#905145)

    Look - 30GB database? Let's just look at the necessities first and then we'll get down to a choice of vendor (because you are going to want a reasonably heavyweight database server for this).

    30GB of data. Okay - so you aren't mission critical. Even so, with that amount of data, you probably want a hot-swappable redundant system such as RAID if availability means anything to you. But these days you have lots of choices for RAID, including software RAID under Linux. I'd probably still go for a hardware solution for RAID, but that is because I'm not clued up on how robust and failure-proof the Linux RAID is when one of the disks dies. If you don't care about redundancy, 40GB drives are easily found. For performance reasons you might want to find four drives of say 15GB each so that random access to the drives can be done in near parallel, especially if you stripe the drives, but that is yet another option.

    Working through 30GB of data is going to require some reasonable memory space - think 512MB minimum and work up from there. Of course, you could run it on far, far less (say 80MB) but you will pay a performance penalty - the database products I know about have plenty of tricks up their sleeves if they have spare memory to play with, and resort to paging out to disk when things get tight.
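    For the Sybase server in the original question, the relevant knob is ASE's "total memory" parameter, which is counted in 2K pages. A purely illustrative 512MB allocation (the figure is an assumption, not a recommendation):

    sp_configure "total memory", 262144
    go
    -- 262144 x 2K pages = 512MB; ASE 11.x needs a restart to pick this up.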

    The choice of software is important too. I'll declare my biases up front and say go for DB2 Universal Database, partly because I work on it and I like it. Your other choices are Oracle, obviously, and there are a host of other database vendors out there for Unix systems across the board. DB2 UDB is easier to administer and looks to be faster than Oracle, as well as generally being cheaper to deploy. As far as functionality goes, everybody nowadays assures SQL92 conformance. SQL99 core conformance isn't much to hoot about, as it's basically SQL92. The SQL99 spec is far more modular than the SQL92 spec, so it's easier to match the base functionality and then add on SQL99 conformance for, say, the multimedia extensions, later.

    So the answer to your question is yes - it is possible to deploy a 30GB database on Unix. And it is definitely possible to deploy the same database on Linux - both IBM and Oracle have versions of their databases on Linux.

    Cheers,

    Toby Haynes

  • I have a few beefs with your post...

    1. Oracle on NT
    a. '~crash by mistyping...' The answer to that, of course, is to not mistype mission-critical data; you should be using scripts for bulk transfer anyway.
    b. '~no remote management on NT' Where the hell does this come from? Oh, I know. You have no clue how to manage an NT enterprise, you're just talking out your a$$. As an NT engineer, I can do ANYTHING from my laptop, from ANYWHERE in the world, using only MS tools and a few scripts I wrote in VBScript. I concede that sometimes it would be nice to have a 'true' terminal connection to the server, but you don't 'need' it.

    2. '~Bulk import of Oracle for Linux.' Only 20MB a minute? bwahahahaha. I can reload data into my NT MS SQL server at over 150MB PER MINUTE. So, after a 3.5 hour restore, I can go to the bar and get a few brews while you sweat away for almost ANOTHER ENTIRE DAY. Oh, and get this: I can buy RAID tape devices that are supported under NT, and restore at up to 600MB per minute. Anybody want to restore a 30GB db in less than an hour? I do.

    3. '~use restores on backup db' Why not use any one of many excellent database mirroring/synchronization products on the market? Set up both servers, specify replication partners, and don't worry about it anymore.

    You may want to learn how to manage enterprise applications before spouting off on them. And before anyone flames me as a M$ booster or something, let me say that I do actually have and use *nix systems in my work. However, they do not run my enterprise messaging applications, databases, etc.; they are for development processes, because my company's clients require us to test on compliant systems. (I also use one of them for security / penetration testing. The tools developed for the *nix platform are better than on NT. However, I believe this is due to a loyal fanbase of long-time *nix users, and not because of any perceived flaws or inequalities of other OSs.)

    EvilGrin - Fighting misinformation wherever it can be found.
  • Yes, yes it is. I believe that UNIX is the OS that the IRS uses for their database, which is many, many terabytes in size. My friend's father works for them, and we discussed this in my file and data structures class.

  • You seem like a pretty clueful DBA, so I won't reiterate anything you can easily pick up by reading the documentation.

    I'm in the middle of doing a feasibility study of migrating our flagship database (~30GB ASE 11.5) from big-iron AIX boxen to commodity x86 boxen running Linux / ASE 11.9.2

    I have not found the dump/load incompatibility to be a major hassle. If you tune your Linux box for fast BCP the load shouldn't be too painful. As an alternative, you might try using DBArtisan from Embarcadero Technologies [embarcadero.com]. It has a migration feature that makes moving data and schemas between servers very painless. It is well worth the price ($5000, IIRC) - it will pay for itself quickly in time savings alone.

    In my test setup, I was able to move our 30GB database from the AIX box to the Linux box in about 10 hours, which fits within our normal scheduled maintenance window. The AIX box is a 4-way RS/6000 box w/ 1 GB and all the storage allocated as virtual partitions on a RAID-5 array (I didn't set this up). The Linux box is a quad Xeon w/ 1 GB of RAM and 8 drives; I'm using raw partitions and doing my mirroring manually from within Sybase. DBArtisan runs on an Athlon 550 w/ 128MB under NT Workstation.

    The AIX box is a little simpler to manage, because the old DBA had all the tables on the default segment. Even though it's more work, I prefer to hand-tune the database and place the big and/or active tables on their own segments & devices. Needless to say, you need to be comfortable using sp_placeobject & sp_partition to take this approach. I find that the extra effort setting up the server pays off in the long term in performance and reliability. Barring the difference in the physical storage strategy, I don't see any factor that makes ASE on Linux more difficult to administer than ASE on any other flavor of Unix. Actually, the OS-level administration is simpler in Linux than in AIX, IMHO.

    Since you say this is going to be a data warehouse system, you REALLY want to use partitioning so you can take advantage of parallelism (a bare-bones sketch follows below). Re-read chapters 13, 14, 15, and 17 of the Performance & Tuning Guide [sybase.com] before you start; you'll be glad you did.
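    Something like this, where the segment, device, and table names are all hypothetical:

    sp_addsegment fact_seg, warehouse, wh_dev3
    go
    -- Future allocations for the big fact table now come from fact_seg:
    sp_placeobject fact_seg, sales_fact
    go
    -- Slice the table so worker processes can scan it in parallel:
    alter table sales_fact partition 8
    go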

    I don't know what your uptime requirements are, so I can't say if Linux is robust enough for you. If you need rock-solid 24x7 availability, I'd say stick with big iron and commercial Unix. If you don't need to be bulletproof, Linux should be fine. For us, the cost savings are worth the slightly higher risk. As I write this, our Linux test server has 63 days uptime and has survived several stress-tests with no problems, so reliability hasn't been an issue so far. Linux performance seems to be on par with the AIX box so far -- but the database is not the bottleneck in our system.


    "The axiom 'An honest man has nothing to fear from the police'
  • Not all Unix is Linux, just like not all Linux is Red Hat.
  • Sybase takes control of the HDC (the disk controller), so it is not a filesystem file. Further, only a dunderhead would want one segment that is 30GB in size. Sybase is smart, and I have had no problems developing on Sybase for Linux; I have been using it to play with for several years and never put too much strain on it. The most I have put in at one time was 6GB. It seemed to handle it, though.
  • I see several Compaq [compaq.com] machines outperforming RS/6000s on tpc.org [tpc.org]

    My recommendations if you are on a budget: Stick with Linux and Sybase and get some vendor support. Definitely stick with x86 hardware since you are on a budget. The size of the database is less important than the actual design. How much data is going to be used at any one given time? Figuring that out will tell you if you need to add another gig or three of RAM. A good dual-processor machine should be sufficient, perhaps a quad if there are lots of simultaneous users. Bottlenecks in a database are rarely at the CPU.

    IMHO, you should concentrate on your RAID setup. Get ~20 4GB disks and set them up with RAID 10 (full mirroring + striping). That alone is going to give you much, much better performance than a solution with, say, 4 20GB disks. At $200 per disk this will run you about $4,000. Paying careful attention to this will get you your best database performance while still spending a hell of a lot less than you would with an RS/6000.

    You need the performance, but obviously you can't fit the whole database in RAM. So get a good RAID controller and buy as many small disks for it as you can.

  • by johnlcallaway ( 165670 ) on Wednesday July 26, 2000 @02:59AM (#905182)
    I hear this question a lot, and I am really tired of it. It doesn't matter how big the database is, but how much it is going to be used. If I created a multi-terabyte database (can you say p0rn?) that only had one user, sure, Linux/Intel could handle it. But throw it up onto a network with millions of requests per hour, and the equation shifts. Could you build a Lintel box to support it? Let's see....

    Here are the priority items for any database box --
    • Memory. Databases love memory for cache, logs, etc. If you can keep your entire database in memory, disk speed becomes irrelevant after the first data access and for writes. If your box only supports a couple of gigs of memory, move on. We have boxes with 4GB of memory, and our DBA wants more.
    • Disk Bandwidth. The more disk bandwidth the better. Several little disks scattered about multiple SCSI controllers will usually perform better than comparable aggregated large disks. Don't even think about using IDE/EIDE.
    • IO Speed. The faster your disks, the better (duh...). Again, disk size can play second fiddle to disk access times. I would rather have many small, fast disk drives than one large, slow one.
    • CPU speed. Did you notice this was last??? Face it, if you can't keep it in memory and your disks aren't fast enough for your processor(s), then CPU speed isn't as relevant.
    • Network bandwidth. Most computers do not have issues here. However, there is overhead in pushing data over a network, and the more data you push, the more network bandwidth you need to respond to requests.
    It is also a good idea to separate application/web servers from database servers. All modern databases support the ability to service database requests over a network. Providing a network solely for database activity, separate from the user network, is common in most shops now to support the data movement from database servers to the app servers.

    The game all sys admins and DBAs play is finding the current bottleneck. There is always a limiting factor for performance, and it can usually be tied to one of the above items.

    Determining a configuration to support a database is not easy. You need to gather usage predictions, such as number of concurrent users, read rates, update rates, log projections, and make a guess. You also need to know your target audience and how they access it. A million requests spread over 24 hours is not the same as a million requests in a short period.

    This is only a sig, this is only a sig.....
  • So you add multiple data files. It's really no big deal.

    It's handy to use smaller data files anyway. It can be useful for load balancing.

  • Red Hat Oracle distribution.

    Though I haven't tried it personally, Red Hat does a very good Oracle-tailored distribution; it gives Oracle its own partition and is set up for performance and so on. The support is meant to be quite good as well.

    I have worked with quite a few DB systems (M$ sqeeel, Sybase, Interbase, as well as the less server-oriented DBs: Postgres, Paradox, Access and co.), and have an Oracle training course coming up soon; it has lots of info on Oracle for Linux, but as I haven't been on it yet I can't go into any details (but this is another story?).

    I believe Oracle will also run on other Unix platforms, and may have support from Linux distributions other than Red Hat.
  • by twisteddk ( 201366 ) on Wednesday July 26, 2000 @02:56AM (#905205)
    And you also quite nicely managed to change the question yourself.
    Nobody said anything about Oracle. No wait... I take that back... But the person posing the QUESTION didn't say anything about using Oracle for a DB. The question actually stipulates a Sybase DB!

    But anyway, to answer the question posed in the first place: yes, you COULD probably handle a UX/NT transition of the data, but try not to change databases, as this often screws with the data. Not all tables are stored identically in all databases (probably one of the reasons there is more than one supplier of databases). So for god's sake, even IF you want to have a backup/mirror on the UX box, make sure you run the same DB.

    But still, it sounds like you want to "exchange" the UX box for an Intel machine running Linux... Am I right?
    If this is indeed the case, yes, even a 30 (or 50 gig for that matter) DB is possible. The major pitfalls in this scenario are (I've been there myself):
    1. Physical space for disks.
    If you go buy an Intel machine you limit yourself to about 3-4 SCSI controllers, and unless you go and buy a shitload of external connectivity (cabinets and such), which can be a pain, you're often limited to about 8-10 disk drives, so size your DB with some future expansion in mind.

    2. Backup solution.
    Make sure to have a decent and FAST backup. I've not yet been able to run parallel backups on Linux (maybe I'm just not very good at configuring it), and it DOES take a while to back up 30 gigs, even on a DLT, so if the client wants high availability, take this into consideration (see the striped-dump sketch at the end of this comment). However, in your situation, this might be redundant, since this DB WILL be the mirror (but the point should be handled regardless).

    3. High availability.
    Your client might want the DB to be accessible at ALL times, and we all know that when a PSU or CPU goes in an NT box, the machine is pretty darned worthless. And getting a decent service level on an Intel box is almost impossible (usually 24 hours is as good as it gets). Also, you should consider whether this mirror should be used as a fail-over in case of whatever.

    4. Remote access.
    Remote servicing is a bit easier, though, as you can easily set up Telnet or whatever. However, you can get some good remote programs for Windows machines also, just not AS good. But this should only factor in if you need to access the machine frequently. If the choice is between UNIX/Linux, it's the same diff. But if it's UX/NT, then think about it for a while.

    5. Maintenance.
    Maintenance is a BIT heavier, especially when the machine gets older, but the first year or two, who gives a S***. Also, whatever people might say of UNIX hard drives, they're EXACTLY the same as the ones sold for Windows machines; they're just formatted differently. So you will save a bundle on the costs of acquisition, which should cover the added maintenance of trading in old components that can no longer hack it (MBs, networking cards, SCSI controllers, RAM, etc.) - all of the components which are NOT the same :)

    6. Choosing the right hardware.
    You might want to make sure to spend a few more dollars on the right hardware. Whatever people might say, the UNIX boxes are most often put together with the best of hardware: ECC RAM, redundant controllers, and hot-swap drives (and sometimes other pieces can be swapped while power is still on). DON'T save more money here than absolutely necessary. A good point to make would be: it's basically the same hardware, only the software is different.

    These are my thoughts/experiences on this matter. As for "FascDot Killed My Pr", I REALLY have to say: I've been running an Oracle DB (8.0.4) on NT for over two years now, not a single glitch yet. And YES, it's a development DB, so there ARE active users. And installation was as sweet as pie. The only major flaw, in my opinion, is the inability of older Oracle releases to "bundle together": you could not have more than one major release of the DB installed on one system; you had to add another logical DB to the existing one, or install a different major release version of Oracle as the second DB. But that's SUPPOSEDLY done away with in version 8 and up (not that I'm not having problems with it anyway).
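    Re point 2: Sybase's Backup Server can stripe a single dump across several devices in parallel, which is the usual way to speed ASE backups up regardless of OS. A minimal sketch, with illustrative database and Linux tape-device names:

    dump database warehouse
        to "/dev/nst0"
        stripe on "/dev/nst1"
        stripe on "/dev/nst2"
    go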

  • I would stick with Sybase. There are no major advantages to going over to Oracle, and 30GB is not that big an issue. My advice would be to base-line this as a dual-CPU box with at least 1GB of memory. ASE 12 does nice things like "companion" mode to provide you with the other copy. You could also use Rep Server to provide replication as the means to have a second copy. That is the nice thing about Sybase: there are many flexible options to solve a given problem. Like many of the other comments in this thread, I would stick with Sun on the box side, as you pay for the reliability. I would also suggest that you get professional help on the design and architecture side.
  • Just a thought - you could deploy something that is, in fact, mission critical on Linux/ix86 - and it wouldn't even cost an arm and a leg. (Just a leg, perhaps.)

    See:
    Mission Critical Linux [missioncriticallinux.com]
    Oracle [oracle.com]
  • There are currently two versions of Sybase for Linux to concern yourself with: 11.0.3.3 and 11.9.2. One of the biggest differences is that 11.0.3.3 does not support raw partitions while 11.9.2 does. You will get much better performance from 11.9.2 using raw devices, and it also lends itself to better data integrity.
    As for your system, you'll be amazed at how much you can accomplish with Linux/Intel. There are only two components that you really need to worry about: CPU busy and IO busy. If the system that currently houses your database is running SQL Server, then you can run this to get an idea of how to set up a like system:

    1> declare @loop_var int
    2> select @loop_var = 0
    3> while @loop_var < x
    4> begin
    5> exec sp_monitor
    6> select @loop_var = @loop_var + 1
    7> waitfor delay 'yy:yy:yy'
    8> end
    where x = the number of iterations and yy:yy:yy = the delay in hr:min:sec.

    Run this during a "peak usage time", have the results dump to a file using the -o param, and then take a look at the CPU and IO. You'll get something like this:

    cpu_busy        io_busy         idle
    --------        -------         ----
    3(0)-0%         0(0)-0%         13863(5)-100%

    This is a Sybase ASE running on Red Hat at idle. During production you will want cpu_busy to be in the range of 60-70%, as this allows for some growth; if you hit 80% or more, start planning for more CPU power. Conversely, if your io_busy is getting hit hard, it may indicate problems with your network configuration, or that your device configuration needs tweaking. Poor performance from a Sybase server is not always CPU related.
    I run a 10GB DB on an Intel PIII 600(ish) with 1/4 GB RAM and my cpu_busy sits around 65% most of the time. Except when users try to dump the contents of their Windows "C" drive into the database... grr.
    Hope that helps, YMMV of course.

    -scroe
