Linux Databases with Huge Tables?

Eugene writes "I am working on a project to create an expansive network monitoring and all-encompassing database solution. With archival and ageing of all data, we are looking at a database that grows by 40Gb a year, with some tables being well over the 2Gb file limit in Linux/x86. We have narrowed ourselves down to Oracle 8i (for its stalwart referential integrity checking, and PL/SQL for tracing across aged route information) and PostgreSQL (for its object polymorphism and CIDR data types for IP addresses). We are concerned with the robustness of Postgres compared to "grown up" Oracle, as well as file size limitations. Can anyone give us any advice on these issues?"
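
For context, a minimal sketch of the PostgreSQL network types the submitter mentions; the table and addresses are hypothetical:

    CREATE TABLE interfaces (
        hostname  text,
        addr      inet,  -- a single host address
        net       cidr   -- a network specification
    );

    INSERT INTO interfaces VALUES ('gw1', '192.168.1.1', '192.168.1.0/24');

    -- find every interface that falls inside a given network
    SELECT hostname FROM interfaces WHERE addr << '192.168.1.0/24';
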
  • by Anonymous Coward


    You don't need the patch for "raw devices" to use Sybase on Linux. You just use partitions as block devices. The mailing list archives have more on the issues involved in choosing whether to use a file system or a "raw device" ( archive [swarthmore.edu] ), i.e. recoverability.

    We are using Sybase on Linux for several web-based services. We chose it for its speed, its backup and recovery features, its support for referential integrity, stored procedures, and all the other things one would expect from a SQL-89/92 compliant database engine. MySQL just doesn't offer those features, although there are ways to imitate them.

    At a fraction of the cost of Oracle (and DB2), it's hard not to take a good look at Sybase - at least if money is an object. Oh, and by the way...

    ASE 11.0.3 is FREE! I don't think there is another database engine with that many features, that much robustness, and that much speed that can make the same claim.

  • by Anonymous Coward
    Get your requirements together before you start looking at implementation issues. Then find somebody who can match your requirements to a reasonable design. I don't think an RDBMS will be part of your solution (it would not be part of my solution).
  • by Anonymous Coward

    Um, while you can use partitions w/o filesystems, it's still not raw I/O.

    The benefit is really that, since you're not running a filesystem, you won't sit through endless fsck activity after a crash. Your database server will need to perform its own recovery, though.

    And the 2GB limit to a "database device" applies. Sybase's "database device" seems similar to DB2's "tablespace". Probably goes for Oracle, as well.

    Since a database (and even a table) may span multiple "database devices", this doesn't present a problem.

    I've run multiple 2.5 GB databases with heavy loading using Sybase ASE 11.0.3 on RH 6.0 with excellent performance and reliability.

    IMHO, Sybase delivers the power and reliability, without the extra complexity of Oracle and DB2.

    Just my $0.02, spend it wisely... :-)

    Gordon.

  • by Anonymous Coward
    I would agree. Of course, I like DB2, but I can point to one major advantage that DB2 has over Oracle -- when something goes wrong, DB2 just gets slower and slower. Oracle will commit hara-kiri and cheezle your data but good. And yes, I have a few years with both, and Informix as well (which isn't bad, better than Oracle, but without IBM's wonderful documentation). I have been very pleased with the DB2 betas, although I have been playing with 60GB of data and working mostly with huge numbers of small (1.5MB) files. I have had no issues at all. I will have four large FC-AL disk stacks to play with next week (8x36.4) and plan to grab stuff off of the EMCs feeding the S70As and see how far I can push it with decent performance off of a small Netfinity with 1GB RAM.

    I completely understand the position of this guy. We have several TB of old data, some of which we are having to rent MVS time to process, a lot of which comes from companies that we have taken over, and all of which really needs to be sorted and folded into our setup (AIX, DB2, SAP) within the next few years while people still remember what they were doing and what the *gotchas* were with their accounting and all. I think that Linux/DB2 will do the job as long as people are really using it lightly (relatively) and Linux DB2 is "real" DB2 -- from Linux to AIX without a hiccup.

    Of course, with a less insane load (or a nice four-way box), I am sure that things would run better, but they aren't doing too badly now for me.

    If I can get another 1GB RAM, I will be able to add more users, but for the time being Linux DB2 is running like a typical IBM product -- slowly, but with the reliability of a Cummins diesel.
  • by Anonymous Coward
    I hate to say this too, but in my friends' and my experience, you might just want to go with Solaris and Oracle. We've tried running some experimental software RAID packages on Linux and have burned Oracle databases, unable to recover (ahem, we had no backups, but still...). I guess if you went with a really conservative loadset on the Linux box, that would be ok too, and you'd save money on hardware. The Sun hardware is the real expensive commitment, not the OS. Also, I'd recommend Oracle, not just because of its track record, but because of paid support options and backup and sysadmin tools. These features make running a HUGE database possible, where my __guess__ is MySQL and Postgres are very weak in these areas. Oracle also lets you create very fault-tolerant, redundant DB systems that are fast and pretty much bomb-proof. If the system MUST stay up, then Oracle DBs with failover stuff configured are great. Don't think you get this from the smaller free packages, god bless their hearts. Thanks and good luck
  • by Anonymous Coward
    Well, I don't think so. A basic Multiprise with RAID would run you about $400,000, a few decent disk stacks would be $800,000, and that would be enough for 1.1TB of data on SSA and/or FICON, with the power of a low-end G5 box (one MCM). Figure in licensing and upkeep at $300,000, it is air-cooled, and you are only looking at $1,500,000. And that would be enough power to run a small Fortune 500 company.

    If you needed to access all of that data at once, day in and day out, while running multiple jobs sorting that same data with maximum uptime, that would be ideal. But if you needed the uptime, UNIX would be cheaper (HACMP with two S80s at $350,000 each with disk and 8CPUs and 4+GB RAM)(and a RAMAC or a 7133 stack on two SSA loops) for that application.

    Both UNIX and mainframes have their place (even AS400s have their place in large environments)(not to say that they don't do fine in small ones either), but I would say that this is a toss-up between a commercial or a free UNIX and a commercial or a free database. I would vote for Linux and DB2, probably, as long as the load wasn't too heavy.

    And don't assume that you still need to spend $50,000,000 on a mainframe. You can get really nice mainframe clusters (!) for less than $15,000,000 these days!
  • by Anonymous Coward
    Your idea of mixing Oracle and PostgreSQL seems counterintuitive at first - why use 2 DBs in one app?

    Due to Oracle's architecture, hardware resources had better be substantial in terms of CPU, RAM, etc. However, it is hard to go wrong with Oracle, DB2, Informix, or Sybase ASE. My suggestion is to use either Oracle 8i with data cartridges, or Informix Universal Server, or DB2 with Data Extenders, to give you one DB with the features you are looking for in PostgreSQL. That simplifies things and makes it more reliable.

    On the other hand, if things are politically viable, Versant and Objectivity (at a minimum) have object DBMSs on Linux already. A much better OO paradigm, faster, and less resource-intensive. This is especially suitable if you use C++ or Java. If you're a SQLhead, forget I mentioned ODBMS.

    MySQL does not have transactions, so unless you can guarantee that your app doesn't want rollback, you shouldn't consider it for heavyweight stuff.

    Make no mistake, PostgreSQL is good but it obviously does not have the huge testing base that Linux has. The question is, does it have the testing base that Oracle, DB2, Informix or Sybase have? You decide.

  • by Anonymous Coward
  • With Oracle, you can spread tables across more than one file, so the 2G limit does not really cause a problem. Don't use PostgreSQL; it's clunky. MySQL lacks important data integrity features (rollbacks and foreign keys).
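
    For the curious, a minimal sketch of what that looks like in Oracle SQL - a tablespace built from several sub-2GB datafiles, with a table placed on it (paths, names and sizes are hypothetical):

        CREATE TABLESPACE monitor_data
            DATAFILE '/u02/oradata/mon/monitor01.dbf' SIZE 1900M,
                     '/u03/oradata/mon/monitor02.dbf' SIZE 1900M;

        CREATE TABLE samples (
            sampled_at  DATE,
            router      VARCHAR2(64),
            octets      NUMBER
        ) TABLESPACE monitor_data;

        -- when the table outgrows its files, add another one:
        ALTER TABLESPACE monitor_data
            ADD DATAFILE '/u04/oradata/mon/monitor03.dbf' SIZE 1900M;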
  • by Anonymous Coward
    You have some of your facts wrong. It was Ingres that was the research database project that was sold to CA, not Postgres. Montage was renamed because the name was already being used by another company. Stonebraker was never asked to "change the Berkeley rules". All the code produced by Stonebraker's research groups has always been available under the standard Berkeley software agreement. Anybody could have taken the code and used it for whatever they wanted. Andrew Yu was the chief programmer of the Postgres project for a while and Jolly Chen was a grad student of Stonebraker's. What's now PostgreSQL was an independent project that they worked on for fun. I actually ported an early version of this to Windows NT as a proof of concept. It had problems but it was able to run the Wisconsin Benchmark. Jon Forrest (a former minor member of the Postgres research group)
  • by Anonymous Coward
  • You have to ask yourself one question: how important is this service (to me; to my company; etc.)? If the answer is "important", you need to look at a database which exhibits the appropriate characteristics (robust, scalable, supported, flexible, ...ible).

    I have used Oracle 7 (not under Linux) in a banking environment for mission-critical apps, with large amounts of data (hundreds of GBs). Performance is excellent (with a good DBA), scalability is excellent, robustness is excellent, the architecture is showing its age.....

    I'm sure MySQL is excellent for what it was designed to be: a free, well-understood, reference SQL platform. However, it's my contention that for the sorts of data volumes you are looking at, Oracle/Informix/Sybase/DB2 are most likely to be the way to go.
  • by Anonymous Coward
  • As much as I hate to say this, MySQL & PGSQL are still far, far from being as stable & feature-filled as Oracle, esp. Oracle 8i.

    This has nothing to do with Linux, as all of these (fine) products are fully supported on Linux & other fine Unices. Although, personally, I'd choose Linux as my platform for the future, whichever database product you eventually use...

    As for the 2GB file limit thingish, worry not: Solaris had the same problem until a patch was released for 2.5.1, and I still recall running a 70GB Oracle 7.3 (!) DB on that configuration; you just use as many 2GB files as you wish.

    Oracle 8i has all the features you should require for what you seem to need:
    On the database side:

    • data-structures inside columns
    • sets inside columns
    • Good (REAL) referential integrity
    • Java support inside the DB (a JVM running inside the DB backend). Forget about PL/SQL: a real programming language in the database!
    On the system-ish side:
    • data partitioning into files
    • parallel servers
    • HSM (Hierarchical Storage)
    • Point in time recovery
    • Smarter Indices
    • Archive-logs
    • Online (Hot) backup
    • SQL tracing
    • And the list goes on...

    Apart from features & performance there's one thing you want to keep in mind:
    When the Sh*t hits the fan, you'll want Oracle's data-recovery features.

    Sorry if this looks like an Oracle ad. Oracle does have its ugly sides, I'll be the first to admit that, but there are no real grounds for comparing Oracle 8i with MySQL & PGSQL, or any of the other currently GPL'ed or free DBs I've seen.
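
    To make the "REAL" referential integrity point concrete, a minimal sketch of declarative foreign keys (the schema is hypothetical):

        CREATE TABLE routers (
            router_id  NUMBER PRIMARY KEY,
            name       VARCHAR2(64) NOT NULL
        );

        CREATE TABLE routes (
            route_id   NUMBER PRIMARY KEY,
            router_id  NUMBER NOT NULL
                       REFERENCES routers (router_id) ON DELETE CASCADE
        );

        -- the engine itself now refuses orphan rows:
        INSERT INTO routes VALUES (1, 999);  -- fails unless router 999 exists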

  • I have been a DBA for 5 years and I would hate to exaggerate, but I'm not sure you could pay me to use Oracle (ok - I have been paid before, but it was kicking and screaming :) For ease of administration, flexibility, raw power, functionality (this list really goes on), I wouldn't use anything but Informix. I haven't played with Informix on Linux at that scale - but I can tell you that you will regret going with Oracle for any large-scale stuff, especially if you're not too familiar with it. I've used Informix with 3 terabytes on Solaris and I don't think you can beat it anywhere. Plus 9.2 will be released on Linux, and you'll have DataBlade technology too.
  • MySQL handles some databases with files near 2GB at my site. 10 million records. No problem; speed is adequate (some 0.01 seconds for _complicated_ queries). Also, since it's running on Alpha, it won't have problems with the 2-gig limit.
    - Speed isn't a problem.
    - Large files aren't a problem.

    But:
    - MySQL on Alpha isn't very stable. I had months of problems until a version came out which would even compile properly. It now compiles out of the box, but it still isn't stable.
    - LOCKING. Locking a table with 10M records sucks, because nobody will even be able to read it while it's locked.

    In the end: I wouldn't do it this way again.

  • First: MySQL isn't even in the running here. It lacks transaction support, triggers, etc., by design. MySQL was designed for speed, not features.

    PostgreSQL has the features he needs, and the latest version is very stable (as stable as Oracle). Its big problem: PERFORMANCE. While its performance is acceptable on a 2GB database, you can forget it on a 40GB database.

    I would say that if he can afford Oracle, go with Oracle. Oracle is proven in large-database environments. Informix, Sybase, and Adabas-D would also be good choices. If he's interested in more obscure databases, Empress has a nice 4GL. SolidTech's stuff looks pretty good too.

    Doing a search for "linux database" is instructive....

    -E

  • Hmm, why was the message marked a 'troll'?

    SGI IRIX and Solaris are excellent choices for jobs that have outgrown Linux. That's the beauty of Linux -- once you have outgrown Linux, you can move up to bigger computers. And with 40GB per year of data, we're talking 400GB within a few years.

    I have personally fsck'ed a 40GB ext2 partition. It is *NOT* fun; it takes over 20 minutes even on the latest/fastest RAID subsystems! So it is indeed appropriate to question whether Linux is the proper solution for this problem. After all, this isn't Windows NT -- we don't have to say "Linux everywhere" to win (just "Windows NT nowhere", grin).

    -E

  • The latest version of PostgreSQL has almost all of the stuff that you mention, except for online hot backups and HSM. It even has several languages other than Java that run inside the DB backend, such as Tcl, Python, and Perl.

    PostgreSQL's only problem nowadays is performance on huge databases. Well, replication is still a problem too, as are hot backups, but performance is the killer here. Oracle is proven at running huge databases; PostgreSQL is not. If I were thinking 40GB of data, I'd look harder at Oracle than at PostgreSQL.

    -E

  • by gavinhall ( 33 ) on Tuesday October 12, 1999 @05:29AM (#1621782)
    Posted by patg:

    OCI programming? Uhhhhgghhh. It's cool, but very different and difficult for the uninitiated. The Oracle distribution comes with some good OCI sample programs, but I wish the interface were more like MySQL's C interface.

    With OCI, you have to bind every one of your columns before doing an insert, update, or query.

    I suppose that one could write a C wrapper library around OCI which would do that binding for you.

    Essentially, DBD::Oracle is such a beast, except that it's for use from Perl.

    I was at one company where the Oracle consultants suggested using a Perl cartridge with Oracle's web server because DBD::Oracle wasn't "supported". I went on to explain that DBD::Oracle is really just an OCI application and that it is indeed something Oracle should look at as being "supported".

    By the way, DBD::Oracle (along with DBI) is an excellent piece of code that powers many a site.
  • That's exactly right. Let's not forget about the baseline quality and stability of other MS products. MSSQL7 does /not/ stand out as a superior product, in any way. Even Oracle is faster than MSSQL. And MySQL blows it away. Don't bother with the reverse psychology on Slashdot, AC, it doesn't work.
  • whereas it IS for Postgres. Postgres on Linux also can run on Alpha machines, blowing away many of the painful 32-bit architectural restrictions. 6.5.2 is much, much better than older versions, but I'm not 100% sure that I'd trust it for a mission-critical system yet.

    Oracle, on the other hand, is developed on Solaris and then ported to other platforms. So if you're going to run a huge Oracle installation, do it on Sun hardware. Use raw I/O (can't do this yet under Linux) and stick the (sievelike) Solaris boxes behind a firewall, for god's sake. But please don't be a martyr and run a big Oracle installation on Linux... yet.

    Don't forget that Oracle is and will remain a complex beast. My personal viewpoint is that Oracle is the C++ of databases -- highly flexible, can be outrageously fast, but you MUST know what you are doing. Hire a professional DBA if you don't... a big installation like you describe will rapidly spin out of control unless someone skilled is around to manage it.
  • Actually, ASE 11.0.3 is (in the beer sense) entirely free -- no support from Sybase, but still free.

    ASE 11.9.2 is a fully supported product from Sybase and is priced along the same lines as ASE on NT.

    Just clearing the air...
  • I've used Solid, and it works pretty well. It's a lot simpler to administer than Oracle, and it fully supports transactions, with various concurrency options. Check out www.solidtech.com. I've used it personally, although it was really as an object repository rather than a large data store. It has a good ODBC driver, and it exists for many systems.
  • The problem is in the VM system, not the filesystem. The 64-bit read/write/open system calls are in glibc presumably because other operating systems support them; I don't know what happens with them on Linux (they probably degrade to the normal 32-bit ones, and the library simply translates the parameters).

    In any event, it doesn't matter with Oracle, or Solid, or any other halfway rationally designed database. They all allow multiple tablespaces, or datafiles, or whatever you want to call it. It isn't simply a matter of 32-bit compatibility either; there's the little matter of disk/filesystem limits and load sharing.
  • by Grim ( 1405 ) on Tuesday October 12, 1999 @04:12AM (#1621788) Homepage
    We have been using postgresql for all our database needs for the last 18 months where I work [best-ads.com]. It has gone through a number of changes and is now a LOT more stable than it was only a year ago.

    If your data is critical and 100% availability is essential, I would probably still go for Oracle, as it has a far more mature background and, whilst closed source, is very stable. Otherwise, PostgreSQL is a very good choice. It handles tables of effectively infinite size (it just breaks them into 1GB chunks to avoid the filesystem limit). And it has the bonus that it is open source. If you need a feature, you can add it. I am sure that if you are part of an organisation generating 40GB of data a year, you probably have a few programmers floating around to add enhancements.

    As an addition, I have found the postgres development team to be VERY responsive and helpful. Personally I would say they are probably the most active and helpful open source dev team I have seen.

    If you want something out of the box, with 100% availability, go oracle. If you want something that you can control, that will grow with you, and will eventually get 100% reliability (it is very close right now), go postgres.

  • I'd say look at IBM's DB/2, which seems to be more solid than Oracle 8i. Informix' Dynamic Server might be worth a closer look, too, but Informix tends to be less robust. You can get free demos of both these products.

    Disk journaling and a fast filing system are also very important. It looks like Reiserfs is going to beat SGI to having a journal, and it's already faster than e2fs. I'd say give that a look, too.

  • If you're putting 40Gb of data on a machine, you're going to have to do some tuning, or at least vaguely know what you're doing, no matter which database you use.

    I've played about with Postgres and, while it's more 'professional' than MySQL, I still wouldn't like to bet a large amount of data and my company on it. I'd be quite happy putting that volume of data on Oracle, though.

    I suspect an Intel-based PC, even a big server, is going to have a bit of difficulty with that much data. Have you thought of getting a Sun or an Alpha?
  • ... unfortunately Linux doesn't support raw devices without a patch. Since it's not a standard part of Linux, Oracle -- which also supports raw partitions -- won't support it.

    It does work, although the performance gain isn't huge.
  • I wonder why Informix is not on the list. I don't have any experience with Informix, but I had problems with older Oracles (version 7.x, not on Linux). Informix looks interesting, and its scalability seems to be more than sufficient for this job.
  • Solid's a fairly nice database, though it lacks advanced features such as triggers. (It does have transactions, though, which is all that I really need.) However, Solidtech reworked their business plan about 6-9 months ago, and they're now mostly interested in customers who want to embed their database in an application and redistribute it 10,000 times; they're not very interested in supporting individual users who just want a single installation for running a Web site. This change caused a lot of bitterness on the solid-list mailing list, and many users, who'd gone out on a limb by recommending Solid instead of Oracle, were very angry. Most of them seem to be drifting away to other databases now.
  • The make test failed loading a shared library, so I wrote a simple C program just to load all the .so libs in a dir, and it would load every .so file on the disk except any in the Oracle tree (I forget the error now, sorry). The kluging that followed was horrific.

    PLEASE tell me you added the Oracle library dir to /etc/ld.so.conf?? That's generally the problem. (The Oracle installer doesn't do that for you.)

    --Dan

  • Is there anything similar to DBI for C? Obviously you can't do it exactly the way DBI does, but you can present a uniform interface to all databases.

    With that in mind, all the C database code I write is through wrappers designed to work the same way as DBI, but I only write it for the databases I actually work with. (Postgresql and Oracle)

    Am I re-inventing the wheel or is this something that's only been done for perl?

    --Dan

  • If you're trying to do graph traversals (tracing routes, or walking a tree) in SQL, get something that understands SQL3. Recursion was added specifically for problems like this. PL/SQL is an antiquated solution for this problem. Perhaps if you give more information about the queries you intend to run, we could come up with more useful suggestions.
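
    As a minimal sketch of the kind of recursion meant here, assuming a hypothetical hops(src, dst) table and the SQL3 WITH RECURSIVE syntax (not every engine accepts it yet):

        -- every router reachable from 10.0.0.1, to any depth
        WITH RECURSIVE reachable (dst) AS (
            SELECT dst FROM hops WHERE src = '10.0.0.1'
          UNION
            SELECT h.dst
              FROM hops h, reachable r
             WHERE h.src = r.dst
        )
        SELECT dst FROM reachable;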
  • A better solution than simply piping to gzip is to also pipe through "split" and break the dump into chunks; then you can use "cat" to pipe it back into a psql monitor to reload the db. (This also works great for splitting up a dump to fit onto CD-Rs.)

    i.e.: pg_dump mydb | gzip | split -b 670m - mydb.dump.gz.
    (and to restore: cat mydb.dump.gz.* | gunzip | psql mydb)


    ----------------------------------------------
    bash# lynx http://www.slashdot.org >>/dev/geek
    Matt on IRC, Nick: Tuttle
  • by EricTheRed ( 5613 ) on Tuesday October 12, 1999 @02:17AM (#1621798) Homepage
    PostgreSQL on Linux handles large tables pretty well. It breaks each table into 1Gb segments to get round the 2Gb file size limit. We chose 1Gb as it was a nice round figure, which makes it easy to organise the tables when they get really big.

    When they do become huge, you can then move the segments onto different filesystems to ease the load (although currently this is not a simple task). This also applies to indices.

    I know of a few large databases out there using PostgreSQL, but one I have access to is the TASS project (http://www.tass-survey.org [tass-survey.org]), which currently has about 28Gb of astronomical data.

    The largest database I have is currently 5Gb, but that's primarily limited by disk space and not by postgresql.
  • I was just noticing there has been little mention of Sybase (so far, anyway). Are there compelling reasons why a Sybase product would NOT be a good solution for this problem?

    For that matter, are there compelling reasons not to use Sybase for anything? just curious...

    --Siva

    Keyboard not found.
  • Not so sure about that... when you go to download the Linux version of either 11.0.3 or 11.9.2, you are directed to this licence [sybase.com], which says you can only have a maximum of 3 users and can't redistribute it, among other things.

    Particularly relevant is this line:
    5. SUPPORT. These download Programs are provided on an "as is" basis and are unsupported. This Agreement does not entitle you to any maintenance or other services or any updates or new versions of the Programs.


    --Siva

    Keyboard not found.
  • I guarantee that you are not running that system on a Linux box. :-)

    But if you drop that specification, then I agree. Besides which, I don't know if you would *want* to run a big database on Linux today.

    Cheers,
    Ben
  • On 64-bit CPUs Linux loses the 2GB limit on file sizes. Plus, 64-bit architectures are significantly better for dealing with large amounts of data. Not to mention the nice speed of an Alpha.

    Unfortunately, none of the "big boys" have released for the Alpha, so you would need to use something like Postgres.

    Another tip: if you use ext2, increase the block size. By default it is 1k; if you bump that to 4k you may see a performance increase, and you will see a big improvement in fsck. Of course, long-term, the right solution to fsck problems is to use a journalled file system. And so at the moment you may not want to use Linux for this...

    (Give the penguin time. It is still growing up.)

    Cheers,
    Ben
  • Just wanted to give you the heads-up on something - pgsql lets you pop in your own custom functions, so you can do things that are simply impossible to do client-side due to the limitations of SQL. The other thing is that you can define your own user data types. The last feature that I've been wondering about is the ability to retrieve data in n dimensions - I have no idea what they're talking about, but it sounds a little like using vectors in C++ to return objects. Might be useful, might not be - I don't know.
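
    As a small illustration of the custom-function feature (the function and table are hypothetical; the syntax is PostgreSQL's CREATE FUNCTION):

        -- a server-side function: byte counts to bits, usable in any query
        CREATE FUNCTION bits(int4) RETURNS int4
            AS 'SELECT $1 * 8;'
            LANGUAGE 'sql';

        SELECT bits(octets) FROM samples;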

    --
  • that I don't see mentioned much yet... DB2 [ibm.com] and Sybase [sybase.com]. For a functional example of Sybase in serious action check out Distributed.net's Statistics. [distributed.net]
    DISCLAIMER: I work for IBM, but not on Linux or DB2.
  • We just finished talking to Oracle and a VAR for Sun. A simple setup to support ~30 users was going to be anywhere from $500,000 to $2,000,000 depending on the hardware configuration. The actual hardware price was well under $100,000 so you do the math.
  • If you've already narrowed the field down to the two, nothing I say or do is going to turn your mind from that.

    So, let's look at the pros and cons:

    Oracle:
    Pro: This is THE RDBMS. Really. There is no RDBMS better than Oracle. Period.

    Con: Expensive, like any other commercial RDBMS.

    PostgreSQL:
    Pro: Open Source.
    Pro: Widely respected -- only Oracle has a more experienced codebase.
    Pro: Actively supported.
    Pro: Free.
    Pro: In the words of Philip Greenspun, who is an unabashed Oracle fan:
    "> The open source purist's only realistic choice for an RDBMS
    > is PostgreSQL (see Resources for the URL). In some ways,
    > PostgreSQL has more advanced features than any commercial
    > RDBMS. Most important, the loosely organized, unpaid
    > developers of PostgreSQL were able to convert to an
    > Oracle-style multiversion concurrency system (see below),
    > leaving all the rest of the commercial competition
    > deadlocked in the dust. If you've decided to accept John
    > Ousterhout as your personal savior, you'll be delighted
    > to learn that you can run Tcl procedures inside the
    > PostgreSQL database. And if your business can't live without
    > commercial support for your RDBMS, you can buy it
    > (see Resources for a link). (From a LinuxWorld article on AOLserver)

    Con: not as SQL92- or SQL3-compliant as Oracle
    Con: no referential integrity (yet)
    Con: absence of outer joins (right now)
    Con: recovery is not quite there (look for the next version...)

    PostgreSQL whips up on MySQL once the word 'transaction' enters the picture.

    PostgreSQL's multiversion concurrency control (MVCC) ensures that readers and writers never block each other, even in the presence of multiple concurrent transactions. With large databases processing large inserts, the lock-contention issue is a very real one.
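
    A minimal sketch of what that means in practice, with a hypothetical events table and two concurrent sessions:

        -- session 1: a long bulk update inside a transaction
        BEGIN;
        UPDATE events SET status = 'aged' WHERE stamp < '1999-01-01';

        -- session 2, at the same time: under MVCC this read does NOT block;
        -- it simply sees the last committed version of every row
        SELECT count(*) FROM events WHERE status = 'aged';

        -- session 1:
        COMMIT;  -- only now do other sessions see the new row versions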

    And, most importantly, unlike many other so-called RDBMS's, PostgreSQL passes the RDBMS ACID test.

    Lamar Owen


  • >> its not something easy to sum up in a few words

    Sure it is :-)

    MVCC is a Very Good Thing, and is really the right way to implement multi-user concurrency; remarkably, the high-dollar databases don't do it, probably because doing so would be a major change from their locking-based legacy designs. (InterBase does it, BTW)

  • How is it implemented in Oracle? Is it the default, always-on mode of operation?

    I read somewhere that Oracle can do a form of MVCC, but in that mode of operation it is drastically slower due to implementing MVCC in an inefficient manner, something having to do with transaction logs.

    (Of course, I don't recall the source, and it could be completely inaccurate.)

    Can an Oracle guru comment on this?

  • What about if the RAID is something simple like RAID-1 (mirroring), with essentially no performance penalty on writing, and a speedup on reading?

    It seems to me that doing a big Oracle system (with the dozen or more recommended separate drives) without RAID would be a bad idea from an uptime point of view. Even if you can recover to the last millisecond, the odds of one of those drives dying (and requiring that the system be down to rebuild) are just too high.
  • I would not hesitate to use Oracle. We tried Postgres before Oracle was available on Linux, and after about 3 months the database just tanked, corrupting itself, for no apparent reason.

    When 8.0.5 came out for Linux I installed it on my Linux server, and I am very pleased with the product. Admittedly, I am biased, because I use Oracle on HP/UX and NT on a daily basis.

    I have used Oracle for about 6 years now and I am more and more impressed by this product every day.
    It is a complex system, but you can get classes from Oracle, and from many other companies. If you don't like the cost/time involved in classes then just go to any book store and you will find tons of books on all the various aspects of using and administrating Oracle.

    Oracle has many things going for it: point-in-time recovery, massive scalability, and support for most hardware/software platforms.

    It has years of real life usage behind it. I think the product has been around for 10-20 years now.

    Another option is DB2. I don't have any personal experience with this product but I have heard nothing but good things about it.

    Chris Kraft (krafter@zilla.net)
  • by IainBowen ( 10783 ) on Tuesday October 12, 1999 @01:08AM (#1621811) Homepage
    I've been using Oracle a long time now and Linux for a few less years and I've so far found Oracle 8i on Linux a stable and enjoyable experience - however, I have yet to base a real system on it. I've also toyed with Postgres under Linux, but after using Oracle for so long, it just doesn't compare.

    I would say that table size going over 2Gb is not a problem. The tables go in tablespaces, which can be made up of many datafiles.

    PL/SQL is useful for quick-and-dirty code, but loses a little to C in performance. If you can master it, the Oracle Call Interface with C will produce very fast code.

    I'd also recommend reading a good book both on Oracle Design and on Tuning. There's a couple by O'Reilly, oddly enough.
  • MySQL may be great for larger databases - but what happens if they're getting written to constantly?

    As far as I know, MySQL only supports table locking, rather than the row locking of Oracle (and perhaps Postgres). We started using database logging for our Apache server, so that we don't eat so much CPU running analog - statistics would just be SQL calls to the database.

    We started with MySQL and ended up postponing the project - MySQL just couldn't keep up. The hardware isn't the problem; it was a DS/20 w/ 1 GB of RAM, running Digital UNIX 4.0F.

    Since we have other projects, this one has been on hold for some time, waiting for one of us to port our logger to Oracle.

    Just thought I'd mention my experiences with it. I'm not knocking MySQL - we still use it for a decent number of mostly-read databases, but for intensive writers - no way.

    Disclaimer: I'm not a DBA, nor do I pretend to be. If anyone has information that contradicts this, it's entirely possible that I'm wrong.
  • Solid has a nice "black box" database that is pretty easy to run right out of the box. Their web-site is solidtech.com [solidtech.com]. They have had Linux support for a long time but have recently gotten a bit big for their britches and declared that they aren't really interested in selling small licensing packages any more. No one has really been able to figure out exactly what they are up to, and there have been some very heated threads on the Solid support list lately.

    My experience is that Solid is a very robust, easy-to-set-up, low-maintenance RDBMS (with referential integrity checks, transactions, etc.) that isn't open source. They seem to now be fancying themselves a new Oracle, so I'd just stick with Oracle if you need something out of this class. Kinda sad, because Solid has been around with Linux since the early days as a low-cost solution, but new management has come in determined to tear the company down, it would appear...

  • DB2 Enterprise for Linux goes for ~15K/processor with unlimited connections. You can get your first version of DB2 for something like 100-300 bucks if you sign up for the ISV program. IBM is really looking to put the hurt on Oracle. We priced out Oracle at 279K... ouch.
  • is now with Cohera (www.cohera.com). They are doing "federated" databases.
  • Journalling file systems usually don't do journalling for data, only for file system structure. Furthermore, the databases themselves do the journalling and recovery for the data.

    Using a journalling file system for the partition holding database tables will accomplish pretty much nothing (but it won't do much harm either).

    Journalling file systems in general don't help much with data integrity either; the kinds of guarantees they usually make are minimal, they don't protect you against many sources of data loss, and they don't even guarantee fast system startups.

  • Big boys not on Alpha ... bah !

    We manage shitloads of data running on Oracle 7.3.3, 7.3.4 and 8i all on DEC/Compaq machines.

    Runs fine. I second the call to bring AdvFS and LSM to Linux. Vinum (for FreeBSD) is good, but AdvFS is the real deal.

  • Well, MySQL never said they would "never" do transaction support. In section F3 of the current documentation, "Some things we don't have any plans to do" [mysql.com]:

    Transactions with rollback (we mainly do SELECTs, and because we don't do transactions, we can be much quicker on everything else). We will support some kind of atomic operations on multiple tables, though. Currently atomic operations can be done with LOCK TABLES/UNLOCK TABLES but we will make this more automatic in the future.

    On the other hand, at the O'Reilly open source conference David Axmark told me they eventually plan to support the complete ANSI92 SQL spec. So in this case 'not now' may not be 'never.'

    To beat on a drum I have already worn out: not everyone understands what database people mean when we say "transactions" - PHBs and product specifiers, particularly. It really just means support of the SQL-standard COMMIT/ROLLBACK feature. Operationally, this means a lot of heavy hauling for the database; just ask an Oracle engineer about how they do it (as I did once).
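
    For anyone who hasn't watched it in action, a minimal sketch of the difference, against a hypothetical accounts table (BEGIN/COMMIT spelling varies slightly by engine; the LOCK TABLES half is the MySQL idiom quoted above):

        -- with transactions, the two updates succeed or fail as a unit:
        BEGIN;
        UPDATE accounts SET balance = balance - 100 WHERE id = 1;
        UPDATE accounts SET balance = balance + 100 WHERE id = 2;
        COMMIT;  -- or ROLLBACK, and neither row changes

        -- MySQL's substitute: atomicity through explicit table locks;
        -- there is no rollback if something fails halfway through
        LOCK TABLES accounts WRITE;
        UPDATE accounts SET balance = balance - 100 WHERE id = 1;
        UPDATE accounts SET balance = balance + 100 WHERE id = 2;
        UNLOCK TABLES;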

    If you don't really really need "transactions," it's worth thinking carefully about how to turn off the feature in the DBMS you're using (some do allow that), or use one like MySQL that doesn't.

    MySQL's developers made a deliberate choice here; it optimizes the program for some applications and rules it out for others.

    --------
  • Hmm...
    No, the page is still there (just reloaded...)
    And no, it isn't the same page...
    With http://www.mysql.com/crash-me-choose.htmy you can choose which databases to compare, thus reducing the size of the result page and enhancing the readability drastically.
  • is found here...
    http://www.mysql.com/crash-me-choose.htmy
    Take a close look at the MySQL solutions; they really rock at large databases!
  • They may have fixed this by now, but a couple of years ago there was no way to delete a variable from a table. You could add them, you could rename them, but you could not delete them. I found this incomprehensible.
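
    If that refers to dropping a column (old PostgreSQL indeed had no ALTER TABLE ... DROP COLUMN), the usual workaround was to rebuild the table without it; a sketch with a hypothetical table t(a, b, c):

        BEGIN;
        SELECT a, b INTO TABLE t_new FROM t;  -- copy everything but column c
        DROP TABLE t;
        ALTER TABLE t_new RENAME TO t;
        COMMIT;
        -- indexes, constraints and grants must be recreated by hand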
  • Mainframes have their points, but only a few applications require them. Without knowing the environment proposed for the database it's hard to be specific about the hardware. Still, the earlier recommendation of an Alpha-based processor might be a good one. Or a mainframe might be the best. Or...

    For that size of file growing that fast, I doubt that an I*86 would be the right answer. But it depends... given low use and tolerance for waits, it could be.

    Backups could be a real drag though!
  • Most commercial databases prefer raw partitions, where you don't run a file system at all. The database manager has log files for commit/rollback processing. They run faster on raw partitions, since the OS is not "in the way". The DBMS actually doesn't need much of what a general-purpose OS offers; that's why Oracle was promoting the "raw iron" concept (BTW, whatever happened to that?).

    Last I heard, maybe it's changed, Linux didn't support raw partitions.
  • Teradata's great (I work for the company with the world's largest Teradata database, and write code against it, so I'm a little biased here), but unless you are hitting 60 TBs or so, I'd look at a bit more user-friendly package (Oracle 8i, or DB/2, or Informix (Informix's drivers byte, however)). I would recommend taking a serious look at Oracle 8i (forget the stuff about it being so new that it's buggy; take a look at the high-end customers using it, and the comments they have about it). Quite a few of the online stores and data warehouses I've seen have used 8i with super success. Be aware, however, that you may have to balance cost vs. performance, unless you happen to have an extra couple (or more) grand to toss around. I'd try a demo of all the products, and pick which one seems right to you. I would, however, stay away from MySQL (you *MUST* have good transaction behavior and good cursor support for multi-gigabyte databases, unless you happen to enjoy downtime...). I've never toyed with any of the other freebies, so I can't comment on them...

    Tadghe
  • ASE 11.0.3 is FREE?

    Not quite. You can download RPMs of ASE 11.0.3 for development and evaluation purposes, but when you actually implement an application with it, the license fees are the same as for the NT version - about $300/user, or a flat $50K if the app touches the web.

    Oracle's fees are similar, but they are pushing the use of "application service providers" - basically web hosts with licensed Oracle instances that you can use for a premium.

    BTW, if you want a free set of Oracle development CDs, check out:

    http://www.oracleanswers.com [oracleanswers.com]

    Register for the Atlanta or Birmingham seminars. They are fully booked - you won't get to go. But Oracle will send you a complimentary set of CDs.

  • Gee, I wish you hadn't posted that as an AC...

    And I think I should try to find another forum for this discussion. This is a VERY important topic for many Oracle folks...

    My big question about this post is:

    When you say it blows chunks, are you saying SQL*Plus blows chunks or does the server instance actually give up the ghost? I'll have to try it here at work (I think we've got an 8i instance around here someplace...)

  • No RAW I/O on Linux. I won't go into detail here, but Linux doesn't support raw I/O right now. Linus himself has stated that he has no interest in adding raw I/O support, because the performance increase from using buffered I/O far exceeds the disaster-recovery benefits of raw I/O. But I hadn't thought about FS limits. That's not so much a problem with Linux or Intel, but rather a problem with the POSIX file manipulation functions in GLIBC or LIBC. What does ftell return? A signed 32-bit long, which has a range of -2GB to +2GB. I think there are some kludges that use an unsigned long, but until we get 64-bit POSIX functions, we're stuck.
  • I've used PostgreSQL and Sybase for a few Linux-based projects. I use Oracle 7.3 on HPUX at work. I'm helping a friend spec out a new project. He's ultimately going to use Oracle on Solaris (it's government work - so the DB & OS choice is decided through politics). But development will likely occur on Oracle on Linux.

    We have been going around in circles trying to decide whether to use Oracle 8.0.5 or 8i. As far as I can tell, the only significant advantage of 8i is the JVM. But 8i requires a minimum of 128MB RAM and 256MB swap without the JVM, and 256MB RAM and 512MB swap with the JVM. Note: those are the minimums. And Oracle's not kidding. Don't even try firing up an 8i instance with less than 128MB RAM. Ultimately, the memory requirements will be satisfied. I suspect we'll spec out a Sun Ultra20 with about 1GB of RAM. But we don't want to invest a fortune in the development hardware.

    Oracle 8.0.5, on the other hand, runs nicely with as little as 32MB of RAM. Plus, the install is CUI - whereas 8i requires X (I usually don't install X on my servers).

    Besides a few cool new features in PL/SQL, does anyone else have any experience with 8.0.5 vs. 8i that would help sway me one way or the other? Right now we are pretty much set on using 8.0.5...
  • Oracle 8i Enterprise is $200 per MHz; dual P2-450 = $180,000. Support is extra, and so are a bunch of other useful-to-necessary options. You need to get at least it, support, and 'Programmer' to do anything useful. Fun fun fun. Other databases are also stratospherically priced. Informix is around $50K per CPU (for the new 2K version), and Sybase is pretty close to that, maybe a little more expensive. DB2 was $12K per CPU last time I looked. None of them are by any measure close to what would be considered cheap. Or, in most books, affordable.
  • DB2 has a big advantage over Oracle - the documentation is human-readable. It is also a mature product, which is very important for a large-scale mission-critical application. IMHO, I would choose between DB2 and Oracle if I really cared about my data.
  • Hi,
    When it comes to very large databases, Oracle is a good choice. I have run databases of about 5GB on NT without much problem, and still larger ones on Solaris. It is very rugged. The only thing is that as the database grows, you will have to do a lot of tuning to keep the performance within acceptable limits.
    PostgreSQL should be a good choice for a secondary database; i.e., you can use it for day-to-day data collection and periodically move the collected data into Oracle. Postgres also supports integrity constraints and quick scripting now.
  • Use Oracle. PostgreSQL is just not ready for prime time.

    I have used both, and I have lots of problems with PostgreSQL. Large Object handling in Psql is bad bad bad. I get errors from the database all the time. On the other hand I have dozens of Oracle 8 databases running night and day with no problem.

    Oracle has a huge learning curve, though. Consider InterBase: it runs on Linux and is very good. It is so simple to admin; no real maintenance required.
  • What platform/filesystem are you using Oracle on?

    Oracle does not recommend cooked file db's AFAIK, unless used in conjunction with Veritas vxfs.
  • Speed isn't the only reason to want to let the RDBMS use raw partitions (which essentially means it is managing its own filesystem...)

    Reliability in database consistency is crucial. Put a RDBMS on a cached filesystem and you are asking for trouble. When writing to a raw partition, a write is actually committed to disk. When writing to a cached partition that guarantee cannot be made. While there are some filesystems that support specific database engines (vxfs from Veritas comes to mind - it supports Oracle and Sybase), putting any RDBMS on any OS's native filesystem is a Bad Thing(TM).
  • by tomierna ( 22789 ) on Tuesday October 12, 1999 @04:02AM (#1621835) Homepage
    There are so many RDBMSs out there that it is very difficult to make a good choice quickly.

    Even if you limit your choices to one OS, you still have at least five or six great DB platforms.

    The choices you've given are by no means the only ones you have, and the reasons you give for narrowing down your choice of engines seem to be pretty sparse.

    For instance, Sybase ASE and ASA both support referential integrity constraints. ASE's are more limited than ASA's, but ASA has RI checks that are comparable to what I've seen of Oracle 8i.

    All of Sybase's products have T-SQL, which are their programmatic extensions to SQL92. Sybase has a robust C API for writing server extensions.

    Sybase ASE is fully capable of handling the data load you've specified. So is Oracle's enterprise-scale product. So are some of Informix's products.

    Interesting tidbit about Postgres - parts of it found their way directly into a product called Illustra. Illustra was bought lock, stock and barrel by Informix. Informix has, in one of their products, the object polymorphism you crave.

    From the PostgreSQL Guide:

    The authors of PostgreSQL 1.01 were Andrew Yu and Jolly Chen. Many others have contributed to the porting, testing, debugging and enhancement of the code. The original Postgres code, from which PostgreSQL is derived, was the effort of many graduate students, undergraduate students, and staff programmers working under the direction of Professor Michael Stonebraker at the University of California, Berkeley.

    Michael Stonebraker spun Postgres into Illustra. For a while after the Informix buyout he was grafted into their executive management. I dunno if he's still there, since Informix doesn't seem to list that information on their site.

    Informix still has grants at UCB as well as many, many other colleges. (source) [informix.com] Some of those grants sound interesting...

    Now, I realize that the subject was "Linux Databases", but there are *solid* RDBMSs that run on the various BSD's and commercial *nix, too. I mention this because of the large scale data needs. I'm not sure that Linux running on any PC hardware is capable of supporting the amount of data you are looking at. (Notice I said PC hardware...)

    Because of that, I'd suggest a commercial *nix, running on a server-class machine and an engine from one of the "big three" database vendors: Sybase, Oracle or Informix.

    All of these vendors offer their enterprise-level engines at very reasonable prices, since you are buying at the per-seat level in most cases.

    Of course, I'll likely get flamed to hell and back for the above statements because I'm suggesting commercial, non-GPL engines running on commercial, non-GPL OS's on top of hardware that wasn't built by stuffing armfuls of parts from Fry's into a whitebox ATX case.

    If I were stuck cobbling things together because of budgetary constraints, I'd go with Linux running on a whitebox and use the free Sybase ASE installation that's bundled with Red Hat.

  • Cormac's post is a little misleading. Oracle writes straight to disk, and doesn't go through the filesystem at all, so fsck never gets involved. However, fsck is necessary for PostgreSQL.

    Unfortunately, there are not any journalling options available for Linux yet. Stephen Tweedie's ext3fs stuff is only barely in alpha, and SGI hasn't gotten very far with porting XFS. For now, if you're using a database that goes through the filesystem, you're stuck with fsck.

  • I really like MySQL, but just yesterday I switched to DB/2. I have had corruption in 11M+ record tables three times now, each time completely unrecoverable. Reimporting that many records once was acceptable. Twice scared me. The third time, I switched.

    Temper this with the fact that we are using a JDBC driver to access it that might be doing "bad things", and the fact that it seemed to occur during development. I'm guessing it was related to killing processes (during debugging) and it not cleaning up dropped connections properly. (Lack of transactions, maybe?)
  • Just curious, but what version of PostgreSQL are you basing this on? When I started using PostgreSQL 4 years ago (approx.), I wouldn't have used it for mission-critical apps myself, but now... database corruption bug reports aren't something I see...
  • Actually, we don't do 'row level locks'... we went one further and went straight to using MVCC. See the following URL for an explanation, since it's not something easy to sum up in a few words:

    http://www.postgresql.org/docs/postgres/mvcc.htm

  • Actually, I must disagree here... at work, we use Oracle and *everything* is on the OS's file system, based on recommendations from Oracle themselves... this place doesn't do anything without checking with the vendor half a dozen times first, and even then they have to ask once more :(
  • I am a big fan of DB2, especially its speed with really large tables. Version 6.1 for linux installs really slickly, and has finally gotten rid of the need for ksh, which the 5.2 version had.

    If you want to do big database stuff, you should look at who does it well. The US Patent Database [164.195.100.11] runs on DB2, and it is the largest public database in the world. (I think the numbers were 15 terabytes of compressed data last time I read up on it.)

    Free 60-day trial downloads can be snatched up at this fun address [ibm.com].

    My main feeling is that this is a better option than something like MySQL: if you are really shuffling around that much data, you need a really robust transaction-control engine. Last time I worked with MySQL it was fast, but it lacked real transaction control. This may have been added by now, as I haven't checked of late.

  • Perhaps we ought to bear in mind that it's only expected to be amassing 40Gb a year - it's hardly as though you're going to want to do a select * across one table that size and get all the events for a year.
    So yes, journalling would be a good idea, but it would also be very sensible to invest in a regular (weekly? daily?) archival solution.
    He could always use postgresql, pg_dump | gzip, and then drop the tables, every night :)
  • That is an alternative, although if there's one thing you might want in a high-load network-management station, it's transaction support.

    From the PostgreSQL Admins' guide (http://www.postgresql.org/docs/admin/manage-ag1756.htm#AEN1774), I note there's a section on large tables, including the line:
    "Since Postgres allows tables larger than the maximum file size on your system, it can be problematic to dump the table to
    a file, since the resulting file will likely be larger than the maximum size allowed by your system.
    As pg_dump writes to stdout, you can just use standard *nix tools to work around this possible problem:
    Use compressed dumps:
    % pg_dump dbname | gzip > filename.dump.gz"

    etc.

    As regards postgresql being "as grown up as Oracle", erm, it's certainly a viable solution, and doesn't suffer from the bloatware problems of Oracle.
    However, you might well find that it's not as fast as Oracle or MySQL for adding lots of events as rows in the table quickly - it all depends.
    Try it out and see :)
  • Yeah. The other thing that I remember from the page is thinking that the ODBC 3 results are somewhat screwy - it depends strongly on which ODBC driver you go through, as the rule with ODBC is not 'if the database supports it, we do' but rather 'if it's in the spec and we choose to implement that feature, you can then look up its availability and use it if you choose'.
    Rather long-winded, but it sums up feature support in ODBC fairly well, I think.
    What I'm saying is, there are one or two bits & pieces in the results that I believe are artifacts of the chosen ODBC driver, not of postgresql (which is the one I was looking at most).

    And yes, I'd expect some confusion over ODBC's DATETIME type being merely 'DATE' to Oracle, especially if the script was written to look for only one-to-many relationships the wrong way round, as it were.
  • Sorry about this but I could only find it on Metalink which needs an account that is tied to your oracle license.


    If you have access, the direct link is http://support.oracle.com/ml/plsql/knowledgebase.getcr?textkey=239049


    If you can't get access to it, I'll be summarising something similar on a new web page about Oracle that I'll be creating in the next couple of weeks. Send me your email and I'll drop you a note when it's ready.


    Basically, it discusses keeping objects grouped into tablespaces by size, and enforcing the same initial and next extent sizes for all objects in a tablespace. It also goes some way toward pointing out why the 'my table's in 100 extents' panic is a myth.



    ZamZ@mzm1.demon.nl is my email BTW

  • by ZamZ ( 28920 ) on Tuesday October 12, 1999 @02:05AM (#1621846)
  • If you are looking for a 'black box' to put your data in, be aware that Oracle, even at only 40G of data, requires some heavy administration at times. You may want to consider MySQL for an easier ride on admin.



    As well as knowledge of tuning the database, you'll have very specific Oracle SQL tuning to do. I've been doing this type of thing for a number of years now and I still don't know 100% of what is needed - there are always new features to get acquainted with.


    In saying that there are some good resources to get you started. Have a look at http://www.orapub.com for white papers and before you even start to lay the database down see the Oracle white paper (from their website) 'How to stop defragmenting and start living'.


    You should also take into account that 8i is a new-ish release of Oracle. Keep an eye on the bug lists for it and make sure you won't be impacted by anything.


    Apart from the warnings, I've run around 60G Oracle db's on Linux and as long as your server is beefy enough you should have no problem.


    One big thing: if you do go down the Oracle route, have t-shirts made up with 'I've looked at v$system_event today, have you?' written on them, and wear them. If there is one way of finding what your performance problems are/will be, it's this table. The only real exception is bad SQL, which you need to trap in other ways.

  • Both Oracle and Solid have MVCC support (though they don't call it that). Do they count as high-dollar databases?
  • by dublin ( 31215 ) on Tuesday October 12, 1999 @07:21AM (#1621851) Homepage
    I'm not a database expert, but I've done a fair amount of healthcare IT consulting over the years, and run across Intersystems' Caché database quite often.

    Most people aren't familiar with Caché from Intersystems, but if you're building a very large, sparsely populated transactional database (like an electronic medical record, for instance), it's at least an order of magnitude faster than Oracle, even after doing unnatural things with Oracle. (This info came directly from a major EMR vendor that benchmarked Caché and all the usual suspects and found nothing in the same league for this application. According to his tests, Caché was 30x faster than Oracle out of the box - Intersystems' website claims Caché is 20x faster than RDBMS competitors.)

    The technology is quite interesting in some regards: It is very mature, having its roots in the old MUMPS or "M" system (about as old as Unix), but has been updated quite nicely as time has passed to provide quite capable SQL and object-oriented interfaces, even though it is neither an RDBMS nor an OODB internally.

    Another nice feature is its distributed caching protocol, which allows you to build a logically huge database server piecewise from a number of distributed servers. A number of large installations, including Boston Children's Hospital (IIRC), use this for their EMR - last I heard, BCH's database was spread over 100-150 servers. This would seem to fit nicely with the Linux way of doing things.

    It runs on Linux and a free download is even available on their website, or they'll send you a CD for the asking if you're one of the unfortunates stuck on the end of a phone line.

    You can find Intersystems at http://www.intersys.com [intersys.com], not exactly the world's most intuitive URL, so I thought I should include it here...
  • If you're using Oracle, better to let Oracle take care of these things.

    Oracle tablespaces are, in effect, a lightweight, special purpose file system. By keeping the transaction logs and data tables in different tablespaces on different devices, Oracle can gracefully handle the failure of any one hardware device, allowing you to move the affected data to a different device, with NO DOWN TIME. Oracle does a better job of handling this than any operating system filesystem, so running the tablespaces on top of a filesystem just adds bloat.
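
    A sketch of that recovery path, with invented names and paths -- only the affected tablespace goes offline while the rest of the database keeps serving:

        ALTER TABLESPACE mon_data OFFLINE;

        -- Copy the datafile to a healthy device at the OS level, then
        -- tell Oracle where it went:
        ALTER TABLESPACE mon_data
          RENAME DATAFILE '/u02/oradata/mon_data01.dbf'
                       TO '/u05/oradata/mon_data01.dbf';

        ALTER TABLESPACE mon_data ONLINE;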

    Speaking of bloat, people who call Oracle "Bloated" need to examine their definition of "bloat". Let's say you have requirements x and y, and piece of software has features X and Y that satisfy them. It also has feature Z that satisfies some other requirement z that you don't have. That doesn't make it bloated, it only means that it might be overkill for your application.
    What's wrong with PL/SQL? It has a simple, Pascal-ish flavor, with a few niceties borrowed from Ada. It is going to be instantly understandable by any programmer. It seems a very good choice for most kinds of scripting that need to be done on the database server side.

    Not that Java doesn't have a place in the database engine, but I'd be uncomfortable using it for everything. Isn't the VM going to be a problem for performance-critical uses like triggers?
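
    For a taste of that flavor, a minimal row trigger might look like the following (the routes/route_audit tables and their columns are invented for the example):

        CREATE OR REPLACE TRIGGER trg_route_audit
          BEFORE UPDATE ON routes
          FOR EACH ROW
        BEGIN
          -- Record every change of next hop, with a timestamp.
          IF :NEW.next_hop <> :OLD.next_hop THEN
            INSERT INTO route_audit (route_id, old_hop, new_hop, changed_on)
            VALUES (:OLD.route_id, :OLD.next_hop, :NEW.next_hop, SYSDATE);
          END IF;
        END;
        /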

    Well, I'd say what's right about PL/SQL is that it is simple, is readable by almost any competent programmer, and executes reasonably quickly in things like triggers and simple kinds of stored procedures. I don't find its syntax particularly cumbersome compared to most database scripting languages, although Python-elegant it emphatically is not.

    What's wrong about PL/SQL is that it doesn't provide anything in the way of object abstraction or a whole bunch of other things you want in a full-blown application development context. But of course, that's a can-o-worms.

    It sounds like you're using PL/SQL outside its reasonable problem domain. Web integration? If God wanted us to use PL/SQL for this, he wouldn't have invented Larry Wall.

    I like Java too, but I don't want to have to use it everywhere. Do you really want to have every statement in all of your triggers run through the Java VM? What happens when you update a million rows in your database, and get several million trigger executions? Think of all the memory allocation and garbage collections.

    That said, having a language like Java when you do need it is great. Choice is good.
  • by hey! ( 33014 ) on Tuesday October 12, 1999 @03:16AM (#1621858) Homepage Journal
    Yeah, but the problem with SQLAnywhere is that Sybase doesn't want it to get TOO good. One of the things they've done is provide a wacky implementation of SQL, and taken out things like statistical aggregate functions such as standard deviation. It's small, fast and useful, but in terms of compatibility it's the pits.

  • by hey! ( 33014 ) on Tuesday October 12, 1999 @03:31AM (#1621859) Homepage Journal
    I agree with this wholeheartedly, especially about the documentation. You need to be a good skimmer. If you can't sit down and skim through a couple thousand pages of documentation in a couple of days, then you could be in trouble. In the past, I've also been bitten by Oracle documentation that was just plain wrong.

    But... if you aren't constantly under the gun from cranky PHBs, tuning an Oracle database can be _fun_. If you've never worked on a system this configurable before, it's like you were a model rocketeer and suddenly were given a chance to fire off a Saturn V. Chances are it will blow up on the launchpad, but if you get everything right the results are spectacular.
  • by hey! ( 33014 ) on Tuesday October 12, 1999 @04:11AM (#1621860) Homepage Journal
    Except that from what I can see, four out of five 4D projects end in a bloody train wreck. I know, I've been involved in a number of rescue missions for 4D users.

    4D has a number of major problems.

    First, it is not relational in a technical sense; that is, it does not allow for non-procedural programming. It uses a kind of foreign key structure to "relate" (ugh) one table to another, but the programming style is based on traversing links, and it implements that badly (imagine programming in a system which only allows you one cursor). The programming style this enforces is much more like a hierarchical or network database. The lack of cursors means that 4D programs show poor modularity; whether any piece of code works depends on the global state of the system. This results in code that behaves inconsistently. Furthermore, the weak object model tends to encourage the use of global variables to control system behavior.

    4D was a database designed by people with only a vague understanding of database technology, and this shows in their confusing tendency to use technical terms like "project" (not as in project management but as in "project an image") in ways completely unrelated to their accepted meaning.

    The 4D file format is at once highly fragile and stunningly slow. Past work I've done recovering crashed 4D databases indicates that it depends critically on internal file pointers, which tend to get corrupted. The internals are not documented, and the tools ACI gives you are utterly inadequate. You absolutely _must_ buy third-party tools from people who have reverse engineered the structure, and you absolutely must run these any time the database has terminated abnormally. These tools are very slow, so after any abnormal termination you can count on much down time. What's worse is that the combination of slowness and the lack of anything like SQL means that designs almost always have to carry accumulator fields in tables that must be updated (to keep monthly totals, etc.). So in addition to running the third-party tools, you have to gin up your own tools to fix these fields.

    The 4D developer environment is one of the most poorly thought out I've ever seen. You MUST lay out your database in a graphical layout window, and "relate" your tables using a connection tool in this window. This works fine for three or four tables, but in a system with twenty, thirty or more tables it often takes minutes of painstaking mousing to figure out the "relationship" between two or three tables.

    This points out 4D's real weakness. Things like the "current record" model of programming and the graphical table layout make it fairly easy to get started with a small system, but unless that system stays tiny (in complexity AND number of records) you are bound for trouble.

    I could go on and on about the shortcomings of 4D; but I know some developers like it. Different strokes, I guess, but one thing is absolutely certain: 4D is a very BAD choice for large, mission critical applications.
  • If this is a mission critical system, you should seriously consider putting together an RFQ that characterizes the application access, external interfaces, data volumes, and sketches out the major data domains. Choose the technology based on the application, rather than choosing the technology and trying to make the application fit.

    No matter what you choose you will be spending a good chunk of money on hardware and maintenance staff. A good DBA with experience tuning the database you choose will be critical to success, as will an architect who can address the design issues *before* you invest in the development.

    Most of my large RDBMS experience has been with Oracle 7.x and DB/2 on commercial SMP Unix systems. Both have performed quite well when properly tuned and administered. I've also used Sybase for mission-critical apps, but the ones I've worked on weren't that big (1-2GB total vs. a few hundred GB.) I do know of companies in the banking industry who use Sybase for large databases, so I expect it could handle the load as well.

    As others have pointed out, your key issues are likely to be backup and restore capabilities rather than a simple "which is better" issue.

    I guess it's obvious that I wouldn't recommend an open source solution (Postgres, MySQL, et al.) to a client in this situation. There are a lot of situations where I'd recommend open source, but a large mission-critical data repository isn't one of them.

    To me it's not an issue of whether an open source solution *could* handle the job -- it's that I don't have experience with deployments that *prove* it will work. I wouldn't consider products like MS SQL Server for the same reason.
  • Uhm, we use MySQL at my company for some projects, but not for projects where we need a DB that is "mission critical" and "very robust". For those needs we go to Oracle.

    The lack of referential integrity has already been mentioned. Far more critical is the lack of transactions! The ability to roll back is indispensable in a sensitive application. Also, Oracle has vastly more sophisticated tools for backup than MySQL has -- another essential element of mission-critical DB administration.
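
    A trivial sketch of why that matters (the accounts table and values are invented): two updates that must stand or fall together.

        UPDATE accounts SET balance = balance - 100 WHERE acct_id = 1;
        UPDATE accounts SET balance = balance + 100 WHERE acct_id = 2;
        -- If either statement failed, undo both with ROLLBACK;
        -- otherwise make the pair permanent:
        COMMIT;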

    Oracle is bloated and expensive, and I certainly don't recommend it unless you really do need it. But sometimes you do.
  • by Get Behind the Mule ( 61986 ) on Tuesday October 12, 1999 @02:40AM (#1621878)
    If you are looking for a 'black box' to put your data in, be aware that Oracle, even at only 40G of data, requires some heavy administration at times. You may want to consider MySQL for an easier ride on admin.

    That's quite true (but see my other post about shortcomings of MySQL). I can't help the original poster much because I have no experience with Postgres, but I can tell you that while Oracle is very powerful, the learning curve is very steep, and you'll need a skilled, experienced DBA to really get all of its benefits to work for you.

    I have found few things in the world more frustrating than Oracle documentation, not because there's not enough of it, but because you can't see the forest for the trees. The first time I had to administer an Oracle DB I was overwhelmed, and there was almost nothing there to help you learn how to just get started. By now, I feel relatively comfortable with Oracle, but it's taken months of practice.
    You need to look into the journalling and disk management side of Linux.

    On databases this large it might still take an age for Linux to start up again if it has to fsck the disks.

    There are a few journalling options out there, and this is something that is *very* important to look at if you want to ensure data integrity.

    Currently other Unix systems do have support for very large disk arrays.

    The use of databases on Linux is pretty much a cert, but your big problem will be in disk handling.
  • is found here... http://www.mysql.com/crash-me-choose.htmy

    That page doesn't seem to be available. The closest match is:
    http://www.mysql.com/crash-me.html
    Which seems to contain the information.

    S.
  • by moscow ( 68604 ) on Tuesday October 12, 1999 @01:43AM (#1621887) Homepage
    Are you aware of the Perl DBI? It works for both Oracle and PostgreSQL (and MySQL and ...). It can be used to provide access to databases regardless of type in a uniform manner, and is truly useful for remote access and regularly running jobs, particularly those which use standard SQL. In Oracle terms, you can run Perl DBI anywhere that you can run sqlplus.

    There are a number of GPL products which use Perl + DBI. One of the best, IMHO, is Orac which also uses Perl/Tk and so provides GUI access from multiple platforms (Solaris, Linux, NT). Orac offers loads of SQL scripts to help with tuning, or just seeing the layout, of databases. It also provides realtime database monitoring, which is the current thrust for improvement of the tool. You can find it on CPAN, e.g. here [funet.fi].

    Another monitoring tool, which is capable of emailing you when it's unhappy as well as putting up current status on a web page is Karma. This is still developing rapidly and is intended for Oracle on Linux. This can be found here [freshmeat.net] on freshmeat.

    Hope this is of some use.

  • It works for me.
    But I find the results suspect (NB, I work with Oracle databases, so my opinions are biased too.)

    Consider the date & time datatypes:
    date, time, timestamp, timestamp with timezone, datetime, abstime, datetime (other version), abstime, reltime, smalldatetime, timespan, year.

    Allegedly, MySQL supports 6 of these, PostgreSQL 8, and Oracle only 1.
    But that one, despite being called date, also includes the time down to the second. If you don't want the time, ignore it.

    Similarly, the function list seems to have been needlessly extended to demonstrate the extra functions in some databases (not necessarily MySQL, but that's where this example comes from):
    Oracle doesn't support FROM_DAYS or TO_DAYS (correct, it uses TO_CHAR or TO_DATE respectively, but they're not listed). It doesn't include WEEKDAY either, but uses TO_CHAR again.
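
    For example, Oracle's one DATE type already carries the time of day, and TO_CHAR stands in for the weekday-style functions:

        SELECT TO_CHAR(SYSDATE, 'YYYY-MM-DD HH24:MI:SS') AS right_now,
               TO_CHAR(SYSDATE, 'DY') AS weekday
          FROM dual;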

    There are more mistakes like this which are not in Oracle's favour. But I don't care enough to rant. :-)



    --
    Too stupid to live.
    • mike stonebraker is CTO of informix (see http://www.informix.com/informix/corporate/overview/execs/execs.htm [informix.com]).
    • mike stonebraker is also CTO of cohera (see http://www.cohera.com/stone.html [cohera.com]).
    • Relational Technology (RTI, subsequently Ingres Corp., then part of ASK, now part of CAI) was the commercialization of the University INGRES codebase. INGRES was a Berkeley project from the early 70s to the mid 80s.
    • Miro Systems (subsequently Montage Software, then Illustra Information Technologies, now part of Informix) was the commercialization of the University POSTGRES codebase. POSTGRES was a Berkeley project from the mid 80s to the mid 90s.
    • stonebraker has always released his research software in source form, even in the 70s and early 80s when this wasn't particularly fashionable (these days, the funding agencies make a much bigger deal of it). external contributors were always able to donate changes which were folded into the postgres source tree at berkeley. (most ports happened this way.) the only real change that happened with regards to licensing is that there used to be a UC-imposed license fee for commercial use. a couple of grad students talked stonebraker into talking UC into dropping this fee.
    • systems like illustra have come to be known as ORDBMS (object-relational), which is very different from OODBMS. don't expect OQL - expect SQL3/SQL-1999.
    We are currently running Oracle 8.0.5 Enterprise Edition on a RedHat 6.0 box... and so far it's been VERY stable...

    A good install procedure for 8.0.5 on RH6 is at http://joradn.fortwayne.com/oracle/

    We currently have around 3 tables with over 2 million rows each, and Oracle has so far performed to expectations...

  • by frossi ( 96194 ) on Tuesday October 12, 1999 @02:18AM (#1621916) Homepage

    MySQL does not support transactions. The good point is that there is NO way to make a transactional database as efficient as a non-transactional one. So MySQL is faster than PostgreSQL on many benchmarks.

    The bad point is that when you need transactions, you have to perform table-level locks, which are far LESS efficient than the row-level locks of PostgreSQL.
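
    The workaround looks something like this (MySQL syntax; the tables are invented) -- note that it blocks every other writer on BOTH tables for the duration:

        LOCK TABLES hits WRITE, pages WRITE;
        UPDATE hits  SET total = total + 1  WHERE page_id = 42;
        UPDATE pages SET last_hit = NOW()   WHERE page_id = 42;
        UNLOCK TABLES;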

    The raw disk access used by Oracle databases is there to speed up data access, not to increase the maximum database size.

    Cliff, one thing you did not mention is in which dimension the table is "big". If you are talking about many rows, Oracle has table partitioning that allows for very fast indexed searches on "billions and billions" of rows. If you have tables that exceed 2GB, Oracle can handle that quite nicely using standard file systems on any platform. If you have individual data points that exceed 2GB you may have a problem that only a change in platform can address. That would require a blob data type, which has a size limit (2GB, I believe) when stored in the database. There is a related bfile type that allows for a data point of any size, but it is then limited to the file system's max file size because Oracle stores the data as an actual discrete file.

    Yes, Oracle is not the easiest product to administer. But it is intended for use with difficult-to-manage databases, so it has all the bells and whistles necessary to manage large databases and unruly or unwieldy data. The first time you restore the database to 16:42:13 June 12, 1999 (or any other point in time) you will realize the power of the restore features too.
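
    As a sketch of the many-rows case (table, column and partition names are all invented), Oracle 8 range partitioning looks like this:

        CREATE TABLE traffic_log (
          logged_on  DATE,
          src_ip     VARCHAR2(15),
          bytes_in   NUMBER
        )
        PARTITION BY RANGE (logged_on) (
          PARTITION p1999h2 VALUES LESS THAN (TO_DATE('2000-01-01', 'YYYY-MM-DD')),
          PARTITION p2000h1 VALUES LESS THAN (TO_DATE('2000-07-01', 'YYYY-MM-DD'))
        );

        -- Old partitions can later be aged out in one cheap operation:
        ALTER TABLE traffic_log DROP PARTITION p1999h2;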

    PostgreSQL is a fine product, but I do not believe that it is ready for this size of database in a mission-critical application. (I am assuming that you would not be considering collecting this amount of data for a non-mission-critical app.)

    I am really surprised we didn't get the "my database is bigger than yours" statements that are typical of the Oracle newsgroups. But then again, we at /. are a finer breed (cough).
