Is there An Enterprise-Level Open Source RDBMS?

Colm@TCD asks: "This is something I've run up against, and I'm sure that anyone else who's trying to bring Linux and Open Source software into 'enterprise' systems has encountered the same problem: Are there any OSS 'enterprise-quality' RDBMSs out there? Many (most) business setups require a fast, solid database engine to run back-end stuff. Fast SQL support isn't enough, though -- these systems need to be able to fail-over automatically to a backup machine in the event of the main system falling over. What are the best of the available systems? Are there ways of taking the existing systems like PostgreSQL and making them failover-able?"
  • by Anonymous Coward
    Really cool DataBase from MicroSoft
  • by Anonymous Coward
    Interbase looks very promising and I'm looking forward to its open source release. I suspect I'm not the only person who is reluctant to start using it, however, until the source files are actually available. Even with the best intentions, there may be last minute legal snags.
  • randall_burns@hotmail.com (www.outlander.com): One additional solution that is Open Source and worth looking at is Interbase (which used to be marketed by Borland). I'm not sure of the exact feature set of Interbase, but it _has_ been used in commercial environments. Now, my own feeling here: 1) I think this is a _VERY_ fruitful area for open source efforts, and 2) I think there is a _lot_ of room for open source alternatives relative to Oracle and Sybase.
  • You're totally right. I'll be interested to see what happens if and when that comes to pass.

    So far, we have Interbase, which is currently in that odd limbo state, so common to formerly-commercial opened releases, of being called Open Source while no code is actually out. Will this actually arrive under an open license, and become the core for a free-for-all of open improvements, eventually eclipsing even Oracle, the second-largest software manufacturer in the world, in features, performance and stability?

    We'll see. For now, we have the label "Open Source" on the website, a binary-only release, and a promise of code sometime in the next three months.

    Until such a thing comes to pass, it's still just in "let's suppose"-land. It's still up to the garage hackers, so I stand by my comments.
    --
  • Aye, verily. My comment was getting really really long as it was, so I didn't even delve into the 'sexy factor' that comes into play in the itch. But yes, exactly so. People have to get paid to do boring things, they kind of definitionally don't do them for fun. That's why all these open-source projects focus on flash like themability before they actually get around to finishing their functionality -- you don't get a kewl screenshot of a database to put up on your project's site...
    --
  • This won't work very well, since the proxy would have to handle multiple simultaneous updates and transactions. The problem is that it's impossible to make sure things really get executed in the same order on all the replicated servers, so the servers will eventually get out of sync. Transactions might fail on just some of the servers but work on the rest; how do you handle those errors? You could of course serialize all updates, but that would kill performance. You really need some sort of global locking between the servers to pull this off.
  • just seen this announcement....

    http://linuxtoday.com/stories/19117.html

    May help... in October :-)

    Applixware also seem to have one: http://freshmeat.net/appindex/1999/05/11/926480111.html

    Martin
  • Umm... C'mon folks, let's at least read up before going with the kneejerk anti-MS flame.

    MSSQL 7 does support row level locking.

    MSSQL 6+ supports online backup and restore, better than any of the others - you can back up the DB while it is still up and running, instead of making a copy of the DB and backing _that_ up.
    (No idea why they have this and at least Sybase, who had the same codebase, don't. Some Microserf must've briefly channeled a code deity. Weird.)

    What the hell did you need raw I/O for anyway? You get a perf boost using it under some OSes, but only because of quirks in the given OS. Raw I/O != better for all systems.

    MSSQL has its roots in the Sybase code. Be fair, slap them too if you don't like where they came from. As if history from several years ago were in any way relevant...

    MSSQL can and does support enterprise applications - I've installed apps for many Fortune 1000 companies, and it's done as well as or better than Oracle or Sybase for us, even when those are running on Un*x boxen. Don't get me wrong, for the serious datacrunching, you've got to go with Oracle or Sybase on the big iron, but for anything other than monster projects it'll do what you need.

    Yes, like any product, it's got bugs. But so do Oracle and Sybase, and I'm sure all the others do too, though I don't have first-hand experience with those.

    I'm not a M$ bigot, nor have I ever worked for them. I just can't stand folks who just fire off mindless salvos against a product just because of the logo on the box. Especially when they don't know what they're talking about.

    -reemul
    If you want to attack M$, find me the bastard who thought a talking paperclip was a neat idea!
  • Read this article:

    http://www.zdnet.com/intweek/stories/news/0,4164,2473264,00.html

    Many years ago (15 to be exact) I was the technology manager for a very high tech system specification tool. The first version was developed on VAX/VMS. We chose DEC Rdb v1, which had just hit the market when we launched the company, over Oracle (which was four years old at the time and in V3.x). Rdb was just better performing and had a much more sophisticated feature set (including blobs, which Oracle didn't have at the time and which were a must-have feature for us).

    After we finished our first version of the tool, we had to port it to workstations, including Suns and Apollos. Just around this time Starkey left DEC to found Interbase, along with my old boss, Dave Root (whom I had worked for in a DBMS development company called SEED). Choosing Interbase was a no-brainer - it was the most advanced RDBMS of its time (the competition then was Oracle and Ingres), Starkey and Root were brilliant developers, Apollo (now owned by HP) OEM'ed Interbase, and the API of Interbase was almost 1-to-1 with Rdb.

    I lost touch with the company many years ago, but I followed it in the news, as it was bought by A-T and then Borland. It was a great example of brilliant technology that was never marketed properly.

    When I first heard Interbase was going open source, I was pretty excited. Of course, my concern was that since the original founders hadn't been involved for years, the quality of the product might have deteriorated. I haven't used Interbase in over 10 years. Hence my great excitement in reading the above article. The combination of Starkey's involvement in the new company with the open source model ensures, IMHO, that the new/old Interbase will be one hot SQL DBMS solution.
  • We are developing a B2B exchange portal.
    (Isn't everyone nowadays?)

    We are going with Postgres and it works.
  • AFAIK

    Informix was the first of the big RDBMS vendors to port to linux. I think the quote was "We just typed make"


  • I've been wondering about the same question myself.

    Is there any GNU/free software that is robust enough to be categorized as "Enterprise-Level"?

    Please don't take this as an insult, I am just genuinely wanting to find out the answer.

    If you know about any, whether it is ERP (Enterprise Resource Planning), office suites, databases, or whatever else that is freeware/open source/GNU (free as in freedom and/or beer), please drop me a note or two.

    My email address is: knife@newmail.net

    Many thanks in advance.

  • Another test message, this time, with a login.
  • I agree with everything you've said; however, you are neglecting the other way to achieve replication and failover: middleware transaction monitor systems.

    I am not impressed with what I've heard and observed about the Oracle failover and MS SQL failover. I would completely trust Tandem NonStop, however that's a very high-end solution. I'm not sure about DB2, Informix, and Sybase. When I used Sybase at AOL, we didn't have failover, only RAID.

    Most of these systems (except, I think, Tandem) are limited in a number of ways. Requiring shared hardware (in Oracle's case, as in some others) is not scalable and very expensive, not to mention not ideally reliable. Most are limited to one master machine that replicates to others, often hot spares rather than distributed load processing.

    Systems like Tuxedo act as a front-end to database systems and manage distributed transactions, failover, and load balancing. IMHO, this is the way to go.

    I have been studying this problem and working on solutions for a while now. If there is anyone who would be interested in working on a transaction monitor project with me, please let me know.

    Thanks sdw
  • I don't think it's right to say an Open Source RDBMS shouldn't be built just because an RDBMS has to be complex.
    Perhaps it's the chance for a young hacker to begin writing an RDBMS with a simpler, more modular design and to break the myth that RDBMSs are necessarily complex.
    In fact, I'm not 100% convinced that the relational model (with its SQL) is the best one to use at the enterprise level; there should be better solutions to the 100.000% availability problem.
  • This is, perhaps, one of those situations where you are rather unlikely to find an OSS solution. An enterprise-level operating system kernel is not exactly an overnight hack. It would take an enormous amount of time and effort to put an enterprise-level UNIX kernel together.

    I'm not even sure that an enterprise-level OSS kernel makes sense. If you're talking about an enterprise-level product, you probably have the resources to afford to purchase a commercial operating system. Both Sun and QNX have decades of experience building robust, rock-hard operating systems that you can bet your company jewels on, and both companies have x86 versions of their OSs. I'm not sure about Compaq (Digital); they may have an x86 version of their OS too.

    Right now, attempting to deliver an OSS kernel that can beat time-tested operating systems from Sun and QNX would be quite a challenge, to say the least. People will just have to accept the fact that OSS can't solve every problem in the world. It's certainly feasible that one day Sun or QNX might decide to release their respective kernels under an OSS license (which would certainly be quite a shock to many people); that's probably the most likely scenario.
  • You know, there was a time when I had some faith in moderation. This marks the absolute end of that time. Would somebody please put an extra filter onto the setup page to allow us to ignore any comment moderated (5, Insightful), please?
  • Flamebait this is not! Jesus, where does everyone's sense of humor go whenever the topic of Linux's (or Open Source in general's) viability in the marketplace comes up? I laughed more at this post than I have at Microsoft's "The Truth" pages on the TCO of Windows vs. a ball of twine.
  • I agree fully with you, but what if the actual database file was stored on a shared disk (via Fibre Channel, or some other sort of SAN or NAS). Couldn't 2 SQL servers access the same file??? or does SQL lock the file or something???
  • I disagree. Views are very, very helpful in many cases - for example, when you need a join across more than one table to get consistent rows returned on a query that happens often. Say one table holds user information, and one holds news items. News items reference the unique user ID, since duplication of information would occur if they stored the plaintext username. When you list news items, you want to print the name of the user who posted each one, which is held in the other table - make a view, and tada, no extra work for the coder (see the sketch after this comment). Creating a view and selecting off it is a real wrist-saver in cases like these.

    And I really don't think that it takes that much longer to select off a view than doing a huge join - the SQL server has to do the same work (select with a big join) either way, it's just less typing when you use a view.

    Maybe this elusive column-level security does the same sort of thing, I don't know. Care to explain what it is, so there's a better chance of it getting implemented? :)


    --
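    A minimal sketch of the view described above, with hypothetical table and column names (not from the original post); the comma-style join keeps it portable to older engines:

        -- Two normalized tables: news items reference users by ID.
        CREATE TABLE users (
            user_id  INTEGER PRIMARY KEY,
            username VARCHAR(32) NOT NULL
        );
        CREATE TABLE news (
            news_id  INTEGER PRIMARY KEY,
            user_id  INTEGER NOT NULL REFERENCES users(user_id),
            title    VARCHAR(128) NOT NULL
        );

        -- The view hides the join, so listing news items with the
        -- poster's name is a one-liner for the coder.
        CREATE VIEW news_with_author AS
            SELECT n.news_id, n.title, u.username
            FROM news n, users u
            WHERE n.user_id = u.user_id;

        SELECT * FROM news_with_author;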

  • Column-level security is just what it sounds like it is - the ability to assign access rights to columns. I'm not edumicated enough to know if this has been implemented anywhere (except in some non-RDBMS systems like Lotus Notes), but you can see that it might be problematic for inserts.

    The workaround is to deny the users access to the base table(s) and allow them access to a view that contains a subset of the original information (sketched after this comment).

    As for a join in a view versus a join in a stored proc, you're right, it's the same difference.
    --
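    A sketch of that view-based workaround in plain SQL, with invented table, column, and user names (assuming salary is the column to protect):

        -- The sensitive column lives in the base table...
        CREATE TABLE employees (
            emp_id INTEGER PRIMARY KEY,
            name   VARCHAR(64),
            salary NUMERIC(10,2)
        );

        -- ...so deny direct access and expose a view without it.
        CREATE VIEW employees_public AS
            SELECT emp_id, name FROM employees;

        REVOKE ALL ON employees FROM clerk;
        GRANT SELECT ON employees_public TO clerk;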
  • you mean the sort of time and effort it would take to hack together say...an operating system? :)


    -dk
  • I used to do LOTS of stuff with what became OpenVision's HA and Sun's HA.

    To add new services, you pretty much need a start, a stop, and a test script. The data reside on shared disks that get transferred between the machines on failover.

    On failover, the new primary will run the start script and serve PostgreSQL. On failure, it will try to run the stop script before it dies and the other machine becomes primary (see beginning of paragraph; repeat until bored).

    Of course, dedicating two machines to an HA setup, running RAID (on good fast hardware) with SCSI attached to both machines, and then buying/hacking your HA software means that you're dropping some cash on this. And you're running PostgreSQL.

    If the service is worth dropping that sort of cash on, do you really want to use a product with nobody to call when some stored procedure causes a crash? Or might it be worth putting a little cash into support as well? (And count the time when you spend 4 days debugging a problem: if it takes 4 people 4 days @ (say) US$600/day, then you've just spent around $10,000. And that's cheap labor.)

    So, that said, it's not hard to add joe-blow services to the HA systems I've worked with.

  • Oracle used to offer "Trusted Oracle", which was supposed to implement this. I never used it myself and may have misunderstood the product. I heard it was quite the resource hog and rather complicated to use.

    This product was apparently included in Oracle 8i.

    See This PDF [oracle.com]
  • [since people have this nasty tendency to create views that span multiple tables]

    Okay, granted, just about anything that performs multiple joins against a bunch of different tables is going to slow down the query engine, but have you ever created a view without joins in a properly normalized database? I'll let you in on a secret: my users and I know that running the monthly calculations (oodles of stored procs one right after another) takes forever - they just let me know when they're starting, and then we all go out for coffee and a long lunch.
    I've played with several databases in my life, and I think I can safely say that taking a performance hit in the face of complex queries is pretty universal amongst all of them.
    And yes, I've thought about writing my own query engine, something that would be more efficient and put all those Oracle and SQL Server slowpokes to rest once and for all...
    Easier said than done. I realized, much to my chagrin, that Common Joe DBAs like me can write query engines that run pretty quickly _if_ the joins only deal with indexed fields. This is fine in a _properly_ designed DB, I guess, until your other DBAs start to play Pin The Blame On The DBD.
    When you have to add functionality to allow joins against non-indexed columns, you begin to acquire an appreciation for what the software developers who wrote your RDBMS had to go through - suddenly, waiting a measly 20 minutes for your DB view off a server with 250 other clients doesn't seem as bad anymore. :)

    Skevin
    Database Design and Programming
    Disney Televentures
  • "..., so people can't say computer geeks don't write well. ;)"

    I thought he/she did that on purpose.
  • Hi Chris, my statement is based on what is currently shipping as InterBase 5.6 and what I know about PG 7. Of course, neither IB 6 nor PG 7 has shipped yet, and so we don't know for sure what will be in the final bits. For PostgreSQL, having declarative referential integrity and outer joins (when they make it in, whether in 7.0 or 7.1) will be big steps forward.

    In any case, it's great that there will be two transactional open source databases available, and I'm sure each will push the other in good ways. More choice is a great thing.

    -Tim Dyck
  • I am also an Oracle DBA working in Silicon Valley and would basically second everything the poster says. That said, many people are very attached to the idea of automated failover, and we will probably be implementing some form of cluster software in addition to the described standby database solution.
  • You don't have to fake it. It's thread-safe and (from what I can tell) queues concurrent queries. Again, I'm sure it will develop into something more mature, but it's early days yet. You can vote for features and fixes on the SourceForge survey [sourceforge.net].
  • Check out Ozone [ozone-db.org]. According to the blurb it's a "fully featured, object-oriented database management system completely implemented in Java and distributed under an open source license". And it fully supports XML [xmlhack.com].
  • Actually GROUP BY has just been added. Support is described as 'limited [www.lynx.ch]', but it looks like an active area of development.
  • RDBMS design is relatively new in the long view of the art of Computer Science.

    Has Computer Science really been around long enough to consider a 30-year-old field "relatively new"? Even if you were to say that CS got going in 1940 (and I'd be interested to hear that argument), RDBMS ideas would have been around for 50% of that time, from 1970 on (or whenever Codd released his stuff).

  • I have to disagree... There is a belief around that all things OSS are good, and all things closed are bad.

    And whatever he said, I will say: Linux does not scale well at all. Once you have more than a handful of processors, Linux loses, and NT, Solaris, etc. all win. And KDE/GNOME??? Sure, they're configurable and powerful, but when it comes to doing work (other than trumped-up shell windows), neither comes close to Win98/MacOS for ease of use, consistency, or navigability. *inhales sharply*

    Don't get me wrong, I like Linux, I think OSS is a Good-Thing(tm), and we run all our webservers on Linux where I work. However, we know that when it comes to enterprise-level databases it just won't cut it, and for that we have a Sun with Oracle 8, because we know it will stand up, and if it doesn't, there are people to sue.

    My .02
    G-funk
  • by Anonymous Coward
    With 1stPostDB running on slashdot you no longer need to worry about failover. Just post all your information to slashdot using the 1stPostDB and let the andover.net guys take care of the rest.
  • by Anonymous Coward
    Actually, I've been doing a lot of work recently on trying to get Linux DBs to scale. The simplest solution, if you have a static DB, is to just mirror the contents across several machines, running (depending on your needs) MySQL or PostgreSQL. Then, the Linux Virtual Server works great to load balance across your new semi-parallel DB. If you have changing data, such as syncing several machines, things get harder. LSDProject.sourceforge.net has one solution for syncing multiple DBs, by executing SQL statements in parallel and logging failures, but more work is needed in this area (and if you want to help that project, it'd be appreciated). There is no solution in the class of Oracle Parallel Server, etc., but if your needs are more on the reliability side with simple queries, the Linux Virtual Server and MySQL will outperform Oracle, especially taking price into account. -Sean Ward (forgot my slashdot login ;)
  • by Anonymous Coward
    Since I saw that many people are talking about transactions, I wanted to give a short overview of what a transaction actually is. People who are really interested should refer to the aforementioned Gray/Reuter book "Transaction Processing", or even the original paper by Härder/Reuter (1981) which introduced ACID transactions - when you're into computer science and databases and are talking about transactions, you probably mean ACID transactions. ACID stands for:

    ATOMICITY: transactions either succeed (commit) or fail completely.
    CONSISTENCY: a transaction transforms the DB from one consistent state into another (since transactions are atomic, the DB is logically always in a consistent state).
    ISOLATION: transactions are processed as if each had the DB to itself (a logical one-user system), so they do not interfere with each other.
    DURABILITY: changes caused by committed transactions are durable (that's where recovery, logging and so on come in).

    In order to ensure those properties a lot of research has been done, and one can say that all the basic things have been implemented. The hard part is of course implementing them efficiently, e.g. using B-Trees (or the new UB-Tree) for accessing and storing data efficiently, logging and recovery mechanisms for ensuring durability, and locking mechanisms for isolation and also atomicity. A database system actually offers almost the same depth as an OS, since it needs physical storage management, indexing, caching, query processing & optimization, parallel processing, distributed physical storage, distributed query processing, logging, fail-safety... So you might imagine that implementing a good DBMS is the work of hundreds of thousands of work hours - an effort almost comparable with creating Linux itself. (A minimal transaction sketch follows.)
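    A minimal transaction illustrating those properties, in generic SQL (the accounts table and values are invented for the example; BEGIN is spelled START TRANSACTION in strict standard SQL):

        -- Move 100 between two accounts: either both updates
        -- become durable at COMMIT, or neither does (atomicity).
        BEGIN;
        UPDATE accounts SET balance = balance - 100 WHERE acct_id = 1;
        UPDATE accounts SET balance = balance + 100 WHERE acct_id = 2;
        COMMIT;
        -- On any error before COMMIT, issue ROLLBACK instead and the
        -- DB returns to its previous consistent state (consistency),
        -- while other sessions never see the half-done state (isolation).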
  • by Anonymous Coward
    I faced a similar problem about one year ago.

    I was annoyed with my professional colleagues using Oracle 8i etc. (excellent products) but not setting them up to take advantage of their features. The 'use Oracle and don't worry about our databases, full stop' attitude is everywhere.

    Anyway, to the point:

    I used PostgreSQL with an additional layer underneath my application which did all selects on the same node as the process and all other operations on all nodes. There are a few tricks to getting this done; for example, counters need to be set up as multiples of, say, 100, with the cluster number added to that number (this means that if things screw up, nodes can operate independently as much as possible, then sync up when possible - see the sequence sketch after this comment). I hope this makes sense, it's late. Other hacks, such as code to sync/check integrity of the databases after a failure, are relatively straightforward.

    So, the bottom line:
    Performance over n nodes is only as good as one node (_slightly_ worse)... but then, hardware's cheap...

    Reliability: We have had all sorts of things go wrong, but at the end of the day, when the machines are live again, we have never had a problem.

    I'm no expert by any stretch of the imagination, but with continuous incremental backups etc. and a little common sense we are using PostgreSQL for a $$$ (8-digit) online business, and I can sleep at night, even though I'm personally responsible for the systems.

    PS: Total expenditure on software for the system: $0 (we have/had the $ to purchase expensive software; OS solutions just came out on top)
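    A sketch of that counter trick as PostgreSQL sequences (sequence and table names invented): each node hands out keys from its own residue class mod 100, so nodes never collide and can run independently, then merge later.

        -- On cluster node 1: generates 1, 101, 201, ...
        CREATE SEQUENCE order_ids START 1 INCREMENT 100;

        -- On cluster node 2: generates 2, 102, 202, ...
        CREATE SEQUENCE order_ids START 2 INCREMENT 100;

        -- Each node inserts through its local sequence; keys from
        -- different nodes can never clash, so the databases can
        -- operate independently and be synced up afterwards.
        INSERT INTO orders (order_id, item)
        VALUES (nextval('order_ids'), 'widget');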
  • It has been my experience that postgres is just hideously slow. After removing "time travel" from the database engine, things improved greatly but postgres is still one of the (if not the) slowest SQL engines available. However, postgres is one of the most complete SQL implementations you can find (free and in source form.) It's not a speed demon, nor was it designed to be -- make it work, then make it fast.

    Let me follow up to that.

    Once upon a time I spent a year and a half developing applications for Illustra, the commercial follow-on to Postgres. Illustra had the same features as Postgres, plus SQL-compatibility, plus all of the standard SQL datatypes, while still including all of the ORDBMS features of Postgres. They even had support.

    What Illustra didn't have, couldn't have, and what Postgres will never have is speed. The Object-Relational data model is not as efficient at doing plain Relational data work -- it can't be. Using Illustra/Postgres for normal relational databases is like trying to fight Mike Tyson with one arm tied behind your back -- you're playing the other guy's (in this case Oracle/DB2/Sybase) game, and giving him points to start out. Why is this important? Because all of the database APIs (SQL, ODBC, JDBC) only understand rows in tables with cursors, and all of the back-end reporting engines only cope with relational tools.

    Illustra and Postgres are much more interesting for storing object data ... but the only way to take advantage of their capabilities is to use their special APIs. At which point you've thrown away portability and standards, and you've bought yourself a proprietary (even if it's open source) solution. It may be the perfect solution to your problem, but like a black hole, it will warp everything around it. Take replication -- replication is hard for ordinary RDBMSes to implement reliably; when you add object IDs to every row of every table in every database, you multiply the problem, since you can't just copy the rows directly from one database/table to another, for the same reason that you can't easily serialize a C structure containing memory pointers -- all of the pointers (object IDs) will be wrong.

    I should really be talking about Illustra in the past tense -- the company got bought out by Informix several years ago, and Informix folded most of Illustra's technology into their Universal Server database. Dr. Stonebraker (the creator of Ingres, Postgres, and Illustra) went on to become the VP of Technology for Informix.

  • Hey now, I'd never work *with* databases, but working *in* a database can be quite neat - in many ways it's somewhat like working in an operating system since you handle so many things (paging, data storage etc) at a low level.
  • Actually, they are spinning off an entire company around it and I would not be surprised if they go for more money soon. There could have been some problems before they came to that decision, but I've emailed with their president and they seem to have a clear set of goals in sight now.

    I doubt it was K&R style code - it was probably written for the Borland C compiler and it's been on Windows for quite a while. It is certainly possible that it might have some ugly code. We can fix that.

    Thanks

    Bruce

  • This post is, albeit to a lesser extent, in the same pointless vein as the one about buying Iridium and "open sourcing" it. Things that are boring, don't impress your friends, or don't scratch an itch typically will not exist as open source. Programming an RDBMS is (I imagine) highly theoretical, scientific work, and while a few CS wizards out there might get a kick out of it, such a project would never reach the critical mass needed to sustain development. It just doesn't meet any of those criteria.

    At times people seem to get carried away with the Open Source phenomenon. We need to remember to take a step back and realize what an incredibly minuscule blip on the radar of innovation OSS is, and that for a good three decades conventional development models produced some great things. Relational database code is one of them.

    --
  • MySql is neither Free nor is it (yet) an RDBMS

    Actually, older versions are GPLed. Newer versions are free enough IMO: you can modify, redistribute and use at will; you're just not allowed to sell them for a profit (as in selling a product that would only work with it). So for me it's Free, if not in the general sense. Also, Monty hinted in an interview that he was trying to have it fully FREEed (Open Source'd).

  • I am designing a data-mining system at work, and I believe that it far surpasses Linux's ability to scale. The problem is, Linux only supports files up to 2GB in size (4GB with reiserfs), and MySQL only supports 4GB tables. I estimate that several tables will exceed 20GB in size, and breaking them up into smaller tables would be highly inefficient. We've just pushed Linux as far as it can currently go.

    So A big Sun box and Oracle are going to be the back end. Even Oracle on Linux runs into the filesize limit, because of the lack of raw device access in the kernel.

    Sometimes, free software isn't the way to go. You have to pick the right tool for the job.
  • we know it will stand up, and if it doesn't, there are people to sue.

    Please re-read your licensing agreements with these respective vendors. You can't sue them; you can't even think about suing them. With Oracle, I think they can actually have all of your children killed if you think about suing them and they find out.

    It's amazing how long elementary FUD like this can abound, especially after it's been refuted countless times in OSS diatribes. Who do you know who has ever sued Microsoft for lost data and won? Have you ever heard of Sun getting busted up over a hardware failure? It is EXTREMELY difficult to prove that the system failed in the first place (who's to say you configured it correctly? not the experts from Oracle, that's for sure), and even if you can, you agree to a LOT of things when you put that software on your server.

    I'd trust Postgres on Linux any day (and YES, I do work at the "enterprise level") simply because at least they're straightforward about there being no one to sue. If I really needed to get hard-core (not too many machines, lots of RAM and processor, zero tolerance for failure), maybe FreeBSD would fit the bill.

    Most likely though, I would be averse to anything that involved x86 hardware for real reliability. So it would probably be NetBSD.
  • Now way back in the glory days of that last great war, way back in 19 and forty something, there were two developments which have been entangled down to this very day, causing no end of grief while each tries to ensnare and evade the other. I am referring, of course, to the digital computer and the Office of Strategic Services (OSS), which has morphed into the current CIA and NSA.

    I put it to you: is it just a coincidence that we have here a claim that Micro$oft coined the term? Or is there some deeper conspiracy that has been in progress these many years?

    Or is it just another example of Micro$oft falsely claiming innovation?

    --
  • You're looking for InterBase. While it's not yet Open Source, IB 6.0 will be released under MPL 1.1. While they sanitize their code, you can get free (beer) beta releases of InterBase 6.0 for Linux, Windows, or Solaris from http://www.interbase.com [interbase.com], at http://www.interbase.com/open/downloads/60beta_downloads.html [interbase.com]. There are RPMs and tarballs there, and Debian packages are forthcoming.

    You can also check out the community springing up around InterBase at http://www.interbase2000.org [interbase2000.org].

    From my own personal experience with InterBase, I can tell you that it is most certainly an enterprise-grade database, with amazing speeds and all the features you could ask of a modern DB.

  • I have to agree with the previous poster that Postgres is the best open-source RDBMS. I switched from MySQL to Postgres some time ago; MySQL just didn't have the feature set that Postgres does.

    Postgres seems quite robust, but slower than MySQL. If you can live with the slight speed hit for the cool features (views and stored procedures come to mind), it's well worth it.

    Postgres works great with PHP, btw. :)

    --

  • firstly (fûrstl)
    adv.

    In the first place; to begin with.
    Usage Note: There is ample reputable precedent for using both first and firstly to begin an enumeration: Our objectives are, first (or firstly), to recover from last year's slump. Whichever is chosen, however, consistency is best served if any succeeding items are introduced by a parallel form, as in first . . . second . . . third or firstly . . . secondly . . . thirdly.


    a) My usage is far from incorrect, though in more formal writing I'll frequently use other forms.
    b) Slashdot is a casual discussion forum; colloquial language is generally fine, so long as the writing remains coherent.

    Nonetheless, I generally agree that the bulk of slashdot writing is piss-poor, even for this forum.

    ...gotta run
  • Irrelevent question. Since it's near impossable to know ALL the hackers (# can change at any time), and their respective itches. It would be an impossible question to answer. Pro or Con.

    It might be impossible to answer with absolute certainty, but nothing in this world is absolute. Would you say it's similarly impossible to know with any real certainty that a group of, say, med-school grads from Harvard is going to engineer a chip that can compete head to head against the latest offerings from Intel (et al.)? This, of course, is just a random example, but you should get my point. You can accept perhaps one or two of them having the desire, but the odds of a whole community of them getting together to create something quite complex, and outside of their range of experience, for a long duration, are very slim.

    That being said, I agree with your later point.
  • You can put together systems that are as efficient, reliable, full-featured, and robust as Oracle or DB2 out of open source software. But it's a bit of work, it requires that you understand exactly what you are doing, and you are going to end up with a custom solution that you have to document and maintain yourself.

    What sets Oracle or IBM DB2 apart is that they have large user communities, and that means a lot of people trained in using them, a lot of third party add-ons, a lot of "how-to" books, etc. The real selling point of the commercial enterprise level RDBMS systems is not software, it's market position and widespread use.

    Whether that kind of support is something that matters to you enough to pay the steep prices that those systems cost, you have to decide for yourself.

    The only system that comes close to Oracle or DB2 in terms of popularity is MySQL. If you can live with its limited feature set in your application, consider using it.

    In the long run, hopefully some open source database will become as popular as the big commercial guys. The most likely path to that is not an Oracle clone, but the development of a new kind of database paradigm, popularized through an open source implementation.

  • What you want is atomicity. However, rollback goes hand in hand with it. If you try to group together a select and a delete from one table with an insert into another table, and your insert fails for any reason, the DBMS must roll back the delete. So while I agree that transactions (atomicity) would be nice, the developers will need to implement at least an internal rollback mechanism in order for it to work (a minimal sketch follows this comment). I'm sure it's been mentioned here already, but do check out PHOTO.NET [photo.net]. Philip explains it all and he's funny too.
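    A sketch of that grouped select/delete/insert in generic SQL (table names invented): the delete only sticks if the insert succeeded.

        -- Archive old rows atomically: the SELECT feeds the INSERT,
        -- and the INSERT and DELETE commit or fail together.
        BEGIN;
        INSERT INTO orders_archive
            SELECT * FROM orders WHERE order_date < '1999-01-01';
        DELETE FROM orders WHERE order_date < '1999-01-01';
        COMMIT;
        -- If the INSERT fails, ROLLBACK undoes the whole group,
        -- so no rows are lost in transit.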
  • Hmm, an SQL database without GROUP BY.

    bzzzzt. . . . next . . .

    -------
  • Red Hat 6.2 (so bloody new you can't even get it in the store yet, I'm probably going to suck mine down the bitpipe tonight or tomorrow) has Piranha, which supports 2-node failover functionality. It's not Qualix, but it's a start.... and Red Hat is supported.

    So, we have an enterprise-class RDBMS on a world-class operating system with RAID and failover capability, and several world-class organizations supporting it. Sounds like a Real World Solution to me.... So it doesn't scale into the terabyte range. How many folks really need that? and how many folks are employed by small businesses?

    --
    "We came, we saw, we kicked its ass!"
    -- Bill Murray, "Ghostbusters"

  • > Slashdot is filled with different ppl very opinionated

    Yeah, I noticed.

    > Linux is not where your major enteprises turn when they need 24/7 huge scalable solutions.

    Some [netcraft.com] do, though I suppose you could define "major interprises" to filter them out.

    > this according to Gartner Group

    O Bastion of Bullshit. And PHB Fertilizer.

    You're right in that some of the commercial solutions are still better. How long that will last, no one can tell. But what makes you think OSS can't provide an RDBMS that is at least useful, and useful for lots of people?

    > It takes more than opinions to form valid statements but it takes less than an minute to spout something on Slash.

    [This tart retort intentionally left blank.]

    --

  • Apple's Enterprise Objects Framework is the best way I've seen. There's also the free GNUstep alternative, GSDB. The downside with the GNUstep version is that there aren't a lot of drivers (Postgres, Sybase, and MSSQL at this point). EOF/GSDB completely abstracts everything and presents a true OO view of your data, with no queries, transactions, or anything to worry about; the framework handles fetching and resource management.

    Failing that, your best bet is to keep operations on your data source separate. Things like Perl's DBI are just an API abstraction, not a database abstraction. The way around this is to separate out all your data-fetching functions so that all your app does is call things like fetchCustomerList(start, end). That way, to switch DBs you just create a new DB module for the new DB, and your application couldn't care less.
  • I gave up DB work a year or two ago. Maybe it was starting with MS Access and enjoying its compact, RAD, toy quality. Unscalable and inappropriate for Enterprise stuff of course (and making no pretence to such garlands), but fast, small (well, the .mdbs were small if you compacted them) and fun.

    The problem was that the inevitable upgrade led me to MS SQL Server 6.5 (I was stuck on NT at the time). SQL Server had a lot of stuff I was missing in Access - triggers, stored procedures, scalability - but it also brought a lot of frustrations. The domain aggregate functions were poorer than those offered by Access, which was a pain as I was trying to roll my own OLAP [dbmsmag.com] before it all got proprietarized into a Babel of different buyouts and skill-subsets. Its big-iron feel didn't stop it having a ludicrous 255-byte limit on varchar fields. For bigger you had to futz about wastefully amalgamating BLOBs of text with READTEXT/WRITETEXT. Plus it was grotesquely high-maintenance. I didn't want to become a DBA. I just wanted to hack SQL. And it was wasteful. The 'devices' ('Honey, I bloated the database') were huge and couldn't be shrunk, no matter how svelte the actual data.

    By the time I escaped the NT shop I was naively looking to Oracle to save me from these frustrations. Unfortunately, I'll never know the joys or horrors of that particular 'platform', because at 600 MB for the Linux installation I just bailed out, cried 'Enough of this grotesquely bloated crap!', and pursued XML or BerkeleyDB solutions to anything remotely persistence-flavoured thereafter. I knew that fast and small were synonyms, but the vendors were growing fat on the antonym line and there was nothing I could do about it. Even MySQL [mysql.com] and PostgreSQL [postgresql.org] were part of the problem. They're all emacs. None of them are vi.

    Recently I ran my periodic, wishful, wistful Google-grep for 'fast', 'small' and 'rdbms' and found myself, after rejecting Brian Jepson's [uwinnipeg.ca] TinySQL [ids.net] as ridiculously small and cute but strictly pedagogical, finally discovering The One.

    Hypersonic SQL [www.lynx.ch], a tiny Open Source Java database weighing in at less than 100K, supports correlated subqueries, transactions, referential integrity, indexes, stored procedures and JDBC - basically everything but GROUP BY, cursors and triggers. I never used cursors myself; I'd rather iterate in Perl. The other two, admittedly, are fondly missed, but not life-threatening. It doesn't support failover. But with such a small, developer-friendly codebase anything's possible.

    Did I say 'Perl'? 'But it's a Java database', I hear you cry. How can this beast talk Perl? Well, it can't, which is why I'm working on a Perl DBI interface to it talking to a native driver over TCP/IP. If anyone wants to contribute (I wouldn't need to hack it if some brave soul wants to polish up the languishing JNI [cpan.org] module in JPL [oreilly.com] to support embedding Java in Perl on Linux (currently it only works for Win32)), please get in touch. I'm almost certainly way out of my depth and entering a world of pain.

    Oh, did I mention? It's 7 times faster [www.lynx.ch] than SQL Server.

  • You need DBD::Multiplex, which is part of Perl's DBI. It allows you to do round-robin connections, among other things.
  • I don't think you're comparing apples with apples, Tim. PostgreSQL will have V7.0 released well before three months' time, at which point its SQL should be equal or superior in standardisation to Interbase's. OTOH, who actually knows where Interbase will be in 3 months? It might be nowhere fast.

    And in terms of "advanced", Interbase will probably never catch PostgreSQL. PostgreSQL is the king of features among RDBMSs, with its ORDBMS extensions. Interbase may one day be faster, but more featureful or advanced? I doubt it.
  • Frontbase looks pretty nice - I'd never heard of it, though. Although... it does look a lot like some product the OpenLink ODBC people had. Is it related in any way?

    I'm not too sure about the benefits of full ANSI 92 compliance.

    For sure, standards are a good thing, but if no one supports them, what's the use?

    For instance, take Sybase & Oracle & SQL Server. For simple select, insert & update queries the SQL is fairly portable. Once they get a bit complicated you'll probably need to do some fixing to get it to work.

    But that isn't the major problem with porting between databases. The real "problem" is the way different DBs have different strengths.

    For instance, on Oracle you use cursors all the time, and they work really well. They are fast, efficient and a great way to do some things.

    Try porting them directly to SQL Server or Sybase, and your server will die really quickly - they just don't support the same number of simultaneous cursors that Oracle does. Even if they were all ANSI SQL 92 compliant, that wouldn't fix problems like this, which are much more difficult to fix than a few syntax differences.

  • What is the difference between cold standby & a database server with replication facilities?

    I'm thinking of Interbase & Replication, or even Interbase and the database mirroring it has built in. Is there anything else needed for it to be classed as cold-standby at least?

    I'd think it would be pretty close to warm standby if you used it with some kind of heartbeat monitor, wouldn't it?

    I agree there is a fair way to go before Interbase supports hot standby, though.

  • Interbase has been available on many non-windows platforms for a long time (15 years+) now. On Win32, they use MS VC++ to compile it.

    The short answer for why MSVC was used for InterBase is that we wanted to keep options open to compile the same source on NT for Intel, NT for Alpha (if market demand warranted) and NT for PowerPC (when that operating system existed). Borland compilers only support Intel. Adding compiler-specific #ifdef differences for different flavors of a single operating system seemed inelegant and needlessly complex.

    Bill Karwin

    The code was never a mess like people have been saying, but it (apparently) has a fair amount of conditional defines for various platforms. That's the stuff they are fixing now, and why it is taking so long.

    This is an email Ann Harrison (president of Interbase) posted on a list back in January:

    >Subject: Delivering sources
    >
    >As the opening of the source gets closer, I find myself
    >wondering more and more about the details.
    >
    >Specifically, where to make the tradeoff
    >between clean code and code now?
    >
    >At the moment, there's a single source for InterBase
    >with magic in the source control tool to deal with
    >differences between the various environments. The
    >most significant of those is the use of $ in names.
    >
    >Should NewCo make a pass through the code once
    >so that the engine is actually, rather than almost,
    >one code base? Should they go a step further and
    >use GCC for all platforms?
    >
    >At one point, some of the compilers used to build
    >InterBase did not accept ANSI prototypes. As a
    >result, every routine has conditionalized prototypes.
    >They're really ugly and completely useless. Should
    >NewCo remove them before releasing the code?
    >
    >At various times, InterBase has been ported to
    >platforms that are no longer strategic - like
    >HP MPE/XL. Should NewCo remove the conditional
    >code for those platforms? Just the really ugly
    >ones?
    >
    >High level internals documentation is almost
    >non-existent. Complete documentation would
    >be (much) larger than the code, so that's not
    >likely to appear. Should NewCo take the time
    >to write module by module documentation? A
    >~30 page overview?
    >
    >As you think about these questions, please remember
    >that you (the knowledgeable InterBase developer) are
    >not the only user of the source. It's important
    >that the code that's released be buildable by humans
    >on all the platforms it runs on.
    >
    >Appreciate your thoughts,
    >
    >Ann

    Sorry the formatting is so screwed up! I hope it clears up some misconceptions, though.

    BTW, it's good to see you're not so cynical about Inprise any more, Bruce. I was their sole defender on Technocrat when you posted that story on them a couple of months back.

  • I've done a fair bit of stuff on SQL Server, and while it is limited in that it can only run on NT, it is so much better than Sybase (for instance) that it isn't funny.

    For instance, I'm using Sybase Adaptive Server 11.9.2, and guess what? No row locks, non-ANSI join syntax, and you can't seem to have outer joins where you filter the joined table by another (non-join) field.

    SQL Server has wonderful development tools, too - like Query Analyzer. After spending the first year and a half of my working life trying to work out Oracle query plans, MS Query Analyzer was such an amazing revelation.

    You are right that there are a lot of (surprisingly) big enterprises relying on SQL Server. I think version 7 was a big step forward in this area.

    SQL Server still can't match Oracle on joins of more than four tables with lots of data, though - of course, not much can, really.

  • Yeah... I know what you mean.

    They do seem to be very dedicated, though.. they listened to the community on the licence and everything.

    I doubt there will be any problems, but it will be nice when the source is out.

  • Interbase [interbase.com] is an RDBMS, and it is free (beer) now and will be Open Source (MPL) soon (June).

  • I'm pretty impressed! It seems a pretty nice bit of work - joins seem to work okay (I didn't try any outer joins, and there is no ANSI 92 syntax, though).

    I think the benchmark is crap, though - simply testing how fast you can create a table and an index, and then do an insert might be fair enough, but testing select speed by doing a select * from that table is a bit of a joke. I'd love to see how it performed with a 4-way join and lots of data.

    That shouldn't be seen as a flame, though. I'm really, really impressed with this as a whole.

  • By the way, "OSS" is an acronym coined by Microsoft, and one I've never liked. Can we please call it "Open Source"?

    If we don't have an obnoxious Corporate eXecutive iMission Critic@l eAcronym (oCXiCeA) for OSS, then OSS is not suitable for obnoxious Corporate eXecutive iMission Critical eTasks (oCXiCeTs). Several Corporate eXecutive iMission Critic@l eConsultants (CXiCeCs) have demonstrated that OSS should be deployed in the eCorporation within several quarters, or iTimeUnits as they are now called. This will enhance our VC-funded eCorporation's rapid execution of oCXiCeTs.

    Furthermore, numerous CXiCeCs have given me an executive summary about OSS, from which I have learned that because OSS is all capital letters, it will fail to succeed in eBusiness, E-commerce, and/or iWare deployment. One eConsultant suggested a paradigm shift to oSS, while another one suggested O-ss. But the latter was actually an E-consultant and not an eConsultant, and he hardly ever uses the IT acronym, so I called an eXecutive iMeeting where we all met in a conference room where we reached a consensus that the so called "E-consultant"'s credibility is lower.



    Seriously, I agree with Bruce. "OSS" is highly obnoxious. Don't use it.

    So is eAnything, E-anything, iAnything, and many more. Terms like those reek of marketing, rich presentation (read: content-free tripe), and all those other things which make the web, computing, and many other things less fun. (Maybe Bruce agrees with that point too, but I wouldn't want to speak for him. He's been misrepresented enough as it is.) Whenever I see eFoo or E-Bar, I turn the other way as fast as I can. You should too.

  • There is replication code available for Postgres at ftp://ftp.sdc.com.au/pub/repl/ - I've never used it myself, but they mentioned it on the general list a while ago.
  • According to numbers I saw on the web, Postgres is about 20% slower than Oracle on TPC tests.

    Not very far off, especially considering that Oracle probably tunes their DB specifically for benchmarks.

    Have no link, sorry.

    Oracle et al., however, have an edge or two regarding reliability features (and features in general).

  • I have a similar solution. I'm using MySQL to develop a large-scale commercial website with gnujsp [klomp.org] and Apache JServ [apache.org].

    My problem is this:
    This is a database of commercial real estate data. There is really only one main table, but it has lookup fields that reference as many as 7 other tables, most with funky outer joins. Performing even simple queries on it is horrific.

    My hack is this:
    Every night, at 3am, do a GIGANTIC 7-way merge of all the relevant tables, and insert the results into a MAMMOTH SINGLE TABLE, the cache table (see the sketch after this comment). Lookups are lightning fast, and the data is at most 24 hours old (this is reasonable in my case). Building the cache actually only takes 15 seconds or so --- abominable if you're waiting for a web page to load, but not so bad at 3am :)

    This technique may or may not be available to you, depending on your circumstances. I find it to work quite well with mine.

    - jonathan.
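    A sketch of that nightly cache build in MySQL-flavoured SQL (table and column names invented; only two of the seven lookups shown):

        -- Rebuild the denormalized cache in one shot, e.g. from cron at 3am.
        DROP TABLE IF EXISTS listing_cache;
        CREATE TABLE listing_cache AS
            SELECT p.property_id, p.address, c.city_name, a.agent_name
            FROM properties p
            LEFT JOIN cities c ON p.city_id = c.city_id
            LEFT JOIN agents a ON p.agent_id = a.agent_id;
        CREATE INDEX idx_cache_city ON listing_cache (city_name);

    Daytime queries then hit listing_cache with no joins at all.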

  • Not all DSS environments are/should be read-only. My employer [retek.com] deploys one of the largest (if not the largest) read-write DSS client-server applications.

    Transactions/rollback are not critical (we have our own rollback solution on top of Acumate, made by Kenan [kenan.com]), but in a read-write environment they sure are good to have around.

    OTOH, you're right; in DSS environments reads are always way more common than writes, so a rollback layer could actually be preferable to built-in transactional functionality.

    So, forget about transactional control, it's not even essential; are there *any* OSS OLAP apps out there? They sure would make a nice toy ;-)...

    engineers never lie; we just approximate the truth.
  • While I agree with almost everything you said (including the part about Linux being an excellent imitator and a rare innovator), I am tired of seeing the same premise being used again and again wrt OSS: you don't necessarily need a buncha 'lone hackers' hacking away at a problem.

    Let's suppose that a fairly good-sized company came out with an open-sourced RDBMS/OLAP framework that has all the groundwork to be fast and extensible. I am sure there are a few companies out there that would benefit from contributing to such a project for their own uses... Why can't OSS act as a framework and collaboration scheme, instead of just a toy for hobbyists?


    engineers never lie; we just approximate the truth.
  • This would have to be VERY specific to the application in question, if they're doing trivial failover. Database connections of all kinds keep a WHOLE lot of state around. If you want transactional failover, you're going to be hard pressed to do it without a full-blown 2PC implementation.

    Imagine the following scenario: You, client C, initiate a service/transaction with server A. Server A crashes. Your service/transaction fails over to server B. However, server B knows nothing about what's been going on in your stateful connection, and is thus unable to help you.

    The only solution that you can have within a database for failover is in the case when you have a stateless connection, and/or full state replication across the cluster of server machines. However, this alone is going to almost certainly bog down any of your transactions to the point where you don't want to continue anymore!

    A company which has done this extremely well is the old Tandem, with their NonStop OS and NonStop SQL. Their mastery of failover in transactions was superb, and quite a bit of Wall Street is still running their stuff, even though Compaq has been quite unable to successfully market those services now (NonStopNT... that's a complete oxymoron!).

    I suppose you COULD implement the type of service you're talking about with failover of stateless connections, but if you're going to do that you're much better off going with an app server in the middle, which handles failover of connections in an EJB-type model, and using the features of the app server to handle this.

    So to sum it all up: For simple services like RPC and HTTP and FTP and whatnot, which are stateless (or very nearly so) by their nature, simple failover is fine. For extremely stateful services (like a database system), you've got to use much more complicated measures.

  • How about an Open Source object database? I've been fooling around with Caché [e-dbms.com] on Linux for a week or so, and it seems nice. I think OODBMSes are the way to go for web applications, especially ones that make any use of XML. Any Open Source ones?

  • Question... Red Hat 6.2 claims "Generic service failover support. High availability support is no longer available only for web and ftp servers. Now you can have virtually guaranteed service from any of your servers including mail, pcp, and dns."

    While they don't mention database servers specifically, "generic service" indicates that they've figured out some way (probably similar to watchdog coupled with some form of ARP spoofing) to fail a down server over gracefully. Which leaves data replication.

    Would this work in conjunction with a simple mirror program? What I'm thinking of is...

    A small daemon that runs (and listens) on the 'master' box. When a database connection is initiated, it reads the commands sent by the client and sends them on to the 'master' database server, and again to a secondary server.

    Both the master and secondary server then have current copies of the data, and the actual failover is left to whatever Red Hat has concocted (still downloading the .iso, so this may or may not make sense).
  • It has been my experience that postgres is just hideously slow. After removing "time travel" from the database engine, things improved greatly but postgres is still one of the (if not the) slowest SQL engines available. However, postgres is one of the most complete SQL implementations you can find (free and in source form.) It's not a speed demon, nor was it designed to be -- make it work, then make it fast.

    MySQL is certainly fast as heck, but it's still missing a lot of functionality one takes for granted these days (transactions being one of them). A lot of people (myself included) use MySQL for business-class stuff.

    As the original question was one of "it cannot fail, EVER"... you're going to have to buy that level of security. There are several commercial databases for Linux that will do what you need -- they are not free (save Sybase). But I suspect you aren't concerned about "free" so much as security, speed of bug resolution, etc.
  • Umm, I'm an Oracle DBA, is that considered "enterprise level"? I read /. every day. Oracle is an excellent product, but I do think that it would be great if there were an open source alternative. At work, we are currently sitting around waiting for Oracle to release some bugfixes that impact our application. If it were open source, it would probably be fixed by now. Plus, imagine all the neat features you could add!!! I would definitely get involved with that project, if it ever came about.
  • Let me get this straight. So you're saying that anything that requires an "enormous amount of time and effort" and is "quite a challenge" can't be done by traditional open-source methods? You mean, for instance, an operating system? [snipped open source zealotry]

    Enterprise database applications are subject to FUD that runs far deeper than what we see in the desktop OS competition. I implement enterprise data warehousing solutions for a living, and I see that most clients will ONLY accept an implementation that's built on the RDBMS they already use in house.

    You can open source the operating system, or the desktop environment, because your test bed starts out as hobbyists (the early adopters). Good luck finding the first company that will take the plunge and bet their entire transactional system on an untested platform, when they can pay for the insurance of an Oracle or a SQLServer.

    -JTB

  • Let me get this straight. So you're saying that anything that requires an "enormous amount of time and effort" and is "quite a challenge" can't be done by traditional open-source methods?

    You mean, for instance, an operating system [kernel.org]?

    Or maybe a graphical desktop environment [kde.org] or two [gnome.org]?

    How terribly shortsighted. I'm sure there are many people out there who are up to the challenge.

    Besides, what you're basically saying is that open source projects don't scale very well, and all of the above examples disprove that.
    ---

  • In fact, I'm not 100% convinced about the relational model (with its SQL) being the best one for use at enterprise level; there should be better solutions for the 100.000% availability problem.
    If not relational, then what? Object-relational? Flat-file? What technology exists that can perform the same range of tasks as an RDBMS with the same level of efficiency? O-R is still in its infancy, and is very much a special-purpose tool. Flat-file does not scale well except for some specialized cases.

    RDBMS is a very mature, flexible technology. SQL (& RDBMS) does have some shortcomings, but then every tool does. For many (most?) typical business applications, RDBMS is the tool of choice.

    Because modern RDBMSs are so pervasive and accessible, many people forget that designing databases and writing SQL is a specialized and difficult art. A well-designed database can outperform a poorly-designed one by an order of magnitude or more. When people talk about RDBMSs having poor performance, my first instinct is to question the db design and not the engine. Sadly, in my experience, most programmers treat the database almost as an afterthought. (I design databases for a living, in case you haven't guessed.)


    "The axiom 'An honest man has nothing to fear from the police'

  • DB2 should definitely be added to the list of "bet the company's jewels" RDBMS products. This is the database of choice for many banks and financial institutions. BTW, the company that provides data processing for roughly 40% of *ALL* mutual funds stores *your* data in DB2 (mainframe flavor) -- I used to work for them.
  • Erlang (www.erlang.org) is open-source software that Ericsson uses internally. It has a "production quality" DB felicitously called Mnesia. It has failover, replication, very easy to use transactions, ... It's not relational, but the programmer's interface is very easy. There is an RDBMS front-end advertised in the Erlang contrib area - don't know how good that is.
    Erlang deserves to be better known: it is one of the biggest and best open source projects.
  • It works, it's not beta; I've used it on a cluster of 6 machines. Of course it's going to work for me, since I wrote it. The MySQL guys are working on replication support themselves.
  • by Anonymous Coward on Monday March 27, 2000 @03:15PM (#1166741)
    Interbase has great features such as database shadowing (to help with failover safety), online backup, is self-"cleaning", is hands-off (administration-wise), is an absolute cakewalk to install, is fast as a bitch, has stored procs, views, triggers, decent security *and* supports UDFs that can be written in C/C++ or Delphi (so probably Kylix), and most of all is currently available for Linux in non-beta v5.x and in beta 6.0.

    It also is ANSI SQL-92 compliant, as far as I'm aware, as well as adding a couple of nifty extensions to the language that negate the need for looping through cursors (for those who have used Sybase/MSSQL).

    I have used Interbase on several Windows projects and it scales brilliantly from the single user desktop version to the server.
  • by Anonymous Coward on Monday March 27, 2000 @01:53PM (#1166742)
    You're not going to find open solutions that come close to these products, at least not yet.

    Added to which, you can get excellent support (at a price) for these products, which may ultimately be worth more to you than anything else.

  • by Eric Green ( 627 ) on Tuesday March 28, 2000 @03:46AM (#1166743) Homepage
    Failover generally relies upon transaction logs: when machine A fails, machine A's transaction logs are used to bring machine B up to date. The result is that machine B is current as of the moment machine A failed.

    Replication generally does not take place on as current a basis. Replication can be done with any currently-existing RDBMS simply by adding a "last changed" timestamp to each record, and then occasionally querying out all records changed since the last replication pass and sending those records to the remote server. Replication thus ensures that you have a snapshot of the data at some point in time, but not necessarily an up-to-date version of that data.

    Those differences are why replication can be done by third-party software, while "real" standby requires specific support at the RDBMS level. I have done replication with technology as crude as dBase II by explicitly programming it into my applications. Standby, on the other hand, is a different issue.
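
    To make the timestamp approach concrete, here is a minimal sketch (table and column names are hypothetical; 'src' and 'dst' are any Python DB-API connections; REPLACE INTO is MySQL's upsert flavor):

        # Sketch of snapshot replication via a "last_changed" column.
        # Hypothetical schema; REPLACE INTO is MySQL-specific upsert syntax.
        def replicate_since(src, dst, last_sync):
            cur = src.cursor()
            cur.execute(
                "SELECT id, payload, last_changed FROM orders "
                "WHERE last_changed > %s", (last_sync,))
            for row_id, payload, changed in cur.fetchall():
                dst.cursor().execute(
                    "REPLACE INTO orders (id, payload, last_changed) "
                    "VALUES (%s, %s, %s)", (row_id, payload, changed))
            dst.commit()
        # Note: deleted rows are invisible to this scheme -- one more reason it
        # yields a point-in-time snapshot rather than a hot standby.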

    Finally: For small businesses, none of these particularly matter. Small businesses don't have the money for big replication servers and such. They rely on an older version of replication: they have a paper copy of everything that goes into the computer, and make nightly backups (small businesses, unlike large businesses, don't run 24/7, and thus the RDBMS is stable at that time of night; they don't need the ability to do "hot" backups). If the database crashes and eats their data, they restore it from tape, then manually type in any transactions that weren't on the tape (using their paper copies).

    PostgreSQL is already "there" as far as small businesses are concerned, from a price/feature standpoint. PostgreSQL's only real issue is performance, especially performance when adding records, which is pathetically slow. (Using transactions helps, in that at least you don't get a sync for each "insert" but rather a single set of syncs at "end transaction" time, but it's still less than 1/3rd the speed of MySQL even if you turned off the syncs altogether and ran it in the dangerous "async" mode.) However, most small businesses don't need to add 200,000 records in a short period of time... I can add 200,000 records in little over 3 minutes with PostgreSQL on modest hardware, which is pretty pathetic, but far beyond the needs of a small business.

    -E

  • by Eric Green ( 627 ) on Tuesday March 28, 2000 @04:03AM (#1166744) Homepage
    PostgreSQL's query speed is reasonable, within 10-20% of Oracle's and "close enough" to MySQL's. Postgres's insert speed is hideous. I benchmark it as less than 1/3rd of MySQL's speeds on inserts.

    Much of Postgres's reputation for slow speed stems from attempts to use it as a CGI back end database. PostgreSQL, like Oracle, takes a long time to open and close connections. This is because PostgreSQL spawns off an entirely new process for each connection, complete with checking the user permissions etc., while MySQL spawns a thread, which is considerably faster to do but potentially less reliable if a thread dies due to a bug in the RDBMS (ask Microsoft what happens when a thread dies without releasing any locks that it has! Ever wonder why Windows dies so often?!).

    In short, if you want to use PostgreSQL as a web back end, use a persistent connection -- do NOT open and close the database for each CGI transaction!
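
    A minimal sketch of the difference (shown with the psycopg2 driver; the DSN and schema are made up):

        # Sketch: keep one PostgreSQL connection alive across requests instead
        # of paying the fork-per-connection cost on every CGI hit. Uses the
        # psycopg2 driver; DSN and schema are hypothetical.
        import psycopg2

        _conn = None

        def get_conn():
            global _conn
            if _conn is None or _conn.closed:
                _conn = psycopg2.connect("dbname=web user=www")  # once, not per hit
            return _conn

        def handle_request(user_id):
            cur = get_conn().cursor()
            cur.execute("SELECT name FROM users WHERE id = %s", (user_id,))
            return cur.fetchone()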

    -E

  • Inprise decided that the way for their database to remain competitive was for it to be Open Sourced. The thousands of man-hours have already been invested, but they think the community participation is important enough that they are opening it now, after that work has been done. Interbase is a rather powerful enterprise database, maybe not as powerful as Oracle yet but I think the Open Source folks might be able to help that.

    By the way, "OSS" is an acronym coined by Microsoft, and one I've never liked. Can we please call it "Open Source"?

    Thanks

    Bruce

  • Hey, I'm developing a B2B exchange portal as well. I'm using MySQL. But damn, the thing sucks for everything more than simple selects. Plus selects are sometimes NOT optimised at all. Computing some basic stats on the content of the DB brings it to its knees. I had to add another server duplicating the content in a very hackish way so that users wouldn't be slowed to a crawl every time we had to compute the stats (often). Duh.
  • by costas ( 38724 ) on Monday March 27, 2000 @01:47PM (#1166747) Homepage
    I've done some research on this topic, and by far the most open/tested OSS RDBMS out there is PostgreSQL, which is actually based on the same tree from which a commercial RDBMS, Ingres, was spawned. However, I do not believe that Postgres is solid/fast enough for enterprise-level applications yet (mind you, this is from reading a lot of Usenet and Web opinions on it, not personal experience, so YMMV).

    Since I am a DSS ("Data Mining") guy though, I am more interested in an extension to your question: is there any decent OSS OLAP/DSS DB system out there that can handle multiple dimensions, transactions, partitioning, etc.? If not, how many people would be interested in one?

    engineers never lie; we just approximate the truth.
  • Christopher Browne has a good reference page for this. I stumbled across it while researching possible database solutions for one of my projects, and it seems to be well maintained and updated. The page can be found at: http://www.ntlug.org/~cbbrowne/rdbms.html
  • by Falsch Freiheit ( 7780 ) <freiheit AT gmail DOT com> on Monday March 27, 2000 @02:24PM (#1166749) Homepage
    First off, databases look for connections to a port, just like any other service like httpd, smtp or ssh. Doing things so that you have failover and/or load-balancing on *that* is easy. There are a large number of solutions, from application level to library level to dedicated hardware to general purpose software (LVS).

    That said, database failover is *not* easy.

    The basic problem is that you don't just read from a database, you read and write to the database. Database replication is a very non-trivial thing to do. You have to deal with the possibility of data being inserted, updated or removed from any of the servers that are involved. This requires that you have things like global locks, timestamps on everything, primary keys that are generated in a way unique to each replicated version of the database, etc.

    Most of the work of replication is handled very nicely by the big commercial databases (but you will find that they want you to pay for it), but even with those you might have to make minor changes in your application (or maybe just the database structure) in order to handle replication properly...

    AFAIK, there's nothing you can get for free that will be able to handle any kind of real-time n to n (or 1 to n and n to 1 simultaneous, even) replication. You can do stuff like have mysql dump a log of everything that's done that changes the database, and then import those changes into another mysql instance someplace else; but that's nowhere near what the big commercial databases can do when it comes to replication.

    Given that replication is gonna cost real money, just get your free database running on a good-quality machine with good backups (that you make sure include the database) and maybe things like redundant power supplies. You can make a really robust box for much less than Oracle will charge you to license 2 instances with replication.

    (Or hire a team of 10 database programmers to spend a year or 3 adding replication to postgresql. Well... maybe they could do it wrong in a month -- I'm trying to give a wild estimate for doing it vaguely close to right.)

  • by Dacta ( 24628 ) on Monday March 27, 2000 @03:16PM (#1166750)

    Sorry.. I forgot to talk about failover.

    There is a third-party, non-open-source replication tool available for Interbase 5 and 6 that will handle the data replication stuff you need for failover. The actual server switching I think you might have to write yourself... I suspect you might find there are a lot of people willing to help you.

  • by starlingX ( 67942 ) on Monday March 27, 2000 @02:22PM (#1166751)
    Check out:

    http://www.freshmeat.net/appindex/2000/01/15/947952558.html

    It's a toolkit for clustering MySQL, which may or may not provide the failover facilities that you need. It looks beta, but may work... I haven't played with it yet. And of course, this is only if you're willing to use MySQL. Looks fun, though.

    Otherwise, if you're doing serious enough work to need a serious RDBMS backend, I hope you're making enough serious dough to afford a serious RDBMS. But before I invested big bucks into a failover setup using a commercial RDBMS, I'd cover my other bases first. Most DB failures I've seen were not a fault of the RDBMS itself, but hardware, power, diskspace, OS, human error, etc. The only time I've seen an RDBMS crap the bed at work, it was Micros~1 SQL Server, and you should know better than to buy that. With our Sybase machines, it's always NT that takes down the box, not Sybase. Get these bases covered (high-quality hardware, reliable power with a big UPS or generator, non-Micros~1 OS, and a DBA that knows what they're doing) and you most likely won't need a failover system.

  • by emerson ( 419 ) on Monday March 27, 2000 @04:00PM (#1166752)
    Operating System design has been a solved problem for about a quarter of a century. 18-year-old kids write OSes as freshmen projects now. There's an astounding amount of freely-available material out there, including the source to several OSes, to get you started on doing such a thing, and there has been for a long time.

    The graphical desktop environments you point out are still in the infancy stage, after 3-4 years of development. Although they're very cool, (as is Linux), both are basically aping existing art, and not very well in some cases. (Go on, moderate me down as flamebait and argue with me; this is not my primary point.)

    RDBMS design is relatively new in the long view of the art of Computer Science. The commercial vendors that pay thousands of people fulltime to develop them still don't have it down correctly. It's certainly not progressed to the point where it's textbook knowledge, where any fresh-faced CS grad can get in there and build one.

    I think the most telling evidence that this is beyond the current state of the art of Open Source, though, is the fact that it doesn't exist yet. It's not like this Ask Slashdot is the first time anyone's ever thought about free alternatives to the big boys of databases. It's a known need.

    And the "Cathedral and the Bazaar" theory of Open Source as 'developers scratching their own itch' would seem to mean that if it COULD be done, it WOULD be done. Intrepid hackers would have taken Postgres or the GPL'ed old MySQL or the like and had a project underway years ago if it were just that easy. Apparently, it's not.

    Open Source projects do not _innately_ scale well, merely from the fact of them being open. There are a lot of preconditions for a project to work. At least these three need to exist:

    1) A critical mass of developers see the need.
    2) That mass of developers (or a subset thereof) can be organized into a team to do the work.
    3) That mass of developers has the knowledge and skills to do the required work.

    As an example, there's a large set of people that would like an open-source engine to crack DES keys in real-time on commodity hardware. That takes care of 1. Getting a set of crypto hackers together to do this work would be a snap, cf OpenSSL et al. There's 2. But, the art isn't there. The general public, the mass of developers out there, simply don't have the knowledge or experience to do that. Maybe it can be done, maybe it can't, that's not the point. The point is that just saying "I'm starting an Open Source project to do [something really cool]" doesn't mean it's something you actually CAN do.

    So, back to the topic at hand, yes, RDBMS is something that's outside of the Open Source world at this point. The skills required to do such a thing are reporting for work at Redwood Shores and Emeryville and the like, and not willing or able to share their really-quite-esoteric knowledge with the free software universe.

    And pointing to a handful of other projects that are working doesn't prove that this project, or any other, will. Look around; the code's not coming, mostly because, no, there aren't many people out there who are up to the challenge. Or else they'd be doing it.

    The proof, as they say, is in the pudding.

    --
  • by mrsam ( 12205 ) on Monday March 27, 2000 @01:45PM (#1166753) Homepage

    This is, perhaps, one of those situations where you are rather unlikely to find an OSS solution. An enterprise-level RDBMS is not exactly an overnight hack. It would take an enormous amount of time and effort to put together an enterprise-level RDBMS.

    I'm not even sure that an enterprise-level OSS RDBMS makes sense. If you're talking about an enterprise-level product, you probably have the resources to afford to purchase a commercial RDBMS. Both Sybase and Oracle have decades of experience building robust, rock-hard RDBMSes that you can bet your company jewels on, and both companies have Linux versions of their RDBMSes. I'm not sure about Informix; they may have a Linux version of their database engine too.

    Right now, attempting to deliver an OSS RDBMS that can beat time-tested RDBMSes from Sybase and Oracle would be quite a challenge, to say the least. People will just have to accept the fact that OSS can't solve every problem in the world. It's certainly feasible that one day Sybase or Oracle might decide to release their respective RDBMS engines under an OSS license (which would certainly be quite a shock to many people); that's probably the most likely scenario.
    --

  • by FallLine ( 12211 ) on Monday March 27, 2000 @04:28PM (#1166754)
    The position that any Open Source, and most importantly "free", RDBMS could NOT hold a candle to Oracle (amongst others) where the needs of "enterprise" are concerned is reasonable. The development of Linux is (was), in many ways, the antithesis of what it takes to develop an equivalent to Oracle. Think about it:

    Firstly, kernels are not all that complicated.

    Secondly, Linux's kernel is about as simple as you can get.

    Thirdly, Linux, thus far, has had the advantage of being behind the curve in development. It has had the benefit of essentially just being able to copy features, design, and even some code. While the code may be from scratch, virtually nothing else is. In other words, Linux's code was built with 20/20 hindsight. Furthermore, it's done nothing particularly spectacular with that opportunity.

    Fourthly, the motive and feedback in developing an enterprise-class RDBMS are vastly different. In developing Linux, there is something quite tangible, such that most "geeks" can get relatively instant gratification. Do geeks really personally need such a database (of course not; not even their hardware approaches this)? Contrast this with an RDBMS: if "they" make the system 0.001% more reliable, by whatever measure, under heavy loads, how can they appreciate that? (Never mind testing it.) These things are about numbers. A company that does 1 million a day in sales with a broken database can't afford that 0.001%, but a geek screwing around developing a database in his spare time for his own uses is not likely to experience such a problem. And even if he is, does that 0.001% really mean anything to him? He can just reboot. Sure, you can say "given enough eyes, all bugs become shallow", but ask yourself: whose eyes, and how do these bugs become apparent? A minor bug in Linux, and a couple of geeks have to reboot their PCs (which presumably prompts them to patch it). Which company is going to volunteer to expose itself to such a bug -- possibly millions of dollars in lost revenues (amongst other concerns)? The scale of the two uses changes the nature greatly. (Likewise, empirically, Linux demonstrates such leanings -- towards areas which most geeks appreciate, but not necessarily enterprise.)

    Fifthly, the development process itself is vastly different. Because Unix, and particularly Linux, is quite modularized, it can be developed in a ragtag fashion. This is not true of an RDBMS. They are very complex and interconnected.

    Sixthly, support is a major issue. Despite all the hype amongst the Linux community about "support", it is absolutely undemonstrated. And when you consider issues such as proper documentation...

    Seventh, given some of the risks I've mentioned, what company is going to prefer to try to save 10k in software fees at the cost of the increased risk? The expected value in this case could very well be negative.

    ...I can think of more, but that is plenty of doubt right there. Think critically, not dogmatically.

  • I don't think you will find database failover in an open source product -- it's too specialized an area to have gathered interest from the necessary critical mass of developers.

    I cover databases for PC Week Labs, and in my view (and in my tests), the most sophisticated OSS relational database is PostgreSQL. In three months, though, that title will be held by InterBase. It has a more complete SQL implementation, has a (third-party) replication option and is generally more mature. However, it doesn't have any failover.

    The products that do provide failover are (a non-exhaustive list, but these are the main ones): Oracle Parallel Server (or Oracle with the Failsafe option if you want just 2-node failover), Sybase ASE, IBM DB2 UDB, Informix, Tandem NonStop SQL and Microsoft SQL Server. The cheapest option in terms of purchase price will be Microsoft SQL Server with Microsoft Cluster Server, but it only runs on NT, only supports 2-node failover, and does not cluster.

    Note that when you talk about failover (particularly with stateful connections), you need to also get failover-capable hardware systems. The databases mentioned above all need special, tightly spec'ed high availability hardware configurations from Sun, HP, etc. In general, a shared SCSI bus is required.

    You also need to decide what level of failover you need: (from simplest and cheapest to most complex, expensive and best):

    * cold standby (replicate to a standby server) -- no shared storage, mostly up-to-date, cheap, manual intervention required for failover

    * warm standby (shared storage or transactional replication to a standby server) -- ensures no data loss for all committed transactions, requires fault-tolerant hardware and a fault-tolerant dbms, failover is automatic but may take several minutes to roll the log forward, warm up the cache and reconnect users

    * hot standby (shared storage) -- no data loss for committed transactions, fast failover (30 seconds or less) -- this is a very specialized area, and you should have your dbms vendor work with you on setting up these kinds of HA setups. This is an ongoing area of research; Sybase 12 and Oracle8i 2.0 introduced new features specifically in this area to do things like pre-connect users and pre-warm the cache to speed failover times.

    If you want to have an OSS solution, I'd advise using PostgreSQL or InterBase and writing stored procedures in C to replicate inserts, updates and deletes to a second server, then coding reconnect and heartbeat logic into your front-end apps. This will be easiest to do with an app server since you have a single point of data access. This will not provide atomicity or durability though (unless the updates use 2PC) -- you'll need to run a consistency check on the db on failover.
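
    As a rough illustration of the reconnect-and-heartbeat half of that suggestion (host names, the "SELECT 1" probe, and the psycopg2 driver are all assumptions, not a prescription):

        # Sketch of front-end reconnect/heartbeat logic, as suggested above.
        # Hosts are hypothetical; "SELECT 1" is just a cheap liveness probe.
        import time
        import psycopg2

        HOSTS = ["db-primary.example.com", "db-standby.example.com"]

        def connect_with_failover():
            for host in HOSTS:
                try:
                    return psycopg2.connect(host=host, dbname="app",
                                            connect_timeout=3)
                except psycopg2.OperationalError:
                    continue  # that server is down; try the standby
            raise RuntimeError("no database reachable")

        def heartbeat_loop(conn):
            while True:
                try:
                    conn.cursor().execute("SELECT 1")
                except psycopg2.OperationalError:
                    conn = connect_with_failover()  # primary gone: fail over
                time.sleep(5)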

    This is hairy, though. Overall, I advise paying for something that someone else has gotten working.

    Regards,
    Tim Dyck
    Senior Analyst
    PC Week Labs
  • by JohnZed ( 20191 ) on Monday March 27, 2000 @01:52PM (#1166756)
    Check out interbase.com or interbase.org. Interbase has been in use in enterprise environments for something like 15 years. It has very advanced crash recovery features and replication through a commercial plug-in. Version 6.0 is under the MPL and in beta right now for Solaris, Windows, and Linux. No, I don't work for them, I just sound like that.
    --JRZ
  • by Dacta ( 24628 ) on Monday March 27, 2000 @03:08PM (#1166757)

    Interbase v6 is in (free beer) beta right now, and will be released under the MPL (Mozilla Public License) by mid-June.

    You can download the beta now for Linux, Windows or Solaris from Interbase.com [interbase.com] or The Interbase Developers Inititive [interbase2000.org]

    It supports databases up to 32 TB (!!!) spread over 2GB files, and is fast and reliable. (It's been around for 15 years.) The biggest known Interbase DB is over 200GB.

    It is ANSI SQL-92 compliant (well... as compliant as any DB I've ever seen - better than Oracle, for instance), and the support is amazing.

    Join the email lists at www.mers.com [mers.com], and you'll be able to get answers from Ann Harrison (the president of Interbase.com), or from a lot of other people - it's the best support I've ever seen.

    As for enterprise features - well, apart from large DB support, it has row-level locking, transactions, referential integrity, blobs, multi-generational commits, stored procedures... ummm... I can't think of what else to say.

    Basically, if your databases are less than 50GB then Interbase is the number 1 choice - above that maybe Oracle would be better, but that isn't exactly open source.

  • I am an Oracle DBA working in Silicon Valley, and I've had a lot of experience with various flavors of failover. You can have all sorts of degrees of automation and rapid failover, but most of the really high-end solutions are very complex, and the added complexity buys almost nothing in terms of uptime. The "sweet spot", IMHO, for Oracle is a "standby" database, or simply a RAID box that is physically attached to two Unix boxes, but only one mounts it at a given time. Not a clustered or Parallel Server implementation. There are some places that will pay the complexity and $$ price to have Parallel Server, but it really only gets your downtime down from, say, 30 minutes to under 5 minutes *IF* it works right, automatically. Even with Parallel Server, where both systems are up at the same time, you'll still see half of your connections aborted when one side of the cluster goes down, and you'll have to redirect that half to the surviving node. All the while dealing with the complexity of these things *every day*.

    Here's a good way to do it in Oracle that'll probably generalize to almost any RDBMS:

    The simplest way is to have two Unix boxes, each physically attached to a RAID unit (don't use RAID 5; use mirroring, for performance). Don't use clustering software; just mount the filesystems on only one machine at a time. If the primary system fails, then mount the filesystems on the secondary, start up the database, and you're on your way with minimum downtime. Script it and tie it to some ping or other failure-detection strategy if you feel lucky, but in my opinion, keep a human in the loop to actually execute the switchover.
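
    A sketch of the detection half only, keeping the switchover manual as advised (host name, alert command, and interval are all made up):

        # Sketch: detect a dead primary and page a human. Deliberately does NOT
        # auto-mount the shared RAID -- mounting the same filesystems on both
        # boxes at once is how you corrupt the database. Names are hypothetical.
        import subprocess
        import time

        PRIMARY = "db-primary.example.com"

        def primary_alive():
            return subprocess.run(["ping", "-c", "3", PRIMARY],
                                  stdout=subprocess.DEVNULL).returncode == 0

        while True:
            if not primary_alive():
                subprocess.run(
                    ["mail", "-s", "primary DB down", "oncall@example.com"],
                    input=b"check db-primary, then run the switchover by hand")
            time.sleep(30)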

    Another great solution is the so-called "standby database". You make a copy of the primary database on your standby machine, using your backup tapes. Then you start to "roll forward" the standby database by applying all offline redo logs of the primary database as they are generated. This method should work for any database that logs transactions, including Sybase, Informix and probably MySQL. The equivalent of a redo log in MySQL appears to be the "update log", but I have no experience with MySQL. One big drawback of this solution, though, is that even though a transaction has been committed to disk on the primary, the event may not make it to the offline redo logs before the system crash. So a standby database can only be within some delta-T of the primary database; the last few transactions before the crash are lost. Sometimes this is acceptable, but if you billed the customer then lost the order, maybe not. You tune the redo log buffering parameters to trade off performance against a small "lag time" in getting redo logs across to the standby database.
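
    The shipping loop itself can be as dumb as this sketch (paths are invented, and 'apply_redo_log' stands in for whatever your RDBMS's roll-forward command is):

        # Sketch of log shipping to a standby. The archive directory and the
        # apply_redo_log command are hypothetical; every RDBMS spells this
        # differently (e.g. Oracle's "recover standby database" in SQL*Plus).
        import subprocess
        import time
        from pathlib import Path

        ARCHIVE = Path("/u01/archive")          # primary's offline redo logs
        APPLIED = Path("/u02/standby/applied")  # markers for logs already applied

        while True:
            for log in sorted(ARCHIVE.glob("*.arc")):
                marker = APPLIED / log.name
                if not marker.exists():
                    subprocess.run(["apply_redo_log", str(log)], check=True)
                    marker.touch()  # this log is now rolled forward
            time.sleep(60)  # transactions after the last shipped log are lost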

    The truth is that it takes 10x effort to get failover and cluster software correct, and 9 times out of 10, the automatic failover either triggers accidentally (bad news) or triggers correctly but fails to come up on the secondary. This is because you have to be incredibly scrupulous about keeping non-shared resources in sync on the two machines, and you can't test the failover (politically, that is) until it actually fails. Heck, I just took a call this past Saturday for this very problem: site paid $$ for a bigger consulting firm to implement a cluster; firm can't be reached on Saturday when the cluster trips but does not fail over successfully. Customer calls me. I am too gracious to say I told you so.

    My free advice: make two machines, not a cluster. Keep the two machines in sync using the "standby" technique, or make the RAID unit accessible by both machines. Keep a human in the failover loop.
