
The Linux Filesystem Challenge 654

Joe Barr writes "Mark Stone has thrown down the gauntlet for Linux filesystem developers in his thoughtful essay on Linux.com. The basic premise is that Linux must find a next-generation filesystem to keep pace with Microsoft and Apple, both of whom are promising new filesystems in a year or two. Never mind that Microsoft has been promising its "innovative" native database/filesystem (copying an idea from IBM's hugely successful OS/400) for more than ten years now. Anybody remember Cairo?"
This discussion has been archived. No new comments can be posted.


  • New FS (Score:5, Interesting)

    by stecoop ( 759508 ) on Wednesday July 28, 2004 @03:16PM (#9823969) Journal
    Linux must find a next-generation filesystem to keep pace

    What are the winds of change saying? R..E..I..S..E..R...4... [namesys.com]
  • by valen ( 2689 ) on Wednesday July 28, 2004 @03:21PM (#9824052) Homepage

    I want a disk equivalent of top - something that'll tell me what processes are kicking the shit out of the disks, and by how much.

    If Linux could do that - it's more a VM thing than a filesystem - I'd stick with ext3 for years to come.

    Who needs a filesystem in a database when you have a database that lives on your filesystem (updatedb). Get that updating in realtime, with more things (like permissions, access times etc.) and a lot of the work is done.

    john
  • Gnome Storage (Score:5, Interesting)

    by leandrod ( 17766 ) <{gro.sartud} {ta} {l}> on Wednesday July 28, 2004 @03:24PM (#9824087) Homepage Journal
    Gnome Storage should be a step in the right direction, and it gets it right by not reinventing the wheel, just using PostgreSQL as its database engine.

    This way we can test the waters without messing with the kernel. When the concept is tried, we can decide if we make PostgreSQL a required part of a GNU/Linux system, or a Hurd translator, or whatever.
  • But... (Score:3, Interesting)

    by sk6307 ( 797832 ) <sk6307@btinternet.com> on Wednesday July 28, 2004 @03:24PM (#9824097)
    Is there anything that a true database filesystem offers that something like a realtime updatedb index, and maybe a background-updated glimpse index of /home, can't offer?

    I have about 18GB of files in my main home dir, and I can search it in seconds with slocate and if I need a content search, with glimpse.

    I know that this kind of database FS provides a lot of cool opportunities in terms of meta-data, but how useful is it for non-techies, who usually don't name their files coherently, let alone correct ID3 tags or other meta-data?
  • by nostriluu ( 138310 ) on Wednesday July 28, 2004 @03:25PM (#9824110) Homepage

    What do you want these next generation features for? Mainly features like access control, security, robustness, and above all organizing and sharing data.

    Why not go higher level: use a reliable and simple underpinning such as ext3fs, with something like WebDAV (Distributed Authoring and Versioning) on top of it? Like Subversion, it is based on HTTP, with specifications for versioning and rich access controls.

    Or maybe even go to the level of a Java JSR, so you could have a cross platform API for accessing files so it really doesn't matter what the back end is, KaZAA, Google or a DataSette, as long as your programs have a high level view of the information.

    You might even end up with something like the original TB-L Web, with everyone running their own Web server.

    Of course, excluded from the above is performance, which would be ok for office type apps but not something that requires direct disk access, but perhaps the simpler file system would be most suitable for that.

    Of course, I'm just rambling here, so would be happy to hear more developed responses to this suggestion...
  • by minginqunt ( 225413 ) on Wednesday July 28, 2004 @03:26PM (#9824114) Homepage Journal
    In addition to Reiser4, there are a whole host of projects that aim to provide all or part of what BFS achieved and what Spotlight (Mac OS X Tiger) and WinFS will achieve.

    This includes Beagle/Dashboard

    http://www.nat.org/dashboard
    http://www.gnome.org/projects/beagle/

    And of course, the ambitious Gnome Storage project, being pushed by Seth Nickell. He recently wrote a paper comparing all the technologies, found here:

    http://www.gnome.org/~seth/blog/document-indexing
  • by bsd4me ( 759597 ) on Wednesday July 28, 2004 @03:26PM (#9824115)

    In the early days of Linux (1992/1993ish) a new filesystem seemed to appear each week. Most were pretty unstable, though. My first Linux machine, which started out as v0.11, kept its root partition as minix-fs for a long time for this reason (and also because I didn't feel like recreating my system).

  • File versioning (Score:4, Interesting)

    by Alain Williams ( 2972 ) <addw@phcomp.co.uk> on Wednesday July 28, 2004 @03:33PM (#9824187) Homepage
    I know that some don't like it, but we need the option of file system versioning, so that if/when you delete half the lines in your letter/program/... you can get them back from the previous copy on disk.

    There is an expectation that the application should do it, but that means extra code in each application, and they all do it slightly differently.

    OK: need an O_NOVERSION on open(2) if the app *really* doesn't want this - eg a database.
  • by JBMcB ( 73720 ) on Wednesday July 28, 2004 @03:33PM (#9824191)
    Make the core filesystem small, robust and fast: journalling, realtime, and not much else. Make add-on modules for fancy things like ACLs, quotas, compression, encryption, compatibility, extended attributes, etc. Put in shims for calling attributes from a database (db or SQL or whatever).

    XFS comes close, and ReiserFS 4 is nice, too. The most important thing is keeping the base filesystem simple and FAST. You think NTFS is fast? Try deleting a complete Cygwin install (>30K files). It takes AGES, even from the command prompt. I've deleted 15K files (that's 15 THOUSAND files) on Reiser 3 on the same machine, and it took a few seconds.

    DO NOT make a database driven filesystem. Some day we will have a true, document based desktop paradigm (OpenDoc anyone?) but probably not for several years, until then we need SPEED.

  • Speed and Versioning (Score:3, Interesting)

    by silas_moeckel ( 234313 ) <silas@@@dsminc-corp...com> on Wednesday July 28, 2004 @03:34PM (#9824201) Homepage
    OK, we have all these DB things that seem more for metadata and search; really, that's a bit secondary to a filesystem. Most filesystems are accessed by applications for, surprise surprise, files, with very few user files and lots of application files. It might make sense to mount /home as some DB-backed filesystem with piles of indexed and searchable data, so the users can be even more clueless about where anything is. But the rest of the system needs to be faster all around and cluster-aware, from my point of view. Versioning in the FS a la VMS would be a nice thing as well. Disks are the slowest thing on your average system, with Gigabit Ethernet moving more data than the highest-performing single disk in the real world.
  • Next generation? (Score:5, Interesting)

    by stratjakt ( 596332 ) on Wednesday July 28, 2004 @03:36PM (#9824224) Journal
    Let's get the "this generation" filesystems working correctly, shall we?

    Solid, universal support for ACLs, and while we're at it, let's fix the whole user/group namespace mess Unix has. Let's use an SID-style ID like Windows does.

    For example: my small network at home, centrally authenticated through ldap.

    Now, windows knows the difference between the user "jim" on local machine A, "jim" on machine B, and "jim" the domain user. They'd be shown as MACHINEA/jim, DOMAIN/jim, etc.. The various SIDs take the domain (or workstation) SID and append the UID. So if his number is 100, his sid is "long-domain-sid" + uid. So when you pass around sid tokens, you know exactly which jim you're talking about.

    Now in Linux, we just have numbers for users and groups. If user 100 on machine A is "jim", user 100 could be "sally" on machine B. Moving that stuff to LDAP becomes messy: now I have to reconcile the numbering schemes of all the machines I want to migrate. Ick. And you get all kinds of screwy stuff sharing folders; if you ls a share on one machine, it'll show wholly different ownerships. It is the source of about a billion and one NFS security holes.

    And of course, since a file can only have one permission set - user, group, other - it sure does make for some sucky shit. The lazy among us would just run as root all the time to avoid the whole damn mess.

    I know there's a circle jerk of workarounds, patches and gotchas to avoid this, but it should never be a problem in the first place. The basic unix security model is out-of-date, and is the source of many systemic problems.
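    To make the contrast concrete (the SID values below are made up for illustration), a Windows-style SID qualifies the numeric ID with the issuing authority's prefix, so two accounts that happen to share a number never collide:

```python
def make_sid(domain_sid, rid):
    """Compose a Windows-style SID string: the issuing authority's
    prefix plus a relative ID (RID) unique within that authority."""
    return f"{domain_sid}-{rid}"

# Two different "jim"s with the same numeric ID stay distinguishable,
# because each machine/domain has its own prefix:
machine_a = "S-1-5-21-1111111111-2222222222-3333333333"
domain    = "S-1-5-21-4444444444-5555555555-6666666666"

local_jim  = make_sid(machine_a, 1100)  # MACHINEA\jim
domain_jim = make_sid(domain, 1100)     # DOMAIN\jim
```

    A bare uid 1100, by contrast, carries no hint of which machine minted it.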
  • In a nutshell (Score:3, Interesting)

    by Sepper ( 524857 ) on Wednesday July 28, 2004 @03:37PM (#9824238) Journal
    It keeps a journal file of recent modifications
    (i.e. "Replacing node 37827 with node 5279867... replaced").
    Once the modification is done, it erases the entry.

    After a crash, the system only needs to look in the journal file to know which files 'might' be corrupted and restore the old version of each...

    At least, that's how I understand it...
  • by Keruo ( 771880 ) on Wednesday July 28, 2004 @03:39PM (#9824261)
    Since new systems now ship with 1+ GB of memory, and 64-bit systems can have over 4 GB,
    why not run the entire operating system in RAM instead of from the HD, and just write differential changes in the RAM root fs to disk?
    It would probably slow down booting slightly, but that could be avoided by loading software into memory as needed.
  • by Anonymous Coward on Wednesday July 28, 2004 @03:43PM (#9824313)
    Reiser keeps up with my demands and needs pretty damn well. I run it on an IMAP server that serves up Maildirs, and it never ever hiccups, which is something compared to how the same server behaved on EXT3. The issues with a next-gen filesystem have little to do with everyday users, however, but cut to the heart of one of the key battles: what groups (be they companies, foss renegades, whatever) will control key network infrastructure. MS may be serious this time -- in spite of their spin, they've lost a lot in this space to *NIX over the past several years, and they need really damn good reasons to convince pros that they should really be going with MS products.
  • by dekeji ( 784080 ) on Wednesday July 28, 2004 @03:43PM (#9824315)
    All indications are that Linux, Windows, and Mac OS are moving in a common direction with filesystem innovation

    Whether or not it is useful, one thing is clear: this sort of thing is not "innovation". Databases as file systems have been around for decades, as has the question of file system metadata. The UNIX choices in this area are by design, not by an accident of history, and the motivations behind those choices are as valid today as they were several decades ago.

    Linux is a ways yet from having a fully attributed, database-driven, journaling filesystem. The direction of future development looks promising, though. Linux will certainly compete as the search wars come to the desktop. Linux's value to the enterprise depends on it.

    There are two things one needs to keep apart: what functionality do we want to support and how do we want to support it. Search clearly is important. Metadata clearly is important. However, whether a "fully attributed, database-driven, journaling filesystem" is the best way of implementing those features is an open question. There are many possible alternative designs.

    And it seems right now as if Microsoft is not, in fact, building what the author thinks they are building, but is choosing an implementation strategy that combines a more traditional file system with user-level databases.
  • Re:Why not use... (Score:3, Interesting)

    by 0racle ( 667029 ) on Wednesday July 28, 2004 @03:47PM (#9824368)
    Something that exists can't compete with something that doesn't? Given the time between now and whenever the next version of Windows shows up, there might be some time to whip BFS into shape, assuming it is as outdated as you say; personally, I don't know. Right now I think that adding a relational DB to the filesystem is just going to have a significant impact on the performance of anything less than the absolute bleeding edge.
  • by Anonymous Coward on Wednesday July 28, 2004 @03:51PM (#9824401)
    I've gone well past the point where my data is worth more than the total cost of a new computer, and I don't want to lose it to an HD or computer failure. I'm particularly concerned that we are digitizing our family photos and that they could poof one day. So for me the killer application for a filesystem is 100% reliability. Ideally I'd just run some sort of transparent distributed RAID thing, and it would automatically copy everything to one or more of my computers. That way, when one crashed, I could just plug in another one and be on my way.

    The relational database filesystem seems like a big boondoggle to me. We already have several free RDB products (PostgreSQL, MySQL, etc.) as well as stable programming interfaces to those products in numerous languages. We also have good support for small files (Reiser3 already), and we support hard links. It looks like most of that could be tried out in userland, to see how well it worked, without any changes to the underlying filesystems.

    I'm not a fan of metadata. The WWW has shown that having extra content about the contents of your content simply does not work very well. The metadata is redundant information and is thus prone to many synchronization problems with the original data that cause it to be invalid. There is a reason that we all search the web via Google instead of some metadata scheme.

    Michael
  • by dasunt ( 249686 ) on Wednesday July 28, 2004 @04:01PM (#9824503)

    As a linux user, I don't sit back and think "this filesystem sucks". For the most part, I'm happy with ext3.

    When I do try to make a wishlist, the only things I really want are KDE's IO Slaves integrated into the system at a lower level, so that all programs can use them, and a more secure version of NFS. That's it. Perhaps some sort of revision control on certain files, but RCS works fine for me.

    I don't want data forks -- they create more problems (with transferring files) than they solve.

    For a similar reason, I don't want my filesystem to be a DB. I'm happy with files. Damn happy. I don't see what problems a database solves.

    Just my $.02.

  • by IamTheRealMike ( 537420 ) on Wednesday July 28, 2004 @04:07PM (#9824593)
    The explanation I heard was that there were a lot of bugs in the early 2.4 kernels that ReiserFS exposed and other filing systems didn't due to the way it worked internally. Whether it's finger pointing or not doesn't really matter - people are saying "I can't trust ReiserFS" but what makes you think you can trust the kernel team absolutely either?
  • Why no MS DBFS? (Score:3, Interesting)

    by Doc Ruby ( 173196 ) on Wednesday July 28, 2004 @04:11PM (#9824648) Homepage Journal
    Exactly why hasn't Microsoft released a SQL-queryable database filesystem? They validated the architecture with their marketing years ago, after IBM proved it technologically. And its advantages are obvious. In addition to better features, it offers Microsoft the opportunity to sell its SQL Server product to serious users, with a natural upgrade path. And it's an opportunity to promote the MS version of SQL across the world, raising the tide against Oracle and the rest (including MySQL). It could also make an end-run around the Samba project, which Microsoft initially helped but now apparently fears. And of course it's a better platform on which Microsoft can offer "open yet proprietary" file formats. Is Microsoft really so incapable of actual innovation, or is there something else wrong with this picture?
  • Why do you copy before you backup?

    Copying from three actively used locations, merging, and putting them on a slow external drive. Then I back up from there. I don't want to stop using my drives while I burn 30 DVDs.

    so I guess renaming a bunch of files is faster than moving them to another partition

    Noooooo, if you read my post I said that copying, moving and renaming, with a large amount of ID3 parsing, on HFS+ was faster than JUST copying on ReiserFS.

    My dual Athlon never once locked up since I removed the audigy2 plat

    A file system lockup is quite different from a REAL lockup. A file system lockup causes a particular Explorer window's contents to stop refreshing or responding for the duration of the lockup. It happens regardless of hardware (it happens all the time on our RAID at work when deleting a few hundred thousand bad mail files from Exchange).
  • Re:New FS (Score:5, Interesting)

    by prisoner-of-enigma ( 535770 ) on Wednesday July 28, 2004 @04:11PM (#9824658) Homepage
    I've been shouted down before about this, but I'm going to keep asking for it because it's a useful feature for my company: what about per-file compression in the file system? Now before anyone has a hissy fit, let me explain.

    We output a lot of digitally-created video files that are huge (think HDTV resolution). Most of these files are output uncompressed because either (a) the file format doesn't support compression or (b) the multimedia program doesn't support compression. Either way, a few minutes of HDTV-quality uncompressed video will absolutely destroy a few hundred gigabytes of space in no time.

    We have to hold on to some of this video for quite some time, but we only need to get at it infrequently. It's too big to fit on DVD-R's, tape is too slow, ZIPping it up hinders easy access later, and removable hard drives are expensive. File system compression, on the other hand, does wonders. We routinely get 60%-80% compression on archived video files, and it's allowed us to stretch our disk capacity a long, long way because of it.

    We've considered archiving our video in some kind of compressed streaming format like AVI, Quicktime, or MPEG-2, but none of these offer lossless codecs that are appropriate for us, and we're unwilling to accept using a lossy compressor.

    So, I ask the question again: when, if ever, is anyone going to implement file compression on a Linux file system? Or does it already exist but is buried somewhere in some arcane HOWTO or website?
  • by SteamyMobile ( 783822 ) <support@steamymobile.com> on Wednesday July 28, 2004 @04:11PM (#9824660) Homepage
    If Linux is going to do this, the best way would be in the form of an object persistence system [jdocentral.com] like JDO. What is a file? A file is something that we can access sequentially, by reading and writing, or perhaps random access, by going to a specific location in a file and then reading sequentially from there.

    That's great, but is that what we (application developers) want or need? No, it is not. In fact it is extremely different from what we want.

    What we want is the ability to make objects (a word processor file, etc) persist. We also want to be able to search for these persistent objects in various ways, by saying "show me all the word processing files that have so-and-so as the author, and were created on this date and contain the word 'nigritude ultramarine'". That's what we want.

    A file system that allows sequential access to bytes, and creates a hierarchy of names (/your/file/is/here.txt) is really very different from that. That's why we need to have persistence tools (like JDO, XML, etc) and also search tools (Unix locate, file browsers, etc). But these are just hacks because the real problem of object persistence and retrieval isn't solved in one place.

    One problem with solving it is that Linux is all C and has C mentality all through it, even at the application layer. People still scoff at object oriented design. This gets in the way of implementing cool filesystems like this.

    Reiser4 does not exactly have these object oriented features, but it's much closer than anything else, and the object persistence could easily be implemented using Reiser4. I'm glad to see that Suse is using it as the default FS. I hope it becomes part of the standard Linux kernel. I also like its plug-in architecture, so we may finally get some advanced FS-layer security features in Linux.
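    As a sketch of the kind of query being asked for (the schema and rows below are invented for illustration; a real system would populate them from document metadata):

```python
import sqlite3

# Hypothetical metadata store: one row per persistent document object.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE docs (path TEXT, author TEXT, created TEXT, body TEXT)")
conn.executemany("INSERT INTO docs VALUES (?,?,?,?)", [
    ("/home/a/report.odt", "so-and-so", "2004-07-28",
     "nigritude ultramarine results"),
    ("/home/a/notes.odt", "so-and-so", "2004-06-01",
     "unrelated text"),
    ("/home/b/memo.odt", "someone", "2004-07-28",
     "nigritude ultramarine memo"),
])

# "Show me all the word processing files that have so-and-so as the
# author, were created on this date, and contain 'nigritude ultramarine'":
hits = [row[0] for row in conn.execute(
    """SELECT path FROM docs
       WHERE author = ? AND created = ? AND body LIKE ?""",
    ("so-and-so", "2004-07-28", "%nigritude ultramarine%"))]
```

    A hierarchy of path names can't express that query at all; it has to be bolted on with locate-style tools.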

  • Re:not so fast ... (Score:3, Interesting)

    by Anonymous Coward on Wednesday July 28, 2004 @04:16PM (#9824721)
    P.S. The whole thing - filesystem as a DB - is complete crap. You can't do a bunch of fs operations in a single transaction and have ACID semantics on the transaction as a whole. Sure - searching is great. But database means much more than just a searching interface.
    The real killer is stored procedures. It'll be a cold day in hell before those are allowed into a kernel.

    And how do you email files with attributes or other metadata? They're not part of the regular file data, so all the usual email clients won't see them. Ditto for file compression utilities. The backwards compatibility problems are insurmountable.

  • Re:File versioning (Score:4, Interesting)

    by prisoner-of-enigma ( 535770 ) on Wednesday July 28, 2004 @04:20PM (#9824781) Homepage
    I know that some don't like it, but we need the option of file system versioning, so that if/when you delete half the lines in your letter/program/... you can get them back from the previous copy on disk.

    Interestingly enough, Microsoft has implemented just that very feature in Windows Server 2003. They call it "Shadow Copy Volume" and it's accessed through a "Previous Versions Client" add-on to any file's properties. If you overwrite or delete a file on a Shadow Copy Volume-enabled network share, you can just right-click on the file, select "Properties," and go to the "Previous Versions" tab to see all the prior versions of that file. You can recover any one of them you like and save it anywhere you like. Further, the server only saves the deltas between changes, so it's very space efficient.

    This is one feature I'd *love* to see implemented on Linux. I don't think this is in Samba yet, is it?
  • by at_kernel_99 ( 659988 ) on Wednesday July 28, 2004 @04:25PM (#9824839) Homepage
    I disagree. Why can I go to google and search the entire web for something and get an answer in less than 1 sec, and I can't do that on my computer or lan?
    A fine question. Which makes me wonder: Is google's next killer app a new filesystem? As the search kings, and rumored Linux users, might they be about to enter the hard drive search / filesystem market? Various pundits speculate that gmail is their first foray into searching beyond the web; surely, at some point, their technology will reach our hard drives. Will it be via a stand-alone tool, or a whole filesystem?
  • Re:Next generation? (Score:4, Interesting)

    by pclminion ( 145572 ) on Wednesday July 28, 2004 @04:37PM (#9825001)
    We are all in this to make something "better," aren't we? Or is this whole OSS thing just one big echo chamber?

    No, the entire OSS community is not an echo chamber, but the kernel development community is. I've seen flamefests caused by some poor soul suggesting very minor changes to, for example, the semantics of pipes. Unix isn't just written in stone, it's laminated and stored in an evacuated nuclear-blastproof case 500 meters underground.

    Is the Linux/Unix community so "steeped in tradition" (also known as stubbornness, obstinacy, intolerance, and narrow-mindedness) that it willfully clings to an outdated, inferior way of doing things?

    Again, it is not the community as a whole which is stuck, but the kernel people. I was simply pointing out the truth, not trying to say it's a good thing. Although I think it is wise to be suspicious of radical new ideas until they have proven themselves, I think that many times ideas are rejected for purely dogmatic reasons, and that really restricts innovation.

  • by mabhatter654 ( 561290 ) on Wednesday July 28, 2004 @04:39PM (#9825035)
    OS/400's native FS is still the neatest model for DB-FS integration out there. It's unique because the filesystem is written to allow file-member-record-field access directly from a command-line call... It's similar to what was posted above about Reiser4 and having plugins. The key to the AS/400's success is that the "filesystem driver" is pushed down into the hardware controller ROMs, so the DB-like access is nearly fool-proof. Stuff like queries and SQL are just "plugins" on top of that model. The only thing I see preventing Linux from adopting such a scheme is that it would require an entire dedicated HDD/partition to do properly, because the FS driver needs complete control of the disk structure... And where the AS/400 has very few native file types it deals in, Linux would literally need to create a file-model "plugin" for every MIME type; otherwise you still have a bunch of meaningless BLOBs to parse. OSS/Linux is uniquely qualified to write this because they have the "keys" to everything, but it would take enormous cooperation to implement correctly!
  • Re:not so fast ... (Score:3, Interesting)

    by Sloppy ( 14984 ) * on Wednesday July 28, 2004 @04:46PM (#9825139) Homepage Journal
    Ghost?! He's not even dead yet!

    I do think this is really funny, though. The more functionality people want to cram into the FS, the more they're going to look back at that famous Usenet thread, and reconsider... ;-)

  • Re:Next generation? (Score:5, Interesting)

    by mattdm ( 1931 ) on Wednesday July 28, 2004 @05:19PM (#9825516) Homepage
    It is the source of about a billion and one NFS security holes.

    Or rather, it is the source of the NFS security hole. But it's okay. NFS4 (or 3, even) with Kerberos totally solves this problem, much more elegantly.

    Everyone's all excited by ACLs, but I'm sceptical of their real-world value. The "keep it simple" principle of security can't be emphasized enough. With ACLs, you have to really examine the access rights of a given object to figure out what's going on. With the standard Unix user/group system -- with simple directory-based inheritance -- it's completely transparent.

    And, most importantly, I've yet to see one thing worth doing with ACLs which couldn't be set up with user/group permissions instead -- and more simply.
  • EXT3 FS (Score:3, Interesting)

    by jonnystiph ( 192687 ) on Wednesday July 28, 2004 @05:30PM (#9825625) Homepage
    At work here, the previous admin installed a number of machines with the EXT3 FS on the drives. These machines (RH 8.0 - EL3) crash sporadically, often giving indications that the FS was at fault.

    While I personally believe Red Hat is known to push "unstable" releases, I was surprised that from 8.0 through EL3 the EXT3 fs was still crashing, and Red Hat was still offering it as the default on an install.

    Anyone else had better experiences with EXT3? I am curious if anyone has more information on why this FS seems so damn unstable.

    For test purposes we run "out of the box" installs, so there should be no kernel tweaking or any other "anomalous" things going on with the installs or the boxes.
  • The first feature I would like to see added would provide the capabilities of a Partitioned Data Set from the IBM TSO days. We would not copy what IBM did, of course, because they did some stupid things meant to sell more hardware. One was that deleted files were not really deleted, and you needed to "re-generate" the PDS when it filled up. Nice trick!

    Suppose you had an infinite number of loopback devices, and these were hidden and used internally by the file system. When you started an application, you could "mount" what for most intents and purposes looks like a TARBALL, and the application in question, and ONLY the application in question, got to see all the files in this TARBALL. The files inside a "TARBALL" of this nature would probably not be compressed, but they could be if desired... That is the concept of a Partitioned Data Set.

    In the case of a user logging in, when the shell is started a mount could take place against the user's private data set. By doing this on a shared machine, file security can be guaranteed. For export and import the system could mount a "shared" dataset.

    This sort of security is far superior to ACLs and anything present file systems offer, for the very simple reason that normal people, including systems administrators, would not normally see any of the files inside one of these datasets. Consider the advantages of running an Apache server where you KNOW all associated files needed by that release of Apache are in a single dataset. There is no easy way to lose a file, or clobber it, or accidentally delete it, and so forth. Next consider that when that copy of Apache starts up, it _could_ simply mount a set of files, each of which contains the whole website for a given domain.

    Upgrading to a new copy of apache would be as simple as copying in a new dataset and mounting it against the web datasets. If a glitch is found, simply restart the old copy.

    Backing up a set of files becomes a simple copy operation. Replication can be accomodated as well.

    Systems Administration in those old IBM mainframes was MUCH easier than with UNIX systems and this is in large part because of the way the system handled partition datasets.

    ------------

    Now, with this we would want to be able to mark certain files as being external sort of like opening up a window, and through this window we could for instance access certain files which might be the executables and supporting scripts.

    Of course, people will point out we can accomplish some of this with a loopback mount. The problem with the loopback mount is that it populates the directory tree, and this is what I really want to avoid. Frankly, there really *IS* no reason for even a sysadmin to be able to see 90% of the files that, say, constitute a web server, or Postfix, or PostgreSQL. We accomplish a lot if the executable which needs access to its supporting files has a "private" loopback, and only this executable by default gets to see the mounted dataset.

    --------------

    Next idea is versioning the way Digital Equipment Corporation did it on the VAX. We simply append a version number to each file name, and what delete does is simply append a new version number rather than remove data. With disk drive capacities heading into the stratosphere, there is no reason to be conservative.

    And this leads to the next idea which has been mentioned before... that is replication - across machines.

    I can buy for $20 a machine (P1 200MHz) that can run a 20 GB hard drive, and in fact I think they can run 80 GB hard drives as well. Rsync is useful, but a full replicating filesystem at the kernel level, or at least at the level of a daemon close to the kernel, would mean that a machine could be backed up to another machine, perhaps in another building, automatically and with little effort.

    Well, I'm sure other people have other things they might like to add. This is my wish list.
  • Re:New FS (Score:3, Interesting)

    by vandan ( 151516 ) on Wednesday July 28, 2004 @05:46PM (#9825769) Homepage
    Damned right.

    With Reiser3, doing `emerge -up --deep world` on my Gentoo box would usually take about 10 seconds after the progress spinner had started.

    Now with Reiser4, it takes about 2 seconds after the progress spinner starts.

    The speed really is absolutely amazing.

    And from what I've read of Reiser4, it has all the database niceties for managing files and contents of files that WinFS is promising. Of course, Reiser4 currently exists and is working on my home gaming machine and 4 machines here at work. WinFS is just marketing speak.
  • by vsprintf ( 579676 ) on Wednesday July 28, 2004 @07:36PM (#9826642)

    Well, the key to a database filesystem will be seamless data entry and simple, powerful access to search and reporting features.

    I'm not sure what you mean by "seamless data entry." Maybe I missed something in the article. Are you suggesting people will be willing to provide meaningful metadata for a file when they aren't even willing to provide a meaningful file name? And "powerful access to search and reporting features"? As opposed to wimpy access to search and reporting features? It sounds a bit like marketspeak to me.

    It seems to me that this push for "database filesystems" is from people who want to throw everything in one directory and name their files "1", "2", "3", etc. It just dumps more work on the CPU to handle more filesystem overhead. When you give a filesystem the attributes of a relational database, you also get all the related problems, overhead, and constraints. Records in a database table are very closely related. Files in a directory may have nearly nothing in common.

    What I'm guessing is that we'll see a thinly veiled front end to grep -- and as much as I like grep, it's a serious pricker bush.

    What is WinFS, except an admission that you can't *grep* or *strings* a Windows *folder* full of files? Files that have secret, constantly changing formats require metadata to make them searchable, which makes the filesystem bigger and slower. Being able to see the same file from two different directories doesn't impress me. I can do that with ln -s (if I really have a valid reason for it). I really don't see a lot of benefit to *nix in following Windows down this road. I'm sure I'll get enlightened shortly. :)
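    The point about plain files being searchable with stock tools is easy to demonstrate; no metadata database is required:

    ```shell
    # Plain-text files need no metadata layer to be searchable:
    dir=$(mktemp -d)
    printf 'quarterly invoice\n' > "$dir/a.txt"
    printf 'meeting notes\n'     > "$dir/b.txt"
    hits=$(grep -rl 'invoice' "$dir")   # lists every file containing the word
    echo "$hits"
    ```

    Files in secret, shifting binary formats are exactly the ones that need WinFS-style metadata bolted on before a search like this works.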

  • Re:Next generation? (Score:5, Interesting)

    by Malor ( 3658 ) * on Wednesday July 28, 2004 @09:07PM (#9827244) Journal
    Properly done, an ACL system will give you a MORE secure system, not a less secure one, because there are fewer chances for mistakes.

    In the NT 4.0 days, one of the better ways to handle permissions was the 'AGLP' standard. User A)ccounts go in G)lobal groups, G)lobal groups go in L)ocal groups, and local groups get P)ermissions.

    This allows a nice level of indirection. I implemented this standard by specifying that Global groups described groups of people, and that Local groups specified access privileges. I built Local groups on each server describing the kind of access privileges they offered. Generally, I would make four groups for each of my intended shares: Share NA (no access), Share RO, Share RW, and Share Admin. I would assign the appropriate ACLs in the filesystem, and then put Global groups from the domain into the proper Local groups. The Accounting group, for instance, might get RW on the Accounting share. Management might get RO, and the head of Accounting and the network admins would go into the Share Admin group.

    What this meant was that, once I set up a server, I *never again* had to touch filesystem permissions. Not ever. All I had to do was manipulate group membership with User Manager... with the caveat, of course, that affected users had to log off and on again for permissions to take effect. But this is also true with Unix, in many cases. (when group membership changes).

    Note that Windows 2K and XP have more advanced ways to handle this, so don't use this design in a Win2K+ network.... this is the beginnings of the right idea, but 2K added some new group concepts. Under Active Directory, this idea isn't quite right. (I'd be more specific but I have forgotten the details... I don't work much with Windows anymore.)

    ACLs are key to this setup, because I can arbitrarily specify permissions and assign those permissions to arbitrary groups.

    By comparison, User, Group, and Other are exceedingly coarse permissions, and it is very easy to make a mistake. What if someone from Human Resources needs access to a specific Accounting share, but nothing else? Under Unix, I can't just put them in the Accounting group, because that would give them access to everything carrying that group permission. I'd probably have to make a new group, put everyone from Accounting and the one person from HR into it, then put the special shared files into a specific directory and make sure the directory is setgid. That is a lot of steps. Everything is always done in a hurry in IT, and lots of steps are a great way to make mistakes. Messing up just one can result in security compromise.
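    Spelled out, that Unix workaround looks something like this. Group and user names are invented; the groupadd/usermod steps need root, so this sketch only echoes them, while the directory steps are done for real in a temp dir using the caller's own group so they work unprivileged:

    ```shell
    # Steps 1-2 need root, so here they are just printed:
    echo "groupadd acct-plus-hr"            # a new group just for this one case
    echo "usermod -aG acct-plus-hr jdoe"    # add the lone HR person to it

    # Steps 3-5: set up the shared directory itself:
    dir=$(mktemp -d)/acct-shared
    mkdir -p "$dir"
    chgrp "$(id -gn)" "$dir"   # hand the directory to the group
    chmod 2770 "$dir"          # group rwx only; setgid bit so new files inherit the group
    ```

    Five commands, each one a chance for a typo; the ACL approach is three group operations and no filesystem changes at all.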

    In my group-based ACL system, I'd still have to make a custom group, perhaps "HR People With Access to Accounting Share". But I'd only have to touch one user account, the HR person's, and wouldn't have to disrupt Accounting's normal workflow at all, or touch any filesystem permissions.

    Instead of a whole series of steps, any one of which can be done wrong, I have only three: Create new Global group, put HR person in new Global group, put Global group in the correct Local group. All done. Hard to screw this up too badly.

    Now, I'll be the first to admit that a badly-implemented ACL setup is a complete nightmare. But a clean, well-thought-out ACL system, in a complex environment, is virtually always superior to standard Unix permissions.
  • Re:Next generation? (Score:3, Interesting)

    by pclminion ( 145572 ) on Thursday July 29, 2004 @11:47AM (#9832245)
    I, for one, would flame someone who suggested changing the semantics of pipes. Well-defined interfaces are the heart of reliable software.

    I fully agree, but I'd hesitate to call pipes a "well defined" interface. POSIX attempts to standardize semantics, but there are still mismatches between Unix-like platforms.

    The changes this fellow was proposing (don't remember the details, too long ago) were on the same order as the typical differences in semantics between platforms. In other words, I think there's a region of reasonable wiggle-room but the majority of kernel developers seem intolerant of even these small divergences from the status quo.

    In addition, people have a tendency to reject new features even if they do not conflict with traditional features. For example, look at the discussions on the kernel list back when futexes were being discussed.

    Incidentally, I came up with a nearly identical idea about a year earlier, but I called it "user wait queues." Threads could block on a queue and be woken by other threads. However, my mechanism for waiting on a queue involved a new system call, whereas futex waits are implemented (I think) by using an ioctl to bind the futex to a fd, and then select()'ing on the fd. I did not submit my idea to the kernel people precisely because I felt it would be rejected as "not Unix enough." Using fds and select() is traditional, but a syscall would be more efficient, IMHO.

    It's that sort of conservatism that I'm talking about, not the perfectly reasonable effort to preserve interfaces.
