Linux Gains Lossless File System
Anonymous Coward writes "An R&D affiliate of the world's largest telephone company has achieved a stable release of a new Linux file system said to improve reliability over conventional Linux file systems, and offer performance advantages over Solaris's UFS file system. NILFS 1.0 (new implementation of a log-structured file system) is available now from NTT Labs (Nippon Telegraph and Telephone's Cyber Space Laboratories)."
Bloat? (Score:3, Insightful)
Re:Bloat? (Score:3, Interesting)
Of course, you can delete files and reuse the space. But performance slows down greatly once you start filling in "holes" left in the log after wrapping around from the end of the allocated area. (This is similar to a database, where you might want to compact, vacuum, or condense a table.)
Re:Bloat? (Score:2)
Data is the new currency my friend (Score:3, Interesting)
Walmart's most prized possession is their billion-billion-billion transaction customer sales database. Among other things, they use it to find out that men tend to buy beer and diapers at the same time.
With disks costing $1.00/GB or less these days, many people, including myself, simply DON'T delete data anymore. I keep all my original digital photos (in
Re:Data is the new currency my friend (Score:5, Funny)
So, basically, you're going to keep Duke Nukem forever?
Re:Data is the new currency my friend (Score:3, Funny)
Re:Data is the new currency my friend (Score:3, Funny)
Re:Bloat? (Score:5, Informative)
The version I wrote took advantage of the client's bursty I/O pattern and used the slow periods to offload the data to an ext2 filesystem on a separate disk. Hopefully your system memory was large enough that the offload to the secondary filesystem happened without any disk reads. Once that was done, the older sections of the log could be reused, but only once the disk filled up and wrapped back to the beginning, because you want to keep your writes sequential (essentially; there are other timing tricks you can play to get more speed).
There has been a lot of research on this method of write structuring. Look for papers on the "TRAIL [sunysb.edu]" project (also closed source), for example.
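A minimal in-memory sketch of the reuse rule described above: writes go strictly in order through a fixed ring of segments, and the writer may only wrap onto an old segment after idle-time offloading has copied it to the secondary store. `CircularLog`, `Segment`, and `offload_idle` are hypothetical names; a Python list stands in for both the raw disk and the ext2 filesystem, and the segment sizes are arbitrary.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    records: list = field(default_factory=list)
    offloaded: bool = True  # an empty, never-used segment is safe to overwrite


class CircularLog:
    """Fixed ring of segments, always written in order and wrapping at the end."""

    def __init__(self, num_segments: int, segment_capacity: int):
        self.segments = [Segment() for _ in range(num_segments)]
        self.capacity = segment_capacity
        self.head = 0  # index of the segment currently being filled
        self.segments[0].offloaded = False

    def append(self, record) -> None:
        seg = self.segments[self.head]
        if len(seg.records) == self.capacity:
            nxt = (self.head + 1) % len(self.segments)
            if not self.segments[nxt].offloaded:
                # Wrapping onto data not yet copied elsewhere would destroy it.
                raise RuntimeError("log full: oldest segment not yet offloaded")
            self.segments[nxt] = Segment(offloaded=False)
            self.head = nxt
            seg = self.segments[nxt]
        seg.records.append(record)

    def offload_idle(self, sink: list) -> None:
        """In a quiet period, copy full non-head segments to `sink`, oldest first."""
        n = len(self.segments)
        for step in range(1, n):  # head+1 is the oldest segment once the ring has wrapped
            seg = self.segments[(self.head + step) % n]
            if not seg.offloaded and seg.records:
                sink.extend(seg.records)
                seg.offloaded = True
```

Refusing to wrap, rather than silently overwriting, is what keeps the strictly sequential write pattern from costing data when the offloader falls behind.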
Re:Bloat? (Score:2, Informative)
Re:Bloat? (Score:2)
Log file systems are faster, safer, and just better. Period.
Re:Bloat? (Score:2)
If you have a regular file system, where some data sits at location X and your new high-end apps just mmap it, it's super easy for the kernel: it just copies the data over ATA/SATA/SCSI commands as pages come and go. Nothing is easier, and certainly nothing is faster. If you have one journal to update in between, it's rather easy to fit that in too.
Now, if you have a long log of what should be written where, by whom and why, this just won't work, awesome cool ove
Re:Bloat? (Score:2)
http://citeseer.ist.psu.edu/hartman93zebra.html [psu.edu]
Whatever you say boss.
Re:Faster (Score:3, Insightful)
Different FS (Score:2)
Not really +1, Insightful (Score:3, Informative)
The biggest problem with this filesystem [nilfs.org] (link is missing from the original posting) is that it's Not Really Ready (among other important things, mmap() is not implemented yet).
Re:Bloat? (Score:3, Informative)
http://cm.bell-labs.com/sys/doc/venti/venti.html [bell-labs.com]
http://cm.bell-labs.com/magic/man2html/8/venti [bell-labs.com]
Sean Quinlan [bell-labs.com] now works at Google. I'm not sure whether Sean Dorward does, but it seems most of the other people who built Plan 9 at Bell Labs do.
Re:Bloat? (Score:5, Funny)
Re:Bloat? (Score:5, Funny)
New Improved? (Score:5, Insightful)
Re:New Improved? (Score:3, Informative)
Re:New Improved? (Score:3, Funny)
Log structures are susceptible to termites, carpenter ants, and various forms of rot.
Re:New Improved? (Score:2)
The larger log structures don't cooperate with the flush procedure; leaving things unflushed is just asking for problems - you're going to get an overflow sooner or later.
-- Steve
Re:New Improved? (Score:3, Funny)
Er
Re:New Improved? (Score:5, Funny)
Even worse, when many logs are added together, the problems multiply.
Re:New Improved? (Score:5, Informative)
for common servers, or day-to-day use. it isn't
But notice that this was developed by a telecom company. When you're dealing with billing and monitoring data on a telecom network, a log-structured filesystem is perfect or even required because of speed and integrity constraints (depending on the size of the network). You want something that's simple and extremely resistant to failures. A complete system crash (which never happens, short of nuking the box) should result in no data loss, or the bare minimum, and you should be able to recreate that data from somewhere else (e.g., the other endpoint in a telephone network).
A log-structured filesystem allows this: in normal operation, the head never goes back over previous data. You don't typically read the data back until the end of a cycle (whatever that cycle may be) or when debugging; you simply append to the end. This minimizes head movement and thus increases MTBF (replacing a disk in those things is costly).
This is also extremely useful for logging to WORM (write once, read many) media, mostly for security logs. You don't want a hacker to be able to remove them, no matter what they do.
Better than some, anyway. (Score:2)
I guess you can argue that if a project is actively maintained, any problems are potentially fixable. Even with Open Source, an abandoned proj
Horrible headline (Score:5, Insightful)
Or is this filesystem somehow able to recover data once the hard drive crashes? That would be neat...
Re:Horrible headline (Score:5, Informative)
A log-structured filesystem doesn't modify existing files. Every time you write to the disk, you simply append some deltas. This gives very good write performance but poor read performance, since almost all files will be fragmented and the entire log for a file must be replayed to determine its current state. To help alleviate this, most undergo a vacuuming process[1], whereby the log is replayed and a set of contiguous files is written. This also frees space, which is not otherwise reclaimed, since deleting a file simply means appending a record to the log saying it was deleted. In addition to the good write performance, log-structured filesystems also have an intrinsic undo facility: you can always revert to an earlier disk state, up until the last time the drive was vacuumed.
The snapshot facility is not particularly impressive. It's a feature intrinsic to log-structured filesystems, and also available in other filesystems (such as UFS2 on FreeBSD and XFS on Linux). The performance advantage claims must be taken with a grain of salt - write performance for log-structured filesystems is always close to the theoretical maximum of the disk, but this is at the expense of some disk space, and read speed (although LFS did beat UFS in several tests on NetBSD).
[1] This is usually done in the background when there is little or no disk activity.
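To make the append/replay/vacuum cycle concrete, here is a toy model that follows the simplified picture in the comment above: every write or delete is appended, the current (or any earlier) state comes from replaying a prefix of the log, and vacuuming rewrites only the live files. It is an illustrative assumption, not how NILFS lays out data on disk; whole file contents stand in for deltas, and `ToyLogFS` is a hypothetical name.

```python
from typing import Dict, List, Optional, Tuple

# One log entry: (filename, contents) for a write, or (filename, None) for a delete.
Entry = Tuple[str, Optional[bytes]]


class ToyLogFS:
    def __init__(self) -> None:
        self.log: List[Entry] = []

    def write(self, name: str, data: bytes) -> int:
        """Append-only: earlier entries are never modified. Returns the new log length."""
        self.log.append((name, data))
        return len(self.log)

    def delete(self, name: str) -> int:
        # Deleting only appends a tombstone; no space is reclaimed yet.
        self.log.append((name, None))
        return len(self.log)

    def state(self, upto: Optional[int] = None) -> Dict[str, bytes]:
        """Replay the first `upto` entries (all of them if None) to rebuild the files."""
        files: Dict[str, bytes] = {}
        for name, data in self.log[:upto]:
            if data is None:
                files.pop(name, None)
            else:
                files[name] = data
        return files

    def vacuum(self) -> None:
        """Rewrite the log as one entry per live file; older states become unreachable."""
        self.log = list(self.state().items())
```

Any log length returned by `write` or `delete` acts as a free snapshot handle for `state(upto)`, until `vacuum` discards the history.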
Re:Horrible headline (Score:5, Insightful)
Re:Horrible headline (Score:3, Interesting)
Re:Horrible headline (Score:2)
Re:Horrible headline (Score:3, Informative)
RCS or something similar (Score:3)
For binary document formats (e.g., MS Office's .doc format), things get tougher. There are versions of diff that work on binary files (which is why you can get binary patches), but it's more common to see logging done as a series of macros wher
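For text files, the RCS approach in the title stores the newest revision in full and each older revision as a reverse delta. A minimal sketch of that idea using Python's `difflib.SequenceMatcher`; the delta format and the `VersionedText` class are hypothetical illustrations, not RCS's actual file format.

```python
import difflib
from typing import List, Tuple

# A delta is a list of (start, end, replacement_lines) edits against the source text.
Delta = List[Tuple[int, int, List[str]]]


def make_delta(src: List[str], dst: List[str]) -> Delta:
    """Record only the changed regions needed to turn `src` into `dst`."""
    sm = difflib.SequenceMatcher(a=src, b=dst, autojunk=False)
    return [(i1, i2, dst[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]


def apply_delta(src: List[str], delta: Delta) -> List[str]:
    out: List[str] = []
    pos = 0
    for i1, i2, replacement in delta:
        out.extend(src[pos:i1])   # unchanged lines before this edit
        out.extend(replacement)   # new lines (empty for a pure deletion)
        pos = i2
    out.extend(src[pos:])
    return out


class VersionedText:
    """Newest text kept whole; reverse[k] turns version k+1 back into version k."""

    def __init__(self, text: List[str]):
        self.head = list(text)
        self.reverse: List[Delta] = []

    def commit(self, text: List[str]) -> None:
        self.reverse.append(make_delta(text, self.head))
        self.head = list(text)

    def checkout(self, version: int) -> List[str]:
        text = self.head
        for delta in reversed(self.reverse[version:]):  # walk back from the newest
            text = apply_delta(text, delta)
        return list(text)
```

Keeping the newest revision whole makes the common case (reading the current file) cheap, at the cost of more work for old revisions.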
Re:Horrible headline (Score:2)
I learned a new word today!
VMS isn't entirely closed source... (Score:3, Interesting)
As I recall, RMS is an indexed file management system. I wrote a molluscan taxonomy database system that used it in the 80s... but I usually encapsulate all OS-specific stuff in subroutines, so somebody has probably ported i
Re:VMS isn't entirely closed source... (Score:3, Funny)
I have heard Stallman referred to as a great many things on this site, but I think this is the first time anyone has used that comparison.
Re:Horrible headline (Score:2)
Re:Horrible headline (Score:2)
Re:Horrible headline (Score:2)
So... (Score:5, Funny)
Re:So... (Score:2, Funny)
Re:So... (Score:3, Funny)
No, but you can use the soon to be released MILF 1.0 file system for your jpg and mpg needs.
Now that's one filesystem I would like to fsck upon every boot(y) ;)
Re:So... (Score:2, Funny)
That's why I have Windows, because I can afford to lose what's on it ^_^
</tongueincheek>
Old news (Score:5, Funny)
Oh, wait. NILFS. My bad.
I suggest N = "Nobody" (Score:2)
Re:Old news (Score:5, Funny)
(Sorry...Couldn't resist)
Re:Old news (Score:4, Funny)
I can think of at least one Norwegian-ILF (Kristanna Loken.)
Re:Old news (Score:2)
Database Servers (Score:5, Insightful)
This sounds a lot like how database servers work. They keep both a log file and a database file. The log file is continuously written to and is only truncated when backups occur.
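The log-plus-data-file arrangement described above can be sketched as a tiny write-ahead log. This is a hypothetical toy (`TinyWAL`, JSON files), not how any particular database server lays out its files; `checkpoint` plays the role of the truncation point the comment ties to backups.

```python
import json
import os


class TinyWAL:
    """Toy key-value store: each change is appended to a log before being applied."""

    def __init__(self, directory: str):
        self.data_path = os.path.join(directory, "data.json")
        self.log_path = os.path.join(directory, "wal.log")
        self.data = self._recover()

    def _recover(self) -> dict:
        data = {}
        if os.path.exists(self.data_path):
            with open(self.data_path) as f:
                data = json.load(f)
        if os.path.exists(self.log_path):
            with open(self.log_path) as f:
                for line in f:  # replay changes made since the last checkpoint
                    try:
                        key, value = json.loads(line)
                    except json.JSONDecodeError:
                        break  # a torn final record from a crash mid-append is ignored
                    data[key] = value
        return data

    def put(self, key: str, value) -> None:
        with open(self.log_path, "a") as f:
            f.write(json.dumps([key, value]) + "\n")
            f.flush()
            os.fsync(f.fileno())  # the log record is on disk before the change counts
        self.data[key] = value

    def checkpoint(self) -> None:
        """Write out the full data file, then truncate the log."""
        tmp = self.data_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.data_path)  # atomic rename on POSIX
        open(self.log_path, "w").close()  # truncate the log
```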
Privacy (Score:2)
Not sure I like logs listing that 3 years ago, I had a file named bad_kiddie_pr0n.jpeg (or whatever) on my computer.
They'd better have a good cleanup script!
--LWM
The dreaded question (Score:3, Funny)
If there isn't, this has no chance of taking off. Consumers today want portability. They don't like lock-in. A Linux-exclusive format is lock-in.
Create a good Windows (and Mac OS) driver, and it's got massive potential.
Re:The dreaded question (Score:4, Insightful)
That's unfortunately not true, which is proved by all the people using NTFS (or Office).
Re:The dreaded question (Score:4, Funny)
Will there be a Windows Driver? If there isn't, this has no chance on taking off.
Yes, that's why I only use FAT filesystems on my Linux server.
Re:The dreaded question (Score:3, Informative)
Re:The dreaded question (Score:2)
Say, what?
The following is left as an exercise to the reader:
1. Please list all the Linux file systems available.
2. Please list all the Linux file systems available with read/write support in both Linux and Windows.
3. Please add up the total amount invested by various corporations in the development of the file systems listed in #1.
Please don't forget that although you may use differently tweaked filesystems between servers and desktops, there is a great deal of overlap. Linux as a desktop system may no
Stable? (Score:5, Informative)
The system might hang under heavy load.
The system hangs on a disk full condition.
Aren't those kind of important to saying that something is stable?
Here's an overview for lazy people like me (Score:3, Informative)
* Slick snapshots.
* B-tree based file and inode management.
* Immediate recovery after system crash.
* 64-bit data structures; support many files, large files and disks.
* Loadable kernel module; no recompilation of the kernel is required.
NTFS (Score:2, Interesting)
Isn't this similar to NTFS's journaling file system?
Bundling (Score:3, Interesting)
Re:Bundling (Score:2)
Re:Bundling (Score:3)
Reiser and JFS have been in the mainline kernel since umm, I think early 2.4. They were put in around the same era that ext3 showed up in the mainline kernel.
I don't know about you but I never use the included
Shutdown versus power off (Score:2, Funny)
Re:Shutdown versus power off (Score:5, Interesting)
That's a very bad idea. Normally, journaling file systems only guarantee that the file/directory structure remains intact. They do not necessarily guarantee that the data in the files has hit the disk. Also, your disk probably has a write cache, and whatever is in that cache is lost when you remove power.
So your file system may be intact, but your practices will probably destroy data.
Re:Shutdown versus power off (Score:2)
But if it is _not_ supported, it could be very bad for your laptop, for example. I know of a lot of laptops that, without Linux support for them, heat up too much, which can cause big trouble. With prope
Re:Shutdown versus power off (Score:3, Interesting)
Here's a little (simplified) tutorial on what happens when a program writes a file to disk:
Re:Shutdown versus power off (Score:4, Informative)
1. It goes into the OS filesystem cache. After 5 seconds the modified data gets flushed to the disk (sometimes set to 30 sec).
2. It is written to the hard drive. Here, it sits in the hard drive controller's on-board cache until the head arrives at the write point, which is a fraction of a second.
3. It is written to disk.
So it *can* happen that data is not written properly, but contrary to the scary picture you paint, it is extremely unlikely. Even if you've just saved your data, run sync and you'll be fine turning the power off.
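For a single file, the per-file counterpart of running `sync` is `fsync`. A minimal POSIX-only sketch (the `durable_write` helper name is hypothetical): it flushes Python's own buffer into the OS cache, fsyncs the file, then fsyncs the parent directory so that a newly created file's name is durable too. Whether the drive's own cache (step 2 above) is also flushed depends on the kernel and the drive honoring cache-flush commands, so treat this as best effort.

```python
import os


def durable_write(path: str, data: bytes) -> None:
    """Write `data` to `path` and ask the kernel to push it out of its cache."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # Python's user-space buffer -> OS filesystem cache
        os.fsync(f.fileno())  # OS cache -> device for this file's data and metadata
    # A new file's directory entry lives in the parent directory; sync that as well.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```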
Re:Shutdown versus power off (Score:2, Informative)
That doesn't mean that you won't lose data that hasn't been written yet, of course.
Re:Shutdown versus power off (Score:2, Informative)
"Turning the system power off causes the WD Caviar to perform an automatic head park operation."
It wasn't a high-end drive at the time (just a consumer-level IDE drive) and has been obsolete for years, yet it still had the technology to park the heads out of the way when power is disconnected. There's no w
Re:Shutdown versus power off (Score:2)
The only major issue with abruptly removing power to the HD or PC is if there were writes in cache waiting to be written to disk.
Re:Shutdown versus power off (Score:5, Funny)
Some applications keep files open for a long time: MySQL, gDBM-based apps, Squid. Most of those applications implement their own mini-filesystems within a file, optimised for the task. These systems are supposed to preserve their integrity by journaling their modifications in case the underlying OS doesn't.
Switching off a computer because it has a journaling filesystem is like stopping a car by driving into something because it has seat belts.
actual info about the fs (Score:5, Informative)
Excellent information retention (Score:5, Funny)
Note: instead of modding this +1 funny, mod it +0.1 pathetic.
I don't know about anyone else.... (Score:2)
There's no replacement for ext3fs yet for me... (Score:5, Insightful)
1. Distro support. I don't want to have to compile my own kernel. The FS needs to be supported by the distro (Debian in this case). I want to be able to create a root partition and RAID with the FS.
2. ACL and extended attributes.
3. extended inode attributes would be nice ("chattr +i" is handy sometimes).
4. optionally I would like to be able to create large Bestcrypt partitions (e.g. 30GB) with that FS.
5. fast performance with large directories and small files (I have millions of small files on my desktop).
6. no need to fsck or fast fsck (i.e. journalling or some other technique or whatever).
7. disk quota!
8. optionally, transparent compression and encryption will be a big plus point.
9. Snapshots would be nice too, for consistent backups.
10. Versioning is also very welcome.
XFS: very close but it still has problems with #4. It also doesn't have undelete like ext2/ext3 (not that it's a requirement though).
JFS: it just lacks many features.
Reiser3: How's the quota support? Do I still have to patch the kernel every time? Plus, it doesn't have ACLs.
Reiser4: not ready yet.
I might have to look at FreeBSD after all. Background fsck, hmm....
Re:There's no replacement for ext3fs yet for me... (Score:4, Funny)
It seems as if you're holding out for perfection, not willing to upgrade from ext3 to anything else unless you find The Perfect Filesystem. I think that's kinda silly; better to get 90% of what you need now, than to wait another 2-4 years, surely?
Re:There's no replacement for ext3fs yet for me... (Score:4, Informative)
It does have ACLs, and quota support is fine, at least in Gentoo kernels (can't check a vanilla one atm).
Re:There's no replacement for ext3fs yet for me... (Score:2)
9. Snapshots would be nice too, for consistent backups.
10. Versioning is also very welcome.
I sure hope that none of these things are ever part of the filesystem itself. I want my filesystems 100% portable and fast. You know why NTFS isn't, right? All the extra, nearly useless features that should be handled by the OS are done by the file system instead.
These should be layers on top of the file system that ar
Re:There's no replacement for ext3fs yet for me... (Score:3, Insightful)
9 is basically a cron job
Ummm, no.
9 (snapshots), is a very important feature that makes it possible to create a cron job to do nice, consistent backups easily. Unless you don't mind writing a cron job that remounts the fs as read-only before doing the backup... an approach that's likely to cause one or two small problems.
That said, snapshotting doesn't need to be implemented in the file system. LVM, for example, implements it below the file system, at the level of a block device. That approach has
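Snapshotting below the filesystem works because classic LVM snapshots are copy-on-write: taking one copies nothing, and a block's old contents are saved only the first time it is overwritten afterwards. A toy sketch of that idea; `BlockDevice` is hypothetical and greatly simplified, not LVM's actual implementation.

```python
from typing import Dict, List


class BlockDevice:
    def __init__(self, num_blocks: int, block_size: int = 4):
        self.blocks: List[bytes] = [bytes(block_size)] * num_blocks
        # One dict per snapshot: block index -> contents at snapshot time.
        self.snapshots: List[Dict[int, bytes]] = []

    def snapshot(self) -> int:
        """Start a snapshot; nothing is copied until a block changes."""
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, index: int, data: bytes) -> None:
        for saved in self.snapshots:
            # Copy-on-write: keep the old block the first time it changes per snapshot.
            saved.setdefault(index, self.blocks[index])
        self.blocks[index] = data

    def read_snapshot(self, snap_id: int, index: int) -> bytes:
        # Blocks not saved for this snapshot are unchanged since it was taken.
        return self.snapshots[snap_id].get(index, self.blocks[index])
```

A backup job would read from the snapshot while the live volume keeps taking writes, which avoids the read-only remount described above.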
Re:There's no replacement for ext3fs yet for me... (Score:2)
It might be worth it:
Re:There's no replacement for ext3fs yet for me... (Score:2)
You can transparently encrypt an entire volume via GEOM, but if you want per-file encryption or compression, then you're out of luck. The only FS I've seen do this nicely was NTFS.
My 8 and 9 should then be your 9 and 10.
Re:There's no replacement for ext3fs yet for me... (Score:2)
Anyway, some other things... AFAIK ext3 undeletion doesn't work anymore (at least not using debugfs; lsdel always gives me an empty list these days). Also, snapshots are possible with any filesystem in Linux, as long as the fs is on LVM.
I'm sticking with Ext3 too. Heard too many scary stories about XFS, JFS and even Reiser, so I'll stick with what's known to work.
Pity not what I thought it was (Score:2)
I am probably not the only one to come back to an old file saved years ago only to find a glitch in it. I noticed it with a couple of movies that I know were perfect, because I watched them from that disk without copying them. So the only explanation is that part of the disk got corrupted.
The soluti
That is what raid can do? (Score:2)
Re:Pity not what I thought it was (Score:2)
HDFS (home-dir FS)? (Score:5, Interesting)
With FUSE [sourceforge.net] it might even be possible for mere mortals like me.
Basically, I very rarely push around more than 100-200 KB of "my stuff" at a time, unless it's big OGGs or tgzs, etc. It's mostly source files, documents, resumes, etc. For those, I want to be able to go back to any saved revision *at the file-system level*, kind of like "always-on cvs / svn / (git?)" for certain directories. Then, when I accidentally nuke files or make mistakes or whatever, I can drag a slider in a GUI, "roll back" my filesystem to a certain point in time, and bring that saved state into the present.
Performance is not an issue (at first), as I'm OK if my files take 3 seconds to save in vim or OpenOffice instead of 0.5 seconds. Space is not an issue because I don't generally revise Large(tm) files (and it would be pretty straightforward to have a MaxLimit size for any particular file). Maintenance would also be pretty straightforward: crontab "@daily dump revisions > 1 month". Include some special logic for "if a file is changing a lot, only snapshot versions every 5-10 minutes" and you could even handle some of the larger stuff like images without too much work.
Having done quite a bit of reading of KernelTraffic [kernel-traffic.org] (Hi Zack) and recently about GIT [wikipedia.org], maybe it's time to dust off some python and C and see what happens...
--Robert
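The revision-keeping rules described above (a per-file size cap, throttled snapshots, monthly pruning, a time slider) can be prototyped in memory before touching FUSE. A minimal sketch; `RevisionStore` and the three limits are hypothetical values taken loosely from the comment, and a real implementation would have to call `save` from whatever write or close hook the filesystem layer provides.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

MIN_INTERVAL = 5 * 60              # hypothetical: at most one revision per file per 5 min
MAX_AGE = 30 * 24 * 3600           # hypothetical: prune revisions older than ~1 month
MAX_FILE_SIZE = 10 * 1024 * 1024   # hypothetical: don't version files above 10 MB


@dataclass
class RevisionStore:
    revisions: Dict[str, List[Tuple[float, bytes]]] = field(default_factory=dict)

    def save(self, path: str, data: bytes, now: Optional[float] = None) -> bool:
        """Record a revision unless the file is too big or was saved too recently."""
        now = time.time() if now is None else now
        if len(data) > MAX_FILE_SIZE:
            return False
        history = self.revisions.setdefault(path, [])
        if history and now - history[-1][0] < MIN_INTERVAL:
            return False  # throttled: the live file still holds the newest contents
        history.append((now, data))
        return True

    def prune(self, now: Optional[float] = None) -> int:
        """The "@daily dump revisions > 1 month" job. Returns how many were dropped."""
        now = time.time() if now is None else now
        cutoff = now - MAX_AGE
        removed = 0
        for history in self.revisions.values():
            kept = [rev for rev in history if rev[0] >= cutoff]
            removed += len(history) - len(kept)
            history[:] = kept
        return removed

    def as_of(self, path: str, when: float) -> Optional[bytes]:
        """Contents of `path` at time `when` (the GUI slider), or None if none yet."""
        best = None
        for ts, data in self.revisions.get(path, []):
            if ts > when:
                break
            best = data
        return best
```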
Re:HDFS (home-dir FS)? (Score:2)
Re:HDFS (home-dir FS)? (Score:3, Informative)
You could possibly implement it with DavFS [sourceforge.net]...
V0.1 vs V1.0 (Score:2)
Lossless vs. Lossy Filesystems (Score:2)
but hey, that's never slowed me down.
This new filesystem is like old ones, with a big difference and a few small ones.
It has something called 'snapshots', which seems to mean that you can work off a partition but separately load up the version of that partition you had before your last power failure, or whatever went wrong.
It also claims to:
Isn't this just like a tape drive on hard disk? (Score:2)
Sounds an awful lot like a tape drive to me: start at one end and keep writing until you're done. I can see the point of wanting to keep all parts of each file together in one block so that it's not broken up. That way there is no need to defrag, but I thought ext2 and ext3 did that type of thing already. Correct me if I'm wrong, but I was told that ext2/ext3 would keep a file whole at just about any cost, unless the drive was really, really full with absolutely no contiguous room to put it, then it'd b
Nothing has been gained just yet... (Score:2, Interesting)
The system might hang under heavy load.
Why compared with Sun and not BSD? (Score:2)
Re:Needs a new name (Score:2)
Re:Needs a new name (Score:5, Funny)
Re:Needs a new name (Score:2)
Too close? Not close enough!
Re:Needs a new name (Score:3, Funny)
Re:Good news (Score:2)
1) Uninterruptible Power Supply. Is keeping your data if the power gets cut mid-write worth $80 or so?
2) Different file systems with more journaling support: I use ReiserFS. You may like JFS or something else exotic. Sure, it's a pain to convert a drive - you have to have another one with a lot of space to hold your files - but it's worth it if you're losing data due to a w
Re:erm.. lossless file system? (Score:2)
These 'holes' in the data allow new data to 'evolve' through a darwinian selection process.
Yes, I use FAT16 for everything. MS clearly understood how to generate mutations in the 'ecosystem' of the software world, and MS clearly understands how mutations drive us towards the inevitable evolution of Skynet!
I, for o
Re:getting rid of unwanted data (Score:3, Interesting)
will probably work. It overwrites every available block on your drive with random data, then deallocates them. If anyone knows why that wouldn't work, I'd be interested in hearing it.
Assuming that it does actually do the trick, it might be even better than wiping a single file. Since the whole drive would be filled with random data, there wouldn't be any conspicuous wiped
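The approach described above (fill every free block with random data, then release it) can be sketched in Python. `wipe_free_space` and the `wipe.tmp` filename are hypothetical, and `limit` exists only so the sketch can be tried without filling a real disk. Two possible reasons it might not reach everything: ext2/ext3 reserve a share of blocks for root by default, so a non-root user cannot fill them, and on a log-structured filesystem like NILFS, blocks still referenced by checkpoints or snapshots may not count as free space.

```python
import errno
import os
from typing import Optional


def wipe_free_space(directory: str, chunk_size: int = 1 << 20,
                    limit: Optional[int] = None) -> int:
    """Fill free space under `directory` with random bytes, then delete the filler.

    Stops at ENOSPC (disk full) or after `limit` bytes. Returns bytes written.
    """
    path = os.path.join(directory, "wipe.tmp")
    written = 0
    f = open(path, "wb", buffering=0)  # unbuffered: write() reports partial writes
    try:
        while limit is None or written < limit:
            n = chunk_size if limit is None else min(chunk_size, limit - written)
            try:
                written += f.write(os.urandom(n))
            except OSError as e:
                if e.errno == errno.ENOSPC:
                    break  # no free blocks left
                raise
        try:
            os.fsync(f.fileno())  # make sure the random data actually reaches the disk
        except OSError as e:
            if e.errno != errno.ENOSPC:
                raise
    finally:
        f.close()
        os.remove(path)  # hand the now-overwritten blocks back to the filesystem
    return written
```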
Re:Linux files systems suck ass.. (Score:3, Informative)
Apart from the big (production-quality) alternatives like IBM's JFS (which I use myself) and SGI's XFS (and Reiser; "Reiser sucks when it breaks" is so 1999), Linux additionally supports the following filesystems (from http://www.xenotime.net/linux/linux-fs.html [xenotime.net], also try http://www.tldp.org/HOWTO/Filesystems-HOWTO.html [tldp.org]):
* accessfs: permission filesystem
* A