Advanced Filesystem Implementor's Guide Continues
Tom writes: "This is part six of the Advanced filesystem implementor's
guide. I've been following an outstanding series of articles about implementing the advanced filesystems that are available with Linux 2.4. The author really knows his stuff and has done a great job of explaining ReiserFS, XFS, GFS, and the other filesystems that are available." The series gets into greater depth as it goes on; you may want to start with Part One and work on from there.
Maybe it's time for convergence (Score:2)
Re:Maybe it's time for convergence (Score:2)
What if features that have different advantages to different people are mutually exclusive?
Re:Maybe it's time for convergence (Score:1)
Re:Maybe it's time for convergence (Score:3, Insightful)
ReiserFS is a cutting-edge journaling filesystem which can be _very_ fast in some situations (large directories, etc.), but as Hans Reiser has pointed out, his purpose is not to make a stable FS but to keep development moving, inventing new and cool techniques... so not your #1 production choice for some.
XFS is known for its high throughput and parallelism. It was originally tuned for streaming video and audio, and to work well with _many_ CPUs (think >> 32).
JFS has a bit more of a mainframe background: stable (if slower?), and secure.
Of course, each day they grow a little closer together (each wants all the advantages), but until one of them reaches 'ultimate FS' status, I think there is plenty of room for multiple visions and implementations.
Re:Maybe it's time for convergence (Score:2)
There is nothing wrong with lots of different filesystems. They all use the same API, so one can use whichever is best suited to one's task. XFS sacrifices metadata performance for awesome large-file performance. ReiserFS sacrifices large-file performance for small-file performance. ext2 sacrifices safety for update performance. Yet you can use whichever suits you best because, thanks to the VFS, they all look the same to user programs.
PS> I never understood why people complain about this, but not about the fact that there are so many toolkits, which, unlike filesystems, have different APIs (and thus incompatible application bases). Maybe it's time for a VTK (virtual toolkit) layer in X?
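The VFS point above can be made concrete: user code never names the filesystem type, so the same generic calls work whether the path lives on ext2, ReiserFS, XFS, or anything else. A minimal sketch (the temp file is just for illustration):

```python
import os
import tempfile

def describe(path):
    """Stat and read a file using only generic calls. The VFS dispatches
    each call to the right driver (ext2, ReiserFS, XFS, ...); nothing
    here is filesystem-specific."""
    st = os.stat(path)              # VFS -> fs-specific getattr
    with open(path, "rb") as f:     # VFS -> fs-specific open/read
        head = f.read(16)
    return st.st_size, head

# Works identically whatever filesystem the temp directory lives on.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello vfs")
    name = f.name
size, head = describe(name)
os.unlink(name)
```

This is exactly why applications never need porting between filesystems: the type only matters at mount time.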
Re:Maybe it's time for convergence (Score:3, Insightful)
Some journalling filesystems exist because there are UNIX companies with expertise in them that support them, like XFS and JFS.
Some journalling filesystems are a natural migration for most Linux users - like ext3.
And some people, like Hans Reiser, want to re-invent filesystems wholesale; a good journalled filesystem is just the first stop.
More than one is just "value added". They all work. They are all secure and stable. Some are faster than others - but XFS, ReiserFS and ext3 are all "fast enough" for almost any uses.
The parent echoes a common complaint about Free Software - that developer resources are not allocated appropriately. Well, developers work on what they want, or on what they are paid to work on. This often leads to multiple efforts toward similar goals - window managers, desktop environments, word processors, journalled filesystems, VM management, etc. But ultimately competition is good if intelligent test results are publicized.
Look at the Mindcraft web server benchmark results from about 18 months ago. Now Linux blows the doors off IIS in the exact same test. The same is becoming true of filesystems. Test results showed ext2/3 was slow with lots of small files - so a developer named Daniel Phillips added a directory hash that fixes this shortcoming.
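The directory-hash idea is easy to sketch: instead of scanning a directory's entries linearly, hash each name into a bucket so lookup cost stays roughly constant as the directory grows. A toy model (not the actual ext2/3 htree code, which is an on-disk B-tree keyed by a name hash, but the same lookup-cost argument):

```python
NBUCKETS = 256

def bucket_of(name):
    # Simple rolling hash over the name's bytes; the real htree uses a
    # different hash, but any well-distributed one illustrates the point.
    h = 0
    for ch in name.encode():
        h = (h * 131 + ch) & 0xFFFFFFFF
    return h % NBUCKETS

class HashedDir:
    """Toy hashed directory: name -> bucket -> (name, inode) entries."""
    def __init__(self):
        self.buckets = [[] for _ in range(NBUCKETS)]

    def add(self, name, inode):
        self.buckets[bucket_of(name)].append((name, inode))

    def lookup(self, name):
        # Only one small bucket is scanned, not the whole directory,
        # so a 10,000-entry directory costs about the same as a tiny one.
        for n, ino in self.buckets[bucket_of(name)]:
            if n == name:
                return ino
        return None

d = HashedDir()
for i in range(10000):
    d.add("file%05d" % i, i)
```

With a linear directory, every lookup in that 10,000-entry directory would scan on average 5,000 entries; here it scans about 40.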
NTFS ? (Score:2, Interesting)
Word (Score:1)
Re:NTFS ? (Score:3)
Notice that the resolution is "backup, format, restore because we are too lame to write a filesystem integrity checker that actually works" (or words to that effect).
If that's state of the art, I'll keep tried and true, thanks.
As for the Linux driver never going "stable", don't you think it might have something to do with the facts that 1. NTFS is a moving target and 2. NT and NTFS have bugs that "cooperate", making it very difficult for someone else to write a compatible driver?
-Peter
Re:NTFS ? (Score:1, Insightful)
Re:NTFS ? (Score:1)
Uhh (Score:1)
Re:Uhh (Score:1)
What! (Score:1)
Oh wait, it is!
Re:Uhh (Score:1)
Re:NTFS ? (Score:1)
On top of the problem of having to basically reverse-engineer the filesystem, you get the joys of a team of lawyers from MS just waiting for you to do something they can sue you for.
So basically you need someone who has a lot of time on their hands, with a partition they don't mind frying on a constant basis, and who isn't worried about potential lawsuits.
Good luck.
Re:NTFS ? (Score:1)
Encrypted filesystem (Score:1)
Re:Encrypted filesystem (Score:1)
Re:Encrypted filesystem (Score:2)
Re:Encrypted filesystem (Score:2)
A USA mirror of the TCFS site is here [jhu.edu], but it looks a bit out of date.
Re:Encrypted filesystem (Score:1)
Re:Encrypted filesystem (Score:1)
Re:Encrypted filesystem (Score:1)
Encrypted loopback root example. (Score:4, Informative)
I am posting this from a notebook computer that has all partitions encrypted except for a boot partition at the front of the disk. The kernel boots an initial ramdisk with an /sbin/init script that does essentially the following, using cryptoapi [sourceforge.net], the successor to the linux "kerneli" patches.
modprobe cryptoapi
modprobe cryptoloop
modprobe cipher-aes
losetup -e AES /dev/loop/0 /dev/discs/disc0/part6
Password:
mount -t ext2 /dev/loop/0 /newroot
cd /newroot
exec ./bin/chroot . ./sbin/init $@
This should work with any disk file system, not just ext2.
I have been using this arrangement for several months now on a couple of computers, the slowest of which is a Kapok 1100M with a 233MHz Pentium II processor and, I believe, PC-66 SDRAM. On that computer, the change in interactive responsiveness is hard to notice, but it is noticeable for disk-intensive activities. I have not timed it, but I think big rsync runs are at least a factor of two slower.
I do not use swap on these computers, as I've seen claims that there are more potential deadlocks when swapping to an encrypted partition than to an unencrypted one.
I hope this information is helpful.
Re:Encrypted loopback root example. (Score:1)
And what about swap?
File systems obsolete? (Score:3, Interesting)
Firstly, a much better user interface to objects would be a relational database that the user can query on anything.
As for a system interface to objects, why force the objects to be serialized? Use orthogonal persistency. This method is more efficient and easier for applications. It actually makes persistency transparent, except for critical applications that need to persist something immediately; those can use a journalling interface.
In summary:
- Replace filesystem persistency with orthogonal persistency.
- Replace the hierarchical string-based user interface with a relational database.
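To make the "query anything" idea concrete, here is a sketch, using SQLite, of exposing file-like objects as rows so the user finds things by attributes rather than by walking a name hierarchy. The schema and data are invented for illustration:

```python
import sqlite3

# Hypothetical schema: each stored object is a row, queryable on any
# attribute, instead of being reachable only via a path string.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE objects
              (name TEXT, author TEXT, kind TEXT, size INTEGER)""")
db.executemany("INSERT INTO objects VALUES (?, ?, ?, ?)", [
    ("report", "alice", "text",  1200),
    ("photo1", "bob",   "image", 80000),
    ("notes",  "alice", "text",  300),
])

# "Every text Alice wrote, largest first" -- no directory walk needed.
rows = db.execute("""SELECT name FROM objects
                     WHERE author = 'alice' AND kind = 'text'
                     ORDER BY size DESC""").fetchall()
```

A hierarchical namespace can only express one fixed categorization; a relational query interface lets the user invent a new one per question.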
Re:File systems obsolete? (Score:1, Interesting)
Re:File systems obsolete? (Score:2)
But could you explain what you mean when you say objects in a filesystem are forced to be serialized? And what orthogonal persistency is. It sure sounds good, but I would really like to know what it means.
Re:File systems obsolete? (Score:4, Informative)
Persistency in an operating system is usually achieved by writing things to disk.
Not all data can be stored on disk as it is in memory, because pointers and other information must be converted to a persistent form. Objects are often stored in ways that are very difficult to write to disk (spread across many small linked objects, for example). This means you must serialize the data to disk, converting it into a stream of ones and zeros that allows the objects' structure to be reconstructed. This requires a lot of work from every application and object implementor, as they have to write methods to serialize and de-serialize the objects between their normal representation and a persistent streamed representation.
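In an introspective language the serialize/de-serialize pair described above is nearly free; a Python sketch of round-tripping a pointer-rich structure through a flat byte stream:

```python
import pickle

class Node:
    """A linked list: in memory it is scattered nodes joined by
    pointers; on disk it must become a flat byte stream."""
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

    def __eq__(self, other):
        return (isinstance(other, Node)
                and self.value == other.value
                and self.next == other.next)

lst = Node(1, Node(2, Node(3)))

blob = pickle.dumps(lst)        # serialize: pointers -> byte stream
restored = pickle.loads(blob)   # de-serialize: rebuild the structure
```

In C there is no equivalent of `pickle`: the programmer must hand-write the traversal, flattening, and pointer reconstruction for each structure, which is exactly the per-implementor cost the parent describes.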
And what orthogonal persistency is. It sure sounds good, but I would really like to know what it means.
Orthogonal persistency is persistency implemented by the underlying operating system, rather than every application writer.
The entire system state is saved to disk every once in a while, in a checkpoint.
Mechanisms are used to ensure there's always a stable/reliable checkpoint to go back to. Some schemes even let you roll back to any checkpoint in the past. Typically, checkpoints are done every 5 minutes.
Orthogonal persistency is totally transparent to applications. They seem to 'live forever', and do not need to explicitly persist or serialize their information. They can keep it represented as objects, or whatever representation they choose for their own simplicity.
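A toy model of the checkpoint cycle, assuming a simple dict-shaped "system state" (a real implementation snapshots memory pages, not Python objects):

```python
import copy

class CheckpointedSystem:
    """Toy orthogonal persistence: applications mutate live state
    freely; the system snapshots it periodically, and recovery rolls
    back to a stable checkpoint."""
    def __init__(self):
        self.state = {}           # the whole 'system state'
        self.checkpoints = []     # stable snapshots, oldest first

    def checkpoint(self):
        # Atomic snapshot; a real system would flush dirty pages to disk.
        self.checkpoints.append(copy.deepcopy(self.state))

    def recover(self, n=-1):
        # Roll back to any past checkpoint (some schemes keep them all).
        self.state = copy.deepcopy(self.checkpoints[n])

sys_ = CheckpointedSystem()
sys_.state["doc"] = "draft 1"
sys_.checkpoint()                            # e.g. every 5 minutes
sys_.state["doc"] = "draft 2, then power fails"
sys_.recover()                               # back to the stable state
```

Note the application code never calls save or load; from its point of view, the state simply survives.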
Orthogonal persistency treats RAM as a cache to the disk, and thus achieves two purposes.
Simplicity: There is only non-volatile memory, rather than volatile and non-volatile memory allocated and managed separately.
Performance: The system is much easier to optimize, as there are no separate file caches and swap areas on disk. Instead, the entire RAM is treated as a cache of the disk, allowing simpler and more powerful page-caching algorithms that do not have to guarantee things such as quick disk writes for files, as filesystems do.
An amazing advantage of orthogonally persistent systems is that, because the entire set of dirty pages in memory is copied to disk at once, the disk heads can be moved sequentially across the disk to update all necessary areas. This process is called migration, and it is a far more efficient way of bringing the disk up to date with the volatile state than the explicit updates used by current filesystems.
Yet another advantage is that, because the entire system state is preserved as a whole, more powerful security schemes can be used. The whole load-from-file process can be avoided, and with it the security problems of identifying who has access to what file, and why.
Re:File systems obsolete? (Score:3, Informative)
"Serialization" means you take your object and turn it into a stream of bytes of some sort. More introspective languages, like Python, Smalltalk, and Java, allow very easy serialization, but in something like C you spend a lot of time figuring out how to do it. Even if it is indirect, most files somehow represent an object that was in memory and can be put back into memory at a later time.
"Orthogonal" means that something is separate from something else -- or more specifically, that while two aspects of a thing are related, you can work with one without affecting the other. Kind of -- it's a subtle (though very useful) notion.
"Orthogonal persistence" means that all objects persist indefinitely with no effort from the programmer. "Orthogonal" refers to the fact that the persistence happens without any relation to other aspects of the program -- everything just persists by default. While it may involve serialization, this is hidden from the programmer, as is any other technique that supplies the persistence.
In such a system there wouldn't be any distinction between objects in RAM or on a disk -- often that is then expanded to objects that are also remote (similar to CORBA, but again, the network access is orthogonal and invisible). Anyway, the system moves things to disk as it needs to, and pulls them off as needed.
I brought up the cleanliness issue before, but the other issue is scaling. In particular, something like garbage collection is a bit difficult, because you can't just do a mark-and-sweep every so often when anything on the entire disk could contain a reference.
EROS [eros-os.org] has this, Smalltalks have generally had this (you might wish to look at Squeak [squeak.org]), and the old Lisp machines also tended to have orthogonal persistence.
Re:File systems obsolete? (Score:3, Informative)
The other option is a database with dynamic tables, that would somehow fit the data. I don't know how you are going to manage that, though... can any application make tables that make sense for its problem space? How are those tables partitioned off so that you have some degree of safety, that one application doesn't step on another? How are they then integrated, so information from one application can be used in another?
A non-relational database might make more sense; I believe these are often called object databases (not to be confused with an OO RDBMS). That's really just a way of saying "orthogonal persistence", except that maybe they aren't completely orthogonal (they require some extra programming to use).
The problem with orthogonal persistence, as I see it, is all the junk that can collect. Having used Squeak [squeak.org], which offers a certain sort of persistence in its images, I've seen transient objects pile up fairly easily and lead to a sort of faux memory leak in the system. It's a convenient system, but not stable.
Serialization provides a certain discipline -- it's like having a checkpoint in the application where everything gets consolidated into something well-defined and granular.
Now, you don't have to serialize to apply this sort of discipline. But orthogonal persistence just makes it so damn easy to be undisciplined. I feel like there's some major work to be done to find a way to manage such a large collection of interrelated objects with indefinite lifespans.
Re:File systems obsolete? (Score:2)
Hmm, does Squeak lack garbage collection or something? One would imagine that a persisted object would be eligible for collection once there were no more references to it from wherever your persistent object graph is rooted. A persistence system without any roots can even work provided you have a lot of space to store objects that are out of scope and periodically compact it by selecting roots and discarding everything that isn't referenced by them -- basically a copying collector, which you can get away with when you're swapping and have good reference locality.
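The roots-and-copy idea can be sketched in a few lines: pick the roots, keep everything reachable from them, and let everything else disappear. A toy collector over invented Python objects (a real copying collector would also move the survivors into a fresh space and fix up references):

```python
class Obj:
    """A toy heap object holding references to other objects."""
    def __init__(self, name, refs=()):
        self.name = name
        self.refs = list(refs)

def copy_collect(roots):
    """Toy copying collection: everything reachable from the roots
    survives; anything not reached is garbage and is simply not kept."""
    live, seen = [], set()
    stack = list(roots)
    while stack:
        o = stack.pop()
        if id(o) in seen:
            continue
        seen.add(id(o))
        live.append(o)
        stack.extend(o.refs)   # follow outgoing references
    return live

b = Obj("b")
a = Obj("a", [b])
orphan = Obj("orphan")          # unreachable from the root set
live = copy_collect([a])
names = sorted(o.name for o in live)
```

Applied to a persistent store, the expensive part is exactly what the comment notes: "follow outgoing references" may touch objects anywhere on disk, so locality of reference decides whether this is feasible.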
Managing lots of objects requires a lot of discipline and work, but the existing body of theory is perfectly fine for managing billions of objects. It's just finding the right application of it that's tricky.
Re:File systems obsolete? (Score:2)
Also, garbage collection on a few gigs of interrelated data isn't easy. Single-pass definitely won't work, but there are a lot of good incremental garbage collection algorithms.
Current theory can mostly deal with a large number of persistent objects -- though in many areas it's just theory. Current practice definitely can't deal with this sort of persistence.
Re:File systems obsolete? (Score:2)
As far as I know, Linux can have uptimes of years without any memory leaks. This shows that memory does not necessarily leak over time, and the 'infinite uptime' of an orthogonally persistent system should be possible, since Linux achieves it.
An orthogonally persistent system should actually gain simplicity in many aspects, probably resulting in more stability, too.
Re:File systems obsolete? (Score:2)
By very careful development, the kernel has been made very stable. However, applications are far from that stable. So when I say that practice needs to be improved, I mean that the correctness of the kernel has to be extended to the system as a whole. Or some other technique of partitioning has to be created, because the partitioning we use in Unix (processes) is part of what orthogonal persistence seeks to eliminate.
Re:File systems obsolete? (Score:2)
Secondly, I was not aware that long-uptime Linux systems required restarting their processes due to leaks over time.
If that is the case, process-restart support may truly be necessary in an orthogonally persistent system, but it's still not relevant to the original dilemma of explicit versus orthogonal persistency.