The State of ZFS On Linux 370
An anonymous reader writes: Richard Yao, one of the most prolific contributors to the ZFSOnLinux project, has put up a post explaining why he thinks the filesystem is definitely production-ready. He says, "ZFS provides strong guarantees for the integrity of [data] from the moment that fsync() returns on a file, an operation on a synchronous file handle is returned or dirty writeback occurs (by default every 5 seconds). These guarantees are enabled by ZFS' disk format, which places all data into a Merkle tree that stores 256-bit checksums and is changed atomically via a two-stage transaction commit. ... Sharing a common code base with other Open ZFS platforms has given ZFS on Linux the opportunity to rapidly implement features available on other Open ZFS platforms. At present, Illumos is the reference platform in the Open ZFS community and despite its ZFS driver having hundreds of features, ZoL is only behind on about 18 of them."
Working well for me (Score:4, Informative)
Re: Working well for me (Score:2)
Re: Working well for me (Score:2)
Re: (Score:2)
Not all Adaptec controllers are supported by FreeBSD. It would be a "safer" choice to use LSI, since they work great in Linux and FreeBSD: that gives you the option to migrate your host OS should you desire.
Admittedly, if you're changing over that much then buying new controllers isn't a big deal, but I like to have the option of having the "reference" implementation of ZFS just a few minutes away.
Re: (Score:2)
But isn't that the whole point? ZFS is designed to avoid the most common failure modes, but it relies on reducing the errors in the data it uses for checksumming. That's why ECC is important: RAM errors are the second most common source of bit flips, and if that's your comparator, it needs to be accurate.
Sure, but what good is having the RAM store the checksum correctly if the CPU calculated it incorrectly, or if the CPU compares the correct checksum to the identical correct checksum and determines that they don't match?
This is just about diminishing returns and where to draw the line. There are lots of things that can go wrong. Believe it or not people use systems that actually detect and correct CPU logic errors, and I'm sure the people selling them could tell you how often they detect errors.
I'd seriou
Re: (Score:2)
If you decide to chance it, make sure you don't use the "scrub" functionality on ZFS. Scrub can cause memory errors to eat your pool like a cancer.
Or, just use ECC :)
I agree... (Score:2)
The only minor negative thing I can say is that when you do have some odd kind of failure, ZFS (and this may be the case on BSD and Solaris too) gives you some pretty scary messages like "Please recover from backup", but usually exporting and importing the FS brings it back, at least in a degraded state. My other caveat might just be my Linux distro, but I've often had problem
Re: (Score:2)
Maybe your ZIL comments are specific to Linux? It used to be the case in FreeBSD that you had to have the ZIL present to import, and a dead ZIL was a very big problem, but that was many versions ago (~3-4 years?). I personally went through this when I had a ZIL die and the pool was present but unusable. I was able to successfully perform a zpool version upgrade on the "dead" pool, after which I was able to export it and re-import it as functional without the ZIL.
Note that this was NOT a recommended seque
No thanks, I'll stick with ReiserFS (Score:2, Funny)
It's a killer file system. Once you've used it, you won't be able to leave it.
Re: (Score:3)
Groan
Performance (Score:2)
Is this something that a normal consumer would use for their main storage?
Re: (Score:2)
The checksums don't really add more overhead than a more traditional RAID + LVM setup, and performance is equivalent in my experience (albeit on Solaris 10 and not Linux). There is also the ability to turn on compression, which trades a little bit of CPU overhead for increased disk I/O performance. On a lot of workloads the difference can be dramatic.
If you are already comfortable with RAID + LVM, then I would wholeheartedly recommend ZFS for your main workstation. I would also recommend taking
Re: (Score:2)
Re: (Score:2)
Not really. A SATA3 SSD can push 550MB/sec both ways (limited by SATA3 itself) nowadays - just yo
Re: (Score:2)
I have pushed 4GB/s through a SAS SSD array on ZFS, but even so I maxed out on other stuff way before the CPU, much less checksumming, ever became an issue (e.g. had to go through two LSI SAS 9200-8e HBAs, because one maxes out the PCI-e 2.0 x8 lanes; with two HBAs I maxed out on the two 6G SAS links to my JBOD). That's the point of my post. I've yet to see a system which is constrained by the checksummin
Magic (Score:3)
I've been using ZFS on Linux for about a year. I can summarise my position on the experience with two words: it's magic.
It is still tricky to run one's root system off ZFS (at least on Debian). That, I think, is for those who are brave and have the time to deal with issues that might arise following updates. But for non-root filesystems, ZFS is, as I said, magic. It's fast and reliable, caches intelligently, adapts to a large variety of mirror/striping/RAID configurations, snapshots with incredible efficiency, and simply works as advertised.
Someone once (before the port to other OSes) said that ZFS was Solaris' "killer app". Having used it in production for a year, I can understand why they said that.
Re: (Score:2)
I ran ZFS on FreeBSD for a few years but gave up on it. At one time, I did a cvsup (like an apt-get update, sort of, on BSD) and it updated zfs code, updated a disk format encoding but you could not revert it! If I had to boot an older version of the OS (like, before the cvsup), the disk was not readable! That was a showstopper for me and a design style that I object to, VERY MUCH. It makes support a nightmare.
I've never seen this in linux with jfs, xfs, ext*fs, even reiser (remember that?) never screwed m
Re: (Score:3)
it updated zfs code, updated a disk format encoding but you could not revert it
You can thank your package maintainer for this. ZFS never ever ever upgrades the on-disk format silently. You always have to do a manual "zpool upgrade" to do it. It'll tell you when a pool's format is out of date in "zpool status", but it'll never do the upgrade by itself.
Updating the on-disk format and not allowing the n-1 version of the OS to read it is a huge design mistake, and I'm not sure I understand the reasoning behind it, but until that is changed, I won't run ZFS.
Again, this is not ZFS' fault, it's your package maintainer for auto-upgrading all your imported zpools. ZFS never does this by itself.
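For anyone curious, the check-then-upgrade flow looks roughly like this (the pool name 'tank' is just a placeholder):
  zpool status tank     # warns when the pool's on-disk format is older than the software supports
  zpool upgrade         # with no arguments, only lists pools that could be upgraded
  zpool upgrade tank    # the explicit, irreversible step; older OS releases may no longer import the pool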
Re: (Score:2)
adaptable to a large variety of mirror/striping/RAID configurations
"Adaptable" is a bit of a stretch here. If you set up a RAID on ZFS, you can't change it, you can only replace individual disks within it, or destroy the entire array.
That isn't a big deal if you're talking about a ZFS filesystem with a very large number of drives, but it is a big limitation for a small ZFS filesystem. That is, if I have 300 disks in 60 arrays of 5 1TB disks each, and I want to move to 3TB disks, then I just need to add 5 3TB disks, turn them into an array, add them to the filesystem, the
Re: (Score:2)
then there is no easy way to replace those with 5 3TB drives one at a time and actually get use out of the extra space.
It's not THAT bad. You do this:
1. Put new disk in usb cradle.
2. Run 'zpool replace', swapping new disk for old disk.
3. Take the new disk and physically replace the old disk.
4. Repeat 1-3 for each new disk until you have the whole array running at the new capacity.
5. If autoexpand is not enabled, run the 'zpool online' command with the '-e' flag to use the new capacity (see the sketch below).
I've only used FreeBSD, not Linux - but I presume this would work so long as you are giving ZFS the whole disk. ZFS does not care which interfa
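A rough sketch of steps 2 and 5, assuming a pool called 'tank' and made-up device names:
  zpool set autoexpand=on tank    # optional: grow automatically once every member has been enlarged
  zpool replace tank ada3 ada7    # resilver onto the new, larger disk
  zpool status tank               # wait for the resilver to finish before swapping the next disk
  zpool online -e tank ada7       # only needed if autoexpand is left off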
Re: (Score:2)
Do you have a link. The last time I looked into this, you could not add a disk to a raid-z. You could add disks to a zpool, or add another raid-z to a zpool. However, a raid-z was basically immutable. This is in contrast to mdadm where you can add/remove individual disks from a raid5.
Google seems to suggest that this has not changed, however I'd certainly be interested in whether this is the case. The last time I chatted with somebody who was using ZFS in a big way they indicated that this was a limita
Still no SELinux support (Score:2)
Re: (Score:2)
other security systems exist; many believe that SELinux causes more problems than it solves
Re: (Score:2)
How many of the others are now integrated into the Linux kernel?
Re: (Score:2)
many believe that SELinux causes more problems than it solves
I've met those people. Not impressed.
Re: (Score:2)
and I've watched SELinux heads waste days trying to figure out why it's killing standard apps that used to work for years
Re: (Score:2)
Yay for me! (Score:2)
Hey, I'm the guy who got modded +5 funny for replying to the 8/10TB disk announcement with "of course they did, I ordered 6TB drives 2 hours ago". Well, I switched my home NAS over to ZFS last month. So, yay for me, for once I'm ahead in at least some minimal sense or other!
Seriously though, I have found ZFS to be a damned good solution so far. (FYI, CentOS, Core i5, 4GB, 6x4TB with 2-disk parity, 2 eSATA -> port multipliers...) I really don't think I will ever deploy hardware RAID again.
I used it for about a year (Score:3)
If you're wondering why ZFS trusts the checksums on the "new" drive instead of reading the entire file: it will read the entire file and compare it to the checksum every time you access it. Once a month by default, it runs a "scrub" where it reads every file and verifies they haven't suffered bit rot and still match the checksums. Apparently the strategy after a dropped drive is to get the redundant filesystem up and running again ASAP, then do the file integrity scrub afterwards at its leisure. (You can manually force this check at any time with a zpool scrub.)
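For reference, the manual version is just (pool name hypothetical):
  zpool scrub tank        # read and verify every allocated block against its checksum
  zpool status -v tank    # shows scrub progress and lists any files with unrecoverable errors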
The other main advantage I'd say is that it's incredibly flexible when you're putting together redundant arrays. RAID 5 normally requires 3+ drives or partitions of the same size. ZFS lets you mix together drives, partitions, files (yes, one of your ZFS "drives" can be a file on another filesystem), other devices like SAS drives, etc. You can even put the 3+ "drives" needed for redundancy onto a single drive if you just want to play around with it for testing.
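If you want to play with that, a throwaway file-backed pool takes about a minute to set up (paths and sizes are arbitrary):
  truncate -s 1G /tmp/d1 /tmp/d2 /tmp/d3
  zpool create testpool raidz /tmp/d1 /tmp/d2 /tmp/d3     # a raidz vdev made entirely of files
  zpool status testpool
  zpool destroy testpool && rm /tmp/d1 /tmp/d2 /tmp/d3    # tear it down when you're done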
The only problem I ran into was with deduplication. Dedup was part of the reason I decided to try ZFS, and is one of the features frequently mentioned by ZFS advocates. While dedup does work, it is an incredible memory and performance hog. Writes to the ZFS array went from 65+ MB/s (bunch of mixed random files) down to about 8 MB/s with dedup turned on, and memory use climbed to where I ordered more RAM to bump the system up to 16 GB. In the end I decided the approx 2% disk space I was saving with dedup wasn't worth it and disabled it.
I eventually switched to FreeNAS (based on FreeBSD, which has a native port of ZFS) because it was annoying having to reinstall ZFS for Linux after an Ubuntu/Mint update, and I couldn't see myself doing that after every new release because I wanted features which were added to the core OS. (And if you're wondering, dedup performance is just as bad under FreeNAS.)
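If you want to see whether dedup would pay off on your own data before committing, something along these lines works (dataset names are made up; zdb -S only simulates and changes nothing):
  zdb -S tank                   # simulate dedup across existing data and print the estimated table
  zfs set dedup=on tank/data    # only affects blocks written after this point
  zpool list tank               # the DEDUP column shows the ratio actually achieved
  zfs set dedup=off tank/data   # turning it off does not re-duplicate blocks already deduped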
Re: (Score:2)
Re:rsync causes lockups? (Score:4, Informative)
Is the target not a zfs filesystem as well? If so zfs send/recv allows for replication and handles deltas at the filesystem level. It should be more efficient.
Re: (Score:3)
http://docs.oracle.com/cd/E192... [oracle.com]
Re:rsync causes lockups? (Score:4, Interesting)
Re: (Score:3)
Does the sky fall in if your buffer isn't 'sizable'? Or does it just run a bit slower?
Re:rsync causes lockups? (Score:4, Funny)
The sky won't fall but the walls might.
-Shaka
Re: (Score:3)
They're working on fixing that, but in the mean time you can pipe it through mbuffer or something similar to resolve the issue.
Re:rsync causes lockups? (Score:4, Informative)
Back when I did OpenSolaris work, we used a tool called mbuffer, which is basically netcat with a buffer on each end. It wouldn't have been suitable for internet backups (no encryption) but it works pretty well for cross-campus backups and the like.
IIRC it works like this on the sending side: 'zfs send pool/fs@snap | mbuffer -s 128k -m 4G -O 10.0.0.1:9090'
And on the receive side: 'mbuffer -s 128k -m 4G -I 9090 | zfs receive pool/fs'
It can still be pretty bursty but it smoothes out a lot of it.
Re:rsync causes lockups? (Score:4, Informative)
You can kludge on encryption in the pipeline:
http://sourceforge.net/project... [sourceforge.net]
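The simplest hedge if you do need encryption is to let ssh carry the stream, optionally keeping mbuffer for smoothing (host and dataset names are invented):
  zfs send tank/fs@snap | ssh backup.example.com "zfs receive pool/fs"
  zfs send tank/fs@snap | mbuffer -s 128k -m 1G | ssh backup.example.com "zfs receive pool/fs"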
Re: (Score:3)
Re:rsync causes lockups? (Score:4, Insightful)
I've been using ZFS on linux for years with nightly backup jobs that rely on rsync. I've never had a problem.
Re: (Score:2, Insightful)
The GPL only hates Mad Max post apocalyptic style "freedom".
The FSF rightfully understands that the complete absence of the rule of law simply enables the person with the biggest pile of guns to control things.
Sun seemed like a benign enough master but Oracle far less so.
Re: (Score:2, Insightful)
They are strongly against one set of freedoms in support of the subset of freedoms they deem more important.
Which is fine, but I've always found their choice in terminology and strong focus around the word "free" to be annoying. Consequently I try to avoid using the term "free software" and instead usually opt for "open source", which, while it doesn't convey the idea that it's restrictively licensed to ensure it and any derivatives remain open source, also doesn't falsely convey that it is entirely free
Re: (Score:2)
Well, they're two different things. "Open source" is a design methodology. "Free software" is a social movement. I try my hardest to use only FOSS. I say that from my windows computer at work. But hey, I take my ultrabook with debian on it just about everywhere I go.
Re: (Score:2, Flamebait)
There is "free as in beer" (usually both GPL and BSD). There is "free as in freedom" (BSD). And then there is "free as in free-range chickens" (GPL).
Re: (Score:2)
Sun used their own Open Source license, which they've had for quite a while (and released quite a bit of software over the years using). The issue is "Free" vs "Free" vs "Open Source"
Re: (Score:2)
Re: (Score:2)
If Sun wanted to hate freedom, would they have released it under an open source license, as approved by the OSI?
Re: (Score:2)
Because ZFS has far better features than BtrFS
http://rudd-o.com/linux-and-fr... [rudd-o.com]
It has SOME features which btrfs has not yet implemented. Btrfs also has some features which ZFS has not yet implemented, including support for dynamically resizing a RAID (not adding/removing a RAID from a zpool).
Re: Unfamiliar (Score:5, Informative)
above, below, and at the same level. ZFS is everyt (Score:5, Interesting)
> ZFS is a layer below LVM.
Typically you'd layer raid, then LVM, then the filesystem. ZFS tries to be all three. It's raid, and it's a volume manager, and it's a filesystem. There are some benefits to integration, and some drawbacks. With the raid>lvm>filesystem approach, it's trivial to add dm-cache, bcache, iscsi, or any other piece of storage technology. With ZFS, anything you want to add has to be specifically supported within ZFS.
The Unix tradition is small, single purpose tools that do one thing well. Witness sort, grep, wc, etc. Want to count the log entries that mention Slashdot? You don't need a special tool for that, just grep slashdot | wc -l . Tools like mdadm and lvm are building blocks that can be combined to suit your need, the Unix way. ZFS is a big monolithic package that does everything, much like Microsoft Word or Outlook. ZFS is more in the Microsoft tradition.
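To make the comparison concrete, here are the two stacks side by side, with made-up device and volume names (a sketch, not a tuning guide):
  # the building-block way: raid, then volume manager, then filesystem
  mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdb /dev/sdc /dev/sdd
  pvcreate /dev/md0 && vgcreate vg0 /dev/md0 && lvcreate -L 100G -n data vg0
  mkfs.ext4 /dev/vg0/data
  # the ZFS way: one tool covers all three layers
  zpool create tank raidz /dev/sdb /dev/sdc /dev/sdd
  zfs create tank/data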
Re:above, below, and at the same level. ZFS is eve (Score:4, Interesting)
Re: (Score:3)
Re: (Score:2)
I've wanted to do that in the past, but it was specifically blocked. It's a pretty ugly thing to do, but it does give you a "new" block device that could be imported as a mirror on-demand. With enough drives in the zpool, that new device is nearly independent from its mirror, from a failure perspective.
Re:above, below, and at the same level. ZFS is eve (Score:5, Informative)
Anything that can be represented as a block device can be added to a zpool. This also includes files, which is handy: when you're trying to understand complicated interactions, you can mock up a small zpool based on files instead of devices for testing.
On the other side of the abstraction, ZFS can also expose block devices called zvols that are backed by the zpool. So if you wanted to run a dm-crypted EXT4 filesystem backed by a zpool, you can certainly do that using a zvol and still get all the benefits of ZFS integrity protection and snapshotting.
Plenty of layering can be done with ZFS.
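A sketch of that zvol layering, with invented names (dm-crypt setup details vary by distro):
  zfs create -V 100G tank/vol0                  # exposes a block device at /dev/zvol/tank/vol0
  cryptsetup luksFormat /dev/zvol/tank/vol0
  cryptsetup open /dev/zvol/tank/vol0 securevol
  mkfs.ext4 /dev/mapper/securevol
  zfs snapshot tank/vol0@pre-upgrade            # the zvol still gets ZFS checksums and snapshots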
Re: (Score:2)
Not really. Fundamentally, a filesystem's job is to store data in a structured manner on an unstructured array of blocks. For everything ZFS does, it still comes down to that.
There are a great many advantages to having that structure include duplicate blocks and checksums.
If you really prefer, you can reasonably build a non-redundant ZFS pool on top of a RAID volume though you will lose a few advantages that way.
Re: (Score:3)
(I still do things the classic way: filesystem on lvm on luks on mdadm. not using ZFS yet.) I'm not sure it's exactly about what's required.
Consider wear leveling on SSDs. Only the filesystem really understands which blocks need to preserve data and which ones are don't-care. So to do SSDs right, it needs to pass info about unallocated storage down to the volume manager, which then passes it to the encryption, which then passes it to the RAID, which then gives it to old-school "real" block device (which t
Re:Unfamiliar (Score:5, Interesting)
I too have kinda been watching passively, with an "I'll look into this once it's ready" attitude.
The gist as far as I understand it is (again, take with a huge helping of salt (it's not that bad for your health any more!), I'm posting these partly to be told I'm wrong):
Pros:
- data integrity (checksums and more rigorous checks that something is actually written to the disk)
Cons:
- cpu and ram overhead (even by current standards, uses a tonne of resources)
- doesn't like hardware raid (apparently a lot of the pros rely on talking to an actual disk)
- expandability sucks (can be done, but weird rules based on pool sizes and such) compared to most raid levels where you can easily toss a new disk in there and expand.
Re: Unfamiliar (Score:3, Interesting)
Re: (Score:3)
Re: Unfamiliar (Score:4, Informative)
Dedup easily needs 5GB of RAM per TB.
For general usage (no dedup), 1GB per TB is a good rule of thumb.
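A back-of-envelope check on that rule, assuming 128 KiB records and roughly 320 bytes of RAM per dedup-table entry (both figures are assumptions, not official numbers):
  echo $(( (1024*1024*1024*1024 / (128*1024)) * 320 / (1024*1024) ))   # ~2560 MiB of dedup table per TB of unique data
  # smaller average record sizes push that figure toward the 5GB-per-TB quoted above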
Re: Unfamiliar (Score:4, Funny)
Dedup easily needs 5GB of RAM per TB.
For general usage (no dedup), 1GB per TB is a good rule of thumb.
This. Don't starve the ARC. You wouldn't like it when it's angry.
Re: (Score:3)
1 GB of RAM is worth about $20 these days anyhow (less?).
And yes, de-dup is expensive. Most of the time in my experience you get far better benefits from compression anyhow (source: real world enterprise datasets at work).
Re: (Score:3)
ZFS only supports on-the-fly dedupe. For batch dedupe, you're probably thinking of HAMMER in DragonFly BSD.
Dedup consumes insane amounts of RAM and has a massive performance penalty. It's almost never worth it, because the cost of extra RAM will be more than if you had just bought more disks in the first place.
Compression, on the other hand, requires very little RAM or CPU resources, gives a tangible performance improvement, and saves space. Once ZFS implemented LZ4 (which is extremely fast) it began making s
Re:Unfamiliar (Score:5, Insightful)
There are so many pros for ZFS that I don't even. Until you try it, you won't "get it" - it's more like trying to describe purple to a lifelong blind guy. But, I'd adjust your list to at least include:
Pros:
- Data integrity
- Effortless handling of failure scenarios (RAIDZ makes normal RAID look like a child's crayon drawing)
- Snapshots.
- Replication. Imagine being able to DD a drive partition without taking it offline, and with perfect data integrity.
- Clones. Imagine being able to remount an rsync backup from last Tuesday, and make changes to it, in seconds, without affecting your backup (commands sketched at the end of this comment).
- Scrub. Do an fsck mid-day without affecting any end users. Not only "fix" errors, but actually guarantee the accuracy of the "fix" so that no data is lost or corrupted.
- Expandable. Add capacity at any time with no downtime. Replace every disk in your array with no downtime, and it can automatically use the extra space.
- Redundancy, even on a single device! Can't provide multiple disks, but want to defend against having a block failure corrupting your data?
- Flexible. Imagine having several partitions in your array, and be able to resize them at any time. In seconds. Or, don't bother to specify a size and have each partition use whatever space they need.
- Native compression. Double your disk space, while (sometimes) improving performance! We compressed our database backup filesystem and not only did we see some 70% reduction in disk space usage, we saw a net reduction in system load as IO overhead was significantly reduced.
- Sharp cost savings. ZFS obviates the need for exotic RAID hardware to do all the above. It brings back the "Inexpensive" in RAID. (Remember: "Redundant Array of Inexpensive Disks"?)
Cons:
- CPU and RAM overhead comparable to Software RAID 5.
- Requires you to be competent and know how it operates, particularly when adding capacity to an existing pool.
- ECC RAM strongly recommended if using scrub.
- Strongly recommended for data partitions, YMMV for native O/S partitions. (EG:
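The snapshot/clone/scrub items above boil down to a handful of commands (dataset names are invented):
  zfs snapshot tank/home@tuesday
  zfs clone tank/home@tuesday tank/home-tuesday-edit    # a writable copy, created in seconds
  zfs rollback tank/home@tuesday                        # or throw away everything written since the snapshot
  zpool scrub tank                                      # verify every block while the pool stays online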
Re: (Score:2)
- CPU and RAM overhead comparable to Software RAID 5.
In my experience it needs a lot more memory than software RAID5. Something like 1GB per TB of disk space if running RAIDZ. Scrubbing can thrash your CPU pretty good, too.
I ran ZFS for a while on a dedicated file server with a fair amount of disk space (16TB) but switched over to btrfs RAID1 as my hardware wasn't up to ZFS requirements, and I needed the capability to add new drives to the pool which ZFS doesn't handle gracefully.
Re: (Score:2)
In my experience it needs a lot more memory than software RAID5. Something like 1GB per TB of disk space if running RAIDZ.
It appears to use a lot of memory because it replaces the standard kernel disk cache with its own ARC, and as unused memory is wasted memory, the ARC will eat up every last bit of memory you allow it.
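On ZFS on Linux you can cap the ARC if it's crowding out other workloads; the 4 GiB value here is only an example:
  echo "options zfs zfs_arc_max=4294967296" >> /etc/modprobe.d/zfs.conf   # persistent, applied at module load
  echo 4294967296 > /sys/module/zfs/parameters/zfs_arc_max                # change it on a running system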
Scrubbing can thrash your CPU pretty good, too.
It's verifying checksums across your entire pool. That's going to be a CPU hog. BTRFS will be no different in this regard. Still, the default algorithm is fairly lightweight, and on a modern multi-core multi-GHz system, you should be bottlenecked on disk long before you "thrash" your CPU. If you're trying t
Re: (Score:2)
...expandability sucks...
Expansion is different with ZFS. Different does not mean sucks. Different means you need to learn something new.
In my experience, it does not suck, but is rather easy to do. I added a couple of disks, ran a couple of commands, and doubled the size of my ZFS pool.
Easy as pie.
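For anyone curious, that kind of expansion is roughly the following (pool and device names are made up):
  zpool add tank mirror /dev/sde /dev/sdf    # new mirrored vdev; the space is usable immediately
  zpool list tank                            # shows the enlarged capacity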
Re: (Score:2)
Way back when I looked into it (which again, was a while ago and quite brief, so I may/probably am totally wrong) the big problem seemed to be adding small amounts of storage to a large array.
In my particular use case, I have a 20TB file server (raid6, 12x 2TB drives). Let's say I fill that up and want to add 4 more TB. With my current RAID6/dm-crypt/lvm/xfs setup, this is fairly easy. Add 2 drives and expand everything. With ZFS it seemed hard to add arbitrary amounts of storage like this in most configurat
Re: (Score:2)
If you want to add 4 more TB, then you attach a new mirrored pair, and you're left with RAID6(12x)+RAID1(2x). There is zero rebalancing (for better or worse): it's available immediately and transparently. The only catch is that you can't remove it again, but you can replace it with any combination of storage that provides equal or greater capacity to your RAID1(2x).
You could also grow your RAID6, and it's more efficient than it would be on most normal hardware RAID. But please don't do that: RAID5/6 real
Re:Unfamiliar (Score:5, Interesting)
Adding additional drives to a raidz vdev is not supported, no. Apparently it's a use case that is extremely rare in the enterprise, which is the market ZFS was intended for. Adding additional capacity is easy if you have no redundancy (12x2TB drives in a pool? Just add 2x2TB more drives to the pool and boom, more space), but not as easy if you want redundancy.
So you can't expand an existing vdev, but you can add a new vdev to the zpool. For example, say your current configuration is 12x2TB in raidz2 (the zfs equivalent of raid6). That's giving you 20TB of capacity, after redundancy. You need to add 4TB of additional usable capacity...
There are a few options. ZFS doesn't enforce redundancy, so there's nothing stopping you from adding two bare 2TB drives to the zpool. You'd get your extra 4TB, but data on those drives would be unprotected. Instead, you'd probably have to take 4x2TB, put them in a new raidz2 vdev, and then add that to your zpool. Then you'd have 12x2TB & 4x2TB, giving you that extra 4TB of usable capacity, and every disk in the array has dual redundancy.
My home file server currently has 7x4TB & 8x2TB. They're both raidz2 arrays, in the same zpool, for 32TB of usable capacity on 44TB of raw storage. I started out with 5x2TB in raidz1 and migrated the data between various configurations. The iterations looked like this:
Configuration 1: 5x2TB (raidz1)
Configuration 2: 5x2TB (raidz1) + 5x2TB (raidz1)
Configuration 3: 7x4TB (raidz2) + 8x2TB (raidz2)
The migration process was:
1 to 2: Add the new 5x2TB (raidz1) vdev to the existing storage pool
2 to 3: Add the new 7x4TB (raidz2) vdev to a new storage pool, zfs send the file system from the old pool to the new pool, wipe the old 2TB drives, add back 8 of them in a new raidz2 vdev, add that new vdev to the existing new pool
The server only has 15 hotswap bays (the 2-to-3 migration required opening the case to get some of the drives hooked up directly), so my next migration will involve replacing the 2TB drives with something larger (probably 8TB by the time I need to expand). To do that, the process in ZFS is that you replace a drive, resilver the array, replace a drive, resilver the array, etc. When you have replaced the last drive, ZFS will automatically expand the vdev to use the new capacity. Resilvering onto a completely empty drive is not fast, so I expect the process will probably take me about a week, since I'd probably start a new resilver each night before bed. But since I run raidz2, at no point would I be without redundancy, so it should be safe.
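For anyone attempting a similar pool-to-pool migration, the send/receive part looks roughly like this (pool names are placeholders):
  zfs snapshot -r oldpool@migrate
  zfs send -R oldpool@migrate | zfs receive -F newpool             # replicate every dataset and its snapshots
  zfs snapshot -r oldpool@final
  zfs send -R -i @migrate oldpool@final | zfs receive -F newpool   # catch up on writes made during the copy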
Re: (Score:2)
Adding additional drives to a raidz vdev is not supported, no. Apparently it's a use case that is extremely rare in enterprise, which is where zfs was intended for.
This is largely what has kept me away from zfs, besides it not being in the mainline kernel.
If I had 300 disks then being forced to add/remove them in groups of 5 or so wouldn't be a big deal. When you have just a few disks at 90% capacity, being unable to add/remove them 1 at a time while keeping everything redundant is a much bigger problem (using an n+1 redundancy solution, not an n*2 solution).
One of the things I like about btrfs is that the design is more dynamic in this regard - you can have disks of
Re:Unfamiliar (Score:4, Informative)
So you can't expand an existing vdev
While you cannot add new drives to a vdev, you can expand a vdev by incrementally replacing all of its drives with larger versions. Replace a drive, resilver, replace a drive, resilver... and when you're all done, just export the pool, import it back, and you have the full capacity of the new drives available.
Re: (Score:2)
It's also not great with disks of mixed size. For example, you can't create a 4TB mirror using a 4TB drive and 2x2TB drives (spanned). For home users, who will have a collection of mixed-size disks, you can do things, but it involves partitioning. I ended up doing something along the lines of this guy: http://tentacles666.wordpress.... [wordpress.com]
Re:Unfamiliar (Score:5, Informative)
The CPU and RAM overhead is relatively minimal. You can get away with very few resources, even after enabling compression.
I have a ZFS server ~5 years old right now, serving over 100 NFS and a handful of Samba/Netatalk connections simultaneously (home directories mounted on NFS, SMB and AFP for other mounts). There is a fairly steady 1000-2000 IOPS with spikes up to 100k IOPS, the machine has an uptime of over 300 days, and the CPU load (8 2.4GHz Xeon CPUs) hovers around 5-10% (100TB of data in 8 RAIDZ2 stripes of 8 disks (2 and 4TB), 800GB in SSD read cache, 120GB in mirrored SSD write cache, directly attached with SAS).
It will of course eat as much RAM as you will give it, but for the amount you spend on a halfway decent SAS RAID controller, you can easily buy 100GB of RAM and a set of SSDs. You don't WANT a RAID controller. Regular SAS controllers with ZFS are so much faster; RAID controllers are limited by their on-board chips, which are typically sub-GHz RISC (ARM, Intel, MIPS) processors - an external SAS RAID controller will cost you about $2,000-5,000 extra and have a throughput of a few hundred MB/s and a few hundred IOPS. In contrast, my setup (36 disks, 4 6G SAS channels) can give a whopping 20Gbps and 1M IOPS.
Re: (Score:2)
you can easily buy 100GB of RAM and a set of SSD
It sounds like you are focusing on a server setting. Would the CPU and RAM overheads be enough to be a concern in a desktop setting?
Re:Unfamiliar (Score:5, Informative)
The point of ZFS is that hardware raid sucks.
With hardware raid you're trusting a small, underpowered embedded computer to manage data at a block level.
1. That computer is purposefully kept in the dark about the data being stored, as it's designed to be agnostic. Thus it has no way to gracefully recover from errors. Either your whole volume is consistent, or it's in an unknown state of corruption. This is bad.
2. RAID schemes are mathematically unable to deal with large modern hard drives. The unavoidable error rates for 4TB+ drives (and their interconnects) mean that you are guaranteed to have corruption within the useful lifetime of the drive. This means even if everything works perfectly with 0 hardware failures, your raid array will have to rebuild sometime in its lifetime. This is bad. It's why you're stupid to go with RAID5 with large hard drives.
3. RAID controllers are pretty much all unique and their volumes are non portable. They are also not documented well. Your drives are useless without the controller, and even recovering with a new controller of the same type is a crapshoot.
ZFS throws the above model away because:
1. Your computer is fast, has lots of processors, and lots of cheap ram. Why ignore all that and use a small, embedded computer that's slower and costs extra?
2. Being part of the filesystem, it's aware of everything on both the block and the file level. It's aware of every file, the blocks it uses, the checksum of the file, and the checksum of every block. You can give yourself as many or as few redundant blocks as you want for some or all of your files.
3. Your volume can be imported on to any other computer that supports ZFS. It's a standard and is portable.
4. Because of all of the above, you can implement a whole list of amazing features you can't even begin to dream of in RAID. Look up what you can do with copy-on-write filesystems and you'll wonder how you ever lived without them. (Basically free versioning/snapshotting that almost paradoxically improves performance at the same time)
Re: (Score:3, Insightful)
I would add to you "cons" list that it requires* ECC RAM, though you should probably be using that anyway.
* It's not technically a requirement, but you'll probably be sorry [freenas.org] if you don't use it.
Re: (Score:2)
One correction, the RAM overhead is only intense if you use deduplication.
One different perspective, it doesn't like hardware RAID, and neither should anybody else at this point. (Yeah, I still have hardware RAID in the field.) With ZFS, you will never have the experience of the replacement RAID controller having a different firmware version and not recognizing your disks. With ZFS, you will never get data corruption from a "write hole". With ZFS, it's actually documented as to wtf the RAID is doing in term
Re: (Score:2)
ZFS is in the middle, more easily expandable than some, but definitely not as good as the easiest.
Yes, ZFS is not a Drobo. You need to plan out your disk usage from the beginning, because you are kind of stuck with it.
For instance, if you have 5 disks and they are all the same size and you want 2 disk redundancy, it is almost a no-brainer to setup a raidz2. The downside is that if you ever want to make the vdev larger by replacing disks, you need to replace all 5 disks to the new larger size... a vdev is limited by the smallest disk. You can mitigate this by putting the same 5 disks into a pair of mirro
Re: Unfamiliar (Score:2)
The main huge feature of filesystems like ZFS and btrfs is checksumming of the filesystem for enhanced data integrity. Snapshots, data deduplication, etc. are also nice features, but without the checksumming, any filesystem issues will be multiplied (e.g. wrong bits would be propagated through the snapshots).
Re: (Score:2)
Why would I prefer ZFS over DIF on FC and now SAS2, which does the checksumming in a far more comprehensive manner for me and is agnostic to the file system it is on?
Re: Unfamiliar (Score:3)
Re: (Score:2)
or you could spend 20 minutes reading about it on their web page instead of relying on slashdot summaries, you lazy git
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
The main feature is data checksumming. All the other features are just icing on the cake (snapshots, data dedup, etc.). Ars has a good article with illustrations. [arstechnica.com]
Re: (Score:2)
I think "a filesystem with ECC (like memory) and ACID (like a good database)" is as close as you come for an elevator pitch. I've had bits flip, whether it's in memory or on disk or in transit I'm not really sure but it happens. Like for example I create "known good" PARs for a 5GB video and later it fails CRC, If you use a hex editor compare tool it'll show a single bit difference. Backups are neat, but you really want control over bit rot - real bit rot - so you don't end up with slow corruption. ZFS pret
Re: (Score:2)
I've been a member of the Church of Parity ever since I discovered that some of my dutifully backed-up family photos had not only gotten corrupted, but the backup dutifully copied the corruption as well. Ever since, I use backup tools which do a parity check (e.g. Unison) and I try to store important things on ZFS if I can.
In my case I was lucky and I had an older backup without the corruption. But lesson learned... Also, have more than one backup :)
Re: (Score:2)
We run a lot of ZFS on OpenIndiana/Nexenta, but also have some ZoL.
My favorite things about ZFS:
- Simpler volume management -- there's no more LVM layer! A little weird at first, but it really grows on you. Just zpool create, zfs create and you're off and running.
- Huge volumes -- we have a couple in production near 800TB
- Writable snapshots (think FlexClone on NetApp) -- no performance penalty. We have systems with hundreds of snaps and clones.
- Really stable (in our experience, ZFS on *Solaris has been
Re: (Score:2)
Is there a good way to calculate how much RAM you need? I'm considering ZFS for my next server build. It'll be around 10TB.
Re: (Score:2)
It depends partly on what features of ZFS you'll be using, and what types of performance you need. In general, you can run ZFS for an arbitrarily-large disk set with about 2GB of RAM - but you won't be using the memory cache features of ZFS much at all. The more ram you have available, the more it'll assign to the ARC (read cache). If you are running a media fileserver, where every read is a large file and is unique, then the ARC doesn't make much difference. If it's a webserver, where you read the same
Re: (Score:2)
Thank you so much, that's very informative!
Re: (Score:2)
For all the technobabble in that summary, I still don't know what ZFS offers me over other filesystems. Maybe the guys working on the system should do a little marketing course, or work on their 'elevator pitch'...
Here's my attempt...
1.) ZFS does software RAID as its normal mode of existence. It's naturally contested as to whether this is a good thing, but it depends on context. ZFS doing software RAID on a busy MySQL server? Not great. ZFS doing software RAID on a FreeNAS box whose lot in life is to shuffle data to and from a bank of hard disks? Better.
2.) Datasets. These are best described as the lovechild of folders and partitions. Like partitions, they can have their own mount points, their own permissions, stor
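A taste of datasets in practice, with purely illustrative names and properties:
  zfs create -o mountpoint=/home tank/home
  zfs create -o compression=lz4 -o quota=200G tank/home/alice
  zfs get compressratio,used,quota tank/home/alice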
Re: (Score:2)
Are you suggesting that if A = B, then B = A? :)
Re: (Score:2)
So you suck as a sys admin. Try another hobby.
Re: (Score:2)
Your linked article doesn't really prove that one should use ECC; it speaks of studies showing a wide range of errors, from "roughly one bit error, per hour, per gigabyte of memory to one bit error, per millennium, per gigabyte of memory"
Then it takes Google's study as gospel truth: "25,000 to 70,000 errors per billion device hours per megabit (about 2.5-7 x 10^-11 errors per bit-hour) (i.e. about 5 single bit errors in 8 Gigabytes of RAM per hour using the top-end error rate), and more than 8% of D
Re: (Score:3, Interesting)
No... their numbers are about right.
And the numbers go back to times before Google existed.
Even on the old Cray Y systems, there was roughly one single bit error every day, corrected by ECC. Every week or so there would be roughly 1 double bit error, recovered by data reload...
The only times the memory got disabled was when double bit errors were NOT recovered OR the error rate exceeded 10 (from my memory, number could be higher) in a day. The hardware itself would remap memory so that the system would keep
Re: (Score:3)
FreeBSD has had ZFS for what, over five years now? They are the reason it exists in any actual use (OpenSolaris/Illumos don't count) on any non-Sun/Oracle platform.
God forbid it take the Linux guys longer to get it up and running when Sun purposely licensed it to be difficult to do so on Linux.
And Linux's wannabee ZFS competitor BTRFS (oooh, look at us) sucks so bad it can't get off the ground.
So, this being Linux, some guys* also designed Btrfs to do the same things in the meantime. How dare they!? Sun released ZFS after 4 years of work; Btrfs, 2. Presumably they were working under more of an "agile" setup? Which doesn't really make sense for an FS but hey.
So what does Linux do.... import (steal) ZFS from OpenZFS/FreeBSD
It's called porting, and I don't see how you can call it "stealing" in any honest way.
and start posting about how great all their work with ZFS is, and how Linux bloggers now say 'oh yeah, ZFS is actually solid, so we can use it'. As if they are the only/first ones to certify ZFS.
If you actually skim the a
Re: (Score:2)
Regardless, every import should bring the same pool online in the same way, regardless of the device names.
Re: (Score:2)
ZFS on Linux would be cooler if they could port Time Slider to Linux from Open Solaris. http://java.dzone.com/news/kil... [dzone.com]
That's what snapper is for.