
Native ZFS Is Coming To Linux Next Month

Posted by timothy
from the is-it-as-magical-as-advertised? dept.
An anonymous reader writes "Phoronix is reporting that an Indian technology company has been porting the ZFS filesystem to Linux and will be releasing it next month as a native kernel module without a dependence on FUSE. 'In terms of how native ZFS for Linux is being handled by this Indian company, they are releasing their ported ZFS code under the Common Development & Distribution License and will not be attempting to go for mainline integration. Instead, this company will just be releasing their CDDL source-code as a build-able kernel module for users and ensuring it does not use any GPL-only symbols where there would be license conflicts. KQ Infotech also seems confident that Oracle will not attempt to take any legal action against them for this work.'"

  • by cpicon92 (1157705) <kristianpicon@gmail.com> on Friday August 27, 2010 @08:36PM (#33399712)
    It's open source in the sense that the source is open. Free to view, and free to use as long as you don't distribute it.
  • by stinerman (812158) <nathan,stine&gmail,com> on Friday August 27, 2010 @08:39PM (#33399720) Homepage

    They don't prevent use. They prevent redistribution as part of the whole.

    I can download, build, and install fglrx (which is completely non-free) or this ZFS module. I just can't distribute either module linked into the kernel.

  • by h4rr4r (612664) on Friday August 27, 2010 @09:00PM (#33399844)

    They get paid to include that. Just like Microsoft is now paying Verizon to add bing search to phones, and NASCAR pays Sprint to include their apps.

  • by Sycraft-fu (314770) on Friday August 27, 2010 @09:01PM (#33399846)

    Seems a little early to be putting faith in that. Its feature list looks good, on par with other modern desktop file systems like HFS+ and NTFS. However it is currently unstable. When will that be fixed? Who knows? Maybe it moves full steam ahead and we have a stable, capable file system next month. Maybe the project loses steam and languishes and 4 years from now it is still "unstable" and "coming soon."

    You can't really say how well it'll work until there is stable code to test. Remember designing a file system isn't the real hard part. I'm not saying it is trivial work or that it is unimportant but it is by far the easier part of all this. You can write out a specification that sounds great on paper, but then you have to implement it. That is the much harder part. You have to make it fast, stable, not corrupt data, able to do everything it should and so on.

    This is part of the reason why NTFS on Linux has been so tricky. It is actually pretty well documented in the Windows Internals book, and other places, but it is a complex file system. FAT, on the other hand, is real simple and thus not hard to implement.

    As an example you can look at driver sizes. The NTFS driver in Windows is 1.6MB. The FAT driver, on the other hand, which supports multiple versions of FAT, is only 200k. The NTFS kernel driver is one of the very largest in the system; only the ATi video driver (much larger) and TCP/IP stack (a bit larger) are bigger than it on my system.

    So we'll see what happens with btrfs. As of late, there's not been much activity. The last version update was June 2009. Maybe they are rolling up final testing for production release, or maybe things have slowed down and release is not near. We'll just have to wait and see, but it is foolish to believe this will be the Next Big Thing(tm) at this point.

  • by mysidia (191772) on Friday August 27, 2010 @09:08PM (#33399888)

    I don't know if that's true. I know you probably can't redistribute the kernel with the CDDL bits

    Just like you can't distribute a Linux distribution and include nVidia drivers? Tell that to the distros [wikipedia.org].

    But I think it may fall on deaf ears. Or you might hear back a reminder that mere aggregation on a storage medium is exempted in the GPL... as long as the non-free package is separated from the GPL'd package.

  • Re:Good Article (Score:3, Informative)

    by bill_mcgonigle (4333) * on Friday August 27, 2010 @09:10PM (#33399904) Homepage Journal

    None have been filed since it was production-ready last year.

    It's not. Yet. There are many reports of lock-ups with uptimes on the order of a week. Soon, I hope, but don't set people up to hate on it.

    Besides, what would they sue over? The FreeBSD team using code that Sun deliberately and explicitly licensed for such things?

    It's not Sun you need to worry about, it's NetApp.

  • Re:Good Article (Score:3, Informative)

    by h4rr4r (612664) on Friday August 27, 2010 @09:16PM (#33399944)

    It is not production ready; I know, I tested it. The next version should fix those gripes. Patents are what Oracle will sue over.

  • by Christophotron (812632) on Friday August 27, 2010 @09:26PM (#33399990)
    BTRFS is not that unstable, really. I have been running it for a few months now, since the on-disk file structure was finalized. It's in a RAID 1 configuration across two 300 GB drives on one of my home servers and it hasn't had a hiccup yet, even with lots of file I/O. I think it would like more than the CPU and RAM I gave it, but it's still less resource-intensive than ZFS. AFAIK ZFS would not even run on that machine due to the 32-bit processor and only 512 MB of RAM. Some of the features are not implemented yet, but it is certainly stable enough to test.
  • by EvanED (569694) <evaned@@@gmail...com> on Friday August 27, 2010 @09:39PM (#33400066)

    NTFS doesn't do COW, but it's had snapshotting for a while under the name "volume shadow copy". This was added in XP or 2003, and even given somewhat of a UI in the form of "previous versions" in Vista.

  • by coerciblegerm (1829798) on Friday August 27, 2010 @09:53PM (#33400132)

    No, Sun used the CDDL because they hate the restrictions on GPL. The sharing issues go both ways, Sun wanted to keep some ownership. It's not like the BSD license exists just to spite GPL.

    This is the third time I've seen someone post something to this effect in the past week. I smell a smear campaign. Nonetheless, I'm calling BS here. Danese Cooper, one of the individuals who helped draft the CDDL, stated that they based the CDDL on the MPL "partially because it is GPL incompatible. That was part of the design when they released OpenSolaris." It was made deliberately GPL-incompatible, but this has nothing to do with 'restrictions' in the GPL.

  • In Windows 2003 (Score:3, Informative)

    by Sycraft-fu (314770) on Friday August 27, 2010 @10:05PM (#33400190)

    It was deployed to desktops, and on by default, in Windows Vista/7. It does copy on write and maintains old snapshots of files automatically. On the server side, there is some more management of this if you like. This snapshotting feature is also used by backup utilities to do hot backups. Ghost and TrueImage can image a running system using it. They can snapshot the state for backup and new data can be committed while they work, without messing with anything. Works great. That is also independent of the maintaining of old versions so you can shut that down if you like and still do snapshots for backups.

  • Re:Good Article (Score:3, Informative)

    by mysidia (191772) on Friday August 27, 2010 @10:09PM (#33400212)

    Because ZFS is not production quality on a 32-bit CPU or with less than an additional 2GB of RAM available for ARC, even on Solaris where ZFS is most mature. Bare minimum for ZFS: 1GB RAM, 64-bit proc.

    If you have a 32-bit CPU or less than 2GB system RAM, use UFS or Ext3, forget about ZFS for such hardware configurations, unless you want to experience pain (system hangs, memory starvation, crashes / Panics due to 32-bit address space squeeze causing fragmentation and ultimately inability to allocate ARC efficiently).

  • by shutdown -p now (807394) on Friday August 27, 2010 @10:12PM (#33400226) Journal

    The whole point of snapshots is that you don't freeze the IO. The snapshot service provides you a, well, snapshot of how things were at the moment it was requested, and maintains that snapshot even as other applications keep writing data. It's roughly similar to MVCC, only the units are FS blocks, not database records.
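
A minimal sketch of that idea, assuming a toy in-memory volume (the `BlockStore` class and its method names are made up for illustration and are not how VSS or any real filesystem is implemented). Taking a snapshot copies nothing and pauses nobody; the old content of a block is preserved only the first time that block is overwritten afterwards.

```python
class BlockStore:
    """Toy volume: a list of blocks plus copy-on-write snapshots (hypothetical)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        # One dict per snapshot: block index -> content preserved for it.
        self.snapshots = []

    def snapshot(self):
        """Start a snapshot. Nothing is copied and writers are not paused."""
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, index, data):
        """Overwrite a block, saving the old content for every snapshot
        that has not already preserved this block."""
        for saved in self.snapshots:
            if index not in saved:
                saved[index] = self.blocks[index]
        self.blocks[index] = data

    def read_snapshot(self, snap_id, index):
        """Read a block as it was when the snapshot was taken."""
        saved = self.snapshots[snap_id]
        if index in saved:
            return saved[index]
        # Block never changed since the snapshot: live data is still valid.
        return self.blocks[index]


vol = BlockStore(["a", "b", "c"])
snap = vol.snapshot()
vol.write(1, "B")  # live volume keeps accepting writes
live_view = list(vol.blocks)
frozen_view = [vol.read_snapshot(snap, i) for i in range(3)]
```

Here `live_view` reflects the new write while `frozen_view` still shows the volume as it was at snapshot time, which is the MVCC-like behaviour described above with blocks in place of database records.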

  • Re:ZFS recap (Score:3, Informative)

    by Anonymous Coward on Friday August 27, 2010 @10:20PM (#33400274)

    We've heard much about ZFS, but being a slashdotter, I can't recklessly go on and RTFA. So, maybe someone here can recap its main benefits. Maybe a power point slide?

    Here's a good PDF on it:

            http://hub.opensolaris.org/bin/download/Community+Group+zfs/docs/zfslast.pdf

    Here's the PDF being presented by the co-creators, Jeff Bonwick and Bill Moore:

            http://blogs.sun.com/video/entry/zfs_the_last_word_in

    Three parts, one hour each. Streamable blip.tv as well as a downloadable M4V file.

    Two ten-minute videos:

            http://www.youtube.com/watch?v=gthel59G56c
            http://www.youtube.com/watch?v=OdHUub462pM

    Though I recommend you set aside the three hours (even if it's over several days) to really get a good understanding of how things work.

  • by benjymouse (756774) on Friday August 27, 2010 @10:26PM (#33400298)

    So you are suggesting I can freeze IO to the machine, then run a snapshot command on NTFS?

    I would be glad to hear it.

    The Volume Shadow Copy Service (VSS) is always running (by default). Backup utilities - including the ones which come with Windows - use VSS to create a snapshot and perform backup from that point in time. It doesn't freeze IO; rather it goes to copy-on-write.

    On server versions you can also create snapshots interactively by using the vssadmin tool.

    Shares can be set up to create shadow copies multiple times per day. This is not copy on every write - but it *is* copy on write once a block is part of a snapshot. Any client (plugin needed for XP, IIRC) can display previous versions which are available snapshots.

    VSS actually goes beyond NTFS integration (which is probably why it is a service and not just a NTFS feature). Certain applications - e.g. Exchange, SQL Server and Hyper-V - also integrate with VSS. Instead of VSS operating directly on e.g. SQL Server files, it integrates with the server to create a snapshot for the database files. During restore the system knows how some applications took part in the shadow copy. This ensures that I can correctly restore *all* the files needed to bring a SQL server database back to a certain point-in-time. It also allows the SQL server to prune the log automatically.

    I have a Server2008R2 which has several Hyper-V images (development and testing). When I perform a backup of the server, VSS interacts with Hyper-V to perform backup of the virtual machines as well. A Server2003 which hasn't been set up to support VSS is actually "hibernated" by Hyper-V/VSS - then backed up - then brought back into running state. That could be considered "freezing IO", I suppose.

  • by Christophotron (812632) on Friday August 27, 2010 @10:28PM (#33400318)
    I agree. BTRFS is definitely not ready for production or for storage of anything important that is not backed up elsewhere. It has known bugs, like for example the reported free space on a raid 1 will show the total disk size and not the actual free space, so it may be dangerous to fill the array too close to 100% (shown as 50% in df). It is unclear when (or if) it will be ready, but it is being worked on -- I've seen updates for the userland tools in Debian testing, and the newer kernels have updates for the fs driver. The bug I mentioned is fixed in 2.6.33, I believe. I was only countering the argument that it is too unstable even to test it out. That is untrue. Heck, even Linus Torvalds reportedly uses BTRFS as the root filesystem on one of his laptops.
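
A rough illustration of why that reporting bug is dangerous, assuming the behaviour is exactly as described above (data counted once against the raw size of both disks). The function name and the 600 GB figure, taken from the two 300 GB drives mentioned earlier in the thread, are only for the example.

```python
def raid1_usage(raw_total_gb, data_written_gb):
    """Return (percent shown under the described bug, percent really used).

    Assumption: RAID 1 across two equal disks, so usable space is half
    of the raw total because every block is stored twice.
    """
    usable_gb = raw_total_gb / 2
    reported = 100 * data_written_gb / raw_total_gb
    actual = 100 * data_written_gb / usable_gb
    return reported, actual


# Two 300 GB drives mirrored, with 300 GB of data written.
reported, actual = raid1_usage(raw_total_gb=600, data_written_gb=300)
```

With those numbers `reported` is 50.0 while `actual` is 100.0, so an array that looks half empty is in fact full.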
  • by Anonymous Coward on Friday August 27, 2010 @11:54PM (#33400698)

    This is the third time I've seen someone post something to this effect in the past week. I smell a smear campaign.

    Nonetheless, I'm calling BS here. Danese Cooper, one of the individuals who helped draft the CDDL, stated that they based the CDDL on the MPL "partially because it is GPL incompatible. That was part of the design when they released OpenSolaris." It was made deliberately GPL-incompatible, but this has nothing to do with 'restrictions' in the GPL.

    And Cooper's assertion was rejected by Simon Phipps, Sun's Chief Open Source Officer for quite a while (before leaving Oracle in the last few weeks):

    http://www.opensolaris.org/jive/message.jspa?messageID=55013#55008
    http://en.wikipedia.org/wiki/Common_Development_and_Distribution_License#GPL_incompatibility_controversy

  • by Cyberax (705495) on Saturday August 28, 2010 @12:26AM (#33400854)

    "*symbolic links to files, incorrect NTFS has supported reparse points since Windows 2000"

    Incorrect. Reparse points apply only to directories, not files.

    "*no support for RAIDs, incorrect Server versions support RAID 0, 1, and 5"

    On block level. No filesystem support, like in BTRFS or ZFS.

    "*no support for dynamic resizing, incorrect Windows 2003 added support for dynamic growth for non-system/non-boot volumes, 2008 added dynamic grow and shrink for all volumes."

    Only for 'dynamic' disks which are undocumented and shrinking also doesn't always work.

    "It also supports compression, encryption, ACL's, Metadata, and ridiculously large volumes and files."

    Linux filesystems support compression (btrfs), ACLs (POSIX, SELinux), metadata (extended attributes) and ridiculously large volumes.

  • by Anonymous Coward on Saturday August 28, 2010 @02:12AM (#33401300)

    And the source is available now: http://github.com/behlendorf/zfs/wiki

    Has nobody seen this?

  • by stiller (451878) on Saturday August 28, 2010 @02:37AM (#33401382) Homepage Journal

    Note that Btrfs does not yet have a fsck tool that can fix errors. While Btrfs is stable on a stable machine, it is currently possible to corrupt a filesystem irrecoverably if your machine crashes or loses power on disks that don't handle flush requests correctly. This will be fixed when the fsck tool is ready.

    https://btrfs.wiki.kernel.org/index.php/Main_Page [kernel.org]

  • by diegocg (1680514) on Saturday August 28, 2010 @03:15AM (#33401514)

    The ZFS design makes this very difficult. Btrfs, on the other hand, has supported this feature for a long time, thanks to a nice design feature called backrefs.

  • by Anonymous Coward on Saturday August 28, 2010 @06:26AM (#33402014)

    when it comes to license compatibility issues in general, it is the GPL which is decidedly incompatible with every other license.

    That's FUD if I've ever seen FUD. Check out the FSF's list of free software licenses [gnu.org]; there are many licenses that ARE GPL-compatible. Excluding the GNU licenses themselves, there's at least Apache 2.0, Artistic 2.0, Berkeley DB, Boost, Modified BSD, CeCILL, Clear BSD, Cryptix, eCos 2.0, Educational Community 2.0, Eiffel Forum 2, EU Datagrid, Expat, FreeBSD (!), FreeType, iMatix, Independent JPEG Group, imlib2, Intel Open Source, ISC, NCSA, Netscape Javascript, OpenLDAP, Perl 5, PD, Python 2, Python up to 1.6, Ruby, SGI B 2.0, SML/NJ, Unicode, VIM 6.1+, w3c, webm, WTFPL 2, X11, XFree86 1.1, zlib and Zope 2.

    And keep in mind that these are *licenses*; in reality, most projects won't even bother making up their own licenses. "Decidedly incompatible with every other license". Sheesh!

    some GPL advocates tend to view those who choose a non-GPL license as trying to thwart GNU and/or Linux so they don't have to admit that maybe other licenses have terms and conditions that have their own merit.

    Who are those mysterious "GPL advocates" you mention, then? Also, what does this have to do with a situation where Sun really WAS trying to "thwart GNU and/or Linux", by its own admission?

    Look, the CDDL isn't a bad license per se, and the FSF page linked above lists it as a free software license, too, if a GPL-incompatible one (it does urge you not to use it for that reason, but hey, this *is* the FSF). But the original point was that Sun wanted to make sure that ZFS etc. would not be available on Linux, and they chose/engineered a GPL-incompatible license specifically to ensure that. You're not even contesting that anymore, so why are you still arguing about the whole thing?

    It's a fact. Sun didn't want Linux to get ZFS. Get over it.

  • by TheRaven64 (641858) on Saturday August 28, 2010 @06:58AM (#33402110) Journal

    Comparing hammer to ZFS is also a bit silly. Hammer was developed precisely because ZFS did not solve the problem that DragonflyBSD wanted solved. ZFS is designed for large SANs controlled from a central server. Hammer is designed to allow you to treat every disk on a network as part of the same storage pool. They are diametrically opposed objectives, and a filesystem designed to do both would need to either make painful compromises or have so much variation in code paths that it would effectively be two different filesystems.

    You can do something similar with ZFS in FreeBSD, because ZFS slots into the GEOM system and can use any GEOM provider as the backing store, meaning that you can use remote partitions exported over the network, but you'd need a massive amount of configuration and get a lot of fragility for something that hammer does automatically and reliably. Conversely, hammer has incredibly poor performance on a number of workloads where ZFS does very well and doesn't provide the same level of redundancy on a single machine.

    Btrfs, at the moment, is largely vapourware. It might become something impressive in the future, but for now it is not.

    Either way, porting ZFS to Linux is probably a mistake. The FreeBSD port has some performance issues from the mismatch between the design of the ZFS code and the rest of the kernel, but more importantly it's not as flexible as it could be. ZFS is highly modular. The FreeBSD GEOM stack is also incredibly modular. If you were doing a native ZFS implementation for FreeBSD, you'd rewrite each of the components of ZFS as a separate GEOM module. Instead, the entire ZFS stack is exposed, more or less, as a single GEOM module. A lot of the potential flexibility of ZFS is lost by doing this, but it's done because it's much easier than a complete reimplementation.

  • Re:Good Article (Score:5, Informative)

    by TheRaven64 (641858) on Saturday August 28, 2010 @07:45AM (#33402230) Journal

    My phone runs linux and is not x86 of any shape or register size, nor is my workstation, nor are many other machines I have running linux

    I can't speak for the Linux version, but ZFS on FreeBSD needs x86-64 for three reasons:

    First, and most simply, this is the platform that all of the ZFS developers use, so it is the one that is most tested. This doesn't mean that it won't work elsewhere, it just means that it is not well tested anywhere else.

    The second is a performance consideration. ZFS uses a lot of 64-bit arithmetic for computing checksums and so on. On most 32-bit platforms, doing 64-bit arithmetic means that you need to split the operands between two registers, effectively halving the number of GPRs that you have to work with. On x86-32, this basically limits you to 2 registers, which cripples performance - every operation involves some stack spills. This is an x86-specific limitation. On ARM, for example, you have 16 32-bit registers, which can be viewed as 8 64-bit registers for certain instructions. Doing a lot of 64-bit arithmetic on an ARM chip still doesn't generate as much register pressure as even doing 32-bit operations on x86.
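
To make the register cost concrete, here is a small sketch of a single 64-bit addition done with 32-bit words only. It is not ZFS code; the function names are made up, and Python integers stand in for registers. Each 64-bit operand needs two 32-bit words, so one addition already ties up four of them plus a carry.

```python
MASK32 = 0xFFFFFFFF


def split(x):
    """Split a 64-bit value into (low, high) 32-bit words."""
    return x & MASK32, (x >> 32) & MASK32


def join(lo, hi):
    """Recombine (low, high) 32-bit words into one 64-bit value."""
    return (hi << 32) | lo


def add64_with_32bit_halves(a_lo, a_hi, b_lo, b_hi):
    """Add two 64-bit values held as 32-bit halves, wrapping at 2**64."""
    lo = a_lo + b_lo
    carry = lo >> 32              # 1 if the low words overflowed 32 bits
    lo &= MASK32
    hi = (a_hi + b_hi + carry) & MASK32
    return lo, hi


a, b = 0x00000001FFFFFFFF, 1
total = join(*add64_with_32bit_halves(*split(a), *split(b)))
```

The low words overflow, the carry propagates into the high words, and `total` equals `a + b`. A 64-bit CPU does the same thing in one instruction with two registers.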

    The final limitation is memory. ZFS likes to have 600MB or so of kernel memory. On x86, the divide between kernel and userspace memory is typically done using segmentation. The kernel has one segment, marked in the GDT as requiring ring-0 permission to access. When you switch to kernel space, the segment register points to this entry. In userspace, you use other segments (sometimes just one per process, sometimes one for stack, one for heap, and so on, sometimes one for all processes with some churn between them). With other implementations, this is done at the page level, although that's more expensive. The kernel's memory, however, is always mapped into the userspace process's address space - it just isn't always accessible.

    The reason for this is that x86 lacks sensible TLB controls. If the kernel's address space were not mapped in this way, then every system call would require a TLB flush, which would impact performance. The more address space that you allocate to the kernel, the less you give to userspace apps. If the kernel has 2GB of address space, userland apps can only have 2GB each. On ARM, each TLB entry is tagged with an ASID. The kernel and userspace programs' address spaces are entirely separate, but transitions between the two don't require a TLB flush because the userspace process can't see entries tagged with the kernel's ASID.
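
A toy model of the difference, under loud assumptions: the TLB here has unlimited capacity, the function name is invented, and only two cases are modelled, namely fully separate address spaces with an untagged TLB (flushed on every switch) and separate address spaces with ASID-tagged entries. The x86 approach of mapping the kernel into every process is not modelled.

```python
def count_tlb_misses(accesses, tagged):
    """Count TLB misses for a sequence of (address_space, page) accesses.

    tagged=True:  entries are keyed by (address_space, page) and survive
                  switches between address spaces.
    tagged=False: entries are keyed by page only, so the TLB has to be
                  flushed whenever the address space changes.
    """
    tlb = set()
    misses = 0
    current = None
    for space, page in accesses:
        if not tagged and space != current:
            tlb.clear()           # flush on every address-space switch
        current = space
        key = (space, page) if tagged else page
        if key not in tlb:
            misses += 1
            tlb.add(key)
    return misses


# A process touching one page, making a system call, and repeating.
trace = [("user", 1), ("kernel", 9), ("user", 1), ("kernel", 9)]
untagged_misses = count_tlb_misses(trace, tagged=False)
tagged_misses = count_tlb_misses(trace, tagged=True)
```

In this trace the untagged TLB misses on all four accesses, while the tagged one misses only on the first touch of each page, which is the saving described above.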

    Rather than saying that ZFS requires 64-bit, or requires x86-64, it's more accurate to say that it won't work (well) on x86-32 due to inherent limitations of the platform. That doesn't mean that it won't work well on other 32-bit or 64-bit architectures which are less braindead.
