
I don't really understand the hype around ZFS. What's so great about it?

I've tried it out on OpenSolaris a few times in an attempt to learn why it's so hyped up. The setup process is a bit cryptic and confusing. Even after the initial setup I was really confused about which commands were destructive and which ones were not. Based on that alone I would not consider using ZFS yet. The flexibility of ZFS was a bit lost on me. It seems like there's a ton of limitations and caveats to consider. I can't really comment on performance or reliability since this was a short lab test, but in the last 5 years or so I've never lost an HFS+, EXT3 or NTFS file system, so I'm not sure just how much more reliable ZFS can be. Is all this hype just acronym lust? I feel like a good RAID card combined with a semi-modern FS is still a better solution. It's certainly easier to set up IMO.



The hype is that with ZFS you don't need an expensive RAID card (an extra point of failure) and RAID card config, plus a logical volume manager with its own utilities, plus a filesystem, plus a quota management system. You plug disks in, and ZFS does the RAID, the volume management, the filesystem and the quotas, all in one set of tools (quite few / simple commands for basic usage, at least).
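For instance, a mirrored pool with a quota'd filesystem is roughly this (pool, filesystem and device names are made-up placeholders, Solaris-style):

    zpool create tank mirror c0t0d0 c0t1d0   # RAID + volume management in one step
    zfs create tank/home                     # a new filesystem, no mkfs/fstab dance
    zfs set quota=10G tank/home              # quota is just a property

That's the whole stack: no separate RAID config, LVM layer or quota tool.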

On top of that, ZFS stores all data along with checksums, so when you read a file it will pick up errors and (given redundancy) correct them. How do you know the photos you haven't accessed recently are OK? That they will not be corrupt? You don't - you assume that if the disk shows up and the directory list shows up then they will be there. With ZFS, you set up scheduled scrubs and it verifies all the data is present, readable and correct - and because of this regular check, it picks up hardware failures more quickly.
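Scheduling a scrub is a one-liner plus a cron entry (pool name is a placeholder):

    zpool scrub tank       # walk every block, verify checksums, repair from redundancy
    zpool status -v tank   # shows scrub progress and any CKSUM errors found

Run the scrub line from cron weekly or monthly and you get that "are my old photos still OK?" check for free.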

Also, it has instant (well, small constant-time) snapshots, which double as a time-machine/filesystem-versioning system so you can see files as they were, and you can clone snapshots to take backups or send them over a network to keep a remote copy in sync.
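A sketch of what that looks like day to day (names here are placeholders):

    zfs snapshot tank/home@monday         # instant, takes no space up front
    ls /tank/home/.zfs/snapshot/monday/   # browse the filesystem as it was
    zfs send tank/home@monday | ssh backuphost zfs recv backup/home   # replicate

Incremental sends (zfs send -i) keep the remote side in sync without re-shipping everything.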

Also, the name supposedly stood for "Zettabyte File System" (rumour, at least), because it was designed to take filesystem size limits out of consideration.

Did I mention it writes new data, then updates the pointer to that data in one atomic operation? So as long as your disks/controller do not cheap out and misrepresent what they are doing, data is always in the old or the new state - never half-written.

Also, because there's no RAID controller, in an emergency you can move the disks to another system and attach the zpool there with far fewer requirements - no matching controller hardware needed.
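The moving-disks bit is just export/import (pool name is a placeholder):

    zpool export tank   # on the old box
    zpool import tank   # on the new box, after cabling the disks up
    zpool import        # with no argument: lists importable pools it can see

ZFS finds its own disks by on-disk labels, so cabling order doesn't matter.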

That's pretty awesome - not just acronyms and buzzwords.


ZFS allows RAID-5-style redundancy (RAID-Z) without the "write hole" (google it). That's the killer advantage for me. The rest is just gravy.
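Creating one is a single command (pool and device names are placeholders):

    zpool create tank raidz c0t0d0 c0t1d0 c0t2d0 c0t3d0    # single parity, RAID-5-ish
    zpool create tank raidz2 c0t0d0 c0t1d0 c0t2d0 c0t3d0   # double parity, RAID-6-ish

Because RAID-Z writes full, variable-width stripes copy-on-write, a power cut mid-write can't leave parity inconsistent with data - which is exactly what the write hole is.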

I'm amazed you found set-up cryptic. Once I understood the concepts of ZFS pools and filesystems, I found the toolset among the most elegant and well-designed out there.

I think it might actually be a little daunting in its simplicity, maybe. ("that's it?!")


I think that might be it. I found it a bit hard to visualize what was actually happening with these commands since they are all pretty simple.


Block-level checksumming: http://storagemojo.com/2007/09/19/cerns-data-corruption-rese...

If you have never lost a file system, I'm guessing you haven't had very many.

Also, ZFS gracefully handles cheap disks. RAID cards don't handle the case where a consumer-grade disk retries internally for hours rather than reporting a failure. (I recently tested this on a rather nice 3ware 9550SX. It handled the problem no better than Linux md. Performance was worse on the 3ware, too. I'm sending it back.)

I am personally switching to 'enterprise' (read: 20% more cost for 30% less capacity) SATA just so I get drives that actually fail rather than retrying forever once they've gone bad.

And this is ignoring the 'silent corruption' issue the CERN guys were on about. Block-level checksumming fixes that too.
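You can even pick the checksum algorithm per filesystem; a minimal sketch (names are placeholders):

    zfs set checksum=sha256 tank/photos   # stronger than the default fletcher
    zpool status -v tank                  # the CKSUM column counts corruption caught

Every read verifies the checksum, so corruption is caught the moment you (or a scrub) touch the block.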

Also, cheap snapshots are awesome. LVM snapshots work, and they can save your ass in a pinch, but they are expensive performance-wise: every write to the origin volume forces a copy of the old block into the snapshot first. In my case, where I'm already pushing my disks to their performance limits, LVM snapshots are nearly unusable. ZFS snapshots, I understand, have almost zero performance overhead - ZFS is copy-on-write anyway, so a snapshot just keeps old blocks from being freed. (Space is cheap. Performance, not so much.)
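For comparison, the two operations side by side (volume/pool names are placeholders):

    lvcreate -s -L 10G -n homesnap /dev/vg0/home   # LVM: reserved space, CoW penalty on every origin write
    zfs snapshot tank/home@now                     # ZFS: instant, no reserved space, no extra write path

The LVM snapshot also dies if its reserved space fills up; the ZFS one just grows as the filesystem diverges.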


This has nothing to do with ZFS, but FYI: you can configure consumer-grade Western Digital drives to report failures immediately, rather than retrying, just like their more expensive "enterprise" counterparts. See http://en.wikipedia.org/wiki/Time-Limited_Error_Recovery for details. A bit of Googling will turn up the WDTLER.exe binary that you need; make a bootable FreeDOS CD or USB stick, then run WDTLER from the command line to toggle the feature on or off.

Possibly there are other good reasons to buy the enterprise models, but using the cheaper drives is working out for me. I'm running 6 WD15EADS drives with TLER disabled in a software RAID6 configuration on a GNU/Linux machine, and except for the fact that two of the drives failed within hours of first use (the replacement drives work fine), I've had no problems running the array 24/7 for over a month now.
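In case it's useful, that kind of array is built with something along these lines (device names are placeholders for the six drives):

    mdadm --create /dev/md0 --level=6 --raid-devices=6 /dev/sd[b-g]
    mkfs.ext3 /dev/md0

No checksumming, of course - which is the part ZFS would add on top.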


Yeah, I have 2 WD 'Black' consumer-grade drives fixed with WDTLER, paired with some Seagate ES drives right now. (I mix makes to avoid firmware plagues.) I read that WDTLER only works on 1TB and smaller drives. I will have to find some 1.5TB WD drives and experiment.

As for the relation to ZFS, it is supposed to time out ill-behaving consumer drives like this. In fact, from what I understand, you can configure ZFS's behavior - for example, to fail the first drive that times out, and then hang on the second drive (which would perhaps be a reasonable configuration for a RAID-Z volume).
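I haven't verified this is exactly the knob the parent means, but the pool-level failmode property (which governs behavior on catastrophic pool failure) is the closest thing I know of:

    zpool set failmode=wait tank       # default: block I/O until the fault clears
    zpool set failmode=continue tank   # return errors on new writes but keep reads going

Per-drive timeout tuning, as far as I know, lives in the OS driver layer rather than in ZFS itself.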


WDTLER works on the WD15EADS (1.5TB), as that's what I'm using.



