ZFS filesystem screwed up?
Hi,
I am running S10U3 (all patches applied).
Today I mistakenly extracted a large (4.5 GB) tar archive into my home directory, which is on ZFS. The filesystem ran out of space and tar terminated with the error "Disk quota exceeded" (shouldn't it have been something like "No space left on device"?).
I think the ZFS filesystem got screwed up. Now I am unable to delete any file with rm, because unlink(2) fails with error 49 (EDQUOT).
I can't log in because there is no space left on /home.
I even tried to delete files as root but I still get EDQUOT.
Files can be read though.
I tried zpool scrub (not sure what that does) and it shows no errors.
zpool status shows no errors either.
I am confident that my drive is not faulty.
Restarting the system didn't help either.
I had put all my important stuff on that zfs FS thinking that it would be safe but I never expected that such a problem would ever occur.
What should I do? Any suggestions?
Is zfs completely reliable or are there any known problems?
[1081 byte] By [ephemeraa] at [2007-11-27 8:17:05]

# 1
I think I understand what has happened here, although it's fairly disturbing that ZFS allows it to happen.
ZFS uses atomic operations to update filesystem metadata.
This is implemented as follows: when a directory is updated, a shadow copy of it and of all its parents is created, all the way up to the root "superblock" (which ZFS calls the uberblock).
Then the existing superblock is swapped for the shadow superblock as an atomic operation.
A file deletion is a metadata operation like any other and requires making shadow copies.
So what I think has happened is that the filesystem is so full that it can't find space to make the shadow copies needed for a delete.
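To make the argument above concrete, here is a minimal toy model in Python. It is not ZFS code, and every name in it (ToyCowPool, PoolFull, delete_file, add_device) is made up for illustration. It only assumes what the explanation says: a delete must first allocate shadow copies of the directory and its parents, and the old copies plus the file's data are released only after the swap.

```python
class PoolFull(Exception):
    """Raised when the toy pool has no free block for a new allocation."""


class ToyCowPool:
    """Hypothetical copy-on-write pool that only counts blocks."""

    def __init__(self, total_blocks):
        self.total = total_blocks
        self.used = 0

    def free(self):
        return self.total - self.used

    def write_file(self, blocks):
        if blocks > self.free():
            raise PoolFull("no room for file data")
        self.used += blocks

    def delete_file(self, blocks, depth):
        """Delete a file of `blocks` blocks living `depth` directories deep.

        Assumption of this sketch: one shadow block per directory level
        must be allocated BEFORE anything is released.
        """
        if depth > self.free():
            raise PoolFull("no room for shadow copies, delete refused")
        self.used += depth            # allocate the shadow copies
        self.used -= depth + blocks   # after the swap: drop old copies and file data

    def add_device(self, blocks):
        """Grow the pool, as adding an extra device would."""
        self.total += blocks


if __name__ == "__main__":
    pool = ToyCowPool(total_blocks=100)
    pool.write_file(100)              # the pool is now completely full
    try:
        pool.delete_file(blocks=40, depth=3)
    except PoolFull as exc:
        print("delete failed:", exc)
    pool.add_device(4)                # a few spare blocks are enough
    pool.delete_file(blocks=40, depth=3)
    print("free blocks after delete:", pool.free())
```

In this model a completely full pool refuses the delete even though the delete would free 40 blocks, and a handful of extra blocks is enough to get unstuck.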
You'd think that ZFS would maintain an emergency pool for this sort of circumstance.
So one way out is to add an extra device, even a small one, to the pool.
That will give you enough space to delete.
Of course, since you can never remove a device from a pool, you'll be stuck with it.
You could back up, destroy and recreate the pool, and restore, to get rid of the unwanted device.
If you don't have an extra device to add to the pool, then I guess you can back up, destroy and recreate the filesystem, and restore, making sure to exclude the unwanted data...
You could try asking on the OpenSolaris ZFS forums.
They might have a special technique for dealing with it.
I find it kind of disturbing that Sun allows a filesystem into production that can get itself wedged in this way.
# 2
Have you by any chance created a ZFS snapshot? If so, file deletion can consume more disk space, because a new version of the directory needs to be created to reflect the new state of the namespace. This behavior means that you can get an unexpected EDQUOT when attempting to remove a file.
If so, you may have to zfs destroy the snapshot and then try deleting the file to recover space.
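Why the snapshot has to go first can be sketched with another toy model in Python. Again this is not ZFS code and the names (ToyDataset, used_blocks and so on) are invented for the example. The one assumption is that a block counts as used for as long as the live filesystem or any snapshot still references it.

```python
class ToyDataset:
    """Hypothetical dataset that tracks which blocks each file references."""

    def __init__(self):
        self.live = {}         # file name -> set of block ids
        self.snapshots = {}    # snapshot name -> frozen copy of `live`
        self._next_block = 0

    def write(self, name, nblocks):
        first = self._next_block
        self._next_block += nblocks
        self.live[name] = set(range(first, first + nblocks))

    def snapshot(self, snap):
        # A snapshot keeps referencing every block that is live right now.
        self.snapshots[snap] = {n: set(b) for n, b in self.live.items()}

    def destroy_snapshot(self, snap):
        del self.snapshots[snap]

    def remove(self, name):
        del self.live[name]

    def used_blocks(self):
        # A block stays allocated while anything still points at it.
        referenced = set()
        for files in [self.live, *self.snapshots.values()]:
            for blocks in files.values():
                referenced |= blocks
        return len(referenced)


if __name__ == "__main__":
    ds = ToyDataset()
    ds.write("big.tar", 45)
    ds.snapshot("monday")
    ds.remove("big.tar")
    print("used after rm:", ds.used_blocks())
    ds.destroy_snapshot("monday")
    print("used after destroying the snapshot:", ds.used_blocks())
```

In the model, removing the file frees nothing while the snapshot exists. The space comes back only once the snapshot is destroyed.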
# 3
Robert,
> ZFS uses atomic operations to update filesystem metadata.
> This is implemented as follows: when a directory is updated, a shadow copy of it and of all its parents is created, all the way up to the root "superblock".
> Then the existing superblock is swapped for the shadow superblock as an atomic operation.
> A file deletion is a metadata operation like any other and requires making shadow copies.
> So what I think has happened is that the filesystem is so full that it can't find space to make the shadow copies needed for a delete.
Thanks for the explanation. That is probably what happened, but I would consider it a very weak design if a user can cripple the FS just by filling it up.
> So one way out is to add an extra device, even a small one, to the pool.
> That will give you enough space to delete.
> Of course, since you can never remove a device from a pool, you'll be stuck with it.
I would certainly have liked to do this, but this is just my desktop computer and I have only one hard disk with no spare space.
> You could try asking on the OpenSolaris ZFS forums.
> They might have a special technique for dealing with it.
The guys at the OpenSolaris forums don't like to answer Solaris problems, but I will give it a try anyway.
Thankfully, I lost no data because I had backups and because the damaged ZFS was readable, so the only damage done was a loss of confidence in ZFS.