Filesystem configuration for large mail store
Hello all,
Are there any guidelines for setting up a filesystem for a mail store?
We are building a hardware upgrade for an existing mail store server. The new mail store is Messenger 6.2 on two Solaris 9 clustered servers. The system has 50K users with plans to increase to 200K. Each store has a 1 TB SAN disk for mail. Each UFS filesystem can be expanded to 5 TB as the number of users grows.
On our initial attempt to migrate the mail store, we created tar files of the old store and loaded them onto the new store. The process failed after 30% of the store was loaded due to running out of inodes.
After some investigation, I discovered that for UFS filesystems of 1 TB or larger the nbpi (number of bytes per inode) is 1 MB by default. This nbpi means that the average file is expected to be 1 MB, so a 1 TB filesystem can hold about 1 M of them. Our mail store has over 4 M files, so running out of inodes makes sense (now).
The man page for newfs says that you cannot set nbpi less than 1MB for multi-terabyte filesystems. For filesystems 3 GB to 1 TB, the default nbpi is much lower, 8192. Obviously, for a mail store, there will be a huge number of very small files, not 1MB each.
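The inode arithmetic above can be checked with a short sketch. It assumes the simple model of one inode per nbpi bytes (real newfs results will differ a little because of cylinder-group rounding); the function names are made up for illustration, and the sizes and file count are the ones quoted in this post.

```python
# Rough inode estimate, assuming UFS allocates about one inode per nbpi bytes.
# This is an approximation: actual newfs output varies with cylinder-group layout.
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB
TIB = 1024 * GIB


def estimated_inodes(fs_bytes, nbpi):
    """Approximate inode count for a filesystem of fs_bytes at the given nbpi."""
    return fs_bytes // nbpi


def store_fits(file_count, fs_bytes, nbpi):
    """True if file_count files fit within the estimated inode budget."""
    return file_count <= estimated_inodes(fs_bytes, nbpi)


if __name__ == "__main__":
    files_in_store = 4_000_000  # "over 4 M files" from the post
    cases = [
        ("1 TB at nbpi 1 MB", 1 * TIB, 1 * MIB),
        ("5 TB at nbpi 1 MB", 5 * TIB, 1 * MIB),
        ("900 GB at nbpi 8192", 900 * GIB, 8192),
    ]
    for label, size, nbpi in cases:
        inodes = estimated_inodes(size, nbpi)
        verdict = "fits" if store_fits(files_in_store, size, nbpi) else "too few inodes"
        print(f"{label}: ~{inodes:,} inodes, {verdict}")
```

Under this model the 1 TB filesystem at the 1 MB default gets roughly a million inodes, which is consistent with the load failing partway through a store of more than 4 M files.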
Sorry for the long-winded prelude, but I'm confused about how to get a mail store - that by nature has many small files - onto a large filesystem that seems to be oriented for large files.
What is the conventional wisdom for setting up a large, scalable (> 1 TB) filesystem for messaging?
Is it better to use multiple small (< 1 TB) partitions, or to start with a 900 GB partition (so that nbpi can be forced to 8192) and then grow it to 5 TB?
Is the man page wrong and I CAN set the nbpi lower than 1MB for a multi-terabyte filesystem?
Will having lots of small partitions affect the cluster performance in case of failover?
How are others solving this situation?
Regards
[1905 byte] By [bjatko] at [2007-11-26 9:34:03]

# 1
While I don't claim to be the ultimate guru when it comes to filesystems, many of our larger users use things like Veritas file system, and such.
You can also use newfs to create more inodes: not by manipulating the file size, but directly, if I remember correctly.
As for multiple partitions, we do suggest that for safety, and convenience in doing backups.
A single very large partition can take a long time to run imsbackup on. Ten smaller partitions can be backed up in parallel, which is much faster.
Also, if you lose a partition (no filesystem is perfect) and it's all in one big one, you're looking at all your users being down. If you have 10 partitions, then 1/10 of your users are down.
# 2
Hi,
> While I don't claim to be the ultimate guru when it
> comes to filesystems, many of our larger users use
> things like Veritas file system, and such.
Veritas was essential prior to UFS gaining logging capability. Performance-wise they are now pretty much on par, but VxFS is a bit more flexible. From a troubleshooting point of view, having VxFS can add an extra level of complexity, so it's swings and roundabouts on this one. As was noted, UFS hits inode issues which are less of a concern with VxFS, but VxFS also costs extra to license.
> Also, if you lose a partition (no filesystem is
> perfect), if it's all in one big one, you're looking
> at all your users being down. If you have 10
> partitions, then 1/10 of your users are down.
Yes. This is your number-one reason for not wanting overly large partitions (from both a number-of-files and a total-size perspective). I have been directly involved in exactly what Jay describes: the corruption of a partition's filesystem (VxFS), which had to be restored from backup. Fortunately, as we had a number of partitions (16), only a small proportion of the user base was affected. Messaging server operates quite happily with a missing partition.
Also, you need to think of worst-case scenarios. If you need to do a full fsck of a filesystem, how long is that going to take for a 5 TB filesystem with millions of files? The system is potentially going to be offline whilst this takes place.
Increasing the number of partitions also has its downsides: it decreases the amount of 'single-copy' capability of messaging server. When an identical email is delivered to many recipients on the same messaging server, the software will keep one copy on disk and hard-link the other copies, reducing storage usage overall. Since hard links cannot go across filesystem partitions, more partitions reduce the savings you get from this technique.
Shane.
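
To illustrate the single-copy point, here is a minimal sketch of delivering one message to several mailboxes with hard links. It is not how messaging server implements it; the function name, directory layout, and file name are all hypothetical, and the fallback to a full copy on a cross-filesystem link (errno EXDEV) is an assumption made for the example.

```python
import errno
import os
import tempfile


def deliver_single_copy(message, mailbox_dirs, name):
    """Write message once, then hard-link it into the remaining mailboxes.

    If a mailbox is on a different filesystem, os.link fails with EXDEV and
    we store a separate full copy there instead (illustrative fallback).
    """
    first = os.path.join(mailbox_dirs[0], name)
    with open(first, "wb") as fh:
        fh.write(message)
    paths = [first]
    for mailbox in mailbox_dirs[1:]:
        target = os.path.join(mailbox, name)
        try:
            os.link(first, target)  # same inode, no extra data blocks
        except OSError as exc:
            if exc.errno != errno.EXDEV:
                raise
            with open(target, "wb") as fh:  # different partition: full copy
                fh.write(message)
        paths.append(target)
    return paths


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    boxes = [os.path.join(root, user) for user in ("alice", "bob", "carol")]
    for box in boxes:
        os.mkdir(box)
    for path in deliver_single_copy(b"Subject: hi\n\nbody\n", boxes, "1.msg"):
        info = os.stat(path)
        print(path, "inode", info.st_ino, "links", info.st_nlink)
```

When all mailboxes sit on one filesystem, every path reports the same inode and the message body is stored once. Each extra partition that holds a recipient costs one more full copy, which is the trade-off described above.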