Mail Store and ZFS
Greetings all:
I have a customer for whom I am doing an email migration. We are migrating to Messaging Server 6.3 as part of the Sun Java Communications Suite 5. I understand that ZFS is supported in this version for the mail store, but my customer has a number of questions that I would like to post here for comments. They are:
1) Filesystem - What filesystem will be used to store the data? Currently we are using a RAID 5 Direct Attached 3.0 TB array. What is the best way to set this up?
a) If ZFS is ready for prime-time we would prefer to use it, given UFS's limitations with 1.0 TB partitions (1 million files per 1.0 TB). If not, what is the best way to set up UFS given that we have about 1500 student accounts and 500 staff/faculty accounts?
b) Should data be on one partition with 2.5TB of space for data and .5TB space for logs? We also have a second 146GB disk on the v490 that could be used for logs if that is large enough and if i/o to the Direct-Attached array is an issue
c) tmpfs appears useful for some files with JCS. Adding more to this in a bit.
2) Quotas - Related to question 1 for how filesystem is broken up if necessary. If using ZFS ignore section on benefits of single-copy email downside if broken into multiple partitions.
a) We would like the largest quota possible while first maintaining critical aspects - system performance, reliability and backups.
b) The average user with a 75MB quota is using approximately 44MB of disk space, for about 2050 users totaling 90GB of data.
c) Is it safe to assume that the same number of users using 500MB quota (assuming 100% usage) will be using 1.026TB of space?
d) Assuming c) is correct, is it safe to assume that giving a 1GB quota will be using 2.052TB?
e) Is 80% of quota a fair estimate of a typical user? Any stats (Gartner, other implementations) to back up a typical use case?
f) If e) is correct, is it safe to give quotas in the typical form
2007-2008 - 1GB
2008-2009 - 1.25 GB
2009-2010 - 1.50 GB
Note: Every Year approximately 400 accounts are deleted on December 31st.
g) Are there any performance issues with assumptions a)-f)?
h) Are there any backup issues with assumptions a)-f)?
i) Are there any other considerations to consider?
j) How hard is it to move a batch of users (say 100) to a new partition on a NAS, SAN, or additional direct-attached users when the filesystem gets full? How long would it take to move 100 1GB accounts? Would it affect system performance in any way? Is this moot with ZFS?
k) Since single-copy efficiency is only used per partition (as stated in Sun's docs), is there a way to determine current single-copy efficiency statistics that can be used if multiple less than 1 Terabyte UFS partitions are necessary?
l) If using about 4 million files with 90GB accounts, is it safe to assume 100 million files for 2TB of data? If using ZFS, is cloning or similar ZFS/JCS backup method viable? If using UFS, is imsbackup, volcopy, or other Solaris imaging viable? Any pitfalls to this approach?
By sheger77a at 2007-11-27 8:13:02

# 1
Hi,
> I have a customer for whom I am doing an email
> migration for. We are migrating to Messaging Server
> 6.3 as part of the Sun Java Communications Suite 5.
> I understand that ZFS is supported in this version
> for the mail store but my customer has a number of
> questions that I would like to post here for
> comments.
You understand incorrectly. ZFS has not been officially certified for use with Messaging Server 6.3; work is being carried out to 'certify' ZFS and SAM-FS for the next release. Other customers are using ZFS and haven't reported significant issues, but that's not to say issues don't exist.
The advice I can offer is if you plan on running ZFS, make sure your system is patched to at least Solaris 10u3 which includes a number of ZFS performance improvements.
Given that Messaging Server relies so heavily on the behavior and speed of the underlying file-system, we need to run a number of tests to make sure that Messaging Server not only works but doesn't hit significant bottlenecks.
> They are:
>
> 1) Filesystem - What filesystem will be used to store
> the data? Currently we are using a RAID 5 Direct
> Attached 3.0 TB array. What is the best way to set
> this up?
That is up to the customer as to what filesystem they run. We support UFS and VxFS on Solaris.
Make sure that your systems have fast write times - this is essential. Also, you wouldn't want to have a single 3TB partition (file-system), as any filesystem corruption would cause the entire store to be unavailable.
> a) If ZFS is ready for prime-time we would prefer
> the use of it given UFS's limitations with 1.0 TB
> partition - 1 million files per 1.0 TB. If not what
> is best way to set up UFS given that we have about
> 1500 student accounts and 500 staff/faculty
> accounts?
Create multiple partitions of a few hundred GB each until ZFS is 'ready-for-prime-time'.
> b) Should data be on one partition with 2.5TB of
> space for data and .5TB space for logs? We also have
> a second 146GB disk on the v490 that could be used
> for logs if that is large enough and if i/o to the
> Direct-Attached array is an issue
If you can put your logs and mailbox database directory on fast direct-attached storage, that can improve performance. But it would depend on what the load of the system is. One v490 to cater for that number of students/staff shouldn't be a problem. I personally ran a single v480 for ~10,000 staff accounts and another for ~40,000 student accounts without them breaking a sweat.
> c) tempfs appears useful for some files with JCS.
> Adding more to this in a bit.
Set store.dbtmpdir to point to /tmp/mail-store/
That will store the database temporary files on a tmpfs filesystem which improves performance.
> 2) Quotas - Related to question 1 for how filesystem
> is broken up if necessary. If using ZFS ignore
> section on benefits of single-copy email downside if
> broken into multiple partitions.
> a) We would like the largest quota possible while
> first maintaining critical aspects - system
> performance, reliability and backups.
Ok, but the most critical aspect is growth and how you intend to deal with growth. The biggest issue with direct-attach is running out of spare slots etc. and expanding file-systems to cater for store growth.
Remember also that filesystems start to perform badly (fragmentation issues) as they reach high utilisation (95%+) so you don't want to get near this value.
> b) The average user with a 75MB quota is using
> approximately 44MB of disk space, for about 2050
> users totaling 90GB of data.
> c) Is it safe to assume that the same number of
> users using 500MB quota (assuming 100% usage) will
> be using 1.026TB of space?
This is very much site-dependent. In my case, for a University staff population, we were massively oversubscribed (amount of possible usage vs. actual usage) simply because a number of accounts got very little use.
One thing I would implement is some kind of default spam filtering so that 'spam' emails are put into a 'Spam' folder which is cleaned up every 30 days or so. That slowed growth tremendously in our student accounts, of which we had over 80,000 and a large number of which students never checked (or used as spam-catches :( )
> d) Assuming c) is correct, is it safe to assume that
> giving a 1GB quota will be using 2.052TB?
Already addressed above.
> e) Is 80% of quota a fair estimate of a typical
> user? Any stats (Gartner, other implementations) to
> back up a typical use case?
My personal experience at a University is that usage was much less than that. We tended to work on the add-more-space-as-required philosophy. So perhaps start with less space but have documented procedures in place on how to increase it should the need eventuate.
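To put rough numbers on the sizing questions, here is a minimal Python sketch of the arithmetic. It assumes usage scales linearly with quota, which in practice it usually does not, so treat the 100% figures as a ceiling rather than a forecast. The function name and the decimal units (1 GB = 1000 MB) are choices made for illustration only; the user count, quotas and 44MB average are the figures quoted in this thread.

```python
def projected_store_gb(users, quota_mb, utilisation):
    """Projected store size in GB (decimal units: 1 GB = 1000 MB).

    utilisation is the fraction of quota actually used (0.0 to 1.0).
    """
    return users * quota_mb * utilisation / 1000.0


if __name__ == "__main__":
    users = 2050
    observed = 44 / 75  # today's average: 44MB used out of a 75MB quota

    print("today            : %.1f GB" % projected_store_gb(users, 75, observed))
    for quota_mb in (500, 1000, 1250, 1500):
        ceiling = projected_store_gb(users, quota_mb, 1.0)
        at_observed = projected_store_gb(users, quota_mb, observed)
        print("quota %4d MB    : ceiling %.1f GB, at today's ratio %.1f GB"
              % (quota_mb, ceiling, at_observed))
```

With the thread's numbers this gives about 90 GB today, and ceilings of about 1.03 TB for a 500MB quota and 2.05 TB for a 1GB quota. Whatever figure you settle on, leave headroom so the filesystem stays well below the high-utilisation range mentioned earlier.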
> f) If e) is correct, is it safe to give quotas in
> the typical form
> 2007-2008 - 1GB
> 2008-2009 - 1.25 GB
> 2009-2010 - 1.50 GB
> Note: Every Year approximately 400 accounts are
> deleted on December 31st.
> g) Are there any performance issues with assumptions
> a)-f)?
Yes. Larger quotas result in larger inboxes, which result in larger imap processes, which in turn require more imap processes (until the 64-bit version is released) and potentially more I/O for backup/imexpire operations etc.
> h) Are there any backup issues with assumptions
> a)-f)?
They will be larger and take longer? I'm not sure what else you mean. Also consider how you are going to restore in a DR scenario; with Legato we found it would take 2 weeks to restore the systems from scratch at a per-file level.
Consider also taking block-level backups if your backup software can handle it.
> i) Are there any other considerations to consider?
> j) How hard is it to move a batch of users (say 100)
> to a new partition on a NAS, SAN, or additional
> direct-attached users when the filesystem gets full?
You can use the mboxutil command to shift users between messaging partitions, which can be mounted on different file-systems/SAN partitions. Make sure you use the 'relinker' command to restore the hard-links broken during the process.
This procedure should of course be tested BEFORE using on real accounts.
> How long would it take to move 100 1GB accounts?
> Would it affect system performance in any way? Is
> this moot with ZFS?
Do it and find out - that's a 'how long is a piece of string' question.
> k) Since single-copy efficiency is only used per
> partition (as stated in Sun's docs) ,is there a way
> to determine current single-copy efficiency
> statistics that can be used if multiple less than 1
> Terabyte UFS partitions are necessary?
Refer to the relinker utility documentation. Single-copy efficiency also depends on the behaviour of your organisation (do you send out bulk emails with one recipient per email or many recipients per email?).
> l) If using about 4 million files with 90GB
> accounts, is it safe to assume 100 million files for
> 2TB of data? If using ZFS, is cloning or similar
> ZFS/JCS backup method viable? If using UFS, is
> imsbackup, volcopy, or other Solaris imaging viable?
> Any pitfalls to this approach?
No idea.
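The file-count part can at least be bounded with simple arithmetic. The sketch below is a straight-line extrapolation in Python and assumes the average file size stays the same as the store grows, which may not hold if larger quotas attract more large attachments. The function name is made up for illustration; the 4 million files and 90GB come from the question.

```python
def projected_files(current_files, current_gb, target_gb):
    """Scale the current file count linearly to a larger store size."""
    return round(current_files * target_gb / current_gb)


if __name__ == "__main__":
    for target_gb in (1000, 2000, 2052):
        print("%5d GB -> about %d files"
              % (target_gb, projected_files(4000000, 90, target_gb)))
```

On those assumptions 2TB works out to roughly 89 million files, so 100 million is a reasonable round figure to plan against.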
Hope this helps.
Shane.
# 2
Shane,
Thank you for the in-depth response to all my customer's questions. I appreciate it.
My understanding of ZFS being supported in 6.3 came from jay_plesset's response in this post: http://forum.java.sun.com/thread.jspa?threadID=5175709
Jay, if you are out there, care to comment?
# 3
I was confused and incorrect. Shane is correct. Official support is due next release. However, some customers are indeed using ZFS, and haven't reported any issues big enough to inspire any warnings to us support folks.
# 4
Hi,
As an aside, I actually did some back-to-back testing to see just what impact ZFS had on overall numbers of IOps and read/write blocks compared to UFS given the fundamental differences in the file-systems.
The results were a bit of a mixed bag. For smaller files (small emails) ZFS had fewer IOps and fewer blocks written, so in general it used larger read/write operations and handled them more efficiently, which is good. For larger files (> 100K say) UFS had less overall data written but still used small read/write operations, so ZFS had more 'overhead'.
Backups were affected as a result, with ZFS having an additional 10% overhead in the amount of data read off disk compared to UFS but nearly 50% fewer read operations, so it may be that performance is better (disks prefer fewer, larger reads).
How this works out in real life is going to depend on a lot of factors - so I suggest you benchmark against a ZFS and a UFS partition with the same simulated load.
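As a starting point for that kind of comparison, here is a minimal Python sketch that times synced writes of many small files into a directory. It is not a substitute for a proper simulated mail load; it only gives a quick first impression of small-file write latency on each filesystem. The function name, file naming and default sizes are assumptions made for this example, and random payloads are used so that filesystem compression, if enabled, does not flatter the result.

```python
import os
import sys
import time


def time_small_file_writes(directory, count=1000, size=4096):
    """Write `count` files of `size` random bytes, fsync each one,
    remove them again, and return the seconds spent writing."""
    payload = os.urandom(size)
    paths = [os.path.join(directory, "bench%06d.tmp" % i) for i in range(count)]

    start = time.perf_counter()
    for path in paths:
        with open(path, "wb") as fh:
            fh.write(payload)
            fh.flush()
            os.fsync(fh.fileno())  # force the data to stable storage
    elapsed = time.perf_counter() - start

    for path in paths:
        os.remove(path)
    return elapsed


if __name__ == "__main__":
    # Pass one directory per filesystem to compare, e.g. a UFS and a ZFS mount.
    for directory in sys.argv[1:]:
        seconds = time_small_file_writes(directory)
        print("%s: %.2f s for 1000 x 4096-byte synced writes" % (directory, seconds))
```

Run it several times against a directory on each filesystem, on otherwise idle disks, and vary the file size to cover both small and large messages.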
One thing you definitely want to do, though, is to use LMTP from your MTA front-end to your store back-end. This will save considerably on the amount of data written to disk and also on the number of LDAP lookups.
Regards,
Shane.
# 5
When you say that ZFS will be supported in the "next release", do you mean the next core patch to 6.3, or the release of 6.4 (which will be approximately when?)
We're interested in investigating the possibility of using ZFS file system compression to reduce disk space. We saw awesome results with using ZFS fs compression in our log repository (60% space reduction, no performance loss.) We're estimating that we could get a 25-35% space reduction on the email stores, but we don't know what effect this would have on performance and reliability. Do you have input on this idea?
How does using LMTP reduce disk usage? Do you mean that it eliminates the need for MTA queues on the stores? But that is transient data and doesn't add up to much. Does LMTP actually reduce disk usage on the store message repository? Does it have something to do with message linking?
# 6
Just wanted to add a little something about ZFS. I am not using it on my email server, but in a limited way on another system. Make sure you have lots of memory. ZFS will use it all. It is supposed to release it, and it does, but not fast enough for some applications. I am occasionally getting out-of-memory errors on some of my applications the first time they are run, and then they will run the second time.
# 7
Hi,
> When you say that ZFS will be supported in the "next
> release", do you mean the next core patch to 6.3, or
> the release of 6.4 (which will be approximately
> when?)
Support is slated for the next full release (namely 6.4 or it could be 7.0.. who knows) which is not going to be before the start of next year.
> We're interested in investigating the possibility of
> using ZFS file system compression to reduce disk
> space. We saw awesome results with using ZFS fs
> compression in our log repository (60% space
> reduction, no performance loss.) We're estimating
> that we could get a 25-35% space reduction on the
> email stores, but we don't know what effect this
> would have on performance and reliability. Do you
> have input on this idea?
I would say that compression is naturally going to have an impact on CPU usage (to compress/decompress) and may also impact the responsiveness of the file-system, which can reduce the throughput of Messaging Server.
What I would suggest is that you consider separating off the smaller folder index files from your message files - and just compress the message file partitions. The store.* index files require fast-as-possible access.
You can use the following parameters to set the location of the store.* files separately from the *.msg files:
store.partition.*.messagepath
store.partition.*.path
At the end of the day if you want to go down this path.. please load-test and compare.
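Before load-testing, you can get a rough idea of whether the 25-35% estimate is plausible by measuring how well a sample of real message files compresses. The Python sketch below uses zlib purely as a stand-in: ZFS compresses block by block with its own algorithms, so the result is indicative only. The function name and the sample path in the usage section are hypothetical.

```python
import glob
import sys
import zlib


def compression_saving(paths, level=6):
    """Return the fraction of space saved (0.0 to 1.0) when each file
    in `paths` is compressed on its own with zlib."""
    raw = 0
    packed = 0
    for path in paths:
        with open(path, "rb") as fh:
            data = fh.read()
        raw += len(data)
        packed += len(zlib.compress(data, level))
    if raw == 0:
        return 0.0
    return 1.0 - packed / raw


if __name__ == "__main__":
    # Usage (hypothetical sample directory): python saving.py '/sample/*.msg'
    files = glob.glob(sys.argv[1])
    print("%d files, estimated saving %.0f%%"
          % (len(files), 100 * compression_saving(files)))
```

Run it against a copy of a representative set of mailboxes rather than the live store, since mail that is mostly already-compressed attachments will show a much smaller saving than plain-text mail.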
> How does using LMTP reduce disk usage? Do you mean
> that it eliminates the need for MTA queues on the
> stores? But that is transient data and doesn't add
> up to much. Does LMTP actually reduce disk usage on
> the store message repository? Does it have something
> to do with message linking?
Not overall usage, but data written to disk (I/O). With SMTP you need to queue the email to stable storage first and then eventually relocate that queued email across to the user's partition.
LMTP removes the requirement for the relocation since the email is delivered directly to the user's partition, and hence reduces I/O since you don't need to double-write the same data.
Regards,
Shane.
# 8
This is a good discussion since I am looking at the same issue (on a smaller scale).
What sort of ZFS related mail store issues have you been seeing?
Saying 'any issues big enough' makes me confident on one hand but nervous on the other hand as that implies that there are some issues, just not major issues.
Would this issue (ZFS and Msg Server) be part of why Cluster 3.2 support is not officially there yet for JCS 5 (and is this also coming in the next release of Msg Server rather than a point/core release/patch)?
# 9
> This is a good discussion since I am looking at the
> same issue (on a smaller scale).
>
> What sort of ZFS related mail store issues have you
> been seeing?
I've personally heard of none, nor have I seen any myself.
>
> Saying 'any issues big enough' makes me confident on
> one hand but nervous on the other hand as that
> implies that there are some issues, just not major
> issues.
Basically, if there are issues, we don't know of them. What it really means is that if you run across any issues, you will be working in "unsupported" territory, and our Support folks are likely to tell you that. We cannot accept bugs filed against 6.3 for any problems found with ZFS.
>
> Would this issue (ZFS and Msg Server) be part of why
> Cluster 3.2 support is not officially there yet for
> JCS 5 (and is this also coming in the next release of
> Msg Server rather than a point/core release/patch?
Official support means we need to do full QA with it. JCS5 didn't get that with Cluster 3.2, so it's not supported. Ditto ZFS.
# 10
Hi,
> > This is a good discussion since I am looking at the
> > same issue (on a smaller scale).
> >
> > What sort of ZFS related mail store issues have you
> > been seeing?
>
> I've personally heard of none, nor have I seen any
> myself.
Neither have I - although I know of general ZFS performance issues which are being addressed with each release of Solaris 10.
It breaks down to.. it's new and untried (for very large messaging sites at least) so we (Sun) need to step carefully. The following document has some interesting information on ZFS though:
http://www.sun.com/emrkt/campaign_docs/expertexchange/knowledge/solaris_zfs_perf.html#10
> > Saying 'any issues big enough' makes me confident on
> > one hand but nervous on the other hand as that
> > implies that there are some issues, just not major
> > issues.
>
> Basically, if there are issues, we don't know of
> them. What it really means is that if you run across
> any issues, you will be working in "unsupported"
> territory, and our Support folks are likely to tell
> you that. . .We cannot accept bugs filed against
> 6.3 for any problems found with ZFS.
... doesn't mean we won't log a bug and look at fixing it, but if resolving the bug means rewriting fundamental components of an existing release then Support folks will push back.
> >
> > Would this issue (ZFS and Msg Server) be part of why
> > Cluster 3.2 support is not officially there yet for
> > JCS 5 (and is this also coming in the next release of
> > Msg Server rather than a point/core release/patch?
>
> Official support means we need to do full QA with it.
> JCS5 didn't get that with Cluster 3.2, so it's not
> supported. Ditto ZFS.
Ditto IE7 - which wasn't released until way after the beta program had completed etc.
Regards,
Shane.
# 11
I understand and appreciate both your and Jay's answers on this.
Does something like Cluster 3.2 certification generally happen with a major release (i.e. JCS 6 or whatever the next one will be called, 5.1, etc) or as a point release type of deal?
I would imagine that the answer would be 'it depends' where 'depends' is how different product revision X is from Y, etc and how many installations it would impact.
# 12
Hi,
> Does something like Cluster 3.2 certification
> generally happen with a major release (i.e. JCS 6 or
> whatever the next one will be called, 5.1, etc) or as
> a point release type of deal?
Major releases usually. Although this all depends on the kind of pressures that customers apply; the more customers ask for it the quicker it happens.
> I would imagine that the answer would be 'it depends'
> where 'depends' is how different product revision X
> is from Y, etc and how many installations it would
> impact.
Usually depends on how much time the various groups/people have to test and fix any bugs that come up - time being a function of priority :)
Regards,
Shane.