CPU-bound during backups with imsbackup?
I was wondering if anyone else that's running imsbackup with
Networker (using the imsasm that comes with JES) is totally CPU
bound? Our store server is a 1280 with 8 cpus and 16GB of memory.
We're doing full backups of around 580GB and it's taking us 8-10 hours.
Here's what sar looks like during our backup window (starting at 17:30):
00:00:01  %usr  %sys  %wio  %idle
17:00:00     9    16     5     69
17:20:01     9    15     6     70
17:40:00    14    35     8     42
18:00:01    28    71     1      0
18:20:00    28    66     3      3
18:40:00    28    69     2      1
19:00:00    28    68     3      1
19:20:00    27    67     4      2
19:40:00    28    69     2      1
20:00:01    28    68     3      2
20:20:01    28    68     2      1
20:40:01    28    69     2      1
21:00:00    28    68     2      1
21:20:00    29    68     2      1
21:40:00    29    68     2      1
22:00:00    29    70     1      1
22:20:00    28    69     2      1
22:40:00    29    69     1      1
23:00:00    28    69     2      1
23:20:00    23    67     6      4
23:40:01    28    69     2      1
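To put a number on "CPU bound", here is a minimal sketch that averages the sar samples above over the backup window. It takes each sample as time, %usr, %sys, %wio, %idle, and it assumes 18:00 is the first sample that falls entirely inside the window (the backup starts at 17:30 and the samples are 20 minutes apart). The function names are made up for illustration.

```python
# sar samples from the post, read as: time %usr %sys %wio %idle
SAR_SAMPLES = """\
17:00:00 9 16 5 69
17:20:01 9 15 6 70
17:40:00 14 35 8 42
18:00:01 28 71 1 0
18:20:00 28 66 3 3
18:40:00 28 69 2 1
19:00:00 28 68 3 1
19:20:00 27 67 4 2
19:40:00 28 69 2 1
20:00:01 28 68 3 2
20:20:01 28 68 2 1
20:40:01 28 69 2 1
21:00:00 28 68 2 1
21:20:00 29 68 2 1
21:40:00 29 68 2 1
22:00:00 29 70 1 1
22:20:00 28 69 2 1
22:40:00 29 69 1 1
23:00:00 28 69 2 1
23:20:00 23 67 6 4
23:40:01 28 69 2 1
"""


def parse_sar(text):
    """Return a list of (time, usr, sys, wio, idle) tuples."""
    rows = []
    for line in text.splitlines():
        stamp, *values = line.split()
        rows.append((stamp, *map(int, values)))
    return rows


def window_means(rows, start):
    """Average each percentage column over samples at or after `start`."""
    picked = [r for r in rows if r[0] >= start]  # HH:MM:SS compares lexically
    return [sum(r[i] for r in picked) / len(picked) for i in range(1, 5)]


# Assumption: 18:00 is the first sample fully inside the backup window.
usr, sys_pct, wio, idle = window_means(parse_sar(SAR_SAMPLES), "18:00:00")
print(f"usr {usr:.1f}%  sys {sys_pct:.1f}%  wio {wio:.1f}%  idle {idle:.1f}%")
```

On these samples the window averages out to roughly 68% system time against about 2% wait I/O, which is why the kernel rather than the disks looks like the place to dig.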
And our version (yeah, we've been too busy to keep current):
Sun Java(tm) System Messaging Server 6.2-3.04 (built Jul 15 2005)
libimta.so 6.2-3.04 (built 01:43:03, Jul 15 2005)
SunOS ash.usg.tufts.edu 5.9 Generic_118558-28 sun4u sparc SUNW,Netra-T12
[1120 byte] By [jwasilkoa] at [2007-11-27 0:08:59]

# 1
You're sure you're not disk bound, too? That's typically the limit on backups rather than cpu. Do you have an idea what process is using all the cpu time?
# 2
It looks like imsbackup is the cpu hog. We're not even close to being disk bound (there's very little waitio time). We're only seeing about 1100-1200 ops/second peak/aggregate to 98 disks.
The current theory is that we're seeing performance problems with the Sun Cluster global filesystem. I'm going to see if I can switch the mail spool partitions to not be globally mounted and see how that looks.
One of the interesting things I noticed was that munmap() in imsbackup seems to be responsible for the bulk of the kernel time:
syscall               seconds   calls  errors
_exit                    .000       1
read                     .000       3      1
write                    .449    5691
open                     .738    1059
close                    .041    1069
unlink                   .000       1
time                     .000       2
lseek                    .000       6      2
fstat                    .042    1059
fdsync                   .008       1
fcntl                    .000       2
lwp_park                 .156    3306
lwp_unpark               .103    3306
poll                     .000       1
sigprocmask              .000       3
mmap                     .432    1059
munmap                  1.236    1071
yield                    .004     108
lwp_exit                 .000       1
lwp_wait                 .000       1
lwp_mutex_wakeup         .000      16
lwp_mutex_lock           .000      17
llseek                   .000       1
stat64                   .000       1
open64                   .001       1
                         ----     ---    ---
sys totals:             3.218   17786      3
usr time:                .572
elapsed:               35.530
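A quick way to see how much munmap() stands out is to rank the syscalls in the summary above by their share of the reported system time. This is only a sketch: the list holds the entries with non-zero time as I read them from the summary, the 3.218 total is the "sys totals" figure, and the helper name is invented for the example.

```python
# (syscall, seconds, calls) from the summary above; entries reported
# at .000 seconds are left out.
SYSCALLS = [
    ("write", 0.449, 5691),
    ("open", 0.738, 1059),
    ("close", 0.041, 1069),
    ("fstat", 0.042, 1059),
    ("fdsync", 0.008, 1),
    ("lwp_park", 0.156, 3306),
    ("lwp_unpark", 0.103, 3306),
    ("mmap", 0.432, 1059),
    ("munmap", 1.236, 1071),
    ("yield", 0.004, 108),
    ("open64", 0.001, 1),
]
REPORTED_SYS_TOTAL = 3.218  # "sys totals" seconds from the summary


def rank_by_time(entries, total):
    """Sort by seconds, adding percent of total and milliseconds per call."""
    ranked = sorted(entries, key=lambda e: e[1], reverse=True)
    return [
        (name, secs, secs / total * 100, secs / calls * 1000)
        for name, secs, calls in ranked
    ]


ranked = rank_by_time(SYSCALLS, REPORTED_SYS_TOTAL)
for name, secs, share, ms_per_call in ranked[:4]:
    print(f"{name:<8} {secs:6.3f}s  {share:5.1f}%  {ms_per_call:.3f} ms/call")
```

munmap comes out on top at a bit over a third of the system time, and each munmap call costs nearly three times as much as the matching mmap, even though the two are called about the same number of times.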
Message was edited by:
jwasilko
# 3
Hi,
Probably an obvious question, but have you set store.dbtmpdir to be a memory-mapped file-system (e.g. /tmp/msg-<your mailstore hostname>)? This will improve the backup speed/throughput.
How many imsbackup processes are you running at once? Are you using backup-groups to parallel them up?
As Jay noted, I/O issues tend to be the biggest issue, although Sun Cluster global filesystem may be a factor. Has this problem been getting worse over time or is this just the first time you noticed it?
What you have described is 'out-of-the-ordinary' so getting an idea of your environment would help.
If you are in a SAN environment, you may also want to consider OS level drivers/HBA drivers as well - have you patched these recently?
On a side note, I would recommend patching up to -63 if for no other reason than to prevent the following bug:
6441637 imsrestore crashes when many IMAP flags are set on message
This has caught a number of customers out, especially when they are in a hurry to restore an account and imsrestore keeps crashing on that one account.
Regards,
Shane.
# 4
Hi,
If you are using the global filesystem... are you using StoragePlus with affinity on true in the resource group of the messaging store?
I wonder why you have used the global filesystem at all... for the store...
Sometimes this will result in the use of the private interconnect to access the store filesystem from the active node -- if the disk resource is active on the other node and you are running imsbackup from the other node.
Thanks
# 5
> Hi,
Hi Shane! Thanks for looking at my question...
> Probably an obvious question, but have you set
> store.dbtmpdir to be a memory-mapped file-system
> (e.g. /tmp/msg-<your mailstore hostname>)? This will
> improve the backup speed/throughput.
Yup, we've done that.
> How many imsbackup processes are you running at once?
> Are you using backup-groups to parallel them up?
We're doing 19 at the same time, more or less. We increased parallelism until we started to see backups impact the user experience and then backed off a bit.
We are using backup groups.
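For a sense of scale, the numbers in this thread (580GB of full backups, 8-10 hours, 19 imsbackup processes) can be turned into a rough throughput figure. This is back-of-the-envelope arithmetic only: it assumes decimal units (1GB = 1000MB) and that the 19 processes share the load evenly, neither of which is stated in the thread.

```python
TOTAL_GB = 580   # size of the full backup, from the first post
PARALLEL = 19    # concurrent imsbackup processes, "more or less"


def mb_per_sec(total_gb, hours):
    """Aggregate throughput in MB/s, assuming 1 GB = 1000 MB."""
    return total_gb * 1000 / (hours * 3600)


for hours in (8, 10):
    aggregate = mb_per_sec(TOTAL_GB, hours)
    per_process = aggregate / PARALLEL  # assumes an even split
    print(f"{hours:>2}h: {aggregate:.1f} MB/s total, "
          f"{per_process:.2f} MB/s per imsbackup")
```

That works out to somewhere between 16 and 20 MB/s in aggregate, or around 1 MB/s per process, which is a modest rate to spread across 98 disks.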
> As Jay noted, I/O issues tend to be the biggest
> issue, although Sun Cluster global filesystem may be
> a factor. Has this problem been getting worse over
> time or is this just the first time you noticed it?
We've always had this problem since we switched to the cluster. I'm starting to look at it now since we've increased quota a lot since we moved to new hardware and it's getting to the point where our backups are taking more than 12 hours.
I'm with you in thinking that the global filesystem is probably the culprit. No one else I've talked to has seen similar performance problems with imsbackup.
> What you have described is 'out-of-the-ordinary' so
> getting an idea of your environment would help.
Let me know what else might be helpful.
> If you are in a SAN environment, you may also want to
> consider OS level drivers/HBA drivers as well - have
> you patched these recently?
Heh. We've been up to our neck in SAN issues (we had a Sun 6320 that went bad and took us down for 16 hours and then got worse). We're current on the SAN Foundation Kit.
> On a side note, I would recommend patching up to -63
> if for no other reason than to prevent the following
> bug:
>
> 6441637 imsrestore crashes when many IMAP flags are
> set on message
>
> This has caught a number of customers out, especially
> when they are in a hurry to restore an account and
> imsrestore keeps crashing on that one account.
Thanks for the heads up about that....
-jeff
# 6
> Hi,
> If you are using the global filesystem... are you using
> StoragePlus with affinity on true in the resource
> group of the messaging store?
We do have AffinityOn=TRUE.
> I wonder why you have used the global filesystem at
> all...for the store...
> Sometimes this will result in the use of the private
> interconnect to access the store filesystem from the
> active node -- if the disk resource is active on the
> other node and you are running imsbackup from the
> other node.
We do the backups via an IP address that fails over with the cluster, so we're always backing it up via the node that owns the disks.
As far as why we have everything set up as global filesystems, well, that's a bit of a long story. We had a consulting company set up the cluster and during their integration work they discovered that the cluster failover scripts needed to be able to access some of the filesystems on the inactive node to 'pre-check' the environment before the failover started. They said that meant we needed to have the filesystems mounted globally.
My background is with Veritas Cluster, so I'm still somewhat new to the Sun Cluster stuff. My goal is to change the store filesystems to be failover filesystems, but I've got no idea what's involved with that. Hopefully next week I can dive into it.
I just checked 'top' on the inactive node on the cluster while backups are running. It's totally idle, but it's got a load average of 1.5 and is spending 15-25% of the time in the kernel:
last pid: 26261;  load averages:  1.41,  1.30,  1.13    22:10:34
72 processes: 71 sleeping, 1 on cpu
CPU states: 83.7% idle,  0.0% user, 15.2% kernel,  1.1% iowait,  0.0% swap
Memory: 16G real, 14G free, 547M swap in use, 20G swap free
  PID USERNAME LWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
26237 root       1  49    0 3328K 2256K cpu/8    0:01  0.33% top
    1 root       1  59    0 1712K  992K sleep   31:54  0.02% init
 2257 root       1  59    0   88M   86M sleep   11:29  0.02% se.sparcv9
15403 jeffw      1  59    0   10M 2784K sleep    0:00  0.02% sshd
 1827 root      16 100  -20   34M 2992K sleep    0:26  0.01% rpc.pmfd
 2037 root       7  59    0 5752K 4888K sleep   17:59  0.00% mibiisa
 1807 nagios     1  59    0 3288K 1432K sleep    3:43  0.00% nrpe
 2229 noaccess  24  59    0  120M   48M sleep    3:04  0.00% java
  353 root       1  59    0 3784K 2104K sleep    2:52  0.00% sshd
  131 root       6  59    0 6168K 5408K sleep    0:52  0.00% picld
 1855 root       4  59    0 6432K 3848K sleep    0:40  0.00% cl_eventlogd
  352 root      20  59    0 3328K 2832K sleep    0:27  0.00% nscd
   64 root      16  59    0 5264K 3312K sleep    0:22  0.00% syseventd
 1665 noaccess  26  29   10  108M   47M sleep    0:15  0.00% java
 1869 root       1  59    0 2304K 1352K sleep    0:09  0.00% in.mpathd
This really seems to point back to the global filesystem getting in the way.
# 7
Hi,
=======================================================
As far as why we have everything set up as global filesystems, well, that's a bit of a long story. We had a consulting company set up the cluster and during their integration work they discovered that the cluster failover scripts needed to be able to access some of the filesystems on the inactive node to 'pre-check' the environment before the failover started. They said that meant we needed to have the filesystems mounted globally.
=======================================================
I have installed Sun Cluster (3.1) with messaging (iMS 5.2 as well as SJMS) many times. I have always used failover filesystems and not global.
If we follow the cluster portion of the admin guides in the messaging docs, there is no problem. It is confusing, though, and a bit odd: Sun has somehow switched its recommendations for messaging clusters between own and shared binaries per node.
I assume you are using the two-binary approach (for the 2 nodes of the cluster), as Sun recommends that for SJES. In that case there is no need for a global filesystem as long as you install messaging twice, once on each node.
You have to switch the diskgroups and other resources (like the IP) before you start installing on the other node. The configuration script should then point to the failover filesystem on the first node, and this config needs to be copied over to the next node. The second node then has to be configured with the special command for that; this is documented.
So there is never any need to use a global filesystem for SJES.
For a test, if possible, you can stop the messaging resource group. Start the IP and filesystem resources but not messaging (or manually ifconfig the virtual IP and mount the disk group). Then you can start messaging manually again ( start-msg ha ).
After all this, your messaging is running through a direct filesystem, so check whether your imsbackup runs fine now.
Thanks
Thanks.
# 8
Another thought. .. For Solaris 10, "TOP" always reports i/o wait at zero . . . .
# 9
> Another thought. ..
>
> For Solaris 10, "TOP" always reports i/o wait at zero
We're still on Solaris 9....
# 10
Could be the global filesystem thing, then. . .
# 11
> Could be the global filesystem thing, then. . .
Yeah, that's my guess. I've made the change to failover filesystems on our test and staging clusters. Hopefully during our Sunday maint window, I'll make the change there and I'll report back Monday with good news :-)
# 12
A very late followup. Moving from the global filesystem to a failover filesystem cut our CPU usage by 50% or more.