store server status....failed
Hi, I recently had a JMS failure: it just stopped working, and I noticed my /var directory was maxed out. I found huge log/stats files under /var/opt/SUNWam/stats/, so I moved these to a filesystem with more space so I can examine them later [does anybody know what they monitor and how to configure the monitoring intervals etc.?]. I restarted the system and then started directory server, and I noticed entries were being made to these files [so I think they are ok].
The big problem now is that Messenger's store server status check fails, so it will not start at all. This is what I get:
# /opt/SUNWmsgsr/sbin/start-msg
Connecting to watcher ...
Launching watcher ...
Starting ens server ... 1627
Starting store server .... 1628
checking store server status ............ failed
So I got no email :^(
Does anybody have an idea what might be the problem? Is the store corrupted? How can I fix this?
Thanks in advance.
-James
[979 byte] By [bdajames] at [2007-11-26 11:04:31]

# 2
It would help to have more data to go on.
If you were restarting the server (or watcher was doing so) and stored found a buildup of transaction logs, it would have gone into "recover mode" while processing those logs.
During recovery, the server won't start the other processes, and will give that error message.
Check whether stored is running.
Check the store.pid file for "initializing" or some other state.
Do an "ls" of the mboxlist directory, and note the number of "log*" files.
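
The three checks above can be sketched as small shell helpers. This is only a rough sketch: the function names are made up for illustration, a POSIX-style shell (ksh, bash) is assumed, and the paths are passed in as arguments because the exact location of store.pid is not confirmed in this thread.

```shell
# Hypothetical helpers; names and arguments are illustrative only.

# Count the "log.*" transaction log files in an mboxlist directory.
count_txn_logs() {
    # ls -d avoids descending into anything that happens to be a directory;
    # tr strips the padding some wc implementations add.
    ls -d "$1"/log.* 2>/dev/null | wc -l | tr -d ' '
}

# Print the contents of a pid/state file, or a note if it is missing.
store_state() {
    if [ -f "$1" ]; then
        cat "$1"
    else
        echo "no pid file"
    fi
}

# Succeed if a process named exactly "stored" shows up in the process list.
stored_running() {
    ps -e | grep -w stored >/dev/null 2>&1
}
```

For example, `count_txn_logs /var/opt/SUNWmsgsr/store/mboxlist` would print the number of log files, and `store_state` can be pointed at wherever store.pid turns out to live on your install.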
# 3
Thanks Jay. What I think happened was that /var went to 100% and mail stopped [I cleared out some am stats/logs]. Now, I am not sure why stored would want to write to this partition: I have the store [all my users] on another disk, but maybe I left a few test accounts on the original default partition. Here is what I got from the logs.
# more /opt/SUNWmsgsr/log/default
[25/Oct/2006:15:15:01 -0400] correo msprobe[27193]: General Information: Log created (1161803701)
[25/Oct/2006:15:15:01 -0400] correo msprobe[27193]: General Warning: alarmid=diskavail|instance=Mboxlist Directory|time=25/Oct/2006:15:15:01 -0400|value=0|low=0|high=62|threshold(below)=10|count below threshold=42|warning sent=92
[25/Oct/2006:15:25:01 -0400] correo msprobe[27202]: General Warning: alarmid=diskavail|instance=default|time=25/Oct/2006:15:25:01 -0400|value=0|low=0|high=60|threshold(below)=10|count below threshold=42|warning sent=92
[25/Oct/2006:23:00:00 -0400] correo imexpire[27410]: General Notice: Expire started (0)
[25/Oct/2006:23:00:23 -0400] correo imexpire[27410]: General Notice: Expire finished
[26/Oct/2006:10:44:57 -0400] correo stored[1824]: Store Warning: Database snapshot failed: snapshot copy log file failed
[26/Oct/2006:10:45:34 -0400] correo stored[1824]: Store Critical: Mailbox database error: write: 0xfd9a03e4, 1774: No space left on device
[26/Oct/2006:11:27:23 -0400] correo imsched[1829]: General Notice: shutting down
[26/Oct/2006:11:27:23 -0400] correo stored[1824]: General Error: mshttpd process 1827 is running, cannot start/stop stored
[26/Oct/2006:11:27:23 -0400] correo stored[1824]: General Warning: ims_master process 28043 exited abnormally
[26/Oct/2006:11:27:23 -0400] correo stored[1824]: General Warning: ims_master process 28049 exited abnormally
[26/Oct/2006:11:27:23 -0400] correo stored[1824]: General Warning: ims_master process 28165 exited abnormally
[26/Oct/2006:11:27:23 -0400] correo stored[1824]: General Warning: ims_master process 28163 exited abnormally
[26/Oct/2006:11:27:23 -0400] correo stored[1824]: General Warning: ims_master process 28166 exited abnormally
[26/Oct/2006:11:27:23 -0400] correo stored[1824]: Store Notice: Cannot stop stored when other message store processes are running. Please stop the other message store processes before stopping stored
[26/Oct/2006:11:29:28 -0400] correo stored[1824]: General Warning: imapd process 1825 exited abnormally
[26/Oct/2006:11:29:28 -0400] correo stored[1824]: General Warning: popd process 1826 exited abnormally
[26/Oct/2006:11:29:28 -0400] correo stored[1824]: General Warning: mshttpd process 1827 exited abnormally
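
The diskavail warnings in this log report value=0 against threshold(below)=10. A quick manual check of how full a partition is could look like the sketch below. The function names are hypothetical, and reading the alarm's value as "percent available" is an assumption on my part; `df -P` is the POSIX output format (on Solaris it may need the XPG4 version of df).

```shell
# Hypothetical helpers; a sketch, not part of Messaging Server.

# Print the percentage of the filesystem holding $1 that is in use.
usage_pct() {
    # With -P, line 2 of df output has the capacity in column 5, e.g. "97%".
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed if the free percentage is below the given threshold ($2).
# Assumption: the alarm threshold is expressed as percent available.
below_free_threshold() {
    used=$(usage_pct "$1")
    [ $((100 - used)) -lt "$2" ]
}
```

Something like `below_free_threshold /var 10 && echo "/var is nearly full"` would then flag the condition that seems to have triggered these warnings.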
I have checked, and stored is not running, nor is anything owned by mailsrv. I am not sure where store.pid is.
There are 2 of those log files:
# ls /var/opt/SUNWmsgsr/store/mboxlist/
__db.001  __db.003  __db.005  folderlock  log.0000000076  peruser.db  subscr.db
__db.002  __db.004  folder.db  log.0000000075  lright.db  quota.db
I have fsck'd all my disks [/var, /opt, /export/home etc.] and all is well. I have also been able to start the following with start-msg:
sched, ens, dispatcher, job_controller
- pop and imap don't start without store.
I don't know what to do next. Thanks again.
-james
# 4
Yes, filling /var is a "bad thing".
Two log files isn't very bad, but it may delay startup.
It's also possible that your database has become corrupted, because the server could not write to it.
I would try things in this order:
1. Just start the server: /opt/SUNWmsgsr/sbin/start-msg
Observe whether stored is running, even if you get an error message. The pid files should be in the config directory, if memory serves me correctly. If you get the timeout error, stored should be attempting to recover. If it is running, be patient for a few minutes; the pid file should change from "initializing" to "ready", and then you can re-run the start script and be up and fine.
If stored has died, then:
2. Run a "rapid recovery". This is considered a disaster-recovery technique, not to be run unless you have a disaster . . .
A. Make sure all Messaging Server processes are stopped.
B. Remove all files from the mboxlist directory.
C. Start Messaging Server. It should start normally.
D. Run
reconstruct -m
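
Two parts of the procedure above lend themselves to a small sketch: waiting for the pid file to report "ready" in step 1, and emptying the mboxlist directory in step 2B. Both function names are made up for illustration. Note that the second helper moves the files to a holding directory instead of deleting them, which is a deliberate swap so they can still be examined later; it is not what step B literally says. The exact contents of the pid file are also an assumption, so the check only looks for the word "ready".

```shell
# Hypothetical helpers; a sketch under the assumptions stated above.

# Poll a pid/state file until it mentions "ready".
# $1 = pid file, $2 = number of attempts, $3 = seconds between attempts.
wait_for_ready() {
    i=0
    while [ "$i" -lt "$2" ]; do
        if grep ready "$1" >/dev/null 2>&1; then
            return 0
        fi
        sleep "$3"
        i=$((i + 1))
    done
    return 1
}

# Move every file out of the mboxlist directory ($1) into a holding
# directory ($2). Run only after all Messaging Server processes are stopped.
set_aside_mboxlist() {
    mkdir -p "$2" || return 1
    for f in "$1"/*; do
        # An unmatched glob stays literal; skip it if nothing is there.
        [ -e "$f" ] || continue
        mv "$f" "$2"/ || return 1
    done
}
```

With the server stopped, this might be used as `set_aside_mboxlist /var/opt/SUNWmsgsr/store/mboxlist /some/dir/with/space` (the destination is a placeholder), followed by start-msg and then `reconstruct -m` as in steps C and D. Make sure the holding directory is not on the partition that filled up.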