store server status....failed

Hi, I recently had a JMS failure: it just stopped working, and I noticed my /var directory was maxed out. I found huge log/stats files under /var/opt/SUNWam/stats/, so I moved them to a filesystem with more space so I can examine them later [does anybody know what they monitor and how to configure the monitoring intervals, etc.?]. I restarted the system and then started the directory server, and I noticed entries were being made to these files [so I think they are OK].

The big problem now is Messenger's store server status fails so it will not start at all. This is what I get:

# /opt/SUNWmsgsr/sbin/start-msg

Connecting to watcher ...

Launching watcher ...

Starting ens server ... 1627

Starting store server .... 1628

checking store server status ............ failed

So I got no email :^(

Does anybody have an idea what the problem might be? Is the store corrupted? How can I fix this?

Thanks in advance.

-James

[979 byte] By [bdajames] at [2007-11-26 11:04:31]
# 1

I found a posting with a similar problem

http://forum.sun.com/jive/thread.jspa?forumID=15&threadID=48514

This person ran out of store space. I have the store on another disk, but since a lot of the config files seem to be under /var, I think it's possible I have a similar problem. I also found reconstruct in the Admin Guide.

The person in the posting is running 6.1 and I'm on 6.2-3.04 but I hope it applies.

I'll wait a tad before running this solution, just in case there are other suggestions.

Thanks in advance

-James

bdajames at 2007-7-7 3:18:25 > top of Java-index,E-Mail, Calendar, & Collaboration,Sun Java System Messaging Server...
# 2

It would help to have more data to go on.

If you were restarting the server (or watcher was doing so), and stored found a buildup of transaction logs, then it would have gone into "recover mode" while processing those logs.

During recovery, the server won't start the other processes, and will give that error message.

Check whether stored is running.

Check the store.pid file for "initializing" or another status.

Do an "ls" of the mboxlist directory, and note the number of "log*" files.
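The checks above could be scripted roughly as follows. This is only a sketch for a POSIX-style shell such as ksh or bash; the helper names count_txn_logs and show_pid_status are made up for illustration and are not Messaging Server commands, and the location of store.pid is left as an argument because it is not confirmed here.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not part of Messaging Server.

# Count transaction logs (regular files named log.*) in a directory.
count_txn_logs() {
    dir=$1
    count=0
    for f in "$dir"/log.*; do
        # When nothing matches, the pattern stays literal and -f fails.
        if [ -f "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Print the contents of a pid/status file, or report that it is missing.
show_pid_status() {
    pidfile=$1
    if [ -r "$pidfile" ]; then
        cat "$pidfile"
    else
        echo "not found: $pidfile"
    fi
}
```

For example, `count_txn_logs /path/to/store/mboxlist` prints the number of log files, and `ps -ef | grep stored` shows whether stored is running.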

jay_plesset at 2007-7-7 3:18:25
# 3

Thanks Jay. What I think happened was that /var went to 100% and mail stopped [I cleared out some am stats/logs]. I am not sure why stored would want to write to this partition; I have the store [all my users] on another disk, but maybe I left a few test accounts on the original default partition. Here is what I got from the logs.

# more /opt/SUNWmsgsr/log/default

[25/Oct/2006:15:15:01 -0400] correo msprobe[27193]: General Information: Log created (1161803701)

[25/Oct/2006:15:15:01 -0400] correo msprobe[27193]: General Warning: alarmid=diskavail|instance=Mboxlist Directory|time=25/Oct/2006:15:15:01 -0400|value=0|low=0|high=62|threshold(below)=10|count below threshold=42|warning sent=92

[25/Oct/2006:15:25:01 -0400] correo msprobe[27202]: General Warning: alarmid=diskavail|instance=default|time=25/Oct/2006:15:25:01 -0400|value=0|low=0|high=60|threshold(below)=10|count below threshold=42|warning sent=92

[25/Oct/2006:23:00:00 -0400] correo imexpire[27410]: General Notice: Expire started (0)

[25/Oct/2006:23:00:23 -0400] correo imexpire[27410]: General Notice: Expire finished

[26/Oct/2006:10:44:57 -0400] correo stored[1824]: Store Warning: Database snapshot failed: snapshot copy log file failed

[26/Oct/2006:10:45:34 -0400] correo stored[1824]: Store Critical: Mailbox database error: write: 0xfd9a03e4, 1774: No space left on device

[26/Oct/2006:11:27:23 -0400] correo imsched[1829]: General Notice: shutting down

[26/Oct/2006:11:27:23 -0400] correo stored[1824]: General Error: mshttpd process 1827 is running, cannot start/stop stored

[26/Oct/2006:11:27:23 -0400] correo stored[1824]: General Warning: ims_master process 28043 exited abnormally

[26/Oct/2006:11:27:23 -0400] correo stored[1824]: General Warning: ims_master process 28049 exited abnormally

[26/Oct/2006:11:27:23 -0400] correo stored[1824]: General Warning: ims_master process 28165 exited abnormally

[26/Oct/2006:11:27:23 -0400] correo stored[1824]: General Warning: ims_master process 28163 exited abnormally

[26/Oct/2006:11:27:23 -0400] correo stored[1824]: General Warning: ims_master process 28166 exited abnormally

[26/Oct/2006:11:27:23 -0400] correo stored[1824]: Store Notice: Cannot stop stored when other message store processes are running. Please stop the other message store processes before stopping stored

[26/Oct/2006:11:29:28 -0400] correo stored[1824]: General Warning: imapd process 1825 exited abnormally

[26/Oct/2006:11:29:28 -0400] correo stored[1824]: General Warning: popd process 1826 exited abnormally

[26/Oct/2006:11:29:28 -0400] correo stored[1824]: General Warning: mshttpd process 1827 exited abnormally
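A quick way to pull only the disk-related lines out of a log like the one above might look like this. It is a sketch that assumes a POSIX grep (one that accepts -E); the function name disk_full_lines is hypothetical, and the two patterns are simply the strings visible in this excerpt.

```shell
#!/bin/sh
# Hypothetical helper: print log lines that point at a full disk.
# Matches the "No space left on device" error and the msprobe diskavail alarms.
disk_full_lines() {
    logfile=$1
    grep -E 'No space left on device|alarmid=diskavail' "$logfile"
}
```

It would be called as `disk_full_lines /opt/SUNWmsgsr/log/default`.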

I have checked, and stored is not running, nor is anything owned by mailsrv. I am not sure where store.pid is.

There are 2 of those log files:

# ls /var/opt/SUNWmsgsr/store/mboxlist/

__db.001  __db.003  __db.005   folderlock      log.0000000076  peruser.db  subscr.db
__db.002  __db.004  folder.db  log.0000000075  lright.db       quota.db

I have run fsck on all my disks [/var, /opt, /export/home, etc.] and all is well. I have also been able to start the following with start-msg:

sched, ens, dispatcher, job_controller

- pop and imap don't start without store.

I don't know what to do next. Thanks again.

-james

bdajames at 2007-7-7 3:18:25
# 4

Yes, filling /var is a "bad thing".

Two log files isn't very bad, but it may delay startup.

It's also possible that your database has become corrupted, because the server could not write to it.

I would try things in this order:

1. Just start the server: /opt/SUNWmsgsr/sbin/start-msg

Observe whether stored is running, even if you get an error message. The pid files should be in the config directory, if memory serves me correctly. If you get the timeout error, stored should be attempting to recover. If it is running, be patient for a few minutes; the pid file should change from "initializing" to "ready", and then you can re-run the start script and be up and fine.

If stored has died, then:

2. Run a "rapid recovery". This is considered a disaster-recovery technique, not to be run unless you have a disaster . . .

A. Make sure all Messaging Server processes are stopped.

B. Remove all files from the mboxlist directory.

C. Start Messaging Server. It should start normally.

D. Run

reconstruct -m
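Steps A through D could be wrapped in a script along these lines. Treat it as a sketch under several assumptions: the names run, rapid_recovery, MSG_SBIN and DRY_RUN are invented for illustration, stop-msg and reconstruct are assumed to live in the same sbin directory as start-msg, and the files are moved aside rather than removed, which is a more cautious substitution for step B. The backup directory should be on a filesystem with free space. By default it only prints what it would do.

```shell
#!/bin/sh
# Sketch of the rapid-recovery steps. Paths and names are assumptions.
MSG_SBIN=${MSG_SBIN:-/opt/SUNWmsgsr/sbin}   # assumed location of the tools
DRY_RUN=${DRY_RUN:-1}                       # 1 = print commands only

# Run a command, or just print it when DRY_RUN is 1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# usage: rapid_recovery <mboxlist-dir> <backup-dir>
rapid_recovery() {
    mboxlist=$1
    backup=$2
    run "$MSG_SBIN/stop-msg"            # A. stop all Messaging Server processes
    run mkdir -p "$backup"
    # B. empty mboxlist (moved aside here instead of deleted)
    for f in "$mboxlist"/*; do
        if [ -e "$f" ]; then
            run mv "$f" "$backup/"
        fi
    done
    run "$MSG_SBIN/start-msg"           # C. start Messaging Server
    run "$MSG_SBIN/reconstruct" -m      # D. rebuild the mailbox database
}
```

Calling `rapid_recovery /path/to/store/mboxlist /path/to/backup` prints the planned commands; setting DRY_RUN=0 would actually execute them, so review the dry-run output first.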

jay_plesset at 2007-7-7 3:18:25
# 5

stored was not running. Unfortunately, I could not find store.pid. Oh well.

I cleared all the files in mboxlist and then started Messaging Server... YES. This time it came up OK. So I ran reconstruct -m, and it went through all my mailboxes. When it finished, I ran stop-msg and then start-msg; it came up clean and we have mail again.

Thanks again Jay

-James

bdajames at 2007-7-7 3:18:25
# 6
It's always a good idea to make sure you don't run out of disk space.
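One simple way to keep an eye on this is a small check run from cron. The following is a sketch only, assuming a POSIX df and awk (on older systems the XPG4 versions, or nawk, may be needed); the function name over_threshold is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `df -P` output on stdin and print the mount
# points whose usage is at or above the given percentage.
over_threshold() {
    limit=$1
    awk -v limit="$limit" 'NR > 1 {
        pct = $5                 # capacity column, e.g. "97%"
        sub(/%/, "", pct)
        if (pct + 0 >= limit + 0) print $6 " " pct "%"
    }'
}
```

For example, `df -P /var /opt | over_threshold 90` lists any of those filesystems that are 90% full or more, and prints nothing otherwise.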
jay_plesset at 2007-7-7 3:18:25