reconstruct problems after shutdown / startup

Yesterday we had an air conditioning outage in our data center, so I had to shut down our iMS system. I forgot to do a stop-msg on the mail server, so it apparently did not get shut down very cleanly. When I booted back up later in the day, I had to run fsck on the data partition (which is on an external disk array) many times before it would boot up successfully.

I was able to start up the messaging server fine, but I monitored the logs and found errors like this one (from the imap log):

[30/Jul/2006:22:00:02 -0500] opal imapd[813]: Store Error: Unable to get quota root user/sch736: Unknown code _ 248

[30/Jul/2006:22:00:02 -0500] opal imapd[813]: Store Error: Unable to add new mailbox entry user/sch736: System I/O error. Administrator, check server log for details.

[30/Jul/2006:22:00:02 -0500] opal imapd[813]: Store Error: Unable to auto create mailbox user/sch736: System I/O error. Administrator, check server log for details.

I was getting similar errors all over the place, so I decided to rebuild the mboxlist.
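
To gauge how widespread errors like these are, the imap log can be reduced to a list of distinct mailboxes. This is only a rough sketch that assumes log lines shaped like the samples above; the function name affected_mailboxes and the log path in the usage comment are made up for illustration.

```shell
# Print each distinct mailbox named in a "Store Error" line read from stdin.
# Assumes the mailbox appears as "user/<id>..." followed by ": ", as in the
# sample lines above; other log formats would need a different pattern.
affected_mailboxes() {
    sed -n 's/.*Store Error: [^:]* \(user\/[^:]*\): .*/\1/p' |
        LC_ALL=C sort -u
}

# Usage (hypothetical log location):
#   affected_mailboxes < /path/to/log/imap
```

Counting the output with wc -l gives a quick idea of whether a handful of users or most of the store is affected.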

running reconstruct -r -f doesn't do anything at all

reconstruct -m gives me the following:

partition primary is at /opt/luminis/iplanet/server5/msg-opal/store/partition/primary

user/aab753: fixed quota root usage

user/abm909: cannot fix quota root usage: I/O error

and puts the following in the default log:

[31/Jul/2006:08:52:47 -0500] opal reconstruct[25053]: General Notice: ./reconstruct -m

[31/Jul/2006:08:52:47 -0500] opal reconstruct[25053]: Store Critical: Unable to open index file for user/aab753/School/Comm 170 group: No such file or directory

[31/Jul/2006:08:52:47 -0500] opal reconstruct[25053]: Store Critical: Mailbox database error: folder.db: page 2204 doesn't exist, create flag not set

[31/Jul/2006:08:52:52 -0500] opal reconstruct[25053]: Store Critical: Mailbox database error: folder.db: page 2204 doesn't exist, create flag not set

[31/Jul/2006:08:52:52 -0500] opal reconstruct[25053]: Store Critical: Mailbox database error: folder.db: page 2204 doesn't exist, create flag not set

reconstruct -n -r runs for a while but then dies at

user/abm909/Sent

and puts the following in default log

[31/Jul/2006:08:57:02 -0500] opal reconstruct[25066]: General Notice: ./reconstruct -n -r

[31/Jul/2006:08:57:11 -0500] opal reconstruct[25066]: Store Critical: Mailbox database error: folder.db: page 2204 doesn't exist, create flag not set
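
Since the same folder.db message keeps repeating, it may help to summarize the database errors before deciding what to repair. A small sketch, assuming the default log format shown above (db_error_summary is a made-up name):

```shell
# Count each distinct "Mailbox database error" message in a log read from
# stdin. Output lines look like "<count> <message>", most frequent first.
db_error_summary() {
    sed -n 's/.*Mailbox database error: //p' |
        sort | uniq -c | sort -rn |
        awk '{ n = $1; sub(/^ *[0-9]+ /, ""); print n, $0 }'
}

# Usage (hypothetical log location):
#   db_error_summary < /path/to/log/default
```

If every line names the same file and page, as it does in the excerpts above, that may point at one damaged database file rather than many damaged mailboxes.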

I also was receiving the following error while running reconstruct -r

cannot fix quota root usage: I/O error

but I think that is because I had copied the files out of the mboxutil folder at the time.

Any ideas on how to fix what's going on?

Thanks,

Greg

By truman_gmarsh at 2007-11-26 9:09:12
# 1
You have some mangled store.idx files. You will need to manually remove the bad ones (the ones mentioned in your error logs), and reconstruct -r for those users.
jay_plesset at 2007-7-6 23:26:31 > top of Java-index,E-Mail, Calendar, & Collaboration,Sun Java System Messaging Server...
# 2

Jay,

I had already renamed all store.idx to store.idx.old

I still can't get it to reconstruct that user.

If I go to that user's directory and find store.idx files, here's what I get:

$ find ./ -name store.idx

./=+Sent/store.idx

./=+Drafts/store.idx

./=+Deleted/store.idx

./store.idx

It appears that the one in =+Sent was the last successful one.

If I move that one to store.idx.old - it will let me run

reconstruct -r user/abm909 without throwing any errors, but when I try to run reconstruct -r -f on the whole store again, it dies there again.
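
The rename step described here can be done for a whole user directory in one pass. A minimal sketch, assuming it is pointed at one user's mailbox directory and that nothing is writing to the store while it runs; rename_store_idx is a made-up name.

```shell
# Rename every store.idx under the given directory to store.idx.old and
# print the path of each file that was renamed.
rename_store_idx() {
    find "$1" -type f -name store.idx -print |
        while IFS= read -r idx; do
            mv "$idx" "$idx.old" && printf '%s\n' "$idx"
        done
}

# Usage (path to the user's directory is site specific):
#   rename_store_idx /path/to/user/directory
#   reconstruct -r user/abm909
```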

Greg

truman_gmarsh at 2007-7-6 23:26:31
# 3

If reconstruct -r acts differently from reconstruct -r -f, then there is a problem in one of the store.idx files. Reconstruct -r starts by comparing the number of messages shown in the store.idx file with the actual number of messages. If they match, nothing is done.

If one of the actual message files is in the wrong directory, then you will get repeating errors.

You say, "it dies there again." Does this mean that the reconstruct itself crashes? If so, I'll need to know your exact version, and more stuff.

Does reconstruct dump a core file? Is your system configured so it can? If you get a core file, what does "pstack core" show?

jay_plesset at 2007-7-6 23:26:31
# 4

Hi Jay,

Sorry I disappeared for a while. This server is part of a SunGard Higher Education Luminis install, so I was dealing with their support for a while trying to get things resolved.

We got things working ok by following your steps from

http://swforum.sun.com/jive/thread.jspa?threadID=53262&messageID=204166

A. stop the messaging server. Confirm that ALL processes related to Messaging are stopped. Use ps -ef, and grep for the path it's installed in. Kill -9 any processes that are still there.

B. Remove all contents of the mboxlist directory.

C. Start the server up. It will work normally, but users will only have access to INBOX folders.

D. Run reconstruct -m to fix folder access.
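
For step A, the check for leftover processes can be written as a small filter. This is a sketch only: messaging_procs is a made-up name, it reads ps -ef output on stdin, and the install path in the usage comment is taken from the reconstruct output earlier in the thread.

```shell
# List processes whose command line mentions the given install path.
# Reads "ps -ef" style output on stdin and prints the matching lines,
# or nothing if no such process remains.
messaging_procs() {
    # -F treats the path as a fixed string rather than a regular expression;
    # the second grep drops the grep processes from the listing.
    grep -F -- "$1" | grep -v 'grep' || true
}

# Usage:
#   ps -ef | messaging_procs /opt/luminis/iplanet/server5
```

Empty output means nothing under that path is still running, so it should be safe to move on to step B.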

However, after doing that, some users were getting the following error in the Luminis IMAP client:

[error: We could not load the Email Center because one of the folders

was invalid. Error given was: MailException: type=2

(EXCEPTION_InvalidFolder), description: Error getting subfolder, Sent ]

And some users can get in fine, but are missing their personal subfolders.

So they had me run these steps:

1) Stop the messaging server in its entirety (./stop-msg and check for related processes to make sure they've stopped running. On NT/Windows 2000, if they're set to start up automatically change them to Manual in the Services control panel)

2) Make a backup of the store directory, as well as the msg-instance/store/mboxlist folder contents. It is recommended you copy the contents of mboxlist out to an easily accessible backup directory -- if later required to restore from backup, this will make the mboxlist folder readily available instead of having to do a full restore.

3) run reconstruct -m

4) run a script to move all store.idx to store.idx.old -- See the steps below on how to do this.

5) start up store by doing:

./start-msg store

and wait until store posted all transactions in the mboxlist/log.* files -- sometimes this may take 1-2 hours, because it has to go through every transaction that hasn't completed the update on the mailbox database

6) run reconstruct -r

7) stop-msg store

8) backup and move out the contents of iplanet/server5/msg-instance/store/mboxlist to a place we can get back to later

9) run reconstruct -m again

10) run "start-msg store" and let it complete, once again - this usually doesn't take as long unless new transactions were posted

11) start the rest of the messaging server processes -- by running ./start-msg, and let it ignore the store process since that is already running, or start up all other services manually (e.g., pop, http, imap, etc.)
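
The mboxlist backup in step 2 could look something like the sketch below. It only copies; it does not move or delete anything. backup_mboxlist and the directory arguments are made up for illustration, and it should be run only while the messaging server is stopped.

```shell
# Copy the contents of an mboxlist directory into a new timestamped
# directory under a backup root, and print the name of the new directory.
backup_mboxlist() {
    src=$1
    dest="$2/mboxlist.$(date +%Y%m%d-%H%M%S)"
    # "$src"/. copies the directory contents, including dot files
    mkdir -p "$dest" && cp -pR "$src"/. "$dest"/ && printf '%s\n' "$dest"
}

# Usage (both paths are site specific):
#   backup_mboxlist /path/to/msg-instance/store/mboxlist /path/to/backups
```

Keeping each copy in its own timestamped directory means the backup from step 2 is not overwritten if the same function is reused before step 8.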

Which I did, until step 9, when it gave me the following error:

reconstruct: cannot list quota roots: System I/O error. Administrator, check server log for details

So I started the messaging server and then reconstruct -m ran successfully, but we're back in the exact same boat with the same users having the same problems.

I can fix the users getting the IMAP error by moving all the store.idx files and doing a reconstruct -r on the user. However, I don't know who all is affected and don't really want to do that one by one.

I don't know how to fix the subfolder problem.

Thank you so much for any help you can provide. I hope I am giving you good information to go off of.

Thanks,

Greg

truman_gmarsh at 2007-7-6 23:26:31
# 5

> Hi Jay,

> Sorry I disappeared for a while. This server is part

> of a SunGard Higher Education Luminis install, so I

> was dealing with their support for a while trying to

> get things resolved.

I understand.

> I can fix the users getting the IMAP error by moving

> all the store.idx files and doing a reconstruct -r on

> the user. However, I don't know who all is affected

> and don't really want to do that one by one.

>

> I don't know how to fix the subfolder problem.

Each folder has a store.idx file.

reconstruct -r -f may fix all the problems at one go. If not, parse the log files for affected users, and remove store.idx files, and reconstruct.
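
Parsing the logs for affected users could be sketched as follows. The function only prints the reconstruct commands so they can be reviewed first; it does not run anything and does not remove store.idx files. reconstruct_cmds is a made-up name, and the pattern assumes mailboxes appear in the log as "user/<id>" or "user/<id>/<folder>", as in the excerpts earlier in the thread.

```shell
# Read log lines on stdin, pull out every "user/<id>" mailbox reference,
# reduce it to the owning user, and print one reconstruct command per user.
reconstruct_cmds() {
    sed -n 's/.* \(user\/[^/: ]*\).*/\1/p' |
        LC_ALL=C sort -u |
        while IFS= read -r mbox; do
            printf 'reconstruct -r %s\n' "$mbox"
        done
}

# Usage (hypothetical log location):
#   reconstruct_cmds < /path/to/log/default > todo.sh
```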

You MAY need to apply some later hotfix/patch, as some work has been done in the reconstruct area. I don't remember exactly what version you're on.

jay_plesset at 2007-7-6 23:26:31
# 6

Jay,

FYI. Here's what I ended up doing:

For each user, find their personal mail folders in the file system, then run mboxutil -c user/foldername to create the mailbox again. Then run reconstruct -r to fix the user.
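
The original perl script was not posted, so what follows is not that script, just a shell sketch of the same idea for a single user. It prints the commands instead of running them. It ASSUMES that folder directories are the ones whose names start with "=+" (as in the find output earlier in the thread), that nested folders are nested "=+" directories, and that the mailbox name is user/<uid>/<folder>; recreate_cmds is a made-up name.

```shell
# For one user's mailbox directory, print an "mboxutil -c" command for every
# folder directory found on disk, then a single "reconstruct -r" for the user.
recreate_cmds() {
    uid=$1
    dir=$2
    [ -d "$dir" ] || return 1
    ( cd "$dir" && find . -type d -name '=+*' -print ) |
        LC_ALL=C sort |
        sed -e 's/^\.\/=+//' -e 's/\/=+/\//g' |
        while IFS= read -r folder; do
            # Quote the mailbox name because folder names may contain spaces
            printf 'mboxutil -c "user/%s/%s"\n' "$uid" "$folder"
        done
    printf 'reconstruct -r user/%s\n' "$uid"
}

# Usage (path to the user's directory is site specific):
#   recreate_cmds abm909 /path/to/user/directory
```

Sorting the directory list puts a parent folder ahead of its subfolders, so the printed commands create mailboxes from the top down.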

We wrote a perl script which went through and fixed everyone.

Thanks for all your help with this.

Greg

truman_gmarsh at 2007-7-6 23:26:31
# 7
Yes. mboxutil -c will create a clean, empty store.idx. reconstruct -r will then "fix" the problem.
jay_plesset at 2007-7-6 23:26:31
# 8
To resolve that, you have to move store.idx to store.idx.old and store.exp to store.exp.old, then run reconstruct -r user/<e-mail account>, then run iminitquota -u. If you need it, I can send you the script. Regards.
immonitoraccess at 2007-7-6 23:26:31
# 9
Um, no. Is there some reason you're responding to a several-year-old posting, where the solution was already given and reported to be successful?
jay_plesset at 2007-7-6 23:26:31