Message Server Restart Problem
Hello. I've run into the following problem, and hope someone can help.
I was adding a fourth mail partition and had problems with start-msg after running stop-msg. I have done this before without any problems.
The start-msg looked as follows:
Starting ENS daemon
/opt/email/server5: Starting STORE daemon ..... done: 14668
/opt/email/server5: Starting POP3 daemon .... done: 14685
/opt/email/server5: Starting IMAP4 daemon ...Cannot bind to port 143: Address already in use
I did a stop/start a second time, and the stop complained about no pidfile.imap. Indeed, there is not one. The other services (pop, etc.) have
pidfiles.
After doing some reading, I'm thinking the imapd processes were not completely down after the first stop-msg? I see 4 processes now, which
matches the service.imap.numprocesses = 4 config. I was not previously
aware of this number of imapd processes.
Tonight, I will try again. As for the procedure, I plan on waiting a few minutes
to allow the imapd processes to finish up so that port 143 can be bound on the
next start. Can those processes be killed if they don't go away? Is there a
method to ensure that the 143 port is ready for a start-msg?
We are on Iplanet MS 5.2. Any help will be much appreciated.
Thank you, Keith
[1365 byte] By [kmrnm10a] at [2007-11-26 23:36:48]

# 1
Hi,
> The start-msg looked as follows:
>
> Starting ENS daemon
> /opt/email/server5: Starting STORE daemon ..... done: 14668
> /opt/email/server5: Starting POP3 daemon .... done: 14685
> /opt/email/server5: Starting IMAP4 daemon ...Cannot bind to port 143: Address already in use
Sounds like one of the imapd processes didn't shut down cleanly - or was still in the process of shutting down when you attempted to restart, and the old process is still bound to port 143.
> After doing some reading, I'm thinking the imapd
> processes were not completely down after the first
> stop-msg?
Yep - a fair assumption. That is why, when shutting down or restarting iMS 5.2, I usually made sure all the processes were dead (./stop-msg; ps -ef | grep server5). What can happen is that the process has been asked to shut down 'gracefully' rather than being killed outright with kill -9, and the graceful shutdown can take a bit of time.
> I see 4 processes now, which
> matches the service.imap.numprocesses = 4 config. I
> was not previously
> aware of this number of imapd processes.
There should be 4 processes.
> Tonight, I will try again. As for the procedure, I
> plan on waiting a few minutes
> to allow imapd processes to finish up so that port
> 143 can be bound on the
> next start.
Yep, sounds fair. Rather than just waiting, it's pretty easy to run a 'ps -ef | grep server5' and see if there are any processes remaining.
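A minimal sketch of that check as a polling loop, assuming a POSIX-style shell (ksh or bash; the old Solaris /bin/sh lacks $(...)) and that "server5" appears in the command line of every messaging process, as in the ps -ef | grep server5 above. The function names and the 300-second default timeout are made up for illustration.

```shell
# Count the lines on stdin that mention the given string,
# ignoring the line for the grep command itself.
count_procs() {
    # $1: string to look for, e.g. the instance directory name "server5"
    grep "$1" | grep -v grep | wc -l | tr -d ' '
}

# Poll ps until no matching process is left, or give up after a timeout.
# Run it from a shell whose own command line does not contain the
# pattern, otherwise it will keep matching itself.
wait_for_exit() {
    pattern=$1
    timeout=${2:-300}   # seconds; arbitrary default
    waited=0
    while [ "$(ps -ef | count_procs "$pattern")" -gt 0 ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "still running after ${timeout}s:" >&2
            ps -ef | grep "$pattern" | grep -v grep >&2
            return 1
        fi
        sleep 5
        waited=$((waited + 5))
    done
    return 0
}
```

Something like ./stop-msg; wait_for_exit server5 && ./start-msg would then start the server only once the old processes are gone. If it times out, it prints whatever is left so you can decide whether to kill it.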
> Can those processes be killed if they
> don't go away?
Yes - but make sure you kill ALL of the messaging processes if they are stuck, don't just concentrate on the imapd process.
> Is there a
> method to ensure that the 143 port is ready for a
> start-msg?
Already discussed: once ps shows no messaging processes left, nothing should still be holding port 143.
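If you want to look at the port directly as well, here is a rough sketch that scans netstat -an output for a listener. It assumes the local address column ends in ".143" (Solaris style) or ":143" (Linux style) followed by a space. The function name port_listening is hypothetical.

```shell
# Succeeds (exit 0) if the netstat-style text on stdin shows a
# listening socket on the given port, fails otherwise.
port_listening() {
    # $1: port number, e.g. 143
    # Keep only LISTEN lines, then look for an address ending in
    # ".PORT" or ":PORT" followed by a space.
    grep LISTEN | grep "[.:]$1 " >/dev/null
}
```

For example, netstat -an | port_listening 143 && echo "port 143 is still bound" before running start-msg. This only tells you that something is listening, not which process it is; ps is still the way to find that out.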
> We are on Iplanet MS 5.2. Any help will be much
> appreciated.
As usual: iMS 5.2 is soon to be out of support (May 2007), so you should look to upgrade to version 6.
Regards,
Shane.
# 2
Thank you so much Shane. This is just the kind of information I needed. I now feel better equipped for the next try (didn't try last night - too tired). I had not run into this before - just lucky I guess.
I'll be talking with management about the support issue with 5.2. We're tied to this version based on the portal app we are using. I know 6 would be better (and not just for support).
Thanks again,
Keith
# 3
I would check with your portal folks. Most such integrations work by using standard protocols, such as SMTP, IMAP, etc. Those don't change, and we've yet to hear of a situation where upgrading the mail server actually breaks the portal.
# 4
Well,
Had a disaster last night. We couldn't get the server back up, so we removed the files in msg-inst/store/mboxlist/* and did a reconstruct -m. It looked like all was well. We brought the server back up, but noticed all accounts were ending up in the default partition (we have 3) rather than where they were before. My email account was already in the default partition, but I can only see my INBOX and the other basic folders. All my other folders are physically there, but I cannot see them from any mail client. This looks to be the case for all other accounts.
All of this started with adding a 4th partition. This partition never really became permanent and was 'sort of' there during the initial reconstruct. I've since removed it, along with a good stop/start.
If the physical folders/messages are there, what can I do to make them viewable again?
Thank you,
Keith
# 5
I would do a configutil > filename and check that all your store partitions are defined. I have found in the past that it is better not to use the GUI. I think I had a problem on 5.2 where I added a store on my test server, and it saved the new store and removed all the old store partitions.
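A small sketch of that check, assuming configutil prints "name = value" lines (like the service.imap.numprocesses = 4 quoted earlier) and that partitions appear as store.partition.<name>.path with the default in store.defaultpartition. Verify those key names against your own dump. The function name is made up.

```shell
# Print only the partition-related settings from configutil-style
# "name = value" text on stdin.
list_partitions() {
    awk '/^store\.partition\.[^.]*\.path/ || /^store\.defaultpartition/'
}
```

Usage would be along the lines of configutil > filename; list_partitions < filename. You should see one path line per partition you expect, plus the default partition.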
Hope this helps,
Gary
# 6
Also, the store process MUST be running when you do the reconstruct -m. Do not run any reconstruct commands while the server is down.
Also, check that your LDAP data includes the correct partition for your users. Once the data is removed from the mboxlist directory, that data needs to be recreated from somewhere.
It's possible that a backup snapshot was restored automatically when you started Messaging, and that has caused some issues for you.
configutil is your friend in dealing with Messaging configuration issues.
# 7
We did not have stored running, so the first reconstruct was bad. We ended up contacting the portal vendor and they basically did what we did, only they had stored up. It's unfortunate we didn't the first time - we might be completely good at this point.
Got the system back up about 11:00 today. There are still residual/strange things, like mboxutil -l showing some accts in two partitions (we currently have 3). We still have a fair amount of cleanup to do. Some people can't see their folders, but the physical folders are there.
We'll be contacting the vendor again tomorrow to see about the residue. I thought about running reconstruct -n -r on the partitions, but want to get some sleep tonight. I wanted to just run a check (no repair) to see what comes out, but it looks like you can't do that with -n. Out of 50K accounts, the complaint level is relatively small, but we know there are accts in the 3rd partition (like 11k) that don't belong there, and that is a concern. I think most of those are *new* accounts that our first reconstruct seemed to be adding even though they existed elsewhere (the 3rd partition was the default partition at the time).
Copied the configutil data, and it all looks like we want it. Now we just need to work on individual acct issues. It's better than it was, but there is still a lot to do.
I appreciate all your input as I've learned a lot from this episode. We'll see how it goes tomorrow.
Thank you, Keith
# 8
When you restarted stored, and the database was not in place, it's possible that it automatically restored an old version. That old version may have had some of your users in the incorrect partition. That's why they show accounts in more than one place.
Removing the directories in the wrong place, and re-running reconstruct -m will fix that. mboxutil can also remove the wrong directories. Use mboxutil -c to "recreate" the correct connection to the mailbox.
If you have changed the "default" partition, then it's likely that has caused the problem. As long as your database is fine, changing the default partition works fine, but ...
Early versions of Messaging didn't save the partition properly in LDAP. Removal of the db may have caused major confusion in your replacement database.
In the future, I would strongly recommend not removing the mboxlist files. That's really a disaster-recovery action, and I've not had a disaster bad enough to recommend doing that for many months.
# 9
Apologies for taking so long with the final update on this. Most of the dust has settled.
We were able to merge everyone's old inbox mail into their new ones (some accounts had new inboxes created during the recovery). My colleague wrote a script that uses the 'deliver' command to re-deliver their old mail. It came up as unread mail, but with the original dates/times correct. As for the people who couldn't see their other folders, we had them resubscribe to them (couldn't find a way to do this programmatically).
I know we have some physical paths (for inboxes) on our third partition that the database doesn't know about, and they need to be removed - I need to revisit this. As you said, this probably happened when we switched default partitions. I have to do some testing on this scenario.
Again, thank you very much for all your help.
Keith