Consumer acks are missing 825 error

Hello,

I receive the following error:

Message: slapd-data ERROR : 3855:[10/Aug/2006:13:01:01 +0200] - ERROR8320 - Repl. Transport - conn=-1 op=-1 msgId=-1 - [S] End Failed with response: Consumer acks are missing 825

Any idea?

Thank you
[269 byte] By [Marc@Flower] at [2007-11-26 9:24:34]
# 1
This message appears when a replication session fails to end properly, often because the timeout was too short in some early releases of Directory Server 5.2. Have you tried upgrading to Directory Server 5.2 patch 4?

Ludovic.
ludovicp at 2007-7-7 0:00:13 > top of Java-index,Web & Directory Servers,Directory Servers...
# 2
I have been noticing the same errors appearing occasionally in my error logs. I am running 5.2 patch 4. Any other ideas about what may cause this?

Mike
mdpiot at 2007-7-7 0:00:13
# 3
I am also having this issue in DS 5.2 r4 across a number of my masters. It occurs nightly, right about the time the db2ldif backup completes. I don't know whether that is related, though. What could cause this, and how can it be prevented?
chadmjohn at 2007-7-7 0:00:13
# 4

> I am also having this issue in DS 5.2 r4 across a
> number of my masters. It occurs nightly, right about
> the time the db2ldif backup completes. I don't know
> if that is related though.
>
> What could cause this and how can it be prevented?

When a supplier sends a replication update to a consumer, it starts a replication timeout timer and waits for the consumer to acknowledge proper reception of the data before sending the next update. This acknowledgement is not sent immediately; it is delayed by the time the consumer needs to apply the received data. Exceeding the timeout (60s by default, if I remember correctly) does not stop replication or leave it broken. It simply terminates that specific replication session, which is indicated by ERROR<8320> ... Consumer acks are missing (825). Immediately thereafter, a new replication session is established and all data is retransmitted (which may include some data already sent in the previous session).
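
To make the behaviour described above concrete, here is a small toy model in Python. It is only a sketch of the idea, not how Directory Server is implemented: the function name replicate, the per-update "apply time" callables and the 60-second value (taken from the recollection above) are all assumptions made for illustration. A session ends as soon as one acknowledgement would arrive later than the timeout, and the next session resends whatever has not been acknowledged yet.

```python
ACK_TIMEOUT_S = 60.0  # assumed default, as recalled in the post above


def replicate(updates, apply_times_by_session, timeout_s=ACK_TIMEOUT_S):
    """Toy model: run sessions until every update has been acknowledged.

    apply_times_by_session is a list of callables, one per session, each
    returning how long (in seconds) the consumer needs to apply an update.
    Returns (sessions_run, sessions_ended_by_timeout, still_pending).
    """
    pending = list(updates)
    sessions = 0
    timeouts = 0
    for apply_time in apply_times_by_session:
        sessions += 1
        acked = 0
        for update in pending:
            if apply_time(update) > timeout_s:
                # The ack arrives too late: this session ends with the
                # "Consumer acks are missing" error.
                timeouts += 1
                break
            acked += 1
        # The next session resends everything not yet acknowledged.
        pending = pending[acked:]
        if not pending:
            break
    return sessions, timeouts, pending


# Example: the consumer is busy for 90s while applying "u2" in the first
# session, then behaves normally in the second one.
busy = lambda u: 90.0 if u == "u2" else 0.5
normal = lambda u: 0.5
print(replicate(["u1", "u2", "u3"], [busy, normal]))
```

The point of the model is that one slow moment on the consumer costs one session and one error line, and nothing is lost: the second session delivers the remaining updates.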

If you can rule out any network/connection-related problem, this is most likely caused by a TEMPORARY inability of the consumer to process the replication updates in a timely fashion; otherwise the error would appear very often. This temporary inability may have many causes. Below are some (but surely not all) possible reasons:

- consumer (system / DS instance) is/was down, busy, or in a hung state

- consumer DS instance is/was too slow to process the operations sent through replication - most of the time caused by high etime operations / unindexed searches on the consumer

- other applications utilizing/blocking the consumer system

You need to analyze the access/error logs of the affected supplier/consumer instance and the OS related log files (/var/adm/messages, etc.) of the consumer system to find out what exactly happened at the time those messages appeared.
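
As a starting point for that analysis, a short Python sketch like the one below can pull the timestamps of the "Consumer acks are missing" lines out of an error log and count them per hour of day, which makes it easy to see whether they line up with a nightly job. The line format is assumed to match the error quoted in the first post; the function names and the log file path in the usage comment are made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Timestamp as it appears in the quoted error: [10/Aug/2006:13:01:01 +0200]
STAMP = re.compile(r"\[(\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")
MARKER = "Consumer acks are missing"


def missing_ack_times(lines):
    """Return the timestamp of every line that reports missing consumer acks."""
    times = []
    for line in lines:
        if MARKER not in line:
            continue
        match = STAMP.search(line)
        if match:
            # %b expects English month abbreviations (C/English locale).
            times.append(datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z"))
    return times


def count_by_hour(times):
    """Count occurrences per hour of day (in the log's own time zone)."""
    return Counter(t.hour for t in times)


# Usage (the path is hypothetical, adjust it to your instance):
#   with open("/path/to/slapd-instance/logs/errors") as fh:
#       print(sorted(count_by_hour(missing_ack_times(fh)).items()))
```

If most hits fall into the same hour as a backup or another batch job, check the consumer's access log for high etime operations in that window.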

A db2ldif run that keeps the consumer busy may well be a good candidate for causing this. However, as noted before, you usually don't need to worry, as the replication protocol is able to deal with it.

stefanwo at 2007-7-7 0:00:13
# 5
We have a similar problem. The root cause is that consumers are forced to rebuild the CoS cache whenever a CoS template or definition is updated, which increases replication delay. Can anything be done about it?
mikenepomny at 2007-7-7 0:00:14
# 6

We have also seen a similar issue recently: we are getting multiple instances of this error in the logs, and replication has slowed to an absolute crawl. At times updates take up to 2 hours to propagate between servers. There is also an associated degradation in performance for updates being written to our suppliers, although search performance on our consumers seems to be relatively stable.

I have opened a support call with Sun, but have not had a resolution yet.

oztrich at 2007-7-7 0:00:14
# 7
Have you resolved your problem with replication delay?
mikenepomny at 2007-7-7 0:00:14
# 8
Directory Server 6.0 contains several improvements with regard to replication, replication delays and timeouts. It is available now, and DS 6.1 should be available very shortly.

Regards,

Ludovic.
ludovicp at 2007-7-7 0:00:14
# 9

No resolution as yet, although I am led to believe that it is related to an issue with the internal replication ID. I will update with more details when I have them.

Sadly we do not have the option of using DS6 at the moment as we are running on AIX. And it's AIX5.1 at that, so we are constrained even further to DS5.2 patch 2. So I've got to fix it.... :-(

oztrich at 2007-7-7 0:00:14
# 10

It looks like we've finally managed to resolve this in our environment, although we took the drastic step of rebuilding the live servers completely which might not be an option for everyone!

If a complete rebuild is not an option for you, the following procedure should work:

1) Delete all of the existing replication agreements

2) Re-initialise a supplier by exporting data without the -r option, and then re-import.

3) Export the data again, this time using the -r option.

4) Use this 2nd export to re-initialise all of the other servers.

5) Re-create replication agreements.

Many thanks to the Sun engineer who helped us work through this!

Our decision to rebuild was taken because we needed to move our suppliers onto different servers, and because the "consumer acks" error message can potentially be caused by corrupted changelog replica ID numbers. The environment was built before I started supporting it, so a clean start was the only way I could ensure the replicas were built correctly.

If the procedure detailed above does not work, a rebuild may be the only option. In our case replication had become such a problem (impacting overall performance of the Directory Servers as well as causing application impact) that we had little choice.

Thanks,

Mark.

oztrich at 2007-7-7 0:00:14