Caching without Stale Objects

Hi,

I'm looking at caching various objects in memory but I cannot work out how it is possible to overcome the problem of child objects becoming stale in the cache:

For example, say I had an Order object in cache with a reference to a Contact object. However, the Contact object is also used in other parts of the system so can be updated independently.

Therefore when I next pull the Order object from the cache, its child Contact object may have become stale.

About the only way I can think to get around this is to store something like a 'Contact ID' in the cache and then, each time an Order is requested, instantiate a new Order Transfer object, pulling the Contact from its own cache. However, this is obviously slower because it's creating a new object with each request to the cache, and similarly will have a higher memory consumption because I'm creating new instances each time.

Any thoughts/pointers would be greatly appreciated as I can't seem to find any information on this issue.

pauldeasona at 2007-10-2 20:41:43
# 1

what do you mean by "stale", exactly?

if your Order object holds a reference to a Contact (say, its id) rather than the Contact object itself, the two can happily vary independently with no impact. that seems to be where you're going with the "contact id" idea. there's not even any need to have separate caches for Contact and Order, since both are just objects. if they have a common supertype with a method for retrieving a unique reference, you can just cache the whole shebang in one map, and objects can get their referenced objects as they need to
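A minimal sketch of the single-map approach, assuming a hypothetical `Cacheable` supertype whose `cacheKey()` returns a unique key such as `"Contact:7"`. The `Order` stores only its contact's id and looks the `Contact` up in the shared map each time it is asked, so it always gets whichever instance is currently cached. All names here are illustrative, not from any library.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical common supertype: anything cached can report a unique key.
interface Cacheable {
    String cacheKey();
}

// One map for every kind of object, keyed by strings such as "Contact:7".
final class ObjectCache {
    private final Map<String, Cacheable> entries = new ConcurrentHashMap<>();

    void put(Cacheable value) {
        entries.put(value.cacheKey(), value);
    }

    // Returns null when nothing is cached under the key.
    <T extends Cacheable> T get(String key, Class<T> type) {
        return type.cast(entries.get(key));
    }
}

final class Contact implements Cacheable {
    final long id;
    final String address;

    Contact(long id, String address) {
        this.id = id;
        this.address = address;
    }

    static String keyFor(long id) {
        return "Contact:" + id;
    }

    @Override
    public String cacheKey() {
        return keyFor(id);
    }
}

final class Order implements Cacheable {
    final long id;
    private final long contactId; // the key only, never the Contact itself
    private final ObjectCache cache;

    Order(long id, long contactId, ObjectCache cache) {
        this.id = id;
        this.contactId = contactId;
        this.cache = cache;
    }

    @Override
    public String cacheKey() {
        return "Order:" + id;
    }

    // Looked up on every call, so a replacement Contact is picked up automatically.
    Contact getContact() {
        return cache.get(Contact.keyFor(contactId), Contact.class);
    }
}
```

Putting a newer `Contact` under the same key is all it takes: the next `order.getContact()` returns the new instance, and nothing holds on to the old one.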

georgemca at 2007-7-13 23:25:02 > top of Java-index,Other Topics,Patterns & OO Design...
# 2

> Hi,
>
> I'm looking at caching various objects in memory but I cannot work out how it is possible to overcome the problem of child objects becoming stale in the cache:

It's not a trivial problem. That's why it's usually easier to use an existing caching solution, say a caching library such as OSCache or Ehcache, or an application server cache such as JBoss's.

> For example, say I had an Order object in cache with a reference to a Contact object. However, the Contact object is also used in other parts of the system so can be updated independently.

Ok.

> Therefore when I next pull the Order object from the cache, its child Contact object may have become stale.

I'm not sure I follow. Java passes object references by value, so every reference to a Contact points at the same underlying object. If you modify the Contact through any one of those references, every other reference that points to it will 'see' the change (at least within the same JVM).

More generally, however, someone could load or re-load a Contact record from the database, creating a second copy. These problems are hard to detect. You could override the setter in your cache to notice if another copy of an object is already there. What action to take is up to you. Hibernate, for example, would compare the two snapshots in its cache and issue SQL UPDATE statements as needed. It depends on how you want your cache to behave.
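One possible reading of "notice if another copy is already there", sketched under the assumption that the cache should keep the instance it already holds and copy the newer state into it, so references already handed out stay valid. `IdentityCache`, `Mergeable` and `copyFrom` are hypothetical names; this is not how Hibernate works internally, just one policy you might choose. It is a single-threaded sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical contract: a cached object with a key that can absorb newer state.
interface Mergeable<T> {
    Object key();

    void copyFrom(T newer);
}

// Single-threaded sketch: put() notices when a copy is already cached.
final class IdentityCache<T extends Mergeable<T>> {
    private final Map<Object, T> entries = new HashMap<>();

    // If an instance with the same key exists, copy the new state into it and
    // return the existing instance, so references already handed out stay current.
    T put(T value) {
        T existing = entries.putIfAbsent(value.key(), value);
        if (existing == null) {
            return value;
        }
        if (existing != value) {
            existing.copyFrom(value);
        }
        return existing;
    }

    T get(Object key) {
        return entries.get(key);
    }
}

final class Contact implements Mergeable<Contact> {
    private final long id;
    private String address;

    Contact(long id, String address) {
        this.id = id;
        this.address = address;
    }

    @Override
    public Object key() {
        return id; // boxed to Long, whose equals() compares the numeric value
    }

    @Override
    public void copyFrom(Contact newer) {
        this.address = newer.address;
    }

    String getAddress() {
        return address;
    }
}
```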

> About the only way I can think to get around this is to store something like a 'Contact ID' in the cache and then, each time an Order is requested, instantiate a new Order Transfer object, pulling the Contact from its own cache. However, this is obviously slower because it's creating a new object with each request to the cache, and similarly will have a higher memory consumption because I'm creating new instances each time.

Now it's sounding more and more like an O/R mapper (Hibernate, TopLink, etc.) with a cache. Why are you inventing your own?

> Any thoughts/pointers would be greatly appreciated as I can't seem to find any information on this issue.

You are welcome. See above.

- Saish

Saisha at 2007-7-13 23:25:02
# 3

Thanks for your input, guys. Let me clarify a bit. I will be using an existing caching framework like OSCache, but I don't think it solves my problem.

I would like the Order object to be a proper transfer object so that it holds a reference to Contact not just an int with the contact id. Otherwise, when I get to my JSP page I'm going to need to start making calls back to the factory to get the various Contact objects from it, which isn't good MVC.

My concern with holding a Java reference to the Contact object in the Order object (that is cached) is that if somebody updated the Contact data, i.e. persisted it to the database, then my Order object would not know that the Contact data had been updated. I could cache the Contact objects in their own cache, and quite rightly, if I updated the Contact object itself, then the update would occur across the application. But what if the Contact object expires from the cache and is reloaded at a later time? At this point the Order object is pointing to an old version of Contact, but any updates etc. would be pointing at the newer Contact object.

Does that make sense?

pauldeasona at 2007-7-13 23:25:02
# 4

> ...
> Does that make sense?

Yes. There are a couple of ways that I have thought up in the past to deal with this. I'm not suggesting these are the only ways, but maybe they will get the ball rolling for you:

1. Define each DTO with two classes. The first is a package-private, struct-like class that has a bunch of references and no methods. The second is a wrapper around that class and contains the normal getters and setters. (If you think about it, this is pretty easy to refactor out from an existing DTO.) Then, when the contact data changes, you swap out the underlying struct-like Object. Everything else in the system only cares about the wrapper, and that reference doesn't change, but the data is now updated.

2. Do not put 'hard' links between the parent and child. The parent merely keeps the key for the child and pulls it from the cache on calls for the child. I think you alluded to this strategy in your original post. There is actually a really nice side benefit to this, in that you don't load the child unless it's needed. If you have a largely hierarchical system, this can greatly improve the perceived performance of the system. This should not be slow unless you have a slow cache or are doing something wrong.
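A sketch of option 2, assuming a hypothetical `ContactCache` backed by a loader function that stands in for the database. Nothing is loaded until `getContact()` is first called, and if the Contact is evicted, the next call simply reloads it, so the Order never holds on to a stale copy.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongFunction;

final class Contact {
    final long id;
    final String name;

    Contact(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Loads a Contact on first request and serves the cached instance afterwards.
final class ContactCache {
    private final Map<Long, Contact> entries = new ConcurrentHashMap<>();
    private final LongFunction<Contact> loader; // hypothetical stand-in for the database

    ContactCache(LongFunction<Contact> loader) {
        this.loader = loader;
    }

    Contact get(long id) {
        return entries.computeIfAbsent(id, key -> loader.apply(key));
    }

    void evict(long id) {
        entries.remove(id);
    }
}

final class Order {
    private final long contactId; // the key only: no 'hard' link to the child
    private final ContactCache contacts;

    Order(long contactId, ContactCache contacts) {
        this.contactId = contactId;
        this.contacts = contacts;
    }

    // The child is fetched only when asked for, and always through the cache.
    Contact getContact() {
        return contacts.get(contactId);
    }
}
```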

dubwaia at 2007-7-13 23:25:02
# 5

Thanks for your suggestions.

In terms of point (1), how would the wrapper know that the underlying struct class had changed? That is, how would it know when to update it? Possibly I'm not understanding this properly, but I'm not sure how this is much different from holding a normal reference to the Contact. Isn't the risk the same? If the Contact expires from its own cache, any future updates will not affect the underlying struct, because they'll be creating a new instance of Contact.

(2) I do like this in many ways and have considered it, but my concern comes from the fact that it's not really MVC, because the DTOs are making calls back and forth to the factory/cache. Then if I wanted to do something like use Web Services and send the DTO as XML, it wouldn't work, because the DTO isn't really holding all its data.

Thanks again, if you can help me with what you mean in point (1) that'd be great.

pauldeasona at 2007-7-13 23:25:02
# 6

> Thanks for your suggestions.
>
> In terms of point (1), how would the wrapper know that the underlying class struct had changed? i.e. how would it know when to update it? Possibly I'm not understanding this properly but I'm not sure how this is much different than holding a normal reference to the Contact. Is the risk not the same in that if the Contact expires from its own cache then any future updates will not affect the underlying struct because they'll be creating a new instance of Contact.

My fault, I forgot that I can't beam thoughts into your head.

The children would be in a cache. But when the cache is refreshed, the data being provided is only the internal struct Object. The updated structs are swapped into the Objects that are already in the cache. The parent still refers to the same child instance and it also sees the update.

Pro/con: this can also cause any reference that is currently in use to appear to change at any moment. This might be really good, really bad, or both. Because the Object's data can change at any point, you need to constantly consider how this might cause failures. I created such a design; it was implemented, and I was told that it was super-fast and that the design did not need to be modified much.
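A sketch of the wrapper/struct design, under a few assumptions of mine: `ContactData` is the package-private struct, `Contact` is the wrapper that the rest of the system (including `Order`) references, and a refresh hands the cache a new struct, which it swaps into the existing wrapper. Making the struct immutable and the field `volatile` is my addition, aimed at the pro/con above: a reader sees either the old struct or the new one, never a mix, and can call `snapshot()` to read several fields from one version.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Package-private, struct-like holder: data only. Immutable, so a reader never
// observes a half-updated Contact.
final class ContactData {
    final String name;
    final String address;

    ContactData(String name, String address) {
        this.name = name;
        this.address = address;
    }
}

// The wrapper the rest of the system holds. Its identity never changes; only the
// struct inside it is swapped when fresh data arrives.
final class Contact {
    private volatile ContactData data; // volatile so other threads see the swap

    Contact(ContactData data) {
        this.data = data;
    }

    String getName() {
        return data.name;
    }

    String getAddress() {
        return data.address;
    }

    // Read the struct once when several fields must come from the same version.
    ContactData snapshot() {
        return data;
    }

    void swap(ContactData fresh) {
        data = fresh;
    }
}

// Caches wrappers. A refresh finds the existing wrapper and swaps its struct
// instead of creating a second Contact for the same id.
final class ContactCache {
    private final Map<Long, Contact> wrappers = new ConcurrentHashMap<>();

    Contact refresh(long id, ContactData fresh) {
        Contact wrapper = wrappers.computeIfAbsent(id, key -> new Contact(fresh));
        wrapper.swap(fresh);
        return wrapper;
    }

    Contact get(long id) {
        return wrappers.get(id);
    }
}

final class Order {
    final Contact contact; // an ordinary reference, but to the wrapper

    Order(Contact contact) {
        this.contact = contact;
    }
}
```

Because `Order` holds the wrapper rather than the struct, the swap is visible through the reference it already has.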

> (2) I do like this in many ways and have considered it but my concern comes from the fact it's not really MVC because the DTOs are making calls back and forth to the factory/cache. Then if I wanted to do something like use Web Services and send the DTO over an XML file it wouldn't work because the DTO isn't really holding all its data.

You could do all of this externally, by asking the parent explicitly for the child's key and then going to the cache, and that would resolve your concern, but what do you gain? You'd end up with a lot of redundant code, and it would actually be a violation of OO principles. It seems to me that one of the biggest misconceptions of OO is that an Object's methods must be implemented only with that Object's internal data. I say, do what works. The MVC pattern doesn't require the model to have any particular implementation. Its point is to separate the model from the way the data is rendered in views.

dubwaia at 2007-7-13 23:25:02
# 7

> I would like the Order object to be a proper transfer object so that it holds a reference to Contact not just an int with the contact id. Otherwise, when I get to my JSP page I'm going to need to start making calls back to the factory to get the various Contact objects from it, which isn't good MVC.

Actually that depends on usage. Just because someone pulls up a list of customers, it doesn't mean that they want the entire tree of customers moved across the wire.

> My concern with holding a Java reference to the Contact object in the Order object (that is cached) is that if somebody updated the Contact data i.e. persisted it to the database then my Order object would not know that the Contact data had been updated. I could cache the Contact objects in their own cache, and quite rightly if I updated the Contact object itself then the update would occur across the application. But what if the Contact object expires from the cache and is reloaded at a later time. At this point the Order object is pointing to an old version of Contact but any updates etc. would be pointing at the newer Contact object.

Are you basing this on actual business cases?

jschella at 2007-7-13 23:25:02
# 8

> The children would be in a cache. But when the cache is refreshed, the data being provided is only the internal struct Object. The updated structs are swapped into the Objects that are already in the cache.

Apologies, but I think I'm missing something. How does the cache for the struct know which objects to update when it is refreshed? The scenario I'm worried about is that my cache fills up and so one of the structs gets booted out. My parent object still has a reference to it, so it continues to use that. However, at a later time the struct is requested from the cache by something else and reloaded, and my parent object is still pointing to the old struct and has no knowledge that there is a new object about.

As I say, I may be misinterpreting this design, as I'm not sure what the major difference is between wrapping the Contact object in an inner class as opposed to just referencing it.

> The MVC pattern doesn't require the model have any particular implementation. Its point is to separate the model from the way the data is rendered in views.

Sure, that's fair enough, but then you run into issues: if you want to change the values on the DTO and pass it back for updates, you can't set them easily, because the objects don't hold the internal data. I guess the solution is to create two types of DTO, one of which is cache-aware?

Jschell:

> Actually that depends on usage. Just because someone pulls up a list of customers it doesn't mean that they want the entire tree of customers moved across the wire.

True, but in most cases I'll need the tree.

> Are you basing this on actual business cases?

Yep, this is an exact problem of mine. I have a generic set of Contacts which are used in different areas of the application so I need different objects that reference them to be aware when they're updated separately.

Thanks guys for your continued help.

pauldeasona at 2007-7-13 23:25:02
# 9

> Apologies but I think I'm missing something. How does the cache for the struct know which objects to update when it is refreshed?

The wrapper Objects are in the cache. So when the new struct comes in, it finds the appropriate wrapper and swaps out the struct.

> The scenario I'm worried about is that my cache fills up and so one of the structs gets booted out. However, my parent object still has a reference to it so continues to use that.

Well, it's probably a bad idea to boot the item from the cache if it's still in use. There are a couple of (non-exclusive) ways to deal with this. One is to use a memory-sensitive cache via SoftReferences and/or WeakReferences. Another: if an Object is in use but may not actually be needed and you want to boot it, you can null out the struct in the wrapper. Then, if it is needed again, the wrapper can request it again and reload it through the cache.
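A sketch of the "null out the struct" option, assuming a hypothetical per-object loader (a `LongFunction` standing in for a trip through the cache to the database). The wrapper keeps the struct behind a `SoftReference`, so the garbage collector may also drop it under memory pressure, and `release()` drops it explicitly. Either way the wrapper itself stays put, and the next access reloads the data.

```java
import java.lang.ref.SoftReference;
import java.util.function.LongFunction;

final class ContactData {
    final String address;

    ContactData(String address) {
        this.address = address;
    }
}

// A wrapper that can let go of its struct while still being referenced, and
// reloads it the next time it is needed.
final class Contact {
    private final long id;
    private final LongFunction<ContactData> loader; // hypothetical: goes via the cache to the DB
    private SoftReference<ContactData> ref;         // the GC may clear this under memory pressure

    Contact(long id, LongFunction<ContactData> loader) {
        this.id = id;
        this.loader = loader;
    }

    private synchronized ContactData data() {
        ContactData current = (ref == null) ? null : ref.get();
        if (current == null) { // never loaded, released, or cleared by the GC
            current = loader.apply(id);
            ref = new SoftReference<>(current);
        }
        return current;
    }

    String getAddress() {
        return data().address;
    }

    // Called instead of evicting the wrapper: the wrapper stays, the struct goes.
    synchronized void release() {
        ref = null;
    }
}
```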

> However, at a later time the struct is requested from the cache by something else and reloaded - however, my parent object is still pointing to the old struct and has no knowledge there is a new object about.

Again, deleting the Object from the cache while it is still in use seems kind of pointless. You aren't reclaiming any more memory than what a reference uses and it complicates things greatly.

dubwaia at 2007-7-13 23:25:02
# 10

I think that you need to sit down and really think about the semantics of the cache you are proposing to use. What do you want it for? What are its use cases? For example:

If it always has to reflect the exact data in the DB, and the data in the DB can change at any time by methods outside your control, then you can forget using caching altogether.

If your code is the only thing that is "allowed" to modify the data, then you already "know" when the data is changed, and so you also know when to update your cached DTOs.

If you take a different view: is it OK for a single request to see consistent (if not necessarily up-to-the-second) data? Then you may be able to get by with updating the DTOs as the request begins. In this case the cache exists on a "per-request" basis and is just used to lower the comms cost with the DB for the DTOs that this request needs.
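A minimal sketch of the per-request variant, assuming a hypothetical loader function for the database: the first lookup of an id within a request loads it, later lookups in the same request reuse that DTO, and a new cache is created for each request so nothing goes stale across requests. It is deliberately not thread-safe, on the assumption that one request is handled by one thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongFunction;

// A cache that lives for exactly one request. The first lookup of an id goes to the
// (hypothetical) loader; later lookups in the same request reuse that DTO, so the
// whole request works from one consistent view. Not thread-safe: one request, one thread.
final class RequestCache<T> {
    private final Map<Long, T> seen = new HashMap<>();
    private final LongFunction<T> loader;

    RequestCache(LongFunction<T> loader) {
        this.loader = loader;
    }

    T get(long id) {
        return seen.computeIfAbsent(id, key -> loader.apply(key));
    }
}
```

Discarding the cache at the end of the request is what keeps it honest: the next request starts empty and reloads from the DB.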

matfud

matfuda at 2007-7-13 23:25:02
# 11

> > Are you basing this on actual business cases?
>
> Yep, this is an exact problem of mine. I have a generic set of Contacts which are used in different areas of the application so I need different objects that reference them to be aware when they're updated separately.

You described an implementation not a use case.

A use case would be where accounting is updating the address at the same time that the call center does and this is something that really does happen. Moreover when it does happen it happens in such a way that the data from one source is more correct than the data from the other source. (If it is the same data it doesn't matter.)

jschella at 2007-7-13 23:25:02
# 12

Thanks for your continuing messages. OK, let me tackle each of these. I'll start with a use case, as hopefully that will clarify my issue:

1. I have two 'parent' DTO objects, Order and Invoice, both of which hold a reference to a Contact DTO object.

2. In some instances both Order and Invoice may be pointing to the same Contact.

3. So I keep Contact in its own cache so that I have a central point of access and update to it.

4. However, the Contact cache reaches its limit and wants to remove an entry of Contact so another one can be loaded in. It's using something like LRU but it removes a Contact object from the cache that an Order object in its own cache is pointing to.

5. This doesn't directly affect the Order as the object still exists. However, accounts then load up an Invoice which will hold a reference to the same Contact, so it calls the Contact cache which reloads the Contact from the database and puts it in the cache.

The problem I now have is that accounts can update the Contact object based on the invoice, but because the Order object is pointing to the old instance of the Contact, it's ignorant of any changes and carries on as normal with out-of-date information.

Ideally, as dubwai mentioned, the Contact cache would never remove the Contact object, so I could update this single instance. But how would the Contact cache ever be aware of whether its objects were in use elsewhere? I've tried to find out whether you can get the number of references to an object, but came to a dead end.

pauldeasona at 2007-7-13 23:25:02
# 13

BTW.... I appreciate the above is still more of an implementation so I guess the use-case would be that accounts are updating invoice contact information. Then at a later time customer services send out order details to the contact but it's the out-of-date contact information.

So the problem as above can arise. Thanks!

pauldeasona at 2007-7-13 23:25:02
# 14

> Ideally as dubwai mentioned the Contact cache would never remove the Contact object so I could update this single instance. But how would the Contact cache ever be aware if its objects were in use elsewhere? I've tried to look up to see if you can find the number of references to an object but came to a dead-end.

Look at the Reference APIs: WeakReference, SoftReference, and PhantomReference. These depend on the garbage collector but work well in practice, I find.

WeakReference is the easiest to understand. Once no 'strong' (normal) references point to an Object, the WeakReference will not keep it from being garbage collected.

This is a good article. A little old but still useful: http://java.sun.com/developer/technicalArticles/ALT/RefObj/

What you can do is have weak (or soft) references to all the objects in the cache. Keep strong references to those Objects that were most recently used (or whatever) and let the GC remove all the items that are old and not in use.
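A sketch of that combination, assuming ids are `long` and values come from a hypothetical loader function. Every loaded object is reachable from the cache only through a `WeakReference`, while a small access-ordered `LinkedHashMap` holds strong references to the most recently used ones. An object that an `Order` still references cannot be collected, so the cache keeps handing out that same instance instead of loading a second copy, which is the situation described in post #12.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongFunction;

// Every loaded object is reachable from the cache only through a WeakReference, so
// the cache alone never keeps it alive. A small LRU map holds strong references to
// the most recently used objects so they are not collected the moment callers let go.
final class WeakLruCache<V> {
    private final Map<Long, WeakReference<V>> all = new HashMap<>();
    private final Map<Long, V> recent;
    private final LongFunction<V> loader; // hypothetical: loads from the database

    WeakLruCache(int strongCapacity, LongFunction<V> loader) {
        this.loader = loader;
        // accessOrder = true: iteration runs from least to most recently used.
        this.recent = new LinkedHashMap<Long, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, V> eldest) {
                return size() > strongCapacity; // drops only the strong reference
            }
        };
    }

    synchronized V get(long id) {
        WeakReference<V> ref = all.get(id);
        V value = (ref == null) ? null : ref.get();
        if (value == null) { // never loaded, or no longer referenced and collected
            value = loader.apply(id);
            all.put(id, new WeakReference<>(value));
        }
        recent.put(id, value); // marks it as most recently used
        return value;
    }
}
```

Cleared references are left in `all` here; a `ReferenceQueue` could be used to purge those entries.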

dubwaia at 2007-7-13 23:25:02
# 15
dubwai, a BIG thanks for that information. Using weak references, I think I can go away and explore some ideas I had previously dismissed and hopefully find the solution I'm looking for. Thanks very much for your help, and to everyone else who contributed.
pauldeasona at 2007-7-21 1:50:53
# 16

> Thanks for your continuing messages. Ok let me tackle each of these. I'll start with a use case then as hopefully that will clarify my issue:
>
> 1. I have two 'parent' DTO objects Order and Invoice both of which hold a reference to a Contact DTO object.
> 2. In some instances both Order and Invoice may be pointing to the same Contact.
> 3. So I keep Contact in its own cache so that I have a central point of access and update to it.
> 4. However, the Contact cache reaches its limit and wants to remove an entry of Contact so another one can be loaded in. It's using something like LRU but it removes a Contact object from the cache that an Order object in its own cache is pointing to.
> 5. This doesn't directly affect the Order as the object still exists. However, accounts then load up an Invoice which will hold a reference to the same Contact, so it calls the Contact cache which reloads the Contact from the database and puts it in the cache.
>
> The problem I now have is that accounts can update the Contact object based on the invoice but because the Order object is pointing to the old instance of the Contact it's ignorant to any changes so carries on as normal with out-of-date information.

Still implementation not a use case.

You are describing a possible scenario based on the code that you are writing.

You are not describing the way in which the users are actually expected to use this.

(There is no point in coding for scenarios which never exist for the users.)

So what actual user process will be impacted?

For example does the customer center package the order as soon as they take it using the information on the screen?

If yes, then what happens if accounting updates the contact after the order was taken but before it was packaged?

Obviously that sort of problem would have nothing to do with your cache but it demonstrates where a possible problem lies and provides a starting point for what solution is needed.

jschella at 2007-7-21 1:50:53
# 17

> BTW.... I appreciate the above is still more of an implementation so I guess the use-case would be that accounts are updating invoice contact information. Then at a later time customer services send out order details to the contact but it's the out-of-date contact information.

A cache must provide a way to update the instances in the cache itself. This can either be via a reload or via updating the instance directly.

How this is done still depends on the specifics of the use case.

Perhaps your problem is that you are not updating the instance but instead are creating a new one. That, of course, is a bad idea, as it defeats the purpose of the cache, which holds a single instance for each logical entity.

jschella at 2007-7-21 1:50:53
# 18

You are talking about two different things here. They both look like a cache but I wouldn't exactly call them both a cache.

1. How to ensure multiple objects or departments are pointing to the same data.

2. How to cache some data for easy retrieval.

#1 is a multiton. I have implemented one myself using WeakReferences and a Map. So if one department is using the Contact object/data, all departments that request Contact data will receive the same object. Now you no longer have an issue of one department updating the info but another department not knowing about it.

However, you must then consider how quickly other departments are alerted to this. If, for instance, the street changes and other departments see it before the city and state change, that would be a problem. Nevertheless, I'm sure you can work around that.

Alternatively, and with respect to the last stated issue, you can use a listener: any Contact object handed out stays in touch with its source, and if any info changes, the source fires an event notifying all Contact object holders that they need a new Contact object.
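A minimal, single-threaded sketch of the listener variant, with hypothetical `ContactSource` and `ContactListener` types. Each update publishes a complete, immutable `Contact`, which also addresses the street-before-city concern: a holder receives the whole new address at once rather than field by field.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Immutable: an update always replaces the whole Contact, never one field at a time.
final class Contact {
    final long id;
    final String address;

    Contact(long id, String address) {
        this.id = id;
        this.address = address;
    }
}

// Hypothetical callback: the holder's copy is out of date, here is the new one.
interface ContactListener {
    void contactChanged(Contact fresh);
}

// The single source of Contact data; holders subscribe and are told about changes.
final class ContactSource {
    private final Map<Long, Contact> contacts = new ConcurrentHashMap<>();
    private final Map<Long, List<ContactListener>> listeners = new ConcurrentHashMap<>();

    Contact subscribe(long id, ContactListener listener) {
        listeners.computeIfAbsent(id, key -> new CopyOnWriteArrayList<>()).add(listener);
        return contacts.get(id);
    }

    void unsubscribe(long id, ContactListener listener) {
        listeners.getOrDefault(id, Collections.emptyList()).remove(listener);
    }

    void update(Contact fresh) {
        contacts.put(fresh.id, fresh);
        for (ContactListener listener : listeners.getOrDefault(fresh.id, Collections.emptyList())) {
            listener.contactChanged(fresh);
        }
    }
}

final class Order implements ContactListener {
    private Contact contact;

    // Factory method, so 'this' is not handed out from inside a constructor.
    static Order forContact(long contactId, ContactSource source) {
        Order order = new Order();
        order.contact = source.subscribe(contactId, order);
        return order;
    }

    @Override
    public void contactChanged(Contact fresh) {
        contact = fresh;
    }

    Contact getContact() {
        return contact;
    }
}
```

Listeners are held strongly here, so holders should call `unsubscribe` when they are done (or be tracked through weak references); otherwise the source keeps them alive.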

I started with the pure multiton technique, but more and more I am finding that it's nice to know the data has changed, not just be using changed data unawares.

_dnoyeBa at 2007-7-21 1:50:53