Result set comparison

Hi all!

I wish to compare two result set's from different sources, the later data being populated from the previous one. Both the result sets will have some data which i wish to compare (extra data in both result set is present). Now in the later datasource changes can be made and records can be added and deleted or edited. I wish to logically figure out a algorithm that will assist me in identifying if a

1. record has been deleted.

2. extra record has been added.

3. record has been deleted and then added. (as there is grouping in the data and both resultsets interpret grouping in different manner, records deleted should be added to same group).

This problem can be sighted as:

database A and database B has column x in both which have same values in order.

Database A|Database B

column x|column x

1|D

2|1

3|2

4|3

|D

|4

D is a dummy record in Database B that implies a new group, while database A has a seperate column whose value changes for a new group. now being sure that no changes would occur in database A, i have to check out for cases and report if a record has been deleted from Database B, or a extra is added or has been deleted and added but not in same order, as in suppose 2 is deleted from database B and is added after 3.

I have been able to handle the editing part easily that if 1 is replaced with 6 i can detect it but if something is added or deleted, things go out of control.

please advice at the earliest

thanks in advance

Abhishek

[1586 byte] By [abhijaan] at [2007-9-27 20:00:21]
# 1

Actually I din't get the requirements part very clearly.

If you don't mind - please explain what exactly this is for.

The approach to the problem largely wil depend upon the context:

- Is it a web-app or an otherwise case?

- Is it mandatory to have two resultsets?

What I understood from your question is that you have to detect rows that have been added/modified/deleted from a database table. Why would you need two resultsets, then? Why not just one resultset with an extra column for the action that was performed on the row.

This answer is with only a partial understanding of the problem itself. If you think I din't understand the question itself - please let me know... with possibly a clearer picture.

axe_fx at 2007-7-6 23:49:18 > top of Java-index,Other Topics,Algorithms...
# 2

No, the case is not as you felt. I will try to make it more clear. Actually i have a web app that does its part and finally submits the order to the ERP database at my clients end. I am aware of the structure of ERP database, and am allowed to query it. Now the problem in hand is to check integrity of the data that i had submited which also resides in my application, as there is always a possiblity that it has been modified from the ERP. So i have to query the data that i had submitted from both the databases and compare for validate the integrity.

i hope this makes the problem more clear

regards

abhishek

abhijaan at 2007-7-6 23:49:18 > top of Java-index,Other Topics,Algorithms...
# 3

Starting with 'source' and 'destination'

First define equality between two records. For example do all fields exactly match? Or all fields except the creation timestamp? Etc.

When I mention "compare" below it means preforming a equality comparison between two records.

The following can be made for efficient, but with more complex code if the source and destination are sorted.

>

> 1. record has been deleted.

Compare each record in the destination to the source. If the record is not found in the source it has been deleted. Collect this record into a "deleted" array.

> 2. extra record has been added.

Compare each record in the source to the destination (this is reversed from the above.) If the record is not in the destination then it has been added. Collect this record into the "added" array.

You didn't mention updates but you would do that here as well. Obviously that complicates equality between two records.

> 3. record has been deleted and then added. (as there

> is grouping in the data and both resultsets interpret

> grouping in different manner, records deleted should

> be added to same group).

This is not necessarily possible. It depends on the data in the record.

Say the record has the following fields: "customer name", "customer number".

There is absolutely no way to tell, using just those attributes, that a record has been deleted and then added.

Something has to change for it to be known.

I have no idea what you means by 'groups' above but say you have the following: "customer name", "customer number" and "group".

Now for an existing record "group" is non-null. For a new record then "group" is null (or a default value.)

So after doing the add from step two, any record in the "add" array that had a null group would be checked to see if matched an existing record with an existing group (this means a different equality comparison must be done.) If it matches it means it is a add and then delete. If it doesn't it is just an add.

jschell at 2007-7-6 23:49:18 > top of Java-index,Other Topics,Algorithms...
# 4

In effort to give even a more clearer picture, i will reiterate, that the result sets are from different databases (the design of both databases is polls apart). Now I query number of tables from each database and form the result sets A and B. The result set A has 15 columns and result set B has 20 columns, all the 15 columns of A are not there in B. I need to compare just column a1 of result set A with column b1 of result set B(the comparison here essentially being equality check, 100% string.equals(str) types to be more precise). Also the records in both the result sets can be assumed to be sorted in same order, except for one exception of the "dummy" record, which increase the number of records in the result set B compared to result set A. This means removing all dummy records from result set B makes both the columns a1 and b1 identical.

Rule for number of Records:

records in A + number of dummy records in B = records in B

a1 | b1

=========

str1 | dummy

str2 | str1

str3 | str2

str4 | dummy

str2 | str3

str5 | str4

str2 | str2

str6 | str5

str1 | dummy

..... | str2

..... | str6

..... | str1

> Compare each record in the destination to the source.

> If the record is not found in the source it has been

> deleted. Collect this record into a "deleted"

> array.

>

Sorry about not mentioning it earlier, but the sequence of the records in result sets is important. The values in column a1 and b1 are not unique, they may repeat a number of times. So i can't simply check for this particular value in complete result set, as there may be a case in which the third record "str2" has been deleted from result set B, but on checking for "str2" in result set, it will get the "str2" after the last "dummy" record and show "str2" to be present there, hence not detecting any change.

> > 3. record has been deleted and then added. (as

> there

> > is grouping in the data and both resultsets

> interpret

> > grouping in different manner, records deleted

> should

> > be added to same group).

>

> This is not necessarily possible. It depends on the

> data in the record.

> There is absolutely no way to tell, using just those

> attributes, that a record has been deleted and then

> added.

>

> Something has to change for it to be known.

Yes this is data dependent, if a record is deleted from database B, it can never be same unless added to same group. There are attributes associated with each record which are unique and are known only at run time in my application which submits this data to database B, so these attributes cant be added in the record.

Also the group here stands for a logical group like i order items, the items ordered can be grouped as category 1, category 2 ,ecetra and all these groups divisions in resultset B are identified by record "dummy". for result set A there is seperate column which specify the group to which the item belong.

abhijaan at 2007-7-6 23:49:18 > top of Java-index,Other Topics,Algorithms...
# 5

Instead of "even more clear" I would say "still just as obscure and confusing". In other words I still have very little idea what you are trying to do. In the hope that it may help you I'll just give you a quick algorithm that dates back to the earliest days of computing, when people kept their data on magnetic tapes and needed ways of merging or comparing two files.

1. Start at the beginning of result sets A and B.

2. Read one record from A and one record from B.

3a. If A is "less than" B then the A record is not in B (it was "added"); read another record from A.

3b. Else if A is "greater than" B then the B record is not in A (it was "deleted"); read another record from B.

3c. Else (A is "equal to" B) read another record from A and another record from B.

Repeat step 3 until you reach the end of both result sets. Note that I haven't included the logic necessary to handle the case where you reach the end of result set A before the end or result set B, but that shouldn't be hard to add in.

DrClap at 2007-7-6 23:49:18 > top of Java-index,Other Topics,Algorithms...
# 6

> Instead of "even more clear" I would say "still just

> as obscure and confusing". In other words I still

> have very little idea what you are trying to do.

Actually for me it is now even more confusing than before. But when I read it earlier in the day I was hoping it was just me.

jschell at 2007-7-6 23:49:18 > top of Java-index,Other Topics,Algorithms...
# 7

> 1. Start at the beginning of result sets A and B.

>

> 2. Read one record from A and one record from B.

>

> 3a. If A is "less than" B then the A record is not in

> B (it was "added"); read another record from A.

>

> 3b. Else if A is "greater than" B then the B record is

> not in A (it was "deleted"); read another record from

> B.

>

> 3c. Else (A is "equal to" B) read another record from

> A and another record from B.

This is the algorithm i am following at the moment, but due to number of possiblities that i am trying to handle, 3a and 3b mentioned above do not go as intended(this is mainly due to the specific Grouping mechanism). Though it now seems that i will need more data to be posted to the ERP(the only thing i was trying to avoid) to make grouping more visible and comparison less complicated.

anyways thanks for the helping hand

abhijaan at 2007-7-6 23:49:18 > top of Java-index,Other Topics,Algorithms...