Autoinc function

Hello there,

I need to code a function that returns the value of the next id field

in a table(something like sequence.nextVal does in Oracle).

This might look pretty easy but there are many different ways to do it.

I'll list bellow some I've thought of:

1) I create for each table that has an autoinc field another table with one field that stores the actual id value(somehow stupid simulation of a sequence)

2) I create a table and for each table that has an autoinc field I add a field that stores the actual id value to that table

This must be accompanied by a class that provides the functionality

(like getVal, getNextVal and why not setVal).

Multiple ways of doing the things as well:

1) The value is updated/read directly from the table(on getVal a SELECT, on getNextVal an UPDATE and a SELECT will be executed)

2) I use a private member that stores the value. That member is set within the contructor to the maximum value of the autoinc field in the associated table and the methods will read/update that value. In this case there is no requirement for a table to store the value but some problems might arise.

Hmm, I can't say that I am decided, so maybe some of you can point me to a better solution or tell me what's wrong with any of the ideas presented above or just give me some advice.

Speak out,

Thanks

PS: I need this because there is no standard solution in db engines for such a thing and I need my app run on many db engines.

[1546 byte] By [roibma] at [2007-9-28 1:37:11]
# 1

This may work but it depends upon it requirement.

Problem with autoinc it acts neatly on int field but not with

varchar field at least in arrangement if you dont care because

alphabetically ' 10' < ' 1' so your record after 9 will not be 10

if you are using varchar to avoid this left pad the key with leading zeroes.

But that is another issue anyway you are only interested to yield unique values everytime you add record. One approach I use is find the max as in x = SELECT MAX(ID) FROM TABLE and use x + 1 as the new key;

Though database stored genkeys are good in other cases such as if youve got prefixing methodology as in PO0001 .PO99. where PO is a prefix.

reynoldmedesa at 2007-7-7 21:10:49 > top of Java-index,Other Topics,Algorithms...
# 2

> Hello there,

>

> I need to code a function that returns the value of

> the next id field

> in a table(something like sequence.nextVal does in

> Oracle).

> This might look pretty easy but there are many

> different ways to do it.

> I'll list bellow some I've thought of:

> 1) I create for each table that has an autoinc field

> another table with one field that stores the actual id

> value(somehow stupid simulation of a sequence)

> 2) I create a table and for each table that has an

> autoinc field I add a field that stores the actual id

> value to that table

Second one is better. The speed of the look up is going to be trivial compared to the java to database and back again time, so you might as well put it in a single table.

Naturally from the database point of view, it would be better if this was maintained by triggers on the table itself. But if you don't have a id function then the database isn't likely to have triggers either.

>

> This must be accompanied by a class that provides the

> functionality

> (like getVal, getNextVal and why not setVal).

> Multiple ways of doing the things as well:

> 1) The value is updated/read directly from the

> table(on getVal a SELECT, on getNextVal an UPDATE and

> a SELECT will be executed)

> 2) I use a private member that stores the value. That

> member is set within the contructor to the maximum

> value of the autoinc field in the associated table and

> the methods will read/update that value. In this case

> there is no requirement for a table to store the value

> but some problems might arise.

Nope. Unless you are using J2EE or can otherwise guarantee access will only be via your classes you would have to put it in the database.

>

> PS: I need this because there is no standard solution

> in db engines for such a thing and I need my app run

> on many db engines.

There are solutions that do not require the database.

A "GUID" works if you do not need a field that is ordered. You can search the forums for information on "GUID" and the algorithm to produce it.

You can create a ordered field by using a combination of ip address, timestamp and an incrementing counter in the application. This is not guaranteed to be unique, but if you use a primary key and catch duplicate errors you can regen to get around it. And it will seldom occur.

jschella at 2007-7-7 21:10:49 > top of Java-index,Other Topics,Algorithms...
# 3

Hello,

> Second one is better. The speed of the look up is

> going to be trivial compared to the java to database

> and back again time, so you might as well put it in a

> single table.

Well, if there are not 1000's of tables that is ok.

If there are 100's I don't know which is better, which could be the fastests? a table with 100's of fields or 100's of tables with one field? This is not a problem for entry level database engines, more likely for big database engines like oracle.

If there are under 100 tables then 2) looks better, I agree.

> Naturally from the database point of view, it would be

> better if this was maintained by triggers on the table

> itself. But if you don't have a id function then the

> database isn't likely to have triggers either.

And then, there might be problems with triggers.

DO all the triggers work the same for all database engines?

> > This must be accompanied by a class that provides

> the

> > functionality

> > (like getVal, getNextVal and why not setVal).

> > Multiple ways of doing the things as well:

> > 1) The value is updated/read directly from the

> > table(on getVal a SELECT, on getNextVal an UPDATE

> and

> > a SELECT will be executed)

> > 2) I use a private member that stores the value.

> That

> > member is set within the contructor to the maximum

> > value of the autoinc field in the associated table

> and

> > the methods will read/update that value. In this

> case

> > there is no requirement for a table to store the

> value

> > but some problems might arise.

> Nope. Unless you are using J2EE or can otherwise

> guarantee access will only be via your classes you

> would have to put it in the database.

I think I can guarantee access only from within my classes but there might still be problems.

I am thinking mainly about concurrent access, the application should be able to host many concurrent users(let's say there should be really no problem when I get 100 concurrent users, and should be able(the app) to handle even more without any problems)

> There are solutions that do not require the database.

>

> A "GUID" works if you do not need a field that is

> ordered. You can search the forums for information on

> "GUID" and the algorithm to produce it.

>

> You can create a ordered field by using a combination

> of ip address, timestamp and an incrementing counter

> in the application. This is not guaranteed to be

> unique, but if you use a primary key and catch

> duplicate errors you can regen to get around it. And

> it will seldom occur.

To be honest, these solutions don't look that professional for me. I think that by using int id's the lookup based on id will be much faster than by using a [var]char id's, and beside will take less space which is important when you have zillions of records in a table or when you want to cache entire tables in memory.

thanks a lot for your ideas :)

roibma at 2007-7-7 21:10:49 > top of Java-index,Other Topics,Algorithms...
# 4

> This may work but it depends upon it requirement.

> Problem with autoinc it acts neatly on int field but

> not with

> varchar field at least in arrangement if you dont care

> because

> alphabetically ' 10' < ' 1' so your record after 9

> will not be 10

> if you are using varchar to avoid this left pad the

> key with leading zeroes.

The id will be an integer as there is no point making it varchar.

> But that is another issue anyway you are only

> y interested to yield unique values everytime you add

> record. One approach I use is find the max as in x =

> SELECT MAX(ID) FROM TABLE and use x + 1 as the new

> key;

This method is not good when you have a multiple concurrent users.

> Though database stored genkeys are good in other

> r cases such as if youve got prefixing methodology as

> in PO0001 .PO99. where PO is a prefix.

I'm sorry if I don't understand everything here but what do you call here "genkeys"? Also, where should I use the prefix? In id?

greetz,

marius

roibma at 2007-7-7 21:10:49 > top of Java-index,Other Topics,Algorithms...
# 5

> Hello,

>

> > Second one is better. The speed of the look up is

> > going to be trivial compared to the java to

> database

> > and back again time, so you might as well put it in

> a

> > single table.

> Well, if there are not 1000's of tables that is ok.

> If there are 100's I don't know which is better, which

> could be the fastests? a table with 100's of fields or

> 100's of tables with one field? This is not a problem

> for entry level database engines, more likely for big

> database engines like oracle.

> If there are under 100 tables then 2) looks better, I

> agree.

>

No. If you had a 100 million tables then it might be a problem. Even a table scan with a 1000 records doesn't take that long. And naturally an index removes all problems.

>

> > Naturally from the database point of view, it would

> be

> > better if this was maintained by triggers on the

> table

> > itself. But if you don't have a id function then

> the

> > database isn't likely to have triggers either.

> And then, there might be problems with triggers.

> DO all the triggers work the same for all database

> engines?

>

What do you mean "work the same"? That is like asking "do all java methods work the same".

> I think I can guarantee access only from within my

> classes but there might still be problems.

> I am thinking mainly about concurrent access, the

> application should be able to host many concurrent

> users(let's say there should be really no problem when

> I get 100 concurrent users, and should be able(the

> app) to handle even more without any problems)

>

And what if someone wants to load the database using SQL and the tools that come with the database? Then there will be no way to keep the field in sync.

> > There are solutions that do not require the

> database.

> >

> > A "GUID" works if you do not need a field that is

> > ordered. You can search the forums for information

> on

> > "GUID" and the algorithm to produce it.

> >

> > You can create a ordered field by using a

> combination

> > of ip address, timestamp and an incrementing

> counter

> > in the application. This is not guaranteed to be

> > unique, but if you use a primary key and catch

> > duplicate errors you can regen to get around it.

> And

> > it will seldom occur.

> To be honest, these solutions don't look that

> professional for me.

So the thousands or millions of commercial applications that use them are not professional?

Let me assure you that the computed id is very professional when you are talking about a high speed speed and the difference between two round trips to the database vs one round trip.

> I think that by using int id's

> the lookup based on id will be much faster than by

> using a [var]char id's,

Have you tried it? For most indexed tables on 'normal' databases where the varchar is not very big (less than 100s of bytes) the speed is not measurable.

> and beside will take less

> space which is important when you have zillions of

> records in a table or when you want to cache entire

> tables in memory.

>

Table sizing is an exact science and even with different databases you can make estimates.

Have you sized the database? (If you haven't then that means that sizing is not a concern and should not matter.)

jschella at 2007-7-7 21:10:49 > top of Java-index,Other Topics,Algorithms...
# 6

> No. If you had a 100 million tables then it might be

> a problem. Even a table scan with a 1000 records

> doesn't take that long. And naturally an index

> removes all problems.

I'll test on this using several db engines.

> What do you mean "work the same"? That is like asking

> "do all java methods work the same".

Mainly I was thinking that SELECT, UPDATE, INSERT has the same basic

syntax in all db engines, while triggers are not even supported in all

db engines.

> So the thousands or millions of commercial

> applications that use them are not professional?

Not really.

They have a good solution, unique ID, relative easy to generate and no

dependence on other id's(like maximum id and stuff).

But I think that uint based id's are somehow better, requiring less

database space(and memory when caching is used), and are slightly faster

on lookup which is not something to throw away.

> Let me assure you that the computed id is very

> professional when you are talking about a high speed

> speed and the difference between two round trips to

> the database vs one round trip.

I guess you reffer here at that additional db operation for updating the

current id in the tables that will store the values for id's. That is

not critically important because a new record is not inserted in tables

very often, most likely in one day several 100's new

records will be cretaed while zillions record will be retrieved.

So retrieving a record is the critical part.

> Have you tried it? For most indexed tables on

> 'normal' databases where the varchar is not very big

> (less than 100s of bytes) the speed is not

> measurable.

If I'll end up using GUID id's then the field storing the id will be a

char because the GUID will have always the same len(36) so no word of

100's of bytes even when a char taker more than one byte(unicode).

> Table sizing is an exact science and even with

> different databases you can make estimates.

>

> Have you sized the database? (If you haven't then

> that means that sizing is not a concern and should not

> matter.)

I did not calculate in details but I think that saving at least 26

bytes(when the id NUMBER field takes 10 bytes for example) on each

record it's something. And there are that many to many relational

tables. Actually quite a lot of them.

roibma at 2007-7-7 21:10:49 > top of Java-index,Other Topics,Algorithms...
# 7

>

> > What do you mean "work the same"? That is like

> asking

> > "do all java methods work the same".

> Mainly I was thinking that SELECT, UPDATE, INSERT has

> the same basic

> syntax in all db engines, while triggers are not even

> supported in all

> db engines.

>

As I said. It would be better with them but not all support them.

> > So the thousands or millions of commercial

> > applications that use them are not professional?

> Not really.

> They have a good solution, unique ID, relative easy to

> generate and no

> dependence on other id's(like maximum id and stuff).

> But I think that uint based id's are somehow better,

> requiring less

> database space(and memory when caching is used), and

> are slightly faster

> on lookup which is not something to throw away.

Then from that point of view one would say that applications that use databases which do not support autoinc nor triggers are not professional. Lack of autoinc suggests problems that we have seen. Lack of triggers means that referential integrity rules can not be enforced.

Yet I personally do not think that is a good definition of "professional". But to each his/her own.

>

> > Let me assure you that the computed id is very

> > professional when you are talking about a high

> speed

> > speed and the difference between two round trips to

> > the database vs one round trip.

> I guess you reffer here at that additional db

> operation for updating the

> current id in the tables that will store the values

> for id's. That is

> not critically important because a new record is not

> inserted in tables

> very often, most likely in one day several 100's new

> records will be cretaed while zillions record will be

> retrieved.

> So retrieving a record is the critical part.

Then constructing an id would not be critical either.

>

> > Have you tried it? For most indexed tables on

> > 'normal' databases where the varchar is not very

> big

> > (less than 100s of bytes) the speed is not

> > measurable.

> If I'll end up using GUID id's then the field storing

> the id will be a

> char because the GUID will have always the same

> len(36) so no word of

> 100's of bytes even when a char taker more than one

> byte(unicode).

>

My point it that I doubt that you can measure the speed difference for small text fields versus int fields when both are indexed.

> > Table sizing is an exact science and even with

> > different databases you can make estimates.

> >

> > Have you sized the database? (If you haven't then

> > that means that sizing is not a concern and should

> not

> > matter.)

> I did not calculate in details but I think that saving

> at least 26

> bytes(when the id NUMBER field takes 10 bytes for

> example) on each

> record it's something. And there are that many to many

> relational

> tables. Actually quite a lot of them.

No by itself it is not something.

If I store a 100 million records and each record size is 5k it takes 500 gig. That 26 bytes then saves 0.5% of the total storage space or 2.5 gig. Now that might seem significant but when you have to set up a server to handle 500 gig, that extra space becomes irrelevant. Server arrays typically hot swap 80 gig drives.

If I store 1 million records and each record size is 1k it takes 5 gig. That 26 bytes then saves 2.5% of the total storage space or 125 meg. But when was the last time you bought a 20 gig hard drive? You might put that on a web appliance but almost anything else is going to come with more space. So again it does not matter.

Now consider if you were concerned about page sizing. So with a 4k page size and 100 of millions of records, if you could fit another record into each page then that could have a measurable preformance impact. However, with a 1k record (total size of record) it would mean that the record would have to be exactly between 998 and 1024 bytes. Any less and it would not matter since records do not span pages. Any more and you have already exceeded the limit.

And given that you seem to be targetting multiple DBs I doubt that page sizing is a concern, given the variety of ways that it is handled and how paging occurs.

I am not against saving space, but unless there a defined definitive need for it, "optimizations" should never take precedence over clean code. Especially micro optimizations performed because a developer is guessing rather than due to measurements obtained using measurement tools.

(This of course could just be my perspective. To me a growth rate of 100 records a day is rather trivial. 200k to 400k per day, which I have worked on, is more troublesome.)

jschella at 2007-7-7 21:10:49 > top of Java-index,Other Topics,Algorithms...