I need speed hints

Hi,

Here is one part of my source code.

private static final byte ASS_DATA_SIZE = (byte)116;
private static final byte ASS_SIGN_SIZE = (byte)16;

private byte[] mData;
private byte[] mSignature;
private byte[] mSignatureTemp;

...

public Signature()
{
    mData = JCSystem.makeTransientByteArray(ASS_DATA_SIZE, JCSystem.CLEAR_ON_DESELECT);
    mSignature = JCSystem.makeTransientByteArray(ASS_SIGN_SIZE, JCSystem.CLEAR_ON_DESELECT);
    mSignatureTemp = JCSystem.makeTransientByteArray(ASS_SIGN_SIZE, JCSystem.CLEAR_ON_DESELECT);
}

....

private boolean check()
{
    Util.arrayCopyNonAtomic(mData, (short)0, mSignatureTemp, (short)0, ASS_SIGN_SIZE);

    for (byte i = 16, j = 0; i < ASS_DATA_SIZE; i++)
    {
        j++;
    }

    return true;
}

The check function takes about 60 milliseconds.

If I remove the loop, it takes about 20 milliseconds. It then looks like this:

private boolean check()
{
    Util.arrayCopyNonAtomic(mData, (short)0, mSignatureTemp, (short)0, ASS_SIGN_SIZE);

    return true;
}

And check written this way takes about 160 milliseconds.

private boolean check()
{
    Util.arrayCopyNonAtomic(mData, (short)0, mSignatureTemp, (short)0, ASS_SIGN_SIZE);

    for (byte i = 16, j = 0; i < ASS_DATA_SIZE; i++)
    {
        j++;
        //mSignatureTemp[j++] = mData[i];
        //if ( j == (byte)16)
        //j = 0;
    }

    return true;
    //return (Util.arrayCompare( mSignatureTemp, (short)0, mSignature, (short)0, ASS_SIGN_SIZE ) == 0);
}

My question is: can a loop really take that much time (40 milliseconds)?

And a loop with an XOR takes 160 milliseconds.

How can I speed it up? Or is this normal?

[3508 byte] By [bronze-starDukes] at [2007-11-26 12:10:04]
# 1
How did you measure these times? The first loop and the third loop look the same. Where is the XOR?
bronzestar at 2007-7-7 13:47:46 > top of Java-index,Archived Forums,Socket Programming...
# 2

> How did you measure this times?

The measurement starts when I send the first byte of the APDU.

It ends when I receive the last byte from the card.

> The first loop and third loop look the same. Where is

> the xor?

Sorry, I made a mistake.

This is what I meant to write: :-)

private boolean check()
{
    Util.arrayCopyNonAtomic(mData, (short)0, mSignatureTemp, (short)0, ASS_SIGN_SIZE);

    for (byte i = 16, j = 0; i < ASS_DATA_SIZE; i++)
    {
        mSignatureTemp[j++] ^= mData[i];

        if (j == (byte)16)
            j = 0;
    }

    return true;
}

# 3

Try to take your loop apart (unroll it):

private boolean check()
{
    ..
    mSignatureTemp[0] ^= mData[16];
    mSignatureTemp[1] ^= mData[17];
    mSignatureTemp[2] ^= mData[18];
    ..
    mSignatureTemp[15] ^= mData[31];
    // j wraps back to 0 here
    mSignatureTemp[0] ^= mData[32];
    mSignatureTemp[1] ^= mData[33];
    ..
    mSignatureTemp[3] ^= mData[115];
    ..
}

You have to think much more about the hardware. Code in OO style --> see if it works the way you want --> optimize. Optimizing means:

- reuse objects
- put as much as you can into RAM
- think assembler-like --> spaghetti code
- the first variable in RAM has the fastest access (JCOP/SmartMX specific) --> reuse this variable (in your case it is byte i = 16?)
- work with the APDU buffer
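A middle ground between the original loop and a fully hard-coded version is a partial unroll. The sketch below is plain desktop Java, not applet code: the class and method names (XorFoldSketch, foldLoop, foldUnrolled4) are made up for illustration, and System.arraycopy stands in for Util.arrayCopyNonAtomic so it can run off-card. It assumes the sizes from the question (116 data bytes, 16 signature bytes). Because both 16 and the 100 remaining bytes are multiples of 4, the wrap of j can be done once per four bytes with a mask instead of a compare on every byte. Whether this is actually faster on a given card has to be measured.

```java
final class XorFoldSketch {
    static final short DATA_SIZE = 116; // ASS_DATA_SIZE in the question
    static final short SIGN_SIZE = 16;  // ASS_SIGN_SIZE in the question

    // Reference version: same logic as the loop posted in the question.
    static void foldLoop(byte[] data, byte[] temp) {
        System.arraycopy(data, 0, temp, 0, SIGN_SIZE);
        for (byte i = 16, j = 0; i < DATA_SIZE; i++) {
            temp[j++] ^= data[i];
            if (j == (byte) 16) {
                j = 0;
            }
        }
    }

    // Partially unrolled version: four XORs per iteration, and the
    // wrap of j is a single mask per iteration instead of a branch per byte.
    static void foldUnrolled4(byte[] data, byte[] temp) {
        System.arraycopy(data, 0, temp, 0, SIGN_SIZE);
        short j = 0;
        for (short i = SIGN_SIZE; i < DATA_SIZE; i += 4) {
            temp[j] ^= data[i];
            temp[(short) (j + 1)] ^= data[(short) (i + 1)];
            temp[(short) (j + 2)] ^= data[(short) (i + 2)];
            temp[(short) (j + 3)] ^= data[(short) (i + 3)];
            j = (short) ((j + 4) & 0x0F); // 0, 4, 8, 12, 0, ...
        }
    }
}
```

In an applet, data and temp would be local copies of mData and mSignatureTemp, so the field is read once per call rather than once per array access.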

# 4
OK, I wrote out each line in the source code (hard-coded), and now it takes about 90 milliseconds. It's better, but it is still far from what I need. Actually, I would like it to take about 45 milliseconds.
# 5
And now squeeze everything into the process method.
# 6
Thanks. OK, I put it all together, but the time was the same (about 90 milliseconds).
# 7

This is just very expensive code. Have a look at the bytecode: per line, there will be two baload instructions and one bastore instruction. Each of them will make the VM do a bounds check. Some things just can't be coded efficiently in Java...

@lexdabear:

Fast access to the first 4 variables is not just a JCOP/SmartMX-specific optimization, as there are special bytecodes for the first 4 locals.
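To make the two points above concrete, here is a small sketch in plain Java. The class and method names (LocalSlotSketch, fold) are hypothetical. The slot numbers in the comments assume that locals are numbered in declaration order, which is what javac normally does, and the opcode names (aload_0 to aload_3, sload_0 to sload_3, baload, bastore) are the Java Card VM ones; what a particular converter and card actually do with them should be checked in a bytecode view.

```java
final class LocalSlotSketch {
    // Static method: parameters take the first local slots, so
    // data -> slot 0, temp -> slot 1 (assumed reachable via aload_0 / aload_1).
    // In an instance method, slot 0 would be taken by 'this'.
    static void fold(byte[] data, byte[] temp) {
        short i = 16;                    // slot 2 (assumed sload_2)
        short j = 0;                     // slot 3 (assumed sload_3)
        short end = (short) data.length; // slot 4: outside the short-form range
        while (i < end) {
            // One compound XOR costs two array loads (baload)
            // and one array store (bastore), each bounds-checked.
            temp[j] ^= data[i];
            i++;
            j = (short) ((j + 1) & 0x0F); // wrap j within the 16-byte buffer
        }
    }
}
```

The caller is expected to have copied the first 16 data bytes into temp beforehand, as check() does in the question.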

# 8

That was the next hint I wanted to give: Check out the bytecode. There is a bytecode view in the JCOP Tools ..

@mkdata:

Yep you're right, but I do not know how many registers are pre-used by the OS. In JCOP/SmartMX the first three are system pointers so only the first local can be used which is stored in faster RAM. (Please correct me if I am wrong.)

# 9
@lexdabear:

Of course, even if I could, I would not disclose VM internals... :)

However, I think it's unlikely that locals would be stored in registers. In the JCVM, most operations happen on the operand stack, so that would be the first thing I'd map to registers.
# 10
@mkdata:

Heh, we do not want to get you into trouble. :) Good point about the operand stack.
# 11
> However, I think it's unlikely that locals would be
> stored in registers. In the JCVM, most operations
> happen on the operand stack, so that would be the
> first thing I'd map to registers.

How can we "map to registers"?
# 12
We can't. Only VM implementors can.