byte[] to String

I'm trying to convert a byte[] to an (ASCII) String with the following code:

public String convert(byte[] b) {
    int l = b.length;
    char[] chAr = new char[l];
    for (int i = 0; i < l; i++) {
        // Widen each byte to a char; correct for 7-bit ASCII (0..127) only
        chAr[i] = (char) b[i];
    }
    return new String(chAr);
}

Please let me know of a faster approach.

It may seem very simple, but I am not asking without having tried.

I tried many approaches, including new String(byte[]), StringBuffer, CharBuffer, ByteBuffer, and a static ASCII lookup table.

The loop above turned out to be the fastest.

I'm really wondering how to avoid the loop altogether.

Please do not reply

- to tell me to avoid converting byte[] to String (even though I can avoid it, I still want to know)

- without trying it yourself.

- unless you really want to help and discuss.

I am just (a bit above) a novice, but I really want advice from a guru. : )

No kidding.

Thanks and Regards,

JPutu

[1454 byte] By [JPutua] at [2007-10-1 0:25:04]
# 1
Did you try

// constructor public String(byte[] bytes, String charsetName)
new String(buffer, "US-ASCII");
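
As a rough illustration of the suggestion above, here is a minimal sketch that puts the hand-written loop next to the charset-based constructor. The class and method names are made up for this example, and it assumes Java 7 or later so that `StandardCharsets.US_ASCII` is available in place of the "US-ASCII" string name. It only checks that the two agree on 7-bit input; it says nothing about which one is faster.

```java
import java.nio.charset.StandardCharsets;

public class AsciiDecodeSketch {
    // Hand-written loop from the original question (valid for bytes 0..127).
    static String convertLoop(byte[] b) {
        char[] chars = new char[b.length];
        for (int i = 0; i < b.length; i++) {
            chars[i] = (char) b[i];
        }
        return new String(chars);
    }

    // Library route: the US-ASCII decoder does the looping internally.
    static String convertCharset(byte[] b) {
        return new String(b, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] buffer = {72, 101, 108, 108, 111}; // the bytes of "Hello"
        System.out.println(convertLoop(buffer));
        System.out.println(convertCharset(buffer));
        // For pure 7-bit input both routes give the same String.
        System.out.println(convertLoop(buffer).equals(convertCharset(buffer)));
    }
}
```

The two methods only diverge for bytes outside 0..127, which is where the choice of charset starts to matter.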
BIJ001a at 2007-7-7 16:10:27 > top of Java-index,Other Topics,Algorithms...
# 2

This really is the fastest way, or close to it.

This is highly unlikely to be the slowest section of code in any real program. Unless it is for interest, it is not worth worrying about.

The only note is that using Charsets (e.g. US-ASCII) is safer but can be somewhat slower.

Have a look at the source for new String(byte[] ascii, int hibyte) in src.zip.

This does exactly what you have here. There are small differences, but it is basically the same.

for (int i = count; i-- > 0; ) {
    value[i] = (char) (ascii[i + offset] & 0xff);
}

Testing against 0 is marginally faster.

Note that a byte is signed, so a byte value above 127 is stored as a negative number and will map to high characters, which may not be desirable.

byte b = (byte) 255;
char ch = (char) b;
System.out.println("ch=" + (int) ch);

Prints ch=65535.
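
To make the sign-extension point concrete, the following sketch compares a plain cast with the masked cast used in the JDK loop quoted above. The class and method names are invented for the example.

```java
public class SignedByteSketch {
    // Plain cast: the byte is sign-extended first, so negative bytes
    // end up as chars in the range 0xFF80..0xFFFF.
    static char plainCast(byte b) {
        return (char) b;
    }

    // Masked cast: only the low 8 bits are kept, giving a char in 0..255.
    static char maskedCast(byte b) {
        return (char) (b & 0xff);
    }

    public static void main(String[] args) {
        byte b = (byte) 255; // stored as -1
        System.out.println((int) plainCast(b));  // 65535
        System.out.println((int) maskedCast(b)); // 255
    }
}
```

For bytes in 0..127 the two casts give the same result, so the mask only matters when the input may contain values outside ASCII.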

Peter-Lawreya at 2007-7-7 16:10:27
# 3

> I m really wondering how to avoid looping around the loop.

Anything that does what you ask is going to have to look at each of the bytes in the array. And to do that it is going to include a loop.

So I assume you don't want this loop in your code, but you don't mind if it's in some library code that you can call. If that's the case, then BIJ001's code is what you want.

DrClapa at 2007-7-7 16:10:27
# 4
Thanks. It reminds me of a Java myth that looping down is very, very slightly faster than looping up. Thanks indeed for reminding me that byte is signed. This I need to be aware of.
JPutua at 2007-7-7 16:10:27
# 5
Rule of thumb: don't invent a wheel when you have a circular, disc-shaped thing with an axle through it lying at your side.

byte[] dataByte = {0x12, 0x34, 0x56};
String dataStr = new String(dataByte);

-Bryan
bjb1440a at 2007-7-7 16:10:27
# 6
Yeah, right. Now I have seen some side effects of doing this. I decided to use the Java API. Thank you all.
JPutua at 2007-7-7 16:10:27
# 7
String str = new String(my_bytes,"ISO-8859-1");

That's all you need.

al912912
jorchia at 2007-7-7 16:10:27
# 8
> String str = new String(my_bytes,"ISO-8859-1");

That is not the same encoding as "US-ASCII". And based on the OP, "US-ASCII" is best.
jschella at 2007-7-7 16:10:27
# 9

> That is not the same encoding as "US-ASCII". And

> based on the OP "US-ASCII" is best.

US-ASCII uses only 7 bits (numbers from 0 to 127); Latin-1, or ISO-8859-1, uses 8 bits (from 0 to 255) and has its first 128 characters equal to those of ASCII.

That's why Latin-1 is also known as Extended ASCII, and that's the character encoding that the systems that use ASCII have: Linux, DOS (other systems use Unicode).

Anyway, it is also more convenient because it does something predictable with the whole byte, and not just for the first 7 bits, and it will give the same result for the first seven bits.

I have used Latin-1 in some Java apps when I was printing to a parallel port printer which supposedly understood ASCII, and it worked like a charm, especially when using extended symbols like ñ or á, é, ú (Extended ASCII numbers: 164, 160, 130, 163, respectively).
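
The 7-bit versus 8-bit difference between the two charsets can be seen in a small sketch like the one below. The class and method names are hypothetical, and it assumes Java 7 or later for `StandardCharsets`. Bytes in 0..127 decode identically under both charsets; a byte such as 0xE1 becomes 'á' under ISO-8859-1, while the US-ASCII decoder substitutes the replacement character U+FFFD.

```java
import java.nio.charset.StandardCharsets;

public class AsciiVsLatin1Sketch {
    static String asAscii(byte[] b) {
        return new String(b, StandardCharsets.US_ASCII);
    }

    static String asLatin1(byte[] b) {
        return new String(b, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] plain = {'G', 'u', 'z'};
        byte[] high = {(byte) 0xE1}; // 'á' in ISO-8859-1, not defined in US-ASCII

        // Same result for 7-bit input.
        System.out.println(asAscii(plain).equals(asLatin1(plain))); // true
        // Latin-1 maps the byte straight to the code point with the same value.
        System.out.println((int) asLatin1(high).charAt(0)); // 225
        // US-ASCII cannot map it and yields the replacement character.
        System.out.println((int) asAscii(high).charAt(0)); // 65533
    }
}
```

Which behaviour is preferable depends on what the data is actually specified to contain.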

--

al912912

jorchia at 2007-7-7 16:10:27
# 10

> > That is not the same encoding as "US-ASCII". And

> > based on the OP "US-ASCII" is best.

>

> US Ascii used only 7 bits (numbers from 0 to 127),

> Latin-1, or ISO-8859-1, uses 8 bits (from 0 to 255),

> and has the first 127 characters equals to those of

> ascii.

>

> That's why Latin-1 is also known as Extended Ascii,

> and that's the character encoding the systems that

> use ascii have: Linux, DOS(other systems use

> unicode).

What makes you think that I did not already understand that?

And exactly what from the OP makes you think that that is relevant?

>

> Anyway, it is more convenient also because it does something predictable with the whole byte, and not just for the first 7 bits, and it will give the same result for the first seven bits.

>

It has nothing to do with that. A character set is a character set. It doesn't matter how many bits or bytes that it uses.

> I have used Latin-1 in some java apps when I was printing to a parallel port printer which supposedly understood ascii and it worked like a charm. Specially when using the extended symbols like ñ or á é ú (Extended Ascii numbers: 164, 160, 130, 163, respectively).

Myself I prefer to actually use the encoding that is required rather than guessing and hoping nothing bad happens.

jschella at 2007-7-7 16:10:27
# 11

> Myself I prefer to actually use the encoding that is

> required rather than guessing and hoping nothing bad

> happens.

Using ASCII would be hoping nothing bad happens (when you use bytes with values not defined in US-ASCII).

Using Latin-1 would be extending your encoding to some non-standard English characters that your clients might want to use, while keeping it totally US-ASCII compatible anyway. My name is "Jorge Guzmán", not "Jorge Guzman", and if I'm filling in an important form, I would want it right.

al912912

jorchia at 2007-7-7 16:10:27
# 12

> > Myself I prefer to actually use the encoding that is

> > required rather than guessing and hoping nothing bad

> > happens.

>

> Using Ascii would be hoping nothing bad happens (when

> you use bytes of values not defined in US-ASCII ).

If the "requirement" states ascii then the point is not relevant.

>

> Using Latin-1 would be extending your encoding to some not standard english characters that your clients might want to use, and keeping it totally US-ASCII compatible anyway. My name is "Jorge Guzmán", not "Jorge Guzman", and if I'm filling an important form, I would want it right.

Certainly, and if my clients want to pay me 100 million dollars then I am going to deliver whatever they want.

But since they usually want to pay me a lot less than that, I must deliver what is agreed upon and not some guess which has an equal chance (or more) of being wrong as of being right. Because if I guess wrong, someone is going to have to pay both in time and money to fix it.

jschella at 2007-7-7 16:10:27
# 13

> Certainly and if my clients want to pay me 100

> million dollars then I am going to deliver what ever

> they want.

Good luck finding those clients.

> But since they usually want to pay me a lot less than that, then that means that I must deliver what is agreed upon and not some guess which has an equal chance (or more so) to be wrong as it is to be right. Because if I guess wrong someone is going to have to pay both in time and money to fix it.

Generally you are the tech-savvy one, not the client, and this is where you advise your client. It wouldn't hurt to at least ask them and explain the advantages of Latin-1 vs ASCII.

They hired you to make the best product; it's your responsibility here.

Anyway, this is the last reply I will make, for the discussion has gone way off topic and its most probable outcome would be a never-ending biased discussion in which neither of us will really pay any attention to the other guy's points.

On the other hand, I'm not getting anything from it.

Nice talking to you.

--

al912912

jorchia at 2007-7-7 16:10:27
# 14

>

>

> Anyway, this is the last reply I make for the

> discussion has gone way off topic and it's most

> probable outcome would be a never ending biased

> discussion in which none of us will really put any

> attention to the other guy's points.

>

> On the other hand, I'm not getting anything from it.

Yes, but hopefully if someone in the future comes across it, they might see that there is indeed a difference between the two encodings, and that if they want ASCII they should use the ASCII encoding and not something else.

jschella at 2007-7-7 16:10:27
# 15
this might give u an idea how to convert byte code to string of ASCII

Specify an encoding when using the getBytes method to convert a String to a ByteArray:

String myString = new String(myBytes, "8859_1");
cilent2004a at 2007-7-20 1:47:03
# 16

> this might give u an idea how to convert byte code to

> string of ASCII

> Specify an encoding when using the getBytes method to

> convert a String to a ByteArray:

> String myString = new String(myBytes, "8859_1");

And presumably you also understand that that is not an ASCII encoding.

jschella at 2007-7-20 1:47:03
# 17

> Thanks.It reminds me of a Java myth that looping down

> is very very slightly faster than looping up.

If it is, you'd need a helluva loop to start noticing the difference.

Subtraction on most computers, i.e. the ones using two's complement, is addition of the complement and one. That is,

a - b

is the same as

a + (~b + 1)

The decrement operator may circumvent that, but I don't know.
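
The identity itself is easy to check in Java, where int arithmetic wraps around in two's complement. The sketch below uses a made-up class and method name and only verifies that the two expressions agree; it makes no claim about how a JIT compiler or CPU actually executes a subtraction.

```java
public class TwosComplementSketch {
    // Subtraction rewritten as addition of the two's complement of b.
    static int subtractViaComplement(int a, int b) {
        return a + (~b + 1);
    }

    public static void main(String[] args) {
        int[][] samples = {
            {7, 3}, {3, 7}, {0, 0},
            {Integer.MIN_VALUE, 1}, {5, Integer.MIN_VALUE}
        };
        for (int[] s : samples) {
            // Each line prints true, including the overflow cases.
            System.out.println((s[0] - s[1]) == subtractViaComplement(s[0], s[1]));
        }
    }
}
```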

In any case, make something that works, make it understandable, and then think about making optimizations to the parts that need it. But don't make confusing code because it might shave a picosecond off your run-time.

~Cheers

Adeodatusa at 2007-7-20 1:47:03
# 18

> In any case, make something that works, make it

> understandable, and then think about making

> optimizations to the parts that need it. But don't

> make confusing code because it might shave a

> picosecond off your run-time.

Sorry about this part, I thought you wrote that loop >.<

~Cheers

Adeodatusa at 2007-7-20 1:47:03
# 19

> > Thanks.It reminds me of a Java myth that looping down is very very slightly faster than looping up.

>

> If it is, you'd need a helluva loop to start noticing

> the difference.

>

>

> Subtraction on most computers, i.e. the ones using

> two's complement, is addition of the complement and

> one. That is,

> a - b

> is the same as

> a + (~b + 1)

> The decrement operator may circumvent that, but I

> don't know.

Huh? On every processor architecture I'm aware of ADD takes the same amount of time as SUB, and INC takes the same amount of time as DEC. The looping-down-is-faster "myth" comes about because on some processors comparing with zero is faster than comparing with a variable. I don't think this is an issue with newer processor designs.
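
For reference, the two loop shapes under discussion can be sketched side by side. The names below are invented for the example; the count-up loop compares the index with the array length, the count-down loop compares it with zero. Both produce the same array, and whether one is measurably faster on a given JVM would need a proper benchmark harness (for instance JMH), which this sketch does not attempt.

```java
import java.util.Arrays;

public class LoopDirectionSketch {
    // Counts up, comparing the index with the array length on each pass.
    static char[] copyUp(byte[] src) {
        char[] dst = new char[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = (char) (src[i] & 0xff);
        }
        return dst;
    }

    // Counts down, comparing the index with zero on each pass.
    static char[] copyDown(byte[] src) {
        char[] dst = new char[src.length];
        for (int i = src.length; i-- > 0; ) {
            dst[i] = (char) (src[i] & 0xff);
        }
        return dst;
    }

    public static void main(String[] args) {
        byte[] data = {65, 66, 67};
        System.out.println(Arrays.equals(copyUp(data), copyDown(data))); // true
    }
}
```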

RadcliffePikea at 2007-7-20 1:47:03
# 20

> > > Thanks.It reminds me of a Java myth that looping down is very very slightly faster than looping up.
> >
> > If it is, you'd need a helluva loop to start noticing the difference.
> >
> > Subtraction on most computers, i.e. the ones using two's complement, is addition of the complement and one. That is,
> > a - b
> > is the same as
> > a + (~b + 1)
> > The decrement operator may circumvent that, but I don't know.

>

> Huh? On every processor architecture I'm aware of ADD

> takes the same amount of time as SUB

It's a trivial difference, to be sure. The tests I ran are inconclusive, and I'm not so sure the compiler isn't just optimizing my subtraction of ten into addition of negative ten right off the bat.

That is, turning

x - 10

into

x + 0xfffffff6

But, logically,

a + (~b + 1)

should be slower than

a + b

And it is. After all, the only two things a computer can do are add and compare ;~)

~Cheers

Adeodatusa at 2007-7-20 1:47:03
# 21

> It's a trivial difference, to be sure. The tests I

> ran are inconclusive, and I'm not so sure the

> compiler isn't just optimizing my subtraction of ten

> into addition of negative ten right off the bat.

> That is, turning

> x - 10

> into

> x + 0xfffffff6

>

> But, logically,

> a + (~b + 1)

> should be slower than

> a + b

>

> And it is. After all, the only two things a computer

> can do are add and compare ;~)

>

> ~Cheers

<rimshot/>

RadcliffePikea at 2007-7-20 1:47:03
# 22
I can't believe how this topic took off. There's something to be said about missing the forest for the trees. At this point we are looking at the trees through a microscope.
bjb1440a at 2007-7-20 1:47:03
# 23
Cite from Intel developer manual IIRC

ADD 0.5
SUB 0.5

Both ALUs have these instructions hardwired.
Lord_of_the_chaosa at 2007-7-20 1:47:03
# 24

> Cite from Intel developer manual IIRC

> ADD 0.5

> SUB 0.5

>

> Both ALUs has these instructions hardwired.

Certainly, but that's not the way to go when looping on an Intel x86 machine. They have the LOOPxx instructions, all of which DECREMENT CX (a certain register) and then check for some condition on the flags (equal zero, not equal zero, greater than zero, etc...)

There is no similar instruction that increments, and the equivalent of

    MOV CX, 10
foo:
    ; Do whatever your loop does.
    LOOP foo

would be

    MOV CX, 0
foo:
    ; Do whatever your loop does.
    ; This is the bad way to do it
    ADD CX, 1
    CMP CX, 10
    JNE foo

Now that does seem longer, doesn't it?

Of course, all of this applies only if you have a good optimizer in your compiler, so that it uses the LOOP instruction.

al912912

jorchia at 2007-7-20 1:47:03
# 25

Again, we're splitting hairs with this topic.

jorchi,

I'm pretty sure LOOP is an assembler feature. In other words, LOOP is assembled into something like

MOV CX, [foo];

labelXYZ:

...

DEC CX;

JNZ labelXYZ.

But again, we're splitting hairs here.

-Bryan

bjb1440a at 2007-7-20 1:47:03