casting int to a byte -- retrieving the lowest 8 bits

Yo, thought i'd try to open up some debate on this topic. I wrote a data feed that needed to grab the lower 8 bits out of an int, so i wrote what i believed to be the standard way of doing this, which was effectively

public static byte intToByte(int c) {
    return (byte) (c & 0xff);
}

A colleague looking through my code asked whether the & 0xff was necessary, and did i believe that there was any value of c for which it made a difference, as he believed that this was equivalent to just

return (byte)c;

My immediate thought was worrying about byte in java being signed, and having data screwed up, so i ran some tests.. and on every value i tried (byte)c is indeed equivalent to (byte)( c & 0xff ).
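For what it's worth, the reason the tests agree is that a narrowing cast from int to byte simply discards everything except the low 8 bits, so the mask only clears bits that the cast would throw away anyway. A minimal sketch of the kind of check described above (the class name, method names and sample values are hypothetical, picked to hit the sign boundaries):

```java
// Hypothetical demo class; names and sample values are illustrative only.
final class LowByteDemo {
    static byte castOnly(int c) {
        return (byte) c;            // narrowing cast keeps only the low 8 bits
    }

    static byte maskThenCast(int c) {
        return (byte) (c & 0xff);   // mask first, then the same narrowing cast
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 127, 128, 255, 256, -1, -128, -129,
                         Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int c : samples) {
            // Both columns should always show the same value.
            System.out.println(c + " -> " + castOnly(c) + " / " + maskThenCast(c));
        }
    }
}
```

Note that the resulting byte is still signed either way: 128 comes out as -128 with or without the mask.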

My argument was to be that (byte)( c & 0xFF ) is great code to read as a maintainer, because it's immediately obvious that you are strictly interested in the lowest 8 bits of the int and nothing else is of importance, whereas a simple (byte)c can look naive and make every developer looking at the code for the first time think it's incorrect.

However, i knew his comeback would be that the datafeed has an overriding need for speed, so i ran some tests comparing the repeated operation of (byte)c to (byte)(c & 0xff ) over a range of 100,000 numbers (test repeated several times to obviate startup times). It turned out that doing the & 0xff added about 30% to execution time on my machine (java 1.5 on WinXP). That's quite a severe penalty for a very common operation! I think i'm going to change the code to cast straight to a byte and leave a big comment beforehand explaining how it's equivalent to (byte)(c & 0xff );
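The timing test was along these lines. This is only a rough sketch: the class and method names are hypothetical, the repeat count is an assumption, and a naive loop like this is very sensitive to JIT warm-up and dead-code elimination, so the numbers it prints should be treated with suspicion (the 30% figure above is just what i saw on my setup).

```java
// Hypothetical micro-benchmark sketch; results vary by JVM and hardware.
final class CastBenchSketch {
    // Sum the results so the JIT cannot discard the conversions entirely.
    static long sumCast(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (byte) i;
        }
        return sum;
    }

    static long sumMasked(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (byte) (i & 0xff);
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 100000;      // range of numbers, as in the post
        int repeats = 1000;  // assumed repeat count

        // Warm-up pass so both methods are compiled before timing starts.
        for (int r = 0; r < repeats; r++) {
            sumCast(n);
            sumMasked(n);
        }

        long a = 0;
        long t0 = System.nanoTime();
        for (int r = 0; r < repeats; r++) a += sumCast(n);
        long t1 = System.nanoTime();

        long b = 0;
        for (int r = 0; r < repeats; r++) b += sumMasked(n);
        long t2 = System.nanoTime();

        System.out.println("cast only:   " + (t1 - t0) / 1000000 + " ms (checksum " + a + ")");
        System.out.println("mask + cast: " + (t2 - t1) / 1000000 + " ms (checksum " + b + ")");
    }
}
```

The two checksums should be identical, which doubles as a sanity check that both variants compute the same bytes.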

This got me wondering how it was implemented in the core java libraries though, since OutputStream has a method to write a byte that actually takes an int parameter. How does this work? Most of the lowest level OutputStream implementations seem to end up going to native to do this (understandably), so i dug out ByteArrayOutputStream. This class does optimise away the & 0xFF and is roughly

public synchronized void write(int b) {
    buf[count] = (byte) b;
}

No problems with that, so writing to these babies will be fast. But then i started wondering about the methods of DataOutputStream, which is heavily used by us for serialising (a great deal of) internal data flow. Unfortunately in this class there are a lot of redundant & 0xFFs:

public final void writeShort(int v) throws IOException {
    out.write((v >>> 8) & 0xFF);
    out.write((v >>> 0) & 0xFF);
    incCount(2);
}

public final void writeInt(int v) throws IOException {
    out.write((v >>> 24) & 0xFF);
    out.write((v >>> 16) & 0xFF);
    out.write((v >>> 8) & 0xFF);
    out.write((v >>> 0) & 0xFF);
    incCount(4);
}

[The v >>> 0 seems to be optimised out at runtime, and i get no execution time difference between ( v >>> 0 ) & 0xff and ( v & 0xff ), so i've got no problems with that]

which again seems ok on inspection because the code looks tidy and clean and easy to understand, but i need to hit these things very heavily so would rather they were 30% faster than easy to read. Interestingly they've taken an entirely different approach for writing out a long value:

public final void writeLong(long v) throws IOException {
    writeBuffer[0] = (byte) (v >>> 56);
    writeBuffer[1] = (byte) (v >>> 48);
    writeBuffer[2] = (byte) (v >>> 40);
    writeBuffer[3] = (byte) (v >>> 32);
    writeBuffer[4] = (byte) (v >>> 24);
    writeBuffer[5] = (byte) (v >>> 16);
    writeBuffer[6] = (byte) (v >>> 8);
    writeBuffer[7] = (byte) (v >>> 0);
    out.write(writeBuffer, 0, 8);
    incCount(8);
}

both using a private buffer field for the writing before squirting it all out, and not bothering to mask the lower 8 bits. It seems strange that writeLong appears optimised, but writeInt and writeShort are not.
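To make the comparison concrete, here is a rough sketch of what a writeInt in the same style as writeLong might look like. The class BufferedIntWriter and its helper methods are hypothetical, not part of the JDK, and the sketch skips the byte counting that DataOutputStream does; it only shows that the buffered, unmasked version produces the same four bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;

// Hypothetical class, not JDK code: writeInt done the way writeLong does it.
final class BufferedIntWriter {
    private final byte[] writeBuffer = new byte[4];
    private final OutputStream out;

    BufferedIntWriter(OutputStream out) {
        this.out = out;
    }

    void writeInt(int v) throws IOException {
        // No masks: each cast keeps only the low 8 bits of the shifted value.
        writeBuffer[0] = (byte) (v >>> 24);
        writeBuffer[1] = (byte) (v >>> 16);
        writeBuffer[2] = (byte) (v >>> 8);
        writeBuffer[3] = (byte) v;
        out.write(writeBuffer, 0, 4);
    }

    // Helper: the bytes produced by the sketch above.
    static byte[] viaSketch(int v) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            new BufferedIntWriter(bytes).writeInt(v);
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for in-memory streams
        }
    }

    // Helper: the bytes produced by the real DataOutputStream.writeInt.
    static byte[] viaDataOutputStream(int v) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream data = new DataOutputStream(bytes);
            data.writeInt(v);
            data.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 0x12345678, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int v : samples) {
            System.out.println(v + " same bytes: "
                    + Arrays.equals(viaSketch(v), viaDataOutputStream(v)));
        }
    }
}
```

Whether this is actually faster than the JDK version would need measuring on the target JVM; the sketch only demonstrates that the output is identical.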

What does everyone else think? Are there any other heavy users of DataOutputStream out there that would rather have things written faster? I guess i'm going to be writing my own version of DataOutputStream in the meantime, because we're writing so much data over these and i'm in an industry where milliseconds matter.

[5971 byte] By [sorabaina] at [2007-10-3 3:56:43]
# 1

To my knowledge, in your situation, the & 0xFF is not necessary. I believe that the most common use of the mask in this case is actually to treat the lower 8 bits as an unsigned integer. For example:

int value = 65530;

int anotherValue = value & 0xFF; // anotherValue == 250

Anyway, the case space is small enough that I just brute forced your problem. Under v1.5.0_07 on Linux, (byte)i and (byte)(i & 0xFF) are definitely the same:

for (int i = Integer.MIN_VALUE; i < Integer.MAX_VALUE; i++)
{
    byte a = (byte) i;
    byte b = (byte) (i & 0xFF);
    if (a != b) System.out.println(i);
}

// The loop stops short of Integer.MAX_VALUE, so check it separately.
byte a = (byte) (Integer.MAX_VALUE);
byte b = (byte) (Integer.MAX_VALUE & 0xFF);
if (a != b) System.out.println(Integer.MAX_VALUE);

Perhaps the & 0xFF was in response to some older bug or other behavior.

tvynra at 2007-7-14 21:54:59
# 2

Actually, I have just remembered what I use & 0xFF for:

byte b = ...;

int i = b & 0xFF;

That is, converting the other way and keeping sign extension from filling all the higher bits with ones (which makes the result negative). Consider:

int i = (b & 0xFF) | ((b2 & 0xFF) << 8);
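To show what goes wrong without the masks, here is a small sketch that combines two bytes both ways. The class and method names are hypothetical, and I'm assuming b is the low byte and b2 the high byte, as in the line above.

```java
// Hypothetical demo: combining two bytes into an int, with and without masks.
final class UnsignedBytesDemo {
    static int combineMasked(byte lo, byte hi) {
        // Masking promotes each byte to an int in the range 0..255.
        return (lo & 0xFF) | ((hi & 0xFF) << 8);
    }

    static int combineUnmasked(byte lo, byte hi) {
        // Each byte is sign-extended to an int before the shift and the OR.
        return lo | (hi << 8);
    }

    public static void main(String[] args) {
        byte lo = (byte) 0xFA;
        byte hi = (byte) 0xFF;
        System.out.println(combineMasked(lo, hi));   // prints 65530
        System.out.println(combineUnmasked(lo, hi)); // prints -6
    }
}
```

In the unmasked version the low byte 0xFA is promoted to 0xFFFFFFFA, so its sign bits swamp whatever the high byte contributes.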

tvynra at 2007-7-14 21:54:59
# 3
Sign extension is almost always your enemy when doing this kind of IO. As tvynr says, the 0xFF stuff is about making sure the JVM doesn't do sign extension when you don't want it. I can't help thinking that ejp will have something valuable to add any minute now...
dannyyatesa at 2007-7-14 21:54:59
# 4

Yeah, i've used the 0xFF mask a lot when going "the other way"

It just seemed weird to me that in some places DataOutputStream appeared to use 0xFF to go from int to byte, and in other places it did not. On further inspection i guess that's because writeLong places its intermediate bytes into an array of bytes, so it "knew" that it didn't need to do the 0xFF, whereas writeShort and writeInt call the standard write( int ) method, so i guess it was "being nice" and masking out just the bits it wanted written. It's still a waste of time though: the javadoc for write( int ) specifies that it'll just write out the lower 8 bits, so masking off the upper ones is an even bigger waste of time if speed is important to you.
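A quick way to see the write( int ) contract in action, using ByteArrayOutputStream since its write( int ) doesn't throw. The class and method names here are hypothetical, just for illustration:

```java
import java.io.ByteArrayOutputStream;

// Hypothetical demo: write(int) stores only the low 8 bits of its argument.
final class WriteIntLowByteDemo {
    static byte[] writeUnmasked(int v) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(v); // no mask: the 24 high-order bits are ignored
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] written = writeUnmasked(0x12345678);
        System.out.println(written.length);                  // prints 1
        System.out.println(Integer.toHexString(written[0])); // prints 78
    }
}
```

Other OutputStream implementations are bound by the same javadoc contract, though this sketch only exercises ByteArrayOutputStream.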

sorabaina at 2007-7-14 21:54:59