Getting full SSE performance from Java

Like most C++ compilers, Java is not able to take full advantage of the SSE capabilities of Intel and AMD CPUs. Other CPUs (PowerPC, Cell) have similar vector features, and future CPUs are likely to rely on such instructions even more. The current difference in floating-point performance between using these instructions and not using them is 3x-4x! That gap will probably widen with future CPUs, such as AMD's Fusion.

The main reason compilers and JVMs can't generate efficient SSE code is that SSE's aligned loads and stores require 16-byte alignment, yet neither Java nor C++ has a facility for telling the compiler when data is guaranteed to be 16-byte aligned.

It would seem that an easy way around this limitation is to create a simple class, similar to the F32vec4 class used in C++. This class is essentially just a wrapper for the four aligned floating-point values that SSE instructions like to operate on. The class would have to be part of the core language, both to ensure the alignment and so that the compiler knows it can safely take advantage of it.
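To make the idea concrete, here is a rough sketch of what such a class might look like from the Java side. The name Float4, its fields, and its methods are hypothetical and not an existing JDK API. This plain-Java version does the arithmetic one lane at a time; the point of the proposal is that a JVM that knew about such a type could keep it 16-byte aligned and map each method onto a single SSE instruction.

```java
// Hypothetical sketch: Float4 is an illustrative name, not an existing JDK class.
// It mirrors the shape of F32vec4: four single-precision lanes, lane-wise operators.
final class Float4 {
    final float f0, f1, f2, f3; // the four lanes, in natural order

    Float4(float f0, float f1, float f2, float f3) {
        this.f0 = f0;
        this.f1 = f1;
        this.f2 = f2;
        this.f3 = f3;
    }

    // Lane-wise arithmetic. Written as scalar code here; an SSE-aware JVM
    // could in principle compile each of these to one packed instruction.
    Float4 add(Float4 o) { return new Float4(f0 + o.f0, f1 + o.f1, f2 + o.f2, f3 + o.f3); }
    Float4 sub(Float4 o) { return new Float4(f0 - o.f0, f1 - o.f1, f2 - o.f2, f3 - o.f3); }
    Float4 mul(Float4 o) { return new Float4(f0 * o.f0, f1 * o.f1, f2 * o.f2, f3 * o.f3); }
    Float4 div(Float4 o) { return new Float4(f0 / o.f0, f1 / o.f1, f2 / o.f2, f3 / o.f3); }

    @Override
    public String toString() {
        return "[" + f0 + ", " + f1 + ", " + f2 + ", " + f3 + "]";
    }

    public static void main(String[] args) {
        Float4 a = new Float4(1f, 2f, 3f, 4f);
        Float4 b = new Float4(10f, 20f, 30f, 40f);
        System.out.println(a.add(b)); // prints [11.0, 22.0, 33.0, 44.0]
    }
}
```

The class is immutable so that, in principle, the runtime would be free to keep values in registers rather than on the heap; whether any given JVM would do that is an open question.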

So, what are the chances of getting these special classes in Java to take advantage of these new chip features?

Appendix: Here are the main elements of the F32vec4 class:

#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_set_ps, _mm_add_ps, ...

class F32vec4
{
protected:
    __m128 vec;  // this is just 4 floating point values; the __m128 type is 16-byte aligned

public:
    /* Constructors: __m128, 4 floats (the 1-float constructor is omitted here) */
    F32vec4() {}

    /* initialize 4 SP FP with __m128 data type */
    F32vec4(__m128 m) { vec = m; }

    /* initialize 4 SP FPs with 4 floats */
    F32vec4(float f3, float f2, float f1, float f0) { vec = _mm_set_ps(f3, f2, f1, f0); }

    /* Conversion functions */
    operator __m128() const { return vec; }  /* Convert to __m128 */

    /* Logical Operators */
    friend F32vec4 operator &(const F32vec4 &a, const F32vec4 &b) { return _mm_and_ps(a, b); }
    friend F32vec4 operator |(const F32vec4 &a, const F32vec4 &b) { return _mm_or_ps(a, b); }
    friend F32vec4 operator ^(const F32vec4 &a, const F32vec4 &b) { return _mm_xor_ps(a, b); }

    /* Arithmetic Operators */
    friend F32vec4 operator +(const F32vec4 &a, const F32vec4 &b) { return _mm_add_ps(a, b); }
    friend F32vec4 operator -(const F32vec4 &a, const F32vec4 &b) { return _mm_sub_ps(a, b); }
    friend F32vec4 operator *(const F32vec4 &a, const F32vec4 &b) { return _mm_mul_ps(a, b); }
    friend F32vec4 operator /(const F32vec4 &a, const F32vec4 &b) { return _mm_div_ps(a, b); }
};

[4471 byte] By [cstork9a] at [2007-11-26 21:45:25]
# 1

> The main reason compilers/JVMs can't create efficient SSE code is that SSE requires 16 byte alignment yet neither Java or C++ have facilities to tell the compiler when data is guaranteed to have 16 byte alignment.

I don't think a JVM needs the Java compiler or the developer to tell it anything about alignment. How data is represented in the JVM is an implementation detail, and the JVM can do whatever it pleases.

I think your chances of seeing changes in this area would improve if you wrote a benchmark that demonstrates poor performance of the HotSpot VM and submitted a bug report about your observations at bugs.sun.com.

PeterAhea at 2007-7-10 3:34:08 > top of Java-index,Developer Tools,Java Compiler...
# 2

A benchmark is what led to this post. I had some Java experts tune the Java code. The C++ code using the F32vec4 class gave a 3-4x performance improvement. The standard C++ code without the F32vec4 class gave the same results as the Java code.

Since we are just indexing through arrays, I don't see how the JVM can tell that I have deliberately arranged the operations in units of 16 bytes. I need language facilities in Java, similar to those in C++, that let me specify 16-byte operations and declare that I am honoring the alignment.
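For reference, the access pattern I mean looks roughly like the sketch below. The class and method names are made up for illustration. The main loop touches four floats (16 bytes) per iteration, but nothing in the Java source tells the JVM that the arrays start on a 16-byte boundary, and whether a given JIT turns this into packed SSE instructions is not something the language guarantees.

```java
// Illustrative sketch: BlockedAdd and its method names are hypothetical.
final class BlockedAdd {
    // Adds a and b element-wise into out, four floats (16 bytes) per iteration.
    static void add(float[] a, float[] b, float[] out) {
        if (a.length != out.length || b.length != out.length) {
            throw new IllegalArgumentException("arrays must have equal length");
        }
        int n = out.length;
        int limit = n - (n % 4); // largest multiple of 4 that fits

        // Main loop: one group of four lanes per iteration.
        for (int i = 0; i < limit; i += 4) {
            out[i]     = a[i]     + b[i];
            out[i + 1] = a[i + 1] + b[i + 1];
            out[i + 2] = a[i + 2] + b[i + 2];
            out[i + 3] = a[i + 3] + b[i + 3];
        }
        // Tail loop: the remaining 0 to 3 elements.
        for (int i = limit; i < n; i++) {
            out[i] = a[i] + b[i];
        }
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f, 4f, 5f};
        float[] b = {10f, 20f, 30f, 40f, 50f};
        float[] out = new float[5];
        add(a, b, out);
        System.out.println(java.util.Arrays.toString(out));
    }
}
```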

Yes, I should create a simple benchmark. But this is pretty basic SSE stuff. I'm surprised it hasn't come up before.
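A minimal starting point for such a benchmark might look like the following. It is only a sketch: the kernel (a saxpy-style loop), the array size, and the repetition counts are arbitrary assumptions rather than the code from my tests, and a serious measurement would need a proper harness to control for JIT warm-up and dead-code elimination.

```java
import java.util.Arrays;

// Illustrative sketch: SaxpyBench is a hypothetical name, and the sizes are arbitrary.
final class SaxpyBench {
    // y[i] = a * x[i] + y[i], the kind of loop SSE handles four lanes at a time.
    static void saxpy(float a, float[] x, float[] y) {
        for (int i = 0; i < y.length; i++) {
            y[i] = a * x[i] + y[i];
        }
    }

    // Runs the kernel reps times and returns the elapsed wall-clock time in nanoseconds.
    static long timeNanos(float[] x, float[] y, int reps) {
        long start = System.nanoTime();
        for (int r = 0; r < reps; r++) {
            saxpy(0.5f, x, y);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 1 << 16;
        float[] x = new float[n];
        float[] y = new float[n];
        Arrays.fill(x, 1.0f);

        timeNanos(x, y, 1000);           // warm-up so the JIT compiles the loop
        long ns = timeNanos(x, y, 2000); // timed run

        // Printing a result element keeps the JIT from discarding the work.
        System.out.printf("%.3f ms, y[0] = %.1f%n", ns / 1e6, y[0]);
    }
}
```

Comparing the timing against the same loop in C++, with and without F32vec4, would give the kind of numbers a bug report needs.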

cstork9a at 2007-7-10 3:34:09
# 3

It may have come up before but you are bringing it up on a Java compiler forum :-)

Since this is a JVM issue, this is not the best place to get information. If you don't want to submit a bug, you could try searching Sun's bug database. I recommend using Google with "site:bugs.sun.com".

PeterAhea at 2007-7-10 3:34:09