Optimizations broken for inline code - Mars

I have a problem with a piece of code and wanted to check if it was a known issue or not and if there is a workaround for it.

I have a small piece of code that uses SunStudio inlines and is compiled with -xtarget=opteron -xarch=amd64

#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>

#ifdef __GNUC__
static uint64_t
mul64hi(uint64_t a, uint64_t b)
{
    uint64_t lo, hi;

    __asm__(
        "mulq %3\n\t"
        : "=a" (lo), "=d" (hi)
        : "0" (a), "rm" (b)
        : "cc"
    );
    return (hi);
}
#else
extern uint64_t mul64hi(uint64_t, uint64_t);
#endif /* __GNUC__ */

int
main(void)
{
    uint64_t a = 12535862302449814170ull, b = 12535862302449814170ull;
    uint64_t hi;

    hi = mul64hi(a, b);
    printf("A,B = 0x%016lx, 0x%016lx\n", a, b);
    printf("Hi = 0x%016lx\n", hi);
    if (hi != 8519001675203524399ull) {
        fprintf(stderr, "HI value is incorrect\n");
        exit(1);
    }
    return (0);
}

with the associated inline

.inline mul64hi, 8
    movq %rdi,%rax
    mulq %rsi
    movq %rdx,%rax
.end

The problem is that this code works fine when there are no optimizations used at compile time (resulting in the correct calculation). However, if I try to use -xO1 or greater the compiler breaks the code.

Studio seems to be busted. Let's see why:

main:        55                    pushq  %rbp
main+0x1:    48 8b ec              movq   %rsp,%rbp
main+0x4:    48 83 ec 08           subq   $0x8,%rsp
main+0x8:    41 54                 pushq  %r12
main+0xa:    48 be 9a 4a bb a2     movq   $0xadf85458a2bb4a9a,%rsi
             58 54 f8 ad
main+0x14:   8b c6                 movl   %esi,%eax
main+0x16:   48 f7 e6              mulq   %rsi
main+0x19:   48 8b c2              movq   %rdx,%rax
main+0x1c:   4c 8b e0              movq   %rax,%r12
main+0x1f:   48 8d 3d d2 00 00     leaq   +0xd2(%rip),%rdi <0x400b98>
             00
main+0x26:   48 8b d6              movq   %rsi,%rdx
main+0x29:   33 c0                 xorl   %eax,%eax
main+0x2b:   e8 00 fe ff ff        call   -0x200 <printf>

Everything up to offset 0xa is fine. The instruction at offset 0x14 is wrong.

The optimiser (I assume) is being clever: instead of issuing another movq of the constant, it copies the value into rax from rsi (since we're squaring the number anyway). The problem is that it only moves the low 32 bits using movl (which also zeroes the upper 32 bits of rax), not the full 64 bits using movq. Then we multiply and movq the result out, which is ok.
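To make the effect of that truncated move concrete, here is a minimal sketch in plain C (it is not the compiler's output). The names `mul64hi_portable`, `mul64hi_truncated` and `show_truncation` are hypothetical helpers, not part of the original program. The first computes the high 64 bits of the product from 32-bit limbs; the second models `movl %esi,%eax`, assuming the only fault is that the first operand loses its upper 32 bits.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * High 64 bits of the 128-bit product a * b, built from 32-bit limbs so
 * that no inline assembly is needed. Hypothetical helper for illustration.
 */
static uint64_t
mul64hi_portable(uint64_t a, uint64_t b)
{
    uint64_t a_lo = a & 0xffffffffull, a_hi = a >> 32;
    uint64_t b_lo = b & 0xffffffffull, b_hi = b >> 32;

    uint64_t lo_lo = a_lo * b_lo;
    uint64_t hi_lo = a_hi * b_lo;
    uint64_t lo_hi = a_lo * b_hi;
    uint64_t hi_hi = a_hi * b_hi;

    /* Carry out of the middle 32-bit column; this sum cannot overflow. */
    uint64_t mid = (lo_lo >> 32) + (hi_lo & 0xffffffffull) +
        (lo_hi & 0xffffffffull);

    return (hi_hi + (hi_lo >> 32) + (lo_hi >> 32) + (mid >> 32));
}

/*
 * Models the suspected miscompile: movl %esi,%eax keeps only the low
 * 32 bits of the first operand and zeroes the upper half of %rax.
 */
static uint64_t
mul64hi_truncated(uint64_t a, uint64_t b)
{
    return (mul64hi_portable((uint32_t)a, b));
}

/* Print both results for the constant used in the test program. */
static void
show_truncation(void)
{
    uint64_t a = 12535862302449814170ull;
    uint64_t good = mul64hi_portable(a, a);
    uint64_t bad = mul64hi_truncated(a, a);

    /* A 32-bit by 64-bit product is below 2^96, so its high word fits in 32 bits. */
    assert(bad <= 0xffffffffull);
    printf("full      = 0x%016llx\n", (unsigned long long)good);
    printf("truncated = 0x%016llx\n", (unsigned long long)bad);
}
```

Calling `show_truncation()` from a small `main` prints both values. If the sketch is right, the full result should equal the 8519001675203524399 that the test program checks for, while the truncated result can never need more than 32 bits, which is consistent with the check failing at -xO1 and above.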

I'm guessing this has to be a SunStudio bug.

This is part of a much larger program that definitely needs to be heavily optimized, as its unoptimized performance is about 1/6th of what I expect it to be.

Any advice on workarounds or if this is a known bug would be appreciated. I haven't logged it as a bug yet.

BTW this same problem occurs on a fully patched SunStudio 11 as well as the SunStudio 12 EA.


[2897 byte] By [RobGiltrapa] at [2007-11-27 1:15:17]
# 1
Sounds like a bug to me. Can you file it using bugs.sun.com? You might also try using the new gcc-style inline functions in Sun Studio 12 EA.
ChrisQuenellea at 2007-7-11 23:50:37
# 2
Done... Your Report (Review ID: 949821) - 64bit code with inline breaks when optimized. For my main piece of code it didn't like the GCC part. Still looking into it.
RobGiltrapa at 2007-7-11 23:50:37