Optimizations broken for inline code - Mars
I have a problem with a piece of code and wanted to check if it was a known issue or not and if there is a workaround for it.
I have a small piece of code that uses SunStudio inline templates, compiled with -xtarget=opteron -xarch=amd64:
#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>

#ifdef __GNUC__
static uint64_t
mul64hi(uint64_t a, uint64_t b)
{
    uint64_t lo, hi;

    /* rdx:rax = rax * operand 3; the high half ends up in rdx */
    __asm__(
        "mulq %3\n\t"
        : "=a" (lo), "=d" (hi)
        : "0" (a), "rm" (b)
        : "cc"
    );
    return (hi);
}
#else
extern uint64_t mul64hi(uint64_t, uint64_t);
#endif /* __GNUC__ */

int
main(void)
{
    uint64_t a = 12535862302449814170ull, b = 12535862302449814170ull;
    uint64_t hi;

    hi = mul64hi(a, b);
    printf("A,B = 0x%016lx, 0x%016lx\n", a, b);
    printf("Hi = 0x%016lx\n", hi);
    if (hi != 8519001675203524399ull) {
        fprintf(stderr, "HI value is incorrect\n");
        exit(1);
    }
    return (0);
}
with the associated inline template:
.inline mul64hi, 8
movq %rdi,%rax
mulq %rsi
movq %rdx,%rax
.end
The problem is that this code works fine when no optimizations are used at compile time (the calculation comes out correct). However, if I use -xO1 or greater, the compiler breaks the code.
Studio seems to be busted. Let's see why:
main:       55                      pushq  %rbp
main+0x1:   48 8b ec                movq   %rsp,%rbp
main+0x4:   48 83 ec 08             subq   $0x8,%rsp
main+0x8:   41 54                   pushq  %r12
main+0xa:   48 be 9a 4a bb a2       movq   $0xadf85458a2bb4a9a,%rsi
            58 54 f8 ad
main+0x14:  8b c6                   movl   %esi,%eax
main+0x16:  48 f7 e6                mulq   %rsi
main+0x19:  48 8b c2                movq   %rdx,%rax
main+0x1c:  4c 8b e0                movq   %rax,%r12
main+0x1f:  48 8d 3d d2 00 00       leaq   +0xd2(%rip),%rdi <0x400b98>
            00
main+0x26:  48 8b d6                movq   %rsi,%rdx
main+0x29:  33 c0                   xorl   %eax,%eax
main+0x2b:  e8 00 fe ff ff          call   -0x200 <printf>
Everything up to offset 0xa is fine. The instruction at offset 0x14 is wrong.
The optimiser (I assume) is being clever: instead of emitting another movq of the constant, it copies the value into rax from rsi (since we're squaring the number anyway). The problem is that it only moves the low 32 bits, using movl rather than a full 64-bit movq. The mulq and the movq of the result that follow are ok.
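To make the effect of that movl concrete, here is a minimal sketch in plain C that models what the optimized sequence computes. It is not the Studio inline and not a confirmed fix; the names mul64hi_portable and mul64hi_truncated are made up for illustration. It relies on the fact that a 32-bit movl on amd64 zero-extends into the full 64-bit register, so the upper half of the first operand is lost before the mulq.

```c
#include <stdint.h>

/*
 * Hypothetical helper: high 64 bits of the unsigned 128-bit product
 * a * b, built from four 32x32 -> 64 bit partial products.
 */
uint64_t
mul64hi_portable(uint64_t a, uint64_t b)
{
    uint64_t a_lo = a & 0xffffffffull, a_hi = a >> 32;
    uint64_t b_lo = b & 0xffffffffull, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo;      /* contributes to bits 0..63   */
    uint64_t p1 = a_lo * b_hi;      /* contributes to bits 32..95  */
    uint64_t p2 = a_hi * b_lo;      /* contributes to bits 32..95  */
    uint64_t p3 = a_hi * b_hi;      /* contributes to bits 64..127 */

    /* Sum of everything landing in bits 32..63; its carry goes into the high word. */
    uint64_t mid = (p0 >> 32) + (p1 & 0xffffffffull) + (p2 & 0xffffffffull);

    return (p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32));
}

/*
 * Hypothetical model of the miscompiled sequence: "movl %esi,%eax"
 * leaves only the low 32 bits of a in rax (zero-extended) before mulq.
 */
uint64_t
mul64hi_truncated(uint64_t a, uint64_t b)
{
    return (mul64hi_portable(a & 0xffffffffull, b));
}
```

Calling both with a = b = 12535862302449814170ull from a small main and printing the results should show the difference: with the first operand cut to 32 bits the full product is below 2^96, so the high word is always below 2^32 and cannot match the expected 8519001675203524399. A C-only routine like mul64hi_portable might also serve as a stopgap where the inline template is miscompiled, at some cost in speed.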
I'm guessing this has to be a SunStudio bug.
This is part of a much larger program that definitely needs to be heavily optimized, as its unoptimized performance is about 1/6th of what I expect it to be.
Any advice on workarounds or if this is a known bug would be appreciated. I haven't logged it as a bug yet.
BTW this same problem occurs on a fully patched SunStudio 11 as well as the SunStudio 12 EA.
Message was edited by:
RobGiltrap

