GNU Octave on UltraSparc IV
hi there,
I'm kinda new to Sun's UltraSPARC machines. My background is in scientific computing, and we do quite a lot of it at my workplace. If there were a topic on math libraries or scientific computing, this post would be better suited to it. Moderators, if you feel like moving my post to a more appropriate topic, please feel free to do so.
We're having some problems at the office: some of our Matlab code, which runs on an (hmm, is this allowed? hope it's not a bad word in this forum) Intel Xeon, was going really slow. So we decided to let it run on our Sun UltraSPARC IV++ machines. But of course, we couldn't afford another licence for Matlab, so we decided to use GNU Octave instead.
Now I know it's not a good comparison: one is a commercial numerical software package, the other is a GNU project. But I do know that our Sun machine is definitely more powerful than the Xeon: it has far more RAM (32GB compared to just 4GB) and more CPUs too (4 dual-core UltraSPARC IV++, compared to 2 dual-core Xeons). The Xeon was running Windows Server Edition while the V490 was running Solaris 10.
We did a simple dry run and found that Octave on the Sun was far slower than Matlab on the Xeon. What took 144 seconds on the Xeon took 9700 seconds on the Sun. Those results were really shocking to us. But like I said earlier, Octave and Matlab are two totally different numerical software packages; we just thought that the hardware would make up for the difference.
The thing is, most of my colleagues are die-hard Matlab fans, and not too keen to switch to C and use it on the Sun Machine. The next alternative would be Octave which is very closely related to Matlab.
My question to this forum, and to all at Sun, would be: why is Octave so much slower on an UltraSPARC IV++ machine? Does this have to do with Octave's binaries? If so, what can we do to improve things and allow Octave to make full use of the UltraSPARC architecture?
cheers
julian aka bus_wrecker
the malaysian lost in tiny singapore
ps- to be fair to Sun, we took the same matlab code, converted it to a C program, included OpenMP pragmas to it, and the V490 blasted the Xeon to bits. I won't tell you the performance benchmarks, but it's almost unbelievable.
# 1
Whew, where to start on this one...
First, let me point out the postscript, which I almost didn't notice the first time I read this message:
.......ps- to be fair to Sun, we took the same matlab code, converted it to a C program, included OpenMP pragmas to it,
.......and the V490 blasted the Xeon to bits. I won't tell you the performance benchmarks, but it's almost unbelievable.
I assume that to make this comparison more fair, when you say "the Xeon" you mean that you compiled the same C code to run on the Xeon (not Matlab on the Xeon). Especially when parallelized with OpenMP, one would expect the US IV++ machine to do a lot better than the Xeon box.
So the postscript was helpful in highlighting how little can be read into the original comparison.
For a really accurate comparison, one would vary only one parameter, re-run the software, and measure the difference. In the posted comparison, how many factors were varied? At least:
.........1. instruction set architecture (x86 vs. SPARC v9)
.........2. processor implementation (semiconductor technology, cache sizes, etc, etc) (Xeon vs. UltraSPARC IV++)
.........3. processor number (2 dual-core vs. 4 dual-core; but per #7 below, we don't know how many were actually in use)
.........4. application (Matlab vs. Octave)
.........5. compilers (don't know what was used to compile Matlab or Octave)
.........6. compiler options (don't know what levels of optimization were used when Matlab and Octave were compiled)
.........7. parallelization (how parallelized/threaded are the Matlab and Octave binaries that were used?)
.........8. system memory (4GB vs. 32GB)
.........9. operating system (Windows vs. Solaris)
.......10. processor clock speed (unknown vs. 2+GHz)
(what other six factors am I missing? ;-))
Any one of these factors could have a large impact on performance -- multiply them all together and there can be a huge net difference.
#3, #8, and #9 would certainly only run in favor of the UltraSPARC machine (except #3 would be moot if neither application was threaded/parallelized).
#10 probably slightly favored the Xeon box.
#1 probably doesn't matter much in this case.
The others could have swayed either way, "depending".
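To make the compounding concrete, here is a minimal sketch. The per-factor slowdowns below are invented purely for illustration (they are not measurements from either machine), and multiplying them assumes the factors act independently, which real systems rarely honor exactly. The point is only that a handful of modest factors can multiply into a large net difference.

```python
from math import prod

# Hypothetical per-factor slowdowns (NOT measured values): each number is how
# many times slower one setup is assumed to be because of that factor alone.
hypothetical_slowdowns = {
    "application": 4.0,
    "compiler options": 3.0,
    "parallelization": 4.0,
    "clock speed": 1.5,
}


def net_slowdown(factors):
    """Multiply independent slowdown factors into one overall ratio."""
    return prod(factors.values())


print(net_slowdown(hypothetical_slowdowns))  # 4 * 3 * 4 * 1.5 = 72.0
```

Four individually plausible-looking factors already land in the same range as the gap reported in the original post, without any one of them being dramatic.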
If one assumed (and this is a pretty iffy assumption!) that the two applications (Matlab & Octave) were internally similar ...... then if Matlab was parallelized and Octave was not, factor #7 alone could have caused much of the difference you saw.
Professor Renau's suggestion of running Octave on the Xeon box would be another "sanity check" you can perform. At least you'd remove variation #4 from the mix :-).
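A re-run-and-measure loop of the kind described above could be sketched as follows. This is only a rough harness under stated assumptions: the function name `time_command` is made up for this example, and the workload shown is a placeholder Python one-liner, to be replaced by whatever command actually launches the script under test.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, repeats=3):
    """Run `argv` `repeats` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True makes a failing run raise instead of being timed silently.
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Placeholder workload: swap in the real command for the script under test.
    runs = time_command([sys.executable, "-c", "sum(range(10**6))"])
    print(f"min {min(runs):.3f}s  median {statistics.median(runs):.3f}s")
```

Repeating each run and reporting the minimum and median helps separate a real difference from run-to-run noise; changing one factor between two such measurements keeps the comparison interpretable.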
# 2
Julian:
The reason for any particular performance result is utterly measurable and knowable on any Solaris system, whether it's on SPARC or x64 architecture. A delta as large as (9700/144) 67X certainly cannot hide very long at all from a knowledgeable performance analyst! David Weaver's reply gives good direction for starters.
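For reference, the 67X figure is simply the ratio of the two wall-clock times quoted in the original post:

```python
xeon_seconds = 144    # Matlab on the Xeon, from the original post
sparc_seconds = 9700  # Octave on the V490, from the original post

ratio = sparc_seconds / xeon_seconds
print(f"{ratio:.1f}X")  # prints "67.4X"
```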
If you'd like, I'd be happy to engage in a casual off-alias collaboration to help you meet your objectives. Even though your comparison has an "apples versus oranges" element to it, 67X is certainly not a relative performance factor explained by the hardware differences alone. While I would expect a modern Opteron/Xeon chip to be ideal for many numerical algorithms - especially if they are single-threaded - that's not to say you shouldn't be able to get good results from your SPARC systems, too.
Here, I'll offer some pointers to useful tools, and also offer some contextual perspectives.
Perspective
First, I must note that you have - probably unintentionally - committed the "Fallacy of Complex Question" in asking "... why is Octave so much slower on an UltraSparcIV++ machine?" Your question incorrectly extrapolates your single empirical result to the generic case. It might turn out that Octave is generally slower on box A versus box B, but it would require much more characterization work to make that determination or to properly quantify it - and that work should only come after the "usual suspects" have been vetted from your experiment and the system under test. I can certainly understand the alarm that can come from an empirical result that is 67X different, but it still all boils down to the particulars of what you are doing, how you are doing it, and under what circumstances.
Regarding your phrasing about the relative "power" of different systems, the most common analogy here is between a bus and a sports car. Buses are more powerful, but that does not imply they go up hills faster than sports cars. Neither does it imply that buses cannot keep the pace on the highway. For single-threaded workloads, extra CPUs and cores are irrelevant. It's also normal to expect that having memory in excess of your requirements will give no performance benefit. (Hmmm, this does make me wonder about the origins of your handle "bus wrecker". ;-) )
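One common way to put numbers on the point about extra cores is Amdahl's law, which is not named above but expresses the same idea. The sketch below is an idealized model only: it ignores per-core speed differences, memory effects, and threading overhead, and the 0.9 parallel fraction is a made-up value for illustration.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup over one core when only `parallel_fraction` of the work
    can be spread across `cores` cores (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# 4 cores ~ two dual-core Xeons, 8 cores ~ four dual-core UltraSPARCs.
for cores in (4, 8):
    single_threaded = amdahl_speedup(0.0, cores)   # no parallel work at all
    mostly_parallel = amdahl_speedup(0.9, cores)   # hypothetical 90% parallel
    print(cores, single_threaded, round(mostly_parallel, 2))
```

With a parallel fraction of zero the speedup stays at 1.0 no matter how many cores are available, which is the single-threaded case described above.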
Resources
To help survey system configuration factors, we usually use the Sun Explorer program (http://sunsolve.sun.com/explorer) to collect data from your system. From an Explorer output, we can check patch levels and screen for numerous common configuration issues.
To build binaries optimized for execution on SPARC, Sun has both the Sun Studio Compiler and GCC for SPARC systems (GCCfss); both are easy to find from the Sun "Cool Tools" page at http://cooltools.sunsource.net. Several tools for analysis of performance are also cited on the Cool Tools page. While this web page emphasizes Sun CMT products, the tools are equally well-suited to all SPARC systems.
The Sun Studio Compilers page at http://developers.sun.com/sunstudio/ offers links to several other resources which may interest you.
All of these things are FREE and easily downloadable.
For x86/x64 systems, Sun bundles all the development tools in the freely-downloadable Solaris Express Developer's Edition (See http://developers.sun.com/solaris/downloads/index.jsp). I found the download and installation to be rather cumbersome and resource-intensive, but once installed, it puts everything at your fingertips on almost any suitably-large x86/x64 system. (FWIW, Solaris installation is undergoing an overhaul called "Project Caiman", and those details are available at http://www.opensolaris.org/os/project/caiman/).
Educated Guesses
A list of where the problem *might* lie would be far too extensive to list here. Nevertheless, I can offer a short list of "usual suspects" that would drive my initial analysis. My method is to exhaust my short list of hypotheses, then pass the problem along to someone smarter than me. Some "usual suspects" I can offer in this case include:
- exact OS version and patch level
- algorithm convergence criteria, and the possible relevance of rounding,
IEEE long temporaries, and compiler factors in this regard
- pathological cache stride
- compiler and compiler option selection
- page coloring issues
- link-time options
- use of optimized math libraries
Drill-Down Analysis
Whether or not my educated guesses pan out, between the many tools native to Solaris and the analysis tools in the Solaris developer toolkit, it's possible to drill down to the lowest details of "why". It's the kind of thing that I personally consider to be "fun".
Wrapup
Once again - I'll offer to assist directly if a casual collaboration will work for you. If you can supply a repeatable test case, that would be the fastest route forward, but it's not the only path. Let's talk off-alias.
It would be a great outcome if this activity ultimately led to reasonably-optimized Solaris builds of Octave for both x86/x64 and SPARC being posted to http://www.gnu.org/software/octave/download.html !
Best regards,
-- Bob Sneed
# 3
Hi Bob and dweaver,
Thank you for your prompt replies.
I agree that this is a rather unfair comparison. I've definitely made too many assumptions to arrive at such a conclusion (that Octave was slower on a V490 than Matlab on a Xeon).
Perhaps this post started off slightly on the wrong foot too. I think I got slightly over-excited and shouldn't have used phrases like 'blasted to bits'. I apologize if I've slightly misled the audience to think that this post is based on comparisons.
Let's not go down the road of finding good comparisons. Good comparisons are sometimes difficult to find. We could not install Matlab on the Sun machine (license issues) or Octave on the Xeon (it runs Windows). As dweaver has pointed out, the list is endless.
This post should be just a simple example of a real world problem. One of the matlab scripts was running really slow and we needed to find an alternative solution to the problem. We just thought that by using Octave on our Sun machines, we should be able to achieve reasonable results.
Unfortunately, as both of you would know, we did not achieve that.
Perhaps the best way to resolve this problem would be to go 'off-line' and discuss ways and means to improve the performance of Octave on SPARC.
Feel free to contact me at julian@omniarray.com
Regards
Bus_wrecker
Ps-bus wrecker originated from a rather comical event that happened when.. hmm.. how do i put it.. err.. "young and stupid"?. I was 14, jumped into a stationary bus, released the parking brakes and it crashed into a nearby church.
# 4
Julian:
OK. Let's chase it offline. Check your INBOX.
If we come up with some news, we can return here to tell the world.
Thanks,
-- Bob