Niagara 2 - Fetch and Pick stages question.

Hi,

Can anyone point me to any more publications about the Niagara 2 architecture? In particular I am interested in understanding the Fetch and Pick stages. The presentations by Robert Golla and Greg Grohoski mention some aspects of the design, but I have a few unanswered questions.

In particular I'd be grateful if someone could explain to me exactly what is fetched during the fetch stage. Greg's presentation states

"Fetch up to 4 instructions from I$"

and then gives various states that threads can be in when fetched. I am presuming this means that in any one cycle of the fetch stage up to 4 threads' instructions are independently fetched from the I$ unit, am I correct? If so, is the I$ 4-way multi-ported?

I'd really like to read more about this architecture, so if anyone can point me in the right direction I'd be really grateful.

Thanks

[945 byte] By [matt.horsnell] at [2007-11-26 11:18:05]
# 1

> Can anyone point me to any more publications about the Niagara 2 [micro]architecture?

Niagara 2 is not yet a shipping product, so you won't necessarily be able to find every detail about it published, yet :-).

Nonetheless, a couple of fairly detailed public disclosures about Niagara 2 have been presented. According to those presentations, Niagara 2 doubles the overall throughput of UltraSPARC T1, gives 10x floating-point throughput, and 5x single-thread floating-point performance. The processor is so solid that the first chips back from the foundry booted Solaris in just 5 days.

From the questions you ask, I suspect you've already seen at least one of these presentations, but for the benefit of those reading this who'd like to learn more about Niagara 2, check out these slide sets:

Hot Chips, Greg Grohoski:

http://www.opensparc.net/publications/presentations/niagara-2-a-highly-threaded-server-on-a-chip.html

Fall 2006 Microprocessor Forum, Robert Golla:

http://www.opensparc.net/publications/presentations/fallmpf-06-niagara2-a-highly-threaded-server-on-a-chip.html

Perhaps other community members have seen other public information or can provide an answer to your specific question.

dweaver at 2007-7-7 3:33:22
# 2

David,

Thanks for the response. I had looked at those presentations and they do make for very interesting reading. For everyone's information, there is also a really nice podcast with Rick Hetherington (chief architect) and Gary Peterson (director) of the Niagara program, talking in general terms about Niagara 2, the "Server on a chip".

http://www.podtech.net/home/technology/1293/niagara-2-server-on-a-chip

From my understanding, and this is just a guess from looking at the presentations and the throughput figures, there are really only two options for the I$. Either it is multi-banked, with concurrent accesses to distinct banks allowed (banked on the line size of 64 bytes), or the I$ is physically replicated 4 times, allowing four concurrent accesses. The former seems more likely, as the latter would largely waste area (something which is not in keeping with the Niagara aims).
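To make the multi-banked guess concrete, here is a minimal sketch of how such an I$ could arbitrate fetches. It is purely illustrative and not taken from any Sun documentation: the bank count of 4, the bank-selection rule (low bits of the 64-byte line index) and the names `bank_of` and `schedule_fetches` are all my own assumptions.

```python
LINE_BYTES = 64  # line size mentioned above
NUM_BANKS = 4    # assumption: one bank per concurrent fetch


def bank_of(addr: int) -> int:
    """Pick a bank from the low bits of the cache-line index (assumed mapping)."""
    return (addr // LINE_BYTES) % NUM_BANKS


def schedule_fetches(addrs):
    """Grant at most one fetch per bank per cycle.

    Returns (granted, stalled): fetches that proceed this cycle, and
    fetches that lost arbitration because their bank was already busy.
    """
    granted, stalled, busy = [], [], set()
    for addr in addrs:
        bank = bank_of(addr)
        if bank in busy:
            stalled.append(addr)  # bank conflict: retry in a later cycle
        else:
            busy.add(bank)
            granted.append(addr)
    return granted, stalled


if __name__ == "__main__":
    # Four fetches to four consecutive lines land in four different banks.
    print(schedule_fetches([0x000, 0x040, 0x080, 0x0C0]))
    # Two fetches whose line indices differ by a multiple of 4 collide.
    print(schedule_fetches([0x000, 0x100]))
```

With this mapping, threads fetching from nearby but distinct lines rarely collide, which is why banking can give most of the benefit of four ports at much lower area cost than replication.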

The D$ seems to remain relatively unchanged, but rather than being driven directly from the pipeline, it looks like some intermediate buffering arbitrates accesses when both integer pipes attempt to access the D$ at the same time. Using the rule of thumb that RISC instruction mixes contain 20% loads and 10% stores, a given pipeline accesses the D$ about 30% of the time. Assuming the two pipes behave independently, the likelihood of both accessing the D$ in the same cycle is about 0.3 x 0.3, so the effective overlap is nearer 10%. Conflicts would therefore happen relatively infrequently. The clever frontend scheduling (long-latency operations being considered low priority) will also probably mean that the pipeline won't feel the extra couple of cycles of delay when accesses conflict, as other threads will already be switched in.
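The back-of-the-envelope estimate above can be written out as a few lines of Python. This is only a sketch of the arithmetic, not a model of the real hardware: the 20%/10% instruction mix is the rule of thumb quoted above, and treating the two pipes as statistically independent is my own simplifying assumption. The function name `dcache_conflict_probability` is made up for illustration.

```python
def dcache_conflict_probability(load_frac=0.20, store_frac=0.10, pipes=2):
    """Probability that all `pipes` issue a D$ access in the same cycle.

    Assumes each pipe accesses the D$ with probability load_frac + store_frac
    per cycle, independently of the other pipes (a simplification).
    """
    per_pipe = load_frac + store_frac
    return per_pipe ** pipes


if __name__ == "__main__":
    p = dcache_conflict_probability()
    print(f"estimated conflict rate: {p:.1%}")
```

If the pipes were correlated (for example, threads running the same memory-heavy loop), the real conflict rate could be higher than this estimate.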

Another interesting presentation is the PARC one from Marc Tremblay, where he not only outlines "Throughput Computing" but also talks about the work on the upcoming Rock processor, including the runtime prefetching threads. Very cool!

Audio: http://www.parc.xerox.com/events/forum/media/v1128.mp3

Video: mms://216.93.180.194/parc_forum/v1128.wmv

matthorsnell at 2007-7-7 3:33:22