> [OpenSPARC] T1 supports TSO memory model.
> TSO allows later loads to bypass earlier stores.
Yes, these statements are true.
> But T1 is a single-issue processor; later loads can't bypass earlier stores.
> I think T1 is naturally a Sequentially Consistent processor rather than a TSO processor.
No, neither of these statements is true.
The memory model(s) a processor implements (Sequential Consistency, TSO, RMO, etc.) are independent of how many instructions it can issue per cycle.
Just because UltraSPARC T1 is a single-issue processor, one cannot assume that its memory operations are Sequentially Consistent.
Implemented memory models are also independent of the number of physical processor chips that may reside in the system. The existence of other microarchitectural features, such as store buffers, influences which memory models are implemented, even in single-processor-chip systems.
In fact, OpenSPARC T1 does not implement Sequential Consistency. It implements TSO for almost all operations. The only exceptions are block load and block store operations, which operate under the less-restrictive RMO memory model (regardless of the memory model chosen by the PSTATE.mm control field).
Thanks a lot.
I think I understand a little better now. Since the memory model is independent of issue width and of the number of processors, TSO, PSO, and RMO may also be implemented in a uniprocessor system.
I still have several questions. How is TSO, or any other weaker memory model, implemented? I am eager to know how a memory model is implemented, for instance in T1.
For example, consider the following code fragment, where A and B are memory addresses:
1) Store A
2) Load A
3) Store B
This same code fragment is executed simultaneously by two processors in T1. When Store A in processor P1 and Store A in processor P2 take place concurrently, how can the two processors know the order of these memory operations (that is, the order of the two stores to A)?
Also, is cache coherence guaranteed by the memory model(?). The UltraSPARC Architecture 2005 introduces synchronization mechanisms in addition to cache coherence and the memory model. I want to understand the relationship among cache coherence, synchronization mechanisms, and the memory model.
Thanks again.
linuxworld
> I seem to know a little. Since memory model is
> independent of issue-width and the number of
> processors, the TSO, PSO and RMO may be implemented
> in the uniprocessor system.
Note, however, that in uniprocessor systems or single-threaded
applications, these weak memory models will appear to software
as though they were Sequentially Consistent.
> I still have several problems. How to implement a TSO
> or any other weaker memory model? I eagerly know how
> to implement a memory model, for instance, in T1.
For example, the presence of store buffers (which allow subsequent
loads to perform before or concurrently with the buffered stores)
gives a performance boost, but it breaks Sequential Consistency, as
pointed out in David's reply.
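To make this concrete, here is a small Python sketch (an illustration under simplifying assumptions, not a model of T1's hardware) that enumerates the results of the classic store-buffering litmus test. The addresses X and Y, the registers r1/r2, and the function `outcomes` are hypothetical names chosen for the example. Without a store buffer only three results are possible; with one, both loads can return 0, which no sequentially consistent interleaving allows.

```python
from itertools import permutations

# Store-buffering (Dekker-style) litmus test, used here as an illustration:
#   P1: Store X = 1 ; Load Y -> r1
#   P2: Store Y = 1 ; Load X -> r2
# Each operation is modeled as the moment it becomes globally visible.
EVENTS = [("P1", "st", "X"), ("P1", "ld", "Y"),
          ("P2", "st", "Y"), ("P2", "ld", "X")]

def outcomes(store_buffer):
    """Return every (r1, r2) pair the litmus test can produce.

    store_buffer=False models Sequential Consistency: each store is visible
    before its processor's later load performs. store_buffer=True models a
    store buffer (as under TSO): the load, to a different address, may
    perform while the earlier store is still buffered."""
    results = set()
    for order in permutations(EVENTS):
        pos = {e: i for i, e in enumerate(order)}
        if not store_buffer and (pos[EVENTS[0]] > pos[EVENTS[1]] or
                                 pos[EVENTS[2]] > pos[EVENTS[3]]):
            continue  # would violate program order under SC
        memory, regs = {"X": 0, "Y": 0}, {}
        for cpu, kind, addr in order:
            if kind == "st":
                memory[addr] = 1
            else:
                regs[cpu] = memory[addr]
        results.add((regs["P1"], regs["P2"]))
    return results

print(sorted(outcomes(store_buffer=False)))  # [(0, 1), (1, 0), (1, 1)]
print(sorted(outcomes(store_buffer=True)))   # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

The extra (0, 0) result is exactly the "later load bypasses earlier store" behavior that TSO permits.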
> For example, following is a code slice. A and B are
> memory addresses.
> 1)Store A
> 2)Load A
> 3)Store B
> This same code slice is executed simultaneously by
> two processors in the T1. When Store A in processor
> P1 and Store A in the processor P2 take place
> concurrently, How can two processors know the memory
> operations order (the order of the two store A )?
This example is a "data race" where one of the two stores will
reach the memory subsystem first. Which store wins the race is arbitrary.
However, cache coherence guarantees that any processor
(strand) trying to observe location A will see the effect of the two
stores in the same order.
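One way to state this guarantee precisely: given the sequence of values each strand reads from A, there must exist a single order of the writes that every strand's reads agree with. The sketch below (a hypothetical helper `coherent`, assuming every store writes a distinct value) checks exactly that; it says nothing about which store wins the race.

```python
from itertools import permutations

def coherent(observations):
    """Return True if a single order of the values written to one location
    agrees with every strand's sequence of reads of that location.

    observations: one list per observing strand, holding the values that
    strand read from location A, in the order it read them. Each store is
    assumed to write a distinct value."""
    values = sorted({v for seen in observations for v in seen})
    for order in permutations(values):
        rank = {v: i for i, v in enumerate(order)}
        # A strand may skip or re-read values, but may never go backward.
        if all([rank[v] for v in seen] == sorted(rank[v] for v in seen)
               for seen in observations):
            return True
    return False

print(coherent([["P1", "P2"], ["P2"]]))        # True: P1 then P2 explains both
print(coherent([["P1", "P2"], ["P2", "P1"]]))  # False: the strands disagree
```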
>
> At the same time, the cache coherence is guaranteed
> by memory model (?), and the UltraSPARC architecture
> 2005 has introduced the synchronization mechanism
> addition with cache coherence and memory model. I
> want to know the relationship among cache coherence,
> synchronization mechanism and memory model.
Generally speaking, cache coherence is only a per-location
guarantee (stores to a given location may be ordered arbitrarily,
but all processors will observe their effects in the same order),
while a memory model describes the behavior over the entire
memory space, across all locations.
Under the TSO memory model, a load can bypass its preceding stores.
When this is undesirable (e.g., when it makes correct code too
difficult to write), a "MEMBAR #StoreLoad" instruction can be used
to prevent such reordering at that point, i.e., no load following the
MEMBAR can bypass any store preceding it.
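A toy model of the effect, assuming a FIFO store buffer per strand with store-to-load forwarding, and treating MEMBAR #StoreLoad as "drain the buffer before any later load" (a simplification; the class and function names are made up for this example):

```python
from collections import deque

class Strand:
    """Toy TSO strand (an illustration, not T1's actual pipeline): stores
    enter a FIFO store buffer and reach shared memory only when drained;
    loads check the strand's own buffer first (store-to-load forwarding)."""

    def __init__(self, memory):
        self.memory = memory      # shared dict: address -> value
        self.buffer = deque()     # pending (address, value) stores

    def store(self, addr, value):
        self.buffer.append((addr, value))

    def load(self, addr):
        for a, v in reversed(self.buffer):   # youngest matching store wins
            if a == addr:
                return v
        return self.memory[addr]

    def drain(self):
        while self.buffer:                   # FIFO keeps store order (TSO)
            addr, value = self.buffer.popleft()
            self.memory[addr] = value

    def membar_storeload(self):
        # Modeled as: no later load may perform until every earlier store
        # is globally visible, i.e. until the store buffer has drained.
        self.drain()

def dekker(use_membar):
    memory = {"X": 0, "Y": 0}
    p1, p2 = Strand(memory), Strand(memory)
    p1.store("X", 1)
    p2.store("Y", 1)
    if use_membar:
        p1.membar_storeload()
        p2.membar_storeload()
    # Worst-case timing: both loads run before the buffers drain on their own.
    r1, r2 = p1.load("Y"), p2.load("X")
    p1.drain()
    p2.drain()
    return r1, r2

print(dekker(use_membar=False))  # (0, 0): each load bypassed its own earlier store
print(dekker(use_membar=True))   # (1, 1) for this schedule

s = Strand({"A": 0})
s.store("A", 5)
print(s.load("A"), s.memory["A"])  # 5 0: "Store A; Load A" sees its own store early
```

The last three lines mirror the "Store A; Load A" fragment from the question: a strand always sees its own store, even while other strands cannot yet.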
Thanks.
SPARC V9 supports the PSO memory model. PSO allows stores to bypass earlier stores (i.e., it allows later stores to become visible out of program order). However, to my knowledge, stores must be committed in program order because stores change memory permanently. Why can PSO relax store-to-store ordering?
Thanks again.
linuxworld.
Thanks.
One function of the LSQ (Load and Store Queue) is in-order commit. Under PSO, how are precise interrupts maintained? As I remember from Computer Architecture: A Quantitative Approach, instructions must commit in order if you want to maintain precise interrupts in a uniprocessor system.
linuxworld.
The order in which instructions update the processor state (registers/condition codes) has little to do with the order in which instructions access memory. It is the former ordering that is required for precise traps (precise interrupts in Hennessy and Patterson).
Clearly, before a load can retire it must have accessed memory, but for the purposes of precise trapping, it could have accessed memory before an older memory-accessing instruction.
A store can be committed to memory after the store has retired. Typically, for normal memory locations, no trap is possible because the memory location is guaranteed to exist. Stores to I/O locations can still meet trap conditions, but such traps are not precise.
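A minimal sketch of this separation, assuming a hypothetical `Core` that retires instructions in program order and only later writes retired stores to memory. The `trap` pseudo-instruction stands in for any trapping instruction; none of this is taken from T1's actual design.

```python
from collections import deque

class Core:
    """Toy in-order-retirement model (hypothetical): instructions update
    registers strictly in program order, which is what precise traps
    require; a retired store waits in a store buffer and is written to
    memory afterwards."""

    def __init__(self):
        self.regs = {}
        self.memory = {}
        self.store_buffer = deque()

    def run(self, program):
        """Execute until a trapping instruction; return its index (or None)."""
        for pc, (op, *args) in enumerate(program):
            if op == "trap":
                # Precise trap: all older instructions have updated the
                # architectural state and no younger one has.
                return pc
            if op == "set":
                reg, value = args
                self.regs[reg] = value
            elif op == "store":
                addr, reg = args
                # The store retires here; its memory write happens later.
                self.store_buffer.append((addr, self.regs[reg]))
        return None

    def drain(self):
        """Commit retired stores to normal (non-faulting) memory."""
        while self.store_buffer:
            addr, value = self.store_buffer.popleft()
            self.memory[addr] = value

core = Core()
trap_pc = core.run([("set", "r1", 7), ("store", "A", "r1"),
                    ("trap",), ("set", "r2", 9)])
print(trap_pc, core.regs, core.memory)  # 2 {'r1': 7} {}
core.drain()                            # the retired store still commits
print(core.memory)                      # {'A': 7}
```

Because the trap point depends only on retirement order, the order in which buffered stores later reach memory (FIFO here, or out of order across addresses under PSO) does not affect precision.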
PaulL
On 08/14/06 02:26, sang-suan gam wrote:
> a little late ...
>
> in PSO, stores are issued from the store buffers to
> memory in fifo order only if they are written to the
> same memory location.
>
> if the stores are for different locations, PSO allows such
> writes to be issued out of order, which most of the times
> is a performance improvement.
>
> sometimes, we'd want to ensure that even stores to different
> locations are issued in orders.
>
> in times like this, we'd use a MEMBAR.
>
> in your code example, store-B may be issued before store-A
> in PSO.
>
> > For example, following is a code slice. A and B are
> > memory addresses.
> > 1) Store A
> > 2) Load A
> > 3) Store B
> > This same code slice is executed simultaneously by
> > two processors in the T1. When Store A in processor
> > P1 and Store A in the processor P2 take place
> > concurrently, How can two processors know the memory
> > operations order (the order of the two store A )?
>
> memory orders do not control which stores/loads from which
> cpu are executed first. if programs require determinacy,
> the programs must be written using synchronization primitives
> like mutexes, condition variables and semaphores.
>
> memory order determines the order in which loads/stores from ONE
> particular cpu are issued to the next level of the memory
> hierarchy, the cache coherency domain.
>
> cache coherency protocols work ACROSS cpus to ensure that
> the latest value for a memory location is installed in
> the coherency domain.
>
> cheers,
> sam
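The store-ordering rules described above can be enumerated mechanically. The sketch below is an assumption-laden illustration: `("membar_ss",)` is a made-up encoding for MEMBAR #StoreStore, only one strand's stores are modeled, and loads are ignored because they do not constrain store-to-store order. It lists every order in which the stores may become visible under TSO and PSO.

```python
from collections import namedtuple
from itertools import permutations

# One store of a single strand: program-order index, address, and how many
# MEMBAR #StoreStore instructions precede it ("epoch").
Store = namedtuple("Store", "seq addr epoch")

def parse(program):
    """program: list like [("st", "A"), ("ld", "A"), ("membar_ss",), ("st", "B")]."""
    stores, epoch = [], 0
    for instr in program:
        if instr[0] == "membar_ss":
            epoch += 1
        elif instr[0] == "st":
            stores.append(Store(len(stores), instr[1], epoch))
    return stores

def allowed(order, model):
    """Check whether stores may become visible in `order` under "TSO" (all
    stores stay in program order) or "PSO" (only same-address stores, and
    stores separated by MEMBAR #StoreStore, stay in program order)."""
    pos = {s: i for i, s in enumerate(order)}
    for x in order:
        for y in order:
            if x.seq < y.seq:  # x is earlier in program order
                keep = model == "TSO" or x.addr == y.addr or x.epoch < y.epoch
                if keep and pos[x] > pos[y]:
                    return False
    return True

def visible_orders(program, model):
    stores = parse(program)
    return {tuple(s.seq for s in order)
            for order in permutations(stores) if allowed(order, model)}

code_slice = [("st", "A"), ("ld", "A"), ("st", "B")]
print(sorted(visible_orders(code_slice, "TSO")))  # [(0, 1)]
print(sorted(visible_orders(code_slice, "PSO")))  # [(0, 1), (1, 0)]
fenced = [("st", "A"), ("ld", "A"), ("membar_ss",), ("st", "B")]
print(sorted(visible_orders(fenced, "PSO")))      # [(0, 1)]
```

The (1, 0) entry is the "store-B may be issued before store-A" case; the MEMBAR removes it.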
MEMBAR #Sync provides a barrier such that instructions issued
after the membar execute only after any exceptions caused by earlier
instructions are visible in the cpu's program state.
the "traps" and "registers" sections in the sparc v9 manual have
*very* detailed descriptions of what program state is.
> the "traps" and "registers" sections in the sparc v9 manual have
> *very* detailed descriptions of what program state is.
I would modify this slightly: instead of the 1994 SPARC V9 book, consult the current UltraSPARC Architecture specification (available from http://opensparc-t1.sunsource.net/). It provides much more up-to-date and comprehensive "program state" information (especially with respect to OpenSPARC T1) than the SPARC V9 book.