Our approach assumes an efficient implementation of DCAS functionality. In this section, we briefly outline an instruction set extension to the load-linked/store-conditional instructions to support DCAS. (A software implementation is discussed in Section 6.1.) On a processor supporting load-linked (LL) and store-conditional (SC) instructions, we add two instructions, LLP and SCP.
DCAS is then implemented by the instruction sequence shown in Figure 3 (using R4000 instructions in addition to the LL/SC(P) instructions).
Figure 3: DCAS Implementation using LL/SC and LLP/SCP.
Success or failure of SC (and thus of the DCAS operation) is returned in U1, or whatever general register holds the argument to SC. 1 denotes success, 0 failure. If the next instruction tries to read U1, the hardware interlocks (as it already does for LL/SC) if the result of SC is not already in U1.
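As an illustration of this result convention, the following C sketch models the DCAS operation as a whole: it compares two words against expected values and, only if both match, stores two new values, returning 1 on success and 0 on failure. The name `dcas_model` and its parameter names are hypothetical, and the model is purely sequential; it captures the intended outcome, not the atomicity that the hardware provides.

```c
#include <stdint.h>

/*
 * Sequential reference model of DCAS (hypothetical helper, not the
 * hardware mechanism). A real DCAS performs the two comparisons and
 * the two stores as one atomic step; this model does not.
 */
static int dcas_model(uint32_t *addr0, uint32_t *addr1,
                      uint32_t old0, uint32_t old1,
                      uint32_t new0, uint32_t new1)
{
    /* Fail without writing anything if either word differs. */
    if (*addr0 != old0 || *addr1 != old1)
        return 0;

    /* Both comparisons held: update both words. */
    *addr0 = new0;
    *addr1 = new1;
    return 1;
}
```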
The LL and LLP instructions in lines 1 and 2 "link" the loads with the respective stores issued by the following SC and SCP instructions. Lines 3 and 4 verify that (T0) and (T1) contain V0 and V1, respectively. The SCP and SC in lines 5 and 6 are conditional. They will not issue the stores unless (T0) and (T1) have been unchanged since lines 1 and 2. This guarantees that the results of CAS in lines 3 and 4 are still valid at line 6, or else the SC fails. Further, the store issued by a successful SCP is buffered pending a successful SC. Thus, SC in line 6 writes U1 and U0 to (T1) and (T0) atomically with the comparison to V0 and V1.
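The interplay of the links, the buffered SCP store, and the committing SC can be sketched as a small single-processor model in C. This is an illustrative sketch under stated assumptions, not the hardware design: the type `cpu_model`, the functions `ll`, `llp`, `scp`, `sc`, `remote_store`, and `dcas_sequence`, the order of the two loads, and the pairing of LL with (T1) and LLP with (T0) are all choices made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one processor's link state. */
typedef struct {
    uint32_t *ll_addr;     /* word linked by LL  (assumed: T1) */
    uint32_t *llp_addr;    /* word linked by LLP (assumed: T0) */
    bool      ll_valid;    /* link still intact? */
    bool      llp_valid;
    bool      scp_pending; /* an SCP store is buffered */
    uint32_t  scp_value;
} cpu_model;

static uint32_t ll(cpu_model *c, uint32_t *addr)
{
    c->ll_addr = addr;
    c->ll_valid = true;
    return *addr;
}

static uint32_t llp(cpu_model *c, uint32_t *addr)
{
    c->llp_addr = addr;
    c->llp_valid = true;
    return *addr;
}

/* Models a write by another processor: it breaks any matching link. */
static void remote_store(cpu_model *c, uint32_t *addr, uint32_t value)
{
    *addr = value;
    if (addr == c->ll_addr)  c->ll_valid = false;
    if (addr == c->llp_addr) c->llp_valid = false;
}

/* SCP: buffer the store if both links hold; memory is not written yet. */
static void scp(cpu_model *c, uint32_t value)
{
    c->scp_pending = c->ll_valid && c->llp_valid;
    c->scp_value = value;
}

/* SC: write both words or neither; returns 1 on success, 0 on failure. */
static int sc(cpu_model *c, uint32_t value)
{
    int ok = c->ll_valid && c->llp_valid && c->scp_pending;
    if (ok) {
        *c->llp_addr = c->scp_value; /* release the buffered SCP store */
        *c->ll_addr = value;
    }
    c->ll_valid = c->llp_valid = c->scp_pending = false;
    return ok;
}

/* The six steps described above, as a function of this model. */
static int dcas_sequence(cpu_model *c, uint32_t *t0, uint32_t *t1,
                         uint32_t v0, uint32_t v1,
                         uint32_t u0, uint32_t u1)
{
    uint32_t r0 = llp(c, t0);  /* lines 1 and 2: linked loads */
    uint32_t r1 = ll(c, t1);
    if (r0 != v0) return 0;    /* line 3: (T0) must contain V0 */
    if (r1 != v1) return 0;    /* line 4: (T1) must contain V1 */
    scp(c, u0);                /* line 5: buffered store of U0 */
    return sc(c, u1);          /* line 6: commit both stores or none */
}
```

In this model, a `remote_store` to either linked word between the loads and the SC makes `sc` return 0, and neither U0 nor U1 reaches memory.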
We have worked out a detailed design for the implementation of these two instructions in a RISC processor such as the R4000, but the description is omitted for brevity.