Fortunately, MP-capable CPUs provide two mechanisms to deal with these problems: atomic operations and memory barriers.
An atomic operation is any operation that a CPU can perform such that all of its results are made visible to every CPU at the same time and that is safe from interference by other CPUs. For example, reading or writing an aligned word of memory is an atomic operation. Unfortunately, reads and writes by themselves are of limited use as atomic operations. The most useful atomic operations read a value, modify it, and write it back as a single atomic change. The details of FreeBSD's atomic operation API can be found in the atomic manual page [Atomic]. A more detailed explanation of how atomic operations work can be found in Section 8.3 of [Schimmel94].
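As a rough illustration of a read-modify-write operation, the sketch below increments a shared counter in one atomic step. It uses the portable C11 <stdatomic.h> interface as a stand-in and is not FreeBSD's kernel API, which is documented in [Atomic]; the names packets_seen, count_packet, packets_total, and counter_example are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical statistics counter shared by all CPUs. */
static atomic_uint packets_seen;

/*
 * Read the counter, add one, and write it back as a single atomic
 * change. Returns the value the counter held before the increment.
 */
static unsigned int
count_packet(void)
{
    return atomic_fetch_add(&packets_seen, 1);
}

/* A plain atomic read of the current total. */
static unsigned int
packets_total(void)
{
    return atomic_load(&packets_seen);
}

/* Usage sketch; the assertion only holds when no other thread is counting. */
static void
counter_example(void)
{
    unsigned int before = packets_total();

    count_packet();
    assert(packets_total() == before + 1);
}
```

Had the increment been written as a separate load and store, two CPUs could both read the same old value and one update would be lost.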
Atomic operations alone are not very useful. An atomic operation can only modify one variable. If one needs to read a variable and then make a decision based on its value, the value may change after the read, rendering the decision invalid. For this reason, atomic operations are best used as building blocks for higher-level synchronization primitives or for noncritical statistics.
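The read-then-decide problem can be sketched with a compare-and-swap loop, one common building block for higher-level primitives. The example assumes C11 <stdatomic.h> rather than the FreeBSD API, and try_acquire_slot and slot_example are hypothetical names. The decision made from the value read is only committed if the variable still holds that value; otherwise the decision is made again.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Claim one slot unless 'limit' slots are already in use.
 * Returns true if a slot was claimed.
 */
static bool
try_acquire_slot(atomic_uint *used, unsigned int limit)
{
    unsigned int seen = atomic_load(used);

    do {
        /* Decision based on the value we last observed. */
        if (seen >= limit)
            return false;
        /*
         * Commit only if *used still equals 'seen'. On failure the
         * call stores the current value into 'seen' and we decide again.
         */
    } while (!atomic_compare_exchange_weak(used, &seen, seen + 1));

    return true;
}

/* Usage sketch with a limit of one slot. */
static void
slot_example(void)
{
    atomic_uint used = 0;

    assert(try_acquire_slot(&used, 1));
    assert(!try_acquire_slot(&used, 1));
}
```

Even so, this protects only the single variable used; state spread over several variables still needs a lock built on top of such operations.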
Many modern CPUs include the ability to reorder instruction streams to increase performance [Intel00,Schimmel94,Mauro01]. On a UP machine, the CPU still operates correctly so long as dependencies are satisfied by either extra logic on the CPU or hints in the instruction stream. On an SMP machine, other CPUs may be operating under different dependencies, so the data they see may be incorrect. The solution is to use memory barriers to control the order in which memory is accessed. Barriers can be used to establish a common set of dependencies among all CPUs. An explanation of using store barriers in unlock operations can be found in Section 13.5 of [Schimmel94].
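To make the ordering problem concrete, the following sketch publishes a value through a flag and uses full memory barriers so that every CPU agrees on the order of the two accesses. It assumes the C11 atomic_thread_fence() interface rather than FreeBSD's API, and message, ready, publish, try_consume, and publish_example are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static int message;        /* ordinary data being handed over */
static atomic_bool ready;  /* set once 'message' is valid */

/* Writer: store the data, then raise the flag. */
static void
publish(int value)
{
    message = value;
    /* Barrier: the store to 'message' is ordered before the store to 'ready'. */
    atomic_thread_fence(memory_order_seq_cst);
    atomic_store_explicit(&ready, true, memory_order_relaxed);
}

/* Reader: returns true and fills *out once the flag has been seen. */
static bool
try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return false;
    /* Barrier: the load of 'ready' is ordered before the load of 'message'. */
    atomic_thread_fence(memory_order_seq_cst);
    *out = message;
    return true;
}

/* Usage sketch, run here on a single thread. */
static void
publish_example(void)
{
    int value = 0;

    publish(7);
    assert(try_consume(&value) && value == 7);
}
```

Without the two barriers, a reader on another CPU could observe the flag set and still read a stale value of message.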
In FreeBSD, memory barriers are provided via the atomic operations API. The API is modeled on the memory barriers provided on the IA64 CPU, which are described in Section 4.4.7 of [Intel00]. The API includes two types of barriers: acquire and release. An acquire barrier guarantees that the current atomic operation will complete before any following memory operations. This type of barrier is used when acquiring a lock to guarantee that the lock is acquired before any protected operations are performed. A release barrier guarantees that all preceding memory operations will be completed and their results visible before the current atomic operation completes. This type of barrier is used when releasing a lock. As a result, protected operations occur only while the lock is held. This allows a dependency to be established between a lock and the data it protects.
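A minimal sketch of how acquire and release barriers pair up in a lock is shown below. It is written against C11 <stdatomic.h>, whose memory_order_acquire and memory_order_release orderings are assumed here to play the role of the barriers described above; it is not the FreeBSD implementation, and struct spin, spin_lock, spin_unlock, and lock_example are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical spin lock: 0 means free, 1 means held. */
struct spin {
    atomic_uint locked;
};

static void
spin_lock(struct spin *s)
{
    unsigned int expected = 0;

    /*
     * Acquire barrier on success: the lock is taken before any of the
     * protected operations that follow are performed.
     */
    while (!atomic_compare_exchange_weak_explicit(&s->locked, &expected, 1,
        memory_order_acquire, memory_order_relaxed))
        expected = 0;  /* a failed attempt overwrote 'expected'; reset it */
}

static void
spin_unlock(struct spin *s)
{
    /*
     * Release barrier: all preceding protected operations complete and
     * become visible before the lock is seen as free.
     */
    atomic_store_explicit(&s->locked, 0, memory_order_release);
}

/* Usage sketch: 'counter' is only touched while the lock is held. */
static void
lock_example(void)
{
    struct spin lock = { 0 };
    int counter = 0;

    spin_lock(&lock);
    counter++;
    spin_unlock(&lock);
    assert(counter == 1);
}
```

A production lock would also back off or yield while spinning; that is omitted to keep the barrier placement visible.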