When a node is being throttled, up to three pieces of information are added to the trace for each I/O. First, the compute time since the last I/O is determined (using Approach 1 or 2) and a COMPUTE(<seconds>) call is added to the trace. Second, the I/O operation and its arguments are added. Third, signaling information is added, as per the I/O sampling period.
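As a minimal sketch of the three-part annotation described above, the following Python function builds the trace records for one I/O of a throttled node. The textual record format, the function name `trace_records`, and the argument names are assumptions made for illustration; only the COMPUTE and SIGNAL call names and their ordering come from the text.

```python
def trace_records(compute_seconds, op, args, signal_targets=()):
    """Build the trace entries for one I/O issued by a throttled node.

    The record layout (plain strings) is hypothetical; the ordering of
    compute time, I/O operation, then signaling follows the text.
    """
    records = []
    # 1. Compute time elapsed since the previous I/O (Approach 1 or 2).
    records.append(f"COMPUTE({compute_seconds:.6f})")
    # 2. The I/O operation and its arguments.
    records.append(f"{op}({', '.join(str(a) for a in args)})")
    # 3. Signaling information, present only when this I/O was sampled
    #    and at least one unthrottled node was found blocked.
    for node_id in signal_targets:
        records.append(f"SIGNAL({node_id})")
    return records
```

For example, `trace_records(0.5, "write", (3, 4096), [1])` would produce a COMPUTE entry, the write entry, and one SIGNAL entry for node 1, while an unsampled I/O would produce only the first two.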
The I/O sampling period determines how frequently the causality engine delays
I/O to check for dependencies (e.g., a period of 1 indicates that
every I/O is delayed) and therefore determines how
many data dependencies are discovered. In general, if the
sampling period is p, the causality engine will discover dependencies
within p operations of the true dependency. Because the
sampling period determines the rate of throttling, too large a sampling
period can also affect the compute-time calculation. In these cases,
Approach 2 (Section 3.2.2) is preferred.
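To make the role of the sampling period concrete, the sketch below selects which I/Os of a throttled node would be delayed for a given period. The choice to delay the first I/O of each group (0-based indices that are multiples of the period) is an assumption; the text fixes only the frequency, not which I/O within each group is sampled. The function names are hypothetical.

```python
def is_delayed(io_index, period):
    """Return True if the I/O at this 0-based index is delayed.

    Assumes the first I/O of every group of `period` I/Os is sampled.
    """
    if period < 1:
        raise ValueError("sampling period must be at least 1")
    return io_index % period == 0


def delayed_indices(num_ios, period):
    """List the indices of the I/Os that would be delayed."""
    return [i for i in range(num_ios) if is_delayed(i, period)]
```

With a period of 1 every I/O is delayed; with a period of 3 only every third I/O is, so a dependency on an unsampled I/O is attributed to a nearby sampled one.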
When an I/O is delayed, the causality engine withholds it until every unthrottled node
has either exited or blocked (a blocked node indicates that a dependency has been found).
A remote procedure
call is sent from the causality engine of the throttled node to a watchdog process on each
unthrottled node to make this determination; some nodes may have exited, others may be blocked.
If a node has exited, then it is not dependent on
the delayed I/O.
Otherwise, the throttled node adds a
SIGNAL(<unthrottled node id>) to its trace, and
the unthrottled node adds a corresponding WAIT(<throttled node id>) call to its trace.
After the throttled node has received a reply from
all of the watchdogs (one per unthrottled node), the I/O is issued.
Algorithm shows the pseudocode.
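The delay procedure described in this paragraph can also be sketched in Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: `query_watchdog` is a hypothetical stand-in for the remote procedure call, assumed to return only once the queried node has exited or blocked; traces are modeled as in-memory lists; and the SIGNAL records are placed after the I/O record, following the ordering given at the start of this section.

```python
from enum import Enum


class NodeState(Enum):
    BLOCKED = "blocked"
    EXITED = "exited"


def issue_delayed_io(throttled_id, unthrottled_ids, query_watchdog,
                     traces, io_record):
    """Delay one I/O, record any dependencies, then log the I/O.

    query_watchdog(node_id) is a hypothetical RPC stub that returns a
    NodeState once the node has either exited or blocked.
    """
    dependents = []
    for node_id in unthrottled_ids:
        state = query_watchdog(node_id)  # one reply per unthrottled node
        if state is NodeState.BLOCKED:
            # A blocked node depends on the delayed I/O.
            dependents.append(node_id)
        # An exited node is not dependent; nothing is recorded for it.

    # All watchdogs have replied, so the I/O can now be issued and logged.
    traces.setdefault(throttled_id, []).append(io_record)
    for node_id in dependents:
        traces[throttled_id].append(f"SIGNAL({node_id})")
        traces.setdefault(node_id, []).append(f"WAIT({throttled_id})")
    return dependents
```

In a real system the WAIT record would be written by the unthrottled node itself; the shared `traces` dictionary here only keeps the sketch self-contained.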
Of course, delaying I/O in this manner can produce indirect dependencies. For example, referring back to Figure 3, a sampling period of 1 will indicate that the open() call for node 1 is dependent on each I/O from node 0 (namely, the open(), the two write() calls, and the close()), and the traces will be annotated accordingly. However, the only signal needed is the one following the close() operation, and the redundant SIGNAL() and WAIT() calls can easily be removed in a preprocessing step before trace replay. The indirect dependencies that cannot be removed are those due to transitive relationships. For example, if node 2 is dependent on node 1, and node 1 on node 0, the causality engine will also detect the indirect dependency between nodes 0 and 2. Although these transitive dependencies add extra SIGNAL() and WAIT() calls to the traces, they never force a node to block unnecessarily.
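One possible form of the preprocessing step that removes redundant SIGNAL() and WAIT() calls is sketched below. The pruning rule is an assumption, not something the text specifies: the k-th SIGNAL(j) in node i's trace is paired with the k-th WAIT(i) in node j's trace, and within a run of identical, back-to-back WAIT records only the last is kept, together with its paired SIGNAL. Records are modeled as hypothetical tuples such as ("IO", "open"), ("SIGNAL", 1), and ("WAIT", 0).

```python
from collections import defaultdict


def prune_redundant_signals(traces):
    """Return new traces with redundant SIGNAL/WAIT pairs removed.

    traces maps node id -> list of record tuples. Pairing by ordinal
    and keeping only the last WAIT of a run are assumptions.
    """
    drop = set()  # (signaler, waiter, ordinal) pairs to remove
    for waiter, records in traces.items():
        seen = defaultdict(int)  # WAITs seen so far, per signaler
        for pos, rec in enumerate(records):
            if rec[0] != "WAIT":
                continue
            signaler = rec[1]
            ordinal = seen[signaler]
            seen[signaler] += 1
            follows = records[pos + 1] if pos + 1 < len(records) else None
            if follows == rec:  # an identical WAIT follows immediately
                drop.add((signaler, waiter, ordinal))

    pruned = {}
    for node, records in traces.items():
        seen = defaultdict(int)  # occurrences so far, per record
        kept = []
        for rec in records:
            if rec[0] == "WAIT":
                key = (rec[1], node, seen[rec])
            elif rec[0] == "SIGNAL":
                key = (node, rec[1], seen[rec])
            else:
                kept.append(rec)
                continue
            seen[rec] += 1
            if key not in drop:
                kept.append(rec)
        pruned[node] = kept
    return pruned
```

Applied to the Figure 3 scenario with a sampling period of 1, node 0's four SIGNAL(1) records reduce to the single one after close(), and node 1's four WAIT(0) records reduce to one before its open().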
The proper sampling period depends on the application and the storage system. Some workloads and storage systems may be more sensitive than others to changes in inter-node synchronization, so no single sampling period should be expected to work best for all. An iterative approach for determining the proper sampling period is presented in Section 5.