SSM heals itself in the presence of a memory fault: performance and throughput are unaffected, and SSM recovers from the fault.
Using ptrace(), we monitor a child process and change its memory
contents. In this benchmark, , data size is 8 KB, and we
increase t to 100 ms to account for the slowdown that ptrace
imposes on the bricks. Figure 10 shows the results of injecting a
bitflip into the area of physical memory where the stack pointer is
held. The fault is injected at time 14, and the brick crashes
immediately. The lightened section of Figure 10
(times 14 to 23) is the interval during which only five bricks are running.
At time 23, Pinpoint detects that the brick has stopped sending
heartbeats and restarts it; the system tolerates the fault and
successfully recovers from it.
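The inject-crash-detect-restart sequence described above can be sketched as a small tick-based simulation. This is a hedged illustration, not SSM's or Pinpoint's code: `flip_bit`, `HeartbeatMonitor`, `simulate`, the six-brick setup (inferred from five bricks remaining after one crashes), the integer ticks, and the 9-tick timeout are all hypothetical. The timeout is chosen only so that a crash at tick 14 is detected at tick 23, mirroring Figure 10. A real injector would instead read and write a word in the traced child with ptrace's PTRACE_PEEKDATA and PTRACE_POKEDATA requests; here the bitflip is shown on a plain integer.

```python
# Hypothetical sketch: names, ticks and the timeout are illustrative,
# not taken from SSM or Pinpoint.
from dataclasses import dataclass, field


def flip_bit(word: int, bit: int, width: int = 64) -> int:
    """Return `word` with bit `bit` inverted, i.e. a single-bit memory fault."""
    if not 0 <= bit < width:
        raise ValueError(f"bit must be in [0, {width})")
    return word ^ (1 << bit)


@dataclass
class HeartbeatMonitor:
    """Flags a brick as failed once it has been silent for more than `timeout` ticks."""
    timeout: int
    last_seen: dict[str, int] = field(default_factory=dict)

    def heartbeat(self, brick: str, now: int) -> None:
        # Record the most recent tick at which this brick reported in.
        self.last_seen[brick] = now

    def failed(self, now: int) -> list[str]:
        # A brick is considered failed if its last heartbeat is too old.
        return sorted(b for b, t in self.last_seen.items() if now - t > self.timeout)


def simulate(n_bricks: int = 6, crash_at: int = 14, timeout: int = 9, end: int = 30):
    """Tick-based run: one brick crashes at `crash_at`, the monitor restarts it."""
    monitor = HeartbeatMonitor(timeout)
    running = {f"brick{i}": True for i in range(n_bricks)}
    events = []
    for now in range(end):
        if now == crash_at:
            running["brick0"] = False  # injected fault: brick0 dies and goes silent
            events.append((now, "crash", "brick0"))
        for brick, up in running.items():
            if up:
                monitor.heartbeat(brick, now)
        for brick in monitor.failed(now):
            running[brick] = True  # stand-in for restarting the brick process
            monitor.heartbeat(brick, now)
            events.append((now, "restart", brick))
    return events


if __name__ == "__main__":
    print(simulate())
```

Running it prints `[(14, 'crash', 'brick0'), (23, 'restart', 'brick0')]`. In this sketch, waiting for a timeout rather than reacting to a single missed heartbeat trades detection latency for fewer false alarms, which is why the crashed brick stays down for several ticks before it is restarted.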