We implemented the setrange version first because it was easy to understand and fast for small transactions, since it minimizes the volume of redo records. However, we found a number of disadvantages when we ran bigger transactions.
Setrange is error prone. The programmer must call trans_setrange manually before modifying a database region. Forgetting to make this call could leave changes to the database unrecorded.
The overhead of trans_setrange is significant in big transactions. When a transaction makes many trans_setrange calls, the call overhead becomes sizeable. Note that this problem is less serious in systems like RVM [rvm] and Vista [vista], which implement setrange as a library.
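The first disadvantage can be illustrated with a toy model. The sketch below is not the system's actual interface: the Transaction class, its write and commit methods, and the redo-record layout are hypothetical, and only the name trans_setrange comes from the text. It assumes that redo records are built at commit time from the declared ranges only.

```python
class Transaction:
    """Toy model of setrange-style logging (hypothetical, for illustration)."""

    def __init__(self, db: bytearray):
        self.db = db
        self.ranges = []  # (offset, length) pairs declared via trans_setrange

    def trans_setrange(self, offset: int, length: int) -> None:
        # Declare that db[offset:offset+length] is about to be modified.
        self.ranges.append((offset, length))

    def write(self, offset: int, data: bytes) -> None:
        # Modify the database in place; nothing is logged here.
        self.db[offset:offset + len(data)] = data

    def commit(self) -> list:
        # Assumption: redo records cover only the declared ranges.
        return [(off, bytes(self.db[off:off + n])) for off, n in self.ranges]


db = bytearray(16)
txn = Transaction(db)

txn.trans_setrange(0, 2)   # declared, so this update is logged
txn.write(0, b"AB")
txn.write(8, b"CD")        # no trans_setrange call: update is not logged

redo_log = txn.commit()
```

In this model the second write reaches the database but never appears in redo_log, which is the failure mode described above. The ranges list also has to stay in memory until commit, so it grows with the number of trans_setrange calls.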
A related problem is that setrange does not work well with the steal buffer management policy, which is essential for running large transactions. Because ranges must be remembered in memory until a transaction commits, it is difficult to evict pages in the middle of a transaction.
Page grain logging can limit memory consumption regardless of the amount of modifications: all modifications are recorded in memory object pages, and pages can be purged. It is less error prone than setrange, because it does not require programmer cooperation. Another advantage is that it can amortize the write detection cost when many bytes are updated in a page, because detection is required only once. Thus, it is faster than setrange when many bytes are modified per page.
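The amortization argument can be made concrete with a small model. This is a minimal sketch under stated assumptions: the PageGrainTransaction class, the tiny PAGE_SIZE, and the record layout are hypothetical, and write detection is modeled as a dictionary check because the mechanism the actual system uses is not specified here.

```python
PAGE_SIZE = 8  # tiny page size so the example is easy to follow (assumption)


class PageGrainTransaction:
    """Toy model of page-grain logging (hypothetical, for illustration)."""

    def __init__(self, db: bytearray):
        self.db = db
        self.undo = {}        # page number -> page image before first write
        self.detections = 0   # how many times write detection fired

    def write(self, offset: int, data: bytes) -> None:
        first = offset // PAGE_SIZE
        last = (offset + len(data) - 1) // PAGE_SIZE
        for page in range(first, last + 1):
            if page not in self.undo:
                # First write to this page: detect it once, save the old image.
                self.detections += 1
                start = page * PAGE_SIZE
                self.undo[page] = bytes(self.db[start:start + PAGE_SIZE])
        self.db[offset:offset + len(data)] = data

    def commit(self) -> list:
        # One (page, undo image, redo image) record per modified page.
        records = []
        for page, old in sorted(self.undo.items()):
            start = page * PAGE_SIZE
            records.append((page, old, bytes(self.db[start:start + PAGE_SIZE])))
        return records


db = bytearray(32)
txn = PageGrainTransaction(db)
for i in range(PAGE_SIZE):
    txn.write(i, b"x")       # eight one-byte writes, all on page 0

records = txn.commit()
```

Eight writes to the same page trigger detection only once, and the caller never declares a range. A setrange-style interface would need one call per write in the same situation.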
The problem with this approach is that the log can grow quite large. For each modified page, the system generates log records totaling twice the page size (one undo record and one redo record). This is wasteful for two reasons. First, it generates undo and redo records for the whole page even if only a single byte on the page is modified. Second, page grain logging blindly generates undo records even for transactions small enough not to require paging, in which case undo records are not needed [gray].
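The following arithmetic sketch shows how large the overhead becomes in the single-byte case. The 8192-byte page size is an assumption chosen only for illustration, the function name is hypothetical, and record headers are ignored.

```python
def page_grain_log_bytes(modified_pages: int, page_size: int) -> int:
    """Log payload for page-grain logging: undo image + redo image per page."""
    return modified_pages * 2 * page_size


PAGE_SIZE = 8192     # assumed page size in bytes, for illustration only
modified_bytes = 1   # the transaction changes a single byte on one page

log_bytes = page_grain_log_bytes(modified_pages=1, page_size=PAGE_SIZE)
amplification = log_bytes // modified_bytes  # log bytes per modified byte
```

Under these assumptions a one-byte update produces 16384 bytes of log payload, half of which is an undo image that a transaction small enough not to require paging would never use.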
Page diffing combines the advantages of setrange and page grain logging. It shares all the advantages of page grain logging. In addition, it can minimize log size by computing page diffs.
However, page diffing introduces overhead that did not exist in the earlier versions. One source of overhead is the diffing itself, which must walk over two pages and write out the differences to another memory region. Not only is this procedure slow, but it also slows other code by polluting the CPU cache. Another source of overhead is the memory pressure imposed by shadow pages. In the worst case, in which all accessed pages are modified, the effective memory size is halved, so the system pages more often.
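The diffing step can be sketched as a byte-by-byte comparison of a shadow page against the current page. This is a minimal illustration rather than the implementation described here: the page_diff function and its (offset, new_bytes) output format are assumptions, and a real implementation would likely compare word by word and may merge nearby runs.

```python
def page_diff(shadow: bytes, current: bytes) -> list:
    """Return (offset, new_bytes) runs where current differs from shadow."""
    if len(shadow) != len(current):
        raise ValueError("pages must be the same size")
    diffs = []
    run_start = None
    for i in range(len(current)):
        if shadow[i] != current[i]:
            if run_start is None:
                run_start = i          # a new run of modified bytes begins
        elif run_start is not None:
            diffs.append((run_start, current[run_start:i]))
            run_start = None
    if run_start is not None:          # run extends to the end of the page
        diffs.append((run_start, current[run_start:]))
    return diffs


shadow = bytes(16)                     # page contents before the transaction
page = bytearray(shadow)
page[3:5] = b"AB"
page[15] = 0x7F
redo = page_diff(shadow, bytes(page))
```

Only three of the sixteen bytes end up in the redo records, but the loop still reads every byte of both pages, which is where the diffing cost and the cache pollution come from. The shadow copy itself is the source of the extra memory pressure.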