Implementing PAVM based on the above approaches is not easy on
modern operating systems, where
virtual memory (VM) is extensively used.
Under the VM abstraction, all processes and most of the OS need only be aware
of their own virtual address spaces and can remain entirely oblivious to the
actual physical pages used. Effectively, the VM decouples page allocation
requests from the underlying physical page allocator, hiding much of the
complexity of memory management from the higher layers. This decoupling works
in the other direction as well: the physical page allocator does not
distinguish which process a page request originates from, and simply returns
an arbitrary physical page, treating all memory uniformly.
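To make this decoupling concrete, the following is a minimal sketch in C (all names hypothetical, not taken from any particular kernel) of the process-oblivious path: the VM's demand-paging code asks for a page, and the physical allocator returns whichever free frame comes next, with no notion of which process the request serves.

```c
#include <stddef.h>

#define NR_FRAMES 1024

struct page { unsigned long pfn; };        /* descriptor of one physical frame */

static struct page frames[NR_FRAMES];
static unsigned long next_free;            /* trivial bump allocator, for illustration only */

/* Generic physical allocator: treats all memory uniformly and returns
 * whatever frame happens to be free next. */
static struct page *alloc_physical_page(void)
{
    if (next_free >= NR_FRAMES)
        return NULL;                       /* out of physical memory */
    frames[next_free].pfn = next_free;
    return &frames[next_free++];
}

/* VM demand-paging path: note that the identity of the faulting process
 * never reaches the allocator. */
struct page *vm_demand_page(void)
{
    return alloc_physical_page();
}
```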
When performing power management on memory nodes, however, we cannot treat
all memory as equivalent: accessing a node in a low-power state incurs
additional latency and overhead, and the physical addresses of allocated
pages therefore critically affect each process's energy footprint.
Therefore, we need to eliminate this decoupling and make the page allocator
conscious of the process requesting pages, so it can nonuniformly allocate
pages based on the preferred node set $\alpha_i$ to minimize $|\alpha_i|$,
and hence the energy footprint, for each process $i$.
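A minimal sketch of the per-process state this requires, assuming the preferred-node-set notation $\alpha_i$ and using hypothetical names and an arbitrary node count: by consulting $\alpha_i$, the allocator can pick a node that leaves the process's footprint unchanged.

```c
#include <stdbool.h>

#define NR_NODES 8                 /* illustrative node count */

/* Per-process bookkeeping: alpha_i records which memory nodes already hold
 * pages of process i, i.e., the process's current energy footprint. */
struct process {
    int  pid;
    bool alpha[NR_NODES];
};

/* Pick the node a new page for process p should come from: any node already
 * in alpha_i keeps |alpha_i| (and thus the energy footprint) unchanged.
 * Returns -1 if alpha_i is empty and a new node must be activated. */
int preferred_node(const struct process *p)
{
    for (int n = 0; n < NR_NODES; n++)
        if (p->alpha[n])
            return n;
    return -1;
}
```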
This unequal treatment of sections of memory, due to differing access latencies and overheads, is not limited to power-managed memory. Rather, it is a distinguishing characteristic of Non-Uniform Memory Access (NUMA) architectures, which distinguish between low-latency local memory and high-latency remote memory. In a traditional NUMA system, the notion of a node is more general than what we defined previously and can encompass a set of processors, memory pools, and I/O buses. The physical location of the pages used by a process is critical to its performance, since intra- and inter-node memory access times can differ by a few orders of magnitude. Therefore, a strong emphasis has been placed on allocating and keeping a process's working set on its local node.
In this work, we treat a node simply as a section of memory with a single
common access time, whose power mode can be set independently of other
nodes. Under this definition, we can employ a NUMA management layer to
simplify the nonuniform treatment of physical memory.
With a NUMA layer in place below the VM system, physical
memory is partitioned into multiple nodes.
Each node has a separate physical page allocator, to which page allocation
requests are redirected by the NUMA layer.
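The following sketch (hypothetical names and layout, not the actual kernel data structures) illustrates this organization: each node covers a range of physical frames and owns its own free list, and the NUMA layer simply redirects a request to the chosen node's allocator.

```c
#include <stddef.h>

#define NR_NODES 8                  /* illustrative node count */

struct page {
    unsigned long pfn;              /* physical frame number */
    struct page  *next;             /* link on the owning node's free list */
};

struct node {
    unsigned long start_pfn;        /* first frame covered by this node */
    unsigned long end_pfn;          /* last frame covered by this node */
    struct page  *free_list;        /* this node's private page allocator */
};

static struct node nodes[NR_NODES];

/* Per-node allocator: pop a frame from node n's own free list. */
static struct page *node_alloc_page(int n)
{
    struct page *pg = nodes[n].free_list;
    if (pg)
        nodes[n].free_list = pg->next;
    return pg;                      /* NULL if node n has no free pages */
}

/* NUMA-layer entry point: redirect the request to the chosen node. */
struct page *numa_alloc_page(int n)
{
    return node_alloc_page(n);
}
```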
The VM is modified such that, when it requests a page on behalf of process
$i$, it passes a hint (e.g., $\alpha_i$) to the NUMA layer indicating the
preferred node(s) from which the physical page should be allocated. If this
optional hint is given, the NUMA layer simply invokes the physical page
allocator that corresponds to the hinted node. If the allocation fails,
$\alpha_i$ must be expanded as discussed previously.
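Putting these pieces together, a sketch of the hinted allocation path might look as follows (again with hypothetical names, building on the per-node allocator sketched above): the VM passes $\alpha_i$ down as the hint, the NUMA layer tries only the hinted nodes, and if they are all exhausted the preferred set is expanded one node at a time.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_NODES 8                               /* illustrative node count */

struct page;                                     /* physical frame descriptor */
struct page *numa_alloc_page(int node);          /* per-node entry point sketched above */

/* The hint the VM passes down: the requesting process's preferred node set. */
struct node_hint {
    bool node[NR_NODES];                         /* alpha_i */
};

/* NUMA layer: honor the optional hint by trying only the hinted nodes. */
static struct page *numa_alloc_hinted(const struct node_hint *hint)
{
    for (int n = 0; n < NR_NODES; n++)
        if (hint->node[n]) {
            struct page *pg = numa_alloc_page(n);
            if (pg)
                return pg;
        }
    return NULL;                                 /* all preferred nodes exhausted */
}

/* VM side: request a page for the process owning alpha; if every preferred
 * node is exhausted, expand alpha_i one node at a time and retry. */
struct page *vm_alloc_page(struct node_hint *alpha)
{
    struct page *pg = numa_alloc_hinted(alpha);
    for (int n = 0; pg == NULL && n < NR_NODES; n++) {
        if (alpha->node[n])
            continue;                            /* already tried via the hint */
        pg = numa_alloc_page(n);
        if (pg)
            alpha->node[n] = true;               /* energy footprint grows by one node */
    }
    return pg;
}
```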
By using a NUMA layer, we can implement PAVM with preferential node
allocation without having to re-implement complex low-level
physical page allocators.