Optimistic DAFS
In DAFS direct read and write operations, the client always uses
an RPC to communicate the file access request along with memory references
to client buffers that will be the source or target of a server-issued
RDMA transfer. The cost associated with always having to do a file access
RPC is manifested as unnecessarily high latency for small accesses from
server memory. A way to reduce this latency is to allow clients to access
the server file and VM cache directly, rather than going through the
server vnode interface via a file access RPC on each access.
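The baseline request described above can be pictured as a message carrying client buffer references for a server-issued RDMA transfer. The following C sketch is illustrative only: the structure layout and all field names are assumptions, not the actual DAFS wire format.

```c
/* Illustrative sketch (NOT the DAFS wire format): a direct-read RPC
 * carries references to client buffers that the server will fill with
 * a server-issued RDMA write. All field names are assumptions. */
#include <stdint.h>
#include <stddef.h>

struct client_buf_ref {
    uint64_t addr;     /* client virtual address registered for RDMA */
    uint32_t length;   /* buffer length in bytes */
    uint32_t rkey;     /* remote access key for the client's memory */
};

struct dafs_direct_read_req {
    uint64_t file_handle;            /* open file this access refers to */
    uint64_t offset;                 /* file offset to read from */
    uint32_t nbufs;                  /* number of target buffers */
    struct client_buf_ref bufs[4];   /* scatter list of client buffers */
};

/* Total bytes the server should transfer for one request. */
static uint32_t req_total_len(const struct dafs_direct_read_req *r)
{
    uint32_t total = 0;
    for (uint32_t i = 0; i < r->nbufs; i++)
        total += r->bufs[i].length;
    return total;
}
```

The point of the sketch is that every access, however small, pays for one such RPC before any data moves, which is the latency cost the optimistic scheme targets.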
Optimistic DAFS [14]
improves on the existing DAFS specification by reducing the number of file
access RPC operations needed to initiate file I/O and replacing them with
memory accesses using client-issued RDMA. Memory references to server buffers
are given out to clients or other servers that maintain cache directories,
and they are allowed to use those references to directly issue RDMA operations
with server memory. To build cache directories, the server returns to the
client a description of buffer locations in its VM cache (we assume a unified
VM and file cache, as in FreeBSD). These buffer descriptions are returned
either as a response to specific queries (e.g. the client asks: ``give me
the locations of all your resident pages associated with file foo''),
or piggybacked in the response to a read or write request (e.g. the server
responds: ``here's the data you asked for, and by the way, here are the
memory locations that you can use directly in the future'').
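A client-side cache directory built from these buffer descriptions might look like the following C sketch. The entry layout, hash scheme, and fixed-size table are illustrative assumptions, not part of the DAFS specification.

```c
/* Sketch of a client-side cache directory, assuming the server describes
 * resident buffers as (file, offset) -> remote memory reference entries.
 * Layout and the direct-mapped table are illustrative assumptions. */
#include <stdint.h>
#include <stddef.h>

struct remote_buf_desc {
    uint64_t file_id;     /* file the cached server page belongs to */
    uint64_t offset;      /* page-aligned file offset */
    uint64_t server_addr; /* buffer address in the server VM cache */
    uint32_t rkey;        /* key authorizing client-issued RDMA */
    int      valid;       /* entry may go stale (e.g. server pageout) */
};

#define DIR_SIZE 64
static struct remote_buf_desc dir[DIR_SIZE];

/* Record a description returned by a query or piggybacked on a reply. */
static void dir_insert(const struct remote_buf_desc *d)
{
    unsigned slot = (unsigned)((d->file_id ^ d->offset) % DIR_SIZE);
    dir[slot] = *d;
    dir[slot].valid = 1;
}

/* Look up a remote reference; NULL means fall back to file access RPC. */
static struct remote_buf_desc *dir_lookup(uint64_t file_id, uint64_t offset)
{
    unsigned slot = (unsigned)((file_id ^ offset) % DIR_SIZE);
    struct remote_buf_desc *d = &dir[slot];
    if (d->valid && d->file_id == file_id && d->offset == offset)
        return d;
    return NULL;
}
```

A lookup hit lets the client issue RDMA directly against server memory; a miss (or stale entry, as described next) sends the access down the RPC path.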
In Optimistic DAFS, clients use remote memory references found in their
cache directories, but accesses succeed only when directory entries have
not become stale, for example as a result of actions of the server pageout
daemon. There is no explicit notification to invalidate remote memory references
previously given out on the network. Instead, remote memory access exceptions
[14]
thrown by the target NIC and caught by the initiator NIC can be used to
discover invalid references and switch to the slower access path using
file access RPC.
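This exception-driven fallback can be sketched in C as follows. Here rdma_read and rpc_read are hypothetical stand-ins for the real transport calls, and a generation-number mismatch simulates the remote memory access exception raised by the target NIC when a reference has gone stale.

```c
/* Sketch of the optimistic access path: try client-issued RDMA using a
 * cached remote reference; on a remote memory access exception (stale
 * entry, e.g. after server pageout), fall back to file access RPC.
 * rdma_read()/rpc_read() are hypothetical stand-ins, not a real API. */
#include <stdint.h>
#include <string.h>

enum access_path { VIA_RDMA, VIA_RPC };

/* Simulated target-side check: a generation mismatch models the server
 * buffer having been reclaimed since the reference was handed out. */
static int rdma_read(uint64_t server_gen, uint64_t ref_gen,
                     char *dst, const char *src, size_t len)
{
    if (ref_gen != server_gen)
        return -1;             /* NIC raises a remote access exception */
    memcpy(dst, src, len);     /* direct access to server memory */
    return 0;
}

static void rpc_read(char *dst, const char *src, size_t len)
{
    memcpy(dst, src, len);     /* slow path through the server vnode layer */
}

static enum access_path optimistic_read(uint64_t server_gen, uint64_t ref_gen,
                                        char *dst, const char *src, size_t len)
{
    if (rdma_read(server_gen, ref_gen, dst, src, len) == 0)
        return VIA_RDMA;       /* fast path: no file access RPC needed */
    rpc_read(dst, src, len);   /* exception caught: switch to RPC path */
    return VIA_RPC;
}
```

Either way the access completes; staleness costs only the latency of the retried RPC, which is why no explicit invalidation protocol is needed.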
Maintaining the NIC memory management unit in the case where RDMA can
be remotely initiated by a client at any time is tricky and needs special
NIC and OS support. Section 4.3
describes the design of our forthcoming implementation that views the NIC
as another processor in an asymmetric multiprocessor system and is based
on the following design choices:
-
To ensure that exported pages have valid NIC mappings for as long as
they are resident in physical memory, and that these mappings are invalidated
when pages are swapped to disk, paging activity adds or invalidates NIC
mappings on the fly.
-
Because it can initiate DMA to and from main memory, the NIC (or the driver,
in the absence of NIC support) must synchronize and integrate with the
VM system. To do so, it must be able to manipulate the lock, reference,
and dirty bits of vm_pages in main memory.
-
To manage NIC mappings in servers with large physical memories,
the NIC address translation table is viewed as a cache of translations
(i.e. a TLB). Translation misses are handled by the NIC (or the driver,
in the absence of NIC support) and require access to page tables in main
memory.
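The third design choice can be sketched in C as a small direct-mapped translation cache backed by a page table in main memory, with an invalidation hook for the first design choice's pageout path. Sizes, layout, and function names are illustrative assumptions, not the actual NIC or FreeBSD interfaces.

```c
/* Sketch of the NIC address translation table treated as a TLB: a
 * direct-mapped cache of virtual-page -> physical-frame translations,
 * with misses serviced by walking a page table in main memory (here a
 * plain array). All sizes and names are illustrative assumptions. */
#include <stdint.h>

#define TLB_SLOTS 8
#define NPAGES    64

struct tlb_entry { uint64_t vpage; uint64_t pframe; int valid; };

static struct tlb_entry nic_tlb[TLB_SLOTS];
static uint64_t page_table[NPAGES];   /* vpage -> pframe; 0 = not resident */
static unsigned tlb_misses;

/* Translate a virtual page number; 0 means the page is not resident and
 * the access must take the RPC path until the page is faulted in. */
static uint64_t nic_translate(uint64_t vpage)
{
    struct tlb_entry *e = &nic_tlb[vpage % TLB_SLOTS];
    if (e->valid && e->vpage == vpage)
        return e->pframe;             /* TLB hit */
    tlb_misses++;                     /* miss: NIC (or driver) walks the
                                       * page table in main memory */
    uint64_t pframe = (vpage < NPAGES) ? page_table[vpage] : 0;
    if (pframe) {
        e->vpage = vpage;
        e->pframe = pframe;
        e->valid = 1;
    }
    return pframe;
}

/* Pageout hook: invalidate the mapping when a page is swapped to disk. */
static void nic_invalidate(uint64_t vpage)
{
    struct tlb_entry *e = &nic_tlb[vpage % TLB_SLOTS];
    if (e->valid && e->vpage == vpage)
        e->valid = 0;
    if (vpage < NPAGES)
        page_table[vpage] = 0;
}
```

Treating the table as a TLB keeps the on-NIC state bounded regardless of how much server memory is exported, at the cost of occasional misses serviced from main memory.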
Previous research [21,24]
has looked at memory management of network interfaces but has not focused
on kernel modifications or virtual memory system support. In Section 4.3
we address such support for the FreeBSD VM system. Finally, Optimistic
DAFS requires maintenance of a directory on file clients (in user-space)
and on other servers (in the kernel).
Kostas Magoutis 2001-12-03