There has been a considerable amount of work on the enhancement of system performance through the addition of cryptographic hardware [2]. This early work was characterized by its focus on the hardware accelerator rather than its implications for overall system performance. [24] began examining cryptographic subsystem issues in the context of securing high-speed networks, and observed that the bus-attached cards would be limited by bus-sharing with a network adapter on systems with a single I/O bus. A second issue pointed out in that time frame [20] was the cost of system calls, and a third [21,23,7,11] the cost of buffer copying. These issues are still with us, and continue to require aggressive design to reduce their impacts.
[25] describes an API to cryptographic functions, the main purpose of which is to separate cryptographic libraries from applications, thus allowing independent development. Our service API is similar at a high level, although several differences were dictated by the need to support actual hardware accelerators and allow it to be used efficiently by protocols such as IPsec and SSL, as we discuss in Section 3. Other work includes the Microsoft CryptoAPI [17], GSS-API [16] and IDUP-GSS-API [1], PKCS #11 [14], SSAPI [26], and the CDSA [19]. These are primarily intended for use by applications that also require authentication, authorization, key management and other higher level security services. Our work focuses on low-level cryptographic operations, providing a simple abstraction layer that does not significantly impact performance, compared to a device-specific approach.
[10] describes an open-source cryptographic coprocessor, focusing on protecting keys and other sensitive information from tampering by unauthorized applications. The author extends the cryptlib library to communicate with the co-processor. While he discusses several options for hardware acceleration and identifies some potential performance bottlenecks, it is mostly a qualitative analysis. That work is extended in [9], which presents a comprehensive cryptographic security architecture, again focusing primarily on preserving the confidentiality of users' (and applications') cryptographic keys. We are interested in a much simpler problem: how to accelerate cryptographic operations in a general-purpose operating system using hardware available in the market and with minimal modifications to the kernel, libraries, and applications.
NetBSD uses the dmover facility, which provides an interface to hardware-assisted data movers. This can be used to copy data from one location in memory to another, clear a region of memory, fill a region of memory with a pattern, and perform simple operations on multiple regions of memory, such as XOR, without intervention by the CPU.