Symmetric-algorithm (e.g., DES, AES, MD5, compression algorithms, etc.) operations are built around the concept of the session, since such algorithms are typically used for bulk-data processing, and we wanted to take advantage of the session-caching features available in many accelerators. Asymmetric algorithms are implemented as individual operations: no session caching is performed. Session creation and teardown are synchronous operations.
The producer API allows a driver to register with the OCF the various algorithms it supports and any other device characteristics ( e.g., support for algorithm chaining, built-in random number generation, etc.). The device driver also registers four callback functions that the OCF uses to initialize, use, and teardown symmetric-algorithm sessions, and to issue asymmetric-algorithm requests. The drivers can also selectively de-register algorithms, or remove themselves from the OCF (e.g., a PCMCIA card that is ejected). Any sessions using the defunct driver (or algorithm) are migrated to other cards on-demand (i.e., as the next request for that session arrives). Registration and de-registration can occur at any time; typical device drivers do so at system initialization time. Drivers call the crypto_done() and crypto_kdone() routines to notify the OCF of completed symmetric and asymmetric requests, respectively. A brief description of the API is given in Appendix A.
In addition to any hardware drivers, a software-crypto pseudo-driver registers a number of symmetric-key algorithms when the system boots. The pseudo-driver acts as a last-resort provider of crypto services; any suitable hardware accelerator will be treated preferably. However, the kernel does not implement asymmetric algorithms in software, for performance reasons; we shall see in Section 4.2 how we handle these. Using a generic API for crypto drivers allows us to easily add support for new cards. We briefly discuss these drivers in Section 3.1.
To use the OCF, consumers first create a session with the OCF using crypto_newsession(), specifying the algorithm(s) to use, mode of operation (e.g., CBC, HMAC, etc.), cryptographic keys, initialization vectors, and number of rounds (for variable-round algorithms). The OCF supports algorithm-chaining, i.e., performing encryption and integrity-protection in one operation. Such combined operations are used by practically all data-transfer security protocols. At session-creation time, the OCF determines which card to use based on its capabilities and creates a session by calling its newsession method, provided at device-registration time. When the session is not needed, crypto_freesession() frees any allocated resources.
For the actual encryption/decryption, consumers use crypto_dispatch(). The arguments to this include the data to be processed, a copy of the parameters used to initialize the session, consumer-provided opaque data, and a callback function. The data can be provided in the form of mbufs (linked lists of data buffers, used by the network subsystem to store packets) or as a collection of potentially non-contiguous memory blocks, called uio. The case of a single contiguous data buffer is handled as a uio. Although mbufs are also a special case of uio, we added special support to allow for some processing optimizations when using software cryptography. Furthermore, the issuer of a request can specify whether encryption should be done in-place, or the encrypted data must be returned on a separate buffer. Various offsets indicate where to start and end the encryption, where to place the message authentication code (MAC), and where to find the initialization vector (if already present on the buffer) or where to write it on the output buffer.
The request is queued and crypto_dispatch() immediately returns to the consumer. The crypto kernel thread is periodically invoked by the scheduler and dispatches all pending requests to the appropriate producers. It also handles all completed requests, by calling the specified callback functions. It then returns to sleep, waiting for more requests. As a result of the OpenBSD kernel architecture (common in most non-SMP kernels), the thread is not preemptable by user processes, although hardware interrupts are still handled. Currently, the thread must operate at a high priority to avoid synchronization problems. When using the software pseudo-driver, this can cause significant latency in application scheduling and in low-priority kernel operations, although the same problem manifested before the migration to OCF, when encryption was done in-band with IPsec packet processing.
Once the request is processed, the crypto thread calls the consumer-supplied callback routine. If an error has occurred, the callback is responsible for any corrective action. Session migration is implemented by re-creating the session using the initial parameters to crypto_newsession(), which accompany all requests as we already mentioned. The error EAGAIN is indicated to the callback routine, which re-issues the request after recording the new session number to be used so that subsequent requests are correctly routed. Including the initialization data in each request also allows us to easily integrate cards that do not support the concept of session: the driver simply passes all necessary information (data, algorithm descriptions, and keys) to the card with each request. The opaque data are simply passed back to the consumer unmodified by the OCF; they are used to maintain any additional information for the consumer that is relevant to the request. We shall see an example in Section 4.1.
Asymmetric operations are handled similarly, albeit without support for the concept of session. The parameters in this case include an array of parameters, containing the algorithm-specific big-integers.
When multiple producers implement the same algorithms, the OCF can load-balance sessions across them. This is currently implemented by simply keeping track of the number of sessions active on each producer. At session setup, the OCF picks the producer with the smallest number of active sessions. The software pseudo-driver is currently never used in load-balancing. We evaluate the effectiveness of this simple scheme in Section 5.4. We discuss possible future improvements in Section 6.4.