A key performance attribute of a protocol is its scalability with respect to the number of clients that can be supported by the server. If the network paths or I/O channels are not the bottleneck, the scalability is determined by the server CPU utilization for a particular benchmark.
Table 9 depicts the percentile of the server CPU utilization
reported every 2 seconds by vmstat for the various benchmarks. The table shows that
the server utilization for iSCSI is lower than that of NFS. The server utilization
is governed by the processing path and the amount of processing for each request.
The lower utilization of iSCSI can be attributed to the shorter processing path seen
by iSCSI requests. In the case of iSCSI, a block read or write request at the server
traverses the network layer, the SCSI server layer, and the low-level block device
driver. In the case of NFS, an RPC call received by the server traverses the
network layer, the NFS server layer, the VFS layer, the local file system, the block layer,
and the low-level block device driver. Our measurements indicate
that the server processing path for NFS requests is twice as long as that for iSCSI requests.
This is confirmed by the server CPU utilization measurements for the data-intensive TPC-C
and TPC-H benchmarks. In these benchmarks, the server CPU utilization
for NFS is twice that of iSCSI.
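The two server-side paths described above can be written down as ordered lists of layers. The following is a minimal sketch, not the measurement method used here: the layer names are shorthand for the layers named in the text, and counting layers is only a crude proxy for processing cost, since it assumes every layer costs the same. It happens to reproduce the factor of two, but the factor reported above comes from measurements, not from this count.

```python
# Layers a request crosses at the server, in the order given in the text.
# The identifiers are illustrative shorthand, not kernel symbol names.
ISCSI_PATH = ["network", "scsi_server", "block_driver"]
NFS_PATH = [
    "network",
    "nfs_server",
    "vfs",
    "local_fs",
    "block_layer",
    "block_driver",
]


def path_length(path):
    """Return the number of software layers in a server processing path."""
    return len(path)


# Assumption: each layer contributes equal cost, so length stands in for work.
path_ratio = path_length(NFS_PATH) / path_length(ISCSI_PATH)
print(f"iSCSI layers: {path_length(ISCSI_PATH)}")
print(f"NFS layers:   {path_length(NFS_PATH)}")
print(f"NFS/iSCSI path ratio: {path_ratio}")
```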
The difference is exacerbated for meta-data intensive workloads. An NFS request that triggers a meta-data lookup at the server can greatly lengthen the processing path, because meta-data reads require multiple traversals of the VFS layer, the file system, the block layer, and the block device driver. The number of traversals depends on the degree of meta-data caching in the NFS server. The longer processing path explains the large disparity in the observed CPU utilizations for PostMark. The PostMark benchmark tends to defeat the meta-data caching on the NFS server because of the random nature of transaction selection. This causes the server CPU utilization to increase significantly, since multiple block reads may be needed to satisfy a single NFS data read.
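To illustrate how meta-data cache misses lengthen the NFS path, the sketch below adds one extra pass through the VFS layer, the file system, the block layer, and the block device driver for every meta-data read that misses the server cache. This is a simplified model under stated assumptions: the function name and the one-pass-per-miss rule are hypothetical, and real servers may batch or cache these reads differently.

```python
# Layers crossed by an NFS request whose meta-data is fully cached.
BASE_NFS_PATH = [
    "network",
    "nfs_server",
    "vfs",
    "local_fs",
    "block_layer",
    "block_driver",
]

# Layers re-traversed for each meta-data read that misses the server cache.
META_READ_PATH = ["vfs", "local_fs", "block_layer", "block_driver"]


def nfs_layer_crossings(metadata_misses):
    """Estimate layer crossings for one NFS request (hypothetical model).

    Assumes each meta-data cache miss costs exactly one extra pass
    through META_READ_PATH.
    """
    if metadata_misses < 0:
        raise ValueError("metadata_misses must be non-negative")
    return len(BASE_NFS_PATH) + metadata_misses * len(META_READ_PATH)


for misses in (0, 1, 3):
    print(misses, nfs_layer_crossings(misses))
```

Under this model the path grows linearly with the number of misses, which is consistent with the observation that a workload defeating the meta-data cache inflates server CPU utilization.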
Although iSCSI shows lower server CPU utilization, it is also worthwhile to investigate the effect of the two protocols on client CPU utilization. If one protocol imposes lower client CPU utilization than the other, then that protocol will be able to scale to a larger number of servers per client.
Table 10 depicts the percentile of the client CPU utilization
reported every 2 seconds by vmstat for the various benchmarks.
For the data-intensive TPC-C and TPC-H benchmarks, the clients are CPU-saturated
for both the NFS and iSCSI protocols and thus there is no difference in
the client CPU utilizations for these macro-benchmarks.
However, for the meta-data intensive PostMark benchmark, the NFS client
CPU utilization is an order of magnitude lower than that of iSCSI. This
is not surprising because the bulk of the meta-data processing is done
at the server in the case of NFS while the reverse is true in the case of
the iSCSI protocol.
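The utilization figures in Tables 9 and 10 are percentiles over periodic vmstat samples. The sketch below shows one way such a summary could be computed from the output of `vmstat 2`; it is an assumption-laden illustration, not the script used for the tables. Utilization is taken as 100 minus the `id` (idle) column, the percentile uses the nearest-rank definition, the percentile rank is left as a parameter, and the sample lines are synthetic values made up for the example, not measured data.

```python
import math

# Synthetic vmstat-style output for illustration only (not measured data).
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 100000  20000 300000    0    0     5    10  100  200 10  5 85  0  0
 2  0      0 100000  20000 300000    0    0     5    10  100  200 30 10 60  0  0
 3  0      0 100000  20000 300000    0    0     5    10  100  200 40 20 40  0  0
 1  0      0 100000  20000 300000    0    0     5    10  100  200  5  5 90  0  0
"""


def cpu_utilization(vmstat_lines):
    """Return 100 - idle for every sample line, located via the 'id' column."""
    header = None
    samples = []
    for line in vmstat_lines:
        fields = line.split()
        if "id" in fields:  # column-name header; may repeat in long runs
            header = fields
            continue
        if header is None or len(fields) != len(header):
            continue  # banner line or malformed row
        if not all(field.isdigit() for field in fields):
            continue
        # Assumption: everything that is not idle counts as utilization.
        samples.append(100 - int(fields[header.index("id")]))
    return samples


def percentile(values, rank):
    """Nearest-rank percentile of values for 0 < rank <= 100."""
    if not values:
        raise ValueError("no samples")
    if not 0 < rank <= 100:
        raise ValueError("rank must be in (0, 100]")
    ordered = sorted(values)
    position = max(1, math.ceil(rank / 100 * len(ordered)))
    return ordered[position - 1]


utilization = cpu_utilization(SAMPLE.splitlines())
print(utilization)
print(percentile(utilization, 50))
```

Locating the idle column by name rather than by position keeps the parser tolerant of vmstat versions that print a different set of CPU columns.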