To understand how load affects response time, we measure latencies at
various request rates. Each server's maximum capacity is determined by
having all clients issue requests under an infinite-demand (saturation)
model; this capacity is defined as a load level of 1, and request rates
are then reported as fractions of each server's saturation capacity.
This normalization simplifies comparison across servers, though it may
bias results in favor of servers with low capacity.
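As a concrete illustration, the following sketch shows how per-server
target rates might be derived from a measured saturation throughput;
the server names and throughput numbers are placeholders, not values
from our experiments.

# Hypothetical sketch: derive absolute target request rates from the
# throughput observed under the infinite-demand (saturation) run, so that
# load 1.0 corresponds to each server's own saturation capacity.
saturation_rps = {"server_a": 12000.0, "server_b": 8500.0}  # requests/second at load 1.0
load_fractions = [0.1, 0.3, 0.5, 0.7, 0.9, 1.0]

def target_rates(server):
    """Map each load fraction to an absolute request rate for one server."""
    capacity = saturation_rps[server]
    return {f: f * capacity for f in load_fractions}

# e.g. target_rates("server_a")[0.5] == 6000.0 requests/second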
Response time is measured as the wall-clock time between the client
initiating the HTTP request and receiving the last byte of the
response.
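One way to capture this per-request wall-clock time on the client side
is sketched below using Python's standard urllib module; the URL is a
placeholder and the sketch is not our actual measurement harness.

import time
import urllib.request

def timed_request(url):
    """Wall-clock latency from issuing the request until the last response byte is read."""
    start = time.perf_counter()
    with urllib.request.urlopen(url) as resp:
        # Drain the body so timing stops only after the final byte has arrived.
        while resp.read(65536):
            pass
    return time.perf_counter() - start

# latency = timed_request("http://server-under-test/file.html")  # placeholder URL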
We normally report the mean response time, but we note that the mean
can hide the details of the latency profile, especially under workloads
with widely varying request sizes. So, in addition to the mean, we also
present the median and other percentiles of the latency
distribution. Where
appropriate, we also provide the cumulative distribution function
(CDF) of the client-perceived latencies.
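As an illustration, these summary statistics and CDF points could be
computed from raw per-request latencies as sketched below; the helper
names are hypothetical, and the 99th percentile is shown only as an
example tail percentile, not necessarily the one reported above.

import statistics

def latency_summary(latencies):
    """Mean, median, and an example tail percentile for a list of latencies (seconds)."""
    ordered = sorted(latencies)
    n = len(ordered)
    def percentile(p):
        # Nearest-rank percentile; adequate when n is large.
        return ordered[min(n - 1, int(p / 100.0 * n))]
    return {
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "p99": percentile(99),  # illustrative tail percentile
    }

def empirical_cdf(latencies):
    """(latency, cumulative fraction) points for plotting a client-perceived latency CDF."""
    ordered = sorted(latencies)
    n = len(ordered)
    return [(x, (i + 1) / n) for i, x in enumerate(ordered)]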