In this section, we describe how nettimer implements the algorithms described in the previous section. The issues we address are how to define flows, where to take measurements, and how to distribute measurements.
In Section 2.1, the packet pair property refers to two packets from the same source to the same destination. For nettimer, we interpret this flow to be defined by a (source IP address, destination IP address) tuple (network level flow), but we could also have interpreted it to be defined by a (source IP address, source port number, destination IP address, destination port number) tuple (transport level flow). The advantage of using transport level flows is that they can penetrate Network Address Translation (NAT) gateways. The advantage of network level flows is that we can aggregate the traffic of multiple transport level flows (e.g. TCP connections) so that we have more samples to work with. We chose network level flows because, when we started implementing nettimer, NAT gateways were not widespread while popular WWW browsers would open several short TCP connections with servers. We describe a possible solution to this problem in Section 5.
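To make the distinction concrete, the two flow keys could be represented as follows. This is only an illustrative sketch in C; the structure and field names are ours rather than nettimer's.

    #include <stdint.h>
    #include <netinet/in.h>

    /* Network level flow key: aggregates all transport level flows
     * (e.g. TCP connections) between the same pair of hosts.  This is
     * the definition nettimer uses. */
    struct net_flow_key {
        struct in_addr src_ip;
        struct in_addr dst_ip;
    };

    /* Transport level flow key: one key per TCP connection. */
    struct transport_flow_key {
        struct in_addr src_ip;
        struct in_addr dst_ip;
        uint16_t       src_port;
        uint16_t       dst_port;
    };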
In Section 2.1, we assume that we have the transmission and arrival times of packets. In practice, this requires deploying measurement software at both the sender and the receiver, which may be difficult. In this section, we describe how we mitigate this limitation in nettimer and the trade-offs of doing so.
In the ideal case, we can deploy measurement software at both the sender and the receiver. Using this technique, called Receiver Based Packet Pair (RBPP) [Pax97], nettimer can employ all of the filtering algorithms described in Section 2.2 because we have both the transmission times and reception times. However, in addition to deploying measurement software at both the sender and the receiver, nettimer also needs an architecture to distribute the measurements to interested hosts (described in Section 3.3). We show in Section 4 that RBPP is the most accurate technique.
When we can only deploy software at one host, we measure the bandwidth from that host to any other host using Sender Based Packet Pair (SBPP) [Pax97] or from any other host to the measurement host using Receiver Only Packet Pair (ROPP) [LB99].
SBPP works by using the arrival times of transport- or application-level acknowledgements instead of the arrival times of the packets themselves. One application of this technique would be to deploy measurement software at a server and measure the bandwidth from the server to clients where software could not be deployed. The issues with this technique are 1) the need for transport- or application-level information, 2) non-per-packet acknowledgements, and 3) susceptibility to reverse path cross traffic. nettimer uses transport- or application-level information to match acknowledgements to packets; currently it only implements this functionality for TCP. Unfortunately, TCP does not have a strict per-packet acknowledgement policy: it may acknowledge only every other packet, or acknowledge packets that arrive out of order. Furthermore, it sometimes delays acks. Finally, the acks could be delayed by cross traffic on the reverse path, causing more noise for the filtering algorithm to deal with. We show in Section 4 that the non-per-packet acknowledgements make SBPP much less accurate than the other packet pair techniques. We describe a solution to this problem in Section 5.
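The essence of the acknowledgement matching can be sketched as follows. This is not nettimer's code; it simply illustrates using the arrival time of the first cumulative TCP acknowledgement that covers a data packet as a proxy for that packet's arrival time, and it ignores sequence number wraparound.

    #include <stdint.h>
    #include <stddef.h>

    struct sent_pkt {
        uint32_t seq_end;     /* sequence number just past the last byte sent */
        double   send_time;   /* transmission time recorded at the sender     */
    };

    struct ack_pkt {
        uint32_t ack;         /* cumulative acknowledgement number */
        double   arrive_time; /* time the ack reached the sender   */
    };

    /* Return the arrival time of the first ack covering the packet,
     * or -1 if no such ack has been seen yet. */
    static double proxy_arrival_time(const struct sent_pkt *p,
                                     const struct ack_pkt *acks, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            if (acks[i].ack >= p->seq_end)
                return acks[i].arrive_time;
        return -1.0;
    }

When TCP acknowledges only every other packet, both packets of a pair map to the same ack and therefore to the same proxy arrival time, illustrating the non-per-packet acknowledgement problem described above.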
ROPP works by using only the arrival times of packets. This prevents us from using some of the filtering algorithms described in Section 2.2 because we can no longer calculate the sent bandwidth. One application of this technique would be to deploy measurement software at a client and measure the bandwidth to the client from servers that cannot be modified. We show in Section 4 that in some cases where there is little cross traffic, ROPP is close in accuracy to RBPP.
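For reference, the raw ROPP sample is just the packet pair relation of Section 2.1 computed from arrival times alone; a minimal sketch (ours, not nettimer's):

    /* Raw ROPP bandwidth sample for a packet pair: the size of the
     * second packet divided by the difference of the two arrival times.
     * Sizes are in bytes, times in seconds, so the result is bytes/s. */
    static double ropp_sample(double arrival1, double arrival2, double size2)
    {
        double gap = arrival2 - arrival1;
        if (gap <= 0.0)
            return -1.0;   /* unusable sample */
        return size2 / gap;
    }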
In this section, we describe our architecture to do distributed packet capture. The nettimer tool uses this architecture to measure both transmission and arrival times of packets in the Internet. We first explain our approach and then describe our implementation.
Our approach is to distinguish between packet capture servers and packet capture clients. The packet capture servers capture packet headers and then distribute them to the clients. The servers do no calculations. The clients receive the packet headers and perform performance calculations and filtering. This allows flexibility in where the packet capture is done and where the calculation is done.
Another possible approach is to do more calculation at the packet capture hosts [MJ98]. The advantage of this approach is that the packet capture hosts do not have to consume bandwidth by distributing packet headers.
The advantages of separating the packet capture and performance calculation are 1) reducing the CPU burden of the packet capture hosts, 2) gaining more flexibility in the kinds of performance calculations done, and 3) reducing the amount of code that has to run with root privileges. By doing the performance calculation only at the packet capture clients, the servers only capture packets and distribute them to clients. This is especially important if the packet capture server receives packets at a high rate, the packet capture server is collocated with other servers (e.g. a web server), and/or the performance calculation consumes many CPU cycles (as is the case with the filtering algorithm described in Section 2.2). Another advantage is that clients have the flexibility to change their performance calculation code without modifying the packet capture servers. This also avoids the possible security problems of allowing client code to be run on the server. Finally, some operating systems (e.g. Linux) require that packet capture code run with root privileges. By separating the client and server code, only the server runs with root privilege while the client can run as a normal user.
Our implementation of distributed packet capture is the libdpcap library. It is built on top of the libpcap library [MJ93]. As a result, the nettimer tool can measure live in the Internet or from tcpdump traces.
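The ability to measure either live or from traces comes directly from libpcap, which exposes both sources behind the same handle. The following stand-alone sketch shows the underlying libpcap calls; it is not libdpcap code, and the device name, filter, and snap length are arbitrary.

    #include <pcap.h>
    #include <stdio.h>

    static void on_packet(u_char *user, const struct pcap_pkthdr *h,
                          const u_char *bytes)
    {
        (void)user; (void)bytes;
        /* h->ts is the capture timestamp, h->caplen the captured length. */
        printf("%ld.%06ld  %u bytes captured\n",
               (long)h->ts.tv_sec, (long)h->ts.tv_usec, h->caplen);
    }

    int main(void)
    {
        char errbuf[PCAP_ERRBUF_SIZE];
        struct bpf_program prog;

        /* Live capture ... */
        pcap_t *p = pcap_open_live("eth0", 60, 1, 100, errbuf);
        /* ... or a tcpdump trace:
         * pcap_t *p = pcap_open_offline("trace.pcap", errbuf); */
        if (p == NULL) {
            fprintf(stderr, "pcap: %s\n", errbuf);
            return 1;
        }
        if (pcap_compile(p, &prog, "tcp", 1, 0) == 0)
            pcap_setfilter(p, &prog);
        pcap_loop(p, -1, on_packet, NULL);
        pcap_close(p);
        return 0;
    }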
To start a libdpcap server, the application specifies the parameters send_thresh, send_interval, filter_cmd, and cap_len. send_thresh is the number of bytes of packet headers the server will buffer before sending them to the client. This should usually be at least the TCP maximum segment size so that fewer less-than-full-size packet report packets will be sent. send_interval is the amount of time to wait before sending the buffered packet headers. This prevents packet headers from languishing at the server while waiting for enough data to exceed send_thresh. The server sends the buffer when either send_interval or send_thresh is exceeded. filter_cmd specifies which packets should be captured by this server, using the libpcap filter language. This can cut down on the amount of unnecessary data sent to the clients. For example, to capture only TCP packets between cs.stanford.edu and eecs.harvard.edu, the filter_cmd would be ``host cs.stanford.edu and host eecs.harvard.edu and tcp''. cap_len specifies how much of each packet to capture.
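The interaction of send_thresh and send_interval amounts to a simple flush rule. The sketch below is illustrative only; the structure and function names are not libdpcap's actual interface.

    #include <stddef.h>

    /* Hypothetical server-side buffering state. */
    struct report_buffer {
        size_t bytes_buffered;    /* packet-header bytes waiting to be sent    */
        double oldest_report_age; /* seconds since the oldest report was added */
    };

    /* Send the buffered packet reports to the clients when either
     * threshold is exceeded. */
    static int should_flush(const struct report_buffer *b,
                            size_t send_thresh, double send_interval)
    {
        return b->bytes_buffered >= send_thresh
            || b->oldest_report_age >= send_interval;
    }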
To start a libdpcap client, the application specifies a set of servers to connect to and its own filter_cmd. The client sends this filter_cmd to the servers to which it connects. This further restricts the types of packet headers that the client receives.
After a client connects to a server, the server responds with its cap_len and its clock resolution. Different machines and operating systems have different clock resolutions for captured packets. For example, Linux kernels before 2.2.0 had a resolution of 10ms, while Linux 2.2.0 and later have a resolution of less than 20 microseconds, almost a thousand times finer. This can make a significant difference in the accuracy of a calculation, so the server reports its clock resolution to the client.
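To see why the resolution matters, note that a packet pair separated by less than the clock resolution appears to arrive at the same instant, which caps the bandwidth that can be measured. A quick back-of-the-envelope calculation, assuming full-sized 1500-byte packets:

    #include <stdio.h>

    int main(void)
    {
        double pkt_bits = 1500.0 * 8.0;       /* one full-sized packet      */
        double res[]    = { 10e-3, 20e-6 };   /* 10 ms vs. 20 microseconds  */

        for (int i = 0; i < 2; i++) {
            /* Gaps below the resolution read as zero, so the largest
             * distinguishable bandwidth is pkt_bits / resolution. */
            printf("resolution %g s -> max measurable %.1f Mb/s\n",
                   res[i], pkt_bits / res[i] / 1e6);
        }
        return 0;
    }

With a 10ms clock the largest measurable bandwidth for such packets is about 1.2 Mb/s, while a 20 microsecond clock raises the ceiling to roughly 600 Mb/s.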
To calculate the bandwidth consumed by the packet reports that the distributed packet capture server sends to its clients, we start with the size of each report: cap_len + sizeof(timestamp) (8 bytes) + sizeof(cap_len) (2 bytes) + sizeof(flags) (2 bytes). For TCP traffic, nettimer needs at least 40 bytes of packet header. In addition, link level headers consume some variable amount of space. To be safe, we set the capture length to 60 bytes, so each libdpcap packet report consumes 72 bytes. 20 of these reports fit in a 1460 byte TCP payload, so one full-sized report packet summarizes 20 monitored packets. If the monitored packets are themselves full-sized, the total overhead is approximately 1500 / (20 * 1500) = 5.00%. On a heavily loaded network, this could be a problem. However, if we are only interested in a pre-determined subset of the traffic, we can use the packet filter to reduce the number of packet reports. We experimentally verify this cost in Section 4.2.5 and describe some other ways to reduce it in Section 5.
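The arithmetic behind the 5% figure, for checking:

    #include <stdio.h>

    int main(void)
    {
        int cap_len         = 60;                  /* captured bytes per packet     */
        int report_size     = cap_len + 8 + 2 + 2; /* + timestamp + cap_len + flags */
        int reports_per_pkt = 1460 / report_size;  /* reports per full TCP payload  */

        /* One ~1500-byte report packet summarizes reports_per_pkt monitored
         * packets; if each monitored packet is itself 1500 bytes, the
         * fractional overhead is: */
        double overhead = 1500.0 / (reports_per_pkt * 1500.0);

        printf("report size %d bytes, %d reports per packet, overhead %.2f%%\n",
               report_size, reports_per_pkt, overhead * 100.0);
        return 0;
    }

This prints a report size of 72 bytes, 20 reports per full-sized TCP payload, and an overhead of 5.00%.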