
Related work


Table 1: Sampled NetFlow, Adaptive NetFlow and Flow Slices differ in the types of measurements they support, in how they adapt to different traffic mixes, and in their resource consumption (memory usage and reporting traffic).
| Issue | Sampled NetFlow | Adaptive NetFlow | Flow Slices |
|---|---|---|---|
| Memory usage | Variable | Fixed | Fixed |
| Volume of flow data reported | Variable | Fixed | Fixed |
| Behavior under DDoS with spoofed sources and other traffic mixes with many flows | Panicky flow expiration | Reduction in accuracy | Small reduction in accuracy |
| Estimates of traffic in small time bins | Less accurate | Accurate | Less accurate |
| Reporting overhead when using small bins | Unaffected | Large increase | Unaffected |
| Lifetime of flow record in router memory | Min(active timeout, flow length + inactivity timeout) | Bin length | Min(slice length, flow length + inactivity timeout) |
| Resource usage at end of time bin | N/A | Reporting spike or extra memory | N/A |
| Processing-intensive tasks | Counting | Counting and renormalization | Counting |
| Counting TCP flow arrivals (using SYNs) | Yes | Yes | Yes |
| Counting all active flows | No | Separate flow counting extension | Yes |
| Counting all active flows at high speeds | No | Hardware flow counting extension | No |


NetFlow [17], first implemented in Cisco routers, is the most widely used flow measurement solution today. Routers maintain flow records that collect various pieces of information about each flow. Flows are identified by fields present in the header of every packet: source and destination IP address, protocol, source and destination port, and type of service bits. The flow record keeps information such as the number of packets in the flow, the total number of bytes in those packets, the timestamps of the first and last packets, and protocol flag information such as whether any of those packets had the SYN flag set. NetFlow uses four rules to decide when to remove a flow record from router memory and report it to the collection station: 1) when TCP flags (FIN or RST) indicate flow termination, 2) 15 seconds (the configurable "inactive timeout") after seeing the last packet with a matching flow ID, 3) 30 minutes (the configurable "active timeout") after the record was created, to avoid staleness, and 4) when the memory is full.
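To make the record-keeping and the four expiration rules concrete, here is a minimal Python sketch of a NetFlow-style flow cache. It is not Cisco's implementation: the names `FlowCache`, `FlowRecord`, and `update`, the single-letter TCP flag string (`"S"`, `"F"`, `"R"`), checking timeouts on every packet, and evicting the least recently updated record when memory is full are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Flow ID: (src IP, dst IP, protocol, src port, dst port, ToS bits)
FlowKey = Tuple[str, str, int, int, int, int]

INACTIVE_TIMEOUT = 15.0      # seconds, configurable "inactive timeout"
ACTIVE_TIMEOUT = 30 * 60.0   # seconds, configurable "active timeout"


@dataclass
class FlowRecord:
    packets: int
    byte_count: int
    first_ts: float
    last_ts: float
    syn_seen: bool


class FlowCache:
    """Hypothetical NetFlow-style cache; reported records go to self.exported."""

    def __init__(self, max_entries: int) -> None:
        self.max_entries = max_entries  # assumed >= 1
        self.entries: Dict[FlowKey, FlowRecord] = {}
        self.exported: List[Tuple[FlowKey, FlowRecord, str]] = []

    def _export(self, key: FlowKey, reason: str) -> None:
        # Remove the record from "router memory" and report it.
        self.exported.append((key, self.entries.pop(key), reason))

    def expire_timeouts(self, now: float) -> None:
        for key, rec in list(self.entries.items()):
            if now - rec.last_ts >= INACTIVE_TIMEOUT:    # rule 2
                self._export(key, "inactive")
            elif now - rec.first_ts >= ACTIVE_TIMEOUT:   # rule 3
                self._export(key, "active")

    def update(self, key: FlowKey, size: int, ts: float, flags: str = "") -> None:
        self.expire_timeouts(ts)  # assumption: timeouts checked on every packet
        rec = self.entries.get(key)
        if rec is None:
            if len(self.entries) >= self.max_entries:    # rule 4
                # Assumed policy: evict the least recently updated record.
                victim = min(self.entries, key=lambda k: self.entries[k].last_ts)
                self._export(victim, "memory_full")
            rec = FlowRecord(0, 0, ts, ts, False)
            self.entries[key] = rec
        rec.packets += 1
        rec.byte_count += size
        rec.last_ts = ts
        rec.syn_seen = rec.syn_seen or "S" in flags
        if "F" in flags or "R" in flags:                 # rule 1
            self._export(key, "tcp_end")
```

For example, feeding one flow a SYN packet, a data packet, and a FIN packet leaves a single exported record with three packets and reason `"tcp_end"`. A real router would scan for timeouts periodically rather than on every packet, since the per-packet scan here is linear in the cache size.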

On every new packet, NetFlow looks up the corresponding entry (creating a new entry if necessary) and updates that entry's counters and timestamps. Because, on high-speed interfaces, neither the processor nor the memory holding the flow records can keep up with the packet rate, Cisco introduced Sampled NetFlow [22], which updates the flow cache only for sampled packets. For a configurable parameter N, each packet is sampled with probability 1/N.
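The sketch below shows the sampling step in Python, assuming independent random sampling with probability 1/N per packet (routers may instead sample deterministically, one packet in every N) and the usual inverse-probability estimator, which multiplies sampled packet and byte counts by N. The function names `sampled_netflow` and `estimate_totals` are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


def sampled_netflow(packets: Iterable[Tuple[Hashable, int]], n: int,
                    seed: int = 0) -> Dict[Hashable, List[int]]:
    """Update per-flow [packets, bytes] counters only for sampled packets."""
    rng = random.Random(seed)
    cache: Dict[Hashable, List[int]] = defaultdict(lambda: [0, 0])
    for key, size in packets:
        if rng.random() < 1.0 / n:  # each packet sampled with probability 1/N
            entry = cache[key]
            entry[0] += 1
            entry[1] += size
    return dict(cache)


def estimate_totals(cache: Dict[Hashable, List[int]], n: int) -> Tuple[int, int]:
    """Inverse-probability estimate: scale sampled counts by N."""
    sampled_packets = sum(e[0] for e in cache.values())
    sampled_bytes = sum(e[1] for e in cache.values())
    return sampled_packets * n, sampled_bytes * n
```

With `n=1` every packet is sampled and the estimates are exact. With larger `n`, flows with only a few packets are often missed entirely, while the aggregate packet and byte estimates remain unbiased.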

One problem with NetFlow is that the memory required by the flow records and the bandwidth consumed to report them depend strongly on the traffic mix. In particular, large floods of small packets with randomly spoofed source addresses can increase memory and bandwidth requirements by orders of magnitude. Adaptive NetFlow [10] solves this problem by dynamically adapting the sampling rate. Adaptive NetFlow divides the operation of the flow measurement algorithm into equally spaced time bins. Within each bin, the algorithm starts by sampling aggressively (high sampling probability). If memory is consumed too quickly, it switches to less aggressive sampling. It then "renormalizes" existing entries so that they reflect the counts they would have had with the new sampling rate in effect from the beginning of the bin. At the end of the bin, all entries are reported.
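A minimal Python sketch of one Adaptive NetFlow time bin follows. It assumes packet counts only (byte counters need more care), halves the sampling probability whenever the number of entries exceeds a fixed budget (a hypothetical policy), and renormalizes by binomial thinning: each previously sampled packet is kept independently with probability p_new/p_old, which makes the surviving counts distributed as if p_new had been in effect from the start of the bin. This is one way to implement renormalization, not necessarily the exact procedure of [10]; `renormalize` and `adaptive_bin` are hypothetical names.

```python
import random
from typing import Dict, Hashable, Iterable, Tuple


def renormalize(counts: Dict[Hashable, int], p_old: float, p_new: float,
                rng: random.Random) -> Dict[Hashable, int]:
    """Binomial thinning: make counts sampled at p_old look sampled at p_new."""
    keep = p_new / p_old
    out: Dict[Hashable, int] = {}
    for key, c in counts.items():
        # Each already-sampled packet survives with probability p_new / p_old.
        new_c = sum(1 for _ in range(c) if rng.random() < keep)
        if new_c > 0:          # entries thinned to zero free their memory
            out[key] = new_c
    return out


def adaptive_bin(packets: Iterable[Hashable], budget: int, p_start: float = 1.0,
                 seed: int = 0) -> Tuple[Dict[Hashable, int], float]:
    """Process one time bin; return the reported counts and the final rate."""
    rng = random.Random(seed)
    p = p_start
    counts: Dict[Hashable, int] = {}
    for key in packets:
        if rng.random() < p:
            counts[key] = counts.get(key, 0) + 1
            while len(counts) > budget:   # memory consumed too quickly
                p_new = p / 2             # hypothetical policy: halve the rate
                counts = renormalize(counts, p, p_new, rng)
                p = p_new
    # End of bin: every entry is reported; a flow's estimate is count / p.
    return counts, p
```

A collector would estimate each flow's packet count in the bin as its reported count divided by the final sampling probability returned by `adaptive_bin`.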

Using fixed-size bins in Adaptive NetFlow increases memory utilization compared to Sampled NetFlow and causes bursts in reporting bandwidth. Memory utilization is higher because, to operate seamlessly across bin boundaries, Adaptive NetFlow requires two sets of records (double-buffering): one for the current bin and one for the records of the previous bin while they are being transmitted. Without double-buffering, flow records that expire at the bin boundary need to be transmitted immediately to create space for the next set of entries. Large flows spanning multiple bins are reported separately for every bin, which increases bandwidth usage. Table 1 gives a summary comparison of Sampled NetFlow, Adaptive NetFlow and Flow Slices.
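The bandwidth cost of fixed bins can be seen with simple arithmetic. The hypothetical helpers below count how many records a single long, continuously active flow produces: one per bin it overlaps under Adaptive NetFlow, versus roughly one per active timeout under Sampled NetFlow. The sketch assumes away inactivity gaps and sampling.

```python
import math


def records_with_bins(start: float, end: float, bin_len: float) -> int:
    """Adaptive NetFlow: one record for every bin the flow overlaps."""
    return math.floor(end / bin_len) - math.floor(start / bin_len) + 1


def records_with_active_timeout(start: float, end: float,
                                active_timeout: float) -> int:
    """Sampled NetFlow, continuously active flow: a fresh record per active timeout."""
    return max(1, math.ceil((end - start) / active_timeout))
```

For a one-hour flow starting 10 seconds into the first bin, 60-second bins yield 61 records, whereas a 30-minute active timeout yields 2.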

The flow records are used to estimate the number of bytes or packets in various traffic aggregates of interest. This can give network operators information about dominant applications, the network usage of various clients, traffic matrices, and many other useful statistics [12,19,1,14]. Smart Sampling [8] is a way of reducing the data used by such analyses without significantly affecting their results. Smart Sampling retains flow records with probability proportional to the size of their byte counter. The flow records can also be used to estimate the number of active flows, which is important when looking for denial of service attacks, scans, and worms in the traffic mix. Unfortunately, if we use Sampled NetFlow, it is impossible to recover the number of flows in the original traffic from the collected data [5] unless we use protocol information. By using the SYN flag information in flow records, we can accurately estimate the number of TCP flows in the traffic mix [9].
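Both estimation techniques mentioned above fit in a few lines. The sketch assumes a common threshold form of Smart Sampling, in which a record with byte count x is kept with probability min(1, x/z) for a threshold z and a kept record is reported as max(x, z), which keeps byte-total estimates unbiased. For TCP flow counting it assumes 1-in-N packet sampling and exactly one SYN packet per flow, so multiplying the number of sampled records with the SYN flag by N gives an unbiased estimate of flow arrivals; this is the simplest such estimator, not necessarily the one in [9]. The names `smart_sample`, `estimate_tcp_flows`, and `z` are hypothetical.

```python
import random
from typing import Iterable, List, Tuple


def smart_sample(byte_counts: Iterable[int], z: float,
                 rng: random.Random) -> List[float]:
    """Threshold sampling: keep a record of size x with probability min(1, x / z).

    A kept record is reported as max(x, z), so the expected reported
    total equals the true total.
    """
    kept: List[float] = []
    for x in byte_counts:
        if x >= z or rng.random() < x / z:
            kept.append(max(x, z))
    return kept


def estimate_tcp_flows(records: Iterable[Tuple[int, bool]], n: int) -> int:
    """records: (sampled packet count, SYN flag seen). Assumes one SYN per flow
    and 1-in-N packet sampling, so each flow's SYN is sampled w.p. 1/N."""
    return n * sum(1 for _packets, syn in records if syn)
```

Small records are mostly discarded by `smart_sample`, but each survivor stands in for z bytes, so sums over many records stay close to the true totals.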


Ramana Rao Kompella 2005-08-12