

Estimating the number of active flows

We use two definitions for counting flows: active flows and flow arrivals. A flow is active during a time bin if it sends at least one packet during that time bin. Under our current assumptions, consecutive TCP connections between the same two computers that happen to use the same port numbers are considered a single flow and are reported in the same flow record. Active flows with none of their packets sampled by the flow slicing process will have no records, so at least some of the flow records we do get must be counted as more than one active flow for the total estimate to be unbiased. We count each record with a packet counter $ c_s$ of $ 1$ as $ 1/p$ flows and every other record as $ 1$ flow, which gives an unbiased estimate of the number of active flows.

$\displaystyle \widehat{f}=\left\{\begin{array}{ll} 1/p & \textrm{if $c_s=1$}\\ 1 & \textrm{if $c_s>1$} \end{array}\right.$ (3)
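As an illustration of how Equation 3 might be applied to a batch of flow records, the following minimal sketch sums the per-record contributions. The function name `estimate_active_flows` and the representation of records as a plain list of packet counters are assumptions made for this example, and $ p$ is taken to be the flow slicing sampling probability.

```python
def estimate_active_flows(packet_counters, p):
    """Sum the per-record contributions given by Equation 3.

    packet_counters: iterable of c_s values, one per flow record
                     (hypothetical input format for this sketch).
    p: sampling probability, assumed to satisfy 0 < p <= 1.
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")

    total = 0.0
    for c_s in packet_counters:
        if c_s < 1:
            # A record only exists if at least one packet was counted.
            raise ValueError("packet counter must be at least 1")
        # Records with c_s = 1 stand in for 1/p flows, all others for 1 flow.
        total += 1.0 / p if c_s == 1 else 1.0
    return total


if __name__ == "__main__":
    # Four records, two of them with a packet counter of 1.
    print(estimate_active_flows([1, 5, 1, 2], p=0.25))
```

Records with a counter of 1 carry the extra weight because they also have to account for the active flows that left no record at all.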

Lemma 3   $ \widehat{f}$ as defined in Equation 3 has expectation $ 1$.

Proof: For a flow with $ s$ packets there are three possible cases: if a packet before the last gets sampled, then $ c_s>1$; if only the last packet gets sampled, then $ c_s=1$; and if none of the packets gets sampled, there is no flow record, so the contribution of the flow to the estimate of the number of active flows is $ \widehat{f}=0$. The probability of the first case is $ p_{s-1}=1-(1-p)^{s-1}$, the probability of the second is $ p(1-p_{s-1})$, and that of the third is $ (1-p)(1-p_{s-1})$.

$\displaystyle E[\widehat{f}] = p_{s-1}\cdot 1 + p(1-p_{s-1})\cdot 1/p + (1-p)(1-p_{s-1})\cdot 0 = 1$

$ \blacksquare$
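The three-case argument in the proof can be checked numerically. The sketch below is not part of the original analysis: it evaluates the same three probabilities in exact rational arithmetic for a few values of $ s$ and $ p$, and confirms that they sum to 1 and that the expectation is 1. The function name `expected_flow_estimate` is hypothetical, and $ s$ is assumed to be the number of packets in the flow.

```python
from fractions import Fraction


def expected_flow_estimate(s, p):
    """Exact expectation of the per-flow estimate for a flow of s packets.

    Follows the three cases of the proof; s >= 1 and 0 < p <= 1 are assumed.
    """
    p = Fraction(p)
    if s < 1 or not 0 < p <= 1:
        raise ValueError("need s >= 1 and 0 < p <= 1")

    # Case 1: some packet before the last is sampled, so c_s > 1.
    p_prev = 1 - (1 - p) ** (s - 1)
    # Case 2: only the last packet is sampled, so c_s = 1.
    p_last_only = p * (1 - p_prev)
    # Case 3: no packet is sampled, so there is no flow record.
    p_none = (1 - p) * (1 - p_prev)

    # The three cases are exhaustive and mutually exclusive.
    assert p_prev + p_last_only + p_none == 1

    return p_prev * 1 + p_last_only * (1 / p) + p_none * 0


if __name__ == "__main__":
    for s in (1, 2, 10, 100):
        for p in (Fraction(1, 2), Fraction(1, 10), Fraction(1, 100)):
            print(s, p, expected_flow_estimate(s, p))
```

Using `Fraction` rather than floating point keeps the check exact, so the result equals 1 without any rounding tolerance.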

The estimators for the number of bytes and packets in a flow were trivial to generalize to the case where random packet sampling is applied before flow slicing, because the expected number of packets and bytes after packet sampling is exactly $ q$ times the number before. For the number of active flows there is no such simple relationship; in fact, it has been shown that once random packet sampling has been applied, the number of active flows cannot be estimated without significant bias [5]. However, by slightly changing the definition of flow counts, we can take advantage of the SYN flags used by TCP flows.


Ramana Rao Kompella 2005-08-12