
Continuous operation

The most elementary relaxation of the assumption is to consider continuous operation of the algorithm: records still last longer than the bin length, and we still have separate counters for each bin, but some records may already be active at the start of our bin because they were created earlier.

The simplest case is that of records spanning the entire bin. The byte and packet counters will reflect the actual traffic, so we use $\widehat{S}=\frac{1}{q}c_s$ and $\widehat{B}=\frac{1}{q}c_b$. If we do not have a packet sampling stage, we can also compute $\widehat{f}=1$ if $c_s>0$ and $\widehat{f}=0$ otherwise. We set $\widehat{A}=0$ because the flow started in an earlier bin.

If a flow record expires within the bin we run the analysis on, it can be the only record for the flow, but it is also possible that another record for the same flow is created after the first record expires. Byte and packet counts are additive, so we can simply add the counters from the first record to the estimates from the second: $\widehat{s}=\widehat{s}_1+\widehat{s}_2$ and $\widehat{b}=\widehat{b}_1+\widehat{b}_2$. The analysis of unbiasedness carries through because we can treat the bin as two sub-bins, one ending when the first record ends and the other starting at the same time. Since we have unbiased byte and packet estimates for both sub-bins, our estimates for the sum of the sub-bins are still unbiased.
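The sub-bin argument can be written out with linearity of expectation. In this sketch $s_1$, $s_2$, $b_1$ and $b_2$ denote the true counts in the two sub-bins, and $s=s_1+s_2$, $b=b_1+b_2$ the true counts for the whole bin; this notation is introduced here only for illustration.

$$
E[\widehat{s}] = E[\widehat{s}_1] + E[\widehat{s}_2] = s_1 + s_2 = s,
\qquad
E[\widehat{b}] = E[\widehat{b}_1] + E[\widehat{b}_2] = b_1 + b_2 = b .
$$

Linearity of expectation does not require the two per-record estimates to be independent, so the sum stays unbiased even when the two records are correlated.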

If $c_{s1}>0$, we know that the flow sent packets during the bin, so we set $\widehat{f}$ to 1. Otherwise we apply the flow count estimator to $c_{s2}$, since an unbiased estimator for whether the flow was active in the second sub-bin will tell us whether it was active overall. This approach preserves overall unbiasedness, but it complicates the analysis because the two flow records representing the flow can no longer be processed independently: the contribution of the second record to the flow count of the bin depends on whether there was a first record with the same flow identifier. When the router reports the records, they might not be near each other, so the analysis has to do ``flow reconstruction'': keep a hash table indexed by flow identifier and find the flow records with the same flow identifier that cover parts of the same bin. Without flow reconstruction we risk double counting flows that have more than one record, which might be acceptable in many settings.
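As a rough illustration, flow reconstruction for a single bin might look like the Python sketch below. It is not the authors' implementation. The `FlowRecord` layout, the function name `reconstruct_flows` and the caller-supplied `f_estimator` (standing in for the single-record flow count estimator, which is defined elsewhere) are all hypothetical. The sketch also assumes at most two records per flow in the bin, and that a lone carried-over record with $c_s=0$ contributes $\widehat{f}=0$.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass
class FlowRecord:
    flow_id: str   # hypothetical key, e.g. the flow identifier as a string
    start: float   # creation time of the record
    c_s: int       # the record's c_s counter for the bin under analysis
    s_hat: float   # per-record estimate s_hat_i, computed elsewhere
    b_hat: float   # per-record estimate b_hat_i, computed elsewhere


def reconstruct_flows(
    records: Iterable[FlowRecord],
    bin_start: float,
    f_estimator: Callable[[int], float],
) -> Dict[str, Tuple[float, float, float]]:
    """Return {flow_id: (s_hat, b_hat, f_hat)} for one bin."""
    # Hash table keyed by flow identifier: records of the same flow
    # may arrive far apart in the router's report.
    by_flow: Dict[str, List[FlowRecord]] = defaultdict(list)
    for rec in records:
        by_flow[rec.flow_id].append(rec)

    estimates: Dict[str, Tuple[float, float, float]] = {}
    for flow_id, recs in by_flow.items():
        recs.sort(key=lambda r: r.start)
        # Additive quantities: sum the per-record estimates.
        s_hat = sum(r.s_hat for r in recs)
        b_hat = sum(r.b_hat for r in recs)

        first = recs[0]
        if first.start >= bin_start:
            # No carried-over record: the ordinary single-record case.
            f_hat = f_estimator(first.c_s)
        elif first.c_s > 0:
            # The carried-over record saw traffic in this bin.
            f_hat = 1.0
        elif len(recs) > 1:
            # First record was idle in this bin: rely on the second record.
            f_hat = f_estimator(recs[1].c_s)
        else:
            # Assumption of this sketch: an idle lone record counts as 0.
            f_hat = 0.0
        estimates[flow_id] = (s_hat, b_hat, f_hat)
    return estimates
```

Processing each record on its own would instead add the second record's flow estimate on top of the 1 already contributed by the first, which is the double counting described above.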

By our definition of flow arrivals, as long as assumption 1 holds, if a flow has a record that starts before the start of the bin, we should use $\widehat{A}=0$, irrespective of whether we have a second flow record (possibly with a SYN flag) or not. If we have a second flow record with the SYN flag set, we can clearly say that assumption 1 does not hold, but without flow reconstruction we might count it separately toward the flow arrival count. In many settings this type of over-counting is not a serious concern. $\widehat{A}^{(2)}$ should not be used because assumption 2 does not hold.
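The difference between counting arrivals with and without flow reconstruction can be sketched as follows. This is only an illustration under stated assumptions: the record type `Rec`, both function names and the caller-supplied `a_estimator` (a placeholder for the single-record arrival estimator defined elsewhere) are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, NamedTuple


class Rec(NamedTuple):
    flow_id: str   # hypothetical flow identifier
    start: float   # creation time of the record
    syn: bool      # whether the record has the SYN flag set


def arrivals_with_reconstruction(
    records: Iterable[Rec],
    bin_start: float,
    a_estimator: Callable[[Rec], float],
) -> Dict[str, float]:
    """Per-flow arrival estimates; 0 for flows seen before the bin."""
    by_flow: Dict[str, List[Rec]] = defaultdict(list)
    for rec in records:
        by_flow[rec.flow_id].append(rec)

    result: Dict[str, float] = {}
    for flow_id, recs in by_flow.items():
        if any(r.start < bin_start for r in recs):
            # A record predates the bin, so the flow did not arrive in it.
            result[flow_id] = 0.0
        else:
            first = min(recs, key=lambda r: r.start)
            result[flow_id] = a_estimator(first)
    return result


def arrivals_without_reconstruction(
    records: Iterable[Rec],
    bin_start: float,
    a_estimator: Callable[[Rec], float],
) -> float:
    """Total arrival estimate when every record is processed on its own."""
    return sum(a_estimator(r) for r in records if r.start >= bin_start)
```

For a flow with one record created before the bin and a second, SYN-flagged record created inside it, the first function reports 0 while the second counts the later record as a new arrival.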


Ramana Rao Kompella 2005-08-12