We will describe a hybrid solution which uses a combination of FEC and ARQ to construct a CLVL. Recall that a CLVL abstraction aims to bound the bundle loss rate to a small value . Since burstiness of cross-traffic is usually unpredictable, we define as a statistical bound on the average loss rate observed over some larger period of time (on the order of seconds).
FEC vs ARQ trade-off: The main distinction between FEC and ARQ is in the trade-off between bandwidth overhead and packet recovery time. While FEC can help in quickly recovering from packet losses, the bandwidth overhead can be high especially over virtual links experiencing bursty losses . On the other hand, an ARQ based solution will have a high packet recovery time if the between two overlay nodes is large. To strike a balance between these two approaches, we present a hybrid approach that uses the best features of both these mechanisms.
We will first briefly describe how one will construct a CLVL using purely ARQ or FEC and combine these approaches to obtain a hybrid CLVL construction.
ARQ-based CLVL: A purely ARQ-based solution for building CLVLs is easy to construct. In a reliable transmission (), a packet is repeatedly retransmitted until the sender receives an acknowledgment from the receiver. Similarly, to achieve a non-zero target loss rate, , it is enough to retransmit any lost packet times, where represents the average loss rate over the interval over which we want to bound . However, if , a pure-ARQ based CLVL is unattractive since it uses multiple RTTs to achieve the bound .
FEC-based CLVL: In an FEC-based approach, we divide time into windows of period, , where a window is a unit of encoding/decoding. We consider an erasure code such as Reed-Solomon, characterized by , where is the number of packets arriving at the entry node during the window, and represents the number of redundant packets added. Let denote the redundancy factor, where . The FEC problem reduces then to determining a minimum redundancy factor, , such that the target loss rate is achieved. Since the hybrid approach (i.e., FEC+ARQ based CLVLs) presented below outperforms the FEC based CLVLs in most of the cases, we skip the description of our algorithm for computing the ideal value of .
FEC+ARQ based CLVL: Due to delay constraints for loss recovery, we restrict the number of retransmissions to at most one. We divide packets into windows and add an FEC redundancy factor of for each window in the first round. In the second round, if a window is non-recoverable, the entry node retransmits the lost packets with a redundancy factor .
We need to estimate the parameters, and . Let denote the PDF of the loss rate , where each value of is measured over an encoding/decoding window. FEC offers loss protection within a window if the fraction of packets lost in a window, , is less than the amount of redundancy added for that window. Given a redundancy factor, , the expected packet loss rate after recovering from FEC is given by:
Hence the expected packet loss rate after the two rounds in the hybrid approach is equal to . Given a target loss-rate, , we require:
For a given window, is the FEC overhead in the first round, is the expected number of retransmitted packets and , the expected overhead in the second round. The expected bandwidth overhead is given by
This yields the following optimization problem: Given a target loss rate , determine the redundancy factors and that minimizes the expected overhead, subject to the target loss constraint: .
For many loss distributions that occur in practice, the optimal solution for this problem is when . This solution implies that it is better not to use FEC in the first round, and use FEC only to protect retransmitted packets. When and , an FEC+ARQ CLVL reduces to a pure ARQ based CLVL. This happens when where is the average loss-rate along the virtual link. An FEC+ARQ CLVL can be made adaptive to sudden variations in the loss characteristics by always applying a minimal amount of FEC (), to the retransmitted packets in a window.
We made a simplistic assumption in the above calculation. We used the same distribution to model the fraction of losses during both the first and second round. Since the number of packets in a retransmitted window may be much smaller than the original window, the same distribution may not apply. To overcome this problem, we estimate a table of loss distributions (rather than one ) across different time-scales and apply the appropriate distribution based on the number of retransmitted packets.