We will describe a hybrid solution which uses a combination of FEC and
ARQ to construct a CLVL. Recall that a CLVL abstraction aims to bound
the bundle loss rate to a small value . Since burstiness of
cross-traffic is usually unpredictable, we define
as a statistical
bound on the average loss rate observed over some larger period of
time (on the order of seconds).
FEC vs ARQ trade-off: The main distinction between FEC and ARQ is
in the trade-off between bandwidth overhead and packet recovery
time. While FEC can help in quickly recovering from packet losses, the
bandwidth overhead can be high especially over virtual links
experiencing bursty losses [22]. On the other hand, an ARQ
based solution will have a high packet recovery time if the
between two overlay nodes is large. To strike a balance between
these two approaches, we present a hybrid approach that uses the best
features of both these mechanisms.
We will first briefly describe how one will construct a CLVL using purely ARQ or FEC and combine these approaches to obtain a hybrid CLVL construction.
ARQ-based CLVL: A purely ARQ-based solution for building CLVLs is
easy to construct. In a reliable transmission (), a packet is
repeatedly retransmitted until the sender receives an acknowledgment
from the receiver. Similarly, to achieve a non-zero target loss rate,
, it is enough to retransmit any lost packet
times, where
represents the average loss rate over the
interval over which we want to bound
. However, if
, a
pure-ARQ based CLVL is unattractive since it uses multiple RTTs to
achieve the bound
.
FEC-based CLVL: In an FEC-based approach, we divide time into
windows of period, , where a window is a unit of
encoding/decoding. We consider an erasure code such as
Reed-Solomon, characterized by
, where
is the number of
packets arriving at the entry node during the window, and
represents the number of redundant packets added. Let
denote the
redundancy factor, where
. The FEC problem reduces then to
determining a minimum redundancy factor,
, such that the target
loss rate
is achieved. Since the hybrid approach (i.e., FEC+ARQ
based CLVLs) presented below outperforms the FEC based CLVLs in most of
the cases, we skip the description of our algorithm for computing the
ideal value of
.
FEC+ARQ based CLVL:
Due to delay constraints for loss recovery, we restrict the number of
retransmissions to at most one. We divide packets into windows and add
an FEC redundancy factor of for each window in the first round. In
the second round, if a window is non-recoverable, the entry node
retransmits the lost packets with a redundancy factor
.
We need to estimate the parameters, and
.
Let
denote the PDF of the loss rate
, where each value of
is measured over an encoding/decoding window.
FEC offers loss protection within a window if the fraction of
packets lost in a window,
, is less than the amount of redundancy
added for that window. Given a redundancy factor,
, the expected
packet loss rate after recovering from FEC is given by:
Hence the expected packet loss rate after the two rounds in the hybrid
approach is equal to
. Given a target
loss-rate,
, we require:
For a given window, is the FEC overhead in the first round,
is the expected number of retransmitted packets and
, the expected overhead in the second round. The expected
bandwidth overhead is given by
This yields the following optimization problem: Given a target loss
rate , determine the redundancy factors
and
that
minimizes the expected overhead,
subject to the target
loss constraint:
.
For many loss distributions that occur in practice, the optimal
solution for this problem is when . This solution implies that
it is better not to use FEC in the first round, and use FEC only to
protect retransmitted packets. When
and
, an FEC+ARQ
CLVL reduces to a pure ARQ based CLVL. This happens when
where
is the average loss-rate along the
virtual link. An FEC+ARQ CLVL can be made adaptive to sudden
variations in the loss characteristics by always applying a minimal
amount of FEC (
), to the retransmitted packets in a window.
We made a simplistic assumption in the above calculation. We used the
same distribution to model the fraction of losses during both
the first and second round. Since the number of packets in a
retransmitted window may be much smaller than the original window, the
same distribution
may not apply. To overcome this problem, we
estimate a table of loss distributions (rather than one
) across
different time-scales and apply the appropriate distribution based on
the number of retransmitted packets.