Large enterprises, campuses, and data centers have traditionally used multihoming to multiple ISPs as a way of ensuring continued operation during connectivity outages or other ISP failures. While resilience and availability remain the primary objectives of multihoming, there is growing interest in deriving other benefits from multiple ISP connections. In particular, multihoming can be leveraged to improve wide-area network performance, lower bandwidth costs, and optimize the way in which upstream links are used [12].
A number of products provide these route control capabilities to large enterprise customers that have their own public AS number and advertise their IP address prefixes to upstream providers using BGP [20,18,10]. Recognizing that not all enterprises are large enough to warrant full BGP peering with upstream ISPs, another class of products extends these advantages to smaller multihomed organizations that do not use BGP [14,17,7]. All of these products use a variety of mechanisms and policies for route control, but aside from marketing statements, little is known about the relative quantitative benefits of these mechanisms.
In a recent measurement study to quantify the performance benefits of multihoming, it was shown that performance could potentially improve by more than 40% when multiple upstream providers are employed [4]. That study focused on the maximum achievable benefits, assuming that the multihomed network had perfect information about the performance across all providers at any time and could change routes arbitrarily often. Hence, it is still unclear whether, and how, these benefits can be realized in a more practical multihoming scenario.
In this paper we explore design alternatives to realize performance benefits from multihoming in practice, particularly for enterprises with multiple ISP connections. We focus primarily on mechanisms used for inbound route control, since enterprises are mainly interested in optimizing network performance for their own clients who download content from the Internet (i.e., sink data).
We evaluate a variety of active and passive measurement strategies for multihomed enterprises to estimate the instantaneous performance of their provider links and pick the best provider for a given transfer. These strategies are evaluated in the context of a NAT-based implementation to control the inbound ISP link used by enterprise connections. We address a number of practical issues such as the usefulness of past history to guide the choice of the best provider link, the effects of sampling frequency on measurement accuracy, and the overhead of managing performance information for a potentially very large set of target destinations. We evaluate these policies using several client workloads, and an emulated wide-area network where delay characteristics are based on a large set of real network delay measurements.
Our evaluation shows that active and passive measurement-based techniques are equally effective in extracting the performance benefits of using multiple providers, both offering about 15-25% improvement when compared to using a single provider. We also show that the most current sample of the performance to a destination via a given provider is a reasonably good estimator of the near-term performance to the destination. We show that the overhead of collecting and managing performance information for various destinations is negligible. We also conduct an initial study of mechanisms to control the ISP link used by external Internet clients who initiate connections to servers hosted in the enterprise.
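As an illustration of the observation that the most current sample is a reasonable estimator of near-term performance, the following minimal sketch keeps only the latest delay sample per destination and provider and picks the provider with the lowest one. It is not the implementation evaluated in this paper; the class name `LatestSampleSelector`, its methods, the provider labels, and the example address are all hypothetical.

```python
from typing import Dict, Iterable


class LatestSampleSelector:
    """Hypothetical sketch: pick a provider using only the most recent sample."""

    def __init__(self, providers: Iterable[str], default: str) -> None:
        self.providers = list(providers)
        if default not in self.providers:
            raise ValueError("default must be one of the providers")
        self.default = default
        # destination -> provider -> most recent delay sample (milliseconds)
        self._latest: Dict[str, Dict[str, float]] = {}

    def record(self, destination: str, provider: str, delay_ms: float) -> None:
        """Store a new sample, overwriting any older one for this pair."""
        if provider not in self.providers:
            raise ValueError(f"unknown provider: {provider}")
        self._latest.setdefault(destination, {})[provider] = delay_ms

    def best_provider(self, destination: str) -> str:
        """Return the provider with the lowest latest sample, or the default."""
        samples = self._latest.get(destination)
        if not samples:
            return self.default
        return min(samples, key=samples.get)


# Example with made-up delay values (assumed, not measured data).
selector = LatestSampleSelector(["isp_a", "isp_b", "isp_c"], default="isp_a")
selector.record("198.51.100.7", "isp_a", 80.0)
selector.record("198.51.100.7", "isp_b", 55.0)
first_choice = selector.best_provider("198.51.100.7")

# A newer, worse sample for isp_b replaces the old one and changes the choice.
selector.record("198.51.100.7", "isp_b", 120.0)
second_choice = selector.best_provider("198.51.100.7")
```

Storing a single value per destination and provider keeps the state small, which matters when the set of destinations is large. A history-based estimator would replace the overwrite in `record` with some form of averaging.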
The rest of this paper is structured as follows. In Section 2, we describe our enterprise multihoming solution and the various strategies for estimating ISP performance and for route control. Section 3 describes our implementation in further detail. In Section 4, we discuss the experimental set-up and results from our evaluation of the solution. Section 5 discusses some limitations inherent to our approach. Related work is presented in Section 6. Finally, Section 7 summarizes the contributions of this paper.