Geolocalization on the Internet through Constraint Satisfaction
Bernard Wong, Ivan Stoyanov, Emin Gün Sirer
Abstract: This paper outlines a novel, comprehensive framework for geolocalization, that is, determining the physical location of Internet hosts based on network measurements. The core insight behind this framework is to pose the geolocalization problem formally as one of error-minimizing constraint satisfaction, to create a system of constraints by deriving them aggressively from network measurements, and to solve the system using cheap and accurate geometric methods. The framework is general and accommodates both positive and negative constraints, that is, constraints on where the node can or cannot be, respectively. It can reason in the presence of uncertainty, enabling it to gracefully cope with aggressively derived constraints that may contain errors. Since the solution space is represented geometrically as a region bounded by Bézier curves, the framework yields an accurate set of all points where the target may be located. Preliminary results on PlanetLab show promise; the framework can localize the median node to within 22 miles, a factor of three better than previous approaches.
Given a set Ω of positive constraints and a set Φ of negative constraints on the position of a target node i, the estimated location region for the target is the intersection of all positive constraint regions minus the union of all negative constraint regions:

∩_{ω ∈ Ω} ω − ∪_{φ ∈ Φ} φ
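To make the set expression above concrete, the following minimal sketch intersects a few hypothetical positive regions and subtracts a hypothetical negative region in the plane. It uses the shapely library, which is not part of the paper, and polygonal approximations of discs and annuli rather than Octant's Bézier-bounded regions; coordinates and radii are made up for illustration.

```python
# Planar sketch of: (intersection over Omega) minus (union over Phi).
from shapely.geometry import Point
from shapely.ops import unary_union

def disc(x, y, r):
    """Disc of radius r around (x, y): 'target is within r of this landmark'."""
    return Point(x, y).buffer(r)

def annulus(x, y, r_in, r_out):
    """Points between r_in and r_out of (x, y); the inner disc is excluded."""
    return disc(x, y, r_out).difference(disc(x, y, r_in))

# Hypothetical positive constraints: the target lies inside each of these regions.
positive = [annulus(0, 0, 10, 60), annulus(70, 10, 20, 90), disc(30, 40, 50)]
# Hypothetical negative constraints: the target cannot lie inside any of these.
negative = [disc(55, 45, 8)]

estimate = positive[0]
for region in positive[1:]:
    estimate = estimate.intersection(region)           # intersection over Omega
estimate = estimate.difference(unary_union(negative))  # minus the union over Phi

# The result may be a MultiPolygon, i.e. non-convex and potentially disjoint.
print(estimate.geom_type, estimate.area)
```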
Figure 1: Octant computes an estimated location region for a target node by combining positive and negative information available through latency measurements. The resulting location estimate comprises non-convex, potentially disjoint regions, distinguished by their weights.
Yet latency measurements typically constrain real-world distance far more tightly than bounds based on the speed of light. Figure 2 plots network latency against physical distance from a primary landmark (planetlab1.cs.rochester.edu) to all other primary landmarks in our study. The figure makes clear the loose correlation between physical distance and latency, and illustrates how overly conservative the speed-of-light bounds can be. In addition, the empty region to the lower right suggests that few links are significantly congested; nodes that are physically close are typically reachable in a short amount of time. This presents an opportunity for a system willing to aggressively extract constraints, at the risk of occasionally making overly aggressive claims, to both tighten the bounds on positive constraints and to introduce negative constraints.
Figure 2: The latency-to-distance plot of peer landmarks for a representative landmark (planetlab1.cs.rochester.edu). The shaded region denotes the valid point locations as bounded by the propagation speed of light in fiber. The convex hull around the data points serves as the positive and negative constraints for the node. For a given latency, the top and bottom of the hull represent, respectively, the outer and inner radii of the constraint annulus. As distances increase, fewer representative nodes remain, rendering the convex hull overly aggressive. Vertical lines indicate the 50th and 75th percentile cutoffs, where the convex hull is cut and replaced with conservative positive and negative constraints when insufficient representative nodes remain.
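As an illustration of the extraction procedure the caption describes, the sketch below builds the upper and lower envelopes of the latency-to-distance convex hull and reads off the outer and inner radii for a query latency. The function names, the single 75th-percentile cutoff, and the speed-of-light fallback slope are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def half_hull(points, upper):
    """Monotone-chain construction of the upper or lower hull of 2-D points."""
    pts = sorted(points)
    if upper:
        pts = pts[::-1]
    hull = []
    for px, py in pts:
        while len(hull) >= 2:
            (ox, oy), (ax, ay) = hull[-2], hull[-1]
            if (ax - ox) * (py - oy) - (ay - oy) * (px - ox) <= 0:
                hull.pop()          # drop points that do not lie on this envelope
            else:
                break
        hull.append((px, py))
    return sorted(hull)             # return in order of increasing latency

FIBER_MILES_PER_MS = 199_862 * 0.621371 / 1000.0   # ~2/3 c in fiber, miles per ms

def constraint_radii(latencies_ms, distances_miles, query_latency_ms, cutoff_pct=75):
    """Outer/inner radii (miles) of the constraint annulus for a query latency."""
    pts = list(zip(latencies_ms, distances_miles))
    upper = np.array(half_hull(pts, upper=True))
    lower = np.array(half_hull(pts, upper=False))
    if query_latency_ms <= np.percentile(latencies_ms, cutoff_pct):
        r_out = np.interp(query_latency_ms, upper[:, 0], upper[:, 1])
        r_in = np.interp(query_latency_ms, lower[:, 0], lower[:, 1])
    else:
        # Beyond the cutoff: fall back to a conservative speed-of-light outer
        # bound and drop the inner (negative) constraint entirely.
        r_out = (query_latency_ms / 2.0) * FIBER_MILES_PER_MS
        r_in = 0.0
    return r_in, r_out
```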
Octant captures per-node delay overheads, such as access-link delays, as heights. Writing x′ for the height of node x, (x, y) for the geographic distance between nodes x and y, and [x, y] for the measured latency between them, the latency from landmarks a, b, and c to a target t is modeled by the system:

a′ + t′ + (a, t) = [a, t]
b′ + t′ + (b, t) = [b, t]
c′ + t′ + (c, t) = [c, t]
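The landmark heights themselves can be fit from inter-landmark measurements, since both endpoints of those paths have known positions. The paper does not spell out its solver here; the sketch below uses an ordinary least-squares fit over all landmark pairs, offered purely as one plausible approach, with inputs assumed to be an RTT matrix and a matrix of great-circle distances already converted to propagation delay.

```python
import numpy as np

def solve_heights(rtt_ms, prop_delay_ms):
    """Solve h_i + h_j ~= rtt(i, j) - prop_delay(i, j) over all landmark pairs."""
    n = rtt_ms.shape[0]
    rows, rhs = [], []
    for i in range(n):
        for j in range(i + 1, n):
            row = np.zeros(n)
            row[i] = row[j] = 1.0            # unknowns: heights h_i and h_j
            rows.append(row)
            rhs.append(rtt_ms[i, j] - prop_delay_ms[i, j])
    heights, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return np.clip(heights, 0.0, None)       # a height cannot be negative
```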
We evaluated Octant using latency data collected from 51 PlanetLab [3] nodes whose real-world geographic locations we were able to determine externally. The latency data was collected via 10 time-dispersed round-trip measurements using ICMP ping probes. To evaluate the efficacy of using secondary landmarks, we also collected the full traceroute information between every landmark pair, as well as latency data between the landmarks and intermediate routers. Following [9, 6], nodes serve both as landmarks and targets in our evaluation; of course, a node's own position information is not utilized when it is serving as a target. No two hosts in our evaluation reside in the same institution, which rules out simple yet unrealistic and unscalable solutions to geolocalization that rely on having a nearby landmark for every target. We compare Octant with GeoLim, GeoPing, and GeoTrack, the current state of the art in geolocalization.
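For concreteness, a measurement harness along the following lines could gather time-dispersed RTT samples on a Linux-style host; the target host, sample count, probe spacing, and ping invocation are illustrative assumptions rather than the actual collection scripts used in the study.

```python
# Sketch of collecting time-dispersed RTT samples with the system ping command.
import re
import subprocess
import time

def sample_rtts(host, samples=10, spacing_s=60):
    """Return a list of round-trip times in ms, one per time-dispersed probe."""
    rtts = []
    for _ in range(samples):
        out = subprocess.run(["ping", "-c", "1", "-W", "2", host],
                             capture_output=True, text=True).stdout
        match = re.search(r"time=([\d.]+) ms", out)
        if match:
            rtts.append(float(match.group(1)))
        time.sleep(spacing_s)
    return rtts

# Example: the minimum sample approximates the uncongested path latency.
# print(min(sample_rtts("planetlab1.cs.rochester.edu")))
```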
Figure 3: Comparison of the accuracy of different localization techniques. Octant achieves significantly greater accuracy than previous work, yielding point estimates for nodes that are substantially closer to the real positions of the targets.
Octant's runtime latency and memory requirements are linear in the number of landmarks. This is achieved by restricting the number of regions with distinct weights to a system-specified limit. The lowest-weight regions, which are least likely to meet the final confidence threshold, are removed in order until the limit is met. On a modern workstation and a deployment with 50 landmarks, a target can be geolocalized in a few seconds.
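The pruning policy just described can be sketched as follows; the (weight, region) pair representation is a stand-in for Octant's actual Bézier-bounded region data structure.

```python
def prune_regions(weighted_regions, limit):
    """Keep at most `limit` regions, dropping the lowest-weight ones first."""
    if len(weighted_regions) <= limit:
        return weighted_regions
    # Sort heaviest first, so the regions least likely to meet the final
    # confidence threshold are the ones discarded.
    return sorted(weighted_regions, key=lambda wr: wr[0], reverse=True)[:limit]
```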
To provide insight into Octant's accuracy, we examine its performance as we disable various optimizations. We measure the individual contribution of each optimization, namely heights, weights, exponential weights, and intermediate nodes, by turning off each one in turn and comparing the resulting accuracy with that of the complete Octant system. Figure 4 shows the resulting CDFs. The largest improvement to system accuracy is due to the use of intermediate nodes, which significantly increases the number of usable landmarks in the system.
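The accuracy comparison reduces to computing a CDF over per-target localization errors for each configuration; a minimal sketch, assuming the per-target error lists (in miles) are already available, might look like this.

```python
import numpy as np

def error_cdf(errors_miles):
    """Return (sorted errors, cumulative fraction of targets) for plotting a CDF."""
    xs = np.sort(np.asarray(errors_miles, dtype=float))
    ys = np.arange(1, len(xs) + 1) / len(xs)
    return xs, ys

# Example: compare median error with and without a given optimization.
# xs_full, _ = error_cdf(errors_full); print(np.median(xs_full))
```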