Some researchers have used Byzantine fault tolerant approaches to provide higher reliability and robustness than fail-stop assumptions allow [1,5]. While such schemes, including state machine replication in general, may seem appealing for handling failing nodes in CoDeeN, the fact that origin servers are not under our control limits their utility. Since we cannot tell whether an access to an origin server is idempotent, we cannot safely issue multiple simultaneous requests for one object, due to the possibility of side effects. Such an approach could, however, be used among CoDeeN's reverse proxies when the object is known to be cached.
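The replication constraint above can be sketched as a simple policy check; the names here (`may_replicate`, the cached flag) are illustrative assumptions, not CoDeeN's actual interface:

```python
# Hypothetical sketch: decide whether a request may safely be issued to
# multiple nodes at once. GET and HEAD are idempotent per the HTTP
# specification, but an origin server we do not control may still attach
# side effects to them, so we replicate only when the object is known to
# be cached by our own reverse proxies.
IDEMPOTENT_METHODS = {"GET", "HEAD"}

def may_replicate(method: str, cached_at_reverse_proxy: bool) -> bool:
    """Allow replication only when no origin side effects can occur."""
    if method not in IDEMPOTENT_METHODS:
        return False  # POST and friends may have side effects at the origin
    # Even nominally idempotent methods can hide side effects at an
    # uncontrolled origin; serve replicas only from our own caches.
    return cached_at_reverse_proxy
```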
In the cluster environment, systems with a front end [11] can deploy service-specific load monitoring routines in the front end that track the status of the server farm and steer requests away from failing nodes. These generally operate in a tightly-coupled environment with centralized control. There are also general cluster monitoring facilities that can watch the status of different nodes, such as the Ganglia tools [9], which have already been used on PlanetLab. We can potentially use Ganglia to collect system-level information. However, we are also interested in application-level metrics, such as HTTP/TCP connectivity, and in resources, such as DNS behavior, that Ganglia does not monitor.
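A minimal sketch of the kind of application-level checks that system-level monitors do not provide might probe TCP connectivity to a peer's proxy port and time local DNS lookups; the port number, timeout, and function names here are illustrative assumptions:

```python
# Sketch of two application-level probes: TCP reachability of a peer's
# proxy port and local resolver latency. Thresholds and the proxy port
# (3128) are assumptions for illustration.
import socket
import time

def tcp_alive(host: str, port: int = 3128, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the peer's proxy port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def dns_latency(name: str) -> float:
    """Time a local DNS lookup; a failing resolver yields infinity."""
    start = time.monotonic()
    try:
        socket.gethostbyname(name)
    except OSError:
        return float("inf")
    return time.monotonic() - start
```

A monitor could mark a node unhealthy when `tcp_alive` fails or `dns_latency` exceeds a threshold, complementing the load and utilization data Ganglia already gathers.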
Cooperative proxy cache schemes have been studied previously in the literature [6,19,25,27], and CoDeeN shares many of their goals. However, to the best of our knowledge, the only two deployed systems have used the Harvest-like approach with proxy cache hierarchies. The main differences between CoDeeN and these systems are in their scale, in who is allowed access, and in the type of service provided. Neither system uses open proxies. The NLANR Global Caching Hierarchy [15] operates ten proxy caches that accept requests only from other proxies, plus one end-user proxy cache that allows password-based access after registration. The JANET Web Cache Service [14] consists of 17 proxies in England, all of which are accessible only to other proxies. Joining the system requires providing one's own proxy, registering, and using an access control list to specify which sites should not be forwarded to other caches; entries on this list include electronic journals.
A new Akamai-like system, CoralCDN [8], is in the process of being deployed. It avoids bad nodes via DNS-based redirection, sometimes using an explicit UDP RPC to check a node's status.
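An explicit UDP status probe of this general flavor can be sketched as follows; the ping/pong message format and timeout are assumptions for illustration, not CoralCDN's actual protocol:

```python
# Sketch of a UDP status check: a redirector sends a small probe and
# treats any well-formed reply within the timeout as "healthy". The
# b"ping"/b"pong" wire format is an assumption, not CoralCDN's protocol.
import socket

def udp_status_ok(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if the node answers the probe before the timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        try:
            s.sendto(b"ping", (host, port))
            reply, _addr = s.recvfrom(64)
            return reply == b"pong"
        except OSError:  # timeout or ICMP port unreachable: avoid this node
            return False
```

A DNS redirector could call such a check before handing out a node's address, falling back to another replica when the probe fails.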