To restrict access to content, we employ privilege separation. It rests on the observation that when a proxy forwards a request, the request assumes the privilege level of the proxy, since it now carries the proxy's IP address. Therefore, by carefully controlling which proxies handle requests, appropriate access privileges can be maintained. The ideal solution for protecting licensed content would be to insert an 'X-Forwarded-For' header, but this requires cooperation from the content site, which would have to check that both the proxy address and the forwarded address are authorized. Although this is a simple change, some sites do not handle the header. For such sites, content protection requires CoDeeN to identify what content is licensed, and we take an approximate approach. Using Princeton's e-journal subscription list as a starting point, we extracted all host names and pruned them to coalesce similarly-named sites, merging journal1.example.com and journal2.example.com into just example.com. We do not precisely associate subscriptions with universities, since that determination would be constantly changing and error-prone.
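As a rough illustration of the coalescing step described above, the following Python sketch reduces each host name to its last two DNS labels and collects the results into a set. The function names and the two-label rule are assumptions made for this example; the text does not specify the exact pruning rule, and a fixed two-label rule would mishandle suffixes such as co.uk.

```python
def coalesce_host(host: str, labels: int = 2) -> str:
    """Reduce a host name to its last `labels` DNS labels.

    Assumption: keeping two labels is enough to merge similarly-named
    sites; the actual pruning rule is not given in the text.
    """
    parts = host.lower().rstrip(".").split(".")
    return ".".join(parts[-labels:])


def build_licensed_domains(hosts):
    """Coalesce every host from a subscription list into a set of domains."""
    return {coalesce_host(h) for h in hosts}


def is_licensed(host: str, licensed_domains) -> bool:
    """Approximate check: does the host fall under a licensed domain?"""
    return coalesce_host(host) in licensed_domains


domains = build_licensed_domains(
    ["journal1.example.com", "journal2.example.com"]
)
print(sorted(domains))
```

Because the match is made on the coalesced domain, any host under example.com is treated as licensed, which reflects the approximate nature of the approach.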
For accesses to licensed content, we currently allow only requests that preserve privilege. Clients must choose a CoDeeN forward proxy in their own local domain in order to access such content. These local clients are assumed to have the same privilege as the CoDeeN forward proxy, so this approach does not create additional exposure risks. These requests are sent directly to the content provider by the forward proxy, since using a reverse proxy would again affect the privilege level. All other client requests for licensed content currently receive error messages. Whether the local client can ultimately access the site is then a decision that the content provider makes using the CoDeeN node's IP address. Though we cannot guarantee the completeness of the subscription list, in practice this approach appears to work well. We have seen requests rejected by this filter, and we have not received any other complaints from content providers. In the future, when dealing with accesses to licensed sites, we may redirect clients from other CoDeeN sites to their local proxies, and direct all ``outside'' clients to CoDeeN proxies at sites without any subscriptions.
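The current policy for licensed content can be sketched as a three-way decision. This is a minimal sketch under stated assumptions: the `Action` names and `handle_request` are hypothetical, and the client, proxy, and content identities are assumed to have already been reduced to site domains.

```python
from enum import Enum


class Action(Enum):
    NORMAL = "normal"              # not licensed: ordinary CoDeeN handling
    FETCH_DIRECT = "fetch_direct"  # forward proxy contacts the provider itself
    REJECT = "reject"              # client receives an error message


def handle_request(client_domain, proxy_domain, content_domain, licensed_domains):
    """Decide how a forward proxy treats a request under the current policy.

    Assumption: all arguments are already-coalesced site domains, and
    `licensed_domains` is the approximate set built from a subscription list.
    """
    if content_domain not in licensed_domains:
        return Action.NORMAL
    if client_domain == proxy_domain:
        # Local client: same privilege as the proxy, so no new exposure.
        # The request bypasses reverse proxies to keep that privilege level.
        return Action.FETCH_DIRECT
    return Action.REJECT


licensed = {"example.com"}
print(handle_request("princeton.edu", "princeton.edu", "example.com", licensed))
```

Note that FETCH_DIRECT does not grant access by itself; the content provider still decides based on the CoDeeN node's IP address.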
A trickier situation occurs when restricted content is hosted in the same domain as a CoDeeN node, such as when part of a university's Web site is restricted to only those within the university. Protecting these pages from outside exposure cannot rely on the coarse-grained blacklisting approach suitable for licensed content, since entire university sites and departments would otherwise become inaccessible. To address this problem, we preserve the privilege of local clients, and de-escalate the privilege of remote clients. We determine if a request to example.edu originates locally at example.edu, and if so, the request is handled directly by the CoDeeN forward proxy. Otherwise, the request is forwarded to a CoDeeN node at another site, and thereby gets its privilege level dropped to that of the remote site through this ``bouncing'' process. To eliminate the exposure caused by forwarding a request to a site where it is local, we modify our forwarding logic: no request is forwarded to a CoDeeN proxy that has the same domain as the requested content.
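The bouncing rule and the forwarding restriction above can be sketched as follows. The function `route`, the `(name, domain)` peer tuples, and the choice of the first eligible peer are assumptions for illustration; the actual peer selection used by CoDeeN is not described in this passage.

```python
def route(client_domain, content_domain, peers):
    """Pick how to handle a request for content in `content_domain`.

    `peers` is a list of (node_name, node_domain) tuples for other CoDeeN
    nodes (a hypothetical representation). Returns "direct", the name of
    the peer to bounce through, or None if no peer is eligible.
    """
    if client_domain == content_domain:
        # Local client: keep its privilege, handle at the forward proxy.
        return "direct"
    # Remote client: bounce through another site to drop privilege, never
    # through a node that shares the domain of the requested content.
    eligible = [name for name, domain in peers if domain != content_domain]
    # Assumption: take the first eligible peer; a real system would likely
    # also weigh load, health, or locality.
    return eligible[0] if eligible else None


peers = [
    ("node1.example.edu", "example.edu"),
    ("node2.other.edu", "other.edu"),
]
print(route("elsewhere.net", "example.edu", peers))
```

Filtering on the content's domain is what prevents a bounced request from landing on a node where it would regain local privilege.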
Since our security mechanisms depend on comparing host names, we also disallow ``outside'' accesses to machines identified only by IP addresses. After implementing this approach, we found that some requests using numerical IP addresses were still being accepted. In the HTTP protocol, proxies receive requests that can contain a full URL, with host name, as the first request line. Additional header lines will also identify the host by name. We found some requests were arriving with differing information in the first line and in the Host header. We had not observed that behavior in any Web browser, so we assume such requests were custom-generated, and modified our redirector to reject such abnormal requests.
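A check along the lines described above might look like the following sketch. The function names and the `client_is_local` flag are hypothetical, and the choice to reject requests that lack a Host header is an assumption of this example rather than something stated in the text.

```python
import ipaddress
from urllib.parse import urlsplit


def is_ip_literal(host: str) -> bool:
    """True if `host` is a dotted-quad IPv4 or an IPv6 address literal."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def accept_request(request_line: str, headers: dict, client_is_local: bool) -> bool:
    """Reject requests whose first line and Host header disagree, and
    requests from outside clients that name the target by IP address."""
    parts = request_line.split()
    if len(parts) != 3:
        return False  # malformed request line
    header_value = headers.get("Host")
    if not header_value:
        return False  # assumption: a Host header is required
    try:
        url_host = urlsplit(parts[1]).hostname
        # Prefix "//" so urlsplit parses the header value as host[:port].
        header_host = urlsplit("//" + header_value).hostname
    except ValueError:
        return False
    if not url_host or url_host != header_host:
        return False  # differing information: likely custom-generated
    if is_ip_literal(url_host) and not client_is_local:
        return False  # host-name comparison is impossible for raw IPs
    return True


print(accept_request("GET http://10.0.0.1/a HTTP/1.1", {"Host": "example.edu"}, False))
```

This sketch only recognizes standard address literals; other numeric encodings of an address would need separate handling.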