Hot Data Centers vs. Cool Peers

Sergiu Nedevschi†, Sylvia Ratnasamy† and Jitu Padhye‡
†Intel Research, ‡Microsoft Research

1  Introduction

Most networked applications use either a largely centralized architecture (e.g. iTunes) or a p2p architecture (e.g. BitTorrent). The popularity of centralized Internet applications such as Search and web portals has fueled the growth of large data centers. Modeling and minimizing the power consumption of large data centers is the hot new research area (pun intended).
However, no attention has been paid to the power consumption of p2p systems, despite the fact that programs like BitTorrent account for up to 95% of Internet traffic today. There are two main reasons why the power consumption of p2p applications has not yet been studied. First, these applications do not consume a large amount of power at any single location. Second, no central entity pays for the power consumed by applications like BitTorrent.
Yet, it is undeniable that systems like BitTorrent do consume power, and quite likely, a lot of power. End hosts consume more power as they do more work, and routers consume more power as they route additional traffic [3]. Thus one may wonder: is it more energy efficient to download a song from BitTorrent than from iTunes?
To answer such questions, one needs to build models that can predict the power consumption of p2p applications. In this paper, we present an outline of a model that allows us to compare the energy consumed by a p2p system to its centralized counterpart. Note that it is not our aim to advocate one system architecture over another. Many issues such as manageability, reliability and ease of deployment must be taken into account when making high-level architectural decisions. Indeed, it may be entirely infeasible to deploy applications such as web search in a p2p manner.
Thus, this model is best viewed as an exercise driven primarily by academic curiosity. And yet, this exercise is not entirely without practical interest. Today energy costs represents a significant component of the capital and operational costs of data centers, and hence it is important to understand whether, and when, alternate system architectures might prove more cost effective. For example, perhaps in controlled environments such as enterprise networks it is reasonable to harness user desktops to take on some of the tasks traditionally assigned to server rooms and data centers? More broadly, models like this help us understand the energy impact of the different design decisions that go into building a large scale system. For example, our model shows that numerous factors such as power consumption of routers, whether the application is communication vs. computation heavy and the efficiency of the protocol design affect the overall power consumption.
To the best of our knowledge, this is the first attempt to model the energy consumption of complex networked systems in their entirety. We hope our initial exploration spurs further research on this topic.

2  Background

Before we formalize the details of the model, we discuss some of the overarching factors that impact the power consumption of p2p systems and data centers.
P2P systems: Several studies have shown that end-systems such as enterprise desktops, home PCs and laptops typically spend a significant portion of the day fully powered-up (i.e., neither turned off, nor in low-power hibernation). Moreover, the average utilization of these machines is typically very low. Similar to prior studies, in a recent measurement of 300+ enterprise hosts, we found an average "fully on" time of 14.4 hours/day per machine, with an average utilization below 5%. In this paper, we assume that a p2p application makes use of such already-on, but underutilized, end-systems. In other words, we assume that only peers that are (for whatever reason) already powered up will participate in a p2p service and that users will not leave their machines powered on for the sole purpose of participating in the p2p service.1 This assumption will play a key role in the power consumption we attribute to a p2p architecture.
Needless to say, the use of peers has certain drawbacks as well. The primary overhead is that p2p systems make heavier use of the network, by generating overhead traffic for tasks such as membership maintenance and peer discovery. Furthermore, peers are at the edges of the network, and the average path length between two peers is likely to be more than the average path length from the client to the central server.
Data Centers: Two factors reduce the energy efficiency of data centers. First, because data centers consolidate thousands of computers in a single facility, cooling becomes a gargantuan issue. Several studies report that as much as 50% [2,4] of the power consumed by today's data centers is spent on cooling. By contrast, cooling is less of an issue for more decentralized systems such as p2p systems. For example, in its heyday, Napster routinely supported up to 25 million users with no special concern for cooling simply because its "servers" were geographically dispersed. And while one may argue that enterprises and homes may also require air-conditioning systems, we note that this need arises regardless of the p2p service. I.e., for an enterprise, the presence of a large number of employees (and their machines) necessitates cooling whether or not their machines participate in p2p applications. Moreover, since we assume a p2p application only uses machines that were already fully powered on, the additional cooling, if any, required to run a p2p application is negligible.
Another factor is the high baseline energy consumption of computers. As noted in [1], the power consumed by a computer is roughly linear in its CPU utilization, with an offset. This offset represents the power required simply to have the machine powered up and ready to process work - we term this the baseline energy consumption. Several empirical studies [3,1,4] reveal that this baseline consumption typically dominates the incremental consumption due to increased utilization.
In a data center, machines exist solely to run the service(s) in question and hence the service(s) can be viewed as responsible for the entire power consumption of the servers. By contrast, if a p2p service makes use of peers that are already powered-on, then it is only responsible for the resultant increase in the peer's power consumption (since the peer was already consuming at least the baseline power draw just by virtue of being fully powered on). Hence, from the standpoint of energy consumption, the p2p option may be preferable, especially when the baseline power consumption is the dominant component of a machine's overall power consumption. This is essentially an argument for reusing machine deployments.
In the rest of the paper, we shall see that the assumptions we make about the issues mentioned above are critical in determining which architecture is more energy efficient.

3  Model components

In this paper, we focus on a simple file transfer application. Both centralized (e.g. iTunes) and p2p (e.g. BitTorrent) versions of this application are in popular use. Even for the simple file transfer application, creating a good model is a difficult exercise. First, we must decide what it means for a centralized architecture and a p2p architecture to deliver equivalent functionality. Only then can one compare their power consumption. Second, we must account for the energy consumption due to the network/routers involved. Finally, the various servers, peers and routers might all be handling more than just the particular service under evaluation and hence we must be careful in deciding what fraction of the energy consumption at (for example) a router should be attributed to the particular service we're considering.
We now describe the key components of our model. The model contains many simplifications since our goal is to roughly estimate how various parameters influence the energy consumption in the two architectures.
Consumers of energy: There are three primary consumers of energy in the scenarios we consider: servers, peers and the routers along the path. The number of routers depends on the distribution of path lengths in the Internet. For the purpose of this paper, we assume Internet paths are linearly distributed with an average path length of ds and dp hops to (data-center) servers and peers respectively.2 Finally, for a p2p system, we use n (>1) to denote the number of peers that would be required to service a request at the same level as a single server.
Components of energy: We divide the energy consumption in each of the above consumers - servers, peers and routers - into two components. The first is the energy consumed just from having the equipment be powered up and ready to process work - we refer to this as the baseline energy of the equipment. In addition, we have the energy required to actually process work; this work-induced energy is directly dependent on the equipment's level of utilization when processing the offered workload. In keeping with recent empirical findings [1], we assume that the work-induced component of energy scales linearly with equipment utilization. Thus, the energy consumed by (for example) a server over time t can be expressed as:

E = (Sbase + (Smax − Sbase) · μs) · t
(1)
where Sbase is the server's baseline power draw, Smax is its power draw when serving at maximum capacity and μs is the average server utilization. The consumption at peers and routers can be expressed similarly.
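As a concrete illustration of Eq. (1), the following sketch (ours, not part of any measurement code) computes the energy a machine draws over an interval, using the server figures we measure later in Section 5:

```python
def energy_joules(p_base_w, p_max_w, utilization, seconds):
    """Eq. (1): baseline power draw plus a work-induced component
    that scales linearly with utilization."""
    power_w = p_base_w + (p_max_w - p_base_w) * utilization
    return power_w * seconds

# Server measured in Section 5: 291 W idle, 336 W at peak, 50% utilized.
e_server_hour = energy_joules(291.0, 336.0, 0.5, seconds=3600.0)
```

At 50% utilization the server draws 313.5 W, so an hour costs roughly 1.13 MJ; note how the 291 W baseline dominates the 22.5 W work-induced share.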
Attributing energy components: We must determine what portion of the energy consumed at each of the servers, routers and peers can be attributed to the service in question. This is where a key distinction arises between peers and servers. In the case of data center servers, since they are machines dedicated only to serving client requests, we hold the service responsible for some fraction of both the server's baseline and work-induced energy consumption. The p2p architecture, however, makes use of peers that were already powered on, and would have been up anyway, regardless of whether or not they were processing service requests. We thus do not hold the service accountable for the baseline energy consumption at a peer and only hold it responsible for some fraction of the peer's work-induced energy consumption.
For routers, the question of whether or not to attribute a portion of a router's baseline consumption to our service is arguable. On the one hand, routers run regardless of our particular service but, on the other hand, routers' baseline consumptions must eventually be accounted for and a fair way to do so would be to split the charge across all the traffic they serve. We thus consider both possibilities.
Cooling overhead: Data centers require a significant amount of energy for cooling the systems. On the other hand, cooling is typically not required for p2p systems. We capture this overhead with a multiplicative factor c ( >= 1) that we apply only to server consumption. I.e., if we were to attribute an amount Es of a server's energy consumption to our service, then we say that cEs is the energy consumption to be attributed when taking cooling costs into account.
P2P communication Overhead: Most p2p systems incur some additional computation and communication overhead due to various factors: the redundancy typically required to compensate for poor quality peers, protocol overheads due to communicating with multiple peers, tracking peer membership and so forth. This overhead results in an increase in the workload seen by both peers and routers and again we model these with multiplicative factors wp and wr ( >= 1) that capture the overall overhead at peers and routers respectively.

4  Model

Our approach is to compute the energy consumption for a single service request under both the data-center and the p2p architectures. Let Es, Ep and Er denote the energy consumption due to a single request at each of a server, peer and router respectively. Then, we can express Edc, the energy consumption for a single request in the data-center scenario as:

Edc = c Es + ds Er
(2)
Similarly, the energy consumption for a p2p-based system, Ep2p, can be expressed as:

Ep2p = n wp Ep + n wr dp Er
(3)
where n is the number of peers, ds and dp the path length to servers and peers and c, wp and wr the cooling and p2p overheads respectively.
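Equations (2) and (3) translate directly into code; a minimal sketch (function and parameter names are ours):

```python
def e_dc(c, e_server, d_s, e_router):
    """Eq. (2): per-request energy in the data-center scenario; the
    cooling overhead c is charged only to the server component."""
    return c * e_server + d_s * e_router

def e_p2p(n, w_p, e_peer, w_r, d_p, e_router):
    """Eq. (3): per-request energy across n peers, inflated by the
    p2p overhead factors w_p (at peers) and w_r (at routers)."""
    return n * w_p * e_peer + n * w_r * d_p * e_router
```

The structural asymmetry is already visible here: the data-center side pays the cooling factor c, while the p2p side pays the overhead factors and the longer path d_p at every one of its n peers.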
Now, to estimate the per-request energy consumptions Es, Ep and Er we must decide what fraction of the baseline and work-induced energy at each of the servers, routers and peers can be attributed to our service request. Computing this is non-trivial because all three entities handle more than just this particular service - servers handle multiple client requests simultaneously, routers handle other network traffic and peers might be running additional user tasks unrelated to our service.
We proceed as follows: we assume that server and peer workloads are linearly proportional to the number of requests. The assumption is reasonable for moderate utilizations, especially for simple tasks like file transfer. Thus, if a request involves the transfer of an average of B bits, then we can view server (and peer) workloads as linear in the number of bits transferred over the network. We therefore introduce the notion of the energy consumed per bit transferred by each of servers, peers and routers and then measure the per-request consumption by multiplying the per-bit consumption with the number of bits transferred per request. Define:

δs, δp, and δr: these denote the work-induced energy consumed per additional bit transferred by a server, peer and router respectively.
γs, γp, and γr: these denote the baseline energy consumed per processed bit. For completeness, we refer to γp even though, as argued earlier, we set γp = 0.

We can now expand the energy equations (2) and (3) as:

Edc = c Es + ds Er = c (δs + γs) B + ds (δr + γr) B (4)

Ep2p = n wp Ep + n wr dp Er
= n wp (δp + γp) B/n + n wr dp (δr + γr) B/n
= wp (δp + γp) B + wr dp (δr + γr) B
(5)

where B is the total number of bits transferred per request served at a server. Note that the above assumes that the bits transferred per peer is inversely proportional to n, the number of peers. This is reasonable since we're using bits transferred as indicative of workload. While peers may see additional overheads, recall that these are reflected in the values of wp and wr.
We now describe how to express these per-bit work-induced and baseline consumptions in terms of measurable quantities. Define:

Ms, Mp, Mr: these denote the maximum capacity measured in bits-per-second for servers, peers and routers respectively. I.e., this is the network transfer rate corresponding to the point at which the machine is processing the task to its fullest capability.

Sbase and Smax : these denote a server's baseline power consumption and its consumption when operating at maximum capacity Ms. Sbase is independent of workload but Smax will again depend on workload. For example, for a computation-intensive task, we'd expect higher levels of CPU utilization and since the CPU is a major contributor to system consumption, we'd expect a higher gap between Sbase and Smax . Communication-intensive tasks tend to saturate the I/O or memory capacity first and hence see much lower CPU utilization. So we'd expect a lower gap between Sbase and Smax for such workloads.

Pbase and Pmax , Rbase and Rmax : Similar to the above, these denote the baseline and max-capacity consumption for peers and routers respectively.

μs and μr: these denote the average utilization at a server and router respectively.

Given the above, the total work-induced energy consumed per second by a server operating at maximum capacity Ms can be expressed as: Smax - Sbase, and therefore the per-bit energy is:

δs = (Smax - Sbase)/Ms (6)
Similarly, the work-induced energy per bit in peers and routers can be expressed as:

δp = (Pmax - Pbase)/Mp (7)
δr = (Rmax- Rbase)/Mr (8)

Recall that we do attribute a fraction of the server's baseline consumption to each service request. To compute this fraction, we consider the average bits/second handled by the server as μs ·Ms bps and hence compute the per-bit baseline energy consumption at the server as:

γs = Sbase/(μs Ms) (9)

Since our service is not held responsible for the baseline consumption at peers, we set:
γp = 0 (10)
Similarly, depending on whether or not a router's baseline consumption is to be amortized across our service requests, we can compute the per-bit baseline consumption at routers as either:
γr = 0,       or       γr = Rbase/(μr Mr) (11)
We complete the model by substituting equations (6) - (11) into equations (4,5).
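Collecting the substitution in code form, Eqs. (6)-(11) plugged into Eqs. (4) and (5) can be sketched as follows (names are ours; in the sanity check, the 770 W peak is simply the reported 750 W idle draw plus the at-most-20 W routing increase from Section 5):

```python
def per_bit_work(p_max_w, p_base_w, capacity_bps):
    """Eqs. (6)-(8): work-induced energy per bit, (Pmax - Pbase) / M."""
    return (p_max_w - p_base_w) / capacity_bps

def per_bit_baseline(p_base_w, utilization, capacity_bps):
    """Eqs. (9) and (11): baseline power amortized over the average
    bits/second actually handled, Pbase / (u * M)."""
    return p_base_w / (utilization * capacity_bps)

def e_dc_per_bit(c, delta_s, gamma_s, d_s, delta_r, gamma_r):
    """Eq. (4), per bit transferred."""
    return c * (delta_s + gamma_s) + d_s * (delta_r + gamma_r)

def e_p2p_per_bit(w_p, delta_p, w_r, d_p, delta_r, gamma_r):
    """Eq. (5), per bit transferred; gamma_p = 0 per Eq. (10)."""
    return w_p * delta_p + w_r * d_p * (delta_r + gamma_r)

# Sanity check against the Cisco GSR figures used in Section 5:
delta_r = per_bit_work(770.0, 750.0, 2.5e9)    # ≈ 8e-9 J/bit
gamma_r = per_bit_baseline(750.0, 0.5, 10e9)   # ≈ 150e-9 J/bit
```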

5  Comparison of two architectures

We now derive values of various model parameters from empirical measurements, and use them to compare the power consumption of p2p and centralized (data center) architectures.
Router parameters - δr and γr: We derive our router parameters from a recent measurement study that reports on the power consumption of a Cisco GSR router [3]. The study reports an idle power draw Rbase = 750W. In a typical configuration, the router uses 4 line cards with speeds of 2.5Gbps/card, for a total capacity of 10Gbps. If we conservatively assume an average router utilization of 50%, this gives us a per-bit baseline consumption of γr = 750W/5Gbps = 150 · 10^-9 J/bit. When routing at 2.5Gbps, the study reports an increase of at most 20W, for a per-bit work-induced energy of δr = 8 · 10^-9 J/bit.
Server and peer parameters - δs, δp, and γs: To measure the per-bit energy for peers and servers, we pick two machines with typical configurations for their class. As representative of peers, we use a single-core Intel Xeon 3.0GHz desktop running Linux, equipped with 1GB RAM and a gigabit Ethernet Intel PRO/1000 NIC. To represent servers, we use an 8-core machine featuring an Intel S5000PSL server board, two Intel Xeon X5355 processors with four cores each, 16GB RAM, and Intel PRO/1000 NICs providing 8 Gbps of aggregate network capacity.
We measure an idle power draw of 291W for the server, and 140W for the desktop; these are the baseline power consumptions Sbase and Pbase respectively.
To measure work-induced power consumption at peers and servers, we repeatedly downloaded files from the two machines using a number of clients running the httperf [5] benchmarking tool. We gradually increased the client request rate and recorded the maximum web server capacity (in terms of processed requests).
We measured the average power draw, network throughput and CPU utilization at maximum capacity. These measurements allow us to calculate δs and δp using Equations (6) and (7). To calculate the baseline power consumption, we use Equation (9) and assume that a server is on average utilized to 50% of its maximum capacity. Note that γp is 0, as per Equation (10). The results are summarized in Table 1.
Note that the per-bit server baseline energy γs dwarfs the work-induced energy consumption. This is because the workload is communication intensive, and the CPU utilization at maximum load is relatively low. For more CPU intensive applications, the difference in the numbers is smaller.
Other parameters - c, wp, wr, ds, dp: To a first approximation, a well-managed data center has a cooling overhead c of about 2x, which implies that for every watt of server power, an additional watt is consumed by the chillers, UPSs, air handlers, pumps, etc. Indications are that for some data centers this value is as much as 3x and higher [2]. We note that this factor does not include the cost of provisioning for this cooling. We use a value of c = 2 in our calculations.
For the communication and workload overheads, we pick conservative estimates of wr = wp = 2 [7]. Finally, we estimate network path lengths using a recent measurement study [6] that reports the average Internet path length to/from a CDN (representative of a data center) to be around 13 hops, and the average path length between peers to be around 15 hops. We thus set ds = 13 and dp = 15.
We now compute the per-request energy consumption for both the p2p and data-center scenarios. In each case, we consider the consumption with and without "charging" our service for the router baseline consumption.
Not charging for router baseline consumption: γr=0:
Using the values from the previous section, we obtain:

Edc = c (δs + γs) B + ds δr B
= (2 · (5 · 10^-9 + 673 · 10^-9) + 13 · 8 · 10^-9) B
= 1.460 · 10^-6 B  (J)
(12)
and:

Ep2p = wp δp B + wr dp δr B
= (2 · 16.2 · 10^-9 + 2 · 15 · 8 · 10^-9) B
= 0.272 · 10^-6 B  (J)
(13)
We see that, even though the energy spent within the network path is larger for the p2p scenario, the baseline energy consumption of servers proves to be the dominant factor leading the data center scenario to a higher overall consumption.

Charging for router baseline consumption:
In this case, we find:

Edc = 3.410 · 10^-6 B  (J)

Ep2p = 4.772 · 10^-6 B  (J)

In this case, we see that the balance tips, with the data-center scenario proving more efficient than the p2p one. This is due to the per-bit router baseline energy, which is both large and incurred at all routers along the path, making network consumption the dominant factor in the overall consumption in this case of communication-intensive workloads. Moreover, the higher p2p network consumption is exacerbated by: (a) the longer paths, since dp > ds, and (b) the p2p overhead factor wr.
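Both cases can be reproduced from the stated parameters; a sketch (per-bit figures as in Table 1 and the router measurements above; the totals follow by direct arithmetic, so any small residual difference from the quoted figures is rounding):

```python
# Per-bit energies (J/bit) and overheads from Section 5
delta_s, gamma_s = 5e-9, 673e-9    # server: work-induced, baseline
delta_p = 16.2e-9                  # peer: work-induced (gamma_p = 0)
delta_r, gamma_r = 8e-9, 150e-9    # router: work-induced, baseline
c, w_p, w_r = 2.0, 2.0, 2.0        # cooling and p2p overheads
d_s, d_p = 13, 15                  # average path lengths (hops)

def per_bit_totals(gamma_r_used):
    """Eqs. (4) and (5) per bit, with or without the router baseline."""
    e_dc = c * (delta_s + gamma_s) + d_s * (delta_r + gamma_r_used)
    e_p2p = w_p * delta_p + w_r * d_p * (delta_r + gamma_r_used)
    return e_dc, e_p2p

no_charge = per_bit_totals(0.0)        # p2p wins: baseline server energy dominates
with_charge = per_bit_totals(gamma_r)  # data center wins: router baseline dominates
```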

Load        Avg pwr (W)   CPU util (%)   BW (Mbps)   δ (J/bit · 10^-9)   γ (J/bit · 10^-9)
Peer, idle      140             1             0            -                    -
Srv,  idle      291             0             0            -                    -
Peer, max       153            30            80           16                    0
Srv,  max       336            50           864            5                  673

Table 1: Measured power, CPU utilization and network throughput for a peer and a server machine.

6  Extrapolations

The previous section compared the energy efficiency of data-center and p2p systems for select data points. For a more general comparison, we consider the asymptotic behavior of the ratio Edc/Ep2p. For simplicity, we consider energy consumption in the network separately from that at servers and peers and look at the above ratio for each case individually. We omit detailed derivations and report only our final results.
a) Network energy: When considering only the in-network component of energy consumption, we find:
Edc / Ep2p = (1 / wr) · (ds / dp)
(14)
This is usually ≤ 1, since wr ≥ 1 and, for random peer selection, ds < dp. Thus, as expected, p2p usually fares worse, to an extent determined primarily by the efficiency of the p2p protocol.
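With the parameter values of Section 5, the network-only ratio above evaluates as follows:

```python
# Network-only ratio Edc/Ep2p, with the Section 5 values
w_r, d_s, d_p = 2.0, 13.0, 15.0
ratio_network = (1.0 / w_r) * (d_s / d_p)
# ≈ 0.43: per request, the data center spends well under half the
# in-network energy of the p2p system
```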
b) End systems: For the non-network energy component, we find:

Edc / Ep2p = (c / wp) · ( 1 + 1 / (μs (r − 1)) )
(15)
where μs is the server utilization, and

r = Smax / Sbase  ( = Pmax / Pbase )
(16)
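Evaluating the end-system ratio above with our measured server figures (Smax = 336 W, Sbase = 291 W, μs = 0.5; function and parameter names are ours) makes the gap concrete:

```python
def end_system_ratio(c, w_p, mu_s, r):
    """Edc/Ep2p for the non-network energy component."""
    return (c / w_p) * (1.0 + 1.0 / (mu_s * (r - 1.0)))

r = 336.0 / 291.0                  # ≈ 1.15 for our measured server
ratio = end_system_ratio(c=2.0, w_p=2.0, mu_s=0.5, r=r)
# ≈ 13.9: the data center consumes an order of magnitude more
# end-system energy per request than the p2p alternative
```

The ratio is large precisely because r is close to 1 for a communication-intensive workload: almost all of the server's power is baseline power, which the p2p scenario gets for free.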
With regard to r: since Smax and Pmax depend on the nature of the workload, we capture this influence by considering two extreme values of r (in today's machines). In summary, from the point of view of end systems, p2p is likely to always win. This is to be expected, since the baseline energy consumption at peers comes for free.
The above results lead us to speculate on the potential impact of different strategies for greater energy efficiency:
Improved data-center efficiency: To what extent can more energy-efficient data-center design narrow the gap? At best, data centers might eliminate cooling altogether (c = 1), and consolidate workload to run each server at full utilization (μs = 1). The latter may be accomplished by using virtual machines (VMs) to host services, and allocating VMs to servers in a way that maximizes utilization. Amazon's EC2 is an example of such an architecture. Even with these utopian improvements, the ratio in eqn. (15) equals 2/wp (for r = 2), suggesting that, even for energy-optimal data centers, p2p systems will be better if the wp overhead is smaller than 2x. The culprit here is, of course, the high baseline consumption at servers, which leads to our next question.
Energy-proportional machines: What would be the impact of lowering the baseline consumption at servers and having computers consume energy in proportion to their utilization [1]? In this case, our parameter r = Smax/Sbase tends to infinity, and the ratio contrasting end-system energy consumption (eqn. (15)) becomes Edc/Ep2p = c/wp. At this point, the comparison between data centers and p2p systems depends only on the relative penalties due to data-center cooling (c) and p2p overheads (wp).
More efficient peers: We've seen that p2p systems are often less energy-efficient within the network. Can this be remedied? One approach would be to use smarter peer selection. Studies show that a sizable fraction of peers are closer to any given client than is the data center [6]. Intelligently selecting such close-by peers would lead to ds/dp ≥ 1. With this, a p2p system can outperform data centers even for in-network consumption, provided its communication overhead wr < (ds/dp).
Weighing network energy and end system energy: We saw that while p2p systems are more efficient in end-system consumption, data centers fare better on network consumption. Ultimately, the decision on which is preferable depends on the relative magnitude of these two components, which in turn depends greatly on the nature of the task; computation-intensive tasks spend most of their energy in end systems, while network energy is more relevant for communication-intensive tasks.
The above discussion illustrates the importance of overall system architecture in determining the energy efficiency of network services, as well as the value of system-wide models that capture the energy consumption of networked systems in their entirety.

References

[1]
L. Barroso and U. Hölzle. The Case for Energy-Proportional Computing. IEEE Computer, 2007.
[2]
C. Belady. In the Data Center, Power and Cooling Costs More Than The IT Equipment it Supports. ElectronicsCooling, Vol. 13, No. 1, 2007.
[3]
J. Chabarek, J. Sommers, P. Barford, C. Estan, D. Tsiang, and S. Wright. Power Awareness in Network Design and Routing. In INFOCOM, 2008.
[4]
X. Fan, W. Weber, and L. Barroso. Power Provisioning for a Warehouse-Sized Computer. In Proceedings of the 34th Annual International Symposium on Computer Architecture (ISCA), pages 13-23, 2007.
[5]
httperf. http://www.hpl.hp.com/research/linux/httperf/.
[6]
S. Jaiswal, G. Iannaccone, C. Diot, J. Kurose, and D. Towsley. Measurement and Classification of Out-of-Sequence Packets in a Tier-1 IP Backbone. In IEEE INFOCOM, 2003.
[7]
J. Li, J. Stribling, R. Morris, M. F. Kaashoek, and T. Gil. A Performance vs. Cost Framework for Evaluating DHT Design Tradeoffs under Churn. In IEEE INFOCOM, 2005.

Footnotes:

1Strategies that incentivize users to cut down on the up-time of their machines will admittedly diminish the pool of potential peers in which case we might have to revisit this assumption.
2 We can incorporate more realistic models to improve accuracy.