This section demonstrates how brokers arbitrate resources under changing workload and coordinate resource allocation across multiple sites. The experiment runs under emulation (as described in Section 4.2) with null resource drivers, virtual time, and lease state stored only in memory (no LDAP). In all other respects the emulation is identical to a real deployment. We use two emulated 70-node cluster sites with a shared broker, which implements a simple policy that balances load evenly between the sites.
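The even-balancing policy can be illustrated with a minimal sketch. It assumes the broker grants nodes one at a time to whichever site currently has the smallest fraction of its capacity allocated, and fills a request only partially once every site is full. `Site` and `balance_request` are hypothetical names; this is not the prototype's broker code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical model of one cluster site's capacity."""
    name: str
    capacity: int
    allocated: int = 0

    @property
    def free(self) -> int:
        return self.capacity - self.allocated


def balance_request(sites: list[Site], units: int) -> dict[str, int]:
    """Grant up to `units` nodes, one at a time, to the least-loaded site.

    Load is the fraction of capacity already allocated; ties go to the
    first site in the list. Returns the number of nodes granted per site.
    """
    grants = {s.name: 0 for s in sites}
    for _ in range(units):
        candidates = [s for s in sites if s.free > 0]
        if not candidates:
            break  # every site is full: the request is only partially filled
        site = min(candidates, key=lambda s: s.allocated / s.capacity)
        site.allocated += 1
        grants[site.name] += 1
    return grants
```

With two 70-node sites, a request for 10 nodes is split 5/5, and a later oversized request fills both sites to capacity rather than failing outright.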
We implemented an adaptive service manager that requests resource leases
at five-minute intervals to match
a changing load signal. We derived sample input loads from traces of
two production systems: a job trace from a production
compute cluster at Duke, and a trace of CPU load from a major e-commerce
website. We scaled the load signals
to a common basis.
Figure 6 shows scaled cluster resource
demand (interpreted as the number of nodes to request) over a one-month segment
of both traces, sampled at five-minute intervals.
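A minimal sketch of how a service manager might turn a load signal into per-interval requests follows. The linear peak scaling in `scale_to_nodes` is only an assumption (the text does not say how the signals were scaled to a common basis), and both function names are hypothetical.

```python
import math

INTERVAL_MIN = 5  # the service manager re-evaluates demand every five minutes


def scale_to_nodes(signal: list[float], peak_nodes: int) -> list[int]:
    """Hypothetical scaling: map the signal linearly so its peak equals
    peak_nodes, rounding up so demand is never under-provisioned."""
    peak = max(signal)  # assumes a positive peak
    return [math.ceil(x * peak_nodes / peak) for x in signal]


def request_deltas(demand_nodes: list[int]) -> list[int]:
    """Change in nodes requested at each interval.

    Positive values grow the allocation; negative values shrink it.
    """
    held = 0
    deltas = []
    for target in demand_nodes:
        deltas.append(target - held)
        held = target
    return deltas
```

For example, a signal of `[10, 50, 100, 25]` scaled to a 70-node peak yields `[7, 35, 70, 18]`, so the manager requests `+7`, `+28`, `+35`, and then `-52` nodes over four intervals.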
We smoothed the e-commerce demand curve with a "flop-flip" filter
from [6]. This filter holds a stable estimate of demand,
$E_t = E_{t-1}$, until that estimate falls outside some tolerance of a
moving average of recent observations,
$M_t = \beta M_{t-1} + (1-\beta) O_t$, where $O_t$ is the observation at
interval $t$; it then switches the estimate to the current value of the
moving average. The smoothed demand curve shown in Figure 6 uses a
150-minute sliding-window moving average, a step threshold of one
standard deviation, and a heavily damped average (a weight $\beta$ close to 1).
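The filter can be sketched as follows. The text names both a 150-minute sliding window and a damped average; this sketch assumes the window (30 five-minute samples) supplies the standard deviation for the step threshold, while $M_t$ is the damped exponential average. The function name and the way these pieces combine are assumptions, and `beta` has no default because its value is not given here.

```python
import statistics
from collections import deque


def flop_flip(observations, beta, window=30, k=1.0):
    """Sketch of a flop-flip filter (hypothetical helper).

    beta:   damping weight in M_t = beta * M_{t-1} + (1 - beta) * O_t
    window: recent samples used for the std-dev threshold
            (assumed: 30 five-minute samples = 150 minutes)
    k:      step threshold, in standard deviations
    """
    recent = deque(maxlen=window)
    estimates = []
    m = e = None
    for o in observations:
        recent.append(o)
        m = o if m is None else beta * m + (1 - beta) * o
        if e is None:
            e = m  # first sample: start the estimate at the average
        else:
            sigma = statistics.pstdev(list(recent)) if len(recent) > 1 else 0.0
            # Hold E_t = E_{t-1} unless it drifts too far from M_t.
            # (If the window is flat, sigma is 0 and any change flips.)
            if abs(e - m) > k * sigma:
                e = m  # flip: jump to the current moving average
        estimates.append(e)
    return estimates
```

On a step from 10 to 50, the estimate flips once to an intermediate value, holds there while the window still contains the step, and then tracks the average up to 50, which yields the staircase shape of a smoothed curve.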
Figure 7 demonstrates the effect of varying lease terms on the broker's ability to match the e-commerce load curve. For a lease term of one day, the leased resources closely match the load; longer terms diminish the broker's ability to match demand. To quantify the effectiveness and efficiency of allocation over the one-month period, we compute the root mean squared error (RMSE) between the load signal and the requested resources. Lower values are better: an RMSE of zero indicates that allocation exactly matches demand. For a lease term of one day, the RMSE is 22.17; for a lease term of seven days, it is 50.85. Figure 7 reflects a limitation of the pure brokered leasing model as prototyped: a lease holder can return unused resources to the authority, but it cannot return the ticket to the broker to allocate for other purposes.
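The effect of lease terms on the RMSE can be reproduced in a simplified model. The sketch assumes the holder tops up any shortfall with a new lease and keeps every lease for its full term, mirroring the limitation that tickets cannot be returned to the broker. At five-minute intervals, a one-day term spans 24 × 60 / 5 = 288 intervals. `simulate_leases` and `rmse` are hypothetical names, and this is not the prototype's code.

```python
import math


def simulate_leases(demand, term):
    """Nodes held at each interval when every lease lasts `term` intervals
    and cannot be returned early (simplified model, not the prototype)."""
    active = []  # (expiry_interval, units) for each outstanding lease
    held_series = []
    for t, d in enumerate(demand):
        # A lease acquired at interval s covers intervals s .. s + term - 1.
        active = [(exp, u) for exp, u in active if exp > t]
        held = sum(u for _, u in active)
        if d > held:
            active.append((t + term, d - held))  # lease only the shortfall
            held = d
        held_series.append(held)
    return held_series


def rmse(demand, allocated):
    """Root mean squared error between demand and allocation."""
    return math.sqrt(
        sum((d - a) ** 2 for d, a in zip(demand, allocated)) / len(demand)
    )
```

With `demand = [10, 30, 10, 10, 10, 10]`, a one-interval term tracks demand exactly (RMSE 0), while a three-interval term holds `[10, 30, 30, 20, 10, 10]` after the spike, giving a nonzero RMSE. Longer terms leave surplus leases in place after demand falls.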
To illustrate adaptive provisioning between competing workloads,
we introduce a second service
manager competing for resources according to the batch load signal.
The broker uses FCFS priority scheduling to arbitrate
resource requests, and the interactive e-commerce service receives the higher
priority. Figure 8 and Figure 9
show the assigned slice sizes
for lease terms of (a) 12 emulated hours and (b) 3
emulated days, respectively. As expected, the
batch cluster receives fewer nodes during load surges in the e-commerce service.
However, with longer lease terms, load matching becomes less
accurate, and some short demand spikes are not served. In some
instances, resources assigned to one guest are idle while the other
guest saturates but cannot obtain more.
This is reflected in the RMSE values for these runs: the website has an RMSE of (a) 12.57 and (b)
30.70, and the batch cluster has an RMSE of (a) 23.20 and (b) 22.17.
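The broker's arbitration rule can be sketched as a priority queue that is FCFS within each priority level. It assumes higher `priority` values are served first and that a request may be partially granted when capacity runs out. `Request` and `arbitrate` are hypothetical names, and this reading of "FCFS priority scheduling" is an interpretation rather than the prototype's code.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical lease request from one guest's service manager."""
    guest: str
    units: int
    priority: int  # higher value is served first (assumed convention)
    arrival: int   # arrival order, breaks ties first-come-first-served


def arbitrate(requests, free_units):
    """Grant requests in priority order, FCFS within a priority level.

    A request is partially granted if capacity runs out.
    """
    grants = {}
    for r in sorted(requests, key=lambda r: (-r.priority, r.arrival)):
        granted = min(r.units, free_units)
        grants[r.guest] = grants.get(r.guest, 0) + granted
        free_units -= granted
    return grants
```

During an e-commerce surge, the higher-priority request is filled first, and the batch guest receives only the remainder. This matches the qualitative behavior in Figures 8 and 9.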
Choosing the length of lease terms involves a trade-off: longer
terms are more stable and better amortize resource
setup/teardown costs, which improves fidelity (see
Section 5.1), but they adapt less quickly to changing
demand than shorter leases.