Taoyu Li
Minghua Chen
Dah-Ming Chiu
Maoke Chen
Tsinghua University, Beijing, China. {taoyu,cmk}@ns.6test.edu.cn
The Chinese University of Hong Kong, Shatin, Hong Kong. {tyli,minghua,dmchiu}@ie.cuhk.edu.hk
Classical queueing theory models systems where jobs arrive randomly at static service stations of given service capacities, and provides analysis of the system's properties, such as waiting time and stability. Queueing theory has wide application in many scenarios of operations research. In particular, its application in studying computer networks and operating systems led to a generalization of queueing theory to model a network of queues and many different service policies [1,2].
Recently, the modeling of peer-to-peer (P2P) systems is pointing to a new kind of queueing system not studied before. In this new model, jobs still arrive randomly, but the service stations also arrive randomly, possibly correlated to the arrival of jobs. Like classical queueing theory, this new kind of model can also help answer some fundamental questions in the design of (P2P) systems. For example, what are the necessary or sufficient conditions to guarantee the stability of a P2P system? And what would be the performance for a given workload and service parameters?
One of the first dynamic P2P models was introduced by Qiu and Srikant [3] to model BitTorrent, a P2P file sharing system. The model is simple, but very inspiring. Although they did not mention queueing theory, they implicitly modeled randomly arriving service stations (which are peers themselves) providing an effective service (file sharing) rate. Subsequently, Fan, Chiu and Lui [4] modeled and studied the tradeoff between different service rate allocations in a dynamic P2P system similar to the Qiu-Srikant model. Clevenot, Nain and Ross [5] generalized the Qiu-Srikant model fluid model to describe more realistic cases. The authors in [6] and [7] also used randomly arriving peers with certain service rate to model P2P streaming systems. There are many other examples of dynamic P2P system models, yet there has not been a unifying analysis in terms of an generalized queueing theory.
In this paper, we first develop a taxonomy and a family of notations similar to that in [1] for different variations of queueing models with server dynamics (Section II). For several basic classes of these systems, we derive the stability conditions and compare them to those results in the classical queueing theory (Section III). Next, to demonstrate the application of a P2P queueing model to real systems, we study a P2P storage system known as Wuala [8] (Section IV). We present some numerical simulation results in Section V, and conclude the work in Section VI.
Notation | Definition |
---|---|
A | job arrival process. |
B | job service time distribution. |
s | number of servers (used in traditional queuing model). |
C | server arrival process. |
E | server life time distribution. |
POLICY | the service policy assumed. |
average interarrival time between two job arrivals. | |
average job service time. | |
job load demand of the system. | |
average interarrival time between two server arrivals. | |
average server life time. | |
service capacity of the system. | |
the number of jobs in the system at time . | |
the number of servers in the system at time . |
In classical queuing theory, Kendall's notation [1], i.e., ABsPOLICY , is widely used to represent queuing model for static service systems where servers are static.
To represent new queuing models for P2P service systems in which both job and server dynamically arrive and depart, we use the following notation
To represent systems with different job and server dynamics, some notations we use for arrival processes (A or C) and distributions (B or E) are 1) for a Memoryless process (e.g. Poisson process), or an exponential distribution. 2) for a deterministic process, or a deterministic distribution. 3) for a process with general independent arrivals, or arbitrary distribution.
Notice that for or , it may be the case that server arrival and lifetime distribution has correlation with job arrival and job service time distribution. We use notation ` ' for or in the case that server arrival and lifetime distribution is identical to the job arrival and job service time distribution. We use notation for in the case that server lifetime equals to the summation of job service time and an extra period of time following exponential or arbitrary distribution, respectively.
Similarly, notations we use for service policy (POLICY) include 1) (First-Come-First-Served), which means jobs are served in their arrival order, each by all the servers currently in the system. 2) ( -Processor-Sharing), which means all jobs in the system are served, each by no more than servers simultaneously. In real systems, this constraint is often due to downlink capacity limitation of end users. Furthermore, we assume the job-server allocation is efficient so that the number of busy servers (also the total service capacity, since each server has a unit service capacity) at time is exactly the maximum allowed value .
We discuss several classes of P2P service systems as follows.
Note that if the Markov process is stable, then not only it has a stationary distribution, but also the states , will be visited within finite amount of time. Practically, this means that all arriving jobs will be served and cleared by the P2P service system in finite time.
The job-server process
of this type of
systems is a two-dimension birth-death process with infinite states.
Its transition rate is given in Table II. Since at
any time
one job is served by
servers, the job
departure rate is
, as shown in the third row in
Table II. The server dynamics, on the other hand,
does not depend on the number of jobs in the system, and can be
studied by an
queue.
A routine way to derive the stability condition is to solve the balance equations. However, for the above process, applying this approach involves solving infinite number of balance equations, each having four unknown variables. Observing that the approach is very challenging, we thus re-interpret the job-server process as a quasi-birth-death (QBD) one and proceed with a matrix-analytical method.
It is not difficult to verify that a system is a homogeneous QBD process, and its transition matrix, denoted by , is given by:
The stability condition of a homogeneous QBD process is given in [9] as follows.
(6) |
Using Lemma 3, we derive the stability condition for systems and summarize it in the following theorem.
It is not difficult to verify is the transition matrix of an queuing process, which is irreducible and positive recurrent. Furthermore, by solving balance equations, the corresponding stationary distribution is given by
The average queue length in the stationary state is .
Since diag , we have . With diag , we can derive .
Combining above observations and applying Lemma 3, we conclude that is the stability condition for systems.
Remarks: Recall the server dynamics of this type of systems can be modeled by a queue. In stationary state, the average service capacity, i.e. the average number of servers, is . Therefore, Theorem 4 states that a P2P service system is stable if and only if the average workload does not exceed the average service capacity . This result is consistent with stability conditions of classical static service systems, e.g. for systems and for systems.
It is not difficult to verify that an system is a non-homogeneous QBD, and its transition matrix is given by
By building a series of homogeneous QBD and applying Lemma 3, we have the stability condition for systems as follows.
We now show the sufficiency of the condition. We first construct two series of QBD processes, whose transition matrices are given as: for all ,
and
respectively.
For any , the two QBD processes associated with and respectively will have the same stability since their transition rates are only different in the first finite levels. Since the QBD processes associated with are homogeneous, their stability conditions are simply according to Lemma 3, where
is a constant depending on the stationary distribution .
Notice that sequence is increasing and has a limit of . Thus if we have , there must exists a that . Therefore the QBD process associated with is stable, and so is the QBD process associated with .
Now we compare and given in (8). Observing the departure rate in is larger than or equals to that in for every state, we conclude the job-server process associated with is stable. This completes the proof of sufficiency.
Remarks: Theorem 5 states that for systems, using or policy leads to the same stability condition. Intuitively, this is because these two policies both allow full utilization of the system service capacity when the number of jobs is large (i.e. when the system is crowded), and differ only when the number of jobs is small. Since the stability is determined by the service capacity utilization when the system is crowded, it is not surprising that these two policies leads to the same stability condition.
In systems, every job arrival (departure) also brings in (takes away) a server. So it is sufficient to describe only the job dynamics to represent the entire system.
The job process is a one-dimension birth-death process with transition rate given in Table IV, which is exactly same with the job process associated with a classical system. Since systems are always stable, so are systems.
For systems, since peers may stay in the system for a while to serve others after its job has been finished, the total number of servers would always be larger than the total number of jobs.
In practical P2P file downloading systems, this result indicates that as long as every joining peer brings in some service capability with it, the system is always stable.
Let us consider Wuala [8] as an example of a P2P storage system. Wuala allows users to store and share files online. Instead of relying purely on centralized (and deployed) servers, Wuala relies on peers to contribute their disk space to help provide the service. These peers may be users of the system or may be pure storage sellers who only contribute to (without using) the service. For the purpose of this modeling exercise, we assume the requests for files are independent from the peers providing the storage, who are referred to as storage peers.
To store a file into the system, Wuala encrypts it, erasure codes it into fragments, and stored the coded fragments at different storage peers. To retrieve a file from the system, Wuala first locates the fragments using a distributed hash table (DHT), then downloads fragments from multiple storage peers simultaneously. After sufficient fragments are downloaded, the file can be decoded and restored. This approach not only saves the deployment (and maintenance) costs of centralized servers, but can also provide better download rate for users, since it utilizes the upload bandwidth of multiple peers instead of sharing the upload bandwidth of a centralized server.
In studying the performance of systems based on centralized servers, we may apply classical queueing theory to determine the number of servers needed to support a certain workload. For a P2P storage system such as Wuala, there is a corresponding question of what type of online behavior of the storage peers is necessary to ensure the download request rates can be satisfied, which is exactly the kind of questions that can be addressed by the P2P queueing model proposed in Section II.
Notation | Definition |
---|---|
average online time of a storage peer. | |
average offline time of a storage peer. | |
upload bandwidth of each storage peer. | |
download bandwidth of each downloading peer. | |
average file length in the system. | |
file download request arrival rate. |
In modeling Wuala as a queueing system, we assume the availability of the file is not an issue, as it can be taken care of by sufficient redundancy. The key performance problem would be whether there is sufficient bandwidth to support the download requests.
The key parameters used in modeling a P2P storage system are listed in Table V.
If we assume server online/offline time and file length to follow
exponential distributions, and the arrival of file download requests
to follow a Poisson process, then such a P2P storage system can be
modeled by an
model, where
(11) | |||
and | (12) | ||
(13) |
Applying Theorem 5, we know that stability does not depend on , and the stability condition reduces to
The left hand side represents the total supply (of uplink bandwidth) whereas the right hand side represents the total demand of downloading requests. The insight from this result is the relationship between storage peers' online/offline time ratio and the system's capacity. The online behavior of storage peers can be adjusted by the policy for rewarding them, thus the system capacity could be controlled.
In the next section we show some simulation results which may give more insights on building P2P storage systems.
In this section, we run numerical experiments to verify the stability conditions we derive in Section III for different types of systems, and explore other system performance metrics. The value of time and parameter and are normalized so we omit the unit of them when showing the results.
We also evaluate the stability of a system with general service time distribution, i.e. an system. We use a service time distribution which is converted from a file size distribution measured from practical P2P file sharing system[10].
We fix , and , adjust so that , , , , and respectively, and the result is shown in Fig.1. From the result, it can be inferred that when , the total time a job spends in the system is bounded. However, if , the time a job spends in the system has a trend of increasing unboundedly by time. This verifies that the stability condition holds for and systems, as well as for systems.
It would be interesting to compare the performance of P2P service systems with same average service capacity (i.e. same ) but different server dynamics (i.e. different and ).
In this experiment, we fix , and adjust the value of and proportionally to form several systems with different server dynamics. All systems have the same and ; hence, they have the same average workload.
We simulate the systems with job arrivals, and record the cumulative percentage of time a job spends in the system. The initial value of and is 0 and , respectively. The results of equal to , , and are shown in Fig.2. The average and standard deviation of the time a job spends in different systems are summarized in Table VI.
These results indicate that with fixed average service capacity, a job would spend less time in systems with high server dynamics than in systems with low server dynamics, averagely. To explain this, we first define the system is in ``surplus stage'' if the service capacity is no less than workload , and is in ``deficiency stage'' otherwise. Compared to high server dynamics, low server dynamics lead to longer stay in a stage each time the system enters it but less frequent switches between stages.
In a system with low server dynamics, the queue builds up when the system is in ``deficiency stage'' before it gets clear after the system switches to ``surplus stage''. Thus the queued-up jobs spend longer time in the system. On the contrary, in a system with high server dynamics, the switching between stages is more frequent. Consequently, less jobs get queued up in the ``deficiency stage'' before the system switches stage, resulting in less time in the system in average.
Consider the P2P storage system we modeled in Section IV. Remembering that we have and , the simulation result indicates that, a system in which servers get online/offline more frequently would have a better performance, by shorter average file downloading time.
For models where job and server dynamics are identical, we show that the system is always stable. This confirms the observations in practical P2P streaming systems.
For models where job and server arrive independently, we show that the system is stable if its average service capacity is larger than the average workload. This stability condition is similar to that of static service systems. The limitation of that a single job can get service from only limited number of servers has no effect on this stability condition.
As shown in our numerical experiments, the service dynamics in the P2P service systems helps to reduce the time a job spends in the system. We plan to further characterize this effect in future work.
Since this paper is an exordial study of applying queueing model to P2P systems, we believe that there is lots of possible further work in the area. Future work directions includes study of systems with general service time distribution, system with different classes of service policy, system with different type of job/server correlation, and system with heterogeneous servers. It would also be interesting to derive more analytical results than just stability. An example would be average queue length. Applying Little's Law[1], the study on average queue length would obtain the result on average service time, which may give us more insight on the performance of P2P systems.
This document was generated using the LaTeX2HTML translator Version 2008 (1.71)
Copyright © 1993, 1994, 1995, 1996,
Nikos Drakos,
Computer Based Learning Unit, University of Leeds.
Copyright © 1997, 1998, 1999,
Ross Moore,
Mathematics Department, Macquarie University, Sydney.
The command line arguments were:
latex2html -split 0 -show_section_numbers -local_icons -no_navigation p2pqueuing.tex
The translation was initiated by 3P1C on 2009-03-25