Next: User Behavior Analysis Up: Notification Log Analysis Previous: Notification Message Size and

   
Message Popularity Analysis and Its Implications


Several studies have found that web accesses follow a Zipf-like distribution: the number of requests to the $i$th most popular object is proportional to $\frac{1}{i^{\alpha}}$ [3,4,6,7,10,14,16]. The estimates of $\alpha$ range from 0.5 to 1 for web proxy logs [7,10,14], and from 1 to 2 for web server logs [4,16]. It is interesting to examine whether notification messages exhibit a similar property. To do so, we count, for each notification document, the number of notification messages (i.e., copies) that were sent on a given day. We then plot the total number of transmissions of each document versus its popularity ranking on a log-log scale. Figure 5 shows the plot for August 21, 2000. The plots for the other days are similar, and are omitted for brevity. If we ignore the first few notification documents and the flat tail in Figure 5 (as is done in previous work [6,7,16]), the curve fits a straight line reasonably well. A straight line on a log-log scale implies that the notification documents follow a Zipf-like distribution. We compute the values of $\alpha$ using least-squares fitting, after excluding the top 20 documents and the flat tail (the latter consists of the notification documents that were sent only once or twice). For our complete data set, the value of $\alpha$ varies from 1.137 to 1.267 (in Figure 5, the value of $\alpha$ is 1.146). These values are higher than the $\alpha$ in the web proxy logs [7,10,14], and lower than (but close to) the $\alpha$ observed for popular web server logs [16].
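The fitting step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the study: the function name `fit_zipf_alpha` and its parameters `skip_top` and `min_count` are hypothetical, and the flat tail is assumed to be exactly the documents sent fewer than three times.

```python
import math


def fit_zipf_alpha(counts, skip_top=20, min_count=3):
    """Estimate alpha in count(i) ~ C / i**alpha by least squares on log-log axes.

    counts:    per-document transmission counts for one day, in any order.
    skip_top:  number of top-ranked documents to exclude (20 in the text).
    min_count: documents sent fewer times than this are treated as the flat
               tail and dropped (assumption: "once or twice" means < 3).
    """
    ranked = sorted(counts, reverse=True)
    # Keep (log rank, log count) only for the documents used in the fit.
    points = [
        (math.log(rank), math.log(count))
        for rank, count in enumerate(ranked, start=1)
        if rank > skip_top and count >= min_count
    ]
    if len(points) < 2:
        raise ValueError("not enough points left to fit a line")

    # Ordinary least-squares slope of log(count) against log(rank).
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx

    # A Zipf-like curve has slope -alpha on log-log axes.
    return -slope


# Synthetic input (not data from the logs): an exact power law generated
# with alpha = 1.146, followed by a flat tail of rarely sent documents.
synthetic = [1e6 / rank ** 1.146 for rank in range(1, 5001)] + [1] * 300
print(round(fit_zipf_alpha(synthetic), 3))
```

Because the top-ranked documents and the flat tail are dropped before fitting, the estimate reflects only the middle of the curve, which is the part that is approximately linear on the log-log plot.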
  
Figure 5: Frequency of notification documents versus ranking in log-log scale (for August 21, 2000).

Figure 6 shows the cumulative distribution of notification messages over notification documents on August 21, 2000. The top 1% of notification documents (i.e., 1704 documents) account for 54.24% of the total notification messages. In the logs for the other days, the top 1% of notification documents account for 54.15% to 63.66% of the total messages. Such a high concentration of messages on popular documents suggests that using application-level multicast [8,11,17,22] for popular documents would yield significant savings in both bandwidth and server load.
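The concentration figure quoted above is a simple ratio over the ranked per-document counts. The sketch below shows one way to compute it; the function name `top_share` is hypothetical, and rounding the size of the "top 1%" set to the nearest whole document is an assumption, since the text does not say how the cutoff was rounded.

```python
def top_share(counts, top_fraction=0.01):
    """Fraction of all messages carried by the most popular documents.

    counts:       per-document message counts for one day, in any order.
    top_fraction: share of documents to treat as "popular" (0.01 = top 1%).
    """
    if not counts:
        raise ValueError("counts is empty")
    ranked = sorted(counts, reverse=True)
    # Assumption: round to the nearest whole document, but keep at least one.
    k = max(1, round(top_fraction * len(ranked)))
    return sum(ranked[:k]) / sum(ranked)


# Toy input (not data from the logs): one very popular document and
# 99 documents that were each sent once.
toy_counts = [901] + [1] * 99
print(top_share(toy_counts, 0.01))
```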
  
Figure 6: Cumulative distribution of notification messages to documents (for Aug 21, 2000).

A possible optimization is to distribute a set of caches over the Internet to form an overlay multicast tree rooted at the notification server. When a notification message needs to be sent to multiple recipients simultaneously, it can be sent over the overlay tree and also stored at the caches that it traverses. These caches can help offload retransmission work (say, due to a client coming online) from the server: when the same notification needs to be sent again at a later time, the cache closest to the receiver can forward the message. Note that even though the current notification traffic is not significant, bandwidth usage will become an important factor for scaling the notification system as the popularity of notification services increases. Consequently, optimizations such as application-level multicast will become more important.

We also observed that the concentration of notification messages on documents becomes less pronounced as the number of documents considered increases. For example, the top 7.6% to 42.0% of the documents account for 80% of the total messages, and the top 45.1% to 71.0% of the documents account for 90% of the total messages. This implies that a large performance benefit can be obtained by multicasting only the most popular notification documents.
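The coverage percentages above answer the inverse question: how large a fraction of the documents, taken in popularity order, is needed to reach a given share of the messages. A minimal sketch follows; the function name `fraction_needed` is hypothetical, and it is an illustration of the calculation rather than the analysis code used for the logs.

```python
def fraction_needed(counts, target_share):
    """Smallest fraction of documents, most popular first, whose messages
    add up to at least target_share of all messages."""
    if not 0 < target_share <= 1:
        raise ValueError("target_share must be in (0, 1]")
    ranked = sorted(counts, reverse=True)
    total = sum(ranked)
    if total <= 0:
        raise ValueError("counts must contain at least one message")

    running = 0
    for k, count in enumerate(ranked, start=1):
        running += count
        if running >= target_share * total:
            return k / len(ranked)
    return 1.0


# Toy input (not data from the logs): five documents, 100 messages in total.
toy_counts = [50, 30, 10, 5, 5]
for share in (0.75, 0.85):
    print(share, fraction_needed(toy_counts, share))
```

When the returned fraction grows quickly between two nearby target shares, the remaining documents each contribute few messages, which is why multicasting only the head of the ranking captures most of the benefit.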
Lili Qiu
2002-04-17