We provide an estimate of the prevalence of web-malware based on data
collected over a period of ten months (Jan 2007 - Oct 2007). During
that period, we subjected over 60 million URLs to in-depth processing
through our verification system. Overall, we detected more than
million malicious URLs hosted on more than
thousand landing
sites. We also observed more than
thousand different
distribution sites. The findings are summarized in
Table 1. These results show the scope of the
problem, but do not necessarily reflect the exposure of end-users to
drive-by downloads. In what follows, we attempt to address this
question by estimating the overall impact of the malicious web sites.
To study the potential impact of malicious web sites on end-users,
we first examine the fraction of incoming search queries to
Google's search engine that return at least one URL labeled as malicious in the
results page. Figure 3 provides a running average of
this fraction. The graph shows an increasing trend in the search
queries that return at least one malicious result, with an average
approaching
of the overall incoming search queries. This
finding is troubling as it shows that a significant fraction of search
queries return results that may expose the end-user to exploitation
attempts.
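The measurement described above can be sketched as follows. This is only an illustration, not the pipeline used in the study: the function names, the per-day grouping of queries, and the window size are all assumptions, and the URLs are made up.

```python
from collections import deque


def malicious_query_fraction(result_pages, malicious_urls):
    """Fraction of queries whose results page has at least one malicious URL.

    result_pages: iterable of lists of URLs, one list per search query
                  (hypothetical input format).
    malicious_urls: set of URLs labeled as malicious.
    """
    total = 0
    hits = 0
    for results in result_pages:
        total += 1
        if any(url in malicious_urls for url in results):
            hits += 1
    return hits / total if total else 0.0


def running_average(values, window):
    """Trailing running average; the window size is an assumed parameter."""
    recent = deque(maxlen=window)
    averaged = []
    for value in values:
        recent.append(value)
        averaged.append(sum(recent) / len(recent))
    return averaged


# Toy data: two "days" of queries (illustrative only).
malicious = {"http://bad.example/a"}
day1 = [["http://ok.example/", "http://bad.example/a"], ["http://ok.example/"]]
day2 = [["http://ok.example/"], ["http://ok.example/b"]]
fractions = [malicious_query_fraction(d, malicious) for d in (day1, day2)]
smoothed = running_average(fractions, window=2)
```

Smoothing the per-period fractions with a trailing window is one simple way to expose a trend; the text does not specify the averaging window, so the value used here is arbitrary.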
To further understand the importance of this finding, we inspect the
prevalence of malicious sites among the links that appear most often
in Google search results. From the top one million URLs appearing in
the search engine results, about
belong to sites that have
been verified as malicious at some point during our data
collection. Upon closer inspection, we found that these sites appear
at uniformly distributed ranks within the top million web sites, with
the most popular landing page having a rank of
. These results
further highlight the significance of the web malware threat as they
show the extent of the malware problem; in essence, about
of
the top million URLs that appeared most frequently in Google's search
results led to exposure to malicious activity at some point.
An additional interesting result is the geographic locality of web-based
malware. Table 2 shows the geographic breakdown of
IP addresses of the top 5 malware distribution sites and the landing
sites. The results show that a significant number of China-based
sites contribute to the drive-by problem. Overall,
of the
malware distribution sites and
of the landing sites are
hosted in China. These findings provide more
evidence [13] of poor security practices by web site
administrators, e.g., running outdated and unpatched versions of
the web server software.
Upon closer inspection of the geographic locality of the web-malware
distribution networks as a whole (i.e., the correlation between
the location of a distribution site and the landing sites pointing to
it), we see that the malware distribution networks are highly
localized within common geographical boundaries. This locality varies
across different countries, and is most evident in China, with
of the landing sites in China pointing to malware distribution servers
hosted in that country.
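The locality measure described above can be illustrated with a minimal sketch. It assumes each landing site has already been paired with the country of the distribution site it points to (one pair per landing site); the function name, the input format, and the country codes are hypothetical and not taken from the study.

```python
from collections import defaultdict


def same_country_fraction(links):
    """Per-country fraction of landing sites whose distribution site
    is hosted in the same country.

    links: iterable of (landing_country, distribution_country) pairs,
           one pair per landing site (assumed input format).
    """
    total = defaultdict(int)
    same = defaultdict(int)
    for landing_cc, distribution_cc in links:
        total[landing_cc] += 1
        if landing_cc == distribution_cc:
            same[landing_cc] += 1
    return {cc: same[cc] / total[cc] for cc in total}


# Toy data (illustrative only).
links = [("CN", "CN"), ("CN", "CN"), ("CN", "US"), ("US", "US"), ("US", "RU")]
locality = same_country_fraction(links)
```

A value close to 1 for a country indicates that its landing sites mostly point to distribution servers hosted in that same country, which is the sense of "highly localized" used in the text.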