Our primary objective is to identify malicious web sites (i.e., URLs that trigger drive-by downloads) and thereby help improve the safety of the Internet. Before describing our data collection methodology, we define the terms used throughout this paper. We use the terms landing pages and malicious URLs interchangeably to denote URLs that initiate drive-by downloads when users visit them. In our subsequent analysis, we group these URLs according to their top-level domain names and refer to the resulting set as the landing sites. In many cases, the malicious payload is not hosted on the landing site itself but is instead loaded via an IFRAME or a SCRIPT from a remote site. We call a remote site that hosts malicious payloads a distribution site. In what follows, we detail the different components of our data collection infrastructure.
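To make the grouping of landing pages into landing sites concrete, the following is a minimal sketch and not the implementation used in this work. It assumes that the host name parsed from each URL is an adequate grouping key; a production system would more likely reduce each host to its registrable domain. The function name `group_into_landing_sites` and the example URLs are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_into_landing_sites(landing_pages):
    """Group landing-page URLs by host name (hypothetical helper).

    Assumption: the parsed host name stands in for the domain name
    used to define a landing site.
    """
    sites = defaultdict(set)
    for url in landing_pages:
        # urlsplit(...).hostname is lower-cased, or None if the string
        # has no network location.
        host = urlsplit(url).hostname
        if host is None:
            continue  # skip strings that are not absolute URLs
        sites[host].add(url)
    return dict(sites)


# Illustrative, made-up URLs (not data from the paper).
pages = [
    "http://example.com/a.html",
    "http://example.com/b.html?id=1",
    "http://EXAMPLE.org/index.html",
]
landing_sites = group_into_landing_sites(pages)
```

Here the two `example.com` pages collapse into a single landing site, while the `example.org` page forms a second one. The same grouping could be applied to the remote hosts referenced by IFRAME or SCRIPT elements to enumerate distribution sites.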