To elucidate the discrepancies among different counting techniques, we now provide botnet size estimates using the approaches discussed in Section 2. Where possible, we outline the factors that inflate or deflate the botnet population estimates derived by these techniques. For comparison, we analyze the traces of a large collection of botnets captured and tracked over a period of more than nine months using a distributed data collection infrastructure, which we established as part of an ongoing effort to study the botnet phenomenon. In short, we use a combination of lightweight responders (based on the nepenthes framework [1]) and deep-interaction honeypots to collect malware binaries. The collected binaries are analyzed in an isolated environment to elicit any IRC-related features, from which we produce configuration templates. These templates are used to create several customized IRC tracker instances that infiltrate the botnets specified in the collected binaries (see [14] for more details). Table 1 summarizes the data we collected, including traffic traces captured at our distributed darknet, IRC logs gathered from 472 botnet channels either visited by our IRC tracker or observed on our honeynet, and DNS cache hits from tracking 100 IRC servers for more than 45 days.
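To make the template-generation step concrete, the following is a minimal sketch of how IRC-related features observed during isolated execution might be turned into a configuration template for a tracker instance. It is an illustration under stated assumptions, not the infrastructure described in [14]: the names `TrackerTemplate` and `build_template`, the template fields, and the input format (a list of IRC client commands the binary sent to its server) are all hypothetical. Only the command syntax (`PASS`, `NICK`, and `JOIN` with comma-separated channels and keys) follows the IRC protocol.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TrackerTemplate:
    """Hypothetical configuration template for one IRC tracker instance."""
    server: str
    port: int
    nick: Optional[str] = None
    server_password: Optional[str] = None
    channels: List[dict] = field(default_factory=list)


def build_template(server: str, port: int,
                   outbound_lines: List[str]) -> TrackerTemplate:
    """Extract IRC features from commands a binary sent to its server.

    `outbound_lines` is an assumed input format: one IRC client command
    per string, as recorded while the binary ran in isolation.
    """
    template = TrackerTemplate(server=server, port=port)
    for line in outbound_lines:
        parts = line.strip().split()
        if not parts:
            continue
        command, args = parts[0].upper(), parts[1:]
        if command == "PASS" and args:
            template.server_password = args[0]
        elif command == "NICK" and args:
            template.nick = args[0]
        elif command == "JOIN" and args:
            # JOIN takes comma-separated channels and, optionally,
            # a matching comma-separated list of channel keys.
            names = args[0].split(",")
            keys = args[1].split(",") if len(args) > 1 else []
            for i, name in enumerate(names):
                key = keys[i] if i < len(keys) else None
                template.channels.append({"name": name, "key": key})
    return template


# Example with made-up values (not taken from the collected data).
example = build_template(
    "irc.example.net", 6667,
    ["PASS s3cret", "NICK bot-1234", "USER a b c :d", "JOIN #cmd,#scan key1"],
)
```

A tracker instance would then need only the fields of such a template (server, port, nickname, password, and channel list) to join the same channels as the captured bot.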