This class of analyses seeks to characterize the statistical properties of alert production and the trends observed in it over time periods of varying length. For example, [31] offers a compendium of the trends observed in firewall and intrusion detection alert production, drawn from a sample of over 400 organizations in 30 countries.
Source- and target-based. Given a large alert corpus, alert sources and targets may be categorized from various perspectives, such as event production patterns. Because of privacy-preserving data sanitization, geographical information and domain types cannot be inferred from the published alerts. One possible solution is to rely on self-classification, allowing contributors to associate a concise high-level profile with each alert, including such attributes as country, business type, and so on (e.g., ``an academic institution in California''). This would enable some forms of trend and categorical analysis, but could also make alert contributors more vulnerable to dictionary attacks, because a profile narrows the range of candidate addresses an attacker needs to try.
We do enable identification of the (anonymous) sources that produce the greatest volume of alerts or the greatest aggregate alert severity. Because the activity of an egregious source is likely to be reported by multiple organizations, its address is hashed using a universally computable hash function such as SHA-1, so that all contributors derive the same value for the same address. Such sources can be blacklisted by distributing filters that contain the corresponding hash value. Once installed, a filter drops all traffic for which the hash of the source IP address matches the provided value. This filtering has a cost, since the firewall must hash the source IP address of all incoming traffic to determine what to drop, although the cost may be acceptable when the network is under heavy attack (unlike the dictionary attacks described in section 4.3, this hashing is benign). Repositories should beware of malicious blacklisting, in which an attacker submits a large number of fake alerts implicating an innocent system.
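The identification and filtering steps described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual implementation: the alert tuple layout, the function names `hash_ip`, `top_sources`, and `should_drop`, and the `min_reporters` threshold are all assumptions introduced here. The threshold is only a hypothetical, partial guard against malicious blacklisting; it does nothing against an attacker who controls several reporting identities.

```python
import hashlib
import ipaddress
from collections import defaultdict


def hash_ip(ip: str) -> str:
    """Unkeyed SHA-1 of the packed address bytes, hex-encoded.

    Any party can compute this, so every contributor obtains the same
    value for the same source address.
    """
    return hashlib.sha1(ipaddress.ip_address(ip).packed).hexdigest()


def top_sources(alerts, k=1, min_reporters=2):
    """Rank hashed sources by aggregate severity, then by alert volume.

    alerts: iterable of (reporter_id, hashed_source, severity) tuples
            (a hypothetical layout chosen for this sketch).
    min_reporters: assumed threshold; sources reported by fewer distinct
            contributors are not considered for blacklisting.
    """
    volume = defaultdict(int)
    severity = defaultdict(int)
    reporters = defaultdict(set)
    for reporter, src, sev in alerts:
        volume[src] += 1
        severity[src] += sev
        reporters[src].add(reporter)
    eligible = [s for s in volume if len(reporters[s]) >= min_reporters]
    eligible.sort(key=lambda s: (severity[s], volume[s]), reverse=True)
    return eligible[:k]


def should_drop(src_ip: str, blacklist) -> bool:
    """Filter check: hash the packet's source address and test membership.

    This per-packet hash is the filtering cost mentioned in the text.
    """
    return hash_ip(src_ip) in blacklist
```

A site would call `should_drop` on the source address of each incoming packet, with `blacklist` holding the distributed hash values as a set, so that membership tests stay cheap relative to the hash computation itself.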
Port/protocol- and event production-based. These analyses may help identify which kinds of reconnaissance are performed as a precursor to a larger-scale exploit, or help characterize the extent to which an attack has spread.
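One simple way to approximate the extent of spread is to count, for each protocol and destination port, how many alerts were raised and how many distinct contributors reported them. The sketch below assumes that protocol and port survive sanitization and that each alert carries a contributor identifier; the tuple layout and the name `spread_by_port` are hypothetical.

```python
from collections import defaultdict


def spread_by_port(alerts):
    """Summarize alert activity per (protocol, destination port).

    alerts: iterable of (reporter_id, protocol, dst_port) tuples
            (an assumed layout for this sketch).
    Returns a dict mapping (protocol, dst_port) to a pair
    (alert_count, number_of_distinct_reporters).
    """
    counts = defaultdict(int)
    reporters = defaultdict(set)
    for reporter, proto, port in alerts:
        key = (proto, port)
        counts[key] += 1
        reporters[key].add(reporter)
    return {key: (counts[key], len(reporters[key])) for key in counts}
```

A port that shows many alerts from a single contributor suggests localized activity, whereas a port reported by many contributors suggests wider spread; either reading would need to be checked against the event types involved.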