USENIX 2004 Annual Technical Conference, General Track — Abstract

Pp. 45–58 of the Proceedings

Email Prioritization: Reducing Delays on Legitimate Mail Caused by Junk Mail

Dan Twining, Matthew M. Williamson, Miranda J. F. Mowbray, and Maher Rahmouni, Hewlett-Packard Labs


In recent years the volume of junk email (spam, viruses, etc.) has increased dramatically. These unwanted messages clutter users' mailboxes, consume server resources, and delay the delivery of mail. This paper presents an approach that ensures that non-junk mail is delivered without excessive delay, at the expense of delaying junk mail.

Using data from two Internet-facing mail servers, we show that it is possible to predict, simply and accurately, whether the next message sent from a particular server will be good or junk, by monitoring the types of messages that server has previously sent. The prediction can be used to delay acceptance of junk mail and to prioritize good mail through the mail server, so that load is reduced and delays stay low even if the server is overloaded.
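As an illustration only, the per-sender prediction could be sketched as below. This is not the authors' implementation: the class name `SenderHistory`, the 0.5 threshold, and the choice to treat senders with no history as junk are all assumptions made for the sketch.

```python
class SenderHistory:
    """Tracks how many good and junk messages each sending IP has produced.

    Hypothetical helper: the names and defaults here are assumptions,
    not taken from the paper.
    """

    def __init__(self, threshold=0.5, unknown_is_junk=True):
        self.threshold = threshold              # junk fraction above which we predict junk
        self.unknown_is_junk = unknown_is_junk  # assumed policy for senders with no history
        self._counts = {}                       # ip -> [good_count, junk_count]

    def record(self, ip, was_junk):
        """Update the history once a message from `ip` has been classified."""
        counts = self._counts.setdefault(ip, [0, 0])
        counts[1 if was_junk else 0] += 1

    def junk_fraction(self, ip):
        """Fraction of past messages from `ip` that were junk, or None if unseen."""
        good, junk = self._counts.get(ip, (0, 0))
        total = good + junk
        return junk / total if total else None

    def predict_junk(self, ip):
        """Predict whether the next message from `ip` will be junk."""
        fraction = self.junk_fraction(ip)
        if fraction is None:
            return self.unknown_is_junk
        return fraction > self.threshold
```

A caller would invoke `predict_junk(ip)` when a connection arrives and `record(ip, was_junk)` after the message has been classified by whatever content filter is in use.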

The paper includes a review of server-based anti-spam techniques, and an evaluation of these against the data. We develop and calibrate a model of mail server performance, and use it to predict the performance of the prioritization scheme. We also describe an implementation on a standard mail server.

We describe a method of counteracting problems caused to email systems by high volumes of incoming bad mail. By bad mail we mean spam, virus-carrying email, and undeliverable email. When there is a new SMTP connection from a mail host, we treat messages from that host with suspicion if most past emails from the host's IP address were bad. We delay the processing of such messages, giving priority to emails sent from other hosts.
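One way to picture the prioritization step is a two-class priority queue, sketched below. The queue structure, the `PrioritizedQueue` name, and the `suspect` flag are assumptions for illustration; the abstract does not specify how the delay is implemented inside the mail server.

```python
import heapq
import itertools

GOOD, SUSPECT = 0, 1  # lower value is served first


class PrioritizedQueue:
    """Serves messages from trusted hosts before messages from suspect hosts.

    Hypothetical sketch: a real mail server would apply this at the SMTP
    connection or delivery-queue level.
    """

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # keeps arrival order within each class

    def enqueue(self, message, suspect):
        """Queue a message; `suspect` is True if its host's history is mostly bad."""
        priority = SUSPECT if suspect else GOOD
        heapq.heappush(self._heap, (priority, next(self._arrival), message))

    def dequeue(self):
        """Return the next message to process: good mail first, then suspect mail."""
        _priority, _order, message = heapq.heappop(self._heap)
        return message

    def __len__(self):
        return len(self._heap)
```

With this structure, suspect messages are never dropped. They wait only while good mail is pending, which matches the stated aim of delaying junk rather than rejecting it.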

By analyzing data from three heavily-used mail servers in a large corporation, we found that 90% of bad mail comes from IP addresses from which either no messages were received in the past, or most past messages were bad. This method of identifying bad mail had 10% false positives. We have modelled our system and found that it significantly improves the delivery of good mail in times of high traffic volume.
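Figures of this kind could be computed by replaying a labelled mail log in arrival order, as in the sketch below. The log format and the function name `evaluate` are assumptions, and the false positive rate is taken here to mean the fraction of good messages that the rule flags, which the abstract does not define explicitly.

```python
def evaluate(log):
    """Replay a log of (ip, is_bad) pairs in arrival order.

    A message is flagged if its IP has no history or if most past
    messages from that IP were bad. Returns (detection_rate,
    false_positive_rate) under the assumed definitions above.
    """
    history = {}  # ip -> (good_count, bad_count) seen so far
    bad_total = bad_flagged = 0
    good_total = good_flagged = 0

    for ip, is_bad in log:
        good, bad = history.get(ip, (0, 0))
        flagged = (good + bad == 0) or (bad > good)

        if is_bad:
            bad_total += 1
            bad_flagged += int(flagged)
        else:
            good_total += 1
            good_flagged += int(flagged)

        # Only after the decision is the message added to the history.
        history[ip] = (good + int(not is_bad), bad + int(is_bad))

    detection_rate = bad_flagged / bad_total if bad_total else 0.0
    false_positive_rate = good_flagged / good_total if good_total else 0.0
    return detection_rate, false_positive_rate
```

Note that each message is judged using only the history that precedes it, so the first message from any IP is always flagged under this rule.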

  • View the full text of this paper in HTML and PDF.
    The Proceedings are published as a collective work, © 2004 by the USENIX Association. All Rights Reserved. Rights to individual papers remain with the author or the author's employer. Permission is granted for the noncommercial reproduction of the complete work for educational or research purposes. USENIX acknowledges all trademarks within this paper.


Last changed: 25 June 2004 ch