USENIX 2004 Annual Technical Conference, General Track Abstract
Pp. 45-58 of the Proceedings
Email Prioritization: Reducing Delays on Legitimate Mail Caused by Junk Mail
Dan Twining, Matthew M. Williamson, Miranda J. F. Mowbray, and Maher Rahmouni, Hewlett-Packard Labs
Abstract
In recent years the volume of junk email (spam, viruses, etc.) has
increased dramatically. These unwanted messages clutter up users'
mailboxes, consume server resources, and cause delays to the delivery
of mail. This paper presents an approach that ensures that non-junk
mail is delivered without excessive delay, at the expense of delaying
junk mail.
Using data from two Internet-facing mail servers, we show how it is
possible to simply and accurately predict whether the next message
sent from a particular server will be good or junk, by monitoring the
types of messages previously sent. The prediction can be used to delay
acceptance of junk mail, and prioritize good mail through the mail
server, ensuring that loading is reduced and delays are low, even if
the server is overloaded.
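
As an illustration only, the history-based prediction described above might be sketched as follows. The class name, the 0.5 threshold, and the choice to treat senders with no history as junk are assumptions made for this sketch, not details taken from the paper.

```python
from collections import defaultdict


class SenderHistory:
    """Hypothetical per-sender record of good and junk message counts."""

    def __init__(self, threshold=0.5, unknown_is_junk=True):
        # threshold and unknown_is_junk are illustrative assumptions.
        self.threshold = threshold
        self.unknown_is_junk = unknown_is_junk
        self.counts = defaultdict(lambda: [0, 0])  # ip -> [good, junk]

    def record(self, ip, is_junk):
        """Update the history once a message has been classified."""
        self.counts[ip][1 if is_junk else 0] += 1

    def predict_junk(self, ip):
        """Predict whether the next message from this sender is junk."""
        good, junk = self.counts.get(ip, (0, 0))
        total = good + junk
        if total == 0:
            return self.unknown_is_junk
        # Junk is predicted when the junk fraction exceeds the threshold.
        return junk / total > self.threshold
```

A server using such a predictor would call `record` after each message is classified by its existing filters and `predict_junk` when a new connection arrives from the same address.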
The paper includes a review of server-based anti-spam techniques, and
an evaluation of these against the data. We develop and calibrate a
model of mail server performance, and use it to predict the
performance of the prioritization scheme. We also describe an
implementation on a standard mail server.
We describe a method of counteracting problems caused to email systems by high
volumes of incoming bad mail. By bad mail we mean spam, virus-carrying email,
and undeliverable email. When there is a new SMTP connection from a mail host,
we treat messages from that host with suspicion if most past emails from the
host's IP address were bad. We delay the processing of such messages, giving
priority to emails sent from other hosts.
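
The prioritization step can be pictured as a two-level queue in which messages from suspicious hosts are only processed when nothing else is waiting. The sketch below is a simplification under that assumption; the class and method names are hypothetical, and the actual system may delay acceptance of the connection rather than queue the message.

```python
import heapq
import itertools


class PrioritizedMailQueue:
    """Hypothetical queue: non-suspicious mail is always served first."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # preserves arrival order within a level

    def enqueue(self, message, suspicious):
        # Priority 0 for trusted hosts, 1 for suspicious hosts (an assumption).
        priority = 1 if suspicious else 0
        heapq.heappush(self._heap, (priority, next(self._order), message))

    def dequeue(self):
        """Return the next message to process."""
        _, _, message = heapq.heappop(self._heap)
        return message

    def __len__(self):
        return len(self._heap)
```

With this design, a burst of mail from suspicious hosts lengthens only the low-priority backlog, so mail from other hosts is not held up behind it.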
By analyzing data from three heavily-used mail servers in a large corporation,
we found that 90% of bad mail comes from IP addresses from which either no
messages were received in the past, or most past messages were bad.
This method of identifying bad mail had 10% false positives. We have modelled
our system and found that it significantly improves the delivery of good mail
in times of high traffic volume.
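
To make the two figures above concrete, the following sketch replays a labelled mail log and measures how much bad mail the rule flags and how much good mail it flags by mistake. The function name and log format are hypothetical, and the definition of the false-positive rate as the fraction of good messages flagged is an assumption, since the abstract does not define it.

```python
from collections import defaultdict


def evaluate(log):
    """Replay (ip, is_bad) pairs in arrival order and score the rule.

    A message is flagged when its source IP has no history or when
    most past messages from that IP were bad.
    """
    history = defaultdict(lambda: [0, 0])  # ip -> [good, bad]
    bad_total = bad_flagged = good_total = good_flagged = 0
    for ip, is_bad in log:
        good, bad = history[ip]
        flagged = (good + bad == 0) or (bad > good)
        if is_bad:
            bad_total += 1
            bad_flagged += flagged
        else:
            good_total += 1
            good_flagged += flagged
        # Update the history only after the prediction has been scored.
        history[ip][1 if is_bad else 0] += 1
    detection = bad_flagged / bad_total if bad_total else 0.0
    false_positive = good_flagged / good_total if good_total else 0.0
    return detection, false_positive
```

The function returns the pair (fraction of bad mail flagged, fraction of good mail flagged), which correspond under the stated assumption to the 90% and 10% figures reported above.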
The Proceedings are published as a collective work, © 2004 by the USENIX Association. All Rights Reserved. Rights to individual papers remain with the author or the author's employer. Permission is granted for the noncommercial reproduction of the complete work for educational or research purposes. USENIX acknowledges all trademarks within this paper.