Next: 4 Evaluation Up: ASK: Active Spam Killer Previous: 2 Background



3 Design

We designed ASK to be simple to install and readily available to the widest possible audience. Neither superuser privileges nor reconfiguration of the mail server is required, so regular users can install the program under their home directories.

ASK was developed in Python [7], an easy-to-read, portable language that has recently been gaining popularity. Python is available for most Unix variants, making ASK portable across many platforms.

ASK works by reading emails from the standard input. After processing, emails can be directly stored into the user's mailbox or sent to the standard output for post-processing by other mail filters. This makes the program compatible with a number of Mail Transfer Agents and Mail Filters, like Sendmail [6], Qmail [4], Exim [12], Postfix [21] and others. Support for Procmail [19] is also embedded in the program, as well as direct delivery to ``mbox'' and ``Maildir'' style mailboxes.

Emails pending confirmation (those for which no confirmation return has been received) are stored as individual text files. The file names contain the MD5 hash sent in the confirmation, minimizing CPU utilization when matching ordinary confirmation returns. The pending mail queue can also optionally follow the Maildir format, allowing Qmail users to access and manage their queues remotely via IMAP.

ASK is normally invoked by the user's ~/.forward file mechanism or by procmail. Configuration is stored in the ~/.askrc file and all the control files and spool directories are created by default under the ~/.ask directory.

A flowchart of the program's operation is shown in Figure 1. In the following paragraphs, numbers enclosed in squares refer to the corresponding boxes in the flowchart.

Figure 1: ASK Mail Processing Flowchart
[Figure 1 image: figures/askflow.eps]

Incoming mails are always checked against the whitelist \fbox{1}, blacklist \fbox{2}, and ignorelist \fbox{3} (in this order). The lists are implemented as text files containing a set of regular expressions. Emails can be authenticated based on the sender's email address, the recipient's email address or the message subject. Plain substring comparison is also available for simple matches.

A match in the whitelist causes instant delivery of the email \fbox{16}. If no match is found in the whitelist, the program tries to match the email against the blacklist and then against the ignorelist. A match in either of these two lists causes the original email to be discarded \fbox{19}; a blacklist match additionally sends a warning message back to the sender indicating that further emails are blocked and ignored \fbox{11}.
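The list checks above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function names, the return labels, and the choice to try every pattern against all three fields are ours, not taken from the ASK source, where a list entry may be tied to a specific field.

```python
import re

def matches_any(patterns, fields):
    """Return True if any regular expression matches any of the given fields."""
    return any(re.search(p, f, re.IGNORECASE) for p in patterns for f in fields)

def classify(sender, recipient, subject, whitelist, blacklist, ignorelist):
    """Check the lists in the documented order: whitelist, blacklist, ignorelist."""
    fields = [sender, recipient, subject]
    if matches_any(whitelist, fields):
        return "deliver"            # box 16
    if matches_any(blacklist, fields):
        return "discard_and_warn"   # boxes 19 and 11
    if matches_any(ignorelist, fields):
        return "discard"            # box 19
    return "continue"               # fall through to the remaining tests

# Hypothetical list contents, for illustration only.
whitelist = [r"@example\.org$"]
blacklist = [r"^spammer@"]
ignorelist = [r"newsletter"]
```

Because the whitelist is consulted first, a sender that matches both the whitelist and the blacklist is delivered.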

The next step is to check for mail bounces, that is, error messages sent by other MTAs \fbox{4}. Special processing prevents the delivery of the bounces generated whenever a confirmation is sent to an invalid sender address. This is discussed in more detail in Section 3.1.

ASK provides remote queue and list management by email. To use this feature, users must send themselves an email containing certain commands in the ``Subject'' header. ASK checks for remote commands \fbox{5} and executes them if appropriate. The complete set of remote commands, with examples, is discussed in Section 3.2.

The next step is to check for confirmation returns \fbox{6}. These messages contain a specific string in the ``Subject'' field, followed by the MD5 hash that uniquely identifies the message inside the pending queue. The MD5 hash is extracted from the incoming mail and used to form a file name that contains the original message. If such a filename exists \fbox{14}, the sender's email in the confirmation message and the one in the queued message are added to the whitelist \fbox{17}. The original message is removed from the queue and delivered \fbox{18}.
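The lookup described above could look roughly like the following sketch. The file name prefix `ask.msg.` and the function name are hypothetical, and the subject is simply scanned for 32 hexadecimal digits (the length of an MD5 digest) rather than for ASK's actual marker string.

```python
import os
import re
import tempfile

MD5_RE = re.compile(r"\b([0-9a-fA-F]{32})\b")
QUEUE_PREFIX = "ask.msg."  # hypothetical file name prefix, not from the ASK source

def find_pending(subject, queue_dir):
    """Return the path of the queued message named by the hash in `subject`, or None."""
    match = MD5_RE.search(subject)
    if match is None:
        return None
    path = os.path.join(queue_dir, QUEUE_PREFIX + match.group(1).lower())
    return path if os.path.isfile(path) else None

# Small demonstration with a temporary queue directory.
with tempfile.TemporaryDirectory() as queue_dir:
    digest = "d41d8cd98f00b204e9800998ecf8427e"
    open(os.path.join(queue_dir, QUEUE_PREFIX + digest), "w").close()
    found = find_pending("Re: please confirm " + digest, queue_dir)
    found_name = os.path.basename(found) if found else None
    missing = find_pending("Re: please confirm " + "0" * 32, queue_dir)
```

Since the hash is part of the file name, a confirmation return is matched with a single file system lookup instead of a scan of the queue.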

If ASK detects an invalid confirmation (for which no files exist in the pending queue) the message will be labeled as ``Invalid Confirmation'' and delivered. This measure prevents loops with other ASK users.

A special case that deserves attention is that of spam messages with the sender's address forged to be the same as the recipient. In this case, a confirmation message cannot be sent as it would in turn be sent to the ASK user. To avoid this, ASK implements the concept of a mailkey: a string or short phrase that must be present in every outgoing mail sent by the ASK user. Common choices are words or short phrases from the user's signature. ASK will first check for the presence of the mailkey in incoming mails \fbox{7}. If it is found, the email is immediately delivered \fbox{16} and processing ends. If not, ASK compares the sender's address to the ASK user's address (configured at installation time) \fbox{8}. The message will be queued with a status of ``Junk'' if a match is found \fbox{15}.

The mailkey serves a second purpose: to minimize the number of confirmations sent to replies. Ordinarily, ASK has no means of knowing if a certain message is a reply to an email sent by the ASK user or not. Most MUAs, however, quote the entire original message in a reply. This gives ASK an opportunity to detect the mailkey in replies and deliver the message without a confirmation. ASK can also be configured to automatically add the sender's email to the whitelist if a message contains the mailkey.
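A minimal sketch of the mailkey test \fbox{7} and the self-addressed test \fbox{8} follows. The function name, the return labels, and the plain substring search over the body are illustrative assumptions, not ASK's actual code.

```python
def triage_mailkey(body, sender, owner_address, mailkey):
    """Apply the mailkey check first, then the forged-self-sender check."""
    if mailkey and mailkey in body:
        return "deliver"      # box 16: mailkey present, e.g. quoted in a reply
    if sender.strip().lower() == owner_address.strip().lower():
        return "queue_junk"   # box 15: claims to come from the user, but has no mailkey
    return "continue"         # proceed to the mailing-list test

# Hypothetical values, for illustration only.
MAILKEY = "May the source be with you"
OWNER = "user@domain"
```

The order of the two checks matters: a genuine message from the user to themselves carries the mailkey and is delivered before the sender comparison is ever made.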

Sending confirmation messages to mailing-lists would be undesirable. For this reason, ASK tries to determine if a message came from a mailing-list \fbox{9} before a confirmation is sent. Mailing-list messages will be immediately queued \fbox{15} and no confirmation will be generated. The set of heuristics used to detect mailing-list and other machine-generated emails is described in Section 3.3.

At this point \fbox{10} the message has passed all the tests that could cause its delivery or dismissal. The final step is to compute the MD5 hash and send the confirmation message to the sender. This is discussed in detail in Section 3.4.

Next, we discuss special cases and features of ASK.


3.1 Bounce Treatment

A Mail Bounce is a message sent by the MTA when the email cannot be delivered for any reason.

For ASK's purposes, bounces can be broken down into three distinct types: regular bounces, forged bounces, and confirmation bounces.

Regular bounces occur due to a legitimate failure in the process of sending mails whereas forged bounces are actually spam messages with the sender set to ``MAILER-DAEMON.''

There is no way to distinguish between forged bounces and regular bounces without integrating ASK with the MTA, so both kinds are immediately delivered.

Confirmation bounces are generated when an error happens during the delivery of a confirmation message. These are very common, as most spammers forge their emails to contain invalid sender addresses. Confirmation bounces are of no interest to the user and are discarded. To that effect, ASK adds the X-ASK-Version header to all confirmation messages it sends. Since most MTAs quote the headers of the original message in their bounces, ASK can use this header to recognize these bounces and discard them as invalid senders \fbox{12}. This is a workaround for users who cannot access their MTA configuration to make themselves trusted to the system. Trusted users can configure ASK to send confirmations with an empty ``Return-Path'', which tells the receiving MTA that no bounce should be sent in case of an error.
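The header-based workaround can be illustrated as follows. This is a sketch under assumptions: a bounce is recognized here only by ``MAILER-DAEMON'' in the ``From'' header, and the quoted headers are found with a plain case-insensitive text search of the body; ASK's own detection may differ.

```python
from email import message_from_string

def is_confirmation_bounce(raw_message, marker="X-ASK-Version"):
    """Return True if a bounce quotes one of our own confirmation messages."""
    msg = message_from_string(raw_message)
    if "mailer-daemon" not in msg.get("From", "").lower():
        return False  # not a bounce at all
    # Everything after the first blank line is the body, quoted headers included.
    _, _, body = raw_message.replace("\r\n", "\n").partition("\n\n")
    return (marker.lower() + ":") in body.lower()

# Hypothetical bounce text, for illustration only.
sample_bounce = (
    "From: MAILER-DAEMON@mx.example\n"
    "Subject: Undelivered Mail Returned to Sender\n"
    "\n"
    "The message could not be delivered.\n"
    "\n"
    "--- original headers ---\n"
    "X-ASK-Version: ASK\n"
    "Subject: Please confirm\n"
)
```

A regular bounce for mail the user actually sent carries no such header in its body and is therefore still delivered.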


3.2 Remote Commands

Remote commands are special strings embedded in the ``Subject'' header that instruct ASK to perform certain operations. Common tasks are supported, allowing regular maintenance to be performed by users without shell access. Supported operations include:

ASK offers two flavors of remote commands. In text mode, an editable template is generated and delivered to the user's account. In HTML mode, the body of the email contains clickable ``mailto'' links, each of which generates a further email containing the commands for one specific action.

To avoid problems with forged email addresses, ASK never replies back to the sender when dealing with remote commands. Instead, a reply will be delivered to the user's email, set in the configuration file \fbox{13}. This reply contains an editable (or clickable if operating in HTML mode) template and some authentication tokens. Upon receipt of this second email, the requested actions will be executed.

As an example, let us suppose that user user@domain wants to request a listing of the queue by means of the ASK PROCESS QUEUE remote command. The sequence of events would be:

  1. The user sends an email from user@domain to user@domain with ``ASK PROCESS QUEUE'' in the subject.

  2. ASK detects the ``ASK PROCESS QUEUE'' subject \fbox{5} and delivers an email containing the list of queued files (including their MD5 hashes) to user@domain \fbox{12}. As a security measure, the email is not delivered to the sender but rather to the owner's email that was set at configuration time. This guarantees that only the account owner will be able to execute remote commands.

  3. The user receives the email containing the ``ASK QUEUE REPORT'' subject and the list of queued files and hashes. The email is edited by the user with the appropriate commands and sent back to user@domain.

  4. ASK receives the email, this time with the correct MD5 hashes. The commands in the email are executed \fbox{12}.
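The key security property of steps 1 and 2, namely that the report goes to the configured owner address and never to the sender of the request, can be sketched as below. The function name, the dictionary-based queue, and the report layout are hypothetical; only the two subject strings come from the example above.

```python
def handle_queue_request(subject, sender, owner_address, queue):
    """Build the queue report for an ``ASK PROCESS QUEUE'' request, or return None."""
    if subject.strip().upper() != "ASK PROCESS QUEUE":
        return None
    # One line per queued message: MD5 hash followed by a short description.
    lines = [f"{md5} {summary}" for md5, summary in sorted(queue.items())]
    return {
        "to": owner_address,          # deliberately not `sender`, which may be forged
        "subject": "ASK QUEUE REPORT",
        "body": "\n".join(lines),
    }

# Hypothetical queue contents, for illustration only.
queue = {"d41d8cd98f00b204e9800998ecf8427e": "bob@x.example: hello"}
reply = handle_queue_request("ASK PROCESS QUEUE", "forger@bad.example", "user@domain", queue)
```

Even when the request comes from a forged address, the hashes needed for step 4 only ever reach the account owner.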

The supported remote commands can be found in Table 1.


Table 1: Supported Remote Commands
[Table contents truncated; the only surviving fragment reads: ``... to be edited by the user.'']



3.3 Mailing-List Handling

ASK implements a generic test that is able to match most machine-generated mails including mailing-lists and other challenge-based programs.

A common approach in mailing-lists is to rewrite the sender's address or add a ``Reply-To'' header pointing to the list distribution address. Without special treatment, confirmation messages would end up being sent to the list's distribution address, causing considerable confusion.

Even though there is no official way of telling whether a message was automatically generated or not, most mailing-list managers today follow some guidelines. ASK will not send a confirmation to a message if at least one of the following criteria is met:

Messages that match one of the conditions above and are not in the whitelist remain in the pending queue, where they can be manipulated with remote commands. No confirmation message is sent for them, however.


3.4 Sending the Confirmation Message

Sending the confirmation is the last step when processing a message from an unknown sender. Before the confirmation is sent, ASK performs two final steps in order to avoid mail loops with badly configured auto-responders and other auto-reply services:

  1. ASK verifies if the current email is already queued by means of the MD5 hash (same hash, same email). This offers a basic degree of protection against services that send exactly the same message multiple times.

  2. A circular list is kept with the last N addresses used when sending out a confirmation message. If the current sender's email address appears more than X times in the list, no confirmation is sent. This guarantees that no more than X messages will ever be sent to the same sender in a given period of time. Incoming mails that generate confirmations will push the old ones out of the list, giving the sender a chance for more confirmations after some time. The values of both X and N are user configurable.
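The circular list in step 2 maps naturally onto a bounded `collections.deque`. The sketch below is an assumption-laden illustration: the class and method names are ours, and it refuses a confirmation once the sender already appears X times in the list, which is the reading that matches the stated guarantee of at most X confirmations per sender within the window.

```python
from collections import deque

class ConfirmationThrottle:
    """Remember the last N confirmation recipients; allow at most X per sender."""

    def __init__(self, n, x):
        self.recent = deque(maxlen=n)  # oldest entries fall off automatically
        self.max_per_sender = x

    def allow(self, sender):
        """Return True and record the sender if a confirmation may be sent."""
        sender = sender.lower()
        if self.recent.count(sender) >= self.max_per_sender:
            return False
        self.recent.append(sender)
        return True

throttle = ConfirmationThrottle(n=5, x=2)
decisions = [throttle.allow("a@x.example") for _ in range(3)]
```

Refused senders are not appended, so only confirmations that were actually sent push older entries out of the list.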

The confirmation message itself is a simple ASCII text email containing brief instructions to the sender and the MD5 hash in the ``Subject'' field. The confirmation has to be brief and simple or else some senders might just skip it altogether.

A typical confirmation message is shown in Figure 2.

Figure 2: A Typical Confirmation Message
    From: ''Marco Paganini'' <paganini@p...
    ...i@paganini.net
    Hello.
    This could be spam.

The MD5 hash is generated by concatenating the message text to a user-configurable secret. A spammer who does not know the secret cannot compute a valid hash, and therefore cannot get added to the whitelist by crafting a fake confirmation.
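As an illustration, the hash computation could be written with Python's `hashlib` as follows. The function name, the concatenation order (message first, secret second), and the UTF-8 encoding are assumptions; the text above only states that the two are concatenated.

```python
import hashlib

def confirmation_hash(message_text, secret):
    """Return the hex MD5 digest of the message text concatenated with the secret."""
    data = (message_text + secret).encode("utf-8")
    return hashlib.md5(data).hexdigest()

# Hypothetical inputs, for illustration only.
message = "From: bob@x.example\nSubject: hello\n\nHi there.\n"
digest = confirmation_hash(message, "my-private-secret")
```

The digest is a string of 32 hexadecimal digits, suitable for use both in the ``Subject'' field of the confirmation and in the name of the queued file.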

The text is easily configurable by the user, and more than one language may be present in the confirmation at the same time (normally English and the user's native language). Ready-to-use templates are available in English, Spanish, French, German, Brazilian Portuguese, Dutch, Italian, and Finnish.


Marco Paganini 2003-04-07