A Forensic Analysis of a Distributed Two-Stage Web-Based Spam Attack
Daniel V. Klein - LoneWolf Systems
Pp. 31-40 of the Proceedings of LISA '06: 20th Large Installation System Administration Conference (Washington, DC: USENIX
Association, December 3-8, 2006).
Abstract
Open mail relays have long been vilified as one of the key vectors
for spam, and today - thanks to education and the blocking efforts of
open relay databases (ORDBs) - relatively few open relays remain to
serve spammers. Yet a critical and widespread vulnerability remains in
an as-yet unaddressed arena: web-based email forms. This paper
describes the effects of a distributed proxy attack on a vulnerable
email form, and proposes easy-to-implement solutions to an endemic
problem. Based on forensic evidence, we observed a well-designed and
intelligently implemented spam network, consisting of a large number of
compromised intermediaries that receive instructions from an
effectively untraceable source, and which attack vulnerable CGI forms.
We also observe that although the problem can be easily mitigated, it
will only get worse before it gets better: the vast majority of freely
available email scripts suffer from the same vulnerability; the
load on most proxies is relatively low and hard to detect; and
many sites exploited by the compromised proxy machines may never
notice that they have been attacked.
Introduction
As new defenses against spam are developed, spammers have been
forced to get more inventive in their attacks. Much as antibiotic-resistant
bacteria force the development of new drugs, so do spam-resistant
mail servers drive the spammers to continually create new
methods of disseminating their advertisements. A modern anti-spam
arsenal provides defense with a combination of Bayesian filters,
keyword matching, fuzzy checksums, and header and source-IP checks.
Spammers have responded with text crafted to defeat filters, key
words obscured by unusual spellings or rendered as graphics, random
strings to confound checksums, and ever more legitimate-looking headers. The
one thing that can truly defeat spammers is to deny them the means to
send their email. Recent legislation has made a traceable source IP
address undesirable (getting caught can now mean stiff fines and/or
jail time [Note 1]), so spammers have used open relays in foreign
countries as an indirect (and difficult to trace) means of
promulgating their wares.
Indeed, for a while it was thought that closing off open relays
and/or denying them the ability to send mail might be a solution.
Although it took time to spread, updates to the default configurations
of mail transfer agents (MTAs) went a long way towards alleviating the
open-relay problem, while education combined with a distributed denial
of service (in the form of ORDBs) has made open relays, if not a thing
of the past, at least a non-problem.
However, open relays fall into two classes: MTAs and MUAs (mail
user agents), and the ORDBs only deal with the former class. MUAs are
often thought of as applications that send mail for an interactive
user, but they can also be applications that send mail on behalf of a
script. In this case, the application is a CGI script, and an
improperly configured CGI script can be exploited to send a payload to
locations that were not envisioned by its designers.
Vulnerability Summary
The vulnerability in CGI scripts that send email is easy to spot,
and trivial to fix. It may best be summarized in the following highly
simplified example (see Example 1).
<FORM ACTION=mail.cgi>
Your name:
<INPUT TYPE=TEXT NAME=Who>
<BR>
Your email address:
<INPUT TYPE=TEXT NAME=Addr>
<BR>
Your problem:
<TEXTAREA NAME=Complaint>
Type your complaint here!
</TEXTAREA>
<INPUT TYPE=SUBMIT>
</FORM>
#!/usr/bin/perl
use CGI ':all';
import_names;
open MAIL, "| sendmail -oi -t";
print MAIL <<"==END==";
To: dvk\@lonewolf.com
From: "$Q::Who" <$Q::Addr>
Subject: Whine whine whine...
$Q::Complaint
(whew)
.
==END==
close MAIL;
Example 1: CGI Form and CGI Script.
At first blush, the form and the CGI script look innocuous enough.
A surfer can enter a single line of text for their name, another for
their email address, and a multi-line complaint. The script then sends
an email from the user to a fixed address (in this case, mine), and
lets me know their complaint. Compliant browsers all provide this
functionality, but the problem is not in browsers; rather, it is that
it does not take a browser to surf the web! Surely the
notion of robots is no mystery (all of the search engines use them to
harvest information from the web), so it should likewise be no
surprise that a robot can submit forms. Combine this with a
fundamental feature of the HTTP and SMTP protocols, and you have a
trivial means for a spammer to abuse this simple form.
Although browsers only allow a single line of data to be supplied
for an input item of type ``text,'' the HTTP protocol allows arbitrary
text, and a robot can easily provide that arbitrary text. When
connecting to Sendmail (and other MTAs), mail headers are separated
from the body of the message by a blank line. So if a legitimate
surfer enters ``Dan Klein'' for the Who field, ``dan@klein.com'' for
the Addr field, and ``This form\nis really dumb!'' for the Complaint
field, then the following would be handed to Sendmail:
To: dvk@lonewolf.com
From: Dan Klein <dan@klein.com>
Subject: Whine whine whine
This form
is really dumb!
(whew)
Notice that the newline in the Complaint field is
accurately translated when sending the email. So now consider
that a spammer has instead supplied the value in Example 2
for the Addr field. Here is what is handed to Sendmail:
To: dvk@lonewolf.com
From: Dan Klein <dan@klein.com>
Bcc: your@address.com, target@victim.com,
another@mark.com, more@suckers.com
Subject: Increase your pennies
Get bigger pennies! The 19th century
English pennies are HUGE! Impress
your woman!
>Subject: Whine whine whine
This form
is really dumb!
(whew)
dan@klein.com>\nBcc: your@address.com, target@victim.com,\nanother@mark.com,
more@suckers.com\nSubject: Increase your pennies\n\nGet bigger pennies!
The 19th century\nEnglish pennies are HUGE! Impress your woman!\n\n
Example 2: Nefarious text handed to the Addr field.
Since the spammer cannot see the script (only its
results), s/he must treat it as a black-box, making a number
of attempts until the proper formatting is discovered. Notice
the careful placement of the `>' character to guarantee
(in this case) that Sendmail properly parses the input. The
spammer typically creates a short-term-use email account and
uses that address as a test target, trying a collection of
well-known formats, each in a specially coded message that
includes an annotation identifying the format. Once one
email is received at the test address, the spammer knows from
the annotation the correct format to use.[Note 2] Suddenly the innocuous email form
is a vehicle for sending spam!
In retrospect, I should have noticed the problem with my
form when the spammer first tested their script. I received a
short collection of very similar spam messages from all over
the world, but like most people, I simply deleted them.
Initially, I received them because the spammer had not yet
determined how to properly exploit my script (for example,
they had appended the Bcc addresses in the
Complaint field).
After the exploit was fine-tuned to match my script, I
received approximately 5,000 spam emails (because in my
script, I was listed as the To address). Each of the messages
was also Bcc'd to 380-390 other addresses. Had I not
detected the attack, this ultimately would have resulted in
close to 2,000,000 spam messages being sent from my mail
server. It is interesting to note that my desktop spam
blocking software helpfully deleted most of the spam messages
addressed to me. Thus I never knew I was being attacked - at
least until I checked my various RRD logs, as shown below.
The Larger Problem
A very quick survey using Google reveals the severity of the
problem. Searching for ``email cgi script,'' ``Perl CGI email'' and
``Perl email script'' yielded the expected plethora of results, but a
union of the top-10 of each search yielded 21 unique pages. Of these
21 scripts, one third suffered from the vulnerability described above:
- 7 scripts had obvious flaws that allowed spam email exploitation of
the type seen in the example above.
- One of the flawed scripts was #1 on a particular Google results page.
- Another of these scripts ``sanitized'' the user data by removing
brackets and shell wildcard characters from mail headers, but failed
to remove line breaks!
- A third script removed all instances of ``\r\n'' followed by removing
all ``\n'' characters. This allows a spammer to use ``\n\r'' as a line
break, leaving a bare ``\r'' after the so-called sanitization is
performed (a minimal demonstration of this bypass appears after this
list).
- 3 scripts had code to successfully prevent spam email exploitation.
- 2 were at perl.com (and thus were assumed by this author to be
correct).
- The remainder of the links were located on pay-sites, were too
convoluted to search, or had bad pointers to scripts.
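To make that bypass concrete, here is a minimal sketch of the broken
sanitization (the variable name and test string are mine, not taken
from the script in question):
#!/usr/bin/perl
# A spammer submits ``\n\r'' where a line break is wanted:
my $addr = "dan\@klein.com>\n\rBcc: victim\@example.com";
$addr =~ s/\r\n//g;  # pass 1: strip CRLF pairs - ``\n\r'' contains none
$addr =~ s/\n//g;    # pass 2: strip bare LFs
# $addr now holds "dan@klein.com>\rBcc: victim@example.com" - the
# surviving bare carriage return still acts as a line break when
# handed to the MTA, so the injected Bcc header gets through.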
Matt's Script Archive (a tremendously popular website for Perl
scripts) provided a secure script, but looking at its change log, I
found that the script was only secured in August of 2001. Although he
urges current users to upgrade to the secured version, the site also
has the following note: ``This script has been downloaded over 2
million times since 1997.'' I noted that other publicly available
scripts were based on the buggy versions of Matt's script, and
hesitate to guess how many of the buggy derivative scripts are still
in use throughout the world.
At least half of the most likely-to-be-copied scripts were
sufficiently flawed as to allow easy exploitation by a spammer. A
random sampling of other email scripts listed later in Google's
results indicated that on the order of half the scripts were equally
vulnerable.
Solution
Fortunately, there is a trivial fix to the problem. An obvious
(yet regrettably often overlooked) rule of thumb is ``never trust your
users'' - especially when those users are unvetted and/or from the
web. A script can enforce this rule simply by sanitizing all user
data. Not
allowing any user data in email headers is the best solution - fixed
header-content guarantees a known result (using Sendmail's
-oi switch is also highly recommended, or whatever the
equivalent is in other MTAs). However, since it is also convenient to
be able to reply to emails from forms (and thus allowing the user to
specify input that will ultimately be placed in email headers), a less
draconian solution is to simply remove newlines (or more generally,
replace all contiguous whitespace with a single space character) in
any user input that may find its way into an email header. Either
action will fix the vulnerability and allow a user to Reply to
form-based email.
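To make the fix concrete, here is one way the script from Example 1
might be hardened along these lines. This is a minimal sketch rather
than a drop-in production script; the clean_header helper is my own
name, and I have also added the blank line that separates headers from
body, so that whatever survives in the complaint text is unambiguously
body rather than header:
#!/usr/bin/perl
use CGI ':all';
import_names;

# Collapse all contiguous whitespace (newlines included) to a single
# space in anything destined for a mail header.
sub clean_header {
    my ($text) = @_;
    $text =~ s/\s+/ /g;
    return $text;
}

my $who  = clean_header($Q::Who);
my $addr = clean_header($Q::Addr);

open MAIL, "| sendmail -oi -t" or die "cannot run sendmail: $!";
print MAIL <<"==END==";
To: dvk\@lonewolf.com
From: "$who" <$addr>
Subject: Whine whine whine...

$Q::Complaint
(whew)
==END==
close MAIL;
With the header fields thus cleaned and the blank line in place, an
injected ``Bcc:'' line in any field ends up harmlessly in the message
body.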
There is no need to go to the extra effort of determining if the
user has supplied a valid email address.[Note 3]
Sendmail will simply
fail to deliver mail with bogus From,
To, Cc, or Bcc
headers, so in general it is better to leave the address parsing to
the program most responsible for it.
Discovering the Attack
In addition to reading my email, one of my morning rituals is to
scan the SNMP statistics from the various machines on my network. A
single click in my tabbed web browser brings all this data readily to
hand.[Note 4] These include critical statistics
such as network load,
free disk space, web server statistics and email traffic, as well as
more incidental data including load average, number of active users
and processes, machine uptime, and NTP synchronization data. Any
anomalies can thus be quickly investigated, and problems can be
readily averted or remedied.
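For readers unfamiliar with the tool chain (see Note 4), each graph in
such a dashboard ultimately reduces to an RRDTool invocation along
these lines; the file name and data-source name here are invented for
illustration:
# Graph the last day of SMTP traffic from one RRD file
rrdtool graph smtp-traffic.png --start -24h \
    DEF:msgs=maxwell-smtp.rrd:messages:AVERAGE \
    LINE2:msgs#0000ff:"SMTP messages/sec"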
My network consists of my laptops and desktops, a web server, an
email server, a backup server, a pair of name servers, and a semi-public
wireless network. Figure 1 shows the overall traffic on my T1
serial line. The pattern of activity is not particularly noteworthy,
and can readily be accounted for by a newly popular page on one of my
websites, or one of my wireless customers uploading a particularly
large file.
Figure 2 (available to me on the same dashboard) shows network
traffic broken down by LAN. Although a second look showed me that the
primary load was on the downstairs network (where the mail and DNS
servers are located), initially this graph did not look at all
ominous. The spikes at 03:50 and 04:20 are normal for a Saturday
morning - they result from an upstairs machine performing a remote
disk-to-disk backup to the downstairs backup server.
Figure 3 is a more detailed view of the downstairs network, and
shows that the backup server (called cow) receives
backup data from upstairs, and additionally gets backup data from
maxwell at 03:30 and samantha at
05:30. But more ominous (again, only in retrospect) is the outbound
data from maxwell (which is also my mail server)
starting at 06:00 and, at the same time, the relatively large amount
of output from ns. Typically, the name server accounts for a
negligible load, and is all but imperceptible in the graphs.
Indeed, except for data recognized in hindsight, there was nothing
to indicate that anything was amiss. At a cursory glance the network
traffic was ``normal'' (although the traffic from the name server was ``unusual''),
and if network traffic had been my only critical indicator, a quick
visual survey would have raised no red flags.
I had received a number of odd mail messages from the script that was
being attacked, but I wrote those off as ``someone messing with the site''
(and was unconcerned given that I thought my scripts were secure). I also
received one email message for each of the actual attack messages (since the
script was designed to send me that mail). However, after the first
few, my desktop spam filters ``helpfully'' blocked those messages, and except
for the SNMP data, I was otherwise unaware of the thousands of spam messages
being sent to AOL from my machine.
What ultimately triggered further analysis was the graph of Apache states,
found in Figure 4 (however, this is only after first noticing the highly
unusual mail statistics in Figures 10 and 14, below). Mine is a lightly loaded
web server, with MinSpareServers and
MaxSpareServers set to 15 and 25, respectively. Typically
fewer than a half-dozen servers are active serving requests. Thus when Apache
reported 40 to 80 active processes for a span exceeding an hour, I determined
that something was amiss.
It is especially noteworthy that there was no surge in the requests made
of the webserver, nor in the amount of network traffic that the webserver
was generating (seen in Figures 5 and 6). Thus each writing process (shown in
Figure 4) was taking a lot of time to perform its job, but did so without
generating a lot of data. Since Apache does not block on locked files, the
only reasonable explanation was CGI script execution, where each script was
taking a long time to complete.
Figure 1: Overall network traffic.
Figure 2: Internal network traffic by LAN.
Figure 3: Internal network traffic on the downstairs LAN.
Indeed, as Figures 7 and 8 show, there were a large number of active
processes. The load average on the machine (which is typically below 1.0)
surged to a high of 16 before Sendmail's RefuseLA safeties cut
in, and the number of active processes nearly quadrupled from 200 to 700.
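(For those unfamiliar with the knob: RefuseLA is set in sendmail's m4
configuration. The following sketch shows sendmail's traditional
defaults, not necessarily the values on my server:)
dnl Refuse incoming SMTP connections above load average 12, and
dnl queue (rather than deliver) mail above load average 8.
define(`confREFUSE_LA', `12')
define(`confQUEUE_LA', `8')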
Figure 4: Apache States.
Figure 8: Number of Processes Active on Mail Server.
Figure 9 shows the load on the name server machine. Typically, it putters
along with a load average of 0.1 or less, but the large volume of email
consumed nearly all of the resources of this machine. This small machine has a
single purpose, and under ordinary conditions adequately services 15 internal
hosts (all of whose webservers are configured to do reverse lookups), and is
the authoritative name server for 185 domains. Therefore the load on the name
server presented by the spam attack is vastly in excess of any normal load
experienced.
The volume of mail delivered (and deferred) by the attacked webserver is
shown in Figure 10. From the start of the attack, the number of messages
actually delivered (labeled ``forwarded to a remote client'') steadily
climbs until it reaches a peak of nearly four messages per second. At approximately 08:00, AOL
graylisted my server, the deliveries dropped, and the number of deferred
deliveries grew steadily.
Figure 9: Load on Name Server.
Figure 10: Mail Deliveries Made by Attacked Host.
The peaks in deferred deliveries seen every 30 minutes are Sendmail re-running
the queue (because the -q30m option was specified at
startup). I estimate that between 15,000 and 20,000 pieces of spam mail were sent
before AOL graylisted my server. As will be seen below, this is a fraction of
the potential impact on both my server and AOL. However, mine was but one of
potentially hundreds of thousands of vulnerable email scripts.
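(A 30-minute queue-run interval like this one is set when the daemon is
started; an illustrative invocation, not necessarily my exact one:)
# Run as a daemon, re-attempting the deferred queue every 30 minutes
sendmail -bd -q30m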
Figures 11 and 12 show the number of messages in the outbound mail queue,
waiting to be sent. The number climbs steadily following the onset of the
attack, and peaks at around 07:45, when the attack ceased (the precipitous
drop off at 12:30 is due to my manually purging the queue). These numbers may
appear small to someone who is used to running a larger site, but they
are misleading.
Figure 11: Number of messages in the outbound mail queue.
Figure 12: Volume of messages in the outbound mail queue.
At its peak, the outbound mail queue contained 5,000 messages, with 80 MB
of data. This may seem like a pittance, except when you consider that each
message was typically addressed to 380-390 victims, so a queue size of 5,000
messages represents nearly two million email targets. For a medium to
large host, this additional traffic might go completely unnoticed, and I posit
that the problem of unprotected scripts is far larger than we suspect!
It is essential to note that no one bit of data would have triggered my
awareness of this spam attack (and the search that followed). It is only in
aggregate that the data indicated something was amiss. In fact, at some sites,
an increase in queue size such as this might only suggest a stuck queue, and
merely prompt a restart of the mail server. Likewise, if only the queue volume
were used as a metric, this graph could easily have been mistaken for a user
emailing a large file (such as a movie or collection of photos) that was not
being accepted by the remote site. Plus, 80 MB of outbound mail is a pittance
for many ISPs.
Because my desktop spam filters deleted the large fraction of confirmation
and spam bounce emails addressed to me, the volume of these messages was not
immediately obvious. However, as Figure 13 shows, the number of email messages
that were received (and automatically deleted) is greatly in excess of any
``normal'' traffic that my desktop would otherwise see.
Figure 13: Mail traffic to my desktop.
Yet what is perhaps most interesting is the spam statistics for the
machine that was attacked. As you can see in Figure 14 (which tracks types of
spam received), there is nothing to indicate that spam is being
sent. During the attack, the spam ratios and rates look completely
normal (the spike of ``unknown users'' at noon is a different, but unrelated
attack).
Figure 14: Spam detection.
However, Figure 15 is very interesting. This graph tracks the ratio of
spam to ham (and also tracks that which is unknown). Surprisingly, the amount
of ham goes up during the attack. This is for two reasons. The first is that
the exploited script will still send an email to the originally intended email
address within our site. This is definitely considered ham, because it
originates inside our network and is targeted to another machine
inside our network. The second is because the bounced emails (that fail
to reach their intended target at AOL) are unknown. They cannot be classified
as either spam or ham, so again the ratio of spam to ham goes down. It was
ultimately this graph that caused me to look more closely at what was
happening, since nothing (at the time) could immediately explain that erratic
behavior.
Figure 15: Spam to Ham Ratio.
Analysis of the Attack
The fact that I was attacked (and the solution to the vulnerability) was
now obvious. The biggest problem I still had was that I didn't know exactly
how I was attacked. I knew the reason (sending spam email), the target
(a few million email addresses), and the vector (my faulty CGI script), but not
the source! In the 24-hour period with the most accesses of my vulnerable
script, my web logs show a collection of 259 source-IP addresses scattered
around the world.
Since web servers report the source IP of the host that made the
connection, my initial assumption was that a collection of virus-infected Windows
machines were attacking me, coordinated through some central database server.
These compromised hosts could likely comprise a bot-net, but a problem exists
with this hypothesis - a number of the machines were UNIX- or Linux-based
machines! I attempted to contact a web server on port 80 on each of the 259
machines. 27 of these identified themselves as UNIX/Linux Apache servers, 13
as Windows IIS servers, 6 as Squid proxies, and 6 as other web servers (the
remaining 207 machines did not respond to queries on port 80).
I also considered that the attacker was using forged source IP addresses,
but this possibility was quickly ruled out. Certainly, the SYN packets could
have come from a forged address, but in order to make a request from my web
server, the 3-way TCP handshake needs to be completed. The only way this could
be done is for the attacker to be in the routing path of the forged address,
and to make sure that the machine whose address was forged does not respond
to the returning packets. While theoretically possible, this type of attack is highly unlikely,
given the wide distribution of IP addresses.
My next thought was that these machines had open web proxies on them, and
this seems the most likely explanation. Using nmap, I scanned each of
the machines involved. Since my probes were done some months after the attack,
it is very likely that some of the machines were either reprovisioned or
reconfigured,[Note 5] and approximately
45% of the machines probed were reported as ``down'' (some of these responded
very erratically, with long connection delays that often timed out). Of the
remainder I determined that a large number (approximately 50%) of the machines
were currently running web servers or proxy servers on ``obvious'' ports (80,
3128, 800x, or 808x). Some of the proxy servers currently
required authentication, and although I did not have the proper credentials,
it is highly likely that the spammer did. Sniffing software is readily
available to find proxy passwords on unencrypted networks. There are also
hundreds of websites listing ``free proxies,'' and hundreds of others that share
passwords - a combination of the two services is equally likely. I also have no
doubt that some of the machines were running proxy servers on unobvious ports
as a result of a virus or trojan with an HTTP proxy payload.
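The scan itself was of roughly the following form; this is an
illustrative invocation rather than my exact one, and attacker-ips.txt
is a hypothetical file holding the 259 recorded addresses:
# Probe the recorded sources on the ``obvious'' web and proxy ports
nmap -p 80,3128,8000-8009,8080-8089 -iL attacker-ips.txt -oA proxy-scan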
Finally, I considered compromised logins, especially for those machines
running UNIX or Linux. I found that approximately 25% of the machines had open
ssh or telnet ports, so this hypothesis is possible, but it
seems much more likely that open proxies are the vector for the attack.
In general, the attacker was smart. A small number of test emails were
sent, and once a successful exploitation was found, a relatively small number
of spam messages were sent (only a couple of thousand), and then the spammer
moved on, presumably to another victim.
Because the attack was distributed through a collection of proxies, it is
very difficult to determine the source of the attack. To do so would require
accessing the proxy logs from a collection of the participating machines
(which would require international cooperation from people who may not even
know that they have a proxy running), and this effort might in turn discover
that the participating proxies were contacted by other proxies to
further obscure the attack. And although this might eventually lead to a
single source ISP, all that we would discover would be a temporary AOL account
(to verify that the test emails were received), used only for a day or two and
probably created using a forged or stolen credit card.
Further, although the spammer used my script and MTA to send approximately
two million spam emails, the volume was not so large that my MTA was swamped,
nor would the load have persisted for more than a few days. For a larger ISP,
the excess web and email load might even go completely unnoticed![Note 6]
Once the script was used to send emails, renewed probes occurred daily for
a few days, with two to five tests sent per host. Since I had by this time
fixed the bug in the script, this probing gradually trailed off over the month
following the initially discovered attack. However, it seems that the attackers
are not fully coordinated, as my script is still probed a few times almost
every day from locations around the world. I am ``on their list,'' and it
seems that it will take quite some time before I am removed (if ever).
Lessons Learned
A number of lessons can be learned from this attack. Many of them are
``obvious,'' in that they are things that I should have learned and taken to
heart ages ago. Experience is recognizing a mistake when you make it
the second or third time, and I have plenty of that. Security, on the
other hand, is not allowing oneself to make the same mistake the second time!
I pass on my experience in the hope that my readers may have better security.
- Audit your scripts. It is far too easy to fall into the trap of thinking ``if
it is available on the web, it must be good.'' The truth is of course ``not by
a long shot,'' but internally developed scripts are equally vulnerable, and
should also be audited. If you have a colleague who can audit your scripts,
have them do so. But even a lay person can help you audit. I call this the
``mannequin theory of programming.'' If I take the time to explain my code to
a total dummy, then I slow myself down enough to think about what might go
wrong (and can then make notes to fix it).
- Notes are useless without a corresponding action. ``I'll get to it'' has
severe consequences when you don't get to it! Once you find bugs, fix
them!
- Finding a bug in one script probably means that the same bug exists in others.
When you find a bug, exhaustively search your site(s) for scripts which have
similar behavior to see if the mistake has been repeated.
- Audit your configurations. Most UNIX/Linux systems have Apache and Squid
trivially available to them, and both can be (mis?)configured as an open
proxy. While I will not engage in the privacy and anonymity arguments
that proxy use can engender, I will only note that with every freedom there is
a concomitant potential for abuse of that freedom. If you intentionally run a
proxy, make sure it does what you intend it to do (a minimal Squid
access-control sketch appears after this list).
- Monitor your network. Daily dashboards are great for detecting anomalies, but
beware of slowly changing things, too. A slowly filling disk (due to non-deletion
of old log files) wasn't noticed for a year, because I only looked at
daily, weekly, and monthly logs (and only the yearly log showed the slow
trend)!
- Don't rely on automated detection. Tools like IDS systems are only as good as
their processing rules (that in general rely on past experience), and they
tend to discourage vigilance and encourage laziness. For example, would
cfengine have found this attack? In a private communication, Mark
Burgess (author and designer of cfengine) only says ``maybe.'' More
accurately, he said:
The cfenvd daemon would certainly have noticed the rise in SMTP, DNS,
processes etc., but the question is - would your monitoring policy have
bothered to report it to you? In cfengine, you have to tell it what
you want to hear about. The default is always - nothing! Cfengine does a
kind of ``lazy evaluation'' on the data. If you ask to see anomalies in, say,
SMTP, you can also ask it to look more closely at the distribution of IP
sources. It uses the ``informational entropy'' (i.e., how sharp the
distribution is) as a policy parameter. So you could say ``tell me only
about low entropy SMTP anomalies'' (meaning tell me only about SMTP
anomalies that come from one or proportionally few IP sources - a sharp
attack from a single source). Or you could say ``show me all of them,'' or
``high entropy'' (wide range of sources). So as with all systems, it depends
on how you had it configured.
- Know what you are looking at (and looking for). If you don't know your data,
then it is that much harder to recognize an anomaly. In my case, a single
anomalous graph would not have triggered my search. Instead, it was the
correlation of a number of unusual bits that did so. Your data should provide
specific information as well as the overall gestalt of your network.
- Know your systems and your configuration, and where possible, leave yourself
clues. Troubleshooting is as much an art as it is a science, and the more you
know about your environment, the more likely you are to intuit the source of a
problem. In my case, I immediately recognized the 10-year old script I had
problem. In my case, I immediately recognized the 10-year old script I had
written from the text of the email messages it was sending. However, even if I
hadn't, I had unique text in every email (tagged to the script) that would
have enabled me to quickly grep for the offending script. An even better clue
would be to put both the URL and the location of the file on your server in
the script output.
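The Squid sketch promised above: a minimal access-control stanza that
keeps a proxy from being open to the world. The address range is
illustrative; substitute your own networks:
# squid.conf - allow only the local networks to use the proxy
acl localnet src 192.168.0.0/16
http_access allow localnet
http_access deny all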
Author Biography
Daniel Klein has been teaching a wide variety of Unix-related subjects
since 1984, and has been involved with Unix since 1976. His experience covers
a broad range of disciplines, including the internals of almost every Unix
kernel released in the past 30 years, real-time process control, compilers and
interpreters, medical diagnostic systems, system security and administration,
web-related systems and servers, graphical user interface management systems,
and a racetrack betting system. He contributes regularly to the proceedings of
the USENIX Association, and is also their tutorial coordinator. He holds a
Masters of Applied Mathematics from Carnegie-Mellon University in Pittsburgh,
and in his free time is director of an a cappella group and a member of an
improvisational comedy troupe.
Conclusion
Spam is a problem that is going to plague us as long as we provide a means
for spammers to send their messages. Crime may not pay, but spam certainly
does (it has been estimated that Jeremy Jaynes was earning $400,000 to
$750,000 per month). While providing an open mail relay or an open web
proxy may be viewed as an exercise in free speech, CGI scripts which allow
email to be sent to any address other than the intended one can only be viewed
as software with a bug. Fixing this particular bug is relatively easy, and has
an immediate beneficial effect. While the overall impact of fixing this
vulnerability on the volume of spam can only be guessed at, it is
unquestionable that the volume will be reduced by doing so.
Footnotes:
Note 1: On September 5, 2006 the
Virginia Court of Appeals (Record No. 1054-05-4) upheld the
felony conviction of Jeremy Jaynes under the state's anti-spamming
law (18.2-152.3:1). The sentence (which was affirmed in
the appeal) was nine years imprisonment.
Note 2: Some of the test emails
also include the correct use of MIME headers and separators, some
of which (through multipart/alternative) appear to be
specifically calculated to hide the original legitimate contents
of the form-generated email.
Note 3: A myriad of email form
scripts get this wrong by forbidding `+' or other legal address
characters. For a complete and correct regular expression for
parsing email addresses, consult Jeffrey Friedl's excellent book
``Mastering Regular Expressions'' (ISBN 0-596-00289-0), or look
at https://examples.oreilly.com/regex/email-opt.pl
Note 4: I use Cricket
(https://cricket.sourceforge.net/) to collect the data, RRDTool to
store it, and drraw (https://web.taranis.org/drraw/) to display it
in ``dashboards.'' Drraw is indispensable, because it allows
unrelated data to be collected in a single view.
Note 5: A webmaster that I
contacted directly indicated that on the date of the attack he
indeed had had a proxy running, but had since taken it offline
due to ``configuration issues.''
Note 6: The open proxies are
perhaps not as lucky. Some of the open proxies I examined had
response delays of 45 seconds or more. While they may be
intentionally throttled to prevent abuse, they are more likely
seriously overloaded by that selfsame abuse.