Return to homepage
Most users of e-mail are troubled by SPAM, but precautions are possible, some of them very simple and cost-free. Ian Hickman explains.
SPiced hAM was a great invention and is very handy if you want a quick meal on your plate. But the modern digital version, "spam", is something no one wants on their computer. The scourge is large and growing, in every country. In May 2003 the volume of spam on the network in the United States exceeded that of non-spam traffic for the first time ever, and where the U.S. leads, the U.K. and the rest of the world follow, sooner or later.
It is, of course, argued by the spammer that spam is simply advertising, which may bring to your attention goods or services that you would like to know about and purchase, but of whose availability you would otherwise have been unaware. However, the objectionable aspect of spam is its indiscriminate distribution. If you are a gardener, then you will find advertisements for seed, spades, compost, derris dust and the like in your favourite gardening magazine entirely appropriate, even if you don’t want to buy any of those items just now. But you would neither want nor expect to see adverts for spinnakers or depth-sounders. An example of just how annoying inappropriate spam can be was forced on my attention recently - an e-mail arrived in my inbox inviting me to view a pornographic movie which, to judge from the title, was singularly unpleasant and depraved. Had I installed one of the various filters available, it would have been blocked. Even without a filter, under my usual routine, which I possibly neglected on that occasion, and of which more later, it would probably never have reached my inbox.
The popularity of spam as a means of advertising is simply due to the economics of the game, as explained in Reference 1. In the U.S., a spammer can buy a list containing several million addresses for less than $100. Suppose he spends a couple of thousand dollars on e-mailing 20 million messages extolling his sixty-dollar product, then with a mere 34 purchases he breaks even, and thereafter is in profit. This is a response rate of less than 0.0002%, whereas a direct mail advertising campaign needs a response rate in the region of one and a half or two percent to cover its costs.
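The break-even arithmetic above can be checked in a few lines. This is only a sketch: it takes "a couple of thousand dollars" to mean exactly $2000 and treats the whole sixty-dollar price as profit, both of which are simplifying assumptions rather than figures from Reference 1.

```python
import math

# Figures quoted in the paragraph above.
campaign_cost = 2000         # dollars; "a couple of thousand", assumed to be exactly 2000
messages_sent = 20_000_000
product_price = 60           # dollars per sale, assumed here to be pure profit

# Smallest whole number of sales whose revenue covers the campaign cost.
break_even_sales = math.ceil(campaign_cost / product_price)

# Response rate needed to break even, as a percentage of messages sent.
break_even_rate_pct = 100 * break_even_sales / messages_sent

print(break_even_sales)                # 34
print(f"{break_even_rate_pct:.5f}%")   # 0.00017%
```

On these assumptions the spammer needs a response rate roughly ten thousand times lower than the one and a half to two percent quoted for direct mail.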
So just where do spammers get their lists of addresses from? Professional readers of this magazine will probably also receive several "freebie" electronic magazines, ones with no cover price, being supported entirely by advertising. As with many other magazines and advertisements, one is encouraged to supply one's e-mail address when replying to advertisers. I know from experience that these magazines sell their address lists to one another, with the result that one can receive duplicate copies with identical address details; the other day I actually received four identical copies of one such magazine! Whether the e-mail addresses are for sale or not I cannot say, but certainly the sale of circulation lists is widespread, despite the efforts of an audit bureau.
However, technology provides an alternative way of discovering e-mail addresses (Reference 2). This is the "dictionary attack". A spammer, or someone wishing to generate lists of e-mail addresses for sale, opens a connection to a mail server and, using software specially designed for the purpose, sends messages to millions of randomly generated e-mail addresses. The vast majority are of course unknown, and - conveniently for the spammer - flagged as such by the server. Those not meeting with such a reply are added to the list. Some modern mail servers are programmed to look for large numbers of messages to unknown e-mail addresses all from the same source, but on a large server, such a "dictionary attack" may simply be lost in the vast amount of traffic handled. In January, it was announced that two well-known servers had been under attack from servers in Beijing which were being operated from the United States. These attacks had been in progress for months, and must have accumulated huge numbers of e-mail addresses. The victims will never know how their e-mail addresses were obtained.
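The server-side check mentioned above, looking for many messages to unknown addresses from one source, might be sketched as follows. The function name, the log format (pairs of source and recipient) and the threshold are all hypothetical, chosen only for illustration; a real mail server would work from its own logs and a far higher threshold.

```python
from collections import Counter

def flag_dictionary_attackers(delivery_log, known_addresses, threshold=3):
    """Return the sources with at least `threshold` attempts to unknown addresses.

    delivery_log is an iterable of (source, recipient) pairs (a hypothetical format).
    """
    unknown_attempts = Counter(
        source
        for source, recipient in delivery_log
        if recipient not in known_addresses
    )
    return {source for source, count in unknown_attempts.items() if count >= threshold}

# Made-up sample data, using addresses reserved for documentation.
known = {"alice@example.com", "bob@example.com"}
log = [
    ("198.51.100.7", "aaron@example.com"),
    ("198.51.100.7", "abby@example.com"),
    ("198.51.100.7", "alice@example.com"),   # a lucky guess: this one exists
    ("198.51.100.7", "adam@example.com"),
    ("203.0.113.9", "bob@example.com"),
    ("203.0.113.9", "bobb@example.com"),     # a single typo, not an attack
]

print(flag_dictionary_attackers(log, known))   # {'198.51.100.7'}
```

The sketch also shows the weakness noted in the text: an attacker who spreads his guesses thinly, or over many sources, stays under any fixed threshold.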
In the past, the small amount of spam I received was easily dealt with, but that blessed state is fast disappearing. What can one do about spam, if, as in many cases, the sheer amount is an intolerable burden? The traditional approaches are to reject any messages from known spammers or the mail servers that they use, and/or to reject any messages containing words typical of spam, such as "viagra", "low interest rates" etc. But these are of limited use. Spammers counter these ploys in various ways, such as changing their addresses or the servers they use. And simple keyword filters tread a fine line: too gentle, and they let spam through with the wanted messages; too aggressive, and they throw out the baby with the bathwater.
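The fine line trodden by a keyword filter is easily demonstrated. The sketch below is a deliberately naive filter of my own devising, not taken from any product; the phrase list and the sample messages are made up.

```python
# Phrases taken from the examples in the text; a real list would be far longer.
SPAM_PHRASES = ("viagra", "low interest rates")

def looks_like_spam(message, phrases=SPAM_PHRASES):
    """Naive keyword filter: flag the message if it contains any listed phrase."""
    text = message.lower()
    return any(phrase in text for phrase in phrases)

# Caught, as intended.
print(looks_like_spam("Cheap VIAGRA, order today"))                        # True

# Missed: the spammer has simply disguised the keyword.
print(looks_like_spam("V1agra at l0w prices"))                             # False

# False positive: a legitimate message that happens to use a listed phrase.
print(looks_like_spam("We confirm the low interest rates on your mortgage"))  # True
```

Adding more phrases catches more spam but also discards more wanted mail, which is exactly the baby-and-bathwater problem.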
In the last year, more sophisticated spam filter programs have become available, based on the probability theory elaborated by the 18th century mathematician Thomas Bayes. A table of open source Bayesian filtering software is given on page 42 of Ref. 1, with the URL of each, details of the language in which it is written and a comment - such as "Originally written by open-source guru Eric Raymond". This type of code is being incorporated in products from Microsoft, Netscape and other sources. The URLs are given here as Appendix 1. A Bayesian filter looks at the whole message, rather than simply looking out for keywords. Following training - you go through your inbox and tell it which messages are spam and which are not - it forms rules for itself, which it applies when scanning new e-mails. These are subsequently refined each time you rescue an e-mail the filter has wrongly classed as spam, or tell it that something it let through really is spam after all. The great worry about spam filters is the "false positive" problem: a message you would like to receive gets discarded. One expert in the field claims that, even when set for zero false positives, the best Bayesian filters can still discard 99.5% of spam.
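To give a feel for how such a filter learns from training, here is a minimal sketch of a "naive Bayes" classifier. It is a textbook simplification with add-one smoothing, not the method of any filter listed in Ref. 1; the class name, the crude splitting on white space and the training messages are all my own assumptions.

```python
import math
from collections import Counter

class TinyBayesFilter:
    """Toy naive Bayes spam filter: word counts per class, add-one smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record one message the user has marked as 'spam' or 'ham' (wanted)."""
        self.word_counts[label].update(text.lower().split())
        self.message_counts[label] += 1

    def spam_probability(self, text):
        """Estimate P(spam | text) from the word counts seen in training."""
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_score = {}
        for label in ("spam", "ham"):
            # Prior: smoothed share of training messages carrying this label.
            score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                count = self.word_counts[label][word]
                # Add-one smoothing, so an unseen word never gives probability zero.
                score += math.log((count + 1) / (total_words + len(vocabulary) + 1))
            log_score[label] = score
        # Turn the two log scores into a probability between 0 and 1.
        return 1 / (1 + math.exp(log_score["ham"] - log_score["spam"]))

# Training: the user sorts a (made-up) inbox into spam and wanted mail.
spam_filter = TinyBayesFilter()
spam_filter.train("buy cheap pills now", "spam")
spam_filter.train("cheap loans low interest", "spam")
spam_filter.train("meeting agenda for monday", "ham")
spam_filter.train("lunch on monday", "ham")

print(spam_filter.spam_probability("cheap pills") > 0.5)      # True
print(spam_filter.spam_probability("monday meeting") > 0.5)   # False
```

Rescuing a wrongly discarded message corresponds to calling train() on it with the label "ham", which shifts the word counts and hence all later scores. Setting the filter "for zero false positives" amounts to discarding only messages whose score is very close to 1.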
But filtering at the recipient’s level does not help the ISP. Spam is as unpopular with Internet Service Providers as it is with the unwilling recipients, since it uses up a lot of an ISP’s bandwidth and disk space. So most ISPs now use server-based software to filter e-mail before moving it around the network to individuals’ inboxes. These spam filters, of various sorts, can be quite effective. For example, AOL deletes about 2 400 000 000 items of spam daily, representing about 70 messages aimed at each of its customers’ inboxes each day. It is, of course, just possible that an e-mail you sent never arrives at its intended destination, having been erroneously classified by an ISP as spam.
The financial burden on the I.T. industry is great. The EU’s Commissioner for Enterprise and the Information Society says research has shown that spam costs European businesses 2.5 billion euros at present, while the estimated cost in 2003 to US-based ISPs and corporations is $10 billion. Part of the problem is that spammers go to great lengths to disguise their messages as legitimate traffic, and to disguise their identity and location. In the UK, the London-based non-profit anti-spam organisation Spamhaus produces an hourly updated block-list of sources of spam, which is used to protect the accounts of over 140 million e-mail users. The American anti-spam organisation Brightmail runs a network of e-mail addresses as deliberate "victims". When a message is caught on a number of such addresses, it is likely to be spam and filters at the server can be updated to block the source. Brightmail protects the broadband customers of BT Openworld amongst others, and updates its filtering rules every five to ten minutes.
Spam is not yet illegal, although Microsoft, amongst others, has taken legal action against specific spammers. In Europe, there is a Privacy and Electronic Communications Directive, under which governments are required to implement anti-spam legislation by 31st October 2003, and similar legislation is in the pipeline in the United States. How effective it will be remains to be seen. In Europe, the intention is that a spammer will only be able to use e-mail for direct marketing to individuals who have signified their willingness to receive it, the "opt-in" strategy. In the USA, where the direct marketing lobby is powerful, legislation will probably require individuals to "opt-out" from each source, effectively still allowing bulk mailing or spam.
But whether you live in the US, UK or elsewhere, never accept a spammer’s invitation to unsubscribe from his site, by replying "remove" or in any other way. To do so alerts the spammer that your e-mail address is not only "real" (not rejected as unknown by your ISP’s server), but also "active", i.e. visited by its owner. Lists of active e-mail addresses are much more valuable than others, and the spammer can sell such a list, at a huge profit, to other spammers, who will in turn themselves sell it on. A real e-mail address may be inactive for many reasons. The owner may have activated a second e-mail address with the same ISP, or changed to another ISP, e.g. to upgrade to broadband, or just to get away from spammers. If you have a bad spam problem, you can always change to a new e-mail address and only notify your family and friends, legitimate business contacts and firms you want to deal with.
Also, if you have your own website, consider very carefully before including your e-mail address on it. **** You are making a gift to spammers and sellers of lists, who have software that crawls the web, looking for "*@*.*" where * is any string, since anything of that form is almost certainly an e-mail address. Likewise, when contacting e-mail boards and newsgroups, use a different e-mail address from your main one. If, for business reasons, you wish strangers such as potential customers to be able to contact you, you can hire a PO Box number from the P.O. Business Sales Centre, 08457 7950950. This costs £43.00 or £53.00 for six or twelve months respectively if you collect your own mail from your local Post Office, or twice that amount if you want the mail delivered. Alternatively, for people on the move, there is the free Poste Restante service, but this can only be used for up to three months in any one town.
At present, my own approach is to view what e-mail there is waiting for me at my inbox at the ISP. This is done using their "webmail" service, accessed over the web with Internet Explorer. ISPs provide webmail as a "wrapper" around the e-mail they have for you, so you can see what there is and delete any you don’t like the sound of, before ever downloading any of it into your computer. If you cannot decide from the title whether it is a message you want or not, my ISP (like others, I guess) can make the content available as a simple text message to view on my screen, which does not entail actually running the code, even at the ISP end of the link. This prevents any undesirable item - virus, worm, trojan or just plain spam - entering my computer. Or at least, it almost always does. Immediately after "sieving" the ISP-end inbox and deleting any unwanted items, I then download my email, with Outlook Express. There is, however, always the possibility that an unwanted item, spam or worse, arrived in the ISP-end inbox after sieving with webmail, but before downloading with Outlook Express. This could lead to a vulnerability, if the item is opened, should it be not just spam, but an infection of some sort. Outlook Express is often set up so that below the list of the latest e-mails received, there is a "preview pane". This displays the latest message received. But beware, in order to do so, it has to open the file. If the file contains a virus or somesuch, you could be in trouble. Even if you downloaded several messages, of which the infective one was not the last, accidentally clicking on its title in the inbox window will display it, and the damage is done.
This danger can be prevented by going to "View" in the Outlook Express toolbar, and from the dropdown menu, selecting "Layout". This will bring up the "Windows Layout Properties" box, and you can then unselect the "Show preview pane" box. Now, just clicking on the title of an item in the inbox window (for example, to transfer it to the Deleted Items box) will not open it. But beware, double clicking the title will still open the file, in a new window. A useful precaution is to keep your Deleted Items box empty. For further protection, the firewall ZoneAlarm can be downloaded for free from www.zonelabs.com. This will warn you every time someone tries to access your machine, and give you the option to allow or block the request. However, unless you have a very fast processor, it will slow down your machine noticeably if set to the highest level of protection.
There is still a downside to accidentally opening a message, even if it is only spam. Many spammers will include a "picture" in the message, designed to auto-display, either in the preview pane or a new window. You usually won’t see these pictures, as they consist of but a single pixel. They are known as web bugs, and can be found in web pages, HTML formatted e-mail messages or other web-aware documents. Although there are legitimate uses for them, they are frequently misused by spammers. The display of a web bug triggers a "phone home" return message to the spammer, confirming that he has reached a real active e-mail address and providing him with statistics on who is looking at the bugged item. Not viewing the message denies him this confirmation, and he may as a result delete your address from his list. This is another reason for disabling the preview pane. Other useful steps include setting Outlook Express to the Restricted Zone and disabling both session and non-session cookies. Some firewalls usefully include advertisement-banning features. For example, "Outpost" from Agnitum can be set to prevent graphics of certain sizes or from known sources being loaded.
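For the curious, a web bug is easy to spot in the HTML source of a message. The following sketch uses Python's standard HTML parser to list images declared as one pixel by one pixel. The class and function names and the sample message are hypothetical, and a real bug may hide its size in other ways (style sheets, for instance), so this illustrates the idea and is not a reliable detector.

```python
from html.parser import HTMLParser

class WebBugFinder(HTMLParser):
    """Collect the sources of <img> tags declared as 1 x 1 pixel."""

    def __init__(self):
        super().__init__()
        self.suspects = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        # Only an explicit width="1" height="1" is checked in this sketch.
        if attributes.get("width") == "1" and attributes.get("height") == "1":
            self.suspects.append(attributes.get("src"))

def find_web_bugs(html_body):
    """Return the src of every 1 x 1 image found in the given HTML text."""
    finder = WebBugFinder()
    finder.feed(html_body)
    return finder.suspects

# Made-up message body: one tracking pixel and one ordinary logo.
sample = (
    '<p>Special offer!</p>'
    '<img src="http://tracker.example/bug.gif?id=12345" width="1" height="1">'
    '<img src="logo.gif" width="120" height="40">'
)

print(find_web_bugs(sample))   # ['http://tracker.example/bug.gif?id=12345']
```

Note the identifier tacked onto the image address in the sample: when the mail program fetches the image, the spammer's server learns which copy of the message was opened.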
Whilst precautions of this sort suffice, in my case, at least for the moment, a spam filter of some sort will almost certainly become necessary, sooner or later. In addition to the list of filters in Ref. 1, already mentioned, Ref. 2 provoked some interesting correspondence, see Ref. 3. One of the letters recommends the SpamBayes filter, which can be found at spambayes.sourceforge.net, as very suitable for use with MS Outlook Express, stating that it appears to be about 99% accurate. Nobody likes spam, and that goes for geeks, too. They "fixed" one spammer, who thoughtlessly gave away his postal address, so that he received hundreds of pounds weight of junk mail delivered to his door daily, making it almost impossible for him to sort out his wanted post! Details at Slashdot, the world’s leading geek website. Details of some other interesting websites are given in Appendix 2.
The author is grateful for the aid, in the preparation of this article, provided by London-based I.T. and Computing Consultant Paul D. Zambon B.Sc. (contact care of Electronics World).
1 "Saving Private E-mail", Steven J. Vaughan-Nichols, Spectrum (Journal of the IEEE), August 2003, pp 40 – 44.
2 "WHAM, BAM – YOU’VE GOT SPAM", Roger Dettmer, The IEE Review, September 2003, pp 38 – 41.
3 The IEE Review, October 2003, p4.
Open-source Bayesian e-mail filtering software may be found at the following sites:-
Some other useful websites:-
1) www.spamabuse.org - organisation against spam
2) http://www.mail-abuse.org/ - runs special black lists of known spamming addresses and acts as an anti spam co-ordinator
3) http://spam.abuse.net/ - another organisation with a no-spam vision of the world
4) http://www.bugnosis.org/ - software for detecting web bugs on html pages (however doesn’t work in an email client)
5) http://www.eff.org/Privacy/Marketing/web_bug.html - web bug info
6) http://www.leave-me-alone.com/webbugs.htm - another useful faq.
**** Following publication of this article, Douglas Self wrote to Electronics World with an excellent tip for those who wish to include their email address on a website, without the risk of it being picked up by web-crawler programs looking for email addresses for sale to spammers. It is this: do not include the email address in the html code. Instead, incorporate it as a .gif file. Viewers won't be able to open an email to you automatically, by simply clicking on it, but they can still contact you by entering the email address into Outlook Express manually. More importantly, it will be invisible to programs looking for text strings like *@*.* in the html code.
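Why the tip works can be seen from a sketch of what an address-harvesting program does. The pattern below is my own rough equivalent of the "*@*.*" search described in the article, not the code of any actual harvester, and the two sample pages and the file name contact.gif are made up.

```python
import re

# Rough equivalent of "*@*.*": something, an @ sign, then a dotted domain name.
ADDRESS_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def harvest_addresses(html_source):
    """Return every string in the page source that looks like an e-mail address."""
    return ADDRESS_PATTERN.findall(html_source)

# The address appears as text in the html code: found twice.
page_with_text = (
    '<p>Contact: <a href="mailto:someone@example.com">someone@example.com</a></p>'
)

# The address exists only as pixels inside a .gif file: nothing to find.
page_with_image = '<p>Contact: <img src="contact.gif" alt="e-mail address"></p>'

print(harvest_addresses(page_with_text))    # ['someone@example.com', 'someone@example.com']
print(harvest_addresses(page_with_image))   # []
```

The protection depends on the address appearing nowhere in the page source, so it must not be repeated in the image's alternative text or in a mailto link.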
This page prepared and maintained by Ian Hickman Partners (Eur. Ing. D.I. H. May BSc.Hons, C.Eng, MIEE, MIEEE, and D. M. May B.A.Hons, A.I.L.)