SpamTUNNEL (2003)
As I recall, over the New Year’s break of 2002-2003 I got interested in spam and statistical filtering, so I wrote SpamTUNNEL and released it as freeware. It got far more attention than I anticipated: it brought me contacts from the MIT Media Lab and was cited in several papers.
The timing was not an accident. Paul Graham had published “A Plan for Spam” in August 2002, and the idea that you could beat spam with simple word-frequency statistics rather than hand-written rules was suddenly everywhere. I wanted to try it myself, and I wanted it to work with whatever mail client I happened to be using, without a plugin. So instead of writing a filter for one program, I wrote a small proxy that would sit between any mail client and the mail servers and filter everything passing through.
What it was
SpamTUNNEL was a transparent local mail proxy. You pointed your mail client at localhost
instead of your real POP3 and SMTP servers, and SpamTUNNEL relayed each session to the
upstream server while inspecting the mail in transit. Because it worked at the protocol
level, it filtered mail for any client without needing an add-on.
It was a Java 1.1 application with a Swing interface, built in Borland JBuilder and shipped
as a native Windows executable. The launcher was a small Borland C++ stub that found a JVM,
set the class path, and ran the main class, with the compiled .class files appended to the
executable as a ZIP archive. It was packaged with Inno Setup. I released it under “Calypso
Soft” (the shipped strings spell it “Calipso”), starting at version 0.1 in January 2003 and
later reaching version 1.01. Because it was Java, it ran on Windows, Linux, Mac OS X and
Solaris.
How it worked
Two independent proxy servers ran at the same time, both built on a shared socket base class:
- Inbound (POP3): filtered and tagged incoming mail as it was retrieved.
- Outbound (SMTP): relayed outgoing mail, and doubled as a control channel for training and configuration.
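A minimal sketch of what such a shared base class could look like. It is written in modern Java rather than the Java 1.1 of the original, and the names (TunnelServer, session, pipe) and constructor parameters are hypothetical. The point is the shape: one listening socket per tunnel, one thread per client, and a fresh upstream connection for every session.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical shared base class for the POP3 and SMTP tunnels: it listens on a
// local port and, for every client that connects, opens a fresh connection to
// the real upstream server and hands both sockets to a protocol-specific session.
abstract class TunnelServer {
    private final int listenPort;
    private final String upstreamHost;
    private final int upstreamPort;

    TunnelServer(int listenPort, String upstreamHost, int upstreamPort) {
        this.listenPort = listenPort;
        this.upstreamHost = upstreamHost;
        this.upstreamPort = upstreamPort;
    }

    // Accept loop: one thread per client session, running until the process exits.
    void serve() throws IOException {
        try (ServerSocket server = new ServerSocket(listenPort)) {
            while (true) {
                Socket client = server.accept();
                new Thread(() -> runSession(client)).start();
            }
        }
    }

    private void runSession(Socket client) {
        // Both sockets are closed when session() returns or throws.
        try (Socket c = client;
             Socket upstream = new Socket(upstreamHost, upstreamPort)) {
            session(c, upstream);
        } catch (IOException e) {
            // A failed session is dropped; the accept loop keeps serving others.
        }
    }

    // POP3 or SMTP logic: read commands from the client, relay them upstream,
    // and rewrite or intercept traffic where needed.
    protected abstract void session(Socket client, Socket upstream) throws IOException;

    // Copies bytes in one direction until end of stream; returns the byte count.
    static long pipe(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[4096];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            out.flush();
            total += n;
        }
        return total;
    }
}
```

A POP3 subclass would override session() to read the client's commands line by line, while a pure pass-through could simply run pipe() in each direction on two threads.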
One decision I still like: the tool never deleted mail. Its verdict was expressed as an added header and an optional subject tag, and the final disposition was left to the user’s own client-side rules.
Inbound filtering
When the mail client connected to the POP3 tunnel on port 110, it was greeted with
+OK Calipso Soft pop3 tunnel/mailfilter. The tunnel proxied the USER and PASS
authentication straight through to the real POP3 server. On a RETR, it fetched the message
from upstream, ran it through the classifier, and then injected a verdict header before
handing it to the client:
- X-SpamTUNNEL: SPAM MAIL (flagged mail)
- X-SpamTUNNEL: clear mail (mail that passed)
For flagged mail it could also prefix the subject line with a marker (THIS IS SPAM by default).
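Tagging a message this way is a small text transformation. The helper below is hypothetical (VerdictTagger.tag is not taken from the original source). It assumes the POP3 framing (dot-stuffing and the terminating "." line) has already been stripped, and it places the verdict header at the end of the header block, a position chosen for the sketch.

```java
// Hypothetical helper for the POP3 tunnel: adds the verdict header to a
// retrieved message and, for spam, optionally prefixes the subject line.
// It works on the message text after the POP3 multi-line framing
// (dot-stuffing and the final "." line) has been removed.
final class VerdictTagger {
    static String tag(String message, boolean spam, boolean tagSubject, String marker) {
        // The header block ends at the first empty line.
        int split = message.indexOf("\r\n\r\n");
        String headers = split < 0 ? message : message.substring(0, split);
        String rest = split < 0 ? "\r\n" : message.substring(split);

        StringBuilder out = new StringBuilder();
        for (String line : headers.split("\r\n")) {
            if (spam && tagSubject && line.regionMatches(true, 0, "Subject:", 0, 8)) {
                line = "Subject: " + marker + " " + line.substring(8).trim();
            }
            out.append(line).append("\r\n");
        }
        // The verdict goes last in the header block; the blank line and body follow unchanged.
        out.append("X-SpamTUNNEL: ").append(spam ? "SPAM MAIL" : "clear mail");
        return out.append(rest).toString();
    }
}
```

Nothing is removed from the message, which matches the design choice of leaving the final disposition to the client's own rules.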
Standard POP3 verbs (STAT, LIST, UIDL, TOP, DELE, QUIT, and the rest) were
proxied transparently. Anything it did not recognise got -ERR Command not understood.
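The per-command decision can be sketched as a dispatcher that sorts each client line into forward, filter, or reject. The class and enum names are hypothetical, and the pass-through set is an assumption (a subset of the standard commands in RFC 1939); in this sketch only RETR is intercepted for classification.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical dispatcher for the POP3 tunnel: sorts each client command line
// into "forward upstream unchanged", "fetch and classify", or "reject locally".
final class Pop3Dispatch {
    static final String GREETING = "+OK Calipso Soft pop3 tunnel/mailfilter";
    static final String UNKNOWN = "-ERR Command not understood";

    enum Action { FORWARD, FILTER, REJECT }

    // Assumed pass-through set: a subset of the RFC 1939 commands.
    private static final Set<String> PASSTHROUGH = Set.of(
            "USER", "PASS", "STAT", "LIST", "UIDL", "TOP", "DELE", "NOOP", "RSET", "QUIT");

    static Action classify(String line) {
        String verb = line.trim().split("\\s+", 2)[0].toUpperCase(Locale.ROOT);
        if (verb.equals("RETR")) return Action.FILTER;      // classify, then add the verdict header
        if (PASSTHROUGH.contains(verb)) return Action.FORWARD;
        return Action.REJECT;                                // reply with UNKNOWN
    }
}
```

On REJECT the tunnel would answer the client with UNKNOWN itself, without involving the upstream server.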
The classifier
The core was a Bayesian-style token-frequency filter, the approach Graham had popularised a few months earlier. It split the message text into tokens on punctuation and whitespace, lower-cased every token, and kept only those between 3 and 28 characters long, discarding shorter and longer ones as noise.
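The tokenising rules translate almost directly into code. In this sketch "punctuation" is assumed to mean ASCII punctuation (Java's \p{Punct} class); exactly which characters the original treated as separators isn't recorded, and the Tokenizer name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class Tokenizer {
    static final int MIN_LEN = 3;   // shorter tokens are discarded as noise
    static final int MAX_LEN = 28;  // longer tokens are discarded as noise

    // Split on runs of whitespace and ASCII punctuation, lower-case each piece,
    // and keep only tokens within the length limits.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[\\s\\p{Punct}]+")) {
            String t = raw.toLowerCase(Locale.ROOT);
            if (t.length() >= MIN_LEN && t.length() <= MAX_LEN) tokens.add(t);
        }
        return tokens;
    }
}
```

For example, "Hello, WORLD! a bb free-money" yields hello, world, free and money.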
It maintained two persistent word-count tables stored as Java Properties files: good.lst
for ham (legitimate mail) and bad.lst for spam. Ham counts were weighted by a multiplier
(1.2 by default) to bias the filter against false positives, because treating a real mail as
spam is more costly than letting a spam through. The message score was compared against a
threshold (20 by default, spelled “treshold” in the code, one of a few spelling quirks I
shipped and never fixed).
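How SpamTUNNEL turned the two tables into the score it compared against 20 isn't described here, so the sketch below substitutes a different, published technique: the per-token probability and combining rule from Graham's "A Plan for Spam", with the ham weight as a parameter (Graham used 2; SpamTUNNEL's default multiplier was 1.2). It yields a spam probability between 0 and 1, which is not on the same scale as SpamTUNNEL's threshold, so no threshold appears in the code. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Substitute technique: Paul Graham's per-token probabilities and combining
// rule, with a configurable ham weight. Returns a probability in (0, 1),
// not a score on SpamTUNNEL's own scale.
final class GrahamScorer {
    private final Map<String, Integer> good, bad; // token -> count, as in good.lst / bad.lst
    private final int nGood, nBad;                // number of training mails of each kind
    private final double hamWeight;

    GrahamScorer(Map<String, Integer> good, Map<String, Integer> bad,
                 int nGood, int nBad, double hamWeight) {
        this.good = good;
        this.bad = bad;
        this.nGood = Math.max(1, nGood); // avoid dividing by zero on an empty corpus
        this.nBad = Math.max(1, nBad);
        this.hamWeight = hamWeight;
    }

    // Probability that a mail containing this token is spam.
    double tokenProb(String token) {
        double g = hamWeight * good.getOrDefault(token, 0);
        double b = bad.getOrDefault(token, 0);
        if (g + b < 5) return 0.4; // unseen or too rare: Graham's default for unknown words
        double pg = Math.min(1.0, g / nGood);
        double pb = Math.min(1.0, b / nBad);
        return Math.max(0.01, Math.min(0.99, pb / (pg + pb)));
    }

    // Combine the 15 distinct tokens whose probabilities lie furthest from 0.5.
    double spamProbability(Collection<String> tokens) {
        List<Double> probs = new ArrayList<>();
        for (String t : new HashSet<>(tokens)) probs.add(tokenProb(t));
        probs.sort((x, y) -> Double.compare(Math.abs(y - 0.5), Math.abs(x - 0.5)));
        double spam = 1.0, ham = 1.0;
        for (double p : probs.subList(0, Math.min(15, probs.size()))) {
            spam *= p;
            ham *= 1.0 - p;
        }
        return spam / (spam + ham);
    }
}
```

Weighting ham counts up pushes borderline tokens below 0.5: with 10 training mails of each kind, a word seen 4 times in each corpus scores about 0.45 with a weight of 1.2, instead of exactly 0.5 with no weighting.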
Wrapped around the statistical pass was a whitelist and blacklist check. The sender address
and server were parsed from the From: header and checked against a list file. A whitelist
match forced a “good” verdict and a blacklist match forced “bad”, in either case
short-circuiting the statistics. List entries could be full addresses or whole domains
(@server.com matched everything at that domain).
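A sketch of the list check. It assumes separate whitelist and blacklist entries (how the single list file marked which was which isn't recorded) and assumes the whitelist is consulted first; the names are hypothetical. The address extraction is deliberately rough, since real From: headers can be far messier.

```java
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SenderLists {
    enum Verdict { GOOD, BAD, UNDECIDED }

    // Rough extraction: the part inside <...> if present, otherwise the first
    // run of non-space characters that contains an '@'.
    private static final Pattern ADDRESS =
            Pattern.compile("<([^>]+)>|([^\\s<>\"]+@[^\\s<>\"]+)");

    static String senderAddress(String fromHeader) {
        Matcher m = ADDRESS.matcher(fromHeader);
        if (!m.find()) return null;
        String addr = m.group(1) != null ? m.group(1) : m.group(2);
        return addr.trim().toLowerCase(Locale.ROOT);
    }

    // An entry is either a full address ("name@host") or a whole domain ("@host").
    static boolean matches(String address, String entry) {
        String e = entry.trim().toLowerCase(Locale.ROOT);
        return e.startsWith("@") ? address.endsWith(e) : address.equals(e);
    }

    static Verdict check(String fromHeader, List<String> whitelist, List<String> blacklist) {
        String addr = senderAddress(fromHeader);
        if (addr == null) return Verdict.UNDECIDED;
        for (String e : whitelist) if (matches(addr, e)) return Verdict.GOOD;
        for (String e : blacklist) if (matches(addr, e)) return Verdict.BAD;
        return Verdict.UNDECIDED; // fall through to the statistical filter
    }
}
```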
Training
The filter learned from a labelled corpus. You put known-good messages in one folder and
known-spam in another, and the trainer read every file, tokenised it under the same rules as
the filter, and accumulated the counts into good.lst and bad.lst. It reported progress
as Learned from N good mails and Learned from N bad mails.
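The training pass amounts to counting tokens per folder and writing the totals back as a Properties file. In this sketch the method names, the file encodings and the assumption that each file holds exactly one mail are choices of the sketch rather than details of the original; the tokenising rules repeat the filter's so that both sides see the same words.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.Properties;

final class Trainer {
    // Adds the tokens of every file in `folder` to the counts in `table`, using
    // the filter's rules (split on whitespace/punctuation, lower case, keep
    // 3..28 characters), and returns the number of mails read.
    static int learn(Path folder, Properties table) throws IOException {
        int mails = 0;
        try (DirectoryStream<Path> dir = Files.newDirectoryStream(folder)) {
            for (Path file : dir) {
                if (!Files.isRegularFile(file)) continue;
                // ISO-8859-1 maps every byte to a char, so odd encodings never fail to decode.
                String text = new String(Files.readAllBytes(file), StandardCharsets.ISO_8859_1);
                for (String raw : text.split("[\\s\\p{Punct}]+")) {
                    String t = raw.toLowerCase(Locale.ROOT);
                    if (t.length() < 3 || t.length() > 28) continue;
                    int n = Integer.parseInt(table.getProperty(t, "0"));
                    table.setProperty(t, Integer.toString(n + 1));
                }
                mails++;
            }
        }
        return mails;
    }

    // Loads a word table (java.util.Properties format), trains it on one folder and saves it.
    static int train(Path folder, Path listFile, String label) throws IOException {
        Properties table = new Properties();
        if (Files.exists(listFile)) {
            try (Reader r = Files.newBufferedReader(listFile, StandardCharsets.UTF_8)) {
                table.load(r);
            }
        }
        int mails = learn(folder, table);
        try (Writer w = Files.newBufferedWriter(listFile, StandardCharsets.UTF_8)) {
            table.store(w, "word counts");
        }
        System.out.println("Learned from " + mails + " " + label + " mails");
        return mails;
    }
}
```

Calling train(goodFolder, good.lst, "good") and then train(spamFolder, bad.lst, "bad") would print the same style of progress messages the original reported.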
The control channel
The part I was most pleased with: you could train and configure the tool from any mail
client, with no separate UI, by sending mail to spamtunnel@localhost. The SMTP tunnel
noticed that recipient, refused to deliver it, and instead handed the body to a command
handler that parsed instructions like good, bad, whitelist, and blacklist, updating
the learning tables and the list file. So retraining the filter on a misclassified message
was just a matter of forwarding it to a local address.
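The control channel needs two pieces: spotting the special recipient in the SMTP dialogue, and parsing the command out of the body. The exact command syntax isn't recorded, so the format below is an assumption, as are the names: the command word sits on the first line of the body, a whitelist or blacklist entry follows on that same line, and for good or bad the forwarded mail follows on the next lines.

```java
import java.util.Locale;

// Hypothetical sketch of the SMTP control channel; the command syntax is assumed.
final class ControlChannel {
    static final String CONTROL_ADDRESS = "spamtunnel@localhost";

    enum Verb { GOOD, BAD, WHITELIST, BLACKLIST }

    static final class Command {
        final Verb verb;
        final String argument;
        Command(Verb verb, String argument) { this.verb = verb; this.argument = argument; }
    }

    // True if an SMTP "RCPT TO:" line targets the control address, in which case
    // the tunnel keeps the mail instead of relaying it upstream.
    static boolean isControlRecipient(String rcptLine) {
        String line = rcptLine.trim().toLowerCase(Locale.ROOT);
        if (!line.startsWith("rcpt to:")) return false;
        String addr = line.substring("rcpt to:".length()).trim();
        if (addr.startsWith("<") && addr.endsWith(">")) addr = addr.substring(1, addr.length() - 1);
        return addr.equals(CONTROL_ADDRESS);
    }

    // good/bad: the rest of the body is the (forwarded) mail to learn from.
    // whitelist/blacklist: the rest of the first line is the list entry.
    // Returns null for anything it does not recognise.
    static Command parse(String body) {
        String[] lines = body.trim().split("\r?\n", 2);
        String[] head = lines[0].trim().split("\\s+", 2);
        Verb verb;
        try {
            verb = Verb.valueOf(head[0].toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return null;
        }
        boolean learn = verb == Verb.GOOD || verb == Verb.BAD;
        String arg = learn ? (lines.length > 1 ? lines[1] : "")
                           : (head.length > 1 ? head[1].trim() : "");
        return new Command(verb, arg);
    }
}
```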
Configuration
Everything lived in plain text files. servers.ini held the network settings: the upstream
SMTP host and port, the local POP3 listen port, an optional path to a mail client to launch
on startup, and a flag to exit when that client closed. spamtunnel.ini held the filter
behaviour: token length limits, the ham weighting multiplier, the spam threshold, the
subject marker text, and the persistence interval. The word tables and the address list were
the three .lst files.
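Reading spamtunnel.ini could look like the following, assuming the file uses simple key=value lines that java.util.Properties can parse. The key names and the FilterConfig class are hypothetical; only the default values (3, 28, 1.2, 20 and THIS IS SPAM) come from the description above. The persistence interval is left out because its default isn't given.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Properties;

// Hypothetical loader for spamtunnel.ini. Key names are invented for the sketch;
// the defaults are the ones described in the text.
final class FilterConfig {
    final int minTokenLength;
    final int maxTokenLength;
    final double hamWeight;
    final double threshold;
    final String subjectMarker;

    FilterConfig(Reader ini) throws IOException {
        Properties p = new Properties();
        p.load(ini);
        minTokenLength = Integer.parseInt(p.getProperty("minTokenLength", "3").trim());
        maxTokenLength = Integer.parseInt(p.getProperty("maxTokenLength", "28").trim());
        hamWeight = Double.parseDouble(p.getProperty("hamWeight", "1.2").trim());
        threshold = Double.parseDouble(p.getProperty("threshold", "20").trim());
        subjectMarker = p.getProperty("subjectMarker", "THIS IS SPAM");
    }
}
```

With this scheme, a file containing only threshold = 25 would raise the threshold and leave every other setting at its default.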
Limitations
It was very much a tool of its moment, and it had real limits:
- Because it classified mail only after fetching it from the POP3 server, it did not save download bandwidth. The filtering was purely local. Contemporary users pointed this out.
- It spoke only classic unencrypted POP3 and SMTP on ports 110 and 25. It predates the general move to TLS and has no SSL handling at all.
- Period-typical desktop Java: 1.1 bytecode, Swing, JBuilder.
There are also small fingerprints of me all over it. The protocol strings switch into
Romanian at the friendly moments: it says La revedere !!! (“goodbye”) on QUIT. And a
handful of misspellings (“treshold”, “congif”, “noboby”) are preserved in the shipped
strings.
References
The technique it was built on:
- Paul Graham, “A Plan for Spam” (August 2002).
Academic mentions:
- M. Franciosi, BSc thesis, University of Pavia (netlab), which lists SpamTunnel among free POP3-proxy spam filters, notes it had reached version 1.01, and points out that its flexibility came at the cost of ease of use: MFranciosi_Thesis.pdf
- Universidad Politécnica de Madrid, which lists SpamTunnel (in Java) alongside SpamBayes, POPFile and PASP as local POP3 proxies for spam filtering: oa.upm.es
Contemporary forum discussion:
- HilfDirSelbst.ch anti-spam forum (German), thread “SpamStopper 1.4 from Railhead & SpamTUNNEL 0.1 from Calypso”: hilfdirselbst.ch
- onliner.by forum (Russian), thread “спам-фильтр для The Bat” (“spam filter for The Bat”, 21 January 2003), describing SpamTUNNEL as “a personal freeware POP3 proxy with built-in spam filtering”: forum.onliner.by
The original distribution lived at http://uiorean.cluj.astral.ro/, now only reachable
through the Internet Archive Wayback Machine.