Spam Detection Using SpamAssassin with PYTHEAS MailGate
News (Jan. 27th, 2010)
This page now includes instructions on how to install SpamAssassin release
3.3.2.
Upgrade instructions are in the section Upgrading SpamAssassin below.
SpamAssassin (tm) is an open source product that performs
heuristic spam analysis and RBL (Realtime Blackhole List) lookups among other tests,
to clearly tag spam mail as such. PYTHEAS MailGate can then be instructed
to handle spam mail in a particular way.
SpamAssassin (tm) is open source software, licensed
under the Apache Software License (which you can find at
http://www.apache.org/foundation/licence-FAQ.html).
No guarantees or warranties apply to the software. You use it entirely at your own
risk.
Neither SpamAssassin nor the software components it
requires are installed by the PYTHEAS MailGate setup program. Please note
that you need a PYTHEAS MailGate license key which activates the
Content-Checking Rules engine; see the
About tab of the Configuration Program to learn about
the options activated by your license key.
In its default form, SpamAssassin is designed and written
for Unix platforms. This document outlines how to get SpamAssassin
working on a Win32 platform such as Windows 200x/XP. Although the procedure may seem a
little cumbersome at first glance, we are sure you will find it worth
the trouble: SpamAssassin is remarkably effective.
Upgrading SpamAssassin
If you are doing a fresh install, you can skip this section.
Upgrading a SpamAssassin v. 3.x Installation
For the duration of the upgrade, you should
stop the Pytheas.MailGate service (or the Communication Task). To upgrade to a newer version of SpamAssassin:
- If you upgrade from SpamAssassin
2.x to 3.x, be sure to read the additional instructions for upgrading from SpamAssassin 2.x (near the end of this page) first.
- Uninstall ActivePerl. Then delete the whole
c:\perl subtree.
Be sure not to delete the c:\etc\mail\spamassassin folder. You
may also want to move the NMAKE utility
from C:\perl\bin to a safe place before deleting the subtree.
- Be sure to get the new
SpamAssassin support files. The
sa.cmd file required for SpamAssassin v.3.3.2
is different from the one included in the package for earlier (pre-3.3.0) versions of
SpamAssassin. Please copy DOS2UNIX.EXE and
UNIX2DOS.EXE to the folder
where PYTHEAS MailGate has been installed.
- Your configuration file
pmg-local.cf
may contain options which are no longer supported in the new version.
Carefully read the beginning of spamdebug.txt when checking your
new SpamAssassin installation later.
- Proceed the same way as you would for a fresh installation, starting from the section Installing Perl.
Installing Perl
You should install Perl on Windows 200x/XP platforms only. It
seems to be possible to get it running on Microsoft Windows 95/98/ME, but Perl is
said to act unreliably on such platforms.
- Download ActivePerl v.5.8.8.822.
Please note: even on 64-bit systems, you
should install this Windows (x86)
package. If you use a more recent ActivePerl distribution, you may have
trouble installing the required language extensions.
- Install ActivePerl. Keep the features Perl
and PPM selected. You may unselect the features Perl ISAPI,
PerlEx, PerlScript, Documentation and
Examples.
- Open a Command-Line window and type
PERL -v to check that everything
is fine.
- In subsequent sections, it will be assumed that Perl has been installed in
C:\PERL. Make appropriate changes if necessary.
- If you are installing
SpamAssassin for the first time, configure access to public DNS now. It
is not needed yet, but setting it up later would require another reboot. The following environment variables need
to be defined at system level:
RES_NAMESERVERS = ipaddress
LANG = en_US
ipaddress represents the IP address of your ISP's DNS
server or your own DNS server, provided it is linked to the public DNS. To add
more than one, separate the addresses with a space character. Add these to the
global environment variables of your operating system which can be defined in
Control Panel / System, on the Advanced
tab.
- Reboot the computer. If Perl had already been installed on your
computer and the environment variables had already been defined (for example
during an upgrade), you may skip the reboot.
- After rebooting, open a command line window, and type
PATH
to make sure that C:\PERL\BIN is now part of your PATH
environment variable.
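As an optional cross-check of the settings described above, the following Python sketch inspects an environment mapping for the three things this section relies on. Python is not part of this installation, and the function name check_perl_environment is purely illustrative; the sketch assumes Perl lives in C:\PERL as stated above.

```python
def check_perl_environment(env):
    """Return a list of problems found in an environment mapping.

    Checks only what this section asks for: RES_NAMESERVERS holds at
    least one address, LANG is en_US, and C:\\PERL\\BIN is on PATH.
    """
    problems = []

    # Several DNS servers may be listed, separated by space characters.
    if not env.get("RES_NAMESERVERS", "").split():
        problems.append("RES_NAMESERVERS is not set")

    if env.get("LANG") != "en_US":
        problems.append("LANG is not en_US")

    # Windows paths are case-insensitive; ignore a trailing backslash.
    entries = [
        p.strip().rstrip("\\").lower()
        for p in env.get("PATH", "").split(";")
        if p.strip()
    ]
    if r"c:\perl\bin" not in entries:
        problems.append(r"C:\PERL\BIN is not part of PATH")

    return problems
```

On the machine itself, calling check_perl_environment(dict(os.environ)) after the reboot should return an empty list.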
Installing NMAKE
- Download NMAKE.
- Extract the files, and place them in
C:\PERL\BIN. Both
NMAKE.EXE and NMAKE.ERR are needed.
Installing the Necessary Perl Modules
Perl uses modules to extend the language's capabilities. Many of them are included
with the core distribution, but many others are available separately. SpamAssassin
requires several modules which are not in the core distribution of ActivePerl.
Obtaining and Installing SpamAssassin
- Be sure to have PYTHEAS MailGate v. 2.32a (or a newer version).
Upgrade if necessary.
- Go to
http://spamassassin.apache.org/downloads.html,
and download the ZIP file distribution. Extract the ZIP file to the root of drive C:. For
SpamAssassin version
3.3.2 for
example, this will create C:\Mail-SpamAssassin-3.3.2
or C:\Mail-SpamAssassin-3.3.2\Mail-SpamAssassin-3.3.2,
depending on how you proceed. We'll refer to this folder as the SPAMSOURCE
folder in subsequent sections.
- Open a command-line window (an elevated command-line window on Windows
Server 2008 and later), go to the SPAMSOURCE folder and type:
PERL MAKEFILE.PL
You will be asked a couple of questions. Be sure to answer No to
the first one, which is not the default response.
First question: Build spamc.exe (...)?
Answer: N
Next question: What email address or URL should be used (...)
Answer: give a meaningful answer for your site.
You may safely ignore the warnings about optional missing modules:
(...)
optional module missing: Razor2
optional module missing: Net::Ident
optional module missing: IO::Socket::INET6
optional module missing: IO::Socket::SSL
(...)
- Still in the SPAMSOURCE folder, type:
NMAKE
NMAKE INSTALL
- Make a backup copy of
c:\perl\site\etc\mail\spamassassin\v310.pre
(name it v310.backup, for example; in any case, do not give it the
.pre extension). Open the file c:\perl\site\etc\mail\spamassassin\v310.pre
in a text editor (Wordpad.exe will handle the line endings better
than Notepad.exe). At the beginning of the lines
loadplugin Mail::SpamAssassin::Plugin::Pyzor
loadplugin Mail::SpamAssassin::Plugin::Razor2
add the character # to transform them into a comment and avoid
loading the plug-ins.
- Finally type:
C:\Perl\Site\Bin\SpamAssassin -V
You should get the following response:
SpamAssassin version
3.3.2
running on Perl version 5.8.8
- Download the SpamAssassin rules:
C:\Perl\Site\Bin\sa-update --nogpg -v
The --nogpg option lets the update run even if gpg is not installed. The command should complete without an error message.
We recommend running this command regularly (once a week, for example) to keep
the SpamAssassin rules up to date.
Configure Access to Public DNS
DNS access is needed for all RBL lookups. We
already set the required environment
variables while installing Perl:
SET RES_NAMESERVERS=ipaddress
SET LANG=en_US
Testing Your SpamAssassin Installation
Rename the SPAMSOURCE\rules subfolder
(call it rules-orig, for example).
From a command line window, in the SPAMSOURCE folder, type:
c:\perl\site\bin\spamassassin -D < sample-nonspam.txt 2>spamdebug.txt
This command should run smoothly. The command-line window shows the
message after it has passed through SpamAssassin. The output should indicate that this
sample message is not spam: look at the X-Spam-... lines added by
SpamAssassin in
the header part of the message.
Please note: the file spamassassin.bat may be
created in the c:\perl\bin folder instead of the c:\perl\site\bin
folder. In this case, please adjust the commands suggested in the following
sections accordingly.
Have a look at spamdebug.txt which has been created by this run.
Check for DNS resolution. In the Received header parsing part of it,
you should see:
dbg: dns: is Net::DNS::Resolver available? yes
dbg: dns: Net::DNS version: (...)
dbg: dns: trying (3) w3.org...
dbg: dns: looking up NS for 'w3.org'
dbg: dns: NS lookup of w3.org using (...) succeeded => DNS available (set dns_available to override)
If there is trouble with DNS resolution, verify that you
properly configured access to public DNS. If you are unsure about a DNS
server, you can check it with NSLOOKUP (use its server
command to connect to the DNS server in question).
At the end of the file, check for the results:
dbg: check: is spam? score=0 required=5
dbg: check: tests=
dbg: check: subtests=__CT,__CTYPE_CHARSET_QUOTED, __CT_TEXT_PLAIN, __DOS_BODY_STOCK, __DOS_BODY_SUN, __DOS_HAS_ANY_URI, __DOS_LINK, __DOS_RCVD_FRI, __FB_PICK, __FB_S_STOCK, __FM_STOCK_WORDS, __HAS_ANY_EMAIL, __HAS_ANY_URI, __HAS_MSGID, __HAS_RCVD, __HAS_SUBJECT, __LAST_UNTRUSTED_RELAY_NO_AUTH, __MIME_VERSION, __MISSING_REF, __MSOE_MID_WRONG_CASE, __NAKED_TO, __NONEMPTY_BODY, __RCVD_IN_SORBS, __RCVD_IN_ZEN, __SANE_MSGID, __TOCC_EXISTS, __YOUR_ACCOUNT
Now let's check if a message is correctly identified as spam. From the SPAMSOURCE folder, type:
c:\perl\site\bin\spamassassin -D < sample-spam.txt 2>spamdebug.txt
The output in the command line window should indicate that this sample message
is spam (look at the X-Spam-... lines added by SpamAssassin
in the header part of the message, and the body of the message which has been modified
by SpamAssassin).
Have a look at spamdebug.txt. At the end of the file, check for the results:
dbg: check: is spam? score=999.998 required=5
dbg: check: tests=GTUBE,NO_RECEIVED,NO_RELAYS
dbg: check: subtests=__CT,__CTE,__CT_TEXT_PLAIN,__HAS_MSGID,__HAS_SUBJECT, __MIME_VERSION, __MISSING_REF, __MSGID_OK_HOST, __NONEMPTY_BODY, __SANE_MSGID, __TOCC_EXISTS, __UNUSABLE_MSGID
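If you repeat these two test runs often, reading the result out of spamdebug.txt can be automated. The following Python sketch extracts the score and the required threshold from the "is spam?" line shown above. It assumes the line format of the two samples on this page; the function name read_verdict is illustrative and not part of SpamAssassin or PYTHEAS MailGate.

```python
import re

# Matches lines such as: dbg: check: is spam? score=999.998 required=5
CHECK_LINE = re.compile(
    r"dbg: check: is spam\? score=(-?[\d.]+) required=(-?[\d.]+)"
)


def read_verdict(debug_text):
    """Return (score, required) from the last 'is spam?' line, or None.

    debug_text is the content of spamdebug.txt as a single string.
    """
    matches = CHECK_LINE.findall(debug_text)
    if not matches:
        return None
    score, required = matches[-1]
    return float(score), float(required)
```

For sample-nonspam.txt the score should stay below the required value, and for sample-spam.txt it should be far above it.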
The Online Documentation
You can access the documentation at
http://spamassassin.apache.org/full/3.3.x/dist/doc/. The most
important page to read is Mail::SpamAssassin::Conf, which outlines all major
configuration parameters.
Connect SpamAssassin and PYTHEAS MailGate
If you are upgrading, you are now ready to restart PYTHEAS MailGate.
If you are installing SpamAssassin for the first time, download and unzip the
SpamAssassin support files. If you
do not have a pmg-local.cf file, copy this file from the package to C:\etc\mail\spamassassin. Create this
folder if it does not exist. Use this file to configure the way SpamAssassin
should work for your site. You should not edit global configuration files in
C:\perl\site\share\spamassassin as your settings could be lost during
the next upgrade. Of course, it is a good idea to look at the global configuration
files to know what parameters can be changed.
Copy the files sa.cmd, DOS2UNIX.EXE and UNIX2DOS.EXE to the C:\Program Files\PytheasMailgate
folder. The downloadable version of sa.cmd assumes that Perl has
been installed in the C:\perl folder.
Please note that DOS2UNIX.EXE and UNIX2DOS.EXE are
not really needed for the current version of
SpamAssassin, but they may be useful for future versions.
Here are some comments about the contents of sa.cmd:
-D | Instructs SpamAssassin to produce diagnostic output (see below). You may change this option to obtain different diagnostic output. You can also omit this parameter altogether if you do not need it.
-e | Instructs SpamAssassin to set the exit code depending on the spam status. PYTHEAS MailGate uses this exit code to pick up the spam status.
-p ... | Instructs SpamAssassin to use the pmg-local.cf file, regardless of the user context in which it is running.
%1, %2, %3, %4 | PYTHEAS MailGate will always call sa.cmd with 4 parameters. Please see details below.
%1 | Path name of the file containing the message to be checked.
%2 | Path name of the file to contain the checked message (this is always Temp_folder\PmgSaChk.tmp).
%3 | Path name of the file to contain the diagnostic output produced by SpamAssassin (this is always Temp_folder\PmgSpamA.log).
%4 | Determined by the POP3 account configuration in PYTHEAS MailGate. Note: the downloadable version of sa.cmd includes code to handle the value NoSpamCheck for this parameter, which does what its name suggests: if you add Spam-A:NoSpamCheck to the Comment of a POP3 account, it will be excluded from spam checking.
Exit code or Errorlevel | Since v. 2.31c, PYTHEAS MailGate no longer relies on the exit code (or Errorlevel value) of the sa.cmd command file, as previous versions did.
To check your installation, you may use sapmg.cmd from the
SpamAssassin support files. This
command file calls SpamAssassin the same way PYTHEAS MailGate
does. You will find the message which has been checked by SpamAssassin,
and the diagnostic output spamdebug.txt, in the folder referenced
by the TEMP environment variable (use the SET command
to show environment variables).
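To make the four-parameter calling convention of sa.cmd concrete, here is a minimal Python sketch that assembles the argument list from the descriptions given above. It is an illustration only, not the code of PYTHEAS MailGate or of sapmg.cmd; the function name sa_cmd_arguments and the example paths are assumptions. The fallback value Nothing for the 4th parameter is taken from the section on POP3 account settings further down this page.

```python
import ntpath  # Windows path rules, usable on any platform


def sa_cmd_arguments(message_path, temp_folder, config_tag=None):
    """Build the sa.cmd call with its four parameters (%1 to %4)."""
    return [
        "sa.cmd",
        message_path,                              # %1: message to check
        ntpath.join(temp_folder, "PmgSaChk.tmp"),  # %2: checked message
        ntpath.join(temp_folder, "PmgSpamA.log"),  # %3: diagnostic output
        config_tag or "Nothing",                   # %4: tag from the Comment field
    ]
```

For example, sa_cmd_arguments(r"C:\Temp\msg.txt", r"C:\Temp", "NoSpamCheck") describes a call for an account that is excluded from spam checking.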
Test it
If you activate spam-checking for the first time, you may want to activate it
for a single POP3 account only, with the following options:
- Check incoming mail with SpamAssassin... Only from POP3 accounts with
the word Spam-A in the comment. Put the word Spam-A into the
Comment field of the POP3 account entry.
- Forward messages identified as Spam to... The intended Recipient as
usual
- Add SpamAssassin's report to the Session Log message...Always.
Be sure to have your Recipient entry configured to receive
Session Log messages (check the corresponding box on its property sheet).
This is for debugging purposes only. Be sure to remove this option once you have
SpamAssassin up and running.
After messages have been spam-checked, look for the following lines in the Remote Control Program
or in the Session Log message:
[11:16] [Spamassassin] Spam status: No, score=-4.9 required=5.0 tests=BAYES_00
autolearn=ham version=3.3.2
or
[11:06] *** [Spamassassin] Spam status: Yes, score=8.8 required=5.0
tests=BAYES_99, BIZ_TLD, HTML_60_70, HTML_MESSAGE, HTML_TITLE_UNTITLED, HTTP_EXCESSIVE_ESCAPES,
MIME_BASE64_TEXT, MIME_HTML_NO_CHARSET, MIME_HTML_ONLY autolearn=no version=3.3.2
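If you collect Session Log messages during the test phase, the status lines can be evaluated with a few lines of Python. The sketch below assumes the exact layout of the two examples above, including the line wrapping; the function name parse_status is illustrative and not part of PYTHEAS MailGate.

```python
import re

# The status entry may wrap over several lines, so whitespace between the
# fields is matched with \s+ and the test list is split on commas/whitespace.
STATUS = re.compile(
    r"\[Spamassassin\] Spam status: (Yes|No), "
    r"score=(-?[\d.]+) required=(-?[\d.]+)"
    r"\s+tests=(.*?)\s+autolearn=(\S+)\s+version=(\S+)",
    re.DOTALL,
)


def parse_status(text):
    """Return the fields of the first status entry in text, or None."""
    match = STATUS.search(text)
    if match is None:
        return None
    verdict, score, required, tests, autolearn, version = match.groups()
    return {
        "is_spam": verdict == "Yes",
        "score": float(score),
        "required": float(required),
        "tests": [t for t in re.split(r"[,\s]+", tests) if t],
        "autolearn": autolearn,
        "version": version,
    }
```

The returned dictionary makes it easy to list which tests fired most often before you decide how spam should be delivered.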
In case you have problems:
- Please have a look at
PmgSpamA.log or at PmgSaChk.tmp
(you will need to make a copy of these files while the download session is still
in progress, as they will be deleted upon termination). You will find these files
in the folder you specified on the Environment tab of the Configuration
Program.
- If you have trouble getting SpamAssassin to work while running
PYTHEAS MailGate as a service: please try to run the PYTHEAS MailGate
Communication Task from the Start menu; you will need
to stop the service for this purpose.
- Did you really restart the computer since you installed Perl?
Cleaning up
The SPAMSOURCE folder is no longer needed once the installation
is completed.
Spam Handling in PYTHEAS MailGate
To activate spam detection in PYTHEAS MailGate, open the configuration
form which can be accessed from the Content Checking page. The
SpamAssassin diagnostic output can be inserted into the PYTHEAS
MailGate Session Log messages (please note that they will not
be visible in the Remote Control Program).

Setting Spam Delivery Options in PYTHEAS MailGate
You have the following options for the delivery of messages which have been identified
as spam:
- deliver as usual (please note that the spam will have been tagged as such
by SpamAssassin),
- always deliver to a particular Recipient
- do not deliver to anybody. If you have configured to write a log entry for
every incoming message, messages identified as spam are logged even if they are
actually not forwarded to any internal Recipient at all. Such messages
receive a [Spam] tag at the beginning of the message subject.
- Messages with a spam score above a certain level can be handled in a different
way, as compared to spam messages with a spam score below this level.
Specific Configuration Settings for POP3 Accounts
You can activate spam analysis for all POP3 accounts, or only for selected ones.
The Comment field in the POP3 Account properties is used for this
purpose.
To activate spam detection only for certain POP3 accounts, configure the corresponding
option in the PYTHEAS MailGate configuration (see screen shot above), and type the word Spam-A
as a separate word anywhere in the Comment field of the selected
POP3 accounts.
To use specific SpamAssassin configuration settings for POP3 accounts,
proceed as follows:
- Put the following expression into the Comment field of each POP3
Account entry:
Spam-A:ConfigTag.
ConfigTag is some identifier (only composed of letters and numbers). It
will be passed as 4th parameter to sa.cmd.
- You can now write code in
sa.cmd to switch to different configuration
files, based on this parameter.
- If no ConfigTag value is found in the Comment field of a
particular POP3 account, the word Nothing is passed as the
4th parameter (so you can be sure that your
sa.cmd file always gets
4 parameters).
- The
sa.cmd file included in the
SpamAssassin support files
contains code to handle the ConfigTag value NoSpamCheck,
which excludes a particular POP3 account from spam checking.
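The rules above for the Comment field can be summarized in a short Python sketch. It only illustrates the described behavior and is not the parser used by PYTHEAS MailGate; the function name spam_a_setting is hypothetical, and treating the tag as case-sensitive is an assumption.

```python
import re

# "Spam-A" must stand as a separate word; an optional ":ConfigTag" made of
# letters and digits may follow it directly.
SPAM_A = re.compile(r"(?<!\S)Spam-A(?::([A-Za-z0-9]+))?(?!\S)")


def spam_a_setting(comment):
    """Return (has_spam_a_word, fourth_parameter) for a Comment field.

    The fourth parameter is the ConfigTag if one is present, otherwise the
    word "Nothing", so sa.cmd always receives four parameters.
    """
    match = SPAM_A.search(comment)
    if match is None:
        return False, "Nothing"
    return True, match.group(1) or "Nothing"
```

A Comment of "Spam-A:NoSpamCheck" therefore yields the tag NoSpamCheck, which the downloadable sa.cmd uses to skip spam checking for that account.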
Spam/Ham Learning for SpamAssassin
For spam/ham learning with sa-learn, messages
are needed in text format according to RFC822, with the complete message header
lines. Unfortunately, there does not seem to be an easy way to save messages in
such a format using Microsoft Outlook.
How to save incoming messages to files in RFC822 format
PYTHEAS MailGate v. 2.30c (or later) supports a new way to write messages to disk
files in RFC822 format. This new function is managed by a tag in the
Comment field of POP3 account entries. The name of the tag is SaveToDisk,
and it has two parameters, which are separated by a vertical bar (|, ASCII 124):
- a name for a folder (which will be created if it does not exist). Messages
will be saved to this folder. It will be located in
ProgramData\PytheasMailgate\Incoming or Program_Files\PytheasMailgate\Incoming
(depending on where your PMailGat.INI configuration file is
located);
- an age limit (in hours). Any files in this folder older than the age limit
will
automatically be deleted. An age limit of 0 (zero) will disable automatic
cleaning.
As an example, adding the expression SaveToDisk:SpamHam|24 to
the Comment field of a POP3 account entry will save all
messages from this POP3 mailbox to the folder Program_Files\PytheasMailgate\Incoming\SpamHam,
and any file older than 24 hours in this folder will be cleaned out at the
beginning of the upcoming download
session. Message delivery will continue as usual. Several POP3 mailboxes can
have their messages dropped into the same folder.
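The SaveToDisk tag and its age limit can be illustrated with the following Python sketch. It mirrors the description above and is not the implementation in PYTHEAS MailGate; the function names are hypothetical, and the sketch assumes that the folder name contains no spaces and that file age is measured from the last modification time.

```python
import re

# SaveToDisk:<folder>|<hours>, written as a separate word in the Comment field.
SAVE_TO_DISK = re.compile(r"(?<!\S)SaveToDisk:([^|\s]+)\|(\d+)(?!\S)")


def parse_save_to_disk(comment):
    """Return (folder_name, age_limit_hours), or None if the tag is absent."""
    match = SAVE_TO_DISK.search(comment)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def is_expired(file_mtime, now, age_limit_hours):
    """Tell whether a saved message is older than the age limit.

    Both times are in seconds since the epoch. An age limit of 0 disables
    automatic cleaning, so nothing ever expires.
    """
    if age_limit_hours == 0:
        return False
    return now - file_mtime > age_limit_hours * 3600
```

With the example from above, parse_save_to_disk("SaveToDisk:SpamHam|24") gives the folder SpamHam and a limit of 24 hours.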
Another way to obtain messages in RFC822 format is to use the View/Delete messages function (accessible from the POP3 account
property page). It has a Save message as function (press F10 to access
it). For this to work, you should configure PYTHEAS
MailGate not to delete messages after downloading them, and to clean them up
after a day or two. This way you can get messages in RFC822 format directly from the POP3
account, including messages that the Bayes engine classified incorrectly,
so that you can teach it the correct result.
To streamline the process, you could do the following:
- Set up a folder structure as described in the
SpamAssassin support files package.
- Make shortcuts on the desktop for the programs
LearnHam.cmd and
LearnSpam.cmd, and the folders SpamTest\Ham and
SpamTest\Spam.
Now the learning procedure could look like this:
- If you configured your POP3 account to have the messages saved to files by
using the
SaveToDisk option (see above), open the
...\Incoming\... folder. Drag-and-drop the messages to the
SpamTest\Spam or SpamTest\Ham shortcut.
- Alternatively, you can save the message to feed into the learning process on the desktop (View/Delete
messages, F10, Save message as). Then drag-and-drop the file to the shortcut pointing to the
SpamTest\Spam or SpamTest\Ham
folder.
- Double-click on the shortcut for
LearnSpam.cmd or LearnHam.cmd
(this will feed
all files contained in the corresponding folder into sa-learn).
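The contents of the learning command files are not reproduced on this page. As a rough idea of what such a step amounts to, the Python sketch below builds an sa-learn command line for a folder of saved messages, using the --spam and --ham options of sa-learn and the -p option already used elsewhere on this page. The function name and the default paths are assumptions based on the folders mentioned in this document; adjust them to your installation.

```python
def sa_learn_command(
    kind,
    folder,
    prefs=r"c:\etc\mail\spamassassin\pmg-local.cf",
    sa_learn=r"c:\perl\site\bin\sa-learn",
):
    """Build the sa-learn call that learns every message in a folder.

    kind must be "spam" or "ham"; folder is the directory holding the
    RFC822 message files (for example SpamTest\\Spam).
    """
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return [sa_learn, "-p", prefs, "--" + kind, folder]
```

The resulting list can be joined with spaces and typed at a command prompt, which is a convenient way to test learning by hand before relying on the shortcuts.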
Additional instructions for upgrading from SpamAssassin
2.x
- Before
installing a 3.x version of SpamAssassin over a 2.x version, you should
put your Bayes database into a "clean" state:
from a command line prompt, execute:
sa-learn -p c:\etc\mail\spamassassin\pmg-local.cf --rebuild
- Clean the
c:\etc\mail\spamassassin folder: leave only pmg-local.cf
and the bayesdb subfolder and its contents; delete all the other
files.
- After installing the 3.x version of SpamAssassin: from a command line prompt, execute
c:\perl\site\bin\sa-learn -p c:\etc\mail\spamassassin\pmg-local.cf --sync
followed by
c:\perl\site\bin\sa-learn -p c:\etc\mail\spamassassin\pmg-local.cf -D --import
to migrate the data into the new DB_File format. Be patient: these commands may take
a couple of minutes to complete, depending on the size of your Bayes database.
- Check that the new version of SpamAssassin works on your machine
(we recommend using the
spam-a.cmd command file included
in the
SpamAssassin support files for this
purpose, because it includes a reference to your pmg-local.cf
preferences file, which in turn contains the pointer to your Bayes database
in c:\etc\mail\spamassassin\bayesdb). Look in the debug output for
configuration options in pmg-local.cf which may be no longer supported
or which have a new syntax. You may want to compare your configuration file to
the sample pmg-local.cf file contained in the
SpamAssassin support files.
More Information
Credits
This document has been inspired by USING SpamAssassin WITH
WIN32, (c) 2002,2004 by Michael Bell (thanks!).
SpamAssassin is a trademark of the Apache Software Foundation.