SpamAssassin for Dreamhost
Posted August 19th, 2007 at 8:22 AM in the Projects category; there are 12 comments

This SpamAssassin guide is for Dreamhost clients who want to run a personal copy of SpamAssassin separate from the one already provided. It takes you step by step through downloading, installing, and configuring SpamAssassin (SA) version 3.1.0.

Although DH already provides SA, it is applied to your entire domain. To complicate matters further, DH’s upgrade to Debian Linux “Sarge” caused compatibility issues with the Bayesian database files, the read/write token files that SA creates through a Perl database library. The next best alternative is to store those Bayes tokens in a MySQL database, which seems to get around that problem. Bayesian filtering is worth the effort: with it you stand a much greater chance of eliminating your junk mail.

Update: changes to Dreamhost’s network architecture require amendments to this procedure. Please reference my updated tutorial for more information.

A few assumptions before we get started…

  1. Disable all keyword and junk mail filtering via the Dreamhost Panel. The Panel modifies your .procmailrc file, and later we will be making changes to it and to other files in your home directory.
    [Screenshot: Mail Tab]

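  3. Log in over SSH to the shell user account whose mail will be filtered by SpamAssassin. I have only tested this with a regular user account, not a mail-only account. You will be doing most of the work from the shell, but having a browser window open to the DH Panel is also a good idea.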
  4. Download SpamAssassin 3.1.0. The 3.1.0 distribution can be obtained from the SA home page. Be sure to get the tar.gz format. These commands download it to your home directory. Type the following:
    %> cd ~
    %> wget <Valid URL for: Mail-SpamAssassin-3.1.0.tar.gz>
    
  5. Extract the SpamAssassin archive. You must now extract the files to prepare for installation. Be sure the tar.gz file is in your home directory. Type the following:
    %> tar xvfz Mail-SpamAssassin-3.1.0.tar.gz
    %> cd Mail-SpamAssassin-3.1.0
    
  6. Build and install SpamAssassin. This step follows the INSTALL file’s section “Installing SpamAssassin for Personal Use (Not System-Wide).” Each command takes some processing time, so please be patient. Your configuration files will go in ~/saetc, and the programs and Perl modules in ~/sausr. Type the following at the command prompt:
    %> perl Makefile.PL PREFIX=$HOME/sausr SYSCONFDIR=$HOME/saetc
    %> make
    %> make install
    
  7. Create a directory for spam rules. This directory provides a location for all of your procmail-related rules. Type the following at the command prompt:
    %> cd ~
    %> mkdir procmail
    %> cd procmail
    
  8. Create the spam.rc file in your procmail folder. The spam.rc file tells procmail which messages to pass to SA and where to put the junk mail it finds. I have set up a rule to sort all spam into the Spam folder (~/Maildir/.Spam, which SquirrelMail and IMAP clients both show as “Spam”), so it is not downloaded by my POP3 client. Type the following at the command prompt and insert the text shown:
    %> pico spam.rc
    
    #=================================================
    # spam.rc should be located in ~/procmail
    
    # Send e-mail to SpamAssassin only if it is smaller than 500,000 bytes
    :0fw
    * < 500000
    | $HOME/sausr/bin/spamassassin
    
    # Delete all spam with a score of 8 or higher
    :0
    * ^X-Spam-Level: \*\*\*\*\*\*\*\*
    /dev/null
    
    # Move suspected spam into a separate folder
    :0:
    * ^X-Spam-Status: Yes
    .Spam/
    #=================================================
    
  9. Create the .forward.postfix file in your home directory. This file tells the mail server to hand every incoming message to procmail. Type the following at the command prompt and insert the text shown:
    %> cd ~
    %> pico .forward.postfix
    
    
    #==========================================
    #.forward.postfix should be located in ~
    "| /usr/bin/procmail -t"
    #==========================================
    
  10. Modify the .procmailrc file in your home directory. This file specifies which rules procmail applies to your e-mail, and it must reference the spam.rc file you just created. Back up the existing file, then type the following at the command prompt and modify the file as shown:
    %> mv .procmailrc .procmailrc.bak
    %> pico .procmailrc
    
    
    #=======================================
    # .procmailrc should be located in ~
    
    # Directory for procmail-related files
    PMDIR=$HOME/procmail
    
    # This is the message directory
    MAILDIR=$HOME/Maildir
    
    # procmail will use spam.rc for rules
    INCLUDERC=$PMDIR/spam.rc
    
    # Send everything else to Maildir
    :0
    $HOME/Maildir/
    #=======================================
    
  11. Install the modules needed for DNS tests. DNS-based tests are not required for SA to work, but they are another great feature for reducing spam. Net::DNS depends on Net::IP, so install Net::IP first. DH already provides Net::DNS, but the installed version is too old for SA to run correctly. The 5.8.4 in the paths below is the Perl version on DH’s Debian Sarge servers; if perl -v reports a different version, adjust the paths to match. Type the following at the command prompt:
    %> cd ~
    %> wget <Valid URL for: Net-IP-1.24.tar.gz>
    %> wget <Valid URL for: Net-DNS-0.53.tar.gz>
    %> tar xvfz Net-IP-1.24.tar.gz
    %> tar xvfz Net-DNS-0.53.tar.gz
    %> export PERL5LIB=$HOME/sausr/share/perl/5.8.4
    %> cd ~/Net-IP-1.24
    %> perl Makefile.PL PREFIX=$HOME/sausr
    %> make
    %> make install
    %> cd ~/Net-DNS-0.53
    %> perl Makefile.PL PREFIX=$HOME/sausr
    %> make
    %> make install
    %> ln -s ~/sausr/lib/perl/5.8.4/Net/* ~/sausr/share/perl/5.8.4/Net
    
  12. Send yourself an e-mail to check the installation. Look in the e-mail headers for “X-Spam-Checker-Version: SpamAssassin 3.1.0”. If that header is present, your local copy of SA is working.
  13. Create a MySQL database for SA. Go to Goodies, MySQL on the DH Panel. The host name, database name, username, and password used in this guide (spam.sa.com, MySpamDB, spam_user, and itsasecret) are examples I used only to complete the guide; for security reasons, do not use them. Remember your own settings, because you’ll need them later. After DH has set up your database, you will see it added to your list. Proceed to the next step.
    [Screenshot: New MySQL Database]
  14. Create the database tables necessary for SA. This step creates all the tables SA uses for user preferences, Bayes tokens, and the auto-whitelist. I’m not currently using the database for my preferences, but creating the tables now makes that easier later. Type the following at the command prompt and enter your database password when prompted:
    %> cd ~/Mail-SpamAssassin-3.1.0/sql
    %> mysql -h spam.sa.com -u spam_user -p MySpamDB < userpref_mysql.sql
    %> mysql -h spam.sa.com -u spam_user -p MySpamDB < bayes_mysql.sql
    %> mysql -h spam.sa.com -u spam_user -p MySpamDB < awl_mysql.sql
    
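  15. Modify the local.cf file. The lines below tell SA to use Bayesian filtering and to keep its Bayes tokens in the MySQL database you just created, rather than in the default database files; the user_scores lines point SA at the same database for per-user preferences. Use your own host, database name, username, and password. Type the following at the command prompt and add the lines indicated: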
    %> cd ~/saetc/mail/spamassassin
    %> pico local.cf
    
    
    #===================================================
    # These lines must be in your local.cf file
    
    # Tell SA to use Bayesian filtering
    use_bayes 1
    
    # Tell SA to use a MySQL database for tokens
    bayes_store_module  Mail::SpamAssassin::BayesStore::MySQL
    bayes_sql_dsn       DBI:mysql:MySpamDB:spam.sa.com
    bayes_sql_username  spam_user
    bayes_sql_password  itsasecret
    
    user_scores_dsn           DBI:mysql:MySpamDB:spam.sa.com
    user_scores_sql_username  spam_user
    user_scores_sql_password  itsasecret
    
    #===================================================
    
  16. Modify the user_prefs file. The user_prefs file is where you should add any whitelisted or blacklisted e-mail addresses. The MySQL database can store this information, but I haven’t integrated it with SquirrelMail…yet. These preferences are optional, but I believe they make the scores SA assigns more reliable. Type the following at the command prompt and add the lines indicated:
    %> pico ~/.spamassassin/user_prefs
    
    
    #===================================================
    # These lines should be in your user_prefs file
    
    # Do not let Bayes learn automatically
    bayes_auto_learn 0
    
    # Do not autowhitelist e-mail addresses
    use_auto_whitelist 0
    
    # Bayes will ignore these headers
    bayes_ignore_header X-Virus-Scanned
    bayes_ignore_header X-Spam-Status
    bayes_ignore_header X-Spam-Level
    bayes_ignore_header X-Spam-Flag
    
    #===================================================
    
  17. Create a script for training SA. This will vastly simplify your life, because training SA becomes almost automatic. The script trains on spam or ham, depending on its command-line argument. When training on spam, it deletes all the messages in your Spam folder afterward. Type the following at the command prompt and insert the text shown:
    %> touch ~/salearn.bat
    %> chmod 744 ~/salearn.bat
    %> pico ~/salearn.bat
    
    
    #!/bin/sh
    # salearn.bat: train SpamAssassin's Bayes database.
    # Usage: ~/salearn.bat spam   or   ~/salearn.bat ham
    SALEARN="$HOME/sausr/bin/sa-learn"

    echo '========================================='
    case "$1" in
      spam)
        "$SALEARN" -V
        echo '-----------------------------------------'
        echo 'Learning what is SPAM...'
        # Empty the Spam folder only if sa-learn succeeded
        if "$SALEARN" --spam "$HOME/Maildir/.Spam/cur"; then
          rm -f "$HOME"/Maildir/.Spam/cur/*
          echo 'All messages from your Spam folder were deleted.'
        fi
        ;;
      ham)
        "$SALEARN" -V
        echo '-----------------------------------------'
        echo 'Learning what is HAM...'
        "$SALEARN" --ham "$HOME/Maildir/cur"
        ;;
      *)
        # Quoted so the | is printed instead of starting a pipeline
        echo 'Enter one argument: [ham | spam]'
        echo '========================================='
        exit 1
        ;;
    esac

    echo '-----------------------------------------'
    echo 'Summary statistics of Bayes database...'
    "$SALEARN" --dump magic
    echo '-----------------------------------------'
    echo '========================================='
    
  18. Allow spam to accumulate in your Spam folder, then train SA. SpamAssassin gets even better after training it with several thousand spam messages: Bayes assigns a statistical probability to words and other characteristics of the mail you have designated as spam or ham. When your Spam folder is getting full, type the following at the command prompt:
    %> ~/salearn.bat spam
    %> ~/salearn.bat ham
    
  19. Create a cron job to run your script. The script only helps if it runs regularly, and scheduling that is what cron is for. You could have cron do lots of other chores, like emptying the Trash folder, but I will leave the extras to you. Cron e-mails you the script’s output, so after each run you receive a summary of what happened. This entry runs the script at 10 minutes after midnight every Sunday. Type the following at the command prompt and add the line shown:
    %> crontab -e
    
    10 0 * * 7 $HOME/salearn.bat spam
    
  20. Delete the temporary SpamAssassin folder. To free up a little disk space, go back and delete the temporary folder created when we decompressed the archive. Type the following at the command prompt:
    %> cd ~
    %> rm -R -f Mail-SpamAssassin-3.1.0
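
The spam.rc rules above are hard to check without waiting for real spam. As a rough aid, here is a minimal shell sketch that applies the same two header patterns to a message SpamAssassin has already tagged and prints where procmail would file it. The function name classify_spam_rc is hypothetical; it does not run SpamAssassin or model the 500,000-byte size limit, and it uses grep -i because procmail conditions are case-insensitive by default.

```shell
# Hypothetical helper (not part of the guide): predict where the spam.rc
# rules would file a message that SpamAssassin has already tagged.
# Only the delete and Spam-folder recipes are mirrored; the size limit
# and SpamAssassin itself are not modeled.
# Usage: classify_spam_rc < message-file
classify_spam_rc() {
  # Keep only the header block (up to the first empty line), because
  # procmail conditions match against the header by default.
  headers=$(sed '/^$/q')

  # Eight or more stars means a score of 8 or higher: the delete rule.
  # grep -i mirrors procmail's case-insensitive matching.
  if printf '%s\n' "$headers" | grep -qi '^X-Spam-Level: \*\*\*\*\*\*\*\*'; then
    echo /dev/null
  elif printf '%s\n' "$headers" | grep -qi '^X-Spam-Status: Yes'; then
    echo spam-folder
  else
    echo inbox
  fi
}
```

For example, running classify_spam_rc on a file from ~/Maildir/.Spam/cur should print spam-folder.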
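
The DNS-module step above hard-codes a user name and Perl 5.8.4 into PERL5LIB. If your server’s Perl differs, a small helper can build the value instead. This is a sketch that assumes the Debian-style layout used in that step (share/perl for pure-Perl modules, lib/perl for compiled ones); sa_perl5lib is a hypothetical name, and the default version is read from Perl’s $^V using the %vd format.

```shell
# Hypothetical helper: build a PERL5LIB value for modules installed with
# PREFIX=$HOME/sausr, instead of hard-coding a user name and "5.8.4".
# Assumes the Debian-style layout from the DNS-module step:
#   <prefix>/share/perl/<version>  pure-Perl modules
#   <prefix>/lib/perl/<version>    architecture-specific (compiled) modules
# Usage: sa_perl5lib [prefix] [perl-version]
sa_perl5lib() {
  prefix=${1:-$HOME/sausr}
  # Default to the version of the perl on PATH, printed like "5.8.4".
  version=${2:-$(perl -e 'printf "%vd", $^V')}
  printf '%s/share/perl/%s:%s/lib/perl/%s\n' \
    "$prefix" "$version" "$prefix" "$version"
}
```

After export PERL5LIB=$(sa_perl5lib), running perl -MNet::DNS -le 'print $Net::DNS::VERSION' shows which Net::DNS version Perl actually loads, since PERL5LIB directories are searched before the system ones.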
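
When checking the installation with a test e-mail, you can also read the delivered message file directly instead of opening the headers in a mail client. This sketch assumes the test message was stored as a file under ~/Maildir/new or ~/Maildir/cur; sa_checker_version is a hypothetical helper that prints the X-Spam-Checker-Version header and fails if it is missing.

```shell
# Hypothetical helper: print the X-Spam-Checker-Version header of a stored
# message file, or return 1 if the message has no such header.
# Usage: sa_checker_version ~/Maildir/new/<message-file>
sa_checker_version() {
  # Search only the header block (up to the first empty line); grep's exit
  # status becomes the function's result.
  sed '/^$/q' "$1" | grep -i '^X-Spam-Checker-Version:'
}
```

For instance, sa_checker_version "$(ls -t ~/Maildir/new/* | head -n 1)" checks the newest unread message.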
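
The three mysql commands that create SA’s tables can also be run as a loop that stops at the first missing file or failed import. This is a sketch: create_sa_tables is a hypothetical name, the arguments are this guide’s example values, and mysql still asks for the password once per file because of -p.

```shell
# Hypothetical helper: load the three SpamAssassin schema files in order.
# Usage: create_sa_tables HOST USER DATABASE SQL-DIRECTORY
#   e.g. create_sa_tables spam.sa.com spam_user MySpamDB ~/Mail-SpamAssassin-3.1.0/sql
create_sa_tables() {
  host=$1 user=$2 db=$3 sqldir=$4
  for f in userpref_mysql.sql bayes_mysql.sql awl_mysql.sql; do
    # Give up on the first missing schema file instead of carrying on.
    if [ ! -r "$sqldir/$f" ]; then
      echo "missing $sqldir/$f" >&2
      return 1
    fi
    echo "loading $f" >&2
    # -p makes mysql prompt for the password; stop if the import fails.
    mysql -h "$host" -u "$user" -p "$db" < "$sqldir/$f" || return 1
  done
}
```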
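
A mistyped option name in local.cf is easy to miss. The hypothetical check_local_cf helper below only verifies that each Bayes/MySQL setting from the local.cf step appears at the start of a line (commented-out lines do not count); it does not validate the values.

```shell
# Hypothetical helper: list which Bayes/MySQL settings are missing from a
# local.cf file. Only the presence of each key is checked, not its value.
# Usage: check_local_cf ~/saetc/mail/spamassassin/local.cf
check_local_cf() {
  missing=0
  for key in use_bayes bayes_store_module bayes_sql_dsn \
             bayes_sql_username bayes_sql_password; do
    # The key must start a line (after optional spaces) and be followed by
    # whitespace, so "# use_bayes 1" does not count.
    if ! grep -q "^[[:space:]]*$key[[:space:]]" "$1"; then
      echo "missing: $key"
      missing=1
    fi
  done
  return $missing
}
```

SpamAssassin’s own ~/sausr/bin/spamassassin --lint goes further and reports configuration errors. Because local.cf holds your database password in plain text, restricting it with chmod 600 is also worth considering.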
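
While training, it helps to know whether Bayes has enough data to be used at all. As far as I know, SpamAssassin ignores the Bayes score until it has learned at least 200 spam and 200 ham messages by default. The sketch below reads the nspam and nham counters from the sa-learn --dump magic output that salearn.bat already prints, assuming the usual layout where the count is the third column of the lines ending in “non-token data: nspam” and “non-token data: nham”. bayes_ready is a hypothetical name.

```shell
# Hypothetical helper: read the nspam/nham counters from the output of
# "sa-learn --dump magic" and report whether both have reached a minimum.
# Assumes the count is the third column of the "non-token data" lines.
# The default minimum of 200 is an assumption about SpamAssassin's defaults.
# Usage: ~/sausr/bin/sa-learn --dump magic | bayes_ready [minimum]
bayes_ready() {
  awk -v min="${1:-200}" '
    /non-token data: nspam/ { nspam = $3 }
    /non-token data: nham/  { nham = $3 }
    END {
      # Force numeric values (missing counters become 0).
      nspam += 0; nham += 0; min += 0
      printf "nspam=%d nham=%d\n", nspam, nham
      # Exit status 0 only when both counts have reached the minimum.
      if (nspam >= min && nham >= min) exit 0
      exit 1
    }'
}
```

The function prints both counts and exits with status 0 once each has reached the minimum, so it can also be used in a script’s if test.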
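
If you would rather not edit the crontab interactively, the weekly entry can be appended from the shell. This sketch, with the hypothetical name add_cron_line, copies the current crontab from standard input and adds the line only if an identical one is not already there, so running it twice does not schedule the job twice.

```shell
# Hypothetical helper: echo the crontab read on stdin, then append the
# given line unless an identical line is already present.
# Usage:
#   crontab -l 2>/dev/null | add_cron_line '10 0 * * 7 $HOME/salearn.bat spam' | crontab -
add_cron_line() {
  current=$(cat)
  # Keep the existing entries (if any) unchanged.
  [ -n "$current" ] && printf '%s\n' "$current"
  # -x: whole-line match, -F: literal text (the line contains * and $).
  if ! printf '%s\n' "$current" | grep -qxF -- "$1"; then
    printf '%s\n' "$1"
  fi
}
```

Afterward, crontab -l should show the entry exactly once. The single quotes keep $HOME literal, so cron expands it when the job runs, just like the entry typed in by hand. In Vixie cron, which Debian uses, both 0 and 7 in the day-of-week field mean Sunday.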
    

12 Comments on “SpamAssassin for Dreamhost”

  1. Chris

    Terrific! Took me about a half hour to walk through the steps - I’ve used SA in the past but have, for the last year, been trying to use DH’s own junk mail handling as well as Thunderbird’s built-in controls; neither are up to snuff and my inbox was all but useless. I’m looking forward to getting back to what I know will work. Thanks for the excellent walk-through.

    BTW, I did use the latest SA, 3.2.2, and everything worked exactly as described above for 3.1.0.

  2. Chris

    Followup: I’ve been running SA for about 48 hours now and it’s been working amazingly well. I already had a sizable corpus of spam and ham to jumpstart the training, so that helped.

    ~ [4]:$ spamreport.sh
    To null: 116
    Maybe Spam: 35
    Delivered: 44

    I tweaked my spam.rc to only send mail with scores of 10 or higher to /dev/null, so I’m getting a few more “Maybe Spam” than I really need to - I’ll likely change that down to 8 in about a week. Of the 44 messages delivered to my Inbox, only 3 have been spam - woot! Of the 35 in “Maybe Spam” none were ham.

    Thanks again for the excellent instructions.

  3. matthew

    I’m glad you liked the instructions, Chris. I’ve updated the graphics and made a few other modifications. I finally updated the guide to reference a Spam folder instead of SquirrelMail’s Trash folder. Also, there are no more “optional” steps. Automation adds value, so creating the script and crontab job are now considered an essential part of the guide.

  4. Steven

    Thanks for the instructions. I’m eager to try this out after running into some issues with both the DH built in junk filter and then later their own install of SA.

    When you say “Log in to the account that will be using SpamAssassin.” are you talking about any SSH account with access to the domain that will use SA or the specific email account that SA will be used on?

    Thanks for any clarification you can provide. I’ve read through the entire instruction set and the rest of it looks pretty straightforward. Thanks for the great doc!

  5. Desh

    I have the same question Steven does. I have my own Dreamhost account (and thus have shell access), but my email is mostly through an email-only account I set up for myself. So I can log into a shell from my “main” account but not my email account. Does this prevent me from installing SA this way?

    Thanks!

  6. matthew

    I’ve only tried SA with my user account. To handle those mail-only accounts my first thought is to use procmail and setup forwarding rules. If I get a chance to try this, I will definitely update the guide with my findings.

  7. Andrew Kelley

    Thanks for the very helpful tutorial. Looks like you are setting up the awl table in MySQL but not actually using it. I’ve figured out how to turn on MySQL auto-whitelist from the docs, but I don’t know how to migrate my old awl database to MySQL. Anyone know?

  8. rng

    I don’t think it’s a good idea to learn the entire inbox as ham. It should be a folder where you’re sure it is 100% ham. This doesn’t apply to the inbox, as fresh spam might sneak in while spamassassin is learning.

  9. Adam

    I did this and followed the instructions off the DH wiki for installing DNS tests but I only show SpamAssassin headers for version and score and not DNS testing.

  10. Kevin Keegan

    Sorry this is no longer an option with dreamhost. Dreamhost removed the ability to have a custom .procmail or any mail filtering. To read more see: http://www.krkeegan.com/archives/84-Dreamhost-Removes-SpamAssassin.html

    Dreamhost Deletes SpamAssassin

  11. Randy

    Had it installed and running great, but now I’ll no longer be able to do it for any new users in the future. This was one of the biggest selling points that got me to sign up, and the other was off-site backups of my personal data (particularly personal photos taken with our DSLR). Guess what: Dreamhost policy changes have taken both away. The old bait ‘n switch.

    Of course, now I have like 10 domains for family and friends hosted there, so I can’t migrate to a new host all that easy. I guess that’s what they’re counting on…

  12. Geoffrey

    One of the lucky ones as I signed up last summer.

    Sorry if I am being thick here, but when you say “allow spam to accumulate in your Spam folder”, are you meaning the SquirrelMail SPAM folder or my IMAP Spam folder?

    One of my biggest complaints about DH’s spam support has been the dependency on using that god-awful web app to manage spam and whitelists.

    Thanks for this great tutorial!
