
Why Bayesian Filtering Is the Most Effective Antispam Technology



Achieving a 98%+ spam detection rate using a mathematical approach

This white paper describes how Bayesian filtering works and explains why it is the best way to combat spam.


Introduction

This white paper describes how Bayesian mathematics can be applied to the spam problem, resulting in an adaptive, ‘statistical intelligence’ technique that achieves very high spam detection rates.

It also explains why the Bayesian approach is the best way to tackle spam once and for all: it overcomes the obstacles faced by more static technologies such as blacklist checking, comparison against databases of known spam, and keyword checking. These technologies are not obsolete, but they cannot be relied upon without a Bayesian filter.

Current spam detection techniques

Spam is an ever-increasing problem. The number of spam mails is increasing daily - studies show that over 50% of all current email is spam; the Radicati Group predicts this will reach 70% by 2007. Added to this, spammers are becoming more sophisticated and are constantly managing to outsmart 'static' methods of fighting spam.

The techniques currently used by most anti-spam software are static, meaning that they are fairly easy to evade by tweaking the message a little. To do this, spammers simply examine the latest anti-spam techniques and find ways to dodge them.

To effectively combat spam, an adaptive new technique is needed. This method must be familiar with spammers’ tactics as they change over time. It must also be able to adapt to the particular organization that it is protecting from spam. The answer lies in Bayesian mathematics.

How the Bayesian spam filter works

Bayesian filtering is based on the principle that most events are dependent and that the probability of an event occurring in the future can be inferred from the previous occurrences of that event. (More information about the mathematical basis of Bayesian filtering is available at Bayesian Parameter Estimation – http://www-ccrma.stanford.edu/~jos/bayes/Bayesian_Parameter_Estimation.htm and An Introduction to Bayesian Networks and their Contemporary Applications – http://www.niedermayer.ca/papers/bayesian/bayes.htm).

This same technique can be used to classify spam. If some piece of text occurs often in spam but not in legitimate mail, then it is reasonable to assume that an email containing that text is probably spam.
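The idea can be illustrated with a minimal Python sketch. It is not the implementation described in this white paper. It assumes whitespace tokenization, add-one smoothing, equal prior weight for spam and legitimate mail, and a simple naive-Bayes combination of word probabilities. All function names and the sample messages are hypothetical.

```python
from collections import Counter
from math import prod


def tokenize(text):
    # Assumption: lowercase, whitespace-separated words are the "pieces of text".
    return text.lower().split()


def train(spam_msgs, ham_msgs):
    # Count, for each word, how many spam and how many legitimate messages contain it.
    spam_counts = Counter(t for m in spam_msgs for t in set(tokenize(m)))
    ham_counts = Counter(t for m in ham_msgs for t in set(tokenize(m)))
    return spam_counts, ham_counts, len(spam_msgs), len(ham_msgs)


def token_spam_probability(token, spam_counts, ham_counts, n_spam, n_ham):
    # Share of spam / legitimate messages containing the word, with add-one
    # smoothing so that unseen words do not produce probabilities of 0 or 1.
    p_in_spam = (spam_counts[token] + 1) / (n_spam + 2)
    p_in_ham = (ham_counts[token] + 1) / (n_ham + 2)
    # Bayes' rule with equal priors for spam and legitimate mail (an assumption).
    return p_in_spam / (p_in_spam + p_in_ham)


def message_spam_probability(text, model):
    # Combine the per-word probabilities, treating words as independent.
    probs = [token_spam_probability(t, *model) for t in set(tokenize(text))]
    spam_like = prod(probs)
    ham_like = prod(1 - p for p in probs)
    return spam_like / (spam_like + ham_like)


# Hypothetical training mail for illustration only.
model = train(
    spam_msgs=["buy cheap pills now", "cheap pills online"],
    ham_msgs=["meeting agenda for monday", "lunch on monday"],
)
print(message_spam_probability("cheap pills", model))
print(message_spam_probability("lunch monday", model))
```

Words seen mostly in the spam samples push the score towards 1, and words seen mostly in legitimate mail push it towards 0. A real filter would train on far more mail and would typically weight the most significant words only.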



 

Copyright ©2009 The Compliance Authority, Inc.