TR2004-091

The Spam-Filtering Accuracy Plateau at 99.9% Accuracy and How to Get Past It

- Yerazunis, W.S., "The Spam-Filtering Accuracy Plateau at 99.9% Accuracy and How to Get Past It", MIT Spam Conference, January 2004.
  @inproceedings{Yerazunis2004jan,
    author = {Yerazunis, W.S.},
    title = {{The Spam-Filtering Accuracy Plateau at 99.9\% Accuracy and How to Get Past It}},
    booktitle = {MIT Spam Conference},
    year = 2004,
    month = jan,
    url = {https://www.merl.com/publications/TR2004-091}
  }
MERL Contact: William S. Yerazunis

Abstract:

Bayesian filters have now become the standard for spam filtering; unfortunately, most Bayesian filters seem to reach a plateau of accuracy at 99.9 percent. We experimentally compare the training methods TEFT, TOE, and TUNE, as well as pure Bayesian, token-bag, token-sequence, SBPH, and Markovian discriminators. The results demonstrate that TUNE is indeed best for training but computationally exorbitant, and that Markovian discrimination is considerably more accurate than Bayesian but not sufficient to reach four-nines accuracy, so other techniques such as inoculation are needed.

Related News & Events

NEWS MIT Spam Conference 2004: publication by William Yerazunis
Date: January 21, 2004
Where: MIT Spam Conference
MERL Contact: William S. Yerazunis
Research Area: Data Analytics
Brief
- The paper "The Spam-Filtering Accuracy Plateau at 99.9% Accuracy and How to Get Past It" by Yerazunis, W.S. was presented at the MIT Spam Conference.
