TR2010-064
Keeping the Good Stuff In: Confidential Information Firewalling with the CRM114 Spam Filter & Text Classifier
Yerazunis, W.S., Kato, M., Kori, M., Shibata, H., Hackenberg, K., "Keeping the Good Stuff In: Confidential Information Firewalling with the CRM114 Spam Filter & Text Classifier", Black Hat Technical Security Conference, July 2010.
@inproceedings{Yerazunis2010jul,
  author = {Yerazunis, W.S. and Kato, M. and Kori, M. and Shibata, H. and Hackenberg, K.},
  title = {Keeping the Good Stuff In: Confidential Information Firewalling with the CRM114 Spam Filter \& Text Classifier},
  booktitle = {Black Hat Technical Security Conference},
  year = 2010,
  month = jul,
  url = {https://www.merl.com/publications/TR2010-064}
}
Abstract:
In this whitepaper we consider the problem of outbound filtering of email to prevent accidental leakage of confidential information. We examine how to do this with the GPLed open-source spam filter CRM114 and test the accuracy of this filter against a corpus of more than 10,000 hand-classified emails (both confidential and non-confidential) in Japanese. We look at the moving parts involved in these filters and how they can be set up. The results show that a hybrid of multiple CRM114 filters outperforms a human-crafted regular-expression filter by nearly 100x in recall, detecting more than 99.9% of confidential documents with a simultaneous false alarm rate of less than 6%. Because the programmers creating the machine-learning programs cannot read or write Japanese, this problem is an almost ideal case of Searle's "Chinese Room" problem.
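The abstract reports two metrics, recall and false alarm rate, for a hybrid of several filters. The sketch below shows how these two numbers could be computed on a hand-labelled corpus. It is an illustration only: the abstract does not say how the paper combines its filters, so the OR rule in `hybrid` is an assumption, and `keyword_filter`, `marker_filter`, and the four-message corpus are hypothetical toy stand-ins for trained CRM114 classifiers and the Japanese email corpus.

```python
from typing import Callable, Iterable, List, Tuple

# A filter returns True when it flags a message as confidential.
Filter = Callable[[str], bool]


def hybrid(filters: Iterable[Filter]) -> Filter:
    """Flag a message if ANY member filter flags it (assumed OR rule)."""
    members = list(filters)
    return lambda text: any(f(text) for f in members)


def recall_and_false_alarm(
    flt: Filter, corpus: List[Tuple[str, bool]]
) -> Tuple[float, float]:
    """Return (recall, false alarm rate) over (text, is_confidential) pairs."""
    true_pos = sum(1 for text, conf in corpus if conf and flt(text))
    false_pos = sum(1 for text, conf in corpus if not conf and flt(text))
    positives = sum(1 for _, conf in corpus if conf)
    negatives = len(corpus) - positives
    recall = true_pos / positives if positives else 0.0
    false_alarm = false_pos / negatives if negatives else 0.0
    return recall, false_alarm


# Hypothetical toy filters standing in for trained classifiers.
def keyword_filter(text: str) -> bool:
    return "secret" in text.lower()


def marker_filter(text: str) -> bool:
    return "internal only" in text.lower()


# Hypothetical hand-labelled corpus: (text, is_confidential).
corpus = [
    ("Secret roadmap attached", True),
    ("Internal only: Q3 budget", True),
    ("Lunch at noon?", False),
    ("The secret to good coffee", False),
]

combined = hybrid([keyword_filter, marker_filter])
print(recall_and_false_alarm(keyword_filter, corpus))  # (0.5, 0.5)
print(recall_and_false_alarm(combined, corpus))        # (1.0, 0.5)
```

In this toy case the OR combination catches the confidential message that the single filter misses, raising recall without adding false alarms. In general an OR rule can only keep or raise both numbers, which is why the two metrics are reported together.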
Related News & Events
NEWS: MERL researcher's spam filter finds automobile safety defects at NHTSA
Date: June 25, 2015
MERL Contact: William S. Yerazunis
Research Area: Data Analytics
Brief: The CRM114 Discriminator, an open-source spam filter / text classifier created by William Yerazunis in MERL's Data Analytics group, continues to turn up in interesting places, and apparently one of them is the US Department of Transportation's process for analyzing car safety defect reports.
Although CRM114 is usually used as a spam filter, it has also been used to analyze resumes for jobseekers, scan outgoing emails for accidental leaks of confidential information, peruse blogs for relevance, scan syslog files for interesting events, and now, apparently, search complaints sent to NHTSA for safety-related vehicle malfunctions.
NEWS: Black Hat Technical Security Conference 2010: publication by William S. Yerazunis and others
Date: July 24, 2010
Where: Black Hat Technical Security Conference
MERL Contact: William S. Yerazunis
Research Area: Data Analytics
Brief: The paper "Keeping the Good Stuff In: Confidential Information Firewalling with the CRM114 Spam Filter & Text Classifier" by Yerazunis, W.S., Kato, M., Kori, M., Shibata, H. and Hackenberg, K. was presented at the Black Hat Technical Security Conference.