Critical Eye on SpamAssassin
ErrorBase writes "In this Infoworld article, Logan G. Harbaugh makes a great deal about an ancient (2.44) version of SpamAssassin, comparing it with newer commercial variants.
Quote: 'You get what you pay for. [...] However, it took more than 10 times as long to install and configure SpamAssassin as it did any of the other products.'
Why did he not ask Kevin Railsback, who had the whole thing working some while ago?"
SpamAssassin (Score:5, Interesting)
TrollAssasin would be nice, imagine seeing post subjects as *****TROLL***** heh
Re:SpamAssassin (Score:5, Funny)
Re:SpamAssassin (Score:3, Informative)
Re:SpamAssassin (Score:2)
Then write a small filter for squid, which will break all comments into single posts, filter them through SA and reassemble output.
Nice weekend-hacking project.
Re:NonDocumentedSoftwareAssassin (Score:3, Funny)
Memo to self: if I ever spend 3 months creating free software to share, take 2 hours to write a web page showing somebody how it freaking works!
What am I doing wrong? (Score:4, Interesting)
I get about 60-70% of my spam correctly tagged, and about
Re:What am I doing wrong? (Score:4, Funny)
The problem is that you're making the same mistake I am.
(No, I can't expand upon that)
Re:What am I doing wrong? (Score:3, Informative)
If you are running 2.60, have you trained and enabled the Bayesian filters? By default you need to feed SpamAssassin about 300 spam and 300 ham (non-spam) messages for it to learn the difference. It will auto-train itself over time, but it only auto-learns on messages that are very obviously (to it) spam or ham.
If you normally only get email from a select list of people then you may want to lower your threshold. For people
Catching false positives. (Score:3, Informative)
I turned subject rewriting on:
rewrite_subject 1
Then I set the subject tag to include the hit number:
# Text to prepend to subject if rewrite_subject is used
subject_tag *****SPAM****:*_HITS_*
Then in your email client you can sort your JUNK messages based on subject. This will put the tagged spam messages with the fewest hits at the top. That way you can eas
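The sorting idea can be sketched outside a mail client. This is only an illustration in Python: it assumes the tag above renders as `*****SPAM****:*<number>*` at the start of the subject (the exact number format depends on the SpamAssassin version), and the function names `hits` and `lowest_hits_first` are made up for this example.

```python
import re

# Assumed rendering of the subject_tag above, e.g. "*****SPAM****:*7.3* Original subject".
# The number format is an assumption; adjust the pattern to what your install produces.
TAG = re.compile(r"^\*+SPAM\*+:\*(\d+(?:\.\d+)?)\*")

def hits(subject):
    """Return the hit count embedded in a tagged subject, or None if untagged."""
    m = TAG.match(subject)
    return float(m.group(1)) if m else None

def lowest_hits_first(subjects):
    """Return only the tagged subjects, lowest score (likeliest false positive) first."""
    tagged = [s for s in subjects if hits(s) is not None]
    return sorted(tagged, key=hits)
```

Sorting on the parsed number rather than on the raw subject string avoids "12.4" sorting ahead of "5.1", which a plain alphabetical sort would do unless the scores are zero-padded.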
Re:SpamAssassin (Score:2, Insightful)
TrollAssasin would be nice, imagine seeing post subjects as *****TROLL***** heh
Seriously, wasn't that one of the ideas behind moderation?
Is there a gui tool for configuring SpamAssassin? (Score:5, Insightful)
Aren't there tools that do this?
Re:Is there a gui tool for configuring SpamAssassi (Score:4, Informative)
Re:Is there a gui tool for configuring SpamAssassi (Score:3, Insightful)
"I installed the software on Red Hat Linux 9, with help from one of Proofpoint's systems engineers. She talked me through getting the Linux system configured properly, getting sendmail set up, and installing and configuring the Protection Server, which includes the MySQL database server for storing quarantined e-mail."
who needs a gui?
no wonder he gave spamassassin a low score. he couldn't have someone handhold him
Re:Is there a gui tool for configuring SpamAssassi (Score:4, Informative)
http://www.openhandhome.com/saconf.html
a problem with reviewers (Score:5, Insightful)
Re:a problem with reviewers (Score:2, Insightful)
This is akin to when Ballmer was quoted comparing Redhat 6 vs. Longhorn or XP or whatever.
This guy's just following the first rule of "marketbenching":
"When in doubt, skew results in favor of the company that's paying you the most..."
Re:a problem with reviewers (Score:5, Insightful)
It is always nice to see a lack of journalistic integrity in reviewers...
Re:a problem with reviewers (Score:4, Insightful)
Re:a problem with reviewers (Score:3, Informative)
Spamassassin didn't seem that hard to install. I just typed "apt-get install spamassassin" and just piped my mail through it with a procmail recipe:
:0fw
| spamassassin -P

:0:
* ^X-Spam-Status: Yes
spam
Seemed simple and straightforward. Granted, if you're doing it on an entire machine basis you'd just use spamd/spamc and set up a filter on the mail server itself. For one user though I'm not sure ho
Re:a problem with reviewers (Score:3, Interesting)
But fix your .procmailrc (Score:3, Interesting)
Re:a problem with reviewers (Score:5, Informative)
So if you want to whinge at anyone, whinge at RH. At least this shows that reviewers now think they should include FOSS in their reviews.
Justin.
Re:a problem with reviewers (Score:2, Insightful)
Re:a problem with reviewers (Score:4, Insightful)
That is the oldest canard (read: excuse) in the FOSS zealot's book. And I say that as a regular proselytiser myself.
How old is Red Hat 9? It was the current release till earlier this year, when they launched Fedora. So, he used a version that is a few months old. Whoop-de-fuck. 'Very old' my arse.
J.
Re:a problem with reviewers (Score:4, Insightful)
Anybody using an old version of anti-virus or anti-spam software gets what they deserve (or gets the review their advertisers want). I use spamassassin and clamav with mimedefang on my corporate gateway and you have to upgrade spamassassin regularly or more and more spam starts slipping through - this is the nature of anti-spam and I'm sure is just as true of Brightmail and the others.
Re:a problem with reviewers (Score:3, Funny)
Yo Grark
Canadian Bred with American Buttering
Coming soon at Infoworld... (Score:5, Insightful)
Seriously, InfoWorld, SpamAssassin 2.44 was released in February, all the other vendors you compared were constantly updating their products to cope with the ever changing nature of spam.
John.
Logan You Better Run (Score:5, Informative)
My ISP (southern NH) runs SpamAssassin 2.6 - and I can tell you that at the default settings it catches 90-95% with
I've got one client where they run NO filter - some folks (the names GOTTA be on the web site) get up to 100 spams a day. IT are basically monkeys with hands. I have no idea what the CEO thinks. They wouldn't even think OS as they're a total MS shop.
Re:Logan You Better Run (Score:4, Informative)
Razor: Vipul's Razor is a collaborative spam-tracking database, which works by taking a signature of spam messages. Since spam typically operates by sending an identical message to hundreds of people, Razor short-circuits this by allowing the first person to receive a spam to add it to the database -- at which point everyone else will automatically block it.
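The mechanism described above (take a signature of a message, let the first recipient report it, let everyone else look it up) can be sketched roughly as follows. This is not Razor's actual signature scheme or protocol: the SHA-1-of-normalized-body signature and the `SharedSpamDatabase` class are stand-ins invented for illustration.

```python
import hashlib

def signature(body):
    """Stand-in signature: SHA-1 of the lowercased body with whitespace collapsed.
    A real collaborative filter would use something sturdier; this is illustrative only."""
    normalized = " ".join(body.split()).lower()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

class SharedSpamDatabase:
    """Hypothetical in-memory stand-in for the shared signature database."""

    def __init__(self):
        self._known = set()

    def report(self, body):
        """Called by the first person who receives (and recognizes) the spam."""
        self._known.add(signature(body))

    def is_known_spam(self, body):
        """Called by every later recipient before accepting the message."""
        return signature(body) in self._known
```

An exact hash like this is trivially defeated by changing one character per recipient, which is why the signature function is the part that would need real work.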
From the review:
All the products except Brightmail and SpamAssassin allow end-users to add senders to the domain whitelist themselves. Brightmail allows users to forward misidentified e-mails to the administrator, who can choose to add the sender to the whitelist. SpamAssassin allows only the administrator to add to the whitelist, with no direct access for users.
Who is missing something here? Me or the reviewer? It looks like Razor does exactly what he wants to do and claims that SpamAssassin doesn't do. It seems to me you are right ... selectively comparing old OS with newer commercial software so that he can make claims that are factually correct about SpamAssassin 2.44 but completely misleading about the current version.
Re:Logan You Better Run (Score:3, Informative)
SpamAssassin allows only the administrator to add to the whitelist, with no direct access for users.
SpamAssassin (anything remotely resembling a current version) supports per-user whitelists and other preferences. It takes a little more skill to set up, but frankly the end result is way better than anything yo
Re:Logan You Better Run (Score:3, Funny)
Monkeys don't have hands!?!?
I get what I pay for too from reading the article. (Score:5, Informative)
Yeah all those GUI options look nice, but 90% of the time, why do I need to change my spamblocking settings? The Bayesian filter autoadjusts itself with little or no user intervention -- it's near transparent.
Re:I get what I pay for too from reading the artic (Score:3, Informative)
Is it a sin to be critical of a free product? (Score:3, Insightful)
Why is there this attitude that if your project is free, then it does not matter if it is garbage? Furthermore, you are not allowed to say it is garbage, because, after all, you don't look a gift horse in the mouth. Perhaps that is why Linux is still not on the desktop. There are plenty of people who spend days configuring theirs and then post "it works for me" comments, while the rest of us silently wonder why anyone would want to spend so muc
Re:I get what I pay for too from reading the artic (Score:3, Interesting)
Then I found out about the beauty of procmail once I looked into filtering all spam to its own folder without email client filters. So now, I have different emails filtered to specific folders before they ever hit my inbox. Oh, and I had to disable the Bayesian filter; it was catching way too many non-spam emails. Stuff that didn't have any keyword
Works for me (Score:5, Informative)
It works pretty well for me -- the mail server's only for my personal use so I don't really have to worry about irate subscribers suing me for dropping their legit mail =p and the 8-12 point range in the spam marking gives me a chance to vet those suspicious mails briefly before deleting them.
I've never tried any other spam filters on the server-side, so I can't really compare. I guess I'm also a bit of a Linux hacker so I don't mind tweaking all those config files along the lines of the FAQ and other hints on forums to get it to work the way I want it to.
Re:Works for me (Score:5, Informative)
Re:Is Running Home Server Worth It? (Score:3, Interesting)
As for maintenance, there isn't any. I set up exim two or three years ago and have hardly touched it since.
Spam Filters . . . and Eudora (Score:4, Funny)
Come to think of it, it seems to work out just fine.
Newt-dog
Sales sales sales (Score:3, Insightful)
In the end, any company is going to have to put people and tools together to get a spam solution, or outsource it. But DIY needs people time.
Don't pay vendors for SpamAssassin; it runs quite nicely on leftover PCs reloaded with Linux.
He already sent an open letter to SAtalk (Score:5, Informative)
Re:He already sent an open letter to SAtalk (Score:5, Informative)
Re:He already sent an open letter to SAtalk (Score:4, Insightful)
This is very true, of course. But has the guy considered that this is 1:1 the case with commercial software too?
Even support providers for enterprise-level software (e.g. database vendors, which may charge hundreds of thousands of $, depending on the installation and support level) will never guarantee that they provide you with a solution.
Of course their sales reps have the flashier presentations though, which is a part of what you pay for.
Re:He already sent an open letter to SAtalk (Score:3, Insightful)
Re:He already sent an open letter to SAtalk (Score:3, Informative)
no wonder... (Score:5, Insightful)
Because (Score:5, Interesting)
He expected to get the results that he normally gets with most commercial software. Click Setup.exe, answer a question or two and it's done, up and running. Further configuration is not required though it may be desired.
The commercial vendors of Spamassassin have not improved the core product in any way. What they have improved is the packaging, the installation, the default configuration and the interface to modify that configuration. The stock SpamAssassin does not offer that, although Spamassassin setup is far simpler than that of some other packages out there.
Taken from the two articles (Score:5, Interesting)
versus
The first found Spamassassin easy, the second found it hard. Hmmm.
What really aggravates me is the typical "There are blacklists available that you can subscribe to, and some are updated regularly, but these are noncommercial lists with no guarantees." I'd like to see what guarantees the commercial lists come with.
Re:Taken from the two articles (Score:2, Funny)
Apparently this IT consultant and author of two networking books hasn't read a single EULA.
Re:Taken from the two articles (Score:2)
The guarantee that if they don't do an acceptable job, they won't make any money, and thus have a strong incentive to please their users?
Guarantees? (Score:2)
"The SpamCop Email System will filter up to 90% of spam sent to your employees."
That's "up to" not "at least" so I guess not much of a guarantee, but then again, they only charge $30 a year.
Critical Eye on Tech Journalists (Score:5, Informative)
Each product was tested with a different stream of mail, so the number of messages received varied, but all received enough messages to assess their capabilities.
Can you imagine someone writing "Oracle, Sybase and Postgres were compared. While the data and workloads were different, all products performed enough work to assess their capabilities"?
All the products except Brightmail and SpamAssassin allow end-users to add senders to the domain whitelist themselves.
I don't know anything about Brightmail. Spamassassin end-user whitelist entries can be set up in a number of ways.
And all the products but SpamAssassin use dynamic updates to keep up with the evolving technologies spammers use to circumvent less sophisticated filters.
As alluded to in the summary, this is false with modern versions of Spamassassin, which uses Bayesian filtering. (The author later says he couldn't get it working.)
However, it took more than 10 times as long to install and configure SpamAssassin as it did any of the other products. [...] But just because the software is installed does not mean it will work -- filtering criteria must be added manually, and until that's done nothing is filtered out. Getting the various configuration files edited properly so that the whole package worked was not simple. Documentation was difficult to find, and not always easy to follow.
While it is true that one must be comfortable with a text editor to configure Spamassassin, thus perhaps putting it out of reach of point-and-click admins and technical journalists, I also wouldn't be prone to put my mail servers in the hands of either of those groups of people.
It looks for keywords in the subject or body of e-mails, but is frustrated by words not in the dictionary, such as "V!agra," or words that contain invisible HTML characters.
While I am not sure what tests appeared in which version, I'm pretty sure 2.44 handled off-by-one words such as V!agra. I have no idea what he's talking about when he says "invisible HTML characters", but it does seem to point to a certain technical incompetence, similar to the ostrich belief - "If I can't see you, then you can't see me."
This is not to say Spamassassin is the easiest thing in the world to deal with. I happen to love it, because of the extreme flexibility.
I just get sick of tech journos who decide that because a tool doesn't have a gui and they don't want to take the time to configure it, it sucks.
Re:Critical Eye on Tech Journalists (Score:5, Insightful)
A very large sample of mail would negate almost all of the differences caused by using a different set of mail, but I get the feeling that each of these servers ran for about a day and the results were gleaned from that.
I don't know anything about Brightmail. Spamassassin end-user whitelist entries can be set up in a number of ways.
As alluded to in the summary, this is false with modern versions of Spamassassin, which uses Bayesian filtering. (The author later says he couldn't get it working.)
Maybe I'm missing something or taking things that I consider basic for granted, but Bayesian filtering with SA is about as straightforward as it gets, except that instead of clicking a few buttons, you run one short command.
While it is true that one must be comfortable with a text editor to configure Spamassassin, thus perhaps putting it out of reach of point-and-click admins and technical journalists, I also wouldn't be prone to put my mail servers in the hands of either of those groups of people.
I think we've all known these types, and unfortunately they're more widespread than we'd like to think. Many simple solutions such as SA are ruled out because the admin doesn't have the skill to implement them. Note to any managers reading this: hire people with a solid background in the field, not those who list single-platform applications on their resume as "skills." Software changes, but a good administrator has the ability to adapt.
Re:Critical Eye on Tech Journalists (Score:3, Insightful)
I think the poster was creating an implicit comparison between various types of admins. Installation, configuration and maintenance of Spamassassin is simple for a skilled admin, while it may not be for an inexperienced one. It is a simple solution because well, it is, if you k
Re:Critical Eye on Tech Journalists (Score:3, Informative)
If you look at the source of most HTML spam, you'll see things like:
v<!-- the -->i<!-- brown -->a<!-- cow -->g<!-- is -->r<!-- dead -->a
The <!-- --> parts are HTML comments and thus won't be displayed to the user, but they can mess up some spam filt
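To make the trick concrete, here is a minimal Python sketch of why a naive keyword match misses the example above, and how removing HTML comments first restores the match. It is not how SpamAssassin itself handles HTML; the function names are made up for illustration.

```python
import re

# Non-greedy so each comment is removed separately; DOTALL so comments may span lines.
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)

def strip_html_comments(text):
    """Remove HTML comments, leaving only what a mail client would display."""
    return HTML_COMMENT.sub("", text)

def contains_keyword(text, keyword):
    """Case-insensitive keyword check on the comment-free text."""
    return keyword.lower() in strip_html_comments(text).lower()
```

Run on the sample line above, the raw string does not contain the keyword, but the stripped string does.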
sixty-two percent? (Score:5, Interesting)
To me, this statement is pretty telling. Harbaugh must get some completely different kinds of spam than me, because, even though I receive about 60 spam mails a day (directed to my "spam" folder, so I never see them until I scan the "From:" field and then delete them), maybe one per week makes it through the filter. And seeing as how I can't even remember the last time I got a false positive, that's a pretty damn good number.
I can believe that if you receive a variety of mail and if you took no time to configure SpamAssassin other than cranking it up, maybe then it'll only catch 80% of the spam. But 62%? I'm not sure if Harbaugh is skewing the benchmarks or if he just doesn't know what he's doing.
There are some legitimate issues with SpamAssassin that might not make it ready for the enterprise, but for a handful of users, I have been more than satisfied. And the price is right.
Re:sixty-two percent? (Score:2, Insightful)
Look at where the article is from!!
Infoworld.com. Do you think they're going to put their advertisers' products down? I could tell after the first three paragraphs that the article was a sales brochure.
Re:sixty-two percent? (Score:3, Informative)
I was using 2.20 until recently. After updating to 2.60, the level of spam still coming through the filter dropped right off. It's about 1 msg. per day now, used to be at least 5 times that.
You think 2.44 is ancient? (Score:5, Informative)
Re:You think 2.44 is ancient? (Score:5, Informative)
Aside from that, installing 2.60 into your home directory is absolutely painless. Just did that, before I learned about the backports.org website.
Article-length advertisement (Score:3, Insightful)
Sounds to me like Infoworld has an advertising contract with (at least) one of these companies. At the very least he should have checked the site for an update before he started his "tests". For a while there, I got every one of those "IT industry" hype mags (always free). While there was some good information here and there, you had to wade through a lot of advertising pretending to be articles.
I love SpamAssassin and would not consider email hosting without it. It has made my email account usable again! For the record, it seems to catch about 80-90% of my spam, and I have never seen a 'false positive' (I do check my 'spam' folder, but less and less).
it's a matter of proper configuration! (Score:2, Informative)
what i can highly recommend is to increase the score of MICROSOFT_EXECUTABLE as it generally is a piece
-1, Troll (Score:5, Funny)
It's just a bit too obvious that he was hoping for a severe slashdotting, driving his own numbers ("look, editor, how many people read my articles!") and the ad numbers of his paper up.
Probably submitted the story himself, too.
The review isn't as bad as slashdotters make it (Score:5, Insightful)
Seriously:
Re:The review isn't as bad as slashdotters make it (Score:2)
If he wants to pay (Score:2)
Rus
Rule #1: user intelligence >= tool (Score:2, Insightful)
It's all about the UI (Score:4, Insightful)
The bias apparent in this article and the crappy comparison chart aside, this review doesn't even begin to touch base as a thoroughly researched opinion piece and ends up looking like an advert for Brightmail.
However, we in the OS community do face a UI problem. The missing rung on the ladder to mass acceptance is the absence of high-quality UIs that give users, and indeed administrators of the point-and-drool variety, an interface to the service they are seeking to use.
Before the highly polished phpMyAdmin I met serious resistance from admins to MySQL over MS SQL, based mostly on interface. The same goes for CUPS, which has a web interface that I think has come of age if not achieved adulthood. The Webmins are OK as long as you don't tinker too much or do anything slightly non-standard. I dislike SWAT and am now so used to editing smb.conf I haven't even checked it's working. I think that a lot of these services, Apache, SpamAssassin and X11 for example, could bear providing embedded configuration UIs if they aim to capture wider markets. Mandrake's X11 confugulator is very good.
I was going to mention the difficulty presented for admins with widely deployed Outlook when looking at these kinds of solutions, but then I thought: no, only have sympathy where it is due. And I know that SpamAssassin could work seamlessly with Outlook, but if users want a front end for whitelisting then SpamAssassin isn't going to be your toy just yet.
Though we love the text-based config file, you may have to put a lot of work into configuration UIs if you want to enter the area as far as that reviewer and many sysadmins are concerned.
Re:It's all about the UI (Score:2)
For simple configurations (getting a printer set up assuming the drivers are already in place), CUPS is great and easy to use. Once you want to do more complicated things like authentication or SSL - which is what really makes IPP and CUPS shine - you're back to hacking text files and restarting the server.
Not Really (Score:4, Insightful)
I knew nothing about filtering spam until I installed SpamAssassin 2.6 in a multi-user environment last week. Here are my responses:
I wouldn't recommend that my grandmother install SpamAssassin, but if you have any admin skills whatsoever, it's quite easy to use it to set up effective and useful filters. Furthermore, there are enough factual errors in the article that I'm tempted to dismiss it outright.
Of course, it's possible that it got a lot better between 2.44 and 2.6, but that begs the question, why did he install 2.44?
Paid opinions are worth what they cost (Score:2)
Who is kidding who here?
install took 10 times as long...? (Score:5, Insightful)
I also like the characterization of Spamassassin as "first generation" without any supporting evidence to the fact. First generation was adding spam senders to your e-mail client's blocklist. Bayesian filtering is well beyond first generation, but spammers have learned to defeat Bayesian filtering with poison data in non-eyeball space and text obfuscation. The next generation in spam detection is to detect the Bayesian evasion features - and guess what does that!? Spamassassin (2.60).
SA+MailScanner works for me (Score:5, Informative)
I've found the easiest way to implement SpamAssassin is to invoke it through MailScanner [mailscanner.info]. MailScanner uses third-party virus scanners and can optionally invoke SpamAssassin as well. With the free ClamAV [elektrapro.com] antivirus product, you can build a powerful open source mail scanner. Even without a virus scanner, MailScanner detects and quarantines executable attachments and other dangerous content which represent the most common types of mail-borne viruses and worms.
RedHat installs the daemonized version of SA as well as the SA Perl scripts. Using the daemon, the easiest implementation is to invoke SA in /etc/procmailrc on the mail delivery host; for mail gateways running sendmail, you need to use the milter interface. I've found the MailScanner+SpamAssassin approach much easier to configure than either of these methods, and you get virus scanning to boot!
I suspect if the reviewer had compared SA 2.60+ to the commercial products, rather than the older 2.44 version used in the review, SA would have shown better results.
I'd agree with the reviewer that one of the things SA lacks is an easy method for users to interact directly with the program. (Part of the issue has to do with security; SA runs as root. As I read the review, I wondered how the other products allow users to interact directly with the scanners without sacrificing security.) It's not easy to maintain per-user Bayesian filtering, for instance, but I generally recommend having the mail client, e.g., Mozilla [mozilla.org], handle these tasks.
Thanks for the reminder!! (Score:3, Interesting)
Old, and on the list (Score:3, Informative)
The current version of spamassassin is 2.60.
Try the Custom Rule Emporium! (Score:4, Informative)
Since then, I've downloaded a bunch of rules from The SA Custom Rule Emporium [merchantsoverseas.com] and almost nothing gets through.
If this guy had trouble, it is the fault of the documentation, not the product. Either that, or he was dumb enough not to upgrade to perl 5.8 or above, and spent forever installing modules.
He says:
Funny how when you install an old version of the product, it seems outmoded, hmmm?
Sheesh.
Pixie
Man could he be more wrong.... (Score:2)
He was trying to make a point (Score:4, Interesting)
Notice that he deliberately took a standard install from RedHat 9, something some IT person (Not a tr00 g33k) might buy at CompUSA. He then tried to install the provided product. Clearly, a tr00 g33k would go and download the latest release, but keep in mind that not everyone is so comfortable with being on the bleeding edge - I believe that this was a point he tried to make. There is also the perception that the release provided with a "product" such as RedHat 9 will be up to the same standards as the OS.
While it's true the latest version has default rules and whatnot, it's quite likely that his older, more out-of-date version does not. In fact, going briefly to the SpamAssassin home page, the links for the 2.5 and 2.4 release documentation are broken.
The point to be made was: OSS needs to be more buttoned up. Notice that he said that he had no trouble installing Red Hat 9. That's because the installer is rather good.
Commercial Guarantees, eh? (Score:5, Insightful)
11. LIMITED WARRANTY FOR PRODUCT ACQUIRED IN THE US AND CANADA.
Microsoft warrants that the Product will perform substantially in accordance with the accompanying materials for a period of ninety days from the date of receipt.
YOUR EXCLUSIVE REMEDY. Microsoft's and its suppliers' entire liability and your exclusive remedy shall be, at Microsoft's option from time to time exercised subject to applicable law, (a) return of the price paid (if any) for the Product, or (b) repair or replacement of the Product, that does not meet this Limited Warranty and that is returned to Microsoft with a copy of your receipt.
Note that a) no updates or fixes are guaranteed, b) your only remedy is media replacement or a refund, and c) this choice of remedy is up to Microsoft.
I love it when people claim that you're taking a huge risk with open source software without guarantees. Microsoft says their software will work, but isn't saying that if their software doesn't work, they have to fix it.
Light weight alternative (Score:2)
I run a personal mail server (Debian on a P-75 w/ 32MB) which most of the time is just fine. If for some reason I stop Yahoo forwarding my messages and then catch up later with fetchmail, I have to stop spamd. If I don't, then I have to hit the power button, as SpamAssassin will consume all memory and CPU and then some. Even if I hit Ctrl+Alt+Del, it will still be thrashing 6 hours later. It's kind of annoying... so any recommendations for alte
modifying subjects and other content (Score:3, Interesting)
I know you're just joking, but to be serious for a minute, the reason not to do that is because you'd be transparently altering someone else's copyrighted property. Overzealous and/or overworked sysadmins misconfigure SA to globally analyze all incoming content and then to alter email subjects based on its opinion. This is an invasion of content, certainly prone to false positives because antispam scanning is an individually trained process, and breaks the trail of reply threads at least on a visual basis. There are always going to be tons of misconfigured or RFC ignorant smtp servers out there, and being compatible with them is what makes the Internet work. That would include corporate servers, legitimate opt-in bulk mail, and opt-in mailing lists run by Some Dude. There will be people on a mailing list whose personal content is always publicly marked by certain recipients as spam! It's confusing, insulting, and unnecessary. SMTP has invisible meta-tags in its headers to allow for that, and agents are supposed to respect them.
This is fine for using SA's global config as your personal config for your own little systems, but not for an ISP or business.
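As an illustration of acting on the headers instead of on a rewritten subject, here is a small Python sketch that checks the X-Spam-Status header (the same header the procmail recipes in this thread match on) and leaves the subject alone. The function name and the sample header value format are assumptions for the example.

```python
from email import message_from_string

def is_flagged(raw_message):
    """True if the X-Spam-Status header starts with 'Yes'; the subject is never touched."""
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status", "")
    return status.strip().lower().startswith("yes")
```

A recipient's own filter can use a check like this to file the message, while everyone else on a mailing list still sees the original subject and an intact reply thread.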
According to spamassassin.org:
Arsehole (Score:2, Interesting)
SpamAssassin is on my server and is absolutely brilliant. It catches 99.9% of all my spam, and has only on 5-10 occasions in the past month (I get about 50-60 emails a day) counted 'innocent' mail as spam, and even those were newsletters.
Anyone who slates SpamAssassin is one very deluded person. It's open source, constantly improved, open to editing by its users, and rules can be added. Marvellous.
Commercial variants I've seen have been painfully ba
Personalized Bayesian training (Score:3, Informative)
-- casual readers may skip the following details
In an attempt to mitigate this, SA makes an unfortunate mistake in its unsupervised learning algorithm - it uses a different set of rules for training than it uses for marking mail as spam or not. So you can easily have email marked as spam but have the system trained as non-spam (or vice versa). This introduces systematic bias into the learning so that spam detection can get worse in the long run. As a further attempt to mitigate this problem, the learner uses a higher spam threshold, so many spams that are correctly marked do not contribute to the learning process. There is no way to set the SA configuration parameters to eliminate these biases (setting the learn threshold does *not* do it).
--- end of gory details
It is not too difficult to set up SA for personalized learning. Just pipe your mail to the following command:
spamassassin -e
If the return code is 0 (non-spam) also pipe the mail to
sa-learn --ham --single
If the return code is 1 (spam) pipe to
sa-learn --spam --single
If you do this you are guaranteed that the statistics recorded in your personal bayes db correspond exactly to the judgements made by SA.
In addition to this you must correct SA when it makes a mistake, by piping the message to sa-learn again with the right flag. You may be able to set up a macro in your mail reader to do this.
This isn't as easy to set up as it should be, but it is *very* effective.
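The classify-then-train steps above could be wired together roughly like this. It is a sketch, not a tested delivery script: it assumes the exit codes and the `-e`, `--ham`, `--spam` and `--single` flags behave exactly as described in this comment (check the man pages for your version), and `run` and `classify_and_train` are names made up for the example.

```python
import subprocess

def run(cmd, message):
    """Pipe the raw message (bytes) to a command and return its exit code."""
    return subprocess.run(cmd, input=message, stdout=subprocess.DEVNULL).returncode

def classify_and_train(message, runner=run):
    """Train the Bayes db with exactly the verdict SpamAssassin just gave.

    Exit codes follow the description above: 0 means non-spam, 1 means spam.
    """
    code = runner(["spamassassin", "-e"], message)
    if code == 0:
        flag = "--ham"
    elif code == 1:
        flag = "--spam"
    else:
        return None  # unexpected exit code: do not train on a guess
    runner(["sa-learn", flag, "--single"], message)
    return flag
```

The `runner` parameter exists only so the logic can be exercised without SpamAssassin installed. Correcting a mistake later is the same `sa-learn` call with the opposite flag.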
In the last year I've received 20,000 non-spam and over 100,000 spam messages & viruses (30,000 if you eliminated the "Cumulative Update" messages, which SA caught just fine.) About 100 spams have gotten through (a couple a week) and about 10 false positives have occurred. All of the false positives have been 'weird' - advertising, automatic responses, or web pages that were forwarded to me. As far as I know (and I do check periodically) I've had no false positives in the last 50,000 spams.
My preliminary analysis indicates that personalized learning reduces both false negatives and false positives by a factor of ten. I'll report more systematic analysis in due course.
what is it with those guys? (Score:3, Insightful)
PC Magazine [pcmag.com].
Spamgourmet [spamgourmet.com] (open source and free to use) was lined up against several commercial offerings, and was rated the lowest. It was clear from the review that he didn't spend much time learning about how spamgourmet works -- he wound up faulting it for perceived problems that were addressed by features that he ignored in the review.
Not to be cynical, but if I were a tech reviewer, I might be afraid of lawsuits resulting from my reviews -- open source projects have no revenue, and therefore can't prove up any damages in court. This might make me more likely to choose the open source alternative to get the shaft. Hopefully that's not what's going on here, but you've got to wonder...
My letter to the author (Score:5, Insightful)
Mr. Harbaugh,
This letter is in response to your InfoWorld article titled "Commercial solutions win, spam loses." In that article you portray all commercial spam solutions as winners and you portray the only open-source spam solution you reviewed as a dismal failure. I must say that as a professional in the anti-spam field I am truly disappointed by your incomplete and inaccurate assessment.
You start the article off quite well. Your introduction regarding two of the possible types of spam filtering is in terms that the average reader can understand. The introduction is also technically accurate, although it doesn't mention the other ways to filter spam.
You quickly take an opportunity to kick dirt on SpamAssassin by claiming it filters a fraction of the amount of spam all the commercial solutions filter. You hint at something in that statement when you say that SpamAssassin's "age showed in my tests," yet you fail to make the real truth apparent to the reader. I must ask, why did you choose to compare such an ancient version of SpamAssassin to the current versions of the four commercial products? Version 2.44 is over 9 months old. Spam filtering techniques are constantly evolving to filter a continually changing target. Comparing a 9.5-month-old copy of SpamAssassin to the current version of BrightMail is like comparing a 1990 Chevy Silverado to a brand-new 2004 model. As an author and professional in the IT industry writing a column for InfoWorld, one of your goals is accuracy and fairness in reporting, is it not?
You make numerous false statements regarding SpamAssassin in your article:
1) "All the products except Brightmail and SpamAssassin allow end-users to add senders to the domain whitelist themselves... SpamAssassin allows only the administrator to add to the whitelist, with no direct access for users."
This is simply not true. SpamAssassin allows its users to add whitelist or blacklist entries to their personal preferences. It also allows its users to control the scoring for each individual ruleset within SpamAssassin's arsenal. Even the ancient version of SpamAssassin you chose to use had that simple feature. SpamAssassin also has the ability to automatically whitelist senders.
2) "Delegation of specific administrative functions is possible with all the products except SpamAssassin..."
This too is not true. As I said in response to number 1, SpamAssassin allows its users to control the scoring for each individual ruleset. This gives them the ability to disable certain rules, lessen the scores of others, and increase the scores of rules they wish had more weight. For example, a user could disable the MAPS RBL DNS blacklist checks, whitelist joe@mydomain.tld, blacklist annoying-spammer@spamdomain.biz, and increase the score of the rule ALL_CAP_PORN to 2. Users can also create their own rulesets. SpamAssassin gives its users a high level of control over their spam filtering.
3) "Finally, in addition to stopping spam, all four commercial products provide content-filtering features, allowing the administrator to block incoming or outgoing e-mail that contains proprietary data, audio or video files, executables, sexually explicit words, or racial slurs. They also provide protection against DoS attacks and directory harvesting attacks."
This one baffled me at first. I'm honestly not sure why you want to compare features that have nothing to do with filtering spam. Filtering racial slurs from an email is
The author replies..... (Score:3, Informative)
Re:What is a good client-side spam filter for Outl (Score:5, Informative)
Re:What is a good client-side spam filter for Outl (Score:4, Informative)
Given I get over 100 spams a day and I see none of them, I am very happy with this indeed.
Re:What is a good client-side spam filter for Outl (Score:2, Insightful)
Re:What is a good client-side spam filter for Outl (Score:2)
Re:What is a good client-side spam filter for Outl (Score:4, Informative)
Spam Bayes Rules! (Score:2)
It's a great product.
Re:What is a good client-side spam filter for Outl (Score:5, Funny)
Best one yet!
POPFile (Score:4, Informative)
POPFile is easy to use. It also performs Bayesian filtering. It is what I use.
http://popfile.sourceforge.net/
My current POPFile statistics:
Messages classified: 1,440
Classification errors: 19
Accuracy: 98.68%
Re:POPFile (Score:3, Insightful)
> Classification errors: 19
> Accuracy: 98.68%
That's nice, but it's really important to break it down between false positives and negatives. I get over 200 spams a day (before filtering), and while it's quite tolerable for 2 or 3 of those to get through, missing that many legitimate messages a day is not.
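To make the point concrete, here is a small sketch showing that the quoted accuracy figure says nothing about which way the 19 errors fell. The two splits below are hypothetical, invented purely for illustration; the parent post does not give a breakdown.

```python
def accuracy(classified: int, errors: int) -> float:
    """Overall accuracy: correctly classified messages over all messages."""
    return (classified - errors) / classified


# Figures quoted from the POPFile statistics above.
classified, errors = 1440, 19
print(f"accuracy: {accuracy(classified, errors):.2%}")  # 98.68%

# Hypothetical splits of the same 19 errors: (false positives, false negatives).
splits = {"mostly missed spam": (1, 18), "mostly lost mail": (18, 1)}
for name, (fp, fn) in splits.items():
    # Both splits add up to the same error count, hence the same accuracy.
    assert fp + fn == errors
    print(f"{name}: {fp} legitimate messages lost, {fn} spams let through")
```

Both cases report the same 98.68%, but the second one loses 18 legitimate messages, which is the case that actually hurts.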
Re:What is a good client-side spam filter for Outl (Score:3, Informative)
Re:Photo of Author (Score:2)
spamassassin-2.44-11.8.x.i386.rpm (Score:4, Insightful)
To moderators: when you mod something "informative", please check the facts first. SpamAssassin in RH 9 is 2.44.
Re:spamassassin-2.44-11.8.x.i386.rpm (Score:3, Funny)
Re:The algorithm (Score:3, Interesting)
I know someone who did a project on classifying video using Bayesian filtering. It looked at stuff like brightness, contrast, volume, basically everything they could extract from the movie file and give a value to. The concept itself is quite powerful; the difficulty is