- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Tue, 16 Aug 2011 01:35:37 +0200
- To: www-archive@w3.org
Hi,

I get a lot of email. That is largely because humanity seems to be bad at developing decent messaging infrastructure and user interfaces. Group discussions, for instance, do not fit nicely with the email protocols. It would be better to use a protocol more like NNTP for those, at least for my uses (it is easy to see, of course, how low-volume users might prefer a setup that is entirely unworkable for very high-volume users).

Anyway. When spam levels surpassed half a million messages per year, I moved spam detection into the "cloud", as they would now call it, and did not care to keep an eye on the statistics as I used to. Over the past couple of weeks I took a close look though, and the numbers come out nicely along the lines of the following, with some rounding.

  X messages per 30 minutes identified as ham
  X messages per 60 minutes identified as spam
  X messages per day false negatives
  X messages per week false positives

The false negatives figure is so high because I do not train the filter with false negatives, as the systems are largely disconnected. Oddly, it's mostly spam in non-Latin languages and daily newsletters I've obviously never subscribed to (newsletters, too, would be better suited to a pull medium resembling NNTP, but with more centralization than the Atom world usually employs, which, unless a shared feed system is used that does resemble NNTP, works more like per-user pull). If I did train the filter with false negatives, I am pretty sure their rate would match the rate of false positives.

I gather that globally the ham/spam ratio is rather the other way around; the difference is easily explained by the volume of ham in my particular inbox. Even without the training the accuracy is at 98%, and it would be at 99.7% if I am correct about my assumption. That matches the usually claimed figure of "99%".
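As a rough check of those percentages, here is a minimal sketch. It assumes, purely for illustration, that the placeholder X stands for the same value in all four rates above, so that X cancels out, and that accuracy means one minus errors over all classified messages. Neither assumption is confirmed by the message, and the variable names are made up for the example.

```python
from fractions import Fraction

# All rates are per day, in units of the placeholder X.
# ASSUMPTION: X is the same value in all four lines, so it cancels out.
ham_per_day = Fraction(24 * 60, 30)    # "X per 30 minutes" -> 48 X per day
spam_per_day = Fraction(24 * 60, 60)   # "X per 60 minutes" -> 24 X per day
false_neg_per_day = Fraction(1)        # "X per day"
false_pos_per_day = Fraction(1, 7)     # "X per week"


def accuracy(errors_per_day, messages_per_day):
    """Share of messages classified correctly (assumed definition)."""
    return 1 - errors_per_day / messages_per_day


total_per_day = ham_per_day + spam_per_day

# Without training: false negatives at the observed rate.
untrained = accuracy(false_neg_per_day + false_pos_per_day, total_per_day)

# Hypothetical: false negatives drop to the false positive rate.
trained = accuracy(2 * false_pos_per_day, total_per_day)

print(f"untrained: {float(untrained):.1%}")
print(f"trained:   {float(trained):.1%}")
```

Under these assumptions the sketch gives roughly 98.4% and 99.6%, in the neighbourhood of the figures quoted; the remaining difference may come from rounding in the underlying numbers or from a different choice of denominator.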
The false positives I note come mostly from the same people or are due to configuration errors on my part (I have a forwarding setup that is incompatible with SPF), so it seems rather plausible that the error rate could be brought down to about 1 in 1000.

regards,
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
Received on Monday, 15 August 2011 23:36:06 UTC