- From: Judson, Ross <rjudson@managedobjects.com>
- Date: Thu, 31 Jul 2003 15:21:49 -0400
- To: <www-rdf-interest@w3.org>
One of the barriers to acceptance for RDF is the lack of RDF annotation out there on the web. I thought I'd toss out this idea and see what the group thinks.

I use a Bayesian classifier to reduce spam. It filters my incoming mail, dividing it into categories (spam/ham). These classifiers are, in general, pretty darn accurate.

Now imagine that someone builds a big corpus of URIs and starts to categorize them. Someone like, say, Google, or DMOZ. The text of each URI is used to create Bayesian classifiers. Google then provides a simple service that, given a URI, reads the text at the URI, performs classification (via a nested set of classifiers), marks up the contents with RDF, and returns the result. This amounts to a best-guess, dynamic assignment of semantics to that URI. The corpus can be tuned to produce better and better results, but in general the scheme should be pretty accurate.

Of course, Google already has a huge database of word/count/frequency information which can be used to seed this process, to considerable effect. The probability-based approach should yield substantially better results than keyword identification schemes.

RJ
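To make the idea concrete, here is a minimal sketch in Python of one way such a service could work: a naive Bayes classifier trained on categorized example text, applied to the text fetched from a URI, with the winning category returned as an RDF/XML statement. The category names, training snippets, and the ex:classifiedAs property are hypothetical placeholders for illustration, not an existing vocabulary or anyone's actual implementation.

    import math
    import re
    from collections import Counter, defaultdict
    from urllib.request import urlopen

    def tokenize(text):
        return re.findall(r"[a-z]+", text.lower())

    class NaiveBayes:
        def __init__(self):
            self.word_counts = defaultdict(Counter)  # category -> word frequencies
            self.doc_counts = Counter()              # category -> training doc count
            self.vocab = set()

        def train(self, category, text):
            words = tokenize(text)
            self.word_counts[category].update(words)
            self.doc_counts[category] += 1
            self.vocab.update(words)

        def classify(self, text):
            # Score each category by log P(category) + sum of log P(word|category),
            # with Laplace smoothing so unseen words don't zero out a category.
            total_docs = sum(self.doc_counts.values())
            best, best_score = None, float("-inf")
            for cat, counts in self.word_counts.items():
                score = math.log(self.doc_counts[cat] / total_docs)
                total_words = sum(counts.values())
                for word in tokenize(text):
                    score += math.log((counts[word] + 1) /
                                      (total_words + len(self.vocab)))
                if score > best_score:
                    best, best_score = cat, score
            return best

    def classify_uri(classifier, uri):
        # Fetch the page text (naively; a real service would strip HTML first)
        # and return an RDF/XML fragment asserting the best-guess category.
        text = urlopen(uri).read().decode("utf-8", errors="replace")
        category = classifier.classify(text)
        return (
            '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"\n'
            '         xmlns:ex="http://example.org/categories#">\n'
            '  <rdf:Description rdf:about="%s">\n'
            '    <ex:classifiedAs rdf:resource="http://example.org/categories#%s"/>\n'
            '  </rdf:Description>\n'
            '</rdf:RDF>' % (uri, category)
        )

Used like so, with made-up training data standing in for the big categorized corpus:

    nb = NaiveBayes()
    nb.train("recipes", "flour sugar butter oven bake whisk dough")
    nb.train("astronomy", "telescope orbit galaxy star planet nebula")
    print(classify_uri(nb, "http://example.org/some-page"))

The "nested set of classifiers" mentioned above would just be this run hierarchically: classify at the top level first, then re-classify within the winning category's subcategories.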
Received on Thursday, 31 July 2003 15:24:46 UTC