- From: Tim Finin <finin@cs.umbc.edu>
- Date: Mon, 29 May 2006 13:23:48 -0400
- To: public-rdf-in-xhtml task force <public-rdf-in-xhtml-tf@w3.org>
- CC: Tim Finin <finin@cs.umbc.edu>, Li Ding <dingli1@cs.umbc.edu>
We'd like to extend our Swoogle semantic web search engine [1] to find and index content encoded in RDFa. Swoogle's database currently has extensive metadata on about 1M RDF documents and 350K HTML documents with embedded RDF. If we can develop an effective way to discover XHTML documents with RDFa content, Swoogle could be used to track RDFa's adoption: who is using it and how it is being used.

Our problem is how to find pages likely to have RDFa content. Swoogle doesn't exhaustively crawl the Web for documents with semantic web content but instead uses an adaptive hybrid strategy [2] that starts with conventional web search engines to discover initial seed documents. The basic approach is to (1) use Google to find initial seed documents; (2) drill down with subsequent site-specific queries to find more; (3) employ a focused HTML crawler to discover yet more; and (4) use an RDF scutter to discover still more.

My question is: are there Google queries that will be useful for finding XHTML documents with RDFa content? For example, a Google query like 'rdf -rss filetype:rdf' produces lots of RDF documents. I tried searches like '"rel=" "html xmlns:"', but virtually all of the documents found use the rel attribute in its conventional ways.

If anyone has suggestions for search engine queries that might be good at finding RDFa content, please let me know. If there aren't any, maybe it would be good to develop a convention by which an XHTML document can assert that it has RDFa content and to encourage its use as a best practice.

Tim

[1] http://swoogle.umbc.edu/
[2] http://ebiquity.umbc.edu/paper/html/id/304/Search-Engines-for-Semantic-Web-Knowledge

--
Tim Finin, Computer Science & Electrical Engineering, Univ of Maryland Baltimore County, 1000 Hilltop Cir, Baltimore MD 21250.
finin@umbc.edu http://ebiquity.umbc.edu 410-455-3522 fax:-3969 http://umbc.edu/~finin
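
A search query alone seems unlikely to separate RDFa pages from ordinary uses of rel, so candidate pages would probably have to be filtered after they are fetched. The following is only a minimal sketch of such a filter, not anything Swoogle is known to do. The attribute list, the requirement of an xmlns: prefix declaration, and the names RDFA_HINT_ATTRS, RdfaHintParser and looks_like_rdfa are all assumptions made for illustration; rel is deliberately ignored because, as noted above, it is used heavily in ordinary HTML.

```python
from html.parser import HTMLParser

# Assumption: attribute names treated as hints of RDFa markup.
# `rel` and `href` are left out because plain HTML uses them constantly.
RDFA_HINT_ATTRS = {"about", "property", "datatype", "resource", "typeof"}


class RdfaHintParser(HTMLParser):
    """Counts attributes that suggest RDFa markup (a heuristic, not a parser for RDFa)."""

    def __init__(self):
        super().__init__()
        self.hint_count = 0
        self.has_prefix_declaration = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases attribute names before calling this method.
        for name, _value in attrs:
            if name in RDFA_HINT_ATTRS:
                self.hint_count += 1
            elif name.startswith("xmlns:"):
                # A namespace prefix such as xmlns:dc makes CURIE-style values plausible.
                self.has_prefix_declaration = True


def looks_like_rdfa(xhtml_text, min_hints=1):
    """Return True if the page declares a namespace prefix and uses hint attributes."""
    parser = RdfaHintParser()
    parser.feed(xhtml_text)
    parser.close()
    return parser.has_prefix_declaration and parser.hint_count >= min_hints


if __name__ == "__main__":
    sample = (
        '<html xmlns="http://www.w3.org/1999/xhtml" '
        'xmlns:dc="http://purl.org/dc/elements/1.1/">'
        '<body><span about="#me" property="dc:creator">Tim</span></body></html>'
    )
    print(looks_like_rdfa(sample))
```

A check like this would produce false positives and false negatives, so it could at best rank candidates for a real RDFa extractor to confirm.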
Received on Monday, 29 May 2006 17:24:01 UTC