- From: Steven Pemberton <steven.pemberton@cwi.nl>
- Date: Mon, 29 May 2006 21:54:05 +0200
- To: "Tim Finin" <finin@cs.umbc.edu>, "public-rdf-in-xhtml task force" <public-rdf-in-xhtml-tf@w3.org>
- Cc: "Li Ding" <dingli1@cs.umbc.edu>
We could propose a 'this page uses RDFa' button, or maybe better, the use of a <link>, something like:

<link rel="profile" href="http://www.w3.org/2006/rdfa" />

Steven Pemberton

On Mon, 29 May 2006 19:23:48 +0200, Tim Finin <finin@cs.umbc.edu> wrote:

> We'd like to extend our Swoogle semantic web search engine
> [1] to find and index content encoded in RDFa. Swoogle's
> database currently has extensive metadata on about 1M RDF
> documents and 350K HTML documents with embedded RDF.
>
> If we can develop an effective way to discover XHTML
> documents with RDFa content, Swoogle could be used to track
> and monitor RDFa's adoption: who is using it and how it's
> being used.
>
> Our problem is how to find pages likely to have RDFa
> content. Swoogle doesn't exhaustively crawl the Web for
> documents with semantic web content but instead uses an
> adaptive hybrid strategy [2] that starts with conventional
> web search engines to discover initial seed documents.
>
> The basic approach is to (1) use Google to find initial seed
> documents; (2) drill down with subsequent site-specific
> queries to find more; (3) employ a focused HTML crawler to
> discover yet more; and (4) use an RDF scutter to discover
> still more.
>
> My question is, are there Google queries that will be useful
> for finding XHTML documents with RDFa content? For example,
> a Google query like 'rdf -rss filetype:rdf' produces lots of
> RDF documents. I tried searches like '"rel=" "html xmlns:"'
> but virtually all of the documents found use the rel
> attribute in conventional ways.
>
> If anyone has suggestions for search engine queries that
> might be good at finding RDFa content, please let me know.
> If there aren't any, maybe it would be good to develop a
> convention by which an XHTML document can assert that it has
> RDFa content and to encourage its use as a best practice.
>
> Tim
>
> [1] http://swoogle.umbc.edu/
> [2] http://ebiquity.umbc.edu/paper/html/id/304/Search-Engines-for-Semantic-Web-Knowledge
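
For illustration only, here is a rough sketch of how a crawler might test a fetched page for the <link> marker suggested in the reply at the top of this message. It assumes the marker is exactly the example given there; the profile URL was offered only as "something like", so it is hypothetical rather than an agreed convention. The names ProfileLinkFinder and declares_rdfa are invented for this sketch and are not part of Swoogle or any RDFa specification.

```python
from html.parser import HTMLParser

# Hypothetical marker URL, copied from the "something like" example
# in the reply; it is an assumption, not an established convention.
RDFA_PROFILE = "http://www.w3.org/2006/rdfa"


class ProfileLinkFinder(HTMLParser):
    """Records whether a <link rel="profile"> pointing at RDFA_PROFILE was seen."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us, and
        # routes self-closing tags (<link ... />) through this method too.
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel is a space-separated list of link types.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "profile" in rel_values and attributes.get("href") == RDFA_PROFILE:
            self.found = True


def declares_rdfa(markup: str) -> bool:
    """Return True if the markup carries the assumed RDFa marker link."""
    finder = ProfileLinkFinder()
    finder.feed(markup)
    finder.close()
    return finder.found


if __name__ == "__main__":
    page = (
        '<html xmlns="http://www.w3.org/1999/xhtml"><head>'
        '<link rel="profile" href="http://www.w3.org/2006/rdfa" />'
        "</head><body></body></html>"
    )
    print(declares_rdfa(page))
```

A check like this only helps once a page has already been fetched, so it would fit the focused-crawler step of the strategy described above rather than the initial search-engine queries.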
Received on Monday, 29 May 2006 19:54:14 UTC