Re: Finding RDFa content on the web

We could propose a 'this page uses RDFa' button or, maybe better, the use
of a <link>, something like:

	<link rel="profile" href="http://www.w3.org/2006/rdfa" />
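
For illustration, a crawler could test for such a link roughly as follows.
This is only a sketch in Python, assuming the convention proposed above were
adopted; the names ProfileLinkFinder and declares_rdfa are made up for the
example and are not part of RDFa or of Swoogle.

```python
from html.parser import HTMLParser

# Profile URI taken from the <link> example above; the convention itself
# is only a proposal, not an established practice.
RDFA_PROFILE = "http://www.w3.org/2006/rdfa"


class ProfileLinkFinder(HTMLParser):
    """Hypothetical helper: records whether a matching <link> was seen."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <link ... />.
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values; compare case-insensitively.
        rels = (attributes.get("rel") or "").lower().split()
        if "profile" in rels and attributes.get("href") == RDFA_PROFILE:
            self.found = True


def declares_rdfa(markup: str) -> bool:
    """Return True if the markup carries the proposed RDFa profile link."""
    finder = ProfileLinkFinder()
    finder.feed(markup)
    finder.close()
    return finder.found
```

A page that carries the link is then cheap to recognise without parsing any
of its RDFa attributes, which is the point of having a declared marker.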

Steven Pemberton

On Mon, 29 May 2006 19:23:48 +0200, Tim Finin <finin@cs.umbc.edu> wrote:

>
> We'd like to extend our Swoogle semantic web search engine
> [1] to find and index content encoded in RDFa.  Swoogle's
> database currently has extensive metadata on about 1M RDF
> documents and 350K HTML documents with embedded RDF.
>
> If we can develop an effective way to discover XHTML
> documents with RDFa content, Swoogle could be used to track
> and monitor RDFa's adoption, who is using it and how it's
> being used.
>
> Our problem is how to find pages likely to have RDFa
> content.  Swoogle doesn't exhaustively crawl the Web for
> documents with semantic web content but instead uses an
> adaptive hybrid strategy [2] that starts with conventional
> web search engines to discover initial seed documents.
>
> The basic approach is to (1) use Google to find initial seed
> documents; (2) drill down with subsequent site-specific
> queries to find more; (3) employ a focused HTML crawler to
> discover yet more; and (4) use an RDF scutter to discover
> still more.
>
> My question is, are there Google queries that will be useful
> for finding XHTML documents with RDFa content?  For example,
> a Google query like 'rdf -rss filetype:rdf' produces lots of
> RDF documents. I tried searches like '"rel=" "html xmlns:"'
> but virtually all of the documents found use the rel
> attribute in conventional ways.
>
> If anyone has suggestions for search engine queries that
> might be good at finding RDFa content, please let me know.
> If there aren't any, maybe it would be good to develop a
> convention by which an XHTML document can assert that it has
> RDFa content and to encourage its use as a best practice.
>
> Tim
>
> [1] http://swoogle.umbc.edu/
> [2] http://ebiquity.umbc.edu/paper/html/id/304/Search-Engines-for-Semantic-Web-Knowledge
>
>

Received on Monday, 29 May 2006 19:54:14 UTC