RE: Survey of RDF data on the Web from Dan Brickley on 2002-08-19 (www-rdf-interest@w3.org from August 2002)

From: Dan Brickley <danbri@w3.org>
Date: Mon, 19 Aug 2002 09:23:02 -0400 (EDT)
To: Andreas Eberhart <andreas.eberhart@i-u.de>
cc: Danny Ayers <danny666@virgilio.it>, <www-rdf-interest@w3.org>
Message-ID: <Pine.LNX.4.30.0208190912210.9559-100000@tux.w3.org>

On Mon, 19 Aug 2002, Andreas Eberhart wrote:

>
>
> Hi Danny,
>
> > The paper (2.4) states that "RDF subjects, predicates and most objects are
> > URLs themselves..."  - errm, not!
> > However it's interesting that you did get a good number of links
> > using this
> > assumption.
>
> oops, you're right. It basically was a (desperate) attempt to find more RDF.

Have you looked at having your crawler traverse rdfs:seeAlso references?

Hunting for RDF in the general Web is like looking for the proverbial
'needle in a haystack'. If you start in a Web of interconnected RDF
documents, a large part of the discovery problem vanishes.

That's the approach I've been taking anyhow. See for example,
http://rdfweb.org/people/danbri/rdfweb/danbri-foaf.rdf which contains
markup such as... [[
...
 <knows>
  <Person foaf:name="Edd Dumbill" foaf:nick="edd">
   <rdfs:seeAlso web:resource="http://heddley.com/edd/foaf.rdf" />
   <foaf:mbox web:resource="mailto:edd@usefulinc.com" />
   <foaf:mbox web:resource="mailto:edd@xml.com" />
   <foaf:mbox web:resource="mailto:edd@xmlhack.com" />
   <foaf:homepage web:resource="http://heddley.com/edd/" />
  </Person>
 </knows>
]]

...and if you dereference http://heddley.com/edd/foaf.rdf you'll similarly
find more cross-references to other RDF documents in the Web (hence
the name, "RDFWeb").
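
The traversal described above can be sketched roughly as follows. This is
an illustration only, not code from the RDFWeb/FOAF testbed. It assumes the
"web:" prefix in the excerpt is bound to the standard RDF namespace (so the
attribute is effectively rdf:resource), that rdfs:seeAlso targets are
absolute URIs, and that the caller supplies a fetch(url) function returning
RDF/XML text or None. The names see_also_links, crawl and fetch are
hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import deque

# Standard namespace URIs for RDF and RDF Schema.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"


def see_also_links(rdf_xml):
    """Return the rdf:resource target of every rdfs:seeAlso element."""
    root = ET.fromstring(rdf_xml)
    links = []
    for elem in root.iter("{%s}seeAlso" % RDFS_NS):
        target = elem.get("{%s}resource" % RDF_NS)
        if target:
            links.append(target)
    return links


def crawl(start_url, fetch, limit=100):
    """Breadth-first walk over rdfs:seeAlso links.

    fetch(url) is supplied by the caller (assumption) and returns the
    document text, or None if the document cannot be retrieved.
    Returns the URLs that were fetched and parsed, in visit order.
    """
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < limit:
        url = queue.popleft()
        text = fetch(url)
        if text is None:
            continue  # unreachable document: skip it
        try:
            links = see_also_links(text)
        except ET.ParseError:
            continue  # not well-formed XML: skip it
        visited.append(url)
        for link in links:
            if link not in seen:  # avoid loops between cross-referencing files
                seen.add(link)
                queue.append(link)
    return visited
```

In practice fetch could wrap urllib.request.urlopen; keeping it as a
parameter lets the traversal be tried against a small in-memory set of
documents first.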

There are currently only a few hundred such documents in the
RDFWeb/FOAF testbed, but it's been enough to convince me that the general
approach has merit, and that a Web of small, independently maintained RDF
documents can be more useful than having a few huge KBs dumped out in
RDF/XML format.

I really don't think finding RDF will be a problem. The trick will be in
finding the _relevant_ RDF. For this, I think we need a combination of
using rdfs:seeAlso, of mentioning the type of the thing described in the
referenced document (e.g. Person, Company etc.), and of mentioning a type of
the referenced document (e.g. CV, Bibliography). Such hints make it easier
to build smarter crawlers that won't be overwhelmed by the -- as yet
nonexistent ;) -- mass of RDF data out there.
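
One way such hints might be used by a crawler is sketched below. It is
purely illustrative: the LinkHint record, its topic_type and doc_type
fields, and the relevant() function are hypothetical names, and the sketch
assumes the hints have already been extracted from the referring document.
It does not prescribe which RDF properties would carry them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LinkHint:
    """A seeAlso-style link plus optional hints (hypothetical structure)."""
    url: str
    topic_type: Optional[str] = None  # type of the thing described, e.g. "Person"
    doc_type: Optional[str] = None    # type of the document itself, e.g. "CV"


def relevant(links, want_topics=frozenset(), want_docs=frozenset()):
    """Keep only links whose hints match the requested types.

    An empty filter set means "no restriction". Links lacking a hint are
    dropped whenever the corresponding filter is set.
    """
    kept = []
    for link in links:
        if want_topics and link.topic_type not in want_topics:
            continue
        if want_docs and link.doc_type not in want_docs:
            continue
        kept.append(link)
    return kept
```

A crawler interested only in people would call
relevant(links, want_topics={"Person"}) before queueing anything, and so
never fetch documents hinted as describing something else.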

Dan

Received on Monday, 19 August 2002 09:23:06 UTC