Re: SemWeb Non-Starter -- Distributed URI Discovery from Stephen Rhoads on 2005-03-22 (semantic-web@w3.org from March 2005)

From: Stephen Rhoads <rhoadsnyc@mac.com>
Date: Tue, 22 Mar 2005 18:18:22 -0500
To: semantic-web@w3.org
Message-ID: <4114100.1111533502153.JavaMail.rhoadsnyc@mac.com>

On Sunday, March 20, 2005, at 11:41PM, Charles McCathieNevile <charles@sidar.org> wrote:

> So you not only want to query what the owner of a URI says about it, but   
> also what others have said about it (IMHO a more important query in   
> general). And then you want to be able to find contradictions and use them   
> to decide which sets of statements are more reliable in some other case,   
> or apply what you have learned elsewhere to resolving conflicts in   
> statements you have gathered.

Yes, but how can *anyone* make statements about a URI in the first place until they know what thing or concept it is intended to represent?

I think we need a new term here, say, a "URI Originator".  "URI Owner" gives the impression that the person or organization that minted the URI somehow "owns" the URI and thus is the only entity entitled to make statements about it.  We’re all aware that one of the empowerments of the Semantic Web is that "anyone can say anything about anything".  That's not in dispute.  But there has got to be a way to reliably query the Originator of a URI in the first place in order to determine what it *is* and what the Originator has to say about it.

So, with that settled, let’s restate the problem:

At present, there is no formal, generalized mechanism whereby a Web Agent, upon discovery of a URI, and lacking knowledge about that URI, can query the Originator of the URI in order to obtain an RDF description of the URI.

Let’s take an example from my domain of discourse:

Internet radio and television broadcasters will use the Digital Media Metadata Ontology to publish information about their radio and television channels.  That data, in turn, will be collected by aggregators and served up to the public via Web interfaces.

The ontological definition of a Channel calls for it to have Timeslots which in turn have Content.

A “Day 1” implementation will probably see the Broadcaster creating a “Local Definition” of some Content because the SemWeb is not yet widely deployed and the Content in question probably has not been defined elsewhere on the Web:

<dmmo:TelevisionChannel rdf:about="http://www.hbo.com/rdf/hbo">
   <dmmo:hasTimeslot>
      <dmmo:Timeslot>
         ...
         <dmmo:hasContent>
            <dmmo:Movie>
               <dmmo:hasTitle>Meet the Parents</dmmo:hasTitle>
               <dmmo:hasGenre>
                  <dmgo:Comedy/>
               </dmmo:hasGenre>
            </dmmo:Movie>
         </dmmo:hasContent>
      </dmmo:Timeslot>
   </dmmo:hasTimeslot>
</dmmo:TelevisionChannel>

Further down the road, we’ll probably see Content Owners providing RDF Descriptions of their Content and a Broadcaster can simply reference the relevant URI:

<dmmo:TelevisionChannel rdf:about="http://www.hbo.com/rdf/hbo">
   <dmmo:hasTimeslot>
      <dmmo:Timeslot>
         ...
         <dmmo:hasContent rdf:resource="http://www.dreamworks.com/rdf/parents"/>
      </dmmo:Timeslot>
   </dmmo:hasTimeslot>
</dmmo:TelevisionChannel>

So, as an aggregator, how can I easily and reliably find out more information about “http://www.dreamworks.com/rdf/parents”?  Is it a Movie?  A TelevisionProgram?  An Episode of a TelevisionSeries?  What’s its title, genre, synopsis?  Let’s face it: if I’m a programmer today and I need to obtain an RDF description from the Originator of a URI, my thought process goes something like this:

“Ooh, this URI looks interesting, but I don’t know anything about it.  Hopefully the Originator put up an RDF description somewhere.  Let’s try an HTTP GET on the URI.  Ouch, HTML.  Didn’t want that.  How about with ‘Accept: application/rdf+xml’?  No dice.  OK, let’s try an HTTP GET on the URI minus the ending hash or slash.  No?  OK, let’s try that with ‘Accept: application/rdf+xml’.  Still no dice.  Maybe they put up an ‘index.rdf’ at the root of the domain.  Nope.  Fine.  I’ll spider their entire domain, starting at the root and following HTML links in header and body.  Gee, I hope someone wrote that code already because, if they didn’t, I’ll have to write it, and it will be one more obstacle to getting this killer SemWeb application out the door sometime this decade.”
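
To make that guessing game concrete, here is a rough Python sketch of the attempts it describes, listed in order.  The function name discovery_attempts is made up for illustration, and I am assuming that "minus the ending hash or slash" means dropping everything from the "#" onward and any trailing "/".  It only builds the list of requests; it does not perform them, and the final "spider the whole domain" fallback is left out.

```python
from urllib.parse import urlsplit, urlunsplit

RDF_XML = "application/rdf+xml"


def discovery_attempts(uri):
    """Return (url, accept_header) pairs to try, in the order described above.

    An accept_header of None means a plain GET with no Accept preference.
    """
    # 1. Plain GET on the URI, then the same GET asking for RDF/XML.
    attempts = [(uri, None), (uri, RDF_XML)]

    # 2. The URI minus its hash part and trailing slash (assumed reading).
    stripped = uri.split("#", 1)[0].rstrip("/")
    if stripped != uri:
        attempts += [(stripped, None), (stripped, RDF_XML)]

    # 3. A guessed "index.rdf" at the root of the Originator's domain.
    parts = urlsplit(uri)
    index_url = urlunsplit((parts.scheme, parts.netloc, "/index.rdf", "", ""))
    attempts.append((index_url, None))

    return attempts


if __name__ == "__main__":
    for url, accept in discovery_attempts("http://www.dreamworks.com/rdf/parents"):
        print(url, accept)
```

Every entry in that list is a guess, which is the point: nothing tells the Agent which attempt, if any, the Originator actually supports.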

Further, if we're going to empower the SemWeb to take off the way the Web did, it's got to be simple enough for people with limited technical ability to grasp and take part in.  That's why I’m beginning to see the benefits of the document-centric hash approach.  Any ol' cat with a text editor and a copy of "RDF for Dummies" can toss some RDF statements into a file.  The base of the document plus fragID serves as a URI for each resource defined within *and* gives a Web Agent that discovers a reference to the URI a good hint about where to find an RDF description of it ... in the document.
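
As a small illustration of why the hash approach gives the Agent that hint, here is a Python sketch that splits hash URIs into the document to fetch and the fragID to look for inside it.  The "films.rdf" document and its fragIDs are hypothetical; nobody has published them.

```python
from collections import defaultdict


def split_hash_uri(uri):
    """Return (document URL, fragment ID) for a hash URI."""
    document, _, frag_id = uri.partition("#")
    return document, frag_id


def group_by_document(uris):
    """Map each document URL to the fragment IDs referenced within it."""
    groups = defaultdict(list)
    for uri in uris:
        document, frag_id = split_hash_uri(uri)
        groups[document].append(frag_id)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical hash URIs that share one RDF document.
    uris = [
        "http://www.dreamworks.com/rdf/films.rdf#parents",
        "http://www.dreamworks.com/rdf/films.rdf#shrek",
    ]
    print(group_by_document(uris))
```

One GET on the document then covers every resource defined in it, with no guessing about where the description lives.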

Or maybe all we need is some sort of convention akin to “index.html” or “robots.txt”.  Toss your RDF statements into a file named “index.rdf” at the root of your domain so that others can easily and reliably discover information about the URIs of which you are the Originator.
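
If such a convention existed, the aggregator's side could be about this simple.  The Python sketch below looks up a URI in the text of an "index.rdf" using only the standard library.  Everything in the sample is assumed for illustration: the file contents, the placeholder dmmo namespace (http://example.org/dmmo#), and the describe function itself.  It only handles top-level typed nodes with rdf:about, so a real application would want a proper RDF parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical contents of http://www.dreamworks.com/index.rdf.
# The dmmo namespace URI is a placeholder, not the real one.
SAMPLE_INDEX_RDF = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dmmo="http://example.org/dmmo#">
  <dmmo:Movie rdf:about="http://www.dreamworks.com/rdf/parents">
    <dmmo:hasTitle>Meet the Parents</dmmo:hasTitle>
  </dmmo:Movie>
</rdf:RDF>"""


def describe(index_rdf, uri):
    """Return (type tag, {property tag: text}) for uri, or None if absent."""
    root = ET.fromstring(index_rdf)
    for node in root:
        # ElementTree spells namespaced names as "{namespace}local".
        if node.get(f"{{{RDF_NS}}}about") == uri:
            props = {child.tag: (child.text or "").strip() for child in node}
            return node.tag, props
    return None


if __name__ == "__main__":
    print(describe(SAMPLE_INDEX_RDF, "http://www.dreamworks.com/rdf/parents"))
```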
Received on Tuesday, 22 March 2005 23:18:24 UTC