Why URIs in RDF?

> I also have no idea what HTTP
> GET has to do with knowledge representation or RDF. 
> If you see an assertion about "http://www.25hoursaday.com", whether you
> consider it an assertion about me personally or the web page (i.e. the
> representation) seems completely orthogonal to what HTTP GET can or
> cannot retrieve. 
> Some enlightenment would be appreciated. 

Yeah, this is the other half of the issue.  On the one side, people
thinking about the HTTP protocol don't care about how URIs are used
outside of HTTP.  When they hear someone saying, "Please visit
http://mystore.example.com", it probably seems like sloppy language,
instead of a sign that the person is using an ontology where web sites
are locations.  [ I happen to think information-bearing locations
(like bulletin boards -- where has that analogy been used before?) are
part of a great model for the web. ]

On the other side, people using URIs as constant symbols in a KR
language like RDF don't care what the URIs mean.  Formal logic is all about
manipulating sentences without caring what the terms denote.  The RDF
entailments of "<http://mystore.example.com> <foo> <bar>" really have
nothing to do with the web.
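To make that concrete, here's a toy sketch (not real RDF machinery) of simple entailment over that very triple. Blank nodes act as existential variables, and a graph G entails a graph H when some mapping of H's blank nodes to terms of G makes every triple of H a triple of G. Note that nothing in the computation ever touches the network; the URI is just an opaque constant symbol:

```python
from itertools import product

def entails(g, h):
    """Brute-force RDF simple entailment: g entails h iff some mapping
    of h's blank nodes (terms starting with "_:") to terms of g turns
    every triple of h into a triple of g."""
    bnodes = sorted({t for triple in h for t in triple if t.startswith("_:")})
    terms = sorted({t for triple in g for t in triple})
    if not bnodes:
        return h <= g  # no existentials: plain subgraph check
    for image in product(terms, repeat=len(bnodes)):
        m = dict(zip(bnodes, image))
        if all(tuple(m.get(t, t) for t in triple) in g for triple in h):
            return True
    return False

g = {("<http://mystore.example.com>", "<foo>", "<bar>")}
h = {("_:x", "<foo>", "<bar>")}   # "something has <foo> <bar>"
print(entails(g, h))              # True -- computed entirely offline
```

The entailment holds whether the URI names a store, a web page, or a person; the reasoner never needs to know.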

But ... both HTTP and RDF use URIs!  Why is that?  We know why HTTP uses
URIs, but what purpose do URIs serve in RDF?

The RDF Concepts draft [1] says:

    The expressive power of RDF corresponds to the
    existential-conjunctive (EC) subset of first order logic [Sowa].

    Through its use of extensible URI-based vocabularies, RDF provides
    for expression of facts about arbitrary subjects; i.e. assertions
    of named properties about specific named things. A URI can be
    constructed for any thing that can be named, so RDF facts can be
    about any such things. 

That second paragraph is crucial if vague.  Try actually communicating
in first-order logic (let alone the EC subset) and you immediately
realize that you and your audience must share an interpretation (a
mapping of identifiers to the things they identify), or at least some
of it.  An automated reasoner doesn't need to know anything about your
interpretation, but your audience sure does.

In theory, we could address this with a central registry.  We could
set up the Semantic Web Public Terms Registry.  When you wanted to
talk about rain, you'd search the registry for a term denoting rain.
If you didn't find one, you could apply to have one added.  You could
pick any unused term and provide a definition for it, preferably in at
least three natural languages.  :-) The fee would be perhaps $25 to
cover processing costs; the term and definition would be added to the
database and published on the Web and monthly CD-ROMs.  Maybe bulk
registration would be $30 plus $1 per entry.  No editing would be
done, and there would be no checking to see that your definition was
even intelligible; the registry would simply maintain a many-to-one
mapping from term-strings to their definitional text.
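As a data structure, this imagined registry is nothing fancier than a dictionary. A toy sketch (the terms and definitions are made up, and no fee is actually collected):

```python
# Toy sketch of the imagined Semantic Web Public Terms Registry: a
# many-to-one mapping from term-strings to definitional text, with no
# editing and no sanity checking of the definitions.
registry = {}

def register(term, definition):
    if term in registry:
        raise ValueError("term already taken: " + term)
    registry[term] = definition  # that'll be $25, please

def lookup(term):
    return registry.get(term)

register("rain", "water condensed from atmospheric vapour, falling in drops")
# Many-to-one: nothing stops two terms from sharing one definition.
register("pluie", "water condensed from atmospheric vapour, falling in drops")
```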

And this would be enough.  Now you could talk, in FOL, about the rain
in Spain falling mostly on the plain, and your audience could
understand you.  The people would look in the registry and read the
definitions.  The software agents in the audience would understand
what you said if and only if they had been programmed with an
understanding of those exact terms.  That programming would be done by
humans who had looked in the registry.  (Some natural-language agents
might try to read the definitions; they'd be acting more like humans.
Whatever.  I'll believe that when I see it.)

But we don't have this registry.   Instead we use URIs.


How do you pick an unused term in URI space?  How do you share your
definition with the world?

Well, you pick a term by getting some web space.  You get your own
domain name, or use a web hosting service, or ask a friend for some of
her URI space.  Then you have all the terms you could ever want, for
under $100 per year.

And...  how do you share your definitions with the world?

Approach #1: You form a W3C Working Group to publish a Recommendation
which provides the definitions.  This is a rather expensive solution.
My quick calculation while warming some soup in the microwave puts the
cost of this approach at over $10,000 per term.

Approach #2: You put your definitional text on the web at the given
URI.  Approximate cost, from one datapoint [2], is $7.95/month for
however many terms you can define in 400 megabytes.

The difference in cost is, of course, due to the fact that the first
method requires extensive technical, editorial, and political review.
But there's no reason to make that review mandatory.  I certainly
don't want that kind of review when I'm assigning URIs to my pets.
(And I'm not sure I could afford it, even after selling the house.)

My points:
  1.  Producers and consumers of RDF information need to know what
      many of the URIs in a document denote, if useful communication
      is to occur.  They can learn that nicely by doing a web lookup
      on the URIs.

  2.  It is not practical to communicate meaning entirely via
      standards committees.   That just might be okay for ontologies,
      but certainly not for individual objects which you mention in
      instance data.  
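Point 1 works because an http URI carries its own lookup instructions. A small sketch, using Python's standard URL parser on made-up URIs: the http URI tells you exactly which server to ask and for what, while a tag URI deliberately tells you nothing about where to look:

```python
from urllib.parse import urlsplit

# An http URI names a thing AND says where to look it up.
u = urlsplit("http://mystore.example.com/terms#rain")
print(u.scheme, u.netloc, u.path, u.fragment)
# -> http mystore.example.com /terms rain
# i.e. everything needed to GET /terms from mystore.example.com
# and read whatever definition is published there.

# A tag URI (made-up example) names a thing and nothing more.
t = urlsplit("tag:someone@example.org,2003:rain")
print(t.scheme, repr(t.netloc))
# -> tag ''   (empty authority: no server to ask)
```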

Meanwhile, neither the RDF Core WG nor the WebOnt WG really follows
this argument, for a variety of reasons.  They (not me) are the ones
who speak for the W3C (when they reach Rec, at least), but I happen to
think they're too focused on the details and have forgotten the big
picture, presented above.  They seem to like an intermediate model
where you don't just http GET the URI to find out about it, you have
to follow rdfs:isDefinedBy and owl:imports links.  That works, but
it's silly extra overhead.  Done that way, there is zero advantage
[and a lot of confusion] to using http URIs instead of a scheme
explicitly devoid of resolution protocols, like UUIDs or tag URIs [3].
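The overhead is easy to see in a toy model, where the "web" is just a dict from URI to document (all URIs and documents below are invented for illustration, and the string "isDefinedBy: " stands in for an rdfs:isDefinedBy or owl:imports link). Under the direct model one GET reaches the definition; under the indirection model it costs an extra round trip:

```python
# Direct model: the definition lives at the term's own URI.
direct_web = {
    "http://example.org/ns#rain": "rain: water falling from the sky",
}

# Indirection model: the term's URI only points at where to look next.
indirect_web = {
    "http://example.org/ns#rain": "isDefinedBy: http://example.org/defs",
    "http://example.org/defs": "rain: water falling from the sky",
}

def find_definition(web, uri):
    """Follow isDefinedBy links; return (definition, number of GETs)."""
    doc = web[uri]          # one GET
    gets = 1
    while doc.startswith("isDefinedBy: "):
        doc = web[doc[len("isDefinedBy: "):]]   # another GET
        gets += 1
    return doc, gets

print(find_definition(direct_web, "http://example.org/ns#rain"))
# -> ('rain: water falling from the sky', 1)
print(find_definition(indirect_web, "http://example.org/ns#rain"))
# -> ('rain: water falling from the sky', 2)
```

Both models get you the same definition; the second just spends an extra GET, which is the "silly extra overhead" above.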

I hope that offers at least a little light....

   -- sandro

[1] http://www.w3.org/2001/sw/RDFCore/TR/WD-rdf-concepts-20030117/#section-SimpleFacts
[2] http://www.ipowerweb.com/
[3] http://www.taguri.org/ (named before the TAG came along)

Received on Tuesday, 14 January 2003 23:35:59 UTC