Semantic Web Requirements (for URIs) from Sandro Hawke on 2003-01-24 (www-tag@w3.org from January 2003)

From: Sandro Hawke <sandro@w3.org>
Date: Fri, 24 Jan 2003 01:55:04 -0500
To: "Roy T. Fielding" <fielding@apache.org>, Tim Berners-Lee <timbl@w3.org>, www-tag@w3.org
Message-Id: <200301240655.h0O6t4f27068@wadimousa.hawke.org>

Maybe it'll help to revisit the old Semantic Web Requirements
Document.  Oops, it hasn't been written yet.  Oh well.  Let's see if
I can write the relevant parts of one on the fly...

1.  We need to be able to generate identification strings for 
    long-lived knowledge bases, and then do something like

        insert(kbIdent, formula)
        retract(kbIdent, formula)
        query(kbIdent, formula)

    from anywhere on the net and get roughly the same behavior with
    the same parameters.  (There are also lots of issues about
    connectivity, access control, what KR language to use for the
    formula, etc.)

    You can think of these as database operations if you prefer.

    This looks like HTTP and http: URIs if you squint a little.  For
    now, HTTP GET of an RDF/XML file is query-all, and HTTP PUT is
    retract-all followed by insert(PUT's parameter).  We can shuffle
    them together better with some more work.
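
As a rough illustration of requirement 1, the three operations and their
GET/PUT reading can be sketched with a toy in-memory store.  The class
name, the method names get/put, and the choice of a (subject, predicate,
object) tuple as the "formula" are all assumptions made for this sketch;
nothing here is a real protocol binding.

```python
from typing import Dict, Set, Tuple

# Assumption: a "formula" is a single subject/predicate/object triple.
Formula = Tuple[str, str, str]


class KnowledgeBaseStore:
    """Toy in-memory stand-in for KBs addressed by a kbIdent string."""

    def __init__(self) -> None:
        self._kbs: Dict[str, Set[Formula]] = {}

    def insert(self, kb_ident: str, formula: Formula) -> None:
        self._kbs.setdefault(kb_ident, set()).add(formula)

    def retract(self, kb_ident: str, formula: Formula) -> None:
        # Retracting something that is absent is a no-op.
        self._kbs.get(kb_ident, set()).discard(formula)

    def query(self, kb_ident: str, formula: Formula) -> bool:
        return formula in self._kbs.get(kb_ident, set())

    def get(self, kb_ident: str) -> Set[Formula]:
        """HTTP GET analogue: query-all (returns a copy)."""
        return set(self._kbs.get(kb_ident, set()))

    def put(self, kb_ident: str, formulas: Set[Formula]) -> None:
        """HTTP PUT analogue: retract-all, then insert the payload."""
        self._kbs[kb_ident] = set(formulas)
```

The point of the sketch is only that PUT replaces the whole KB while
insert/retract are incremental; connectivity and access control are
ignored entirely.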

2.  We need to be able to generate new unique strings, to use as
    constant terms/symbols in our formulas.  They have to be strings
    which have never been used before in one of these formulas,
    anywhere.  Each time we want to talk about something for which we
    don't already know a constant symbol, we'll probably want one of
    these.  (If we don't want other people to re-use the name we give
    it, we can assign a local-scope name (an existential variable,
    what RDF sometimes calls a blank node), but that's probably best
    avoided since linking is good.)

    This looks like UUIDs or tag: URIs.  You could use something else
    like http: URIs, but they give us lots of extra features and
    baggage which we don't need.
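
A minimal sketch of minting such symbols, assuming Python's standard
uuid module for the UUID case and the tag:authority,date:specific
layout for the tag: case.  The function names and the example authority
are hypothetical, and the tag: variant is only unique if the minter
never reuses the specific part.

```python
import uuid
from datetime import date


def new_uuid_symbol() -> str:
    """Mint a fresh constant symbol as a random UUID in URN form."""
    return uuid.uuid4().urn  # e.g. "urn:uuid:..."


def new_tag_symbol(authority: str, minted_on: date, specific: str) -> str:
    """Build a tag: URI; uniqueness depends on the minter's own bookkeeping."""
    return f"tag:{authority},{minted_on.isoformat()}:{specific}"
```

Neither kind of string says anything about where to look for more
information, which is exactly the property under discussion.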

That's where I was two years ago, and I was happy with it (except that
I couldn't get an RFC on tag: URIs published).  Then TimBL pointed out
that tag: URIs were not "clickable".  You couldn't get a
representation.  And I said "that's the point -- you don't need to get
a representation -- you're just trying to make a new logic symbol!"
But eventually I figured out that he wanted the strings from
requirement #2 to lead automatically to one of the KBs in requirement
#1.

So instead of saying "I'd like to buy widget-435353", we should always
say something like "I'd like to buy
widget-435353-which-you-can-learn-about-by-calling-1-800-BUY-WIDG."

Thus, a 3rd requirement:

3.  There should be a cheap and fast way to get from a constant symbol
    to the kbIdent of a KB which can give us some authoritative
    information about it.
    
    The cheapest/fastest way to do this is string manipulation.  TimBL
    proposes that URIs with a "#" in them be considered constant
    symbols and URIs without a "#" are kbIdents, and you get to the
    kbIdent from the constant symbol by truncating at the "#".  Fast
    and simple.
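
The truncation rule is small enough to show directly.  This is a sketch
with hypothetical function names and an example.org URI made up for
illustration:

```python
def is_constant_symbol(uri: str) -> bool:
    """Under the proposal, a URI containing '#' names a constant symbol."""
    return "#" in uri


def kb_ident_for(uri: str) -> str:
    """Truncate at the first '#'; a URI without '#' is already a kbIdent."""
    return uri.split("#", 1)[0]
```

For example, kb_ident_for("http://example.org/meeting7#item6") gives
"http://example.org/meeting7", with no network access or lookup table
involved.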

    Another approach is to use rdfs:isDefinedBy, giving a triple for
    each constant term, linking it to an authoritative kbIdent.  So
    you say

       <a> <b> <c>.

    and then, to let people know something about what those terms
    mean, you add

       <a> rdfs:isDefinedBy <http://....>.
       <b> rdfs:isDefinedBy <http://....>.
       <c> rdfs:isDefinedBy <http://....>.
 
    That has two problems.  First, the <http://....> terms look
    syntactically like constant symbols, not like kbIdents.  How do
    you know where THEY are defined?  That problem can be addressed by
    redefining rdfs:isDefinedBy to have a range of uri-string instead
    of resource.  That gives us:

       <a> <b> <c>.
       <a> rdfs:isDefinedBy "http://....".
       <b> rdfs:isDefinedBy "http://....".
       <c> rdfs:isDefinedBy "http://....".

    That works, but it feels a little as if web addresses were UUIDs,
    and along with each UUID you were given a separate address where
    you could get information about it.  It rather misses the point
    of URIs.
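
For comparison, the rdfs:isDefinedBy approach needs a scan over the
triples rather than string manipulation.  A sketch, assuming triples are
plain (subject, predicate, object) string tuples and the object of
rdfs:isDefinedBy is the kbIdent as a string; the function name is
hypothetical:

```python
from typing import List, Optional, Tuple

# The RDF Schema namespace plus the local name isDefinedBy.
RDFS_IS_DEFINED_BY = "http://www.w3.org/2000/01/rdf-schema#isDefinedBy"

Triple = Tuple[str, str, str]


def defining_kb(symbol: str, triples: List[Triple]) -> Optional[str]:
    """Return the first kbIdent string linked to symbol, or None."""
    for subject, predicate, obj in triples:
        if subject == symbol and predicate == RDFS_IS_DEFINED_BY:
            return obj
    return None
```

Every constant term costs an extra triple, and a term with no such
triple leads nowhere.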

So I was using Tim's approach for a while, but I started to want to
make the content *really* clickable.  If I give you a URI for an
action item you agreed to last week, you should be able to get some
readable information even without special RDF software.

4.   One should be able to use content negotiation to offer structured
     data in RDF *and* HTML at the same URI.   When you ask for HTML,
     you get some nice little HTML tables or diagrams saying the same
     thing as the RDF.   It's easy enough. 
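
A minimal sketch of the server-side choice, assuming the two
representations are labelled text/html and application/rdf+xml.  The
function name is hypothetical, and q-values in the Accept header are
deliberately ignored to keep it short, so this is not full HTTP content
negotiation.

```python
from typing import List

# Assumption: the server offers exactly these two representations.
OFFERED: List[str] = ["application/rdf+xml", "text/html"]


def choose_representation(accept_header: str) -> str:
    """Return the first offered media type listed in the Accept header.

    Parameters such as ';q=0.8' are stripped and ignored; anything the
    server does not recognise (including '*/*') falls back to HTML.
    """
    requested = [part.split(";")[0].strip() for part in accept_header.split(",")]
    for media_type in requested:
        if media_type in OFFERED:
            return media_type
    return "text/html"
```

A browser sending "text/html,application/xhtml+xml" gets the HTML
tables; an RDF client sending "application/rdf+xml" gets the RDF, both
from the same URI.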

     [ Why at the same URI, you ask?  Why not have the HTML at the
     more public URI, and have some hidden link inside the HTML
     pointing machines at the RDF?  Because it's really the constant
     symbol we're trying to follow; we're trying to get information
     about the action item you agreed to.  We just want one URI for
     that.  We don't want http://...//meeting7.html#item6 and
     http://...//meeting7.rdf#item6 !

     I can almost see a RESTful solution here, but not quite.  I end
     up with being unable to distinguish between the kbIdent and the
     constant symbol identifying the action item.   But I await
     suggestions here.... ]

     Anyway, there's a TAG finding which says I shouldn't serve HTML
     and RDF at the same URI -- and with good reason, because the
     fragment semantics are different.

     The only solution I've seen so far is mine: when you need a
     constant term, create a kbIdent for information about it, with it
     distinguished as the primary subject.  Now, use the same URI for
     the kbIdent and the constant term, but follow precise context
     rules so you always know which one you are talking about.   This
     meets requirements 1-4.   Does anything else?

     The overhead of creating a new kbIdent for each thing you want to
     identify with a constant term is insignificant if we reclaim the
     use of fragments.  Each thing mentioned in a big KB can have a
     fragment of the kb devoted to information about it.  This seems
     to match the use of anchors in some HTML, where each important
     concept or defined term in the document gets its own fragment.
     It also matches XPointer semantics, treating RDF/XML as XML, if
     you use rdf:ID where feasible instead of rdf:about, and think of
     rdf:ID as an XML id.  The semantics may not be perfect, but
     they're pretty darn close.

Justifying #1 would be an interesting exercise for another day.

    -- sandro
Received on Friday, 24 January 2003 01:57:31 UTC