HTTP, URIrefs and resources not "on the web" (was Re: [rdfmsQnameUriMapping-6] Algorithm for creating a URI from a QName in RDF Model?)

On 2002-05-23 20:46, "ext Graham Klyne" <GK@ninebynine.org> wrote:

> At 12:58 PM 5/23/02 -0400, Mark Baker wrote:
>> On Thu, May 23, 2002 at 04:47:00PM +0100, Graham Klyne wrote:
>>> At 03:43 PM 5/23/02 +0100, Sean B. Palmer wrote:
>>>> The RDF Core WG would certainly want the SW and WWW to be
>>>> interoperable, and yet after repeated debates spurred mainly by Aaron,
>>>> fragIDs in RDF haven't been deprecated. That speaks volumes.
>>> 
>>> As one who used to think that fragids in RDF were broken...
>>> 
>>> I've been thinking about this point some, and I'm coming round to a view
>>> that fragid's are not only OK with RDF, but their use is to be preferred
>>> for many RDF resources, and that the SW/WWW integration can work just
>>> fine.  I've not yet had time to sit down and straighten my thoughts ... too
>>> many other things to do!
>> 
>> Process foul! 8-)  You can't do that.  We need reasons, damnit!
> 
> Er, you're right.  This will be very sketchy:
> 
> 1. The interpretation of a fragment identifier depends on the MIME type of
> the representation it's applied to.
> 
> 2. URIs without fragment identifiers are generally presumed to map to some
> resource for which a Web representation (or several) can be retrieved.
> 
> 3. RDF uses URI-references to denote things that aren't necessarily
> web-retrievable.
> 
> I think so far is pretty standard stuff.
> 
> The difficulty with someurl#frag in RDF arises when you say that this is
> interpreted by:
> (a) dereferencing 'someurl'.
> (b) interpreting #frag according to what you get back.
> This doesn't work well for RDF, because different MIME types can be
> returned, with different interpretations of the fragment identifier, where
> RDF requires that a URI ref have just one denotation under any given
> interpretation.
> 
> So my approach for interpreting someurl#frag (and this is largely inspired
> by comments from TimBL and Pat Hayes, though any errors are of course all
> mine) is this:
> 
> (A) *assume* that 'someurl' indicates a resource which has an RDF
> representation.  (If it's not dereferencable as such on the web, so be it,
> but I must assume its notional existence)
> 
> (B) when used in an rdf document, 'someurl#frag' means the thing that is
> indicated, according to the rules of application/rdf+xml mime type as a
> "fragment" or "view" of the RDF document at 'someurl'.  If the document
> doesn't exist, or can't be retrieved, then exactly what that view may be is
> somewhat undetermined, but that doesn't stop us from using RDF to say
> things about it.
> 
> (C) the RDF interpretation of a fragment identifier allows it to indicate a
> thing that is entirely external to the document, or even to the "shared
> information space" known as the Web.  That is, it can be an abstract idea,
> like my cat or DanC's car.
> 
> (D) So any RDF document acts as an intermediary between web retrieval
> documents (itself, at least, and also any other web-retrievable URIs that
> it may use, including schema and references to other RDF documents) and
> some set of abstract or non-Web entities that it may describe.
> 
> That's it.  I think it's consistent with all the conventional web axioms,
> but it also provides an handling of URIrefs and their denotation that is
> consistent with the RDF model theory and usage.  The "stretch", if there is
> one, is that it somewhat extends the idea of a "fragment" or "view" beyond
> the conventional idea that it's a physical part of a containing document.
> 
> If you accept this, then it becomes natural to take a view that URIs
> without fragment identifiers _should_ be reserved for indicating
> web-retrievable resources (when used in RDF), which is something TimBL has
> promoted.  This goes against quite a lot of actual RDF usage (mine
> included) so I don't think we can be too strict about that, but it seems a
> reasonable principle to aim for.
> 
> It also suggests a possible answer to the question about the web and
> URIs.  It is sometimes claimed that to be on the web means to have a
> URI.  So are people and cats and dogs and cars "on the web"?  If I clarify
> the definition of "on the web" to not include things that have URI
> references, then the answer to that question can be "no".  But using RDF,
> we are still free to talk about these things without actually having to
> claim that they are "on the web", by using URI-references rather than "1st
> class" URIs.

All in all I can accept this point of view as reasonable and workable,
with two exceptions or caveats (and I appreciate that your comments
were offered off-the-cuff and quickly -- so feel free not to respond
if any of the following is off the mark from your actual views):

1. I wouldn't presume to require that, for every uriref someuri#frag
used to denote a resource in RDF, someuri resolve to a representation
of an RDF instance. The real requirement is simply that it consistently
resolve to an instance of the same MIME type, so that the fragment
identifier has a consistent interpretation in all cases.
Yes, that's more difficult to determine/ensure, but that's
really what the true requirement distills down to, I think.
 
2. I'm not comfortable with the very last comment, which seems to suggest
that "1st class" URIs would not be used to denote things which are not
"on the web". Whether you have foo://bar#cat or foo://bar/cat in no
way determines whether the thing is "on the web" and a representation
of it is obtainable. This is perhaps the primary point of friction
between the needs of "traditional" web applications, which are concerned
with stuff that is web accessible, and newer semantic web applications
which, in addition to being concerned with stuff that is web accessible,
are also concerned with a lot of stuff that is not web accessible, either
because it's not digital, or because it is abstract.
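
To make caveat 1 concrete, here is a minimal sketch (not a definitive
implementation) of the check I have in mind: split the uriref into its
URI and fragment parts, then ask whether the representations observed
for the URI all share one MIME type. The function names and the
example.org URI are hypothetical, chosen only for illustration, and how
the Content-Type values are collected is left out.

```python
from urllib.parse import urldefrag


def split_uriref(uriref):
    """Split a URI reference into (URI without fragment, fragment identifier)."""
    base, fragment = urldefrag(uriref)
    return base, fragment


def has_consistent_media_type(observed_content_types):
    """Return True if all observed Content-Type values name one media type.

    Parameters such as charset are ignored. An empty observation list
    gives no evidence of consistency, so it returns False.
    """
    media_types = {
        value.split(";", 1)[0].strip().lower()
        for value in observed_content_types
    }
    return len(media_types) == 1


if __name__ == "__main__":
    # Hypothetical uriref and observations, for illustration only.
    print(split_uriref("http://example.org/doc#cat"))
    print(has_consistent_media_type(["application/rdf+xml", "text/html"]))
```

If the check fails, the fragment identifier has no single interpretation,
which is exactly the case that is hard to rule out in practice.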

The question about whether a thing is "on the web" (has an accessible
representation) or not "on the web" and whether that distinction can
be determined from the URI or URIref itself is, I think, pivotal,
and one that needs more attention and hopefully some resolution in
the not so distant future.

The present web architecture, insofar as I can see, does not provide
a clear and consistent answer to this. A 404 error seems the closest
we can get, but that doesn't really tell us whether the resource is
not "on the web" versus "on the web" but not presently accessible.

There seem to be two approaches to making this distinction explicit:

1. On a per-instance basis, by defining in some manner metadata about
   the resource denoted by the URIref which clarifies whether it is
   web accessible or not

2. On a per-class basis, by defining for the URI scheme or URI class
   whether instances of that scheme or class denote resources which are
   or are not on the web (e.g. [1], [2])

Both have advantages: the former in terms of flexibility, the latter
in terms of economy.

Rather than trying to make it an either-or choice which is unlikely
to be resolved by any amount of discussion or debate, perhaps we
should provide for both.

In conjunction with specific URI schemes or classes which provide as
part of their semantics whether the resources they denote are or
are not "on the web", we could also define a new set of HTTP response
codes, e.g. 6xx, which indicate that the resource denoted by the
dereferenced URI is not web-accessible. The particular code would
indicate the nature of the actual response, which could be various
degrees and/or types of metadata known about the resource, e.g.

   600  No further information available about resource
   601  Summary of information known about resource (RDF encoded)
   602  Listing of servers hosting information about resource (RDF encoded)
   ...

etc.

Thus, a 4xx response truly means that the resource is known or
presumed to be web accessible, and the server failed to provide
a representation for it -- whereas a 6xx response makes it clear
that one cannot obtain any representation for the resource in
question (at least insofar as the particular server is concerned).
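
As a rough sketch of how a client might act on this distinction: the
6xx codes below are only the ones proposed in this message and are not
part of HTTP, and the category labels and function name are my own
hypothetical choices.

```python
# Proposed (non-standard) 6xx codes from this message.
PROPOSED_6XX = {
    600: "No further information available about resource",
    601: "Summary of information known about resource (RDF encoded)",
    602: "Listing of servers hosting information about resource (RDF encoded)",
}


def classify_status(status):
    """Map a status code to the accessibility reading suggested above."""
    if 600 <= status <= 699:
        # The server asserts that no representation can be obtained from it.
        return "not-web-accessible"
    if 400 <= status <= 499:
        # Known or presumed web accessible, but no representation was provided.
        return "web-accessible-but-unavailable"
    if 200 <= status <= 299:
        return "representation-returned"
    return "other"


if __name__ == "__main__":
    for code in (200, 404, 600, 601):
        print(code, classify_status(code), PROPOSED_6XX.get(code, ""))
```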

In addition to the above, one could add an HTTP method such as "INFO"
which, even for web-accessible resources, would force a 6xx response
from the server, enabling one to obtain knowledge about any arbitrary
resource whether it was web accessible or not. Of course, the
"HEAD" method theoretically could be used, but it (a) would still
perhaps confuse web accessible versus non accessible resources
and (b) would not provide for the richness of RDF for capturing
the knowledge associated with a resource.
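
A toy sketch of what such an exchange might look like, using Python's
standard http.server and http.client. Everything here is an assumption
for illustration: "INFO" and the 600/601 codes are the proposal above
and not standard HTTP, and the KNOWN table, its "/cat" entry, the RDF
snippet and the reason phrase are hypothetical.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical per-instance knowledge held by the server, keyed by path.
KNOWN = {
    "/cat": '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>',
}


class InfoHandler(BaseHTTPRequestHandler):
    def do_INFO(self):
        """Always answer INFO with a proposed 6xx code, never a representation."""
        summary = KNOWN.get(self.path)
        if summary is None:
            self._reply(600, b"")  # nothing known about the resource
        else:
            self._reply(601, summary.encode("utf-8"))  # RDF-encoded summary

    def _reply(self, code, body):
        self.send_response(code, "Not Web Accessible")
        self.send_header("Content-Type", "application/rdf+xml")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet


def info_request(host, port, path):
    """Send an INFO request and return (status, body text)."""
    conn = HTTPConnection(host, port, timeout=5)
    try:
        conn.request("INFO", path)
        resp = conn.getresponse()
        return resp.status, resp.read().decode("utf-8")
    finally:
        conn.close()


def demo():
    """Start a local server on a free port and query a known and an unknown path."""
    server = HTTPServer(("127.0.0.1", 0), InfoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return (
            info_request("127.0.0.1", port, "/cat"),
            info_request("127.0.0.1", port, "/unknown"),
        )
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```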

The benefit of this particular approach is that, based on specific
classes and schemes of URIs, or based on per-instance knowledge,
a server can respond usefully to attempts to dereference a URI
which denotes a resource which is not web-accessible, and then
notify a client accurately of the nature of the resource, providing
useful information to the client about the resource or where such
information could be obtained.

Thus, whether each URI is qualified individually as to its nature
of accessibility (which would be required for e.g. http: URIs
denoting non web accessible resources) or whether qualified by
URI scheme or URI class [1], [2] would be up to the creator of
the URI and boils down to a simple matter of flexibility versus
economy. The web architecture itself would remain agnostic about
it, but still provide that critical distinction regarding
accessibility required for the next generation of semantic web
agents.
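
One way to picture the two approaches working side by side is a lookup
that consults per-instance knowledge first and falls back to a
per-scheme default. This is only a sketch under assumptions: both
tables, the scheme entries and the example URIs are hypothetical and
stand in for whatever registry or metadata a real agent would use.

```python
from urllib.parse import urlsplit

# Hypothetical per-class knowledge: does a scheme denote web-accessible resources?
SCHEME_DEFAULTS = {
    "http": True,
    "voc": False,  # assumed here to denote non-accessible vocabulary terms
}

# Hypothetical per-instance knowledge, which overrides the scheme default.
INSTANCE_OVERRIDES = {
    "http://example.org/people/patrick": False,  # an http: URI denoting a person
}


def is_web_accessible(uri):
    """Return True, False, or None when nothing is known about the URI."""
    if uri in INSTANCE_OVERRIDES:
        return INSTANCE_OVERRIDES[uri]
    scheme = urlsplit(uri).scheme.lower()
    return SCHEME_DEFAULTS.get(scheme)


if __name__ == "__main__":
    for uri in (
        "http://example.org/doc",
        "http://example.org/people/patrick",
        "voc://example.org/cat",
        "foo://bar/cat",
    ):
        print(uri, is_web_accessible(uri))
```

The per-scheme table is the economical route; the override table is the
flexible one, and is what an http: URI for a non-accessible resource
would need.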

Cheers,

Patrick

[1] http://ietf.org/internet-drafts/draft-pstickler-voc-01.txt
[2] http://ietf.org/internet-drafts/draft-pstickler-uri-taxonomy-00.txt


--
               
Patrick Stickler              Phone: +358 50 483 9453
Senior Research Scientist     Fax:   +358 7180 35409
Nokia Research Center         Email: patrick.stickler@nokia.com

Received on Monday, 27 May 2002 04:38:52 UTC