RE: web proper names from Hamish Harvey on 2004-09-21 (www-rdf-interest@w3.org from September 2004)

From: Hamish Harvey <david.harvey@bristol.ac.uk>
Date: Tue, 21 Sep 2004 14:24:19 +0100
To: www-rdf-interest@w3.org
Message-Id: <1095773059.12871.204822408@webmail.messagingengine.com>
This debate has caused a decided shift in my understanding of this
issue. There follows a summary. Comments and flames are the very purpose
of posting. Am I missing the point? Making some rudimentary mistake?

Cheers,
Hamish

URIs can indicate any entity.

(URI qua symbol)s indicate resources.
(URI qua retrieval path)s describe a mechanism by which a representation
of the thing indicated by the (URI qua symbol) can be retrieved.

(URI qua symbol)s are opaque.
(URI qua retrieval path)s are not opaque, and are not symbols.

(URI qua character string)s are different things again; they are an
artefact of implementation, I suspect, and while obviously important in
that context, of no particular concern here.

"URI qua resource identifier" doesn't take us very far, as is made more
obvious by writing "URI qua URI". By definition, a URI has these two
aspects: it identifies resources which are inaccessible to machines; and
it describes a mechanism by which a representation of this resource can
be retrieved. The notion of "URI qua URI" therefore includes both of
these aspects, and doesn't help with the discussion.

(URI qua symbol)s suffer from the symbol grounding "problem" (which
isn't IMO a problem, just a fact to be aware of). Adrian's "solution" is
to attempt a formalisation of Newell's observation that symbol
grounding in knowledge representation languages is parasitic on natural
language (or on the conceptual structures expressed using natural
language). As Jon suggested, the NL<-->KRL transformation process is an
interesting area of ongoing research, but it doesn't help with symbol
grounding; it just draws all of the problems of natural language into a
situation where the whole point of defining a formal language is to
escape from them to some degree.

Ignoring for the time being the complication of content negotiation, and
assuming helpful term coiners who place a human-readable description of
a term to be retrieved by GETting the (URI qua retrieval path)
corresponding to the (URI qua symbol) of the term, we have this:

_:representation ex:representationOf
<http://www.paris.org/Monuments/Eiffel> .

This doesn't get us very far in terms of machine processing, though. Add

_:representation ex:resultOfDereferencing
<http://www.paris.org/Monuments/Eiffel> .

but this says something like, 

"[the thing indicated by the blank node _:representation] 
[is related by the relationship indicated by the (URI qua symbol)
ex:resultOfDereferencing]
[to the thing *indicated by* the (URI qua symbol)
<http://www.paris.org/Monuments/Eiffel>]".
 
In order to avoid this unfortunate state of affairs, you surely need to
step down to using a literal: a thing which indicates itself. In this
case, you need a (URI qua retrieval path) literal, rather than a (URI
qua symbol) which indicates some inaccessible entity.

_:representation ex:resultOfDereferencing
"http://www.paris.org/Monuments/Eiffel"^^xsd:anyURI .

and you can start to make assertions about for example the authorship of
the representation, independent of what the (URI qua symbol)
<http://www.paris.org/Monuments/Eiffel> indicates. You can also eg look
for other representations of the same thing (the thing identified by the
(URI qua symbol) <http://www.paris.org/Monuments/Eiffel>; at this point
of course you start to have to make some judgement on the human/social
uses of the (URI qua symbol) and the need for a treatment of uncertainty
rears its ugly but fascinating head). 
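The difference between the two kinds of object can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the API of any RDF toolkit; the class names and the expansion of the ex: prefix (http://example.org/ns#) are hypothetical. It shows that the (URI qua symbol) and the xsd:anyURI literal share a character string but are distinct terms.

```python
from dataclasses import dataclass

XSD_ANYURI = "http://www.w3.org/2001/XMLSchema#anyURI"
EX = "http://example.org/ns#"  # hypothetical expansion of the ex: prefix


@dataclass(frozen=True)
class URIRef:
    """A (URI qua symbol): an opaque name for whatever it indicates."""
    uri: str


@dataclass(frozen=True)
class Literal:
    """A literal: a value that indicates itself."""
    lexical: str
    datatype: str


@dataclass(frozen=True)
class BNode:
    """A blank node, e.g. _:representation."""
    label: str


rep = BNode("representation")
tower = URIRef("http://www.paris.org/Monuments/Eiffel")
path = Literal("http://www.paris.org/Monuments/Eiffel", XSD_ANYURI)
result_of_deref = URIRef(EX + "resultOfDereferencing")

# Same predicate, different kinds of object.
about_symbol = (rep, result_of_deref, tower)  # relates rep to the thing the name indicates
about_path = (rep, result_of_deref, path)     # relates rep to a retrieval path

# Same character string, but the symbol and the path are different terms.
print(tower.uri == path.lexical)  # True
print(tower == path)              # False
```

Because the dataclasses compare by type as well as by field, nothing downstream can silently confuse the symbol with the path.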

Where there are multiple representations, the relationships between
these can of course be described using RDF too, so it should be possible
to accommodate content negotiation.

The situation that many potential users of RDF will regard as "normal"
is the desire to describe eg the authorship of documents. The natural
tendency here is to use the (URI qua retrieval path) for the document as
a (URI qua symbol):

<http://www.paris.org/Monuments/Eiffel> foaf:maker [ placeholder for author of
web page ] .

but assuming that we have already decided to use this (URI qua symbol)
to indicate the tower itself, we can't do this. We can, as has been
pointed out, use a URI to indicate one thing only, and once we've made
the choice we ought not to change our minds. No problem there: just
say, "No!"
 
Assuming we hadn't already made that commitment, however, we would be
free to use this URI in its "natural" (read -- intuitive to some) role,
as indicating the document (probably not as natural as it seems, though,
as explored in tortuous detail in the Functional Requirements for
Bibliographic Records recommendation [1] which talks of works,
expressions, manifestations, and items).

[1] http://www.oclc.org/research/projects/frbr/default.htm

So far so good, we can say

<http://www.paris.org/Monuments/Eiffel> foaf:maker [ placeholder for author of
web page ] .

but it still isn't clear to man or machine what
<http://www.paris.org/Monuments/Eiffel> indicates, and it would be nice
to be more explicit. Again ignoring content negotiation, one could say

<http://www.paris.org/Monuments/Eiffel> ex:resultOfDereferencing
"http://www.paris.org/Monuments/Eiffel"^^xsd:anyURI .

To the standard RDF machinery this is just another assertion. The use of
the (URI qua retrieval path) as distinct from the (URI qua symbol)
however allows extra-RDF machinery to come into play and do useful
things at the interface of RDF and *retrievable* resources (which are by
definition themselves representations of some possibly non-retrievable
resource).

It is here that the need for this distinction seems clearest. Witness:

<http://www.paris.org/Monuments/Eiffel> ex:resultOfDereferencing
<http://www.paris.org/Monuments/Eiffel> .

at which point meaning disappears up its own anus. It isn't clear that
there is anything in the world of the web that can be the result of
dereferencing itself.
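A practical consequence is that this pattern can be flagged mechanically. The following is a minimal sketch in standard-library Python, with hypothetical class names and a hypothetical expansion of ex:resultOfDereferencing; it treats any ex:resultOfDereferencing triple whose object is not an xsd:anyURI literal as suspect, since such an object is a (URI qua symbol) rather than a retrieval path.

```python
from dataclasses import dataclass

XSD_ANYURI = "http://www.w3.org/2001/XMLSchema#anyURI"
# Hypothetical expansion of ex:resultOfDereferencing.
RESULT_OF_DEREF = "http://example.org/ns#resultOfDereferencing"


@dataclass(frozen=True)
class URIRef:
    uri: str


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str


def suspect_dereferencing_triples(triples):
    """Return ex:resultOfDereferencing triples whose object is not an xsd:anyURI literal.

    Such an object is a symbol, so the triple relates a thing to whatever a
    name indicates instead of to a path by which something can be retrieved.
    """
    pred = URIRef(RESULT_OF_DEREF)
    bad = []
    for s, p, o in triples:
        if p != pred:
            continue
        if not (isinstance(o, Literal) and o.datatype == XSD_ANYURI):
            bad.append((s, p, o))
    return bad


eiffel = URIRef("http://www.paris.org/Monuments/Eiffel")
pred = URIRef(RESULT_OF_DEREF)
graph = [
    (eiffel, pred, Literal(eiffel.uri, XSD_ANYURI)),  # explicit retrieval path
    (eiffel, pred, eiffel),                           # the self-referential case
]
print(suspect_dereferencing_triples(graph))  # only the second triple is reported
```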

So by disambiguating the (URI qua symbol) as used heavily in RDF and the
(URI qua retrieval path) -- the two facets of the (URI qua URI) -- it
becomes possible to

1) Make assertions about the thing indicated by a URI (without having
access to said thing).
2) Make assertions about representations of the resource indicated by a
URI.
3) Specify explicitly that a URI does actually indicate the
representation which can be retrieved using it.

It follows then that no software is allowed to treat a (URI qua symbol)
as it appears in an RDF graph as anything other than a totally opaque
symbol. It is *only* if it is explicitly specified that a URI (or bnode)
indicates some specific retrievable resource that it is valid to go
beyond the "inaccessible indicated thing" level of interpretation.
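Under the same kind of hypothetical term model, the rule might be sketched as follows. A term is handed to retrieval machinery only if the graph explicitly pairs it with an xsd:anyURI literal via ex:resultOfDereferencing; otherwise it stays an opaque symbol. The function name `retrieval_paths` and the ex: expansion are illustrative assumptions, not part of any existing tool.

```python
from dataclasses import dataclass

XSD_ANYURI = "http://www.w3.org/2001/XMLSchema#anyURI"
# Hypothetical expansion of ex:resultOfDereferencing.
RESULT_OF_DEREF = "http://example.org/ns#resultOfDereferencing"


@dataclass(frozen=True)
class URIRef:
    uri: str


@dataclass(frozen=True)
class BNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str


def retrieval_paths(term, triples):
    """Return the retrieval paths the graph explicitly licenses for `term`.

    An empty list means nothing licenses going beyond the opaque
    "inaccessible indicated thing" reading of `term`.
    """
    pred = URIRef(RESULT_OF_DEREF)
    return [
        o.lexical
        for s, p, o in triples
        if s == term
        and p == pred
        and isinstance(o, Literal)
        and o.datatype == XSD_ANYURI
    ]


tower = URIRef("http://www.paris.org/Monuments/Eiffel")  # indicates the tower
rep = BNode("representation")                              # indicates a retrieved document
graph = [(rep, URIRef(RESULT_OF_DEREF), Literal(tower.uri, XSD_ANYURI))]

print(retrieval_paths(tower, graph))  # [] -> treat as an opaque symbol
print(retrieval_paths(rep, graph))    # ['http://www.paris.org/Monuments/Eiffel']
```

The tower's URI and the blank node's path literal share a character string, yet only the explicit triple licenses dereferencing, and only for the blank node.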

When a (URI qua symbol) is to indicate a non-retrievable resource, such
as the Eiffel Tower, it is then possible to place, eg, an HTML document to
be retrieved using that URI as a (URI qua retrieval path), and it is
precisely the fact that humans can do this in order to get a hint as to
what a (URI qua symbol) is supposed to identify that leads to the
argument that one should always use http URIs. This document is of value
only to humans.

If a (URI qua symbol) is to indicate a document which is retrieved using
that URI as a (URI qua retrieval path), then it is *not possible* to
also place a document there explaining to a human what the (URI qua
symbol) indicates. It is then *necessary* if any human or software is to
ground this symbol -- which grounding must be possible for the symbol to
be useful -- to state explicitly, in RDF, that the (URI qua symbol)
indicates what the (URI qua retrieval path) provides a path to (modulo
content negotiation complications). So it seems that what people might
regard as the "natural" case is the one that must be explicitly handled.

URIQA seems in this context to be a mechanism by which the (URI qua
retrieval path) can be used to retrieve information about the thing
indicated by the (URI qua symbol). The URIQA web service clarifies the
two facets of the URI in play by separating the retrieval path and the
(URI qua symbol) being asked about.
Received on Tuesday, 21 September 2004 13:24:25 UTC