RE: fragment identifiers and media types (was RE: XPointer considered incomprehensible) from Booth, David (HP Software - Boston) on 2006-09-06 (www-xml-linking-comments@w3.org from July to September 2006)

From: Booth, David (HP Software - Boston) <dbooth@hp.com>
Date: Wed, 6 Sep 2006 13:59:14 -0400
To: "Dan Connolly" <connolly@w3.org>
Cc: <simonstl@simonstl.com>, <noah_mendelsohn@us.ibm.com>, "Bjoern Hoehrmann" <derhoermi@gmx.net>, <www-tag@w3.org>, <www-xml-linking-comments@w3.org>, "Jonathan Marsh" <jmarsh@microsoft.com>
Message-ID: <EBBD956B8A9002479B0C9CE9FE14A6C2011443DA@tayexc19.americas.cpqcorp.net>

> From: Dan Connolly [mailto:connolly@w3.org] 
> 
> On Tue, 2006-09-05 at 12:37 -0400, Booth, David (HP Software - Boston)
> wrote:
> > > From: Simon St.Laurent
> > > 
> > > . . .  For a URI reference like:
> > > 
> > > http://simonstl.com/#news
> > > 
> > > The interpretation of the fragment identifier depends on the 
> > > media type returned.  URI philosophers will likely wave their 
> > > hands and say this isn't a problem. 
> >
> > As a side comment, this is precisely why, in my opinion, hash
> > URIs are unsuitable as *general purpose* identifiers: the
> > meaning of the URI is tied to the media type that is returned
> > when the racine is dereferenced. (The racine is the part
> > before the hash.) I.e., the meaning of this
> > URI:
> > 	http://simonstl.com/#news
> > depends on the media type that is returned when this other,
> > related URI:
> > 	http://simonstl.com/
> > is dereferenced. This may be fine if one is willing to limit
> > oneself to certain media types, but it is not very general
> > purpose.
>
> Why not?
>
> The meaning of *every* URI in the web is, practically,
> connected to protocol messages involving that URI, and pretty
> much all the Web protocols use MIME types somehow.

Protocol messages may be used in the process of determining the meaning,
but there is a difference between the meaning being determined by the
content of the retrieved message versus the mere fact of retrieval.  In
determining the meaning of http://simonstl.com/#news , if a GET on
http://simonstl.com/ returns a 200 OK and an HTML document, the mere
fact of retrieval indicates that http://simonstl.com/#news identifies a
location within an HTML document.  It therefore cannot, for example,
identify a person or a dog.
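A minimal Python sketch of this point, assuming a hypothetical lookup table (FRAGMENT_SEMANTICS) that records what a fragment selects for each media type. Only the HTML case discussed here is filled in, and nothing is fetched over the network. It uses urllib.parse.urldefrag to separate the racine from the fragment, and then lets the racine's media type alone decide what kind of thing the hash URI can identify:

```python
from typing import Optional, Tuple
from urllib.parse import urldefrag

# Hypothetical table: what a fragment identifier selects, per media type.
# Only the HTML case from this thread is filled in.
FRAGMENT_SEMANTICS = {
    "text/html": "a location within an HTML document",
}


def split_hash_uri(uri: str) -> Tuple[str, str]:
    """Split a URI into its racine (the part before '#') and its fragment."""
    racine, fragment = urldefrag(uri)
    return racine, fragment


def hash_uri_identifies(uri: str, racine_media_type: str) -> Optional[str]:
    """Return the kind of thing a hash URI can identify, given only the
    media type returned when its racine is dereferenced.

    Returns None for URIs without a fragment or for unknown media types.
    """
    _racine, fragment = split_hash_uri(uri)
    if not fragment:
        return None
    # The document's content is never consulted: the media type alone decides.
    return FRAGMENT_SEMANTICS.get(racine_media_type)


if __name__ == "__main__":
    print(split_hash_uri("http://simonstl.com/#news"))
    print(hash_uri_identifies("http://simonstl.com/#news", "text/html"))
```

The document body never enters into hash_uri_identifies. That is exactly the limitation being argued: once HTML comes back, the range of possible referents has already been narrowed.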

>
> The meaning of URIs that include #'s is no more or less
> dependent on media types than other URIs. For example, I can
> mint a URI right now...
> http://dm93.org/2006/09/not-very-helpful#sixtythree and tell
> you that it refers to the integer 63.
>
> I did that without using any representations of
> http://dm93.org/2006/09/not-very-helpful , and hence without
> any dependence on media types.
>
> Now this URI is not a very good one, because when you look it
> up, you'll get a 404; so you don't get any useful information
> when you use standardized protocols to look it up. But as the
> owner of dm93.org, I have the right to allocate that URI and
> say what it means, whether I use standard protocols or not.

Yes, you can allocate a URI from your domain without serving anything
from it, but that's not the point.  The point is that *if*
http://dm93.org/2006/09/not-very-helpful is dereferenced and it returns
a media type -- HTML, for example -- then that media type determines the
interpretation of http://dm93.org/2006/09/not-very-helpful#sixtythree
and thus the mere fact of retrieving HTML has limited the range of what
http://dm93.org/2006/09/not-very-helpful#sixtythree might identify,
because in HTML, a fragment identifier indicates a location within a
document.  And AFAIK, a location within a document cannot be a person or
a dog.

>
> (You could argue that I'm conflicting myself when I say that I
> have allocated that URI and yet give out a 404 error. OK, so
> change it to "403 Forbidden"; in other words: I know what
> /2006/09/not-very-helpful means, but I'm not telling; not
> via HTTP, anyway. Please grant this line of argument even
> though I have not, actually, configured the dm93.org web server
> to act this way.)
>
>
>
> > Slash URIs using 303-redirects do not have this limitation.
>
> Sure they do. The meaning you get back after you follow your
> nose thru the redirection certainly depends on the media type
> of what you get back.

Well, not exactly.  The media type is used to tell you how to interpret
what you receive.  In essence, it tells you what interpreter to apply.
But once you know what interpreter to apply, the *content* of the
document (hopefully) tells you the meaning of the original URI.  For
example, if the media type indicates that an HTML document is returned,
then you know that you should read the document to find out what it says
about the meaning of the original URI -- and it could say anything.  It
might say that the original URI identifies a person or a dog, for
example.  This is very different from the situation with a hash URI, in
which the mere fact of retrieving an HTML document indicates that the
URI identifies a location within that document -- regardless of what the
document says.
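To make the contrast concrete, here is a rough sketch in the same vein. The RESPONSES table stands in for HTTP servers and is entirely hypothetical (example.org URIs, invented bodies), as is the convention that a document states "<uri> identifies <thing>". For the hash URI, the media type alone settles the answer. For the 303 case, the media type only selects an interpreter, which then reads the content:

```python
from typing import Callable, Dict, Optional, Tuple
from urllib.parse import urldefrag

Response = Tuple[int, Dict[str, str], str]  # (status, headers, body)

# Hypothetical stand-in for HTTP servers; no network access is made.
RESPONSES: Dict[str, Response] = {
    "http://example.org/page": (
        200, {"Content-Type": "text/html"},
        "<p>http://example.org/page#dog identifies a dog named Rex</p>"),
    "http://example.org/dog": (
        303, {"Location": "http://example.org/about-dog"}, ""),
    "http://example.org/about-dog": (
        200, {"Content-Type": "text/html"},
        "<p>http://example.org/dog identifies a dog named Rex</p>"),
}
NOT_FOUND: Response = (404, {}, "")


def get(uri: str) -> Response:
    """Simulated GET against the RESPONSES table."""
    return RESPONSES.get(uri, NOT_FOUND)


def read_html_claim(body: str, uri: str) -> Optional[str]:
    """Hypothetical HTML interpreter: find '<uri> identifies <thing>'."""
    marker = uri + " identifies "
    start = body.find(marker)
    if start == -1:
        return None
    return body[start + len(marker):].split("<", 1)[0].strip()


# The media type selects which interpreter reads the content.
INTERPRETERS: Dict[str, Callable[[str, str], Optional[str]]] = {
    "text/html": read_html_claim,
}


def what_uri_identifies(uri: str) -> Optional[str]:
    racine, fragment = urldefrag(uri)
    if fragment:
        status, headers, _body = get(racine)
        if status == 200 and headers.get("Content-Type") == "text/html":
            # Hash URI: the mere fact of retrieving HTML fixes the answer,
            # regardless of what the document says.
            return "a location within an HTML document"
        return None
    status, headers, _body = get(uri)
    if status == 303 and "Location" in headers:
        status, headers, body = get(headers["Location"])
        interpreter = INTERPRETERS.get(headers.get("Content-Type", ""))
        if status == 200 and interpreter is not None:
            # 303 slash URI: the media type only picks the interpreter;
            # the content determines the meaning of the original URI.
            return interpreter(body, uri)
    return None
```

With this data, http://example.org/page#dog comes out as a location within an HTML document even though the page claims it identifies a dog. By contrast, http://example.org/dog comes out as whatever the redirected-to document says about it.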

>
> >   For example, if
> > 	http://simonstl.com/news
> > does a 303 "See Other" redirect to
> > 	http://simonstl.com/
> > then
> > 	http://simonstl.com/news
> > could identify any resource, independent of the media type
> > returned when
> > 	http://simonstl.com/
> > is dereferenced.
>
> No, it's not independent; the whole transaction tells you about
> the meaning of http://simonstl.com/news , right? And that
> meaning is determined by looking at the media type of what you
> get back from http://simonstl.com/ .

Not "determined by", but "determined through".  The media type is merely
a conduit for determining the meaning.  The meaning is determined by the
content retrieved.

David Booth

Received on Wednesday, 6 September 2006 17:59:46 UTC