Re: #foo URI references from Roy T. Fielding on 2004-01-30 (uri@w3.org from January 2004)

From: Roy T. Fielding <fielding@gbiv.com>
Date: Thu, 29 Jan 2004 17:52:20 -0800
To: Paul Grosso <pgrosso@arbortext.com>
Cc: uri@w3.org
Message-Id: <F11C7199-52C6-11D8-A4B3-000393753936@gbiv.com>
>>> Fair enough.  So the special interpretation of "#foo" in the resource
>>> denoted by "http://www.example.com/blargh" is extended to "blargh#foo"
>>> and "http://www.example.com/blargh#foo" as well.
>>>
>>> But it seems to me that (for good or ill) this also means that if a
>>> base URI is available, say "http://www.example.com/stat/blargh", then
>>> "#foo" now means "http://www.example.com/stat/blargh#foo".
>>>
>>> Is this a correct reading of 2396 bis?

Yes.
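
As an illustration only (it is not part of RFC 2396 or the draft), the
resolution described in the quoted text can be sketched in Python. The
sketch assumes that the standard library's urllib.parse.urljoin is an
acceptable stand-in for a relative URI resolver; the helper name
resolve is hypothetical.

```python
from urllib.parse import urljoin


def resolve(base_uri: str, reference: str) -> str:
    """Resolve a URI reference against a base URI (illustrative helper)."""
    return urljoin(base_uri, reference)


# Base URIs and references are the ones used in the quoted question.
print(resolve("http://www.example.com/blargh", "blargh#foo"))
# http://www.example.com/blargh#foo
print(resolve("http://www.example.com/stat/blargh", "#foo"))
# http://www.example.com/stat/blargh#foo
```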

>> And if John is reading it correctly (he is reading it as I did),
>> then this is the crux of my problem with it.
>
> I've reread
> http://cvs.apache.org/viewcvs.cgi/*checkout*/ietf-uri/rev-2002/rfc2396bis.html?rev=1.64#same-document
> but it seems that my studying the text (and JohnC's studying of it)
> isn't necessarily sufficient to answer my question which is as follows.
>
> In RFC 2396, it makes clear that:
>
>       If the path component is empty and the scheme, authority, and
>       query components are undefined, then it is a reference to the
>       current document and we are done.
>
> Given either the following document at http://www.example.org/doc.xml:
>
> <doc xml:base="http://www.example.com/stat/blargh">
>   <para href="#foo">xxx</para>
>   <para id="foo">yyy</para>
> </doc>
>
> or the following document at http://www.example.org/doc.html:
>
> <html>
> <head>
> <title>t</title>
> <base href="http://www.example.com/stat/blargh">
> </head>
> <body>
> <p><a href="#foo">xxx</a></p>
> <p><a name="foo">yyy</a></p>
> </body>
> </html>
>
> the href in either case points to the element in the same document
> containing yyy.

href is an attribute that contains a URI reference that might be
interpreted by the reference interpreter as a pointer to a name
defined by an attribute on an element within the document being
interpreted.  Regardless, the contents are a URI reference and the
target is identified by the URI within the context in which it is
found.

> Some readings of 2396bis interpret it to be saying that the href above
> points to http://www.example.com/stat/doc.xml#foo (or, in the html case,
> http://www.example.com/stat/doc.html#foo) which does not necessarily
> exist and is certainly not my intended target.

It does exist -- it is defined by the document itself.  Both readings
say the same thing -- they just use different perspectives to say it.
The original was from the view of the document processor, whereas the
draft wording is from the view of the URI processor (with further
instruction to a document processor that is processing a retrieval
action).  In both cases, the target of the reference is the same and
the result is the same.

> Is this significant change in meaning (and behavior for a variety of
> deployed tools) the intended interpretation of 2396bis?

There is no change in meaning (I wrote both sentences in both documents
and I am absolutely certain of that).  Base has always defined a name
space for the document's reference resolver, and the behavior will
be the same as it should have been before.  Any other interpretation
would eliminate the ability to set bookmarks and location bars to the
identified URI.  The observable differences between the two are in the
internal interface between the document processor and its relative URI
resolver, which has been simplified by the new wording such that it
better reflects what most browsers do today, and in the behavior of
the document processor when the referenced URI matches the base URI
in every way except the fragment (a situation which was left undefined
before, but that browsers implemented to be a same-document reference
for the sake of performance and because it simplified their own
interface with the reference resolver).

That is, the correct behavior is to parse the reference into a URI
before deciding whether or not it is a same-document reference.
This change, when implemented, has no impact on the
protocol *except* when a person is deliberately abusing the base URI
by assigning it an unrelated URI for the purpose of creating an
artificial shorthand notation for external references.  It should be
no surprise that such usage will not be supported by the standard,
since it never worked with the majority of deployed implementations.
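
A minimal sketch of that ordering, in Python, might look like the
following.  It assumes urllib.parse.urljoin as the resolver and a plain
string comparison of the two URIs with their fragments removed (no
further normalization); the function name is_same_document is
hypothetical and not taken from either specification.

```python
from urllib.parse import urldefrag, urljoin


def is_same_document(base_uri: str, reference: str) -> bool:
    """Resolve the reference first, then compare it to the base URI.

    Assumption: two URIs count as the same document when they are
    identical as strings once the fragment is stripped.
    """
    target = urljoin(base_uri, reference)
    return urldefrag(target).url == urldefrag(base_uri).url


base = "http://www.example.com/stat/blargh"
print(is_same_document(base, "#foo"))        # True
print(is_same_document(base, "blargh#foo"))  # True
print(is_same_document(base, "other#foo"))   # False
```

Under these assumptions, "#foo" and "blargh#foo" are treated alike
because the decision is made on the resolved URI rather than on the
syntactic form of the reference.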

In any case, the change is needed for consistent interpretation of
URI references in non-document-retrieval conditions (e.g., SemWeb).

....Roy
Received on Thursday, 29 January 2004 20:52:02 UTC