Comments on draft-fielding-uri-rfc2396bis-0x from Kay, Michael on 2003-08-05 (uri@w3.org from August 2003)

From: Kay, Michael <Michael.Kay@softwareag.com>
Date: Tue, 5 Aug 2003 17:04:00 +0200
To: uri@w3.org
Message-ID: <DFF2AC9E3583D511A21F0008C7E62106073DD02F@daemsg02.software-ag.de>

This is clearly a great improvement on RFC 2396.

It is disappointing, but not really surprising, that the document still
contains so many words like "should", "recommended", "unwise", "generally
counterproductive", "discouraged", and "abnormal", which all tend to give
the impression that handling URIs is a black art rather than a precise
science.

RELATIVE URI REFERENCES

The document retains one ambiguity from RFC 2396: is the zero-length string
a valid relative URI reference? The ABNF syntax seems to suggest that it
isn't, but sections 4.4 and 5.4.1 assign semantics to this case, saying
this is an "abnormal" case which URI parsers "should" be capable of
handling. I think that the use of "" as a relative self-reference should be
treated as being wholly respectable. (What does "abnormal" actually mean?)
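
To illustrate how implementations already treat the empty reference, here
is a minimal sketch using Python's standard urllib.parse.urljoin. The base
URI is the illustrative one used in the RFC's resolution examples; the
variable names are my own, and the behaviour shown is that of this one
library, not something the draft mandates.

```python
from urllib.parse import urljoin

# Illustrative base URI (no fragment), in the style of the RFC's examples.
base = "http://a/b/c/d;p?q"

# An empty relative reference resolves to the base URI itself.
self_ref = urljoin(base, "")

# A fragment-only reference keeps the base and appends the fragment.
frag_ref = urljoin(base, "#s")

print(self_ref)
print(frag_ref)
```

In this library, at least, "" is handled as an ordinary self-reference
rather than as an error case.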

I'm disappointed to see that the term "current document" still appears in
section 4.4, and is nowhere defined. In 5.4.2 it appears in residual form as
"current base URI". Are "current document" and "current base URI" the same
thing as "the resource identified by the base URI"? If so, say so.

The section heading of 4.2 is "Relative URI", but in fact a relative URI
reference is not a URI, so this term should not be used.

In 4.4 the statement "the dereference should not result in a new retrieval"
seems to contradict section 1.2.2, which strongly suggests that the
semantics of the dereferencing operation are outside the scope of the RFC.

ESCAPING

It's much clearer now that a string is not a URI unless all the special
characters have been properly escaped. Nevertheless, there is still some
residual language that hints that the input to the escaping algorithm might
also be referred to as a URI. 2.4.2 says "characters within a URI string are
escaped". What exactly is a URI string? Similarly, "Once generated, a URI is
always in an escaped form" hints that there are other circumstances in which
a URI might not be in escaped form. It might be useful to define some formal
term for the unescaped representation of a URI, for example a
"URI-rendition", so that we can talk about this string without referring to
it as a URI.
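
The practical reason for wanting a separate term can be sketched in a few
lines of Python, using the standard urllib.parse.quote and unquote
functions. The path value is invented for illustration. The point is that
an escaping function cannot tell whether its input is the unescaped
rendition or an already-escaped URI component, so applying it to the wrong
kind of string silently changes the meaning.

```python
from urllib.parse import quote, unquote

# Hypothetical unescaped rendition of a path (not itself valid URI syntax).
raw_path = "/docs/a b/ü"

# Escaping once yields a string that is valid inside a URI.
escaped = quote(raw_path, safe="/")

# Escaping the already-escaped form re-escapes every "%" as "%25".
double_escaped = quote(escaped, safe="/")

print(escaped)
print(double_escaped)
print(unquote(escaped) == raw_path)
```

Only the first result round-trips back to the original string; the second
identifies a different path.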

URI EQUIVALENCE

The discussion is useful, but it would also be useful to define a preferred
(and named) default algorithm for comparing URIs that other specifications
can refer to.
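
As a rough idea of what such a named default might look like, here is a
sketch in Python. It is my own assumption of a plausible minimum, not an
algorithm taken from the draft: lower-case the scheme and host, upper-case
the hex digits of percent-escapes, and then compare the results character
by character. The function names normalize and equivalent are hypothetical.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Matches one percent-escape such as %7e or %7E.
_PCT = re.compile(r"%[0-9a-fA-F]{2}")


def _upper_escapes(text: str) -> str:
    """Upper-case the hex digits of every percent-escape in text."""
    return _PCT.sub(lambda m: m.group(0).upper(), text)


def normalize(uri: str) -> str:
    """Apply case normalization only (an assumed, deliberately small rule set)."""
    parts = urlsplit(uri)
    # Only the host is case-insensitive; leave any userinfo untouched.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    return urlunsplit((
        parts.scheme.lower(),
        _upper_escapes(netloc),
        _upper_escapes(parts.path),
        _upper_escapes(parts.query),
        _upper_escapes(parts.fragment),
    ))


def equivalent(a: str, b: str) -> bool:
    """Compare two URIs as strings after the normalization above."""
    return normalize(a) == normalize(b)


print(equivalent("HTTP://Example.COM/a%7eb", "http://example.com/a%7Eb"))
print(equivalent("http://example.com/A", "http://example.com/a"))
```

The sketch deliberately does not unescape anything, remove dot-segments, or
apply scheme-specific rules such as default ports; whether a default
algorithm should include those steps is exactly the kind of decision a
named definition would settle.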

Michael Kay

Received on Tuesday, 5 August 2003 11:04:11 UTC