Re: draft-fielding-uri-rfc2396bis-04.txt from Roy T. Fielding on 2004-04-06 (uri@w3.org from April 2004)

From: Roy T. Fielding <fielding@gbiv.com>
Date: Tue, 6 Apr 2004 16:13:37 -0700
To: Graham Klyne <GK@ninebynine.org>
Cc: uri@w3.org
Message-Id: <089C918A-8820-11D8-9CC7-000393753936@gbiv.com>

On Tuesday, February 17, 2004, at 04:09 AM, Graham Klyne wrote:
> Section 1.2.2, para 4 (nit):
> [[
> When an author creates a reference to such a resource, they do so with 
> the intention that the reference be used in the future; what is being 
> identified is not some specific result that was obtained in the past, 
> but rather some characteristic that is expected to be true for future 
> results.
> ]]
>
> It seems odd to make claims about an author's intent in this way.  
> Suggestion:
> [[
> ... do so with the effect that the reference can be used in the future 
> ...
> ]]

Both are a bit too awkward for my taste, but I was trying to highlight
the purpose of creating the reference.  I'll rephrase it.

> Section 2.1, para 2 (suggestion):
>
> Would it be helpful to include here a forward reference to section 2.4?

Yes, added.

> Section 2.2, para 2 (editorial):
>
> I'm afraid I found this paragraph (immediately following the 
> sub-delims production) really hard to understand.
>
> As far as I can tell after reading the document through, the following 
> points are being made:
> - only characters in the reserved set may be used with special meaning 
> in URIs.
> - characters in the gen-delims have special meaning in all URI schemes.
> - characters in sub-delims may have special meaning in some URI 
> schemes.
> - characters in sub-delims that do not have special meaning in a URI 
> scheme may be used unescaped in component values in that URI scheme
> - a generic URI parser must treat all characters in the reserved set 
> as distinct from the pct-encoded version of the same.
>
> I think you're trying to say some more, but I can't figure what it is.

I don't think so, but I have untwisted the wording for the next rev.

> Section 2, general (inconsistency/editorial?):
>
> I'm not sure if there is an inconsistency between:
> [[
> This specification does not mandate the use of any particular 
> character encoding scheme for mapping between URI characters and the 
> octets used to store or transmit those characters.
> ]] -- (section 2 intro)
>
> and:
> [[
> For consistency, percent-encoded octets in the ranges of ALPHA 
> (%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period (%2E), 
> underscore (%5F), or tilde (%7E) should not be created by URI 
> producers and, when found in a URI, should be decoded to their 
> corresponding unreserved character by URI normalizers.
> ]] -- (section 2.3)
> ... and other references to specific escape sequences through the 
> document.
> Particularly, section 3.2.2 says "such octets must represent 
> characters encoded in the UTF-8 ...".

Those all refer to encoding into URI characters -- the non-mandate is
in regard to how the URI characters are subsequently encoded on the wire.
I'm afraid I've lost the line of reasoning that follows.  URI characters
are an encoding of identifying data, which may in turn be later encoded
for transport or presentation.  I'll rearrange the beginning of section 2
to make that clearer (hopefully).
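
To make the section 2.3 rule quoted above concrete, here is a minimal
Python sketch of what a URI normalizer might do with percent-encoded
unreserved characters.  It works purely at the level of URI characters
and says nothing about how those characters are later encoded on the
wire.  The function name decode_unreserved is hypothetical and not
taken from the draft.

```python
import re

# Unreserved set as listed in the quoted text from section 2.3:
# ALPHA, DIGIT, hyphen, period, underscore, tilde.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

_PCT = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_unreserved(uri: str) -> str:
    """Decode only percent-encoded octets that map to unreserved characters.

    Hypothetical helper: every other percent-encoded octet, including
    reserved characters and non-ASCII octets, is left exactly as found.
    """
    def repl(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        return char if char in UNRESERVED else match.group(0)

    return _PCT.sub(repl, uri)


print(decode_unreserved("http://example.com/%7Euser/a%2Fb%2Dc"))
# prints "http://example.com/~user/a%2Fb-c"
```

Note that %2F stays encoded: "/" is reserved, so decoding it would
change how the path is parsed.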

> Section 2.4, para 5 (editorial):
>
> I'm not sure that the discussion of additional encoding, such as 
> base64, really adds any value to the generic URI specification.  I'd 
> be tempted to drop this paragraph, or maybe place it somewhere other 
> than in this section discussing characters (e.g. in section 1.2.1)?

Removed base64, but the rest belongs there.

> Section 3, para 2 (editorial):
>
> I think it would be helpful (i.e. avoid requiring the reader to look 
> forward for the "path" production) if the last sentence read something 
> like this:
> [[
> In other words, if authority is present then the first segment of the 
> path must be empty, and the path must start with a '/' character.
> ]]

Subsequent changes made this no longer applicable.

> Section 3.2.1 and Appendix B (suggestion):
>
> In section 3.2.1 the specification suggests:
> [[
> Applications that render a URI for the sake of user feedback, such as 
> in graphical hypertext browsing, should render userinfo in a way that 
> is distinguished from the rest of a URI, when feasible.
> ]]
>
> Yet the regular expression in appendix B makes no attempt to separate 
> the userinfo from the rest of the authority component.  Would it not 
> be wise to encourage (by example) generic parsers to separate the 
> userinfo in order that the above exhortation is more easily followed?
>
> <background>
> The current Haskell library URI module is implemented using an exact 
> copy of the Regexp from RFC2396.  This example has impact on the ways 
> that widely-used software components are actually implemented.
> </background>

Yes, but it would also make the regular expression exceed the line
length limit for RFCs.  Perhaps we should list both as examples.
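
One way to get the separation without lengthening the expression is a
second step applied to the authority group.  The Python sketch below
uses the regular expression from RFC 2396 appendix B unchanged and
then splits userinfo off the authority; the function name split_uri
and the dictionary keys are hypothetical, and the second step is an
assumption about how a parser might do this, not text from the draft.

```python
import re

# Generic URI-reference expression from RFC 2396, appendix B.
URI_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri(reference: str) -> dict:
    """Split a URI reference, then separate userinfo from the authority."""
    match = URI_RE.match(reference)  # every group is optional, so this always matches
    scheme, authority, path, query, fragment = match.group(2, 4, 5, 7, 9)

    userinfo = hostport = None
    if authority is not None:
        # Second step (not part of appendix B): userinfo is whatever
        # precedes the "@" delimiter, if one is present.
        before, at, hostport = authority.rpartition("@")
        userinfo = before if at else None

    return {
        "scheme": scheme,
        "userinfo": userinfo,
        "hostport": hostport,
        "path": path,
        "query": query,
        "fragment": fragment,
    }


parts = split_uri("ftp://user:pw@ftp.example.org:21/pub?x=1#top")
print(parts["userinfo"], parts["hostport"])
```

With userinfo available as its own field, an application can render it
distinctly from the rest of the URI, as section 3.2.1 suggests.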

> Section 3.2.2, IPvFuture production (suggestion):
>
> Would it not be more futureproof if this were expressed as:
> [[
> IPvFuture  = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
> ]]

I suppose, though such variable-length fields just provide one more
method of attack by the bad guys.
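
For illustration, a rough Python sketch of a matcher for the
production as Graham wrote it.  The unreserved and sub-delims sets are
assumed to be those of RFC 3986 and may differ from draft -04.  The
max_len cap is purely hypothetical, added here as one way a parser
might bound the variable-length fields; nothing in the draft specifies
such a limit.

```python
import re

# Character sets assumed from RFC 3986; the -04 draft may differ.
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="

# IPvFuture = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
# ABNF literal strings are case-insensitive, so "V" is accepted too.
IPVFUTURE_RE = re.compile(
    r"[vV][0-9A-Fa-f]+\.[" + UNRESERVED + SUB_DELIMS + r":]+"
)


def is_ipvfuture(text: str, max_len: int = 64) -> bool:
    """Return True if text matches the suggested IPvFuture production.

    max_len is a hypothetical safety cap, not part of the grammar.
    """
    if len(text) > max_len:
        return False
    return IPVFUTURE_RE.fullmatch(text) is not None


print(is_ipvfuture("v1.fe80::1"))  # True
print(is_ipvfuture("v.abc"))       # False: the version needs at least one HEXDIG
```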

> Section 5.1, para 1 (editorial):
>
> I think there's a subtlety here that isn't immediately obvious.  I 
> think the text is correct, and maybe best left as is, but I'll mention 
> the issue anyway.
>
> The document says:
> [[
> Aside from fragment-only references (section 4.4), relative references 
> are only usable when a base URI is known.
> ]]
>
> Which raised in my mind "useful for what?".  On reflection, I see that 
> when used for selecting a document "view", given the discussion of 
> same-document references, the meaning of a bare fragment does not 
> depend on the base URI, since that would be the base URI of the 
> encapsulating entity, whatever that may be.
>
> Is it worth trying to make this point more obvious, something like:
> [[
> In the absence of a base URI embedded in content, interpretation of a 
> fragment-only URI is implicitly with respect to the base URI of the 
> encapsulating entity.  Whatever that may be, such a URI is a "same 
> document reference" (section 4.4), and may be used as-is for retrieval 
> purposes.
> ]]

I think that only confuses the issue.  If the encapsulating entity had
a URI, then the base URI would be known and the comment does not apply.

> Section 5.1.1, para 2 (nit):
>
> I'm not aware that xml:base is part of or referenced by the 
> application/xml media type specification.
>
> Maybe, say:
> [[
> The appropriate syntax, when available, is described by the data 
> format specification associated with a media type.
> ]]

Okay.

> Section 6 (concern):
>
> The document states "...comparison methods are designed to minimize 
> false negatives while strictly avoiding false positives".
>
> I'm concerned that some of the normalizations suggested might result 
> in false equivalence.

There is no such thing.  We are talking about equivalence of URIs for
the sake of identifying a single resource, not equivalence of external
processing algorithms within languages that make use of URIs.

> Section 6.2.2.1:
> Suggests case-normalization of the authority component.  But is there 
> anything to prevent introduction of a new form of authority that *is* 
> case sensitive?  I don't see any such implication in the discussion of 
> reg-name (section 3.2.2).

The host subcomponent is case-insensitive.
(I've added that to the defining section).
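
As a sketch of what that implies for a normalizer, the Python below
lowercases only the host part of an authority and leaves userinfo,
port, and any percent-encoded triplets alone.  The function name
normalize_authority_case is hypothetical, and the way the authority is
split here is an assumption for illustration, not the draft's
algorithm.

```python
import re

_PCT = re.compile(r"(%[0-9A-Fa-f]{2})")


def normalize_authority_case(authority: str) -> str:
    """Lowercase the host subcomponent only (hypothetical helper)."""
    userinfo, at, hostport = authority.rpartition("@")

    if hostport.startswith("["):
        # Bracketed IP literal: the host runs through the closing "]".
        end = hostport.find("]")
        host, rest = hostport[: end + 1], hostport[end + 1:]
    else:
        host, colon, port = hostport.partition(":")
        rest = colon + port

    # Lowercase everything in the host except percent-encoded triplets,
    # which are passed through untouched.
    pieces = _PCT.split(host)
    host = "".join(p if _PCT.fullmatch(p) else p.lower() for p in pieces)

    return userinfo + at + host + rest


print(normalize_authority_case("User@EXAMPLE.com:8080"))
# prints "User@example.com:8080"
```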

> Section 6.2.2.3:
> I'm concerned about empty component normalization:

It has been eliminated.

Thanks,

....Roy
Received on Tuesday, 6 April 2004 19:13:35 UTC