RE: Comments on draft-fielding-uri-rfc2396bis-07 from Graham Klyne on 2004-11-05 (uri@w3.org from November 2004)

From: Graham Klyne <GK@ninebynine.org>
Date: Fri, 05 Nov 2004 18:00:01 +0000
To: Bruce Lilly <blilly@erols.com>
Cc: uri@w3.org
Message-Id: <5.1.0.14.2.20041105173300.00ba4ce0@127.0.0.1>

I checked my code and test cases, and I clearly decided when reading the
spec that '@' did not need to be escaped by a general URI handler.  I take 
my lead here from section 2.4, "When to Encode or Decode".

I think that is the correct behaviour.  In this respect, the w3.org
implementation you mention is arguably incorrect, but the error is
probably harmless, since all %-encoding would normally be reversed
after the URI path component has been extracted.
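
As a rough illustration of why the extra encoding is harmless, here is a
minimal Python sketch (not the code discussed in this message). It assumes
a consumer that first splits the URI into components and only then decodes
the path; the function name mailto_recipient is hypothetical.

```python
from urllib.parse import urlsplit, unquote


def mailto_recipient(uri: str) -> str:
    """Return the decoded path ("to" part) of a mailto URI."""
    parts = urlsplit(uri)
    if parts.scheme != "mailto":
        raise ValueError("not a mailto URI")
    # Decode only after the path has been separated from the query,
    # so an encoded delimiter can never be mistaken for a real one.
    return unquote(parts.path)
```

Under that assumption, "mailto:uri%40w3.org?Subject=..." and
"mailto:uri@w3.org" yield the same recipient string.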

For generic URI handling, I have taken the approach of encoding only those 
characters that are clearly required by the URI spec to be encoded.
However, the software also notes:
[[
       -- | Support for putting strings into URI-friendly
       --   escaped format and getting them back again.
       --   This can't be done transparently in all cases, because certain
       --   characters have different meanings in different kinds of URI.
]]
The software interfaces accordingly separate the character escaping logic
from the test used to decide which characters need to be escaped, and then
provide common test functions for generic URI components.
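
The separation described above might look something like the following
Python sketch. This is an illustration only, not the software referred to
in this message: the names escape, is_unreserved and is_path_char are
hypothetical, and the character sets are taken from RFC 3986 (the
published form of the draft), which is assumed here to match the draft's
grammar.

```python
from typing import Callable

# Character classes as given in RFC 3986 (assumed to match the draft).
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)
SUB_DELIMS = frozenset("!$&'()*+,;=")
# Characters allowed literally in a path segment: unreserved, sub-delims,
# ':' and '@'.
PATH_CHARS = UNRESERVED | SUB_DELIMS | frozenset(":@")


def escape(text: str, is_allowed: Callable[[str], bool]) -> str:
    """Percent-encode every character that the supplied test rejects."""
    out = []
    for ch in text:
        if is_allowed(ch):
            out.append(ch)
        else:
            # Encode each UTF-8 octet of the character as %XX.
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(out)


def is_unreserved(ch: str) -> bool:
    """Strictest test: only unreserved characters pass through."""
    return ch in UNRESERVED


def is_path_char(ch: str) -> bool:
    """Test for a generic path segment, where '@' has no delimiting role."""
    return ch in PATH_CHARS
```

With these definitions, escape("uri@w3.org", is_path_char) leaves '@'
alone, while escape("uri@w3.org", is_unreserved) produces "uri%40w3.org".
The escaping routine itself never needs to know which kind of URI
component it is working on.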

I tend to agree that the paragraph you mentioned is not especially
helpful:
[[
URI producing applications should percent-encode data octets that 
correspond to characters in the reserved set. However, if a reserved 
character is found in a URI component and no delimiting role is known for 
that character, then it should be interpreted as representing the data 
octet corresponding to that character's encoding in US-ASCII.
]]

If late editorial changes are being considered, I would suggest deleting 
this paragraph completely, since the first sentence can be read as 
contradicting the content of section 2.4, and as far as I can tell the 
second sentence repeats material already given at the beginning of section 2.

I do agree with you about '<' and '>':  my generic URI handling code does 
escape those, and I believe that is what the specification requires.
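
For comparison, a similar effect can be sketched with Python's standard
library, whose urllib.parse.quote takes a "safe" parameter naming the
characters to leave unescaped. The helper name escape_generic and the
choice of passing the whole reserved set as "safe" are assumptions made
for this example.

```python
from urllib.parse import quote

# gen-delims and sub-delims from RFC 3986; '<' and '>' are in neither set.
RESERVED = ":/?#[]@!$&'()*+,;="


def escape_generic(text: str) -> str:
    """Escape everything except unreserved and reserved characters."""
    return quote(text, safe=RESERVED)
```

Applied to "<0I6O00MU8359UQ@mailsj-v1.corp.adobe.com>", this encodes the
angle brackets as %3C and %3E but leaves the '@' as it is.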

#g
--

At 09:21 05/11/04 -0500, Bruce Lilly wrote:

>On Thu November 4 2004 13:24, Larry Masinter wrote:
>
> > To be pedantically accurate, I could imagine
> >  'should percent-encode' => 'should otherwise percent-encode'
>
>That alone wouldn't help, in part because the "otherwise" is
>vague, in part because the mailto specification requires
>(i.e. "must" rather than merely "should") that reserved
>characters be encoded (see below for the exact quotation).
>
> > but I'm not convinced it is necessary, since a reading that
> > ALL reserved characters should be encoded no matter what
> > would lead you to encode every delimiter everywhere -- which
> > would be nonsensical.
>
>Would it?  If one goes to your message at the W3 uri mailing
>list archive,
>http://lists.w3.org/Archives/Public/uri/2004Nov/0020.html
>one can see that the "Respond" link specifies a mailto URI:
>
>  [ <a 
> href="mailto:uri%40w3.org?Subject=RE%3A%20Comments%20on%20draft-fielding-uri-rfc2396bis-07&amp;In-Reply-To=&lt;0I6O00MU8359UQ%40mailsj-v1.corp.adobe.com&gt;&amp;References=&lt;0I6O00MU8359UQ%40mailsj-v1.corp.adobe.com&gt;" 
> accesskey="r" title="respond to this message">Respond</a> ]
>
>Apparently the W3 archive implementor(s) also believe that
>'@' should be encoded as "%40"; likewise for ':' ("%3A").
>The only other gen-delim that appears unencoded is '?', which
>is in fact used for its reserved URI purpose, viz. as a delimiter
>introducing the query component.  [I haven't discussed the
>HTML obfuscation of '&', '<', and '>', which is yet another can
>of worms; '<' and '>' aren't merely "reserved", they're
>"excluded" in RFC 2396 terminology (and curiously barely
>mentioned in the draft under discussion); unlike the W3
>archive implementors, I believe that '<' and '>' should have
>been encoded as "%3C" and "%3E", obviating HTML
>obfuscation for the raw characters -- but for the moment,
>let's concentrate on '@'.]
>
>Clearly there is interaction between the mailto URI specification
>(RFC 2368), which references RFC 1738, which has been
>obsoleted by RFC 2396, which in turn is intended to be obsoleted
>by the draft under discussion.  Now this probably isn't the place
>to go into great detail about RFC 2368, but it does say
>
>    Note that all URL
>    reserved characters in "to" must be encoded
>
>So simply defining '@' as a UR{L,I} "reserved character" is
>sufficient to require encoding, at least in the specified mailto
>URI portion.  It doesn't say "some URL reserved characters"
>or "all except '@'", etc.  Because the mailto rules are
>specified in one document referring to "reserved characters"
>which are defined in a different document, any change to
>the definition or the set of reserved characters has an
>effect on interpretation of the rules.  Evidently (from the
>URI above and others which can be seen across the
>Internet), I am not alone in concluding that given:
>1. reserved characters must be encoded
>and
>2. '@' is a reserved character
>therefore '@' should be encoded (likewise for ':', etc.).
>
>I can see why '@' is reserved in the authority component
>of a URI (mailto URIs of course have no authority
>component), but I do not see why it would be considered
>reserved in the path component (and indeed in RFC 2396,
>which separately specifies which characters are reserved in
>individual components, it is NOT reserved in the path
>component). [Note that the path component of a mailto
>URI is precisely the "to" referred to in the above
>quotation from RFC 2368.]
>
>I hope it is clear that because of the fact that the URI syntax
>and mailto URI document specifications are intertwined, any
>significant change to the URI syntax definition or composition
>of the set of reserved characters changes how the mailto URI
>rules are interpreted unless there is a corresponding change
>to the mailto URI specification [and RFC 2368 has remained
>unchanged for more than six years, no errata published on
>the RFC Editor errata page; an earlier attempt to address
>some of these issues resulted in email to two of the three
>RFC 2368 authors (including the "other" Larry Masinter :-))
>bouncing and a response from the third that he is no
>longer interested in mailto URIs].
>
>Even if you personally believe that '@' should not be encoded,
>surely you can see how reasonable people might interpret
>RFC 2368 and the draft under discussion (were it to be
>approved as is as a replacement for RFC 2396) that way;
>simply look at mailto URIs in the wild, such as the ones from
>this very mailing list's archive!

------------
Graham Klyne
For email:
http://www.ninebynine.org/#Contact
Received on Friday, 5 November 2004 18:22:50 UTC