RE: Comments on draft-fielding-uri-rfc2396bis-07 from Bruce Lilly on 2004-11-05 (uri@w3.org from November 2004)

From: Bruce Lilly <blilly@erols.com>
Date: Fri, 5 Nov 2004 09:21:34 -0500
To: uri@w3.org
Cc: Larry Masinter <LMM@acm.org>
Message-Id: <200411050921.34932.blilly@erols.com>

On Thu November 4 2004 13:24, Larry Masinter wrote:

> To be pedantically accurate, I could imagine
>  'should percent-encode' => 'should otherwise percent-encode'

That alone wouldn't help, in part because the "otherwise" is
vague, in part because the mailto specification requires
(i.e. "must" rather than merely "should") that reserved
characters be encoded (see below for the exact quotation).
 
> but I'm not convinced it is necessary, since a reading that
> ALL reserved characters should be encoded no matter what
> would lead you to encode every delimiter everywhere -- which
> would be nonsensical.

Would it?  If one goes to your message at the W3 uri mailing
list archive,
http://lists.w3.org/Archives/Public/uri/2004Nov/0020.html
one can see that the "Respond" link specifies a mailto URI:

 [ <a href="mailto:uri%40w3.org?Subject=RE%3A%20Comments%20on%20draft-fielding-uri-rfc2396bis-07&amp;In-Reply-To=&lt;0I6O00MU8359UQ%40mailsj-v1.corp.adobe.com&gt;&amp;References=&lt;0I6O00MU8359UQ%40mailsj-v1.corp.adobe.com&gt;" accesskey="r" title="respond to this message">Respond</a> ]

Apparently the W3 archive implementor(s) also believe that
'@' should be encoded as "%40"; likewise for ':' ("%3A").
The only other gen-delim that appears unencoded is '?', which
is in fact used for its reserved URI purpose, viz. as a delimiter
introducing the query component.  [I haven't discussed the
HTML obfuscation of '&', '<', and '>', which is yet another can
of worms; '<' and '>' aren't merely "reserved", they're
"excluded" in RFC 2396 terminology (and curiously barely
mentioned in the draft under discussion); unlike the W3
archive implementors, I believe that '<' and '>' should have
been encoded as "%3C" and "%3E", obviating HTML
obfuscation for the raw characters -- but for the moment,
let's concentrate on '@'.]
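
As a rough illustration of what that link carries, the following Python sketch undoes the two layers in turn: first the HTML entity escaping, then the URI percent-encoding. The variable names are illustrative only, and using the general-purpose `urllib.parse` helpers on a mailto URI is an assumption of this sketch, not something RFC 2368 prescribes.

```python
import html
from urllib.parse import urlsplit, unquote, parse_qsl

# The href attribute value exactly as it appears in the archive page source.
href = (
    "mailto:uri%40w3.org?Subject=RE%3A%20Comments%20on%20"
    "draft-fielding-uri-rfc2396bis-07"
    "&amp;In-Reply-To=&lt;0I6O00MU8359UQ%40mailsj-v1.corp.adobe.com&gt;"
    "&amp;References=&lt;0I6O00MU8359UQ%40mailsj-v1.corp.adobe.com&gt;"
)

# Layer 1: undo the HTML escaping of '&', '<' and '>'.
uri = html.unescape(href)

# Layer 2: split at the '?' delimiter, then undo the percent-encoding.
parts = urlsplit(uri)
to = unquote(parts.path)                 # the "to" part of the mailto URI
headers = dict(parse_qsl(parts.query))   # header fields from the query

print(parts.scheme)
print(to)
print(headers["Subject"])
print(headers["In-Reply-To"])
```

The '@' of the address and the ':' of the subject come back only after percent-decoding, while '<' and '>' come back from the HTML layer alone.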

Clearly there is interaction between the mailto URI specification
(RFC 2368) and the generic URI syntax: RFC 2368 references
RFC 1738, which has been obsoleted by RFC 2396, which in turn
is intended to be obsoleted by the draft under discussion.  Now
this probably isn't the place
to go into great detail about RFC 2368, but it does say

   Note that all URL
   reserved characters in "to" must be encoded

So simply defining '@' as a UR{L,I} "reserved character" is
sufficient to require encoding, at least in the specified mailto
URI portion.  It doesn't say "some URL reserved characters"
or "all except '@'", etc.  Because the mailto rules are
specified in one document referring to "reserved characters"
which are defined in a different document, any change to
the definition or the set of reserved characters has an
effect on interpretation of the rules.  Evidently (from the
URI above and others which can be seen across the
Internet), I am not alone in concluding that given:
1. reserved characters must be encoded
and
2. '@' is a reserved character
therefore '@' must be encoded (likewise for ':', etc.).
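
The inference above can be sketched in a few lines of Python. The helper names `encode_to_strict` and `encode_to_lenient` are hypothetical, and using `urllib.parse.quote` to stand in for "encode every reserved character" is an assumption of this sketch: `quote` with an empty `safe` set leaves only letters, digits, and `_.-~` unencoded.

```python
from urllib.parse import quote

def encode_to_strict(addr: str) -> str:
    # Reading 1: every reserved character in "to" is encoded,
    # so '@' (being reserved) becomes "%40".
    return "mailto:" + quote(addr, safe="")

def encode_to_lenient(addr: str) -> str:
    # Reading 2 (hypothetical): '@' is exempted from encoding in "to".
    return "mailto:" + quote(addr, safe="@")

print(encode_to_strict("uri@w3.org"))
print(encode_to_lenient("uri@w3.org"))
```

The strict reading reproduces the form used by the archive's "Respond" link; the lenient reading produces the form with a raw '@'.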

I can see why '@' is reserved in the authority component
of a URI (mailto URIs of course have no authority
component), but I do not see why it would be considered
reserved in the path component (and indeed in RFC 2396,
which separately specifies which characters are reserved in
individual components, it is NOT reserved in the path
component). [Note that the path component of a mailto
URI is precisely the "to" referred to in the above
quotation from RFC 2368.]
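
A minimal sketch of that per-component view follows. The character sets are transcribed from a reading of RFC 2396 sections 3.2 through 3.4 and should be checked against the RFC text before being relied on; the names `RESERVED_BY_COMPONENT` and `is_reserved` are hypothetical.

```python
# Reserved characters per URI component, as read from RFC 2396
# sections 3.2 (authority), 3.3 (path segment) and 3.4 (query).
# Assumption: transcribed by hand, verify against the RFC.
RESERVED_BY_COMPONENT = {
    "authority": set(";:@?/"),
    "path": set("/;=?"),
    "query": set(";/?:@&=+,$"),
}

def is_reserved(ch: str, component: str) -> bool:
    """Return True if ch is reserved within the named component."""
    return ch in RESERVED_BY_COMPONENT[component]

for component in ("authority", "path", "query"):
    print(component, is_reserved("@", component))
```

Under this table '@' is reserved in the authority and query components but not in a path segment, which is where the "to" part of a mailto URI sits.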

I hope it is clear that, because the URI syntax and mailto URI
specifications are intertwined, any
significant change to the URI syntax definition or composition
of the set of reserved characters changes how the mailto URI
rules are interpreted unless there is a corresponding change
to the mailto URI specification [and RFC 2368 has remained
unchanged for more than six years, no errata published on
the RFC Editor errata page; an earlier attempt to address
some of these issues resulted in email to two of the three
RFC 2368 authors (including the "other" Larry Masinter :-))
bouncing and a response from the third that he is no
longer interested in mailto URIs].

Even if you personally believe that '@' should not be encoded,
surely you can see how reasonable people might interpret
RFC 2368 and the draft under discussion (were it to be
approved as is as a replacement for RFC 2396) as requiring
that it be encoded;
simply look at mailto URIs in the wild, such as the ones from
this very mailing list's archive!

Received on Friday, 5 November 2004 16:07:03 UTC