- From: Bruce Lilly <blilly@erols.com>
- Date: Fri, 5 Nov 2004 09:21:34 -0500
- To: uri@w3.org
- Cc: Larry Masinter <LMM@acm.org>
On Thu November 4 2004 13:24, Larry Masinter wrote:
> To be pedantically accurate, I could imagine
> 'should percent-encode' => 'should otherwise percent-encode'

That alone wouldn't help, in part because the "otherwise" is vague, and in part because the mailto specification requires (i.e. "must" rather than merely "should") that reserved characters be encoded (see below for the exact quotation).

> but I'm not convinced it is necessary, since a reading that
> ALL reserved characters should be encoded no matter what
> would lead you to encode every delimiter everywhere -- which
> would be nonsensical.

Would it? If one goes to your message at the W3 uri mailing list archive,

http://lists.w3.org/Archives/Public/uri/2004Nov/0020.html

one can see that the "Respond" link specifies a mailto URI:

[ <a href="mailto:uri%40w3.org?Subject=RE%3A%20Comments%20on%20draft-fielding-uri-rfc2396bis-07&In-Reply-To=<0I6O00MU8359UQ%40mailsj-v1.corp.adobe.com>&References=<0I6O00MU8359UQ%40mailsj-v1.corp.adobe.com>" accesskey="r" title="respond to this message">Respond</a> ]

Apparently the W3 archive implementor(s) also believe that '@' should be encoded as "%40"; likewise for ':' ("%3A"). The only other gen-delim that appears unencoded is '?', which is in fact used for its reserved URI purpose, viz. as the delimiter introducing the query component.

[I haven't discussed the HTML obfuscation of '&', '<', and '>', which is yet another can of worms. '<' and '>' aren't merely "reserved"; they're "excluded" in RFC 2396 terminology (and, curiously, barely mentioned in the draft under discussion). Unlike the W3 archive implementors, I believe that '<' and '>' should have been encoded as "%3C" and "%3E", obviating HTML obfuscation for the raw characters. But for the moment, let's concentrate on '@'.]
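To show what the archive's link actually carries, here is a minimal Python sketch (illustrative only; `decode_mailto` is a hypothetical helper, not part of any specification or of the archive software) that splits the "Respond" URI quoted above and percent-decodes its parts. The encoded '@' ("%40") and ':' ("%3A") round-trip back to the plain address and Subject. One assumption to flag: `urllib.parse.parse_qsl` applies HTML form rules ('+' becomes a space), which RFC 2368 does not prescribe; that is harmless here only because the link contains no '+'.

```python
from urllib.parse import urlsplit, unquote, parse_qsl

# The href of the archive's "Respond" link, as quoted in the message
# (the HTML entity escaping of '&', '<', '>' is already undone).
RESPOND = (
    "mailto:uri%40w3.org"
    "?Subject=RE%3A%20Comments%20on%20draft-fielding-uri-rfc2396bis-07"
    "&In-Reply-To=<0I6O00MU8359UQ%40mailsj-v1.corp.adobe.com>"
    "&References=<0I6O00MU8359UQ%40mailsj-v1.corp.adobe.com>"
)


def decode_mailto(uri: str) -> tuple[str, dict[str, str]]:
    """Hypothetical helper: return the decoded "to" part and header fields."""
    parts = urlsplit(uri)
    if parts.scheme != "mailto":
        raise ValueError("not a mailto URI")
    # A mailto URI has no authority, so the "to" part is the path component.
    to = unquote(parts.path)
    # Split the query on '&' and '=' and percent-decode each name and value
    # (form-style decoding: '+' would also become a space).
    headers = dict(parse_qsl(parts.query))
    return to, headers


to, headers = decode_mailto(RESPOND)
print(to)                      # uri@w3.org
print(headers["Subject"])      # RE: Comments on draft-fielding-uri-rfc2396bis-07
print(headers["In-Reply-To"])  # <0I6O00MU8359UQ@mailsj-v1.corp.adobe.com>
```

Decoding is unambiguous either way; the dispute is only over whether the encoder was *required* to turn '@' into "%40" in the first place.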
Clearly there is interaction between the mailto URI specification (RFC 2368), which references RFC 1738, which has been updated by RFC 2396, which in turn is intended to be obsoleted by the draft under discussion. This probably isn't the place to go into great detail about RFC 2368, but it does say:

   Note that all URL reserved characters in "to" must be encoded

So simply defining '@' as a UR{L,I} "reserved character" is sufficient to require encoding, at least in the "to" portion of a mailto URI. It doesn't say "some URL reserved characters" or "all except '@'", etc. Because the mailto rules are specified in one document referring to "reserved characters" that are defined in a different document, any change to the definition or to the set of reserved characters affects how those rules are interpreted. Evidently (from the URI above and others that can be seen across the Internet), I am not alone in concluding that, given:

1. reserved characters must be encoded, and
2. '@' is a reserved character,

therefore '@' should be encoded (likewise for ':', etc.).

I can see why '@' is reserved in the authority component of a URI (mailto URIs, of course, have no authority component), but I do not see why it would be considered reserved in the path component. Indeed, RFC 2396, which separately specifies which characters are reserved in individual components, does NOT reserve it in the path component. [Note that the path component of a mailto URI is precisely the "to" referred to in the above quotation from RFC 2368.]
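To make the competing readings concrete, here is a small Python sketch (illustrative only; `encode_to` and the set names are hypothetical, and the sets are transcribed from RFC 2396 sections 2.2 and 3.3 and from the gen-delims/sub-delims of the draft as later published in RFC 3986). It applies RFC 2368's rule "all URL reserved characters in 'to' must be encoded" literally, once per candidate definition of "reserved". It assumes the address is plain ASCII and needs no other escaping.

```python
# Candidate definitions of "reserved", as written in the respective documents.
RFC2396_RESERVED = set(";/?:@&=+$,")         # RFC 2396, section 2.2 (generic)
RFC2396_PATH_SEGMENT_RESERVED = set("/;=?")  # RFC 2396, section 3.3 (path only)
DRAFT_GEN_DELIMS = set(":/?#[]@")            # draft / RFC 3986 gen-delims
DRAFT_SUB_DELIMS = set("!$&'()*+,;=")        # draft / RFC 3986 sub-delims


def encode_to(address: str, reserved: set[str]) -> str:
    """Hypothetical helper: percent-encode each character found in `reserved`.

    All other characters are copied unchanged (ASCII input is assumed).
    """
    return "".join(f"%{ord(c):02X}" if c in reserved else c for c in address)


addr = "uri@w3.org"

# Literal reading with the generic RFC 2396 set: '@' must be encoded.
generic = encode_to(addr, RFC2396_RESERVED)                     # "uri%40w3.org"
# Component-specific RFC 2396 reading: '@' is not reserved in a path.
path_only = encode_to(addr, RFC2396_PATH_SEGMENT_RESERVED)      # "uri@w3.org"
# Literal reading with the draft's single reserved set: '@' must be encoded.
draft = encode_to(addr, DRAFT_GEN_DELIMS | DRAFT_SUB_DELIMS)    # "uri%40w3.org"

print(generic, path_only, draft)
```

Only the component-specific set leaves '@' alone; once "reserved" is a single set with no per-component exceptions, the literal RFC 2368 rule produces exactly what the archive emits.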
I hope it is clear that, because the URI syntax and mailto URI specifications are intertwined, any significant change to the URI syntax definition or to the composition of the set of reserved characters changes how the mailto URI rules are interpreted, unless there is a corresponding change to the mailto URI specification. [RFC 2368 has remained unchanged for more than six years, with no errata published on the RFC Editor errata page; an earlier attempt to address some of these issues resulted in email to two of the three RFC 2368 authors (including the "other" Larry Masinter :-)) bouncing, and a response from the third that he is no longer interested in mailto URIs.] Even if you personally believe that '@' should not be encoded, surely you can see how reasonable people might interpret RFC 2368 and the draft under discussion (were it to be approved as-is as a replacement for RFC 2396) that way; simply look at mailto URIs in the wild, such as the ones from this very mailing list's archive!
Received on Friday, 5 November 2004 16:07:03 UTC