
Re: When is percent-encoding required.

From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Date: Tue, 05 Jan 2010 16:12:37 +0900
Message-ID: <4B42E665.60506@it.aoyama.ac.jp>
To: Charles Lindsey <chl@clerew.man.ac.uk>
CC: URI <uri@w3.org>
Hello Charles,

Bob and Joseph have said most of what needs to be said. In summary, I 
don't think there's anything wrong with the escaping in Frank's draft. 
If you can point to anything specific, I'll have another look at it. 
(See below for details.)

On 2010/01/05 3:12, Charles Lindsey wrote:
> Draft-ellermann-news-nntp-uri-11.txt is currently going through AUTH48
> and, since Frank Ellermann seems not to have been heard from for more
> than a year, and cannot be contacted, I am getting the job of seeing
> what needs to be done (most notably changes necessitated by the AUTH48
> changes in RFC 5536).
>
> I find the question of just what needs to be percent-encoded is hard to
> deduce from RFC 3986. Clearly, anything in <gen-delims> MUST be
> percent-encoded except when used as delimiters, so that agents can
> divide a URI into scheme, authority, path, query, and fragment
> components even before they recognise that it is a news or nntp URI.

Yes indeed.


> But
> is it REQUIRED for the <sub-delims> if the particular scheme does not
> use any of them as delimiters? RFC 3986 seems to imply not, so I would
> expect that in
> news:foo@bar.!#$%&'*+/=?^`{|}.example
> (yes, "bar.!#$%&'*+/=?^`{|}.example" is a valid <dot-atom-text> and
> hence can occur in a Message-ID) I would have to percent-encode the '#'.
> '/' and '?', but not the others.

Sorry, but you also have to encode characters that are not allowed in 
URIs at all, i.e. '{', '}', '`', '^', and '|'. (The apostrophe is in 
<sub-delims>, so it can stay as it is.) Bob mentioned these, but wasn't 
very definitive and didn't give a reason. And of course, as Bob 
mentioned, '%' has to be escaped.
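
To make the distinction concrete, here is a minimal sketch that sorts 
the punctuation from the example Message-ID into the RFC 3986 character 
classes. The function name classify and the class labels are made up 
for illustration; the character sets are the ones from RFC 3986, 
section 2.

```python
import string

# Character classes from RFC 3986, section 2.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
SUB_DELIMS = set("!$&'()*+,;=")
GEN_DELIMS = set(":/?#[]@")


def classify(ch):
    """Return a label for the RFC 3986 class of one ASCII character."""
    if ch in UNRESERVED:
        return "unreserved"
    if ch in SUB_DELIMS:
        return "sub-delims"
    if ch in GEN_DELIMS:
        return "gen-delims"
    if ch == "%":
        return "percent"
    return "not allowed in URIs"


# The punctuation from the example right-hand side of the Message-ID.
sample = "!#$%&'*+/=?^`{|}"

by_class = {}
for ch in sample:
    by_class.setdefault(classify(ch), []).append(ch)

for label, chars in by_class.items():
    print(label, " ".join(chars))
```

Everything that lands in the last bucket has to be percent-encoded 
wherever it appears, independent of what the scheme uses as delimiters.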

> Frank seems to have taken the view that
> all <sub-delims> need to be encoded, though he does at one point permit
> '*' to appear unencoded (and it was indeed explicitly allowed in RFC
> 1738), which appears to be inconsistent with his stance elsewhere.

I have difficulty understanding this sentence from the draft:

    Characters not directly allowed in this part of an
    [RFC3986] URI have to be percent-encoded, minimally anything that is
    not <unreserved>, no ":" (colon), and doesn't belong to the
    <sub-delims>.

I think this may be slightly better:

    Characters not directly allowed in this part of an
    [RFC3986] URI have to be percent-encoded. This at a minimum includes
    anything that is not <unreserved>, is not a ":" (colon), and does
    not belong to the <sub-delims>.

So I think Frank is saying here that if it's a sub-delimiter, you don't 
have to escape it. Maybe you can find an even better way to deal with 
the double negative (i.e. ideally get rid of it).

Also, looking e.g. at

      mid-atext       = ALPHA / DIGIT /              ; RFC 2822 <atext>
                        "!" / "$" / "&" / "'" /      ; allowed sub-delims
                        "*" / "+" / "=" /            ; allowed sub-delims
                        "-" / "_" / "~" /            ; allowed unreserved
                        "%23" / "%25" / "%2F" /      ; "#" / "%" / "/"
                        "%3F" / "%5E" / "%60" /      ; "?" / "^" / "`"
                        "%7B" / "%7C" / "%7D"        ; "{" / "|" / "}"

it seems to me that Frank got everything right according to RFC 3986. I 
haven't checked all the other mid-* productions, but it looks like he 
got them mainly right. (Frank also helped a lot with escaping issues for 
draft-duerst-mailto-bis 
(http://tools.ietf.org/html/draft-duerst-mailto-bis-07), and I always 
got the impression that he understood this stuff.)
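
As a rough check of the mid-atext production quoted above, the ABNF can 
be transcribed into a regular expression. This is only a sketch: the 
helper name is_mid_atext_run is hypothetical, it covers mid-atext alone 
and none of the other mid-* productions, and it matches the 
percent-encoded triplets case-insensitively on the assumption that ABNF 
string literals are case-insensitive.

```python
import re

# One mid-atext element: a literally allowed character, or one of the
# nine percent-encoded triplets listed in the production.
MID_ATEXT = (
    r"(?:[A-Za-z0-9!$&'*+=\-_~]"
    r"|%23|%25|%2F|%3F|%5E|%60|%7B|%7C|%7D)"
)

# A non-empty run of mid-atext elements, anchored by fullmatch below.
MID_ATEXT_RUN = re.compile(MID_ATEXT + "+", re.IGNORECASE)


def is_mid_atext_run(text):
    """True if text consists solely of one or more mid-atext elements."""
    return MID_ATEXT_RUN.fullmatch(text) is not None


print(is_mid_atext_run("!%23$%25&'*+%2F=%3F%5E%60%7B%7C%7D"))
print(is_mid_atext_run("a#b"))
print(is_mid_atext_run("a{b"))
```

A raw '#', '/', '?', '^', '`', '{', '|' or '}' is rejected, as is a bare 
'%' that does not start one of the listed triplets.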


> And he also includes an example
> news://news.gmane.org/p0624081dc30b8699bf9b@%5B10.20.30.108%5D
> where I would have thought he could have shown
> news://news.gmane.org/p0624081dc30b8699bf9b@[10.20.30.108]

According to RFC 3986, '[' and ']' are only allowed around IP literals 
such as IPv6 addresses, i.e. inside <authority>. Frank is therefore 
correct here. (These two are in <gen-delims>.)
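
For what it's worth, a generic encoder that leaves only the characters 
of the RFC 3986 <pchar> production unescaped produces the same result 
as Frank's example. The sketch below uses urllib.parse.quote from the 
Python standard library; the names PCHAR_SAFE and encode_path_segment 
are made up here, and it assumes the Message-ID is placed in a single 
path segment.

```python
from urllib.parse import quote

# quote() never escapes unreserved characters (Python 3.7 or later);
# these are the extra ones allowed by <pchar>: <sub-delims>, ":", "@".
PCHAR_SAFE = "!$&'()*+,;=:@"


def encode_path_segment(text):
    """Percent-encode everything that may not appear raw in a segment."""
    return quote(text, safe=PCHAR_SAFE)


print(encode_path_segment("p0624081dc30b8699bf9b@[10.20.30.108]"))
print(encode_path_segment("foo@bar.!#$%&'*+/=?^`{|}.example"))
```

The brackets come out as %5B and %5D, and in the second string only 
'#', '%', '/', '?', '^', '`', '{', '|' and '}' are escaped.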

Regards,   Martin.


> So exactly what latitude does RFC 3986 permit in these situations?
>

-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp
Received on Tuesday, 5 January 2010 07:13:23 UTC
