
Re: When is percent-encoding required.

From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Date: Thu, 07 Jan 2010 10:44:42 +0900
Message-ID: <4B453C8A.6020002@it.aoyama.ac.jp>
To: Charles Lindsey <chl@clerew.man.ac.uk>, "uri@w3.org" <uri@w3.org>

Hello Charles,

On 2010/01/05 19:33, Charles Lindsey wrote:
> On Tue, 05 Jan 2010 07:12:37 -0000, Martin J. Dürst
> <duerst@it.aoyama.ac.jp> wrote:
>
>> Hello Charles,
>>
>> Bob and Joseph have said most that needs to be said. In summary, I
>> don't think there's anything wrong with respect to escaping in Frank's
>> draft. If you can point to anything specific, I'll have another look
>> at it. (see below for details)
>>
>> On 2010/01/05 3:12, Charles Lindsey wrote:
>
>>> is it REQUIRED for the <sub-delims> if the particular scheme does not
>>> use any of them as delimiters? RFC 3986 seems to imply not, so I would
>>> expect that in
>>> news:foo@bar.!#$%&'*+/=?^`{|}.example
>>> (yes, "bar.!#$%&'*+/=?^`{|}.example" is a valid <dot-atom-text> and
>>> hence can occur in a Message-ID) I would have to percent-encode the '#'.
>>> '/' and '?', but not the others.
>>
>> Sorry, but you also have to encode characters that are not allowed in
>> URIs at all, i.e. '{', '}', '`', "'", "^", and "|". Bob mentioned
>> these, but wasn't very definitive and didn't give a reason. And of
>> course, as Bob mentioned, "%" has to be escaped.
>
> Ah! I had not spotted that there existed characters that were neither
> <reserved> nor <unreserved>. Does RFC 3986 have anything to say about
> them, or does silence imply that encoding is needed?

The latter. This is different in RFC 2396; see
http://tools.ietf.org/html/rfc2396#section-2.4.3.

> Anyway, it seems the list of things needing percent encoding in that
> example has now expanded to at least "#%'/?^`{|}", leaving just "!$&*+="
> (which are all <sub-delims>) to discuss. Joseph Anthony seems to be
> saying that these need NOT be encoded. Do you agree?

Yes.
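
To make the tally concrete, here is a small Python sketch (not taken from any draft; the names UNRESERVED, SUB_DELIMS, GEN_DELIMS and classify are made up for illustration) that sorts the characters of the example by the RFC 3986 character classes. One caveat: RFC 3986 lists "'" among the <sub-delims>, so the sketch groups it with "!$&*+=" rather than with the characters that must be encoded.

```python
import string

# Character classes from RFC 3986, sections 2.2 and 2.3.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
SUB_DELIMS = set("!$&'()*+,;=")
GEN_DELIMS = set(":/?#[]@")


def classify(text):
    """Group the distinct non-<unreserved> characters of text by class."""
    groups = {"sub-delims": "", "gen-delims": "", "other": ""}
    for ch in text:
        if ch in UNRESERVED:
            continue  # never needs encoding
        if ch in SUB_DELIMS:
            key = "sub-delims"
        elif ch in GEN_DELIMS:
            key = "gen-delims"
        else:
            key = "other"  # neither <reserved> nor <unreserved>
        if ch not in groups[key]:
            groups[key] += ch
    return groups


print(classify("bar.!#$%&'*+/=?^`{|}.example"))
# {'sub-delims': "!$&'*+=", 'gen-delims': '#/?', 'other': '%^`{|}'}
```

The "other" group is neither <reserved> nor <unreserved> and has to be encoded whenever it is meant literally; the <gen-delims> found here are the '#', '/' and '?' already identified above.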

>>> Frank seems to have taken the view that
>>> all <sub-delims> need to be encoded, though he does at one point permit
>>> '*' to appear unencoded (and it was indeed explicitly allowed in RFC
>>> 1738), which appears to be inconsistent with his stance elsewhere
>
> And indeed it is '*' which would be a real pain if it had to be encoded,
> since it is so much used in wildmats.
>>
>> I have difficulties understanding:
>>
>> Characters not directly allowed in this part of an
>> [RFC3986] URI have to be percent-encoded, minimally anything that is
>> not <unreserved>, no ":" (colon), and doesn't belong to the
>> <sub-delims>.
>
> Yes, that is the paragraph I was concerned with, since the mention of
> colon needs to be removed because it can no longer occur in a
> message-id. So I have to rewrite it anyway, and there are just too many
> double negatives in it at present for it to be comprehensible.
>>
>> I think this may be slightly better:
>>
>> Characters not directly allowed in this part of an
>> [RFC3986] URI have to be percent-encoded. This at a minimum includes
>> anything that is not <unreserved>, is not a ":" (colon), and does
>> not belong to the <sub-delims>.
>
> OK, that looks like a better basis to start from, but still has too many
> 'not's in it for my taste.

For me, too.
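
Stated positively, the rule might be sketched as follows in Python. This is only an illustration under assumptions: encode_part and news_uri are hypothetical names, the "@" separating the two halves of the message-id is assumed to stay literal, and everything outside <unreserved> and <sub-delims> is percent-encoded.

```python
import string
from urllib.parse import unquote

# Characters allowed to appear literally: <unreserved> plus <sub-delims>
# (RFC 3986, sections 2.2 and 2.3).
SAFE = set(string.ascii_letters + string.digits + "-._~" + "!$&'()*+,;=")


def encode_part(text):
    """Percent-encode every octet of text that is not in SAFE."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        out.append(ch if ch in SAFE else "%{:02X}".format(byte))
    return "".join(out)


def news_uri(msgid):
    """Build a news: URI from a message-id given without angle brackets."""
    # Assumption: the last "@" is the separator and is kept as is.
    local, sep, domain = msgid.rpartition("@")
    return "news:" + encode_part(local) + sep + encode_part(domain)


print(news_uri("foo@bar.!#$%&'*+/=?^`{|}.example"))
# news:foo@bar.!%23$%25&'*+%2F=%3F%5E%60%7B%7C%7D.example
print(news_uri("p0624081dc30b8699bf9b@[10.20.30.108]"))
# news:p0624081dc30b8699bf9b@%5B10.20.30.108%5D

# Decoding gives back the original text, whatever was encoded.
print(unquote(encode_part("a*b{c}")) == "a*b{c}")
# True
```

The first result uses exactly the nine escapes listed in the old <mid-atext> rule, and '*' stays unencoded.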

> But it is then clear that the <sub-delims>
> are exempt, so that '*' is safe. Would it also be in order to say that
> it is always in order to percent-encode ANYTHING (even ALPHAs) if you
> feel like being awkward, in which case the meaning is always the same as
> if they were decoded before interpretation? I might even include an
> exhaustive list of all the ones where encoding was REQUIRED. Note that
> the totality of allowed characters in a message-id is now just the
> <atext>s from RFC 5322, plus ".[]".
>
>> Also, looking e.g. at
>>
>>    mid-atext = ALPHA / DIGIT /           ; RFC 2822 <atext>
>>                "!" / "$" / "&" / "'" /   ; allowed sub-delims
>>                "*" / "+" / "=" /         ; allowed sub-delims
>>                "-" / "_" / "~" /         ; allowed unreserved
>>                "%23" / "%25" / "%2F" /   ; "#" / "%" / "/"
>>                "%3F" / "%5E" / "%60" /   ; "?" / "^" / "`"
>>                "%7B" / "%7C" / "%7D"     ; "{" / "|" / "}"
>
> Fortunately, all those <mid-*> rules are now gone, which makes life
> considerably simpler. Message-ids in news are now identical to those in
> RFC 5322, except that '>' is still forbidden.
>
>>> And he also includes an example
>>> news://news.gmane.org/p0624081dc30b8699bf9b@%5B10.20.30.108%5D
>>> where I would have thought he could have shown
>>> news://news.gmane.org/p0624081dc30b8699bf9b@[10.20.30.108]
>>
>> According to RFC 3986, '[' and ']' are only allowed for IPv6
>> addresses, i.e. inside <authority>.
>
> Ah! I had not spotted that they were <gen-delims>.
>
> So thanks for your help. I think I can probably rewrite that paragraph
> now, and then the job is done.

Yes, great.

Regards,   Martin.


-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp
Received on Thursday, 7 January 2010 01:45:22 UTC
