- From: Charles Lindsey <chl@clerew.man.ac.uk>
- Date: Wed, 06 Jan 2010 17:33:48 -0000
- To: URI <uri@w3.org>
On Tue, 05 Jan 2010 07:12:37 -0000, Martin J. Dürst <duerst@it.aoyama.ac.jp> wrote:

> Hello Charles,
>
> Bob and Joseph have said most that needs to be said. In summary, I don't
> think there's anything wrong with respect to escaping in Frank's draft.
> If you can point to anything specific, I'll have another look at it.
> (see below for details)
>
> On 2010/01/05 3:12, Charles Lindsey wrote:
>> is it REQUIRED for the <sub-delims> if the particular scheme does not
>> use any of them as delimiters? RFC 3986 seems to imply not, so I would
>> expect that in
>> news:foo@bar.!#$%&'*+/=?^`{|}.example
>> (yes, "bar.!#$%&'*+/=?^`{|}.example" is a valid <dot-atom-text> and
>> hence can occur in a Message-ID) I would have to percent-encode the '#',
>> '/' and '?', but not the others.
>
> Sorry, but you also have to encode characters that are not allowed in
> URIs at all, i.e. '{', '}', '`', "'", "^", and "|". Bob mentioned these,
> but wasn't very definitive and didn't give a reason. And of course, as
> Bob mentioned, "%" has to be escaped.

Ah! I had not spotted that there existed characters that were neither <reserved> nor <unreserved>. Does RFC 3986 have anything to say about them, or does silence imply that encoding is needed?

Anyway, it seems the list of things needing percent-encoding in that example has now expanded to at least "#%'/?^`{|}", leaving just "!$&*+=" (which are all <sub-delims>) to discuss. Joseph Anthony seems to be saying that these need NOT be encoded. Do you agree?

>> Frank seems to have taken the view that
>> all <sub-delims> need to be encoded, though he does at one point permit
>> '*' to appear unencoded (and it was indeed explicitly allowed in RFC
>> 1738), which appears to be inconsistent with his stance elsewhere

And indeed it is '*' which would be a real pain if it had to be encoded, since it is so much used in wildmats.
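The classification being discussed for the example domain can be checked mechanically. The following is only a rough Python sketch, not anything taken from Frank's draft: the names `must_encode` and `encode_msgid_part` are invented for illustration, and the rule it assumes is "leave <unreserved> and <sub-delims> alone, percent-encode everything else", applied to the part of the message-id on one side of the "@". Note that RFC 3986 lists "'" among the <sub-delims>, so this sketch leaves it unencoded, unlike the list given in the reply above.

```python
import string

# RFC 3986 section 2.3: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
# RFC 3986 section 2.2: sub-delims
SUB_DELIMS = set("!$&'()*+,;=")


def must_encode(text):
    """Return the distinct characters of text that are neither
    <unreserved> nor <sub-delims>, in order of first appearance."""
    found = []
    for ch in text:
        if ch not in UNRESERVED and ch not in SUB_DELIMS and ch not in found:
            found.append(ch)
    return "".join(found)


def encode_msgid_part(text):
    """Percent-encode every character that is neither <unreserved>
    nor <sub-delims> (assumed rule, see the lead-in)."""
    out = []
    for ch in text:
        if ch in UNRESERVED or ch in SUB_DELIMS:
            out.append(ch)
        else:
            # Encode each UTF-8 octet as %XX with upper-case hex digits.
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
    return "".join(out)


domain = "bar.!#$%&'*+/=?^`{|}.example"
print(must_encode(domain))        # characters that need encoding
print(encode_msgid_part(domain))  # the domain as it would appear in the URI
```

Under that assumed rule the characters flagged are "#%/?^`{|}", and '*' passes through untouched, which is the outcome wanted for wildmats.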
> I have difficulties understanding:
>
>    Characters not directly allowed in this part of an
>    [RFC3986] URI have to be percent-encoded, minimally anything that is
>    not <unreserved>, no ":" (colon), and doesn't belong to the
>    <sub-delims>.

Yes, that is the paragraph I was concerned with, since the mention of colon needs to be removed because it can no longer occur in a message-id. So I have to rewrite it anyway, and there are just too many double negatives in it at present for it to be comprehensible.

> I think this may be slightly better:
>
>    Characters not directly allowed in this part of an
>    [RFC3986] URI have to be percent-encoded. This at a minimum includes
>    anything that is not <unreserved>, is not a ":" (colon), and does
>    not belong to the <sub-delims>.

OK, that looks like a better basis to start from, but still has too many 'not's in it for my taste. But it is then clear that the <sub-delims> are exempt, so that '*' is safe.

Would it also be in order to say that one may always percent-encode ANYTHING (even ALPHAs) if you feel like being awkward, in which case the meaning is always the same as if they were decoded before interpretation?

I might even include an exhaustive list of all the ones where encoding was REQUIRED. Note that the totality of allowed characters in a message-id is now just the <atext>s from RFC 5322, plus ".[]".

> Also, looking e.g. at
>
>    mid-atext = ALPHA / DIGIT /            ; RFC 2822 <atext>
>                "!" / "$" / "&" / "'" /    ; allowed sub-delims
>                "*" / "+" / "=" /          ; allowed sub-delims
>                "-" / "_" / "~" /          ; allowed unreserved
>                "%23" / "%25" / "%2F" /    ; "#" / "%" / "/"
>                "%3F" / "%5E" / "%60" /    ; "?" / "^" / "`"
>                "%7B" / "%7C" / "%7D"      ; "{" / "|" / "}"

Fortunately, all those <mid-*> rules are now gone, which makes life considerably simpler. Message-ids in news are now identical to those in RFC 5322, except that '>' is still forbidden.
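As a rough illustration of the two points above (an exhaustive REQUIRED list, and encoding more than is required), here is a small Python sketch. It assumes the character repertoire stated above, namely the RFC 5322 <atext> characters plus ".[]", and the same "not <unreserved>, not <sub-delims>" rule; `encode_everything` is a made-up helper, and `unquote` is the standard-library percent-decoder.

```python
import string
from urllib.parse import unquote

# RFC 5322 <atext>, plus the ".[]" mentioned above (assumed repertoire).
ATEXT = string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~"
MSGID_CHARS = ATEXT + ".[]"

# RFC 3986 sections 2.3 and 2.2.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
SUB_DELIMS = set("!$&'()*+,;=")

# Candidate exhaustive list: allowed in a message-id, but neither
# <unreserved> nor <sub-delims>, so encoding would be REQUIRED.
required = "".join(
    c for c in MSGID_CHARS if c not in UNRESERVED and c not in SUB_DELIMS
)
print(required)


def encode_everything(text):
    """Percent-encode every character, even ALPHAs (the 'awkward' case)."""
    return "".join("%{:02X}".format(b) for b in text.encode("ascii"))


plain = "foo"
awkward = encode_everything(plain)
print(awkward)
# Decoding before interpretation recovers the same string.
print(unquote(awkward) == plain)
```

With these assumptions the REQUIRED list comes out as "#%/?^`{|}[]", i.e. the nine encoded alternatives of the old <mid-atext> rule plus the two square brackets.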
>> And he also includes an example
>> news://news.gmane.org/p0624081dc30b8699bf9b@%5B10.20.30.108%5D
>> where I would have thought he could have shown
>> news://news.gmane.org/p0624081dc30b8699bf9b@[10.20.30.108]
>
> According to RFC 3986, '[' and ']' are only allowed for IPv6 addresses,
> i.e. inside <authority>.

Ah! I had not spotted that they were <gen-delims>.

So thanks for your help. I think I can probably rewrite that paragraph now, and then the job is done.

-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131   Web: http://www.cs.man.ac.uk/~chl
Email: chl@clerew.man.ac.uk   Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9   Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5
Received on Wednesday, 6 January 2010 17:34:20 UTC