- From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
- Date: Thu, 07 Jan 2010 10:44:42 +0900
- To: Charles Lindsey <chl@clerew.man.ac.uk>, "uri@w3.org" <uri@w3.org>
Hello Charles,

On 2010/01/05 19:33, Charles Lindsey wrote:
> On Tue, 05 Jan 2010 07:12:37 -0000, Martin J. Dürst
> <duerst@it.aoyama.ac.jp> wrote:
>
>> Hello Charles,
>>
>> Bob and Joseph have said most that needs to be said. In summary, I
>> don't think there's anything wrong with respect to escaping in Frank's
>> draft. If you can point to anything specific, I'll have another look
>> at it. (see below for details)
>>
>> On 2010/01/05 3:12, Charles Lindsey wrote:
>
>>> is it REQUIRED for the <sub-delims> if the particular scheme does not
>>> use any of them as delimiters? RFC 3986 seems to imply not, so I would
>>> expect that in
>>>    news:foo@bar.!#$%&'*+/=?^`{|}.example
>>> (yes, "bar.!#$%&'*+/=?^`{|}.example" is a valid <dot-atom-text> and
>>> hence can occur in a Message-ID) I would have to percent-encode the '#',
>>> '/' and '?', but not the others.
>>
>> Sorry, but you also have to encode characters that are not allowed in
>> URIs at all, i.e. '{', '}', '`', "'", "^", and "|". Bob mentioned
>> these, but wasn't very definitive and didn't give a reason. And of
>> course, as Bob mentioned, "%" has to be escaped.
>
> Ah! I had not spotted that there existed characters that were neither
> <reserved> nor <unreserved>. Does RFC 3986 have anything to say about
> them, or does silence imply that encoding is needed?

The latter. This is different in RFC 2396, see
http://tools.ietf.org/html/rfc2396#section-2.4.3.

> Anyway, it seems the list of things needing percent encoding in that
> example has now expanded to at least "#%'/?^`{|}", leaving just "!$&*+="
> (which are all <sub-delims>) to discuss. Joseph Anthony seems to be
> saying that these need NOT be encoded. Do you agree?

Yes.
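To make the encoding rule discussed above concrete, here is a minimal Python sketch (not taken from Frank's draft or from any news: URI specification) that percent-encodes a message-id for use as the article part of a news: URI. It assumes the RFC 3986 path-segment rule: <unreserved> characters, <sub-delims>, ":" and "@" stay literal, and every other character is percent-encoded as UTF-8 octets. The function name `encode_message_id` is hypothetical. Note that RFC 3986 lists the apostrophe among the <sub-delims> (as does the mid-atext rule quoted from Frank's draft), so this sketch leaves "'" unencoded.

```python
import string

# RFC 3986 character classes (sections 2.2 and 2.3).
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
SUB_DELIMS = set("!$&'()*+,;=")
# A path segment (pchar, RFC 3986 section 3.3) may also contain ":" and "@";
# "@" must stay literal because it separates the two halves of the message-id.
SAFE = UNRESERVED | SUB_DELIMS | {":", "@"}


def encode_message_id(mid: str) -> str:
    """Percent-encode every character of mid that may not appear literally."""
    out = []
    for ch in mid:
        if ch in SAFE:
            out.append(ch)
        else:
            # Encode each UTF-8 octet as %XX with uppercase hex digits.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


if __name__ == "__main__":
    print("news:" + encode_message_id("foo@bar.!#$%&'*+/=?^`{|}.example"))
    print("news://news.gmane.org/"
          + encode_message_id("p0624081dc30b8699bf9b@[10.20.30.108]"))
```

Run as a script, this prints `news:foo@bar.!%23$%25&'*+%2F=%3F%5E%60%7B%7C%7D.example` and `news://news.gmane.org/p0624081dc30b8699bf9b@%5B10.20.30.108%5D`; the second line reproduces the encoded form of Frank's example, since '[' and ']' are <gen-delims>. On Python 3.7 and later, `urllib.parse.quote(mid, safe="!$&'()*+,;=:@")` should give the same result.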
>> Frank seems to have taken the view that
>>> all <sub-delims> need to be encoded, though he does at one point permit
>>> '*' to appear unencoded (and it was indeed explicitly allowed in RFC
>>> 1738), which appears to be inconsistent with his stance elsewhere
>
> And indeed it is '*' which would be a real pain if it had to be encoded,
> since it is so much used in wildmats.
>>
>> I have difficulties understanding:
>>
>>   Characters not directly allowed in this part of an
>>   [RFC3986] URI have to be percent-encoded, minimally anything that is
>>   not <unreserved>, no ":" (colon), and doesn't belong to the
>>   <sub-delims>.
>
> Yes, that is the paragraph I was concerned with, since the mention of
> colon needs to be removed because it can no longer occur in a
> message-id. So I have to rewrite it anyway, and there are just too many
> double negatives in it at present for it to be comprehensible.
>>
>> I think this may be slightly better:
>>
>>   Characters not directly allowed in this part of an
>>   [RFC3986] URI have to be percent-encoded. This at a minimum includes
>>   anything that is not <unreserved>, is not a ":" (colon), and does
>>   not belong to the <sub-delims>.
>
> OK, that looks like a better basis to start from, but still has too many
> 'not's in it for my taste.

For me, too.

> But it is then clear that the <sub-delims>
> are exempt, so that '*' is safe. Would it also be in order to say that
> it is always in order to percent-encode ANYTHING (even ALPHAs) if you
> feel like being awkward, in which case the meaning is always the same as
> if they were decoded before interpretation? I might even include an
> exhaustive list of all the ones where encoding was REQUIRED. Note that
> the totality of allowed characters in a message-id is now just the
> <atext>s from RFC 5322, plus ".[]".
>
>> Also, looking e.g. at
>>
>>   mid-atext = ALPHA / DIGIT /          ; RFC 2822 <atext>
>>               "!" / "$" / "&" / "'" /  ; allowed sub-delims
>>               "*" / "+" / "=" /        ; allowed sub-delims
>>               "-" / "_" / "~" /        ; allowed unreserved
>>               "%23" / "%25" / "%2F" /  ; "#" / "%" / "/"
>>               "%3F" / "%5E" / "%60" /  ; "?" / "^" / "`"
>>               "%7B" / "%7C" / "%7D"    ; "{" / "|" / "}"
>
> Fortunately, all those <mid-*> rules are now gone, which makes life
> considerably simpler. Message-ids in news are now identical to those in
> RFC 5322, except that '>' is still forbidden.
>
>>> And he also includes an example
>>>    news://news.gmane.org/p0624081dc30b8699bf9b@%5B10.20.30.108%5D
>>> where I would have thought he could have shown
>>>    news://news.gmane.org/p0624081dc30b8699bf9b@[10.20.30.108]
>>
>> According to RFC 3986, '[' and ']' are only allowed for IPv6
>> addresses, i.e. inside <authority>.
>
> Ah! I had not spotted that they were <gen-delims>.
>
> So thanks for your help. I think I can probably rewrite that paragraph
> now, and then the job is done.

Yes, great.

Regards, Martin.

-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp
Received on Thursday, 7 January 2010 01:45:22 UTC