- From: Tom Petch <nwnetworks@dial.pipex.com>
- Date: Fri, 29 Jan 2010 18:49:08 +0100
- To: "URI" <uri@w3.org>, "Charles Lindsey" <chl@clerew.man.ac.uk>
----- Original Message -----
From: "Charles Lindsey" <chl@clerew.man.ac.uk>
Sent: Friday, January 15, 2010 6:16 PM

> On Wed, 13 Jan 2010 18:09:50 -0000, Julien ÉLIE <julien@trigofacile.com>
> wrote:
>
> > Hi Charles,
> >
> >> Here is the wording I now propose:
> >>
> >> According to [RFC 3986], characters that are in <gen-delims> (a subset
> >> of <reserved>) MUST be percent-encoded (though it is not wrong to
> >> encode others). Specifically, the characters allowed in <msg-id-core>
> >> that must be encoded are
> >> "/" "?" "#" "[" and "]"
> >> Note that an agent which seeks to interpret a 'news' URI needs to
> >> decode all these percent-encoded characters before passing it on to an
> >> NNTP server to be acted upon.
> >>
> >> Comments anyone?
> >
> > MUSTn't "%" also be encoded?
>
> Ah yes! That pesky '%' which, for some strange reason, is not included in
> <gen-delims>
>
> > I see in to-be RFC 5538:
> >
> > mid-left = 1*( mid-atext / "." ) / ; <dot-atom-text>
> > ( "%22" mid-quote "%22" ) ; <no-fold-quote>
> > mid-right = 1*( mid-atext / "." ) / ; <dot-atom-text>
> > ( "%5B" mid-literal "%5D" ) ; <no-fold-literal>
> > mid-atext = ALPHA / DIGIT / ; RFC 2822 <atext>
> > "!" / "$" / "&" / "'" / ; allowed sub-delims
> > "*" / "+" / "=" / ; allowed sub-delims
> > "-" / "_" / "~" / ; allowed unreserved
> > "%23" / "%25" / "%2F" / ; "#" / "%" / "/"
> > "%3F" / "%5E" / "%60" / ; "?" / "^" / "`"
> > "%7B" / "%7C" / "%7D" ; "{" / "|" / "}"
> >
> Well, the final form of RFC 5538 is reverting to the <msg-id-core> syntax
> of RFC 5537. So the cases we are actually interested in are the
> intersection of (<gen-delims> plus '%') with <atext>. But that indeed does
> include '%'.
>
> > but if I have a message-ID that contains "%23", isn't it mandatory to
> > convert it into "%2523" (URI)?
>
> But of course "%23" is not in <atext>, whatever nonsense we might have had
> in <mid-atext>.
>
> So here is another attempt at my wording:
>
> According to [RFC 3986], characters that are in <gen-delims> (a subset
> of <reserved>), together with the character "%", MUST be percent-encoded
> (though it is not wrong to encode others).

Apologies for coming to this so late, but I do not think that this statement should pass unchallenged. One known exception is the use of "[" and "]" in IPv6 addresses, and there could be others.

I do not find RFC 3986 an easy read, but careful study suggests that what it says is

a) No character may appear in a URI unless there is an ABNF rule saying that it may and, at most, that character set is limited to reserved and unreserved.

b) URIs which differ in having a reserved character percent-encoded or not are not equivalent.

So a scheme can require reserved characters within a component to be percent-encoded, but that then becomes a MUST for that scheme, else you do not have interoperability.

In describing the treatment of characters, the RFC makes no distinction between the two subsets of reserved (gen-delims and sub-delims). The only difference is an administrative one: the subset known as sub-delims appears as a set in several of the rules for data in components, making them explicitly allowed. A specific scheme can take a different view (e.g. requiring all sub-delims to be percent-encoded within the data of a component).

Tom Petch

> Specifically, the characters allowed in <msg-id-core>
> that must be encoded are
> "/" "?" "#" "[" "]" and "%"
> Note that an agent which seeks to interpret a 'news' URI needs to
> decode all these percent-encoded characters before passing it on to an
> NNTP server to be acted upon.
>
> --
> Charles H. Lindsey
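
For illustration only, here is a minimal Python sketch of the scheme-specific rule being discussed for 'news' URIs: encode "/" "?" "#" "[" "]" and "%" when a message-ID is placed in the URI, and decode before handing it to an NNTP server. The function names and the sample message-ID are hypothetical, and the sketch assumes the message-ID is given without its angle brackets. It is not a general rule for all URI schemes, which is the point of the objection above.

```python
from urllib.parse import unquote

# Assumption: exactly the characters named in the proposed wording, i.e. the
# <gen-delims> that can occur in <msg-id-core>, plus "%".
MUST_ENCODE = set("/?#[]%")


def encode_msg_id(msg_id: str) -> str:
    """Percent-encode only the characters the proposed wording requires."""
    return "".join(
        f"%{ord(ch):02X}" if ch in MUST_ENCODE else ch
        for ch in msg_id
    )


def news_uri(msg_id: str) -> str:
    """Build a 'news' URI from a message-ID given without angle brackets."""
    return "news:" + encode_msg_id(msg_id)


def msg_id_from_uri(uri: str) -> str:
    """Recover the message-ID that would be passed to an NNTP server."""
    scheme, _, rest = uri.partition(":")
    if scheme.lower() != "news":
        raise ValueError("not a 'news' URI")
    # unquote() decodes each %XX once, so "%2523" becomes the text "%23".
    return unquote(rest)


original = "a/b%23c@example.com"  # hypothetical ID containing the text "%23"
uri = news_uri(original)
print(uri)                               # news:a%2Fb%2523c@example.com
print(msg_id_from_uri(uri) == original)  # True
```

The sample shows the case raised earlier in the thread: a message-ID containing the literal text "%23" has its "%" encoded, giving "%2523" in the URI, and a single decoding pass restores the original.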
Received on Friday, 29 January 2010 18:50:08 UTC