- From: Bob Aman <sporkmonger@gmail.com>
- Date: Mon, 4 Jan 2010 12:27:49 -0800
- To: Charles Lindsey <chl@clerew.man.ac.uk>
- Cc: URI <uri@w3.org>
> I find the question of just what needs to be percent-encoded is hard to
> deduce from RFC 3986. Clearly, anything in <gen-delims> MUST be
> percent-encoded except when used as delimiters, so that agents can divide a
> URI into scheme, authority, path, query, and fragment components even before
> they recognise that it is a news or nntp URI. But is it REQUIRED for the
> <sub-delims> if the particular scheme does not use any of them as
> delimiters? RFC 3986 seems to imply not, so I would expect that in
>
> news:foo@bar.!#$%&'*+/=?^`{|}.example
>
> (yes, "bar.!#$%&'*+/=?^`{|}.example" is a valid <dot-atom-text> and hence
> can occur in a Message-ID) I would have to percent-encode the '#', '/' and
> '?', but not the others. Frank seems to have taken the view that all
> <sub-delims> need to be encoded, though he does at one point permit '*' to
> appear unencoded (and it was indeed explicitly allowed in RFC 1738), which
> appears to be inconsistent with his stance elsewhere.

You should certainly percent-encode that "%" as well, but I'm inclined to
believe you should also percent-encode the "^`{|}". I think this would be
the correct normalized form:

news:foo@bar.!%23$%25&'*+%2F=%3F%5E%60%7B%7C%7D.example

(I have assumed that the intent was for that to be parsed as the path
component.)

The relevant rules from RFC 3986 are below; none of "^`{|}" is a pchar:

unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
pchar      = unreserved / pct-encoded / sub-delims / ":" / "@"

However, I believe virtually all URI parsers will interpret

news:foo@bar.!%23$%25&'*+%2F=%3F^`{|}.example

as intended.

-Bob Aman
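A minimal Python sketch of the strict encoding described above, assuming (as Bob does) that the whole Message-ID is a single path segment. The helper names `encode_path_segment` and `is_strict_path_segment` are hypothetical; the character sets are transcribed from the RFC 3986 ABNF quoted in the message. The encoder keeps only literal pchars and percent-encodes everything else (including "/", which would otherwise split the segment); the checker tests whether a string matches `*pchar` exactly.

```python
import string

# Character classes transcribed from the RFC 3986 ABNF quoted in the message.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
SUB_DELIMS = set("!$&'()*+,;=")
# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
# pct-encoded ("%" HEXDIG HEXDIG) is handled separately, so "%" is not a literal.
PCHAR_LITERALS = UNRESERVED | SUB_DELIMS | {":", "@"}
HEXDIG = set(string.hexdigits)


def encode_path_segment(text: str) -> str:
    """Percent-encode every byte of `text` that is not a literal pchar.

    "/" is not a pchar (it separates segments), so it is encoded too,
    which keeps the whole Message-ID in one path segment.
    """
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        # Bytes >= 0x80 map to non-ASCII chars, which are never in the set.
        out.append(ch if ch in PCHAR_LITERALS else f"%{byte:02X}")
    return "".join(out)


def is_strict_path_segment(text: str) -> bool:
    """Return True if `text` consists only of literal pchars and %XX triplets."""
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "%":
            triplet_tail = text[i + 1:i + 3]
            if len(triplet_tail) != 2 or not set(triplet_tail) <= HEXDIG:
                return False
            i += 3
        elif ch in PCHAR_LITERALS:
            i += 1
        else:
            return False
    return True


if __name__ == "__main__":
    message_id = "foo@bar.!#$%&'*+/=?^`{|}.example"
    strict = encode_path_segment(message_id)
    print("news:" + strict)
    # news:foo@bar.!%23$%25&'*+%2F=%3F%5E%60%7B%7C%7D.example
    lenient = "foo@bar.!%23$%25&'*+%2F=%3F^`{|}.example"
    print(is_strict_path_segment(strict), is_strict_path_segment(lenient))
    # True False
```

The encoder reproduces the normalized form given above, and the checker confirms that the leniently encoded variant is not strictly a valid path under the RFC 3986 grammar, even if most parsers accept it. In Python, `urllib.parse.quote(message_id, safe="!$&'()*+,;=:@")` should yield the same string, since `quote` always leaves ASCII letters, digits and `_.-~` unencoded.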
Received on Monday, 4 January 2010 20:28:22 UTC