Re: When is percent-encoding required.

> I find that just what needs to be percent-encoded is hard to
> deduce from RFC 3986. Clearly, anything in <gen-delims> MUST be
> percent-encoded except when used as delimiters, so that agents can divide a
> URI into scheme, authority, path, query, and fragment components even before
> they recognise that it is a news or nntp URI. But is it REQUIRED for the
> <sub-delims> if the particular scheme does not use any of them as
> delimiters? RFC 3986 seems to imply not, so I would expect that in
>   news:foo@bar.!#$%&'*+/=?^`{|}.example
> (yes, "bar.!#$%&'*+/=?^`{|}.example" is a valid <dot-atom-text> and hence
> can occur in a Message-ID) I would have to percent-encode the '#', '/' and
> '?', but not the others. Frank seems to have taken the view that all
> <sub-delims> need to be encoded, though he does at one point permit '*' to
> appear unencoded (and it was indeed explicitly allowed in RFC 1738), which
> appears to be inconsistent with his stance elsewhere.
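A quick way to see the point about <gen-delims> is to hand the unencoded example to a generic parser. This is a minimal sketch using Python's standard `urllib.parse.urlsplit`, chosen only as one convenient example of a scheme-agnostic parser: since it splits off the fragment at the first '#' before looking for a query, the path ends at "foo@bar.!" and the rest of the Message-ID becomes a fragment.

```python
from urllib.parse import urlsplit

# The example from the question, with nothing percent-encoded.
raw = "news:foo@bar.!#$%&'*+/=?^`{|}.example"

# urlsplit knows nothing about news: URIs; it only splits on the
# generic delimiters, so the first '#' ends the path.
parts = urlsplit(raw)
print(parts.scheme)    # news
print(parts.path)      # foo@bar.!
print(parts.fragment)  # $%&'*+/=?^`{|}.example
```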

For certain, you should percent-encode that "%" as well, but I'm
inclined to believe you should percent-encode the "^`{|}" also, since
none of those characters is allowed by the pchar rule below. I think
this would be the correct normalized form:

  news:foo@bar.!%23$%25&'*+%2F=%3F%5E%60%7B%7C%7D.example

(I have assumed that the intent was for that to be parsed as the path
component.)

The relevant rules from RFC 3986:

unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
sub-delims  = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
pchar       = unreserved / pct-encoded / sub-delims / ":" / "@"
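Read mechanically, these rules say that any character outside pchar has to be percent-encoded. Below is a minimal Python sketch of that reading. The function name `encode_news_path` is hypothetical. It assumes the input is a raw Message-ID (not already partially encoded, so every "%" is encoded). It also assumes the whole Message-ID is treated as a single path segment, so '/' is encoded just as in the normalized form above, and that non-ASCII characters are encoded as UTF-8 bytes.

```python
import string

# Character classes from RFC 3986 (unreserved, sub-delims, pchar).
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
SUB_DELIMS = set("!$&'()*+,;=")
PCHAR = UNRESERVED | SUB_DELIMS | {":", "@"}


def encode_news_path(message_id: str) -> str:
    """Hypothetical helper: percent-encode every character not in pchar.

    Assumes a raw (unencoded) Message-ID; characters outside pchar are
    encoded byte by byte over their UTF-8 form, with uppercase hex digits.
    """
    out = []
    for ch in message_id:
        if ch in PCHAR:
            out.append(ch)
        else:
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


print("news:" + encode_news_path("foo@bar.!#$%&'*+/=?^`{|}.example"))
# news:foo@bar.!%23$%25&'*+%2F=%3F%5E%60%7B%7C%7D.example
```

Note that the sub-delims pass through unchanged here, which is the reading the question expects from RFC 3986; only '#', '%', '/', '?' and "^`{|}" get encoded.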

However, I believe virtually all URI parsers will interpret
"news:foo@bar.!%23$%25&'*+%2F=%3F^`{|}.example" as intended.

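As a single data point rather than a survey, Python's standard `urllib.parse.urlsplit` behaves this way: it does not check characters against pchar, so the raw "^`{|}" simply pass through, and with no unencoded '#' or '?' left, the whole Message-ID stays in the path. A minimal sketch:

```python
from urllib.parse import urlsplit

# The partially encoded form: '#', '%', '/' and '?' are encoded,
# but "^`{|}" are left as-is.
partial = "news:foo@bar.!%23$%25&'*+%2F=%3F^`{|}.example"

parts = urlsplit(partial)
# urlsplit does not decode anything; it only splits on raw delimiters.
print(parts.path)                               # foo@bar.!%23$%25&'*+%2F=%3F^`{|}.example
print(repr(parts.query), repr(parts.fragment))  # '' ''
```
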
-Bob Aman

Received on Monday, 4 January 2010 20:28:22 UTC