- From: Martin Duerst <duerst@w3.org>
- Date: Wed, 18 Jun 2003 12:52:32 -0400
- To: Tim Berners-Lee <timbl@w3.org>, "Roy T. Fielding" <fielding@apache.org>
- Cc: uri@w3.org
At 16:13 03/06/16 -0400, Tim Berners-Lee wrote: >2.3 Unreserved characters para 2 > >"fear of creating a conflict" and "counterproductive" are much too vague >for a foundational spec. > >Replace > >"Therefore, unreserved characters should not be escaped unless the URI is >being used in a context that does not allow the unescaped character to >appear. URI normalization processes may unescape sequences in the ranges >of ALPHA (%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), underscore >(%5F), or tilde (%7E) without fear of creating a conflict, but unescaping >the other mark characters is usually counterproductive." > >with: > >"URIs which differ only be the replacement of unreserved characters by the >percent-encoded strings are equivalent: they identify the same resources. >For the purposes of canonicalisation, it is recommended that characters in >ALPHA (%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), underscore >(%5F), or tilde (%7E) be unescaped, but other characters in the mark set >be escaped." > > >___________________________________ That's all for now _________________ Tim, are you trying to say that ".", "!", "*", "'", "(", and ")" should always be escaped? I haven't seen too many "!", "*", or "'", but recommending this for "." would be completely against current practice and would force us to use ugly stuff such as http://example.org/document%2Ehtml. Surely nobody would do that. And XPointers might look ugly, too. Also, it seems that you are saying that for these, escaped and unescaped are interchangeable, but the escaped form should be used because the unescaped form is dangerous. But what I understand Roy is trying to say in the current wording is that escaping or unescaping one of these is not guaranteed to produce the same thing. Also, I don't see a need to use the unescaped form for the characters above. As far as I am aware, they are part of the invariant set of the ISO 646 family (which the "~" is not), and they are also more available on keyboards than "~". When I read section 2.3, I noted that I would want to see more explanation about "counterproductive". I remember a test by Larry, where he found that %2e and %2E were correctly unescaped at least on http://www.w3.org, but there may be circumstances where this is not the case. Regards, Martin. Regards, Martin.
Received on Wednesday, 18 June 2003 12:53:05 UTC