W3C home > Mailing lists > Public > uri@w3.org > June 2003

Re: draft-fielding-uri-rfc2396bis-03

From: Martin Duerst <duerst@w3.org>
Date: Wed, 18 Jun 2003 12:52:32 -0400
Message-Id: <4.2.0.58.J.20030618123733.03db09b8@localhost>
To: Tim Berners-Lee <timbl@w3.org>, "Roy T. Fielding" <fielding@apache.org>
Cc: uri@w3.org

At 16:13 03/06/16 -0400, Tim Berners-Lee wrote:

>2.3 Unreserved characters para 2
>
>"fear of creating a conflict" and "counterproductive" are much too vague 
>for a foundational spec.
>
>Replace
>
>"Therefore, unreserved characters should not be escaped unless the URI is 
>being used in a context that does not allow the unescaped character to 
>appear. URI normalization processes may unescape sequences in the ranges 
>of ALPHA (%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), underscore 
>(%5F), or tilde (%7E) without fear of creating a conflict, but unescaping 
>the other mark characters is usually counterproductive."
>
>with:
>
>"URIs which differ only be the replacement of unreserved characters by the 
>percent-encoded strings are equivalent: they identify the same resources.
>For the purposes of canonicalisation, it is recommended that characters in 
>ALPHA (%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), underscore 
>(%5F), or tilde (%7E) be unescaped, but other characters in the mark set 
>be escaped."
>
>
>___________________________________ That's all for now _________________

Tim, are you trying to say that ".", "!", "*", "'", "(", and ")"
should always be escaped? I haven't seen too many "!", "*", or "'",
but recommending this for "." would be completely against current
practice and would force us to use ugly stuff such as
http://example.org/document%2Ehtml. Surely nobody would do that.
And XPointers might look ugly, too.

Also, it seems that you are saying that for these, escaped and
unescaped are interchangeable, but the escaped form should be
used because the unescaped form is dangerous. But what I understand
Roy is trying to say in the current wording is that escaping or
unescaping one of these is not guaranteed to produce the same
thing.

Also, I don't see a need to use the unescaped form for the characters
above. As far as I am aware, they are part of the invariant set
of the ISO 646 family (which the "~" is not), and they are also
more available on keyboards than "~".

When I read section 2.3, I noted that I would want to see more
explanation about "counterproductive". I remember a test by Larry,
where he found that %2e and %2E were correctly unescaped at least
on http://www.w3.org, but there may be circumstances
where this is not the case.


Regards,    Martin.

Regards,    Martin.
Received on Wednesday, 18 June 2003 12:53:05 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Thursday, 13 January 2011 12:15:31 GMT