RE: "semantics" of URI from Larry Masinter on 2003-06-08 (uri@w3.org from June 2003)

From: Larry Masinter <LMM@acm.org>
Date: Sun, 8 Jun 2003 09:55:06 -0700
To: "'Roy T. Fielding'" <fielding@apache.org>
Cc: <uri@w3.org>
Message-ID: <001001c32dde$b6ca3280$6ace8642@MASINTERPAD>

>    Escaping unreserved characters in a URI does not
>    change what resource is identified by that URI.

I think this is OK, although I'm uneasy about "does not".
Is it "will not", "should not", or "must not"? Are we
establishing a conformance requirement, or describing currently
deployed software? Is it really the case that escaping unreserved
characters never changes the resource identified, in any of the
deployed software in the world?
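
To make the claim concrete, here is a rough sketch (not taken from the
draft) of what "escaping unreserved characters does not change the
resource" would mean to an implementation. The helper name
unescape_unreserved is hypothetical, and the unreserved set used here
(letters, digits, "-", ".", "_", "~") is an assumption; the draft under
discussion may define a different set.

```python
import re

# Assumed unreserved set: letters, digits, and "-", ".", "_", "~".
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def unescape_unreserved(uri: str) -> str:
    """Decode %XX escapes only when they stand for an unreserved character."""
    def replace(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        # Escapes of reserved characters (such as "/") are left untouched,
        # because decoding them could change how the URI is parsed.
        return char if char in UNRESERVED else match.group(0)

    return _ESCAPE.sub(replace, uri)


plain = "http://example.com/~user/a%2Fb"
escaped = "http://example.com/%7Euser/a%2Fb"

print(plain == escaped)                                            # False
print(unescape_unreserved(plain) == unescape_unreserved(escaped))  # True
```

Software that treats the two spellings as the same resource is, in
effect, doing this step somewhere. Software that skips it is what makes
me uneasy about "does not".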

>               However, it may change the result of a
>    URI comparison (section 6)

"some URI comparisons" might be better than "a URI comparison"

>                              potentially leading to
>    less efficient actions by an application. 

Not just "less efficient". In the case of caching, the result
is less efficient. In the case of namespaces, the result is
incorrect.
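
A small sketch of the two cases, assuming both consumers compare URIs
as plain strings. The cache and the fetch function are hypothetical
stand-ins; the namespace half uses Python's xml.etree.ElementTree,
which keeps the namespace name exactly as written.

```python
import xml.etree.ElementTree as ET

plain = "http://example.com/~ns"
escaped = "http://example.com/%7Ens"

# Case 1: a cache keyed on the raw URI string (hypothetical stand-in).
cache = {}
fetch_count = 0


def fetch(uri: str) -> str:
    """Return a cached body, 'retrieving' it only on a cache miss."""
    global fetch_count
    if uri not in cache:
        fetch_count += 1
        cache[uri] = "<body>"  # placeholder for a real retrieval
    return cache[uri]


fetch(plain)
fetch(escaped)
print(fetch_count)  # 2: one redundant retrieval, but the answer is still right

# Case 2: XML namespace names are compared as strings.
doc_a = ET.fromstring('<item xmlns="%s"/>' % plain)
doc_b = ET.fromstring('<item xmlns="%s"/>' % escaped)
print(doc_a.tag == doc_b.tag)  # False: the two elements have different names
```

In the first case the cost is a wasted retrieval. In the second, two
elements that the author meant to be in the same namespace are treated
as unrelated, which is a wrong answer rather than a slow one.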

>                                Therefore, unreserved
>    characters should not be escaped unless the URI is being used in a
>    context that does not allow the unescaped character to appear.

I'd rather not encourage the use of such contexts, and even
in such contexts, some other mechanism might be better. Often
designers employ another layer of encoding. For example, URIs
inside XML don't URI-encode '&' as %26, but rather use XML
encoding as &amp;.
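
As an illustration of the layering, here is a short sketch using
Python's standard library. The example URI and the <link> element are
made up for the purpose. The XML layer escapes '&' on the way in and
restores it on the way out, so the URI itself is never altered, whereas
rewriting '&' as %26 changes how the query is read.

```python
from urllib.parse import parse_qs
from xml.sax.saxutils import quoteattr
import xml.etree.ElementTree as ET

uri = "http://example.com/search?a=1&b=2"

# XML-level encoding: '&' becomes '&amp;' only inside the markup.
markup = "<link href=%s/>" % quoteattr(uri)
print(markup)  # <link href="http://example.com/search?a=1&amp;b=2"/>

# An XML parser undoes that layer and hands back the original URI.
recovered = ET.fromstring(markup).get("href")
print(recovered == uri)  # True

# URI-level escaping of '&' is not equivalent: the query now has one
# parameter whose value contains a literal '&', instead of two parameters.
print(parse_qs("a=1&b=2"))    # {'a': ['1'], 'b': ['2']}
print(parse_qs("a=1%26b=2"))  # {'a': ['1&b=2']}
```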

I offer the following rewrite:

   Escaping unreserved characters in a URI should not change
   what resource is identified.  However, escaping characters
   may change the result of some URI comparisons (section 6),
   potentially leading to incorrect or inefficient behavior.
   Therefore, unreserved characters should not be escaped
   unnecessarily.

Received on Sunday, 8 June 2003 12:55:21 UTC