"semantics" of URI

Re:

> 2.3, 1st para after BNF block.  "Unreserved characters can be escaped 
> without changing the semantics of a URI".  This is at best highly 
> misleading in the case of URIs used as XML namespace names, whose only 
> semantic is identification and comparison, and where comparison is 
> typically done using strcmp(), and thus escaping an 'a' character will 
> indeed change its semantics.

and 
> That isn't a semantic.

I think we might improve the situation by avoiding the
phrase "the semantics of a URI". It has become clear in many
contexts that "semantics" is a heavily loaded word, and that
"the semantics of a URI" presupposes that there is a single
"semantics" associated with a URI string.

First, this document shouldn't define "semantics" itself, except
to give each URI scheme definition the opportunity to do so.
Second, its only claim to "semantics" should be restricted
to the process of resource identification. URI strings
may also take part in other operations that involve
"semantics" (such as use as namespace names, or as tokens in RDF),
and RFC 2396's definition need not interfere with those
applications.

So I suggest changing the text in 2.3, 1st para after BNF block,

from

"Unreserved characters can be escaped without changing
 the semantics of a URI"

to

"Escaping unreserved characters in a URI should not change
 the resource identified."

or even more accurately:


"URI schemes should be defined such that the escaping of
 unreserved characters does not change the resource identified."
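
To make the distinction concrete, here is a rough sketch in Python of the
two comparisons at issue: the character-by-character comparison described
in the quoted comment, and a comparison made after undoing escapes of
unreserved characters. The example URIs and the helper name
unescape_unreserved are invented for illustration and appear nowhere in
RFC 2396. Whether two strings that match after unescaping really identify
the same resource is still up to the scheme definition.

```python
import re

# Unreserved characters per RFC 2396 section 2.3: alphanumerics plus "mark".
UNRESERVED = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "-_.!~*'()"
)

_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def unescape_unreserved(uri: str) -> str:
    """Decode %XX escapes only when they encode an unreserved character.

    Hypothetical helper for illustration; escapes of reserved or other
    characters are left untouched.
    """
    def decode(match):
        char = chr(int(match.group(1), 16))
        return char if char in UNRESERVED else match.group(0)

    return _ESCAPE.sub(decode, uri)


# Made-up example URIs; %61 is the escaped form of 'a'.
plain = "http://example.org/ns/a"
escaped = "http://example.org/ns/%61"

# strcmp()-style comparison, as in the namespace-name case: the strings differ.
same_string = plain == escaped

# Comparison after undoing escapes of unreserved characters: the strings match.
same_after_unescape = unescape_unreserved(plain) == unescape_unreserved(escaped)
```

Under the proposed wording, only the second comparison is something RFC 2396
speaks to. The first belongs to the application doing the comparing.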

Received on Friday, 6 June 2003 14:42:22 UTC