Re: "semantics" of URI

On Friday, June 6, 2003, at 11:42  AM, Larry Masinter wrote:
>> 2.3, 1st para after BNF block.  "Unreserved characters can be escaped
>> without changing the semantics of a URI".  This is at best highly
>> misleading in the case of URIs used as XML namespace names, whose only
>> semantic is identification and comparison, and where comparison is
>> typically done using strcmp(), and thus escaping an 'a' character will
>> indeed change its semantics.
>
> and
>> That isn't a semantic.
>
> I think we might improve the situation by avoiding using the
> phrase "the semantics of a URI". It's been made clear in many
> circumstances that "semantics" is a heavily loaded
> word, and that use of "the semantics of a URI" presupposes
> that there might be a single "semantics" associated with a
> URI string.

I don't see how that would improve anything.  If there were a single
semantic then it wouldn't be plural.

> First, this document shouldn't be defining "semantics" except
> to give the URI scheme definition the opportunity for doing so,

I disagree.  There are many semantics having to do with identifiers
that are independent of scheme and cannot under any circumstances
be voided by the scheme.  Those semantics are defined here.

> and second, the only claim to "semantics" should be restricted
> to the process of resource identification. URI strings
> may also be involved in some other operations that involve
> "semantics" (such as namespace names, or tokens in RDF),
> and RFC 2396's definition need not interfere with those
> applications.

Again, I disagree.  If we allow a false semantic to be available
to the implementer as if it were an option, then implementations
may create dependencies on that falsehood.  We are better off telling
the implementer that it is wrong and then letting them judge whether
or not the false negatives are worth the gain in efficiency.  They
still must be told the negative is a false one, because it will
bite them in practice if they assume those two URIs are different.

> So I suggest changing 2.3, 1st para after BNF block to read:
>
> from
>
> "Unreserved characters can be escaped without changing
>  the semantics of a URI"
>
> to
>
> "Escaping unreserved characters in a URI should not change
>  the resource identified."

"does not" or "must not" is more appropriate.  I will make that change.

> or even more accurately:
>
> "URI schemes should be defined such that the escaping of
>  unreserved characters does not change the resource identified."

These rules are supposed to be scheme-independent.

....Roy

Received on Friday, 6 June 2003 15:29:45 UTC