anyURI values: escaped or not? from Dave Peterson on 2005-05-12 (www-xml-schema-comments@w3.org from April to June 2005)

From: Dave Peterson <davep@iit.edu>
Date: Thu, 12 May 2005 11:57:29 -0400
To: Schema Comments <www-xml-schema-comments@w3.org>
Cc: Schema IG <w3c-xml-schema-ig@w3.org>
Message-Id: <a0621020abea92b31138f@[192.168.0.2]>

In the description of anyURI, the spec first says:

>The mapping from anyURI values to URIs is as 
>defined by the URI reference escaping procedure 
>defined in Section 5.4 Locator Attribute of [XML 
>Linking Language]

This implies that the anyURI value corresponding to a given URI must
have its URI-illegal characters *unescaped*, since the mapping referred
to surely must escape any "real" percent signs.  If some or all of the
URI-illegal characters are already escaped, how is an implementation
to know this?

On the other hand, shortly thereafter the spec says:

>Note:  Spaces are, in principle, allowed in the 
>·lexical space· of anyURI, however, their use is 
>highly discouraged (unless they are encoded by 
>%20).

This implies that spaces are best escaped in lexical representations.
Since the WG has asserted that the lexical mapping is the identity, the
note seems to say that spaces, at least, *should* be pre-escaped in the
values as well.
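
To make the two readings concrete, here is a minimal Python sketch of the
escaping step. It is an illustration only, not a conforming implementation.
The function name escape_uri_reference and the escape_percent flag are
hypothetical. As far as I can tell, XLink Section 5.4 exempts "#" and "%"
from escaping; the flag models the alternative assumed above, in which
literal percent signs are escaped too.

```python
# Hypothetical sketch of XLink-style URI reference escaping.
# Assumption: the disallowed set is the RFC 2396 section 2.4.3 excluded
# characters minus '#', '%', '[' and ']', plus all non-ASCII characters.
EXCLUDED = set('<>"{}|\\^`') | {" "}


def escape_uri_reference(value: str, escape_percent: bool = False) -> str:
    """Map an anyURI value to a URI reference by %HH-escaping UTF-8 bytes.

    escape_percent=True models the reading in which a literal '%' in the
    value is itself escaped (an assumption, not what the flag-off case does).
    """
    out = []
    for ch in value:
        disallowed = (
            ord(ch) < 0x20 or ord(ch) >= 0x7F  # controls, DEL, non-ASCII
            or ch in EXCLUDED
            or (escape_percent and ch == "%")
        )
        if disallowed:
            # Convert the character to UTF-8 and escape each byte as %HH.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)


unescaped = "a b"
pre_escaped = "a%20b"

# Percent signs left alone: both values map to the same URI reference.
same_uri = escape_uri_reference(unescaped) == escape_uri_reference(pre_escaped)

# Percent signs escaped: the pre-escaped value is escaped a second time.
double = escape_uri_reference(pre_escaped, escape_percent=True)
```

With the flag off, "a b" and "a%20b" are distinct values that name one URI.
With the flag on, the pre-escaped value turns into "a%2520b", and nothing
in the value itself tells the implementation which case it is in.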

The spec needs to explain which of these is intended: unescaped values,
pre-escaped values, or either.
-- 
Dave Peterson
SGMLWorks!

davep@iit.edu

Received on Thursday, 12 May 2005 15:57:38 UTC