URIs versus wannabe-URIs

In a number of places our specifications require that a "URI" or "URI
Reference" be supplied. One example is the href attribute of xsl:include;
another is the module namespace in XQuery (which "must contain a valid
URI").

Many other W3C specifications, including XML Schema, XLink, and XML
Namespaces, have a less strict rule: they allow what I call a "wannabe-URI",
that is, a string which would be a legal URI if it were escaped. For
example, "file:///My Documents/module.xsl" is not a legal URI, and cannot be
used in the href attribute of xsl:include as currently specified, although
it would be valid in the xlink:href attribute in XLink.
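
To make the escaping step concrete, here is a rough Python sketch that turns
a wannabe-URI into a legal URI. It is an illustration only, not text from
any of the specifications: the function name escape_wannabe_uri and the
choice of which characters to leave alone are assumptions, loosely modelled
on the XLink-style rule (escape non-ASCII and otherwise disallowed
characters as %HH over their UTF-8 bytes, and leave "#", "%" and the
reserved characters untouched).

```python
from urllib.parse import quote

# Characters left untouched (assumption for this sketch): the RFC 3986
# reserved characters plus "%", so that existing %HH escapes are not
# escaped a second time. quote() never escapes ASCII letters, digits,
# or the characters "_.-~".
_KEEP = ":/?#[]@!$&'()*+,;=%"


def escape_wannabe_uri(s: str) -> str:
    """Percent-escape characters that may not appear literally in a URI.

    Spaces, non-ASCII characters, and characters such as '<', '>', '"',
    '{', '}', '|', '\\', '^' and '`' become %HH sequences over their
    UTF-8 bytes. Everything in _KEEP is passed through unchanged.
    """
    return quote(s, safe=_KEEP)


if __name__ == "__main__":
    print(escape_wannabe_uri("file:///My Documents/module.xsl"))
    # prints file:///My%20Documents/module.xsl
```

Because "%" is passed through, a string that is already escaped comes back
unchanged, which is the behaviour you want if both forms are to be accepted.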

Many existing XSLT products are more liberal still, and allow strings such
as "c:\temp\test.xml". I'm not proposing that we go this far.

However, I would propose that all our interfaces that require URIs should
permit any string that is in the lexical space of the xs:anyURI type defined
in XML Schema, subject only to the fact that delimiters such as (") might be
reserved to recognize the end of the string.

I'm afraid there's a lot of messy detail here. For example, fn:doc() says
that any xs:anyURI value is acceptable, but then goes on to talk about how
relative URI references are handled without discussing the need to perform
escaping before you can decide whether it is a relative URI reference.
Equally, it doesn't make it clear whether "http://a/a b.xml" and
"http://a/a%20b.xml" are "the same URI Reference (after resolution to an
absolute URI Reference)". Since you can't resolve a wannabe-URI without
first escaping it, I think the answer has to be that they are.
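
The ordering argued for above (escape first, then resolve, then compare) can
be sketched in Python as follows. This is only an illustration under stated
assumptions: the helper names are hypothetical, the escaping rule is the
same approximation of XLink-style escaping, and the comparison is a plain
string comparison with no further normalization (for example, "%2f" and
"%2F" would still compare as different).

```python
from urllib.parse import quote, urljoin

# Reserved characters plus "%" are left alone (assumption for this sketch),
# so strings that are already escaped pass through unchanged.
_KEEP = ":/?#[]@!$&'()*+,;=%"


def escape_wannabe_uri(s: str) -> str:
    """Percent-escape characters that are not legal in a URI."""
    return quote(s, safe=_KEEP)


def resolve(reference: str, base: str) -> str:
    """Escape first, then resolve against the base URI."""
    return urljoin(base, escape_wannabe_uri(reference))


def same_after_resolution(a: str, b: str, base: str) -> bool:
    """Compare two (wannabe-)URI references as absolute URIs."""
    return resolve(a, base) == resolve(b, base)


if __name__ == "__main__":
    base = "http://a/dir/stylesheet.xsl"  # hypothetical base URI
    print(same_after_resolution("http://a/a b.xml", "http://a/a%20b.xml", base))
    # prints True
    print(same_after_resolution("a b.xml", "http://a/dir/a%20b.xml", base))
    # prints True
```

Under this reading, the escaped and unescaped forms identify the same
resource, and a relative wannabe-URI is only classified as relative after
it has been escaped.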

Michael Kay

Received on Wednesday, 4 August 2004 06:06:37 UTC