F+O: anyURI to string conversion

The description of the fn:string function in F+O section 2.3 says:

If $arg is an atomic value, then the function returns the same string as is
returned by the expression "$arg cast as xs:string" (see 17 Casting), except
in the case below:

    * If the type of $arg is xs:anyURI, it is converted to a string without
any escaping of special characters.


However, it's not clear (a) whether a cast from xs:anyURI to xs:string
actually does any escaping of special characters, and (b) if it does, why
fn:string should behave any differently.
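
To make the question concrete, here is a rough sketch (in Python rather than
XPath) of the two candidate results for a URI containing a space. The
function names are hypothetical, and urllib.parse.quote is only an assumed
stand-in for the escaping algorithm that XML Schema references; it is not
what either specification prescribes.

```python
from urllib.parse import quote

# Characters left untouched by this approximation: URI delimiters plus "%".
SAFE = ":/?#[]@!$&'()*+,;=%"

def string_without_escaping(uri: str) -> str:
    # The fn:string exception as worded: the value is returned unchanged.
    return uri

def string_with_escaping(uri: str) -> str:
    # Hypothetical result of "cast as xs:string" IF the cast escaped.
    return quote(uri, safe=SAFE)

uri = "http://example.com/my file.xml"
print(string_without_escaping(uri))  # http://example.com/my file.xml
print(string_with_escaping(uri))     # http://example.com/my%20file.xml
```

If the cast does not escape, both functions would return the first form and
the exception in fn:string would be redundant.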

The casting rules for xs:anyURI to string are in 17.7, and are covered by
the bullet:

<quote>
In all other cases, TV is the XPath canonical representation of SV (or the
result of casting the value to a string, in the case of types that have no
canonical lexical representation).
</quote>

This is the only place in the document where the phrase "XPath canonical
representation" is used; for want of a definition I assume that it means the
same as "canonical lexical representation".

The clause in parentheses is clearly self-referential: we are defining the
rules for casting to a string, so we cannot refer to them at the same time.
So it appears we have no workable rule covering data types that have no
canonical lexical representation.
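
The circularity can be sketched as follows, again in Python and purely as
an illustration. The table of canonical forms and the function name are
assumptions made for this sketch; only the shape of the rule is taken from
the quoted bullet.

```python
# Types that do have a canonical lexical representation (illustrative subset).
CANONICAL = {
    "xs:boolean": lambda v: "true" if v else "false",
}

def cast_to_string(type_name, value):
    canonical = CANONICAL.get(type_name)
    if canonical is not None:
        # "TV is the XPath canonical representation of SV"
        return canonical(value)
    # "...or the result of casting the value to a string": the rule
    # refers back to itself, so there is nothing left to evaluate.
    return cast_to_string(type_name, value)

print(cast_to_string("xs:boolean", True))
try:
    cast_to_string("xs:anyURI", "http://example.com/")
except RecursionError:
    print("no result: the rule never bottoms out")
```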

xs:anyURI appears to be such a data type; no canonical lexical
representation for xs:anyURI is defined in Schema Part 2 section 3.2.17, nor
in the errata as far as I can establish.

Perhaps XML Schema intended to define the escaped form as the canonical
lexical representation (though in fact the escaping algorithm it references
in XLink does not yield a canonical representation, because it allows a
choice of upper or lower case hex digits). If Schema were amended to do so,
then casting an anyURI to a string would indeed invoke escaping. But then
why would we want to make the fn:string() function behave differently?
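
The point about hex digits can be shown with a small Python sketch. It
assumes that urllib.parse.quote approximates the escaping in question and
that non-ASCII characters are escaped as UTF-8 bytes; the helper names are
hypothetical.

```python
import re
from urllib.parse import quote, unquote

SAFE = ":/?#[]@!$&'()*+,;=%"  # left unescaped in this approximation

def escape_upper(uri: str) -> str:
    # quote() writes escape triplets with upper case hex digits.
    return quote(uri, safe=SAFE)

def escape_lower(uri: str) -> str:
    # Same escaping, but with every %XX triplet lower-cased.
    return re.sub(r"%[0-9A-F]{2}",
                  lambda m: m.group(0).lower(),
                  escape_upper(uri))

uri = "http://example.com/caf\u00e9 menu"
upper = escape_upper(uri)  # http://example.com/caf%C3%A9%20menu
lower = escape_lower(uri)  # http://example.com/caf%c3%a9%20menu

# Two different strings, both of which unescape to the same URI.
print(upper != lower, unquote(upper) == unquote(lower) == uri)
```

Both outputs are legitimate escaped forms of the same value, so neither can
serve as a canonical representation unless the case of the hex digits is
fixed.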

Michael Kay

Received on Monday, 22 September 2003 12:28:41 UTC