RE: non-XML characters in text nodes

> I am trying to reconcile these statements:
> 
> 1. http://www.w3.org/TR/xslt20/#section-Calling-Extension-Functions
> 
>   "NOTE: Implementations are not required to perform full validation of
>   values returned by extension functions. For example, the effect of
>   returning a string containing characters that are not legal XML 
>   characters is implementation-defined."
> 
> 2. http://www.w3.org/TR/query-datamodel/#TextNode
> 
>   "Text nodes encapsulate consecutive XML character data."
> 
> 3. http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/#string
> 
>   "[Definition:]  The string datatype represents character strings in
>   XML. The ·value space· of string is the set of finite-length
>   sequences of characters (as defined in [XML 1.0 (Second Edition)])
>   that ·match· the Char production from [XML 1.0 (Second Edition)]."
> 
> So it is pretty clear to me that there is no way that a node 
> in an XPath/XQuery/XSLT node tree can contain non-XML 
> characters; the node must be serializable as XML.

What we are trying to say is that it is an error for an extension function
(or the unparsed-text() function) to return strings containing characters
that are illegal in XML; but we aren't requiring implementations to detect
this error immediately. The effect of the error is implementation-defined;
it might, for example, result in the processor spitting out ill-formed XML. 

I'm not sure if you are trying to say that this is not clearly stated, or if
you are saying that you disagree with the decision we have made. Do you
think we should require vendors to check each character in the returned
string to see if it is OK? We felt that this decision ought to be left to
the implementor - who then has the choice, of course, of allowing it to be
made by the user.
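
To make the cost concrete, the per-character check in question would look
something like the following minimal Python sketch. It tests each code point
against the Char production of XML 1.0 (Second Edition); the function names
is_xml_char and first_illegal_char are hypothetical, chosen only for
illustration, and are not taken from any specification or product.

```python
def is_xml_char(cp: int) -> bool:
    """True if the code point matches the XML 1.0 Char production:
    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    """
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def first_illegal_char(s: str) -> int:
    """Return the index of the first character that is not a legal
    XML 1.0 character, or -1 if every character is legal.
    (Hypothetical helper: a processor might call something like this
    on the string returned by an extension function.)
    """
    for i, ch in enumerate(s):
        if not is_xml_char(ord(ch)):
            return i
    return -1
```

The check is linear in the length of the returned string, which is why
whether to run it on every extension-function result is left as an
implementation choice.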

We have some other situations which we say are errors, but which the
implementation is not obliged to detect, and which therefore give undefined
results: an example is reading and writing the same resource in the course
of a transformation. I think this is a reasonable stance where we feel that
the cost of detecting the error may be very high.

> 
> Since it is implicit that an extension function that returns 
> a string must return an xs:string, the first paragraph I 
> quoted above is moot. 

It appears that the meaning of the word "moot" is moot, in the sense that a
lot of people seem to be using the word in the opposite sense to the Oxford
Dictionary, which defines it as meaning "debatable". So I'm not actually
sure what you mean.

Michael Kay

Received on Thursday, 24 October 2002 18:55:36 UTC