The use of XML syntax in XML Query

I am very surprised that the W3C is proposing to standardise a language
that uses syntactic constructs such as

<...> ... </...>

&#...;

<!-- ... -->

<![CDATA[ ... ]]>

in a format which is _not_ XML.
This just seems guaranteed to ensure user confusion.

XQuery is claimed to be the "human readable" syntax, as opposed to the
XML syntax of XQueryX. However, this human readable syntax has now
adopted so many XML features that I suggest it should be made into an
XML format in its own right. There is still a use case for an XML
syntax that "fully expands" the XPath expression syntax into an XML tree,
but that does not mean that XQuery, like XSLT, could not be defined as
layered over an XML parser.

One of the main benefits of using XML as a syntax for a new text-based
format is that lots of tricky i18n and encoding issues are dealt with at
that level. By choosing not to be XML, XQuery loses these benefits and,
as far as I can see in the drafts, proposes no alternative.
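
As a small illustration of what the XML layer handles for free, here is
a sketch in Python. The use of the standard library's
xml.etree.ElementTree is my choice for illustration only, not anything
the drafts mention. It parses the same one-element document from two
different byte encodings; the encoding declaration is consumed by the
parser, so the application sees the same element name either way.

```python
import xml.etree.ElementTree as ET

# The same document, a single element named e-acute, in two encodings.
latin1_doc = '<?xml version="1.0" encoding="ISO-8859-1"?><\u00e9/>'.encode("iso-8859-1")
utf8_doc = '<?xml version="1.0" encoding="UTF-8"?><\u00e9/>'.encode("utf-8")

# The XML parser reads the encoding declaration and decodes the bytes itself.
latin1_name = ET.fromstring(latin1_doc).tag
utf8_name = ET.fromstring(utf8_doc).tag

# The byte sequences differ, but the parsed element names are identical.
print(latin1_doc != utf8_doc, latin1_name == utf8_name)
```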

A couple of small examples, although it is easy to generate more:

*
XQuery includes a production CharRef which allows the syntax &#xe9;.
As far as I can see the meaning of this construct is not specified, but
one assumes that it means the same as in XML, and denotes an e-acute.

In XSLT I can write my stylesheet in any encoding I want, but still
query the full range of XML documents. For example

<xsl:value-of select="&#xe9;"/>

would return the value of the element with name e-acute.

In XQuery, as in XPath, you cannot use & within a QName, so there
is no equivalent to this XSLT construct in XQuery: one has to write out
the QName as character data, which means that the encoding of the
query document has to include these characters. (Perhaps all XQuery
documents are in UTF-8? In that case this is not an issue technically,
but it might still be inconvenient for users. However, I can see no
mention of the possible encodings of a query document within the current
draft, so it's hard to be sure what is proposed here.)
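
The XSLT half of this example can be checked with any XML parser. A
minimal sketch in Python, assuming the standard xml.etree.ElementTree
module as a stand-in for the parser underneath an XSLT processor: the
stylesheet fragment is pure ASCII, yet the value handed on to the XPath
layer is the single character e-acute.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# A stylesheet fragment written entirely in ASCII.
fragment = b'<xsl:value-of xmlns:xsl="' + XSL_NS.encode("ascii") + b'" select="&#xe9;"/>'

element = ET.fromstring(fragment)

# The parser has already expanded the character reference, so the XPath
# expression an XSLT processor would see is the one-character name e-acute.
select_value = element.get("select")

print(fragment.isascii(), len(select_value), select_value == "\u00e9")
```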


*
The introduction states that any expression that is both a valid
XPath 2.0 and XQuery 1.0 expression will generate the same result in both
languages. There is a technical sense in which this is true, but the fact
that XPath is most commonly used in XSLT, i.e. in XML, and XQuery is not
XML, means that the user perception will be that many expressions are
valid in both contexts but have wildly different results.

It is very common to XML-quote the apostrophes used to delimit XPath
strings in order to fit the expression into an XML attribute, but note
that this uses the XML entity or character reference syntax on XPath
delimiters, something that is not permitted in XQuery.

'a&apos;=&#39;b'

is, as far as I can make out, valid as a "raw" XPath expression, in
which case its value is a string of length 14.

But usually when people write XPaths they write them as they appear in
XML/XSLT, in which case the above is a legal XPath which is just
an XML-encoded version of 'a'='b', which is a legal expression with value
boolean false().

However, XQuery introduces (as far as I can tell) a new incompatible
interpretation of this string, namely as a single string a'='b
of length 5.

Given that XPaths (and, one assumes, in the future XQueries) are often
written by XML tools such as XSLT, which don't give the user a lot of
control over the serialisation of character data, the non-XML
interpretation of XML character references by XQuery seems very
worrying to me.
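
The three readings of the string above can be checked mechanically. The
following Python sketch is only an illustration: it takes the second
reference to be the well-formed character reference &#39;, uses
xml.etree.ElementTree as a stand-in for the XML layer, and models the
XQuery reading by expanding references inside the string literal, which
is my understanding of the draft rather than something it states
outright.

```python
import xml.etree.ElementTree as ET

expr = "'a&apos;=&#39;b'"

# 1. "Raw" XPath: no XML layer, so the text between the outer apostrophes
#    is one string literal taken character for character.
raw_xpath_string = expr[1:-1]

# 2. XPath inside an XML attribute: the XML parser expands the references
#    first, so the XPath processor sees a comparison of two literals.
attr_doc = '<xsl:if xmlns:xsl="http://www.w3.org/1999/XSL/Transform" test="' + expr + '"/>'
xpath_seen_by_xslt = ET.fromstring(attr_doc).get("test")

# 3. XQuery reading (assumed): the outer apostrophes delimit the literal
#    and the references are expanded inside it. Parsing the literal's
#    content as XML character data models that expansion.
xquery_string = ET.fromstring("<s>" + expr[1:-1] + "</s>").text

print(len(raw_xpath_string))
print(xpath_seen_by_xslt)
print(len(xquery_string), xquery_string)
```

The second value is the comparison 'a'='b', which an XPath processor
evaluates to false(); the other two are strings of different lengths.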


Changing XQuery to be an XML format would not require many changes in
the document. The parts of the grammar copied from the XML grammar could be
removed. Incidentally, those parts highlight another problem with
copying rather than using XML: they use the character productions from
XML 1.0, but XML 1.1 is (perhaps) going to change them. If XQuery
referenced XML it would be a lot more straightforward to manage the
issues related to Unicode 3.x characters in QNames. (At least it would
be XML's problem rather than XQuery's.)

XQuery would then need to introduce specific constructs to generate XML
comments (cf. xsl:comment), as <!-- would no longer be a reliable
mechanism. Also, you would probably want a top-level element in some
XQuery namespace just to wrap the query up as a well-formed document.

David


Received on Thursday, 3 January 2002 06:48:02 UTC