RE: Entities in predicates

Michael,
 
Your assumption about the form of the element is correct (B&amp;O). Thank you for your description of the differences between XPath and XQuery. Can you tell me, though, what is gained by forcing the XQuery user/client to escape special characters as entity references? In real XML the need is obvious: you must distinguish a "<" in content from the start of a new element. It is not obvious to me that the same is needed in XQuery, since the context makes it clear how the "<" is being used. It seems to me that an XQuery StringLiteral should hold its characters as they appear in the Infoset, so that [. = "B&O < Reading"] would be legal and valid in both XPath and XQuery. An XQuery engine, just like an XPath engine, can easily tell that the & in this case is not an entity reference and can correctly interpret the "<" as literal content of the StringLiteral.
 
Thanks,
Steve Condas
 

Michael Kay <mhk@mhk.me.uk> wrote:
I assume your element is actually written as <Railroad>B&amp;O</Railroad> (otherwise it would not be well-formed XML).
 
XPath and XQuery are deliberately different here. XPath is designed to be embedded in XML documents, for example XSLT, XML Schema, or Schematron documents. It's therefore expected that the XML parser will preprocess the XPath expressions to expand any entity or character references. The XPath grammar describes the syntax after performing this expansion. Alternatively, XPath expressions might be written as string literals embedded in languages such as C or Java, in which case the C or Java escaping conventions will be used instead of the XML conventions, for example newline will be written \n rather than &#xa;.
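To make the preprocessing step concrete, here is a minimal Python sketch using only the standard library. The xsl:value-of fragment is a hypothetical stylesheet, and ElementTree's small XPath subset (the [.='text'] predicate needs Python 3.7 or later) stands in for a full XPath engine such as Jaxen. The XML parser expands &amp; in the select attribute, so the XPath engine only ever sees B&O.

```python
import xml.etree.ElementTree as ET

# Data document: the ampersand must be escaped for the XML to be well-formed.
DATA = "<Railroads><Railroad>B&amp;O</Railroad><Railroad>Reading</Railroad></Railroads>"

# Hypothetical stylesheet fragment embedding an XPath expression in an attribute.
# XML escaping applies here too, so the '&' in the string literal is written &amp;.
STYLESHEET = (
    '<xsl:value-of xmlns:xsl="http://www.w3.org/1999/XSL/Transform" '
    "select=\".//Railroad[.='B&amp;O']\"/>"
)


def embedded_xpath(stylesheet):
    """Return the select attribute as the XPath engine receives it,
    i.e. after the XML parser has expanded entity and character references."""
    return ET.fromstring(stylesheet).get("select")


def matching_text(data, expr):
    """Evaluate expr with ElementTree's limited XPath subset (a stand-in for a
    full XPath engine) and return the text of each matching element."""
    root = ET.fromstring(data)
    return [elem.text for elem in root.findall(expr)]


expr = embedded_xpath(STYLESHEET)
print(expr)                       # .//Railroad[.='B&O']
print(matching_text(DATA, expr))  # ['B&O']
```

The same unescaped literal, 'B&O', is what you would pass directly to an XPath API from Java, where only Java's own string escaping applies.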
 
XQuery is a free-standing language and isn't XML-based, so it has to have its own machinery for escaping special characters, and it has chosen a mechanism that is very close (but not identical) to the one used in XML.
 
When you write XPath-embedded-in-XML, and when you write XQuery, the rules end up being very similar: in both cases special characters such as & in string literals must be written &amp;. The difference between the two is in handling characters outside string literals. XPath-embedded-in-XML requires the "<" operator to be escaped as &lt; while XQuery requires it to be unescaped, and distinguishes it from XML-like markup by its syntactic context. 
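The two sets of rules can be compared with a small Python sketch. Both function names are hypothetical helpers, not part of any library; each turns a plain string into the text you would actually type. The XQuery helper follows the StringLiteral production from the XQuery draft (only '&' and the delimiting quote need escaping). The XPath helper assumes XPath 2.0's doubled-quote escape and then applies XML attribute escaping on top, which is where "<" gets escaped.

```python
import xml.etree.ElementTree as ET


def xquery_string_literal(value, quote='"'):
    """Write value as an XQuery StringLiteral: '&' becomes &amp; and the
    delimiter is escaped by doubling it. '<' may appear unescaped."""
    body = value.replace("&", "&amp;").replace(quote, quote * 2)
    return quote + body + quote


def xpath_literal_in_attribute(value):
    """Write value as an XPath 2.0 string literal (delimited with ', which is
    escaped by doubling), then escape the result for an XML attribute
    delimited with double quotes, as in an XSLT select attribute."""
    literal = "'" + value.replace("'", "''") + "'"
    # Escape '&' first so the '&' introduced by &lt; is not escaped again.
    return literal.replace("&", "&amp;").replace("<", "&lt;").replace('"', "&quot;")


value = "B&O < Reading"
print(xquery_string_literal(value))  # "B&amp;O < Reading"
attr = xpath_literal_in_attribute(value)
print(attr)                          # 'B&amp;O &lt; Reading'

# The XML parser undoes the attribute escaping before XPath sees the literal.
print(ET.fromstring('<x select="' + attr + '"/>').get("select"))  # 'B&O < Reading'
```

In both outputs the & inside the literal is written &amp;; only the treatment of "<" differs, as described above.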
 
Michael Kay
http://www.saxonica.com/


---------------------------------
From: www-ql-request@w3.org [mailto:www-ql-request@w3.org] On Behalf Of Steve Condas
Sent: 05 October 2004 20:03
To: www-ql@w3.org
Subject: Entities in predicates



I am new to XPath/XQuery, so please forgive my ignorance. I am caught between multiple interpretations of the XPath/XQuery StringLiteral that are causing me development difficulties. I have an XML element that contains an entity (e.g. <Railroad>B&O</Railroad>), and I am trying to find that element using two different tools. In one tool, Java/Jaxen, I must use a predicate of the form [. = 'B&O'] to select the node. In another tool, NeoCore XMS, I must use a predicate of the form [. = 'B&amp;O'] to select the node. Finally, I have found that the Mark Logic XQuery engine will accept either predicate phrasing and return the desired node.
 
I have examined the EBNF for XPath 1.0, XPath 2.0 (Draft), and XQuery 1.0 (Draft). For XPath 2.0, the EBNF for StringLiteral is (XPath 1.0's Literal production is the same without the doubled-quote escape):
 
('"' (('"' '"') | [^"])* '"') | ("'" (("'" "'") | [^'])* "'")
 
The XQuery EBNF is:
 
('"' (PredefinedEntityRef | CharRef | ('"' '"') | [^"&])* '"') | ("'" (PredefinedEntityRef | CharRef | ("'" "'") | [^'&])* "'")
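One way to see why the three tools disagree is to decode the same literal under each production. The Python sketch below is a simplified reading of the two EBNF rules quoted above, not any engine's actual lexer; the function names are hypothetical, and it does not check that a character reference denotes a legal XML character.

```python
import re

# Predefined entity references recognised inside XQuery string literals.
PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "quot": '"', "apos": "'"}
# PredefinedEntityRef or CharRef (hex or decimal), anchored at the '&'.
REFERENCE = re.compile(r"&(?:(lt|gt|amp|quot|apos)|#x([0-9a-fA-F]+)|#([0-9]+));")


def _split(literal):
    """Return (delimiter, body) for a quoted literal."""
    if len(literal) < 2 or literal[0] not in "'\"" or literal[-1] != literal[0]:
        raise ValueError("not a quoted string literal")
    return literal[0], literal[1:-1]


def decode_xpath2_literal(literal):
    """XPath 2.0 StringLiteral: a doubled delimiter is the only escape."""
    quote, body = _split(literal)
    out, i = [], 0
    while i < len(body):
        if body[i] == quote:
            if body[i + 1:i + 2] != quote:
                raise ValueError("unescaped delimiter at offset %d" % i)
            out.append(quote)
            i += 2
        else:
            out.append(body[i])  # '&' is an ordinary character here
            i += 1
    return "".join(out)


def decode_xquery_literal(literal):
    """XQuery 1.0 StringLiteral: also expands entity and character references,
    and rejects an '&' that does not start one."""
    quote, body = _split(literal)
    out, i = [], 0
    while i < len(body):
        ch = body[i]
        if ch == "&":
            m = REFERENCE.match(body, i)
            if m is None:
                raise ValueError("bare '&' at offset %d" % i)
            name, hex_digits, dec_digits = m.groups()
            if name:
                out.append(PREDEFINED[name])
            elif hex_digits:
                out.append(chr(int(hex_digits, 16)))
            else:
                out.append(chr(int(dec_digits)))
            i = m.end()
        elif ch == quote:
            if body[i + 1:i + 2] != quote:
                raise ValueError("unescaped delimiter at offset %d" % i)
            out.append(quote)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(decode_xpath2_literal("'B&O'"))      # B&O
print(decode_xpath2_literal("'B&amp;O'"))  # B&amp;O
print(decode_xquery_literal("'B&amp;O'"))  # B&O
```

Under the XPath reading, 'B&O' already equals the element's text, which fits the Jaxen behaviour; under the XQuery reading, 'B&O' is a syntax error and 'B&amp;O' is required, which fits NeoCore. Getting the same node from both spellings matches neither production on its own.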
 
Should the XPath 2.0 and XQuery 1.0 EBNF for StringLiteral be identical? Which of my three tools (Jaxen, NeoCore, Mark Logic) is implementing the target spec correctly? From a developer's perspective, the Mark Logic implementation seems to be the ideal behavior, since I don't have to convert characters such as & into entity references before submitting my query.
 
Thanks in advance,
Steve Condas

Received on Wednesday, 6 October 2004 13:26:14 UTC