RE: Entities in predicates from Michael Kay on 2004-10-05 (www-ql@w3.org from October to December 2004)

From: Michael Kay <mhk@mhk.me.uk>
Date: Tue, 5 Oct 2004 23:13:49 +0100
To: "'Steve Condas'" <scondas@yahoo.com>, <www-ql@w3.org>
Message-ID: <E1CExZg-0004ZI-Db@frink.w3.org>

I assume your element is actually written as <Railroad>B&amp;O</Railroad>
(otherwise it would not be well-formed XML).
 
XPath and XQuery are deliberately different here. XPath is designed to be
embedded in XML documents, for example XSLT, XML Schema, or Schematron
documents. It's therefore expected that the XML parser will preprocess the
XPath expressions to expand any entity or character references. The XPath
grammar describes the syntax after performing this expansion. Alternatively,
XPath expressions might be written as string literals embedded in languages
such as C or Java, in which case the C or Java escaping conventions will be
used instead of the XML conventions, for example newline will be written \n
rather than &#xa;.
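
As a rough illustration of the host-language case, here is a small sketch in Python, standing in for C or Java. It uses the standard xml.etree.ElementTree module, which implements only a subset of XPath. The <Lines> wrapper and the second <Railroad> element are invented for the example. By the time the path is evaluated, the parser has already expanded &amp; in the document, and the path itself is an ordinary Python string, so the literal carries a bare ampersand:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; in the serialized XML the ampersand is &amp;.
doc = ET.fromstring(
    "<Lines><Railroad>B&amp;O</Railroad><Railroad>C&amp;O</Railroad></Lines>"
)

# The parser has expanded the entity reference: each text node holds a bare '&'.
texts = [r.text for r in doc.iter("Railroad")]

# The path is a plain Python string, so only Python's escaping rules apply
# and the literal is written with an unescaped '&'.
hits = doc.findall(".//Railroad[.='B&O']")

# Writing &amp; here compares against the literal characters '&amp;'
# and therefore selects nothing.
misses = doc.findall(".//Railroad[.='B&amp;O']")

print(texts, len(hits), len(misses))
```

This mirrors the behaviour you describe for Jaxen, where the predicate has to be written [. = 'B&O'].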
 
XQuery is a free-standing language and isn't XML-based, so it has to have
its own machinery for escaping special characters, and it has chosen a
mechanism that is very close (but not identical) to the one used in XML.
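
To make that machinery concrete, here is a minimal sketch, in Python, of how a processor might turn an XQuery string literal into the string it denotes. It covers only the constructs in the StringLiteral production (predefined entity references, character references, and doubled quotes); the function name xquery_literal_value is made up, and this is not taken from any real XQuery engine:

```python
import re

# The five predefined entity references allowed in an XQuery string literal.
PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "quot": '"', "apos": "'"}

# Matches a predefined entity reference, a decimal or hexadecimal character
# reference, or (last alternative) a stray '&' that is none of these.
_REF = re.compile(r"&(?:(lt|gt|amp|quot|apos)|#([0-9]+)|#x([0-9a-fA-F]+));|&")


def xquery_literal_value(literal):
    """Return the string denoted by a quoted XQuery string literal (sketch)."""
    if len(literal) < 2 or literal[0] not in "\"'" or literal[-1] != literal[0]:
        raise ValueError("not a quoted string literal")
    quote, body = literal[0], literal[1:-1]
    # A doubled delimiter stands for a single delimiter character.
    body = body.replace(quote * 2, quote)

    def expand(match):
        name, dec, hexa = match.groups()
        if name:
            return PREDEFINED[name]
        if dec:
            return chr(int(dec))
        if hexa:
            return chr(int(hexa, 16))
        # A bare '&' is not permitted by the grammar.
        raise ValueError("unescaped '&' in string literal")

    return _REF.sub(expand, body)
```

Under these assumptions, xquery_literal_value("'B&amp;O'") yields the three-character string B&O, while the literal 'B&O' is rejected because the grammar excludes a bare ampersand.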
 
When you write XPath-embedded-in-XML, and when you write XQuery, the rules
end up being very similar: in both cases special characters such as & in
string literals must be written &amp;. The difference between the two is in
handling characters outside string literals. XPath-embedded-in-XML requires
the "<" operator to be escaped as &lt; while XQuery requires it to be
unescaped, and distinguishes it from XML-like markup by its syntactic
context. 
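
The XML-embedded side can be seen with any XML parser. In this sketch (Python's standard xml.etree.ElementTree; the <rule> element and its test attribute are made up, standing in for something like an XSLT or Schematron attribute), both the ampersand in the literal and the "<" operator are escaped in the document, and the parser hands the XPath processor the expanded text:

```python
import xml.etree.ElementTree as ET

# A made-up host element; the XPath expression lives in an attribute value,
# so both '&' and '<' must be escaped to keep the document well-formed.
host = ET.fromstring(
    """<rule test="Railroad[. = 'B&amp;O'] and @miles &lt; 400"/>"""
)

# What the XPath processor receives after the XML parser has done its work.
xpath_source = host.get("test")
print(xpath_source)  # Railroad[. = 'B&O'] and @miles < 400
```

In XQuery the same test would, by the rules above, keep &amp; inside the string literal but use a bare < for the operator.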
 
Michael Kay
http://www.saxonica.com/


  _____  

From: www-ql-request@w3.org [mailto:www-ql-request@w3.org] On Behalf Of
Steve Condas
Sent: 05 October 2004 20:03
To: www-ql@w3.org
Subject: Entities in predicates


I am new to XPath/XQuery, so please forgive my ignorance.  I am caught
between multiple interpretations of XPath/XQuery StringLiteral that are
causing me development difficulties.  I have an XML element that contains an
entity (e.g. <Railroad>B&O</Railroad>), and I am trying to find that element
using two different tools.  In one tool, Java/Jaxen, I must use a predicate
of the form [. = 'B&O'] to select the node.  In another tool, NeoCore XMS, I
must use a predicate of the form [. = 'B&amp;O'] to select the node.
Finally, I have found that the Mark Logic XQuery engine will accept either
predicate phrasing and return the desired node.
 
I have examined the EBNF for XPath 1.0,  XPath 2.0 (Draft), and XQuery 1.0
(Draft).  For the two XPath specs, the EBNF for StringLiteral is:
 
('"' (('"' '"') | [^"])* '"') | ("'" (("'" "'") | [^'])* "'")
 
The XQuery EBNF is:
 
('"' (PredefinedEntityRef | CharRef | ('"' '"') | [^"&])* '"') |
("'" (PredefinedEntityRef | CharRef | ("'" "'") | [^'&])* "'")
 
Should the XPath 2.0 and XQuery 1.0 EBNF for StringLiteral be identical?
Which of my three tools (Jaxen, NeoCore, MarkLogic) is implementing the
target spec correctly?  From a developer's perspective, it seems as though
the Mark Logic implementation is the ideal behavior since I don't have to
expand entities to entity references before submitting my query.
 
Thanks in advance,
Steve Condas
Received on Tuesday, 5 October 2004 22:14:24 UTC