RE: Entities in predicates from Michael Kay on 2004-10-06 (www-ql@w3.org from October to December 2004)

From: Michael Kay <mhk@mhk.me.uk>
Date: Wed, 6 Oct 2004 14:37:06 +0100
To: "'Steve Condas'" <scondas@yahoo.com>, <www-ql@w3.org>
Message-ID: <E1CFBzC-00039G-4R@frink.w3.org>
Actually, looking at the spec more carefully, the only character you are
required to escape in an XQuery string literal is "&". I think that's a good
rule; if someone accidentally leaves off the ";" at the end of an entity
reference, or forgets the "#" in a character reference such as &#xa;, you
want that to be an error, rather than just leading to "no results" after a
long search.
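
To make the point concrete, here is a rough Python sketch of how a processor might expand references inside an XQuery string literal while treating a stray "&" as an error. It is an illustration only, not taken from any spec text or product: the names PREDEFINED and decode_string_literal_body are hypothetical, and the doubled-quote escape is left out to keep it short.

```python
import re

# The five predefined entity references that XQuery shares with XML.
PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "quot": '"', "apos": "'"}

# One reference: &name; or &#decimal; or &#xhex;
_REF = re.compile(r"&(?:([a-z]+)|#([0-9]+)|#x([0-9a-fA-F]+));")


def decode_string_literal_body(body: str) -> str:
    """Expand references in the text between the quotes of a string literal.

    Hypothetical helper: raises ValueError on any "&" that does not start
    a well-formed reference, instead of silently treating it as text.
    Doubled-quote escapes are not handled here.
    """
    out = []
    pos = 0
    while pos < len(body):
        ch = body[pos]
        if ch != "&":
            # Ordinary character, including "<", is copied through as-is.
            out.append(ch)
            pos += 1
            continue
        m = _REF.match(body, pos)
        if m is None:
            # Covers a bare "&" and a reference with the ";" left off.
            raise ValueError(f"malformed reference at offset {pos}")
        name, dec, hexa = m.groups()
        if name is not None:
            if name not in PREDEFINED:
                # Covers "&xa;" where the "#" was forgotten.
                raise ValueError(f"unknown entity &{name};")
            out.append(PREDEFINED[name])
        elif dec is not None:
            out.append(chr(int(dec)))
        else:
            out.append(chr(int(hexa, 16)))
        pos = m.end()
    return "".join(out)
```

With this sketch, "B&amp;O" decodes to B&O, while "B&O", "B&ampO" and "a&xa;b" are all rejected up front rather than quietly matching nothing.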
 
Michael Kay
http://www.saxonica.com/


  _____  

From: Steve Condas [mailto:scondas@yahoo.com] 
Sent: 06 October 2004 14:26
To: Michael Kay; www-ql@w3.org
Subject: RE: Entities in predicates


Michael,
 
Your assumption about the form of the element is correct (B&amp;O).  Thank
you for your description of the differences between XPath and XQuery.  Can
you tell me, though, what is gained by forcing the XQuery user/client to
escape XML entities?  In real XML it is obvious: you must disambiguate a "<"
in content from the start of a new element.  It is not obvious to me that
you must do the same in an XQuery since the context makes it clear how the
"<" is being used.  It seems to me that XQuery StringLiteral should be in
the format of the InfoSet, thus [. = "B&O < Reading"] should be legal and
valid for both XPath and XQuery.  The XQuery engine, just like an XPath
engine, can easily tell that the & in this case is not an entity reference
and can correctly interpret the "<" as literal content of the StringLiteral.
 
Thanks,
Steve Condas
 

Michael Kay <mhk@mhk.me.uk> wrote:

I assume your element is actually written as <Railroad>B&amp;O</Railroad>
(otherwise it would not be well-formed XML).
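
A quick way to check the well-formedness point is to feed both spellings to an XML parser. The sketch below uses Python's standard xml.etree.ElementTree purely as a convenient parser; the variable names are only for illustration.

```python
import xml.etree.ElementTree as ET

# A bare "&" in element content is not well-formed, so the parser rejects it.
try:
    ET.fromstring("<Railroad>B&O</Railroad>")
    bare_parses = True
except ET.ParseError:
    bare_parses = False

# With the reference written out, the parser hands back the expanded text,
# so the value stored in the tree is the three characters B, &, O.
elem = ET.fromstring("<Railroad>B&amp;O</Railroad>")
text = elem.text
```

Here bare_parses ends up False, and text holds the expanded value, which is what any query against the parsed document is compared with.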
 
XPath and XQuery are deliberately different here. XPath is designed to be
embedded in XML documents, for example XSLT, XML Schema, or Schematron
documents. It's therefore expected that the XML parser will preprocess the
XPath expressions to expand any entity or character references. The XPath
grammar describes the syntax after performing this expansion. Alternatively,
XPath expressions might be written as string literals embedded in languages
such as C or Java, in which case the C or Java escaping conventions will be
used instead of the XML conventions, for example newline will be written \n
rather than &#xa;.
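
Both situations can be sketched in a few lines of Python. This is only an illustration under some assumptions: the <rule test="..."> element is a made-up stand-in for something like an XSLT or Schematron attribute, and ElementTree implements only a small subset of XPath (the [.='text'] predicate needs Python 3.7 or later).

```python
import xml.etree.ElementTree as ET

# Case 1: XPath embedded in XML. The expression sits in an attribute
# (hypothetical <rule> element), so "&" and "<" are written as references.
rule = ET.fromstring(
    """<rule test="Railroad = 'B&amp;O' and miles &lt; 100"/>"""
)
# The XML parser expands the references before any XPath processor
# would see the expression.
xpath_seen_by_processor = rule.get("test")

# Case 2: XPath written as a string literal in a host language.
data = ET.fromstring(
    "<Lines><Railroad>B&amp;O</Railroad><Railroad>Reading</Railroad></Lines>"
)
# No XML parser touches this string, so the "&" is written unescaped;
# only the host language's own escaping rules apply.
matches = data.findall("./Railroad[.='B&O']")
```

In the first case the processor is handed Railroad = 'B&O' and miles < 100; in the second, the unescaped predicate selects the one Railroad element whose value is B&O, which matches the Jaxen behaviour described in the original question.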
 
XQuery is a free-standing language and isn't XML-based, so it has to have
its own machinery for escaping special characters, and it has chosen a
mechanism that is very close (but not identical) to the one used in XML.
 
When you write XPath-embedded-in-XML, and when you write XQuery, the rules
end up being very similar: in both cases special characters such as & in
string literals must be written &amp;. The difference between the two is in
handling characters outside string literals. XPath-embedded-in-XML requires
the "<" operator to be escaped as &lt; while XQuery requires it to be
unescaped, and distinguishes it from XML-like markup by its syntactic
context. 
 
Michael Kay
http://www.saxonica.com/


  _____  

From: www-ql-request@w3.org [mailto:www-ql-request@w3.org] On Behalf Of
Steve Condas
Sent: 05 October 2004 20:03
To: www-ql@w3.org
Subject: Entities in predicates


I am new to XPath/XQuery, so please forgive my ignorance.  I am caught
between multiple interpretations of XPath/XQuery StringLiteral that are
causing me development difficulties.  I have an XML element that contains an
entity (e.g. <Railroad>B&O</Railroad>), and I am trying to find that element
using two different tools.  In one tool, Java/Jaxen, I must use a predicate
of the form [. = 'B&O'] to select the node.  In another tool, NeoCore XMS, I
must use a predicate of the form [. = 'B&amp;O'] to select the node.
Finally, I have found that the Mark Logic XQuery engine will accept either
predicate phrasing and return the desired node.
 
I have examined the EBNF for XPath 1.0,  XPath 2.0 (Draft), and XQuery 1.0
(Draft).  For the two XPath specs, the EBNF for StringLiteral is:
 
('"' (('"' '"') | [^"])* '"') | ("'" (("'" "'") | [^'])* "'")
 
The XQuery EBNF is:
 
('"' (PredefinedEntityRef | CharRef | ('"' '"') | [^"&])* '"') |
("'" (PredefinedEntityRef | CharRef | ("'" "'") | [^'&])* "'")
 
Should the XPath 2.0 and XQuery 1.0 EBNF for StringLiteral be identical?
Which of my three tools (Jaxen, NeoCore, MarkLogic) is implementing the
target spec correctly?  From a developer's perspective, it seems as though
the Mark Logic implementation has the ideal behavior, since I don't have to
convert special characters to entity references before submitting my query.
 
Thanks in advance,
Steve Condas
Received on Wednesday, 6 October 2004 13:37:42 UTC