Re: [xml-dev] XSLT and XQuery from Jonathan Robie on 2002-01-04 (www-xml-query-comments@w3.org from January 2002)

From: Jonathan Robie <jonathan.robie@softwareag.com>
Date: Fri, 04 Jan 2002 18:43:22 -0500
To: Joe English <jenglish@flightlab.com>, www-xml-query-comments@w3.org, xml-dev@lists.xml.org
Message-Id: <5.1.0.14.0.20020104175303.0451a8c0@softwareag.com>

I believe it is technically feasible to integrate XSLT and XQuery along the 
lines suggested by James in:

http://www.jclark.com/xml/construct.html

However, I don't think it would be a good idea to try to do so before 
XQuery 1.0 is released. This is a cost-benefit decision for me. In the rest 
of this message, it may sound like I don't like the idea James proposes, 
but that's not really true. I think it is an interesting proposal that 
should be considered in due time, but not placed on the critical path for 
XQuery 1.0. The proposal raises a large number of procedural and technical
questions, and I think these should be explored freely over a period of
time, not made into a new set of requirements for an existing standards effort.

Here is the problem that James says he is trying to solve:

>Currently XSLT and XQuery use very similar syntaxes for element and
>attribute construction. For example,
>
><p>This is a <a href="{@ref}">link</a>.</p> works the same in both
>XSLT and XQuery.

There are actually many constructs that are similar, but different, in
XSLT and XQuery, just as many features of Java and C++ are similar, but
different. That does not necessarily mean that Java and C++ should be
unified. There are compelling reasons to unify the path expression
language. I am not yet convinced that there is an equally urgent need to
unify element constructors, and if we did unify them, I think that would
best be done as part of a deeper integration of the two languages.
This is the sort of thing that requires careful exploratory work by a small 
group over time, and I don't think it should be placed on the critical path 
for XQuery 1.0.

>However, there are numerous minor differences:
>
>o  In XSLT, curly braces interpolate expressions inside attributes, but
>are just ordinary characters inside elements. Inside elements,
>expressions are interpolated by using elements of a distinguished
>namespace instead. In XQuery, curly braces are used
>uniformly inside attributes and elements.

XSLT could decide to allow curly braces in elements, with the same meaning 
as in XQuery. This would not require deep cooperation between the two 
working groups.
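For readers less familiar with the XSLT side, the brace rule James describes can be sketched in Python. split_avt is a hypothetical helper written for this note, not part of any library: it splits an XSLT 1.0 attribute value template into literal and expression parts, treating "{{" and "}}" as escaped braces. As a simplification it assumes no expression contains a "}" inside a string literal. In XSLT 1.0 this splitting applies only to attribute values; the XQuery draft applies the same brace convention to element content as well.

```python
from __future__ import annotations


def split_avt(value: str) -> list[tuple[str, str]]:
    """Hypothetical helper: split an XSLT-style attribute value template.

    Returns ("lit", text) and ("expr", text) pairs. "{{" and "}}" outside
    an expression stand for literal braces. Simplification: a "}" inside a
    string literal in an expression is not handled; a real processor must
    tokenize the expression itself.
    """
    parts: list[tuple[str, str]] = []
    literal: list[str] = []
    i, n = 0, len(value)
    while i < n:
        ch = value[i]
        if ch == "{" and value.startswith("{{", i):
            literal.append("{")  # escaped left brace
            i += 2
        elif ch == "{":
            end = value.find("}", i + 1)
            if end == -1:
                raise ValueError("unterminated '{' in attribute value template")
            if literal:
                parts.append(("lit", "".join(literal)))
                literal = []
            # Whitespace around an XPath expression is insignificant.
            parts.append(("expr", value[i + 1:end].strip()))
            i = end + 1
        elif ch == "}" and value.startswith("}}", i):
            literal.append("}")  # escaped right brace
            i += 2
        elif ch == "}":
            raise ValueError("lone '}' outside an expression")
        else:
            literal.append(ch)
            i += 1
    if literal:
        parts.append(("lit", "".join(literal)))
    return parts


print(split_avt("{@ref}"))                 # [('expr', '@ref')]
print(split_avt("page-{@id}.html"))        # [('lit', 'page-'), ('expr', '@id'), ('lit', '.html')]
print(split_avt("{{not an expression}}"))  # [('lit', '{not an expression}')]
```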

>o Elements and attributes whose names are specified by an expression
>use a syntax in XQuery that is not well-formed XML. In XSLT, this is
>done by xsl:element.

The only case in which XQuery's element constructors are not well-formed
XML is an attribute constructor that does not use the quoted syntax:

         <foo bar={ //baz } />

The old computed element constructors have been dropped. Note that you can
express the above using a quoted attribute value, with the same meaning:

         <foo bar="{ //baz }" />

We have an open issue asking whether we should drop one of the above syntaxes.
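The difference between the two attribute syntaxes can be checked with any XML parser. Here is a minimal Python sketch using the standard library's xml.etree.ElementTree; is_well_formed is a hypothetical helper for this illustration. The unquoted form is rejected because XML requires quoted attribute values, while in the quoted form the braces are ordinary characters to the XML parser and only acquire meaning for an XQuery processor.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Hypothetical helper: True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


unquoted = "<foo bar={ //baz } />"
quoted = '<foo bar="{ //baz }" />'

print(is_well_formed(unquoted))          # False: XML attribute values must be quoted
print(is_well_formed(quoted))            # True
print(ET.fromstring(quoted).get("bar"))  # { //baz }  (braces are plain text to XML)
```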

>o In XSLT, documents are parsed first by an XML parser. The XSLT
>processor operates on the infoset produced by the XML parser. This
>means that use of < and & are subject to the usual XML rules. In
>particular, in XSLT when < is used in a comparison operator it needs
>to be escaped as &lt;. In XQuery, parts of the query are processed
>using an XML-like parse, and parts are processed using an XPath-like
>parse; the XPath-like parsing does not operate on the results of the
>XML-like parsing. So you can use "<" as a comparison operator without
>escaping it. In XSLT, character references are recognized uniformly as
>in XSLT; in XQuery, they are recognized only in a few places.

In that last sentence, I assume that "as in XSLT" was intended to read "as 
in XML", right?

>I believe that many users will need to work with both XSLT and XQuery,
>and these numerous subtle differences will be extremely confusing to
>such users.

I do not think that users will be confused to find that "foo[bar < 12]" is
legal in XQuery. I think they would be more confused to find that they have 
to type "foo[bar &lt; 12]". Of course, if you put characters in an XML 
document, they behave as characters do in XML documents, so I am not being 
critical of XSLT here. I think it would also be quite reasonable to come up 
with a standard dummy element to wrap around an XQuery so that you *can* 
embed an XQuery in a document in a standard way that will be recognized by 
all XQuery processors.

But requiring all queries to be in an XML document seems like overkill to me.
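The escaping that XSLT requires, and that an XML-embedded query would also need, can be seen by handing both spellings to an XML parser. In this Python sketch, parsed_test_attr is a hypothetical helper; the XSLT namespace URI is the real one. The parser accepts "&lt;" and reports the attribute value with a plain "<", which is what the XSLT processor sees, and it rejects an unescaped "<" as not well-formed. An XQuery parser reading a query directly, outside XML, never needs the escape.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"


def parsed_test_attr(xsl_if: str) -> str:
    """Hypothetical helper: parse one xsl:if element and return its test
    attribute as an XSLT processor would see it, i.e. after XML parsing."""
    return ET.fromstring(xsl_if).get("test")


escaped = f'<xsl:if xmlns:xsl="{XSL_NS}" test="foo[bar &lt; 12]"/>'
unescaped = f'<xsl:if xmlns:xsl="{XSL_NS}" test="foo[bar < 12]"/>'

print(parsed_test_attr(escaped))  # foo[bar < 12]

try:
    parsed_test_attr(unescaped)
except ET.ParseError as err:
    # XML forbids a literal "<" inside an attribute value.
    print("not well-formed:", err)
```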

>I believe we should take the best aspects of the XSLT and
>XQuery element construction syntax and unify them into a single syntax
>to be used by both XSLT 2.0 and XQuery 1.0.

Why unify just one syntactic construct? Or is this part of completely 
unifying the two languages?

Here are some costs I am concerned about:

First, we have just spent an entire year working together to unify XPath 
and XQuery, and we are not yet done with that work. Unifying the remaining 
portion of XQuery and all of XSLT would certainly take a significant amount 
of time -- I would estimate it as at least an additional year. James' 
proposal is largely about unifying syntax, but in order to unify the syntax 
we would also have to unify the semantics at a very deep level, and we 
would have to take into account the reasons that the two languages evolved 
differently in the first place. We would also have to answer difficult 
questions about the extent to which we could sacrifice compatibility with 
existing XSLT standards.

Second, the two languages often have different syntax for similar but 
slightly different functionality. In fact, in the proposal, James gives a 
good example of this:

   if ($x = 1)
   then
      <foo/>
   else
      <bar/>

   <xsl:choose>
     <xsl:when test="$x = 1">
       <foo/>
     </xsl:when>
     <xsl:otherwise>
       <bar/>
     </xsl:otherwise>
   </xsl:choose>

If we unified the two languages, I assume we would not stop at element 
constructors, so we would want to add an element syntax for if-then-else, 
and a string syntax for choose-when-otherwise. Then again, we might decide 
to abandon one or the other of these syntaxes, but the debate about which 
to abandon would take some time, and we would have this debate for every
single feature in which the two languages differ, or else wind up with a
language that pulls in a bunch of confusingly similar features from both
languages.

Third, the basic approach to programming in XSLT is recursive descent, 
which is a fairly rare pattern of usage in XQuery. If XSLT continues to be 
programmed as it currently is, and XQuery continues to be programmed as it 
currently is, then we will have two distinct user communities using 
different subsets of the common language. Are we suggesting that one 
community or the other adopt the standard approaches of the other 
community? If so, do we have any indication that the other community is 
interested in doing so? If not, will we have a PL/I situation, where people
can be programming in the same language but still can't read each other's code?

Here are the other benefits cited by James:

>Full XML is supported. It is not necessary to subset XML by removing 
>entity references, comments and processing instructions. Furthermore 
>parsing is consistent with XML. For example, XML has complex rules about 
>how whitespace in attribute values is normalized; at the moment XQuery
>must choose between interpreting attribute values inconsistently with XML 
>and requiring all XQuery parsers to replicate XML's complex attribute 
>value normalization algorithm.

Comments and processing instructions are supported in XQuery. Entity
references are not, except for XML's predefined entities and character references. If
people need them, they can embed their queries in an XML document. That 
does not mean that everybody should be required to do so. People who do not 
need character references should not have to type "foo[bar &lt; 7]" when 
they mean "foo[bar < 7]".

The attribute normalization algorithm is easily written as a function. An 
XQuery implementation can call that function.
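As an illustration of how small that function can be, here is a Python sketch of the attribute-value normalization rules of XML 1.0 (section 3.3.3). It was written for this note and is not taken from any XQuery implementation: normalize_attribute is a hypothetical name, it handles only character references and the five predefined entities, it assumes no DTD, and a hypothetical is_cdata flag stands in for the declared attribute type. It also shows the subtlety behind the "complex rules": a literal tab becomes a space, but a tab written as &#x9; survives.

```python
import re

# XML's five predefined entities. DTD-declared entities are out of scope
# for this sketch and are rejected below.
PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "apos": "'", "quot": '"'}

# A character reference (&#x..; or &#..;) or a simple named entity reference.
REFERENCE = re.compile(r"&(#x[0-9A-Fa-f]+|#[0-9]+|[A-Za-z_][\w.-]*);")

# Each literal whitespace character becomes one space.
WS_TO_SPACE = str.maketrans({"\t": " ", "\n": " ", "\r": " "})


def _literal(segment: str) -> str:
    if "<" in segment or "&" in segment:
        raise ValueError("'<' or a stray '&' is not allowed in an attribute value")
    return segment.translate(WS_TO_SPACE)


def normalize_attribute(raw: str, is_cdata: bool = True) -> str:
    """Hypothetical helper: normalize an attribute value per XML 1.0, 3.3.3."""
    # End-of-line handling (XML 1.0, section 2.11) happens first.
    raw = raw.replace("\r\n", "\n").replace("\r", "\n")
    out = []
    pos = 0
    for m in REFERENCE.finditer(raw):
        out.append(_literal(raw[pos:m.start()]))
        name = m.group(1)
        if name.startswith("#x"):
            out.append(chr(int(name[2:], 16)))  # kept as-is, even if whitespace
        elif name.startswith("#"):
            out.append(chr(int(name[1:])))
        elif name in PREDEFINED:
            out.append(PREDEFINED[name])
        else:
            raise ValueError(f"undeclared entity: &{name};")
        pos = m.end()
    out.append(_literal(raw[pos:]))
    value = "".join(out)
    if not is_cdata:
        # Non-CDATA types: trim and collapse runs of spaces (#x20 only).
        value = " ".join(part for part in value.split(" ") if part)
    return value


print(repr(normalize_attribute("a\tb&#x9;c")))      # 'a b\tc'
print(repr(normalize_attribute("  x   y ", False)))  # 'x y'
```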

>Parsing XML is a non-trivial exercise. XML parsers are widely available 
>for all major platforms. Implementation is made easier if implementors can 
>use standard parsers rather than having to write their own nearly-XML parsers.
>
>XQuery/XSLT may need to embed fragments in other XML vocabularies. Other 
>XML vocabularies may need to embed XQuery. For example, XQuery might need 
>to embed fragments of XML Schema. XML Schema has annotations which might 
>use XQuery to express constraints. If XQuery is well-formed XML, this is 
>easy and causes no problems. If not, then every level of embedding adds an 
>extra layer of escaping; this quickly becomes totally unworkable.

I certainly agree that it must be possible to write XQuery, including 
element constructors, in a way that is well-formed XML. This is possible 
today. It does not require deeper integration with XSLT.
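James's point about escaping compounding with each level of embedding can be made concrete. This Python sketch uses the standard library's xml.sax.saxutils.escape; embed_as_text is a hypothetical helper that escapes a query once for each level at which it is embedded as character data. A query wrapped in well-formed XML can instead be nested as markup, so it keeps a single level of escaping however deep it sits.

```python
from xml.sax.saxutils import escape, unescape


def embed_as_text(text: str, levels: int) -> str:
    """Hypothetical helper: escape `text` once for each level at which it is
    embedded as XML character data (rather than nested as markup)."""
    for _ in range(levels):
        text = escape(text)  # escapes &, > and < (ampersand first)
    return text


query = "foo[bar < 7]"
for n in range(4):
    print(n, embed_as_text(query, n))
# 0 foo[bar < 7]
# 1 foo[bar &lt; 7]
# 2 foo[bar &amp;lt; 7]
# 3 foo[bar &amp;amp;lt; 7]

# Each level must be undone by a separate parse (or unescape) step.
assert unescape(unescape(embed_as_text(query, 2))) == query
```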

>Often users want to construct documents that have a lot of boilerplate XML 
>(typically XHTML) and often only a small amount of variable content that 
>depends on the query. For such documents, it is useful to be able to use 
>XML tools to edit and otherwise manipulate the boilerplate.

This feels like the most common of your scenarios. Again, I think that the 
real requirement is to be able to embed XQuery in well-formed XML.

>XML takes care of specifying and detecting character encodings. If you
>don't use XML, then you have to invent your own mechanisms for dealing 
>with this. This is a crucial I18N issue.
>
>XML provides a mechanism for referring to arbitrary Unicode characters. 
>XPath was designed to be embedded in XML documents, and so does not 
>provide its own syntax for character references, but instead expects to 
>rely on XML's mechanism. This all works fine for XSLT. However, with 
>XQuery's parsing model, character references are only recognized in 
>attribute values and literal element content; there's no way to put a 
>character reference in a literal string.

Again, I think all of the above criteria are met by any approach that 
allows XQuery to be embedded in an XML document. We need not require that 
all XQueries be embedded in XML documents.

So here's my take:

1. It really is important to be able to embed XQuery in an XML document. We 
should also come up with a trivial, but standard, way of doing this. It 
basically amounts to agreeing on the name of the wrapper element:

         <xq:yourNameHere xmlns:xq="http://www.w3.org/2002/XQuery">
                 //foo
         </xq:yourNameHere>

2. It may be very interesting to think about the potential benefits of 
integrating XSLT and XQuery. If there is a compelling benefit to 
integrating the languages, we should consider doing so after the release of 
XQuery 1.0.
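Point 1 needs nothing beyond a standard XML parser. The following Python sketch assumes the placeholder wrapper name and namespace URI from the example in point 1 (neither is a defined standard), and extract_query is a hypothetical helper. It shows that the XML parser, not the query parser, resolves "&lt;" and character references such as "&#x20AC;" before the query text is handed on, which also addresses James's point about character references in string literals for users who choose to embed.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace and element name from the message; not a standard.
XQ_NS = "http://www.w3.org/2002/XQuery"
WRAPPER = f"{{{XQ_NS}}}yourNameHere"  # ElementTree's {namespace}local form


def extract_query(document: str) -> str:
    """Hypothetical helper: return the text of the first wrapper element,
    after XML parsing has resolved escapes and character references."""
    root = ET.fromstring(document)
    # iter() visits the root as well, so a bare wrapper document also works.
    wrapper = next(root.iter(WRAPPER), None)
    if wrapper is None:
        raise ValueError("no XQuery wrapper element found")
    return "".join(wrapper.itertext()).strip()


doc = """<config xmlns:xq="http://www.w3.org/2002/XQuery">
  <xq:yourNameHere>
    //item[price &lt; 100 and currency = "&#x20AC;"]
  </xq:yourNameHere>
</config>"""

query = extract_query(doc)
# query == '//item[price < 100 and currency = "\u20ac"]'
```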

Jonathan
Received on Friday, 4 January 2002 18:43:41 UTC