Re: Resolution of XSLT issue 99: Constructing Sequences in XSLT from Jeni Tennison on 2002-09-28 (public-qt-comments@w3.org from September 2002)

From: Jeni Tennison <jeni@jenitennison.com>
Date: Sat, 28 Sep 2002 12:16:06 +0100
To: w3c-xsl-wg@w3.org
Cc: public-qt-comments@w3.org
Message-ID: <119159002924.20020928121606@jenitennison.com>
Mike Kay wrote:
> This note responds to an issue originally raised on 11 Jan 2002 by
> Jeni Tennison, which is archived at:
>
> http://lists.w3.org/Archives/Public/xsl-editors/2002JanMar/0050.html
>
> The XSL WG has considered the proposal on a number of occasions, and
> we have finally decided not to take this route.

Thanks for responding to my proposal; I assumed that it had been
discarded months ago! I can't say that I'm surprised that the WG has
decided not to take it forward, and I am sympathetic to the pragmatic
considerations that you describe.

Anyway, I didn't really want my first post as a member of the WG to be
like this, but I didn't get to argue my case beyond the initial
proposal and I guess that a position statement would probably be
useful for those that haven't read my posts elsewhere, and that this
is as good an opportunity as any to make one. I hope you'll permit me
this one indulgence before I start working with you on the important
goal of getting XPath and XSLT out the door.

---

> On the minus side, the boundary between XSLT and XPath becomes less
> clear-cut, and harder to explain. At present, XSLT instructions are
> concerned either with creating nodes and constructing trees, or with
> controlling the execution of other instructions (iteration,
> conditionals, calls). XSLT instructions at present can only return
> newly constructed nodes; XPath expressions never do so (except by
> making a function call to XSLT). This has the merit that the
> stylesheet tree always acts as a template for the construction of
> the result tree, which is a model that users find easy to
> understand.

I do not think that users find this model natural or easy to
understand. How many times do we see evidence that XSLT 1.0 users
think that:

  <xsl:variable name="foo">
    <xsl:value-of select="foo" />
  </xsl:variable>

sets the $foo variable to a string or to an element rather than a
result tree fragment? I think that the design of XPath+XSLT 1.0 may
make us *think* that users understand the distinctions between the
three because (due to the weak typing) it rarely matters when they
don't. I think that we'll find with XPath+XSLT 2.0 that they actually
don't understand the distinctions, and that they will be revealed to
be a lot more confused than we imagine.

Further, as I've argued before, I don't believe that "XSLT for
newly-constructed nodes, XPath for other things" actually works as a
distinction. Speaking for myself, I find it far more natural to make
the distinction in terms of the processes I can perform (e.g.
iterating, conditions, creation of nodes) than in terms of what I
perform those processes on. Switching between the two languages just
because I'm working with a different kind of data doesn't seem at all
natural to me as a user (though it may make perfect sense for an
implementer). Judging from the feedback that I've received after this
proposal and other posts, I do not think that I'm alone in feeling
this way.

One of the reasons that I don't think it works is that the main thing
that we do with XSLT stylesheets is process a source document. When we
process a source document using XSLT, we get a lot of help, especially
when dealing with recursive structures, from the ability to apply
templates. When we see someone doing something like:

  <xsl:for-each select="node()">
    <xsl:choose>
      <xsl:when test="self::strong">
        ...
      </xsl:when>
      <xsl:when test="self::emphasis">
        ...
      </xsl:when>
      ...
    </xsl:choose>
  </xsl:for-each>

we leap on it as being bad stylesheet design. And yet when we have to
process source trees using XPath 2.0 (because we want to generate
simple values, or collect existing nodes), we can't use the template
approach, so we're forced to use this ugly and hard-to-extend
construction.

Let me give you an example. This is a simplified version of a real
problem from a real project that I'm working on. In this project,
documents are of three kinds: data, metadata and collections. A
collection document points to multiple data or other collection
objects; each data document points to a metadata document. My task was
to collect all the unique <creator> elements from all the metadata of
all the referenced data documents to create some metadata for the
collection as a whole. The best way to approach this, I think, is to
collect together all the creator metadata and then use
distinct-values() to get the unique <creator> elements.

According to the current XPath/XSLT split, this is an XPath job
because it needs to return existing nodes. However, because the
elements need to be collected recursively, I have to use a recursive
function to do it:

<xsl:variable name="creators" select="my:creators(/)" />

<xsl:function name="my:creators">
  <xsl:param name="docs" select="/.." />
  <xsl:result select="
    for $d in $docs
    return if ($d/collection)
           then my:creators($d/collection/data/@xlink:href)
           else document($d/data/metadata/@xlink:href)/metadata/creator
    " />
</xsl:function>

Note that the if/then/else within the for loop here is exactly like
the choose/when/otherwise in the example above.

This can be compared to using the natural mode of operation in XSLT,
which is to use templates to traverse the source trees. With sequence
construction in XSLT, the solution would have been:

<xsl:variable name="creators">
  <xsl:apply-templates select="/" mode="creators" />
</xsl:variable>

<xsl:template match="collection" mode="creators">
  <xsl:apply-templates select="document(data/@xlink:href)"
                       mode="creators" />
</xsl:template>

<xsl:template match="data" mode="creators">
  <xsl:item
    select="document(metadata/@xlink:href)/metadata/creator" />
</xsl:template>

As well as being a more "natural" (to an XSLT user) solution to the
problem, if I now decide that I actually want to create some 'creator'
elements and return those new nodes rather than the existing ones
(perhaps because I want to alter them a little -- add an extra
attribute or something), this change is very easy to make in the
latter stylesheet, but requires total rewriting of the former.

> Under the proposal, it would not always be obvious when XSLT
> instructions are building a tree and when they are building a
> sequence. For example, consider:
>
> <xsl:template name="x">
>   <xsl:text>[</xsl:text>
>   <xsl:value-of select="$param"/>
>   <xsl:text>]</xsl:text>
> </xsl:template>
>
> If this is called while constructing a tree, it returns a single
> text node. If it could also be called while constructing a sequence,
> it would return a sequence of three text nodes. Users might easily
> change the implementation of the template (for example, to use
> concat()), without realising the consequences for some callers.

I thought that I'd expressed this in the original proposal, but
perhaps not. I'd made the assumption that the rule that says that text
nodes should be combined when added to a tree would be extended to say
that when you create a sequence of newly-created text nodes as in the
above, they are merged into a single text node. To create a sequence
of separate newly-created text nodes (a rare requirement, in my
opinion), you would use <xsl:item> as follows:

<xsl:template name="x">
  <xsl:item>[</xsl:item>
  <xsl:item>
    <xsl:value-of select="$param"/>
  </xsl:item>
  <xsl:item>]</xsl:item>
</xsl:template>

This resolves the cited problem, I think, but perhaps I'm missing
something.

> There are arguments in favour of keeping XPath as small as possible
> (and therefore putting functionality at the XSLT level wherever
> there is a choice), but there are also arguments the other way:
>
> (a) There are advantages both to implementors and to users in
> maximising the common subset that XSLT shares with XQuery, and this
> argument leads to such shared functionality sitting in XPath.

While I strongly agree that where there's shared functionality between
XQuery and XPath+XSLT there should be a shared data model and set of
operations and functions, I do not necessarily believe that this means
there should be a shared syntax. I do not think that either language
benefits from being shackled together so closely: XQuery ends up with
"line noise" that they dislike, XPath ends up with verbosity that XSLT
users dislike. Of course they should use shared syntax where it's
appropriate (for example location paths, operators and function
calls), but I think that there's a balance to be made here and that
the current extreme overlap will damage XSLT in the long run.

> (b) Functionality available in XPath (such as conditional
> expressions) is usable in contexts such as key definitions and sort
> keys, where otherwise a call-back from XPath to XSLT would be
> needed.

I'm not sure what you mean by a call-back from XPath to XSLT?
Presumably you mean that the user would have to write an extension
function in order to deal with it? If that's a problem, then I think
it's a problem in the current set-up as well. As I outlined in my
original proposal, and demonstrated in the above example, there are
lots of things that XPath cannot do all on its own; you're not going
to be able to support users doing everything in XPath unless you have
function definitions in XPath, and I *really* hope that ain't gonna
happen.

I should also make clear, in case it wasn't already, that I am fully
in favour of simple conditional expressions (e.g. condition ?
true-value : false-value) and simple mapping expressions (e.g.
sequence -> map-function) within XPath 2.0. I believe that these,
together with date-time data types and the new aggregation functions
(min(), max(), avg()) would handle well over 80% of the requirement
for functionality in XPath to support key and sort values. The
remainder can always be handled through user-defined functions.

> (c) Functionality implemented at the XPath level is probably easier
> to optimize, because within the context of an XPath expression,
> there are no side-effects, whereas XSLT instructions have the
> side-effect of writing nodes to the result tree.

I'll have to take your word for that. I don't understand, though, how
XQuery manages to deal with optimising around the side-effect of
writing to the result tree (which presumably it can do?) and yet XSLT
can't. I don't understand why:

  for $b in //books { <book>...</book> }

is any more optimisable than:

  <xsl:for-each select="//books"><book>...</book></xsl:for-each>

These seem to be slightly different syntaxes for roughly the same
thing; I don't see why they can't be interpreted in the same way.

Further, I have to say that I don't think that optimisability is as
high on my priorities for XPath/XSLT 2.0 as usability.

> (d) Functionality implemented at the XPath level is available in
> standalone XPath environments, for example XPath used within the
> context of XPointer or DOM. Since the XPath data model relies
> critically on sequences, some mechanism for constructing sequences
> is needed that is not dependent on XSLT.

Actually, I view this as an argument for *simplifying* XPath rather
than one for packing as much functionality as possible into it. I
think that it's much more likely that a Java programmer using DOM
would use Java methods to construct sequences of integers than that
they'd use XPath to do so. Adding functionality to XPath just makes it
harder to get XPath added to XPointer/DOM/etc. implementations. Even
with XPath 1.0, other standards that use XPath 1.0 (I'm thinking W3C
XML Schema and XForms in particular) have defined their own subsets
because they didn't want to inflict the burden of implementing all of
XPath on the implementers of their technologies. This will be even
more true with XPath 2.0.

> Finally, there are pragmatic considerations. Although it would
> almost certainly be possible to design a language along the lines of
> Jeni Tennison's proposal that met all the XSLT/XPath 2.0
> requirements, it would require a lot of rework and a lot of
> negotiation between the XSL and XQuery working groups. If the
> current language were badly broken, we would be justified in
> tackling this rework. But the current language works, and we want to
> get it finished. In fact, if we did embark on making the changes
> required by this proposal, there is a serious prospect that we would
> end up with a language in which we had added sequence construction
> to XSLT but failed to remove anything from XPath. We would thus have
> increased duplication between the languages instead of reducing it.

Oh well, since I'm writing this diatribe anyway I'll just say that no
one in the XSLT community will thank the XSL WG for bringing out a
standard quickly if that standard fails to recognise the requirements
of that community. Just because you can do all you need to do in a
language (it "works") does not mean that the language is well-designed
or easy to use. I do understand the desire to see XSLT get out soon,
and to not discard the time, effort and patience that the XSL WG has
put into the current design, but I also think that it's worth taking
the time to get it right. It will involve even more rework to have to
start again in however-many months time if it's rejected as a
Candidate or Proposed Recommendation.

---

Apologies for the rant. Now I've got it off my chest, I'm looking
forward to the real work. I hope that you'll bear with me as I get up
to speed on the latest drafts and discussion.

> (Welcome to the group, Jeni!)

Thanks Mike :)

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/
Received on Saturday, 28 September 2002 07:23:28 UTC