RE: Resolution of XSLT issue 99: Constructing Sequences in XSLT from Kay, Michael on 2002-09-30 (public-qt-comments@w3.org from September 2002)

From: Kay, Michael <Michael.Kay@softwareag.com>
Date: Mon, 30 Sep 2002 20:14:59 +0200
To: Jeni Tennison <jeni@jenitennison.com>, w3c-xsl-wg@w3.org
Cc: public-qt-comments@w3.org
Message-ID: <DFF2AC9E3583D511A21F0008C7E621060453DBE1@daemsg02.software-ag.de>
Thanks, Jeni

I'm not going to attempt a detailed counter-argument, for two reasons:

(a) I actually think the arguments are fairly finely balanced. If I tried to
put forward the "case for the prosecution" it would give the impression that
I thought your arguments were wrong, which isn't the case. We're stuck with
a two-language system, and I think any decisions on where to put the
boundary between the two languages are going to be good for some use cases
and bad for others. My view is that the boundary we have got in the current
specs is at least as good as any other, and gives at least some criteria we
can use to prevent creeping overlap of functionality between the languages.

(b) I want to focus my energy on the open issues that still need work. This
note arose from an attempt to close down as many open issues as we can, on
the basis that we can't keep talking for ever without making decisions.

Michael Kay

> -----Original Message-----
> From: Jeni Tennison [mailto:jeni@jenitennison.com] 
> Sent: 28 September 2002 12:16
> To: w3c-xsl-wg@w3.org
> Cc: public-qt-comments@w3.org
> Subject: Re: Resolution of XSLT issue 99: Constructing 
> Sequences in XSLT
> 
> 
> 
> Mike Kay wrote:
> > This note responds to an issue originally raised on 11 Jan 2002 by 
> > Jeni Tennison, which is archived at:
> >
> > http://lists.w3.org/Archives/Public/xsl-editors/2002JanMar/0050.html
> >
> > The XSL WG has considered the proposal on a number of occasions, and
> > we have finally decided not to take this route.
> 
> Thanks for responding to my proposal; I assumed that it had 
> been discarded months ago! I can't say that I'm surprised 
> that the WG has decided not to take it forward, and I am 
> sympathetic to the pragmatic considerations that you describe.
> 
> Anyway, I didn't really want my first post as a member of the 
> WG to be like this, but I didn't get to argue my case beyond 
> the initial proposal and I guess that a position statement 
> would probably be useful for those that haven't read my posts 
> elsewhere, and that this is as good an opportunity as any to 
> make one. I hope you'll permit me this one indulgence before 
> I start working with you on the important goal of getting 
> XPath and XSLT out the door.
> 
> ---
> 
> > On the minus side, the boundary between XSLT and XPath becomes less
> > clear-cut, and harder to explain. At present, XSLT instructions are
> > concerned either with creating nodes and constructing trees, or with
> > controlling the execution of other instructions (iteration,
> > conditionals, calls). XSLT instructions at present can only return
> > newly constructed nodes; XPath expressions never do so (except by
> > making a function call to XSLT). This has the merit that the
> > stylesheet tree always acts as a template for the construction of the
> > result tree, which is a model that users find easy to understand.
> 
> I do not think that users find this model natural or easy to 
> understand. How many times do we see evidence that XSLT 1.0 
> users think that:
> 
>   <xsl:variable name="foo">
>     <xsl:value-of select="foo" />
>   </xsl:variable>
> 
> sets the $foo variable to a string or to an element rather 
> than a result tree fragment? I think that the design of 
> XPath+XSLT 1.0 may make us *think* that users understand the 
> distinctions between the three because (due to the weak 
> typing) it rarely matters when they don't. I think that we'll 
> find with XPath+XSLT 2.0 that they actually don't understand 
> the distinctions, and that they will be revealed to be a lot 
> more confused than we imagine.
> 
> Further, as I've argued before, I don't believe that "XSLT 
> for newly-constructed nodes, XPath for other things" actually 
> works as a distinction. Speaking for myself, I find it far 
> more natural to make the distinction in terms of the 
> processes I can perform (e.g. iterating, conditions, creation 
> of nodes) than in terms of what I perform those processes on. 
> Switching between the two languages just because I'm working 
> with a different kind of data doesn't seem at all natural to 
> me as a user (though it may make perfect sense for an 
> implementer). Judging from the feedback that I've received 
> after this proposal and other posts, I do not think that I'm 
> alone in feeling this way.
> 
> One of the reasons that I don't think it works is that the 
> main thing that we do with XSLT stylesheets is process a 
> source document. When we process a source document using 
> XSLT, we get a lot of help, especially when dealing with 
> recursive structures, from the ability to apply templates. 
> When we see someone doing something like:
> 
>   <xsl:for-each select="node()">
>     <xsl:choose>
>       <xsl:when test="self::strong">
>         ...
>       </xsl:when>
>       <xsl:when test="self::emphasis">
>         ...
>       </xsl:when>
>       ...
>     </xsl:choose>
>   </xsl:for-each>
> 
> we leap on it as being bad stylesheet design. And yet when we 
> have to process source trees using XPath 2.0 (because we want 
> to generate simple values, or collect existing nodes), we 
> can't use the template approach, so we're forced to use this 
> ugly and hard-to-extend construction.
> 
> Let me give you an example. This is a simplified version of a 
> real problem from a real project that I'm working on. In this 
> project, documents are of three kinds: data, metadata and 
> collections. A collection document points to multiple data or 
> other collection objects; each data document points to a 
> metadata document. My task was to collect all the unique 
> <creator> elements from all the metadata of all the 
> referenced data documents to create some metadata for the 
> collection as a whole. The best way to approach this, I 
> think, is to collect together all the creator metadata and then use
> distinct-values() to get the unique <creator> elements.
> 
> According to the current XPath/XSLT split, this is an XPath 
> job because it needs to return existing nodes. However, 
> because the elements need to be collected recursively, I have 
> to use a recursive function to do it:
> 
> <xsl:variable name="creators" select="my:creators(/)" />
> 
> <xsl:function name="my:creators">
>   <xsl:param name="docs" select="/.." />
>   <xsl:result select="
>     for $d in $docs
>     return if ($d/collection)
>            then my:creators(document($d/collection/data/@xlink:href))
>            else document($d/data/metadata/@xlink:href)/metadata/creator
>     " />
> </xsl:function>
> 
> Note that the if/then/else within the for loop here is 
> exactly like the choose/when/otherwise in the example above.
> 
> This can be compared to using the natural mode of operation 
> in XSLT, which is to use templates to traverse the source 
> trees. With sequence construction in XSLT, the solution would 
> have been:
> 
> <xsl:variable name="creators">
>   <xsl:apply-templates select="/" mode="creators" />
> </xsl:variable>
> 
> <xsl:template match="collection" mode="creators">
>   <xsl:apply-templates select="document(data/@xlink:href)"
>                        mode="creators" />
> </xsl:template>
> 
> <xsl:template match="data" mode="creators">
>   <xsl:item
>     select="document(metadata/@xlink:href)/metadata/creator" />
> </xsl:template>
> 
> As well as being a more "natural" (to an XSLT user) solution 
> to the problem, if I now decide that I actually want to 
> create some 'creator' elements and return those new nodes 
> rather than the existing ones (perhaps because I want to 
> alter them a little -- add an extra attribute or something), 
> this change is very easy to make in the latter stylesheet, 
> but requires total rewriting of the former.
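A language-neutral sketch of the same traversal may make the comparison concrete. It is written in Python rather than XSLT so that it runs standalone; the in-memory `DOCS` store, the `document()` stand-in and the dictionary-of-handlers dispatch are hypothetical modelling choices, not part of the proposal. Each handler plays the role of one template rule in the "creators" mode.

```python
from typing import Callable, Dict, List

# Hypothetical document store standing in for document(): each "document"
# is a dict whose 'kind' names its root element.
DOCS: Dict[str, dict] = {
    "coll-root": {"kind": "collection", "data": ["d1", "coll-sub"]},
    "coll-sub": {"kind": "collection", "data": ["d2", "d3"]},
    "d1": {"kind": "data", "metadata": "m1"},
    "d2": {"kind": "data", "metadata": "m2"},
    "d3": {"kind": "data", "metadata": "m1"},
    "m1": {"kind": "metadata", "creators": ["Alice", "Bob"]},
    "m2": {"kind": "metadata", "creators": ["Bob", "Carol"]},
}


def document(href: str) -> dict:
    """Hypothetical stand-in for XSLT's document() function."""
    return DOCS[href]


def creators_of_collection(doc: dict) -> List[str]:
    # Like <xsl:template match="collection">: process every referenced document.
    result: List[str] = []
    for href in doc["data"]:
        result.extend(apply_rules(document(href)))
    return result


def creators_of_data(doc: dict) -> List[str]:
    # Like <xsl:template match="data">: return the creators from its metadata.
    return list(document(doc["metadata"])["creators"])


# One handler per kind of document, analogous to template rules in a mode.
RULES: Dict[str, Callable[[dict], List[str]]] = {
    "collection": creators_of_collection,
    "data": creators_of_data,
}


def apply_rules(doc: dict) -> List[str]:
    # Simplification: a kind with no rule contributes nothing (XSLT's
    # built-in templates would instead go on to process the children).
    handler = RULES.get(doc["kind"])
    return handler(doc) if handler else []


def distinct(items: List[str]) -> List[str]:
    """Remove duplicates, keeping first-occurrence order."""
    seen = set()
    out: List[str] = []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out


all_creators = apply_rules(document("coll-root"))
unique_creators = distinct(all_creators)
```

Here `unique_creators` is `['Alice', 'Bob', 'Carol']`. Returning modified creators instead of the existing ones would only mean changing `creators_of_data`, which mirrors the point about the template-based version being easy to adapt.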
> 
> > Under the proposal, it would not always be obvious when XSLT 
> > instructions are building a tree and when they are building a 
> > sequence. For example, consider:
> >
> > <xsl:template name="x">
> >   <xsl:text>[</xsl:text>
> >   <xsl:value-of select="$param"/>
> >   <xsl:text>]</xsl:text>
> > </xsl:template>
> >
> > If this is called while constructing a tree, it returns a single text
> > node. If it could also be called while constructing a sequence, it
> > would return a sequence of three text nodes. Users might easily change
> > the implementation of the template (for example, to use concat()),
> > without realising the consequences for some callers.
> 
> I thought that I'd expressed this in the original proposal, 
> but perhaps not. I'd made the assumption that the rule that 
> says that text nodes should be combined when added to a tree 
> would be extended to say that when you create a sequence of 
> newly-created text nodes as in the above, they are merged 
> into a single text node. To create a sequence of separate 
> newly-created text nodes (a rare requirement, in my opinion), 
> you would use <xsl:item> as follows:
> 
> <xsl:template name="x">
>   <xsl:item>[</xsl:item>
>   <xsl:item>
>     <xsl:value-of select="$param"/>
>   </xsl:item>
>   <xsl:item>]</xsl:item>
> </xsl:template>
> 
> This resolves the cited problem, I think, but perhaps I'm 
> missing something.
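The merging rule assumed above can be modelled in a few lines. This Python sketch only illustrates the proposed behaviour, under the assumption that bare text-producing instructions (`xsl:text`, `xsl:value-of`) merge with adjacent text while each `xsl:item` always yields its own item; the `Text`, `Item` and `build_sequence` names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Text:
    """A new text node produced directly, e.g. by xsl:text or xsl:value-of."""
    value: str


@dataclass
class Item:
    """Hypothetical model of the proposed xsl:item: its content is one separate item."""
    content: List[Text]


def build_sequence(instructions: List[Union[Text, Item]]) -> List[str]:
    result: List[str] = []
    pending: Optional[str] = None  # text accumulated from adjacent bare Text nodes
    for ins in instructions:
        if isinstance(ins, Text):
            # Adjacent bare text nodes merge, as they would when added to a tree.
            pending = (pending or "") + ins.value
        else:
            # An explicit item closes any pending text and stands on its own.
            if pending is not None:
                result.append(pending)
                pending = None
            result.append("".join(t.value for t in ins.content))
    if pending is not None:
        result.append(pending)
    return result


param = "42"
# Template "x" as written: three bare text-producing instructions.
as_written = build_sequence([Text("["), Text(param), Text("]")])
# The same template rewritten with concat(): one text-producing instruction.
with_concat = build_sequence([Text("[" + param + "]")])
# Using xsl:item to ask explicitly for three separate items.
with_items = build_sequence([Item([Text("[")]), Item([Text(param)]), Item([Text("]")])])
```

Under this model `as_written` and `with_concat` both give `['[42]']`, so switching to concat() would not change what callers see; only the `xsl:item` version yields three items.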
> 
> > There are arguments in favour of keeping XPath as small as possible
> > (and therefore putting functionality at the XSLT level wherever there
> > is a choice), but there are also arguments the other way:
> >
> > (a) There are advantages both to implementors and to users in 
> > maximising the common subset that XSLT shares with XQuery, and this 
> > argument leads to such shared functionality sitting in XPath.
> 
> While I strongly agree that where there's shared 
> functionality between XQuery and XPath+XSLT there should be a 
> shared data model and set of operations and functions, I do 
> not necessarily believe that this means there should be a 
> shared syntax. I do not think that either language benefits 
> from being shackled together so closely: XQuery ends up with 
> "line noise" that they dislike, XPath ends up with verbosity 
> that XSLT users dislike. Of course they should use shared 
> syntax where it's appropriate (for example location paths, 
> operators and function calls), but I think that there's a 
> balance to be made here and that the current extreme overlap 
> will damage XSLT in the long run.
> 
> > (b) Functionality available in XPath (such as conditional
> > expressions) is usable in contexts such as key definitions and sort
> > keys, where otherwise a call-back from XPath to XSLT would be needed.
> 
> I'm not sure what you mean by a call-back from XPath to XSLT? 
> Presumably you mean that the user would have to write an 
> extension function in order to deal with it? If that's a 
> problem, then I think it's a problem in the current set-up as 
> well. As I outlined in my original proposal, and demonstrated 
> in the above example, there are lots of things that XPath 
> cannot do all on its own; you're not going to be able to 
> support users doing everything in XPath unless you have 
> function definitions in XPath, and I *really* hope that ain't 
> gonna happen.
> 
> I should also make clear, in case it wasn't already, that I 
> am fully in favour of simple conditional expressions (e.g. 
> condition ? true-value : false-value) and simple mapping 
> expressions (e.g. sequence -> map-function) within XPath 2.0. 
> I believe that these, together with date-time data types and 
> the new aggregation functions (min(), max(), avg()) would 
> handle well over 80% of the requirement for functionality in 
> XPath to support key and sort values. The remainder can 
> always be handled through user-defined functions.
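A rough model of how those two constructs might combine with an aggregate in a sort key. The semantics are assumptions: `condition ? true-value : false-value` is modelled by Python's conditional expression, and `sequence -> map-function` by a hypothetical `map_seq` that applies the function to each item and flattens the results (XPath 2.0 sequences never nest). The `books` data is invented for illustration.

```python
from statistics import mean
from typing import Any, Callable, List


def map_seq(seq: List[Any], fn: Callable[[Any], Any]) -> List[Any]:
    """Hypothetical model of `seq -> fn`: map fn over seq and flatten one level,
    since a sequence of sequences is just a longer sequence."""
    out: List[Any] = []
    for item in seq:
        result = fn(item)
        if isinstance(result, list):
            out.extend(result)
        else:
            out.append(result)
    return out


books = [
    {"title": "A", "prices": [10.0, 14.0]},
    {"title": "B", "prices": []},
    {"title": "C", "prices": [5.0]},
]


def sort_key(book: dict) -> float:
    # Models a key expression of the form: prices ? avg(prices) : 0
    return mean(book["prices"]) if book["prices"] else 0.0


ordered_titles = map_seq(sorted(books, key=sort_key), lambda b: b["title"])
all_prices = map_seq(books, lambda b: b["prices"])
```

With these keys (A = 12.0, B = 0.0, C = 5.0), `ordered_titles` is `['B', 'C', 'A']`, and `max(all_prices)` is `14.0`; nothing here needs a user-defined function.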
> 
> > (c) Functionality implemented at the XPath level is probably easier to
> > optimize, because within the context of an XPath expression, there are
> > no side-effects, whereas XSLT instructions have the side-effect of
> > writing nodes to the result tree.
> 
> I'll have to take your word for that. I don't understand, 
> though, how XQuery manages to deal with optimising around the 
> side-effect of writing to the result tree (which presumably 
> it can do?) and yet XSLT can't. I don't understand why:
> 
>   for $b in //books return <book>...</book>
> 
> is any more optimisable than:
> 
>   <xsl:for-each select="//books"><book>...</book></xsl:for-each>
> 
> These seem to be slightly different syntaxes for roughly the 
> same thing; I don't see why they can't be interpreted in the same way.
> 
> Further, I have to say that I don't think that optimisability 
> is as high on my priorities for XPath/XSLT 2.0 as usability.
> 
> > (d) Functionality implemented at the XPath level is available in
> > standalone XPath environments, for example XPath used within the
> > context of XPointer or DOM. Since the XPath data model relies
> > critically on sequences, some mechanism for constructing sequences is
> > needed that is not dependent on XSLT.
> 
> Actually, I view this as an argument for *simplifying* XPath 
> rather than one for packing as much functionality as possible 
> into it. I think that it's much more likely that a Java 
> programmer using DOM would use Java methods to construct 
> sequences of integers than that they'd use XPath to do so. 
> Adding functionality to XPath just makes it harder to get 
> XPath added to XPointer/DOM/etc. implementations. Even with 
> XPath 1.0, other standards that use XPath 1.0 (I'm thinking 
> W3C XML Schema and XForms in particular) have defined their 
> own subsets because they didn't want to inflict the burden of 
> implementing all of XPath on the implementers of their 
> technologies. This will be even more true with XPath 2.0.
> 
> > Finally, there are pragmatic considerations. Although it would almost
> > certainly be possible to design a language along the lines of Jeni
> > Tennison's proposal that met all the XSLT/XPath 2.0 requirements, it
> > would require a lot of rework and a lot of negotiation between the XSL
> > and XQuery working groups. If the current language were badly broken,
> > we would be justified in tackling this rework. But the current
> > language works, and we want to get it finished. In fact, if we did
> > embark on making the changes required by this proposal, there is a
> > serious prospect that we would end up with a language in which we had
> > added sequence construction to XSLT but failed to remove anything from
> > XPath. We would thus have increased duplication between the languages
> > instead of reducing it.
> 
> Oh well, since I'm writing this diatribe anyway I'll just say 
> that no one in the XSLT community will thank the XSL WG for 
> bringing out a standard quickly if that standard fails to 
> recognise the requirements of that community. Just because 
> you can do all you need to do in a language (it "works") does 
> not mean that the language is well-designed or easy to use. I 
> do understand the desire to see XSLT get out soon, and to not 
> discard the time, effort and patience that the XSL WG has put 
> into the current design, but I also think that it's worth 
> taking the time to get it right. It will involve even more
> rework to have to start again in however many months' time if
> it's rejected as a Candidate or Proposed Recommendation.
> 
> ---
> 
> Apologies for the rant. Now I've got it off my chest, I'm 
> looking forward to the real work. I hope that you'll bear 
> with me as I get up to speed on the latest drafts and discussion.
> 
> > (Welcome to the group, Jeni!)
> 
> Thanks Mike :)
> 
> Cheers,
> 
> Jeni
> 
> ---
> Jeni Tennison
> http://www.jenitennison.com/
>
Received on Monday, 30 September 2002 14:15:05 UTC