RE: Constructing Sequences in XSLT

Many thanks for submitting this, Jeni. As I mentioned to you, it describes
an approach that has been discussed before by the WG, and it proposes
interesting solutions to some of the difficulties we identified. I'm hoping
we will find time on our agenda to discuss it thoroughly when we next meet
later this month.

Mike Kay

> -----Original Message-----
> From: Jeni Tennison [mailto:jeni@jenitennison.com]
> Sent: 11 January 2002 09:31
> To: xsl-editors@w3.org
> Subject: Constructing Sequences in XSLT
> 
> 
> Hi,
> 
> Following is a proposal for constructing sequences in XSLT rather than
> XPath, for your consideration. Let me know if anything is unclear.
> 
> [It differs slightly from the draft posted on XSL-List, mainly in
>  sections 6, 7, 8 and 9.]
> 
> Cheers,
> 
> Jeni
> 
> ----
> 
> Executive summary
> -----------------
> 
> Rather than XPath being continuously extended to allow it to do what
> XSLT can already do, XSLT should be modified to support the thing that
> it can't already do: sequence construction. This could be achieved by
> amending the definition of content constructors in XSLT 2.0 and
> introducing a new xsl:item instruction. This change would make XSLT
> more consistent and more usable.
> 
> 
> Contents
> --------
> 
> 1.  Requirement
> 2.  Sequence constructors
> 3.  Producing simple typed values and existing nodes
> 4.  Impact on XPath
> 5.  Impact on function definitions
> 6.  Impact on variable bindings
> 7.  Parentless (documentless) nodes
> 8.  Impact on result tree generation
> 9.  Creating node trees
> 10. Conclusions
> 11. References
> 
> 
> Requirement
> -----------
> 
> Recently, David C. posted a message to www-xpath-comments@w3.org that
> described how XPath is restricted by the lack of a general
> variable-binding expression (let clause) [1].
> 
> I think that the lack of a let clause restricts what's practical in
> XPath (even if it doesn't affect what's theoretically possible). For
> example, with the for expression, you have to reconstruct any sequence
> that you create within the for expression each time you use it, which
> probably isn't particularly efficient and leads to maintenance
> headaches. For example:
> 
>   for $o in $orders
>   return if (count($o/item[(@price * @quantity) > 100]) > 5)
>          then do:something($o/item[(@price * @quantity) > 100])
>          else do:something-else($o/item[(@price * @quantity) > 100])
> 
> The way around this is with functions, because then you can use
> xsl:variable to assign the variable:
> 
>   for $o in $orders
>   return do:process-items($o)
> 
> and:
> 
> <xsl:function name="do:process-items">
>   <xsl:param name="order" />
>   <xsl:variable name="items"
>                 select="$order/item[(@price * @quantity) > 100]" />
>   <xsl:result select="if (count($items) > 5)
>                       then do:something($items)
>                       else do:something-else($items)" />
> </xsl:function>
> 
> but it's hardly ideal.
> 
> The same kind of problem occurs within an if expression within a for
> expression, when certain variables are relevant within one branch of
> the if and not in the other. For example:
> 
>   if ($string and $keyword)
>   then if ((starts-with($string, $keyword) or
>             ends-with(substring-before($string, $keyword), ' ')) and
>            (not(substring-after($string, $keyword)) or
>             starts-with(substring-after($string, $keyword), ' ')))
>        then (substring-before($string, $keyword),
>              $keyword,
>              substring-after($string, $keyword))
>        else $string
>   else ()
> 
> which could be managed with:
> 
>   if ($string and $keyword)
>   then (for $before in substring-before($string, $keyword),
>             $after  in substring-after($string, $keyword)
>         return if ((not($before) or ends-with($before, ' ')) and
>                    (not($after) or starts-with($after, ' ')))
>                then ($before, $keyword, $after)
>                else $string)
>   else ()
> 
> but which would be much clearer (and more accurate, since you're not
> really iterating) as:
> 
>   if ($string and $keyword)
>   then (let $before := substring-before($string, $keyword),
>             $after  := substring-after($string, $keyword)
>         if ((not($before) or ends-with($before, ' ')) and
>             (not($after) or starts-with($after, ' ')))
>         then ($before, $keyword, $after)
>         else $string)
>   else ()
> 
> Again, you could create a function to do the testing, but if we have
> to generate new functions every time we want to bind variables, we're
> going to have them coming out of our ears.
> 
> It's certainly true that you could add a let clause to XPath; you
> could also add a where clause... and a sortby clause... and
> typeswitches... and even element constructors... but what you end up
> with is a replication of all the facilities of XSLT, but using a
> non-XML syntax, and stuffed inside XML attributes.
> 
> 
> Sequence constructors
> ---------------------
> 
> So I'd like to suggest an alternative. Instead of modifying XPath so
> that it can do all the things that XSLT can do plus construct
> sequences, why not modify XSLT so that it can construct general
> sequences rather than just node sequences?
> 
> Doing this is (I *think*) simpler than it sounds. In XSLT 2.0,
> "content constructors" are defined as [2]:
> 
>   "a sequence of nodes in the stylesheet that, when evaluated,
>    constructs and returns a sequence of new nodes suitable for adding
>    to the result tree. This sequence is referred to below as the
>    result sequence."
> 
> We could modify that definition so that "content constructors" don't
> necessarily return *nodes* (they should probably then be called
> "sequence constructors"):
> 
>    a sequence of nodes in the stylesheet that, when evaluated,
>    constructs and returns a sequence. This sequence is referred to
>    below as the result sequence.
> 
> We can amend the description of XSLT instructions in line with this:
> 
> XSLT instructions then produce a sequence of zero, one, or more items
> as their result. These items are added to the result sequence. Some
> instructions, such as xsl:element, return a newly-constructed node
> (which may have its own attributes, namespaces, children, and other
> descendants); others, such as xsl:if, return items produced by their
> own nested sequence constructors.
> 
> [There are a couple of incompatibility problems here that I think can
>  be handled; I'll come on to those later.]
> 
> 
> Producing simple typed values and existing nodes
> ------------------------------------------------
>  
> All we need now is an element that can add a simple typed value or an
> existing node to the result sequence. This could be achieved with an
> xsl:item element:
> 
>   <!-- Category: instruction -->
>   <xsl:item
>     select = expression
>     type = datatype>
>     <!-- Content: sequence-constructor -->
>   </xsl:item>
> 
> The xsl:item element works similarly to variable-binding elements: it
> produces a sequence of items from either its select attribute or its
> content. This enables you to add simple typed values or existing nodes
> to a sequence.
> 
> For example, the equivalent to the for expression that we looked at
> earlier would be:
> 
>   <xsl:variable name="new-orders" type="item*">
>     <xsl:for-each select="$orders">
>       <xsl:variable name="items"
>                     select="item[(@price * @quantity) > 100]" />
>       <xsl:item select="if (count($items) > 5)
>                         then do:something($items)
>                         else do:something-else($items)" />
>     </xsl:for-each>
>   </xsl:variable>
> 
> The $new-orders variable would have a value of a sequence of items.
> 
> 
> Impact on XPath
> ---------------
> 
> Enabling XSLT to generate sequences will remove the requirement for
> XPath to support expressions that involve range variables. For
> example:
> 
>   <xsl:variable name="join" type="xs:integer*"
>                 select="for $i in (1, 2),
>                             $j in (3, 4)
>                         return ($i, $j)" />
> 
> could be done with:
> 
>   <xsl:variable name="join" type="xs:integer*">
>     <xsl:for-each select="(1, 2)">
>       <xsl:variable name="i" select="." />
>       <xsl:for-each select="(3, 4)">
>         <xsl:variable name="j" select="." />
>         <xsl:item select="($i, $j)" />
>       </xsl:for-each>
>     </xsl:for-each>
>   </xsl:variable>
> 
> [Of course a syntax for simple mapping would still be useful for
>  when you just need to convert one sequence into another.]
>   
> This change would also remove the requirement for the sort() function
> (from XSLT, and indeed named sort specifications altogether) or the
> adoption of the sortby clause from XQuery, since the existing xsl:sort
> can be used.
> 
> For example, instead of:
> 
>   <xsl:sort-key name="subtotal-sort">
>     <xsl:sort select="@price * @quantity" data-type="number"
>               order="descending" />
>     <xsl:sort select="@part-id" order="ascending" />
>   </xsl:sort-key>
>   <xsl:variable name="sorted-items"
>                 select="sort($items, 'subtotal-sort')" />
> 
> you could do:
> 
>   <xsl:variable name="sorted-items">
>     <xsl:for-each select="$items">
>       <xsl:sort select="@price * @quantity" data-type="number"
>                 order="descending" />
>       <xsl:sort select="@part-id" order="ascending" />
>       <xsl:item select="." />
>     </xsl:for-each>
>   </xsl:variable>
> 
> 
> Impact on function definitions
> ------------------------------
> 
> Adding the xsl:item element allows us to get rid of the xsl:result
> element when defining functions. The xsl:function element's new syntax
> would be:
> 
> <xsl:function
>   name = qname
>   type = datatype>
>   <!-- Content: (xsl:param*, sequence-constructor) -->
> </xsl:function>
> 
> The xsl:function element would simply return the sequence produced by
> its sequence constructor.
> 
> For example:
> 
>   <xsl:function name="my:split-string" type="xs:string*">
>     <xsl:param name="string" type="xs:string" />
>     <xsl:param name="keyword" type="xs:string" />
>     <xsl:if test="$string and $keyword">
>       <xsl:variable name="before"
>                     select="substring-before($string, $keyword)" />
>       <xsl:variable name="after"
>                     select="substring-after($string, $keyword)" />
>       <xsl:item
>         select="if ((not($before) or ends-with($before, ' ')) and
>                     (not($after) or starts-with($after, ' ')))
>                 then ($before, $keyword, $after)
>                 else $string" />
>     </xsl:if>
>   </xsl:function>
> 
> 
> Impact on variable bindings
> ---------------------------
> 
> The current XSLT 2.0 WD states:
> 
>   "[ERR030] Elements such as xsl:variable, xsl:param, xsl:message,
>    and xsl:result-document construct a new document node, and use the
>    result sequence returned by the content constructor to form the
>    children of this document node. In this case it is a dynamic error
>    if the result sequence contains namespace or attribute nodes. The
>    processor must either signal the error, or must recover by ignoring
>    the offending nodes. The elements, comments, processing
>    instructions, and text nodes in the node sequence form the children
>    of the newly constructed document node."
> 
> I'll concentrate on variable-binding elements here (xsl:message and
> xsl:result-document are discussed in the next section).
> 
> Supporting the creation of sequences means that rather than create a
> new document node, variable-binding elements must bind the variable to
> the result sequence produced by their sequence constructor. This
> sequence must be able to contain all kinds of nodes.
> 
> There is a backwards incompatibility here - if a variable is assigned
> a value through the content of the variable-binding element, then
> rather than conceptually holding the "root node of the result tree
> fragment" as in XSLT 1.0, the variable holds a sequence of items
> (nodes, assuming you're using the variable as in XSLT 1.0).
> 
> Currently, when users get the string value of a result tree fragment,
> they get the string value of the *root node* of the result tree
> fragment - the concatenation of the string values of the text node
> descendants in the result tree fragment.
> 
> On the other hand, when users get the string value of a sequence, they
> get the string value of the first item in the sequence.
> 
> Therefore if you have:
> 
>   <xsl:variable name="foo">
>     <element>A</element>
>     <element>B</element>
>   </xsl:variable>
> 
> then string($foo) will give "AB" in XSLT 1.0 and just "A" in XSLT 2.0
> (if sequence constructors were supported).
> 
> [I don't think that people get the string values of result tree
>  fragments that contain elements very often but it's sometimes
>  useful.]
> 
> Another difference applies if people are used to using node-set()
> extension functions to convert variables to node sets. As there is no
> document node, addressing the items in the sequence does not involve
> stepping down to them.
> 
> For example, given the above definition of $foo, the equivalent of the
> following in XSLT 1.0:
> 
>   <xsl:for-each select="exsl:node-set($foo)/element">
>     ...
>   </xsl:for-each>
> 
> is simply:
> 
>   <xsl:for-each select="$foo">
>     ...
>   </xsl:for-each>
> 
> [There's an argument that XSLT 2.0 shouldn't have to worry about
>  backwards compatibility with extension functions, but the node-set()
>  extension function is very widely used and is based on the
>  description of result tree fragments from XSLT 1.0.]
>  
> These backwards compatibility issues could be resolved by having the
> type attribute on the variable-binding element determine the behaviour
> of the variable-binding element. If the type attribute is not present,
> or if the type attribute indicates that the variable should contain a
> single document node, then the variable-binding element creates a
> result tree (as described later), and the variable is bound to a new
> document node; otherwise, the variable is bound to the sequence.
> 
> [This is similar to the role played by the separator attribute on
>  xsl:value-of.]
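> 
> As a rough sketch of how that might look (the DataType expression
> "element*" is only my assumption about how such a type would be
> written, and the variable names are made up):
> 
>   <!-- no type attribute: bound to a new document node -->
>   <xsl:variable name="tree">
>     <element>A</element>
>     <element>B</element>
>   </xsl:variable>
> 
>   <!-- type attribute: bound to the sequence of two elements -->
>   <xsl:variable name="seq" type="element*">
>     <element>A</element>
>     <element>B</element>
>   </xsl:variable>
> 
> Here string($tree) would still give "AB", with the elements reached
> by stepping down from the document node ($tree/element), whereas
> string($seq) would give "A" and $seq would address the two elements
> directly.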
> 
> 
> Parentless (documentless) nodes
> -------------------------------
> 
> Section 3.1 of the XSLT 2.0 WD [3] states:
> 
>   "The data model defined in [Data Model] allows a node to be part of
>    a tree whose root is a node other than a document node.
> 
>   "Although such nodes may exist transiently during the course of XSLT
>    processing, every node that is processed by an XSLT stylesheet
>    (that is, a node that may be returned in the result of an
>    expression) will belong to a tree whose root is a document node."
> 
> Under the scheme described above, this would no longer be true. It
> would be possible to create sequences containing nodes that do not
> have a parent.
> 
> I think that it would sometimes be handy to allow documentless nodes
> to be generated by a sequence constructor, for example to dynamically
> create a set of attributes that can then be added to several different
> elements.
> 
> [Currently in XSLT you have to do this by creating the attributes on a
>  dummy element; attribute sets don't help if an attribute should only
>  be present under certain circumstances.]
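> 
> A sketch of the difference, with made-up names (the $common
> variable, the para and note elements, and the "attribute*" type
> expression are all just for illustration). In XSLT 1.0, using the
> EXSLT node-set() extension function:
> 
>   <xsl:variable name="common">
>     <!-- dummy element whose only job is to carry the attributes -->
>     <dummy>
>       <xsl:if test="@lang">
>         <xsl:attribute name="lang">
>           <xsl:value-of select="@lang" />
>         </xsl:attribute>
>       </xsl:if>
>     </dummy>
>   </xsl:variable>
>   <para>
>     <xsl:copy-of select="exsl:node-set($common)/dummy/@*" />
>   </para>
>   <note>
>     <xsl:copy-of select="exsl:node-set($common)/dummy/@*" />
>   </note>
> 
> With a sequence constructor, the dummy element might disappear:
> 
>   <!-- the variable holds the parentless attribute nodes directly -->
>   <xsl:variable name="common" type="attribute*">
>     <xsl:if test="@lang">
>       <xsl:attribute name="lang">
>         <xsl:value-of select="@lang" />
>       </xsl:attribute>
>     </xsl:if>
>   </xsl:variable>
>   <para><xsl:copy-of select="$common" /></para>
>   <note><xsl:copy-of select="$common" /></note>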
> 
> However, it may create problems with parentless attribute nodes,
> since they cannot gain access to namespace nodes through their parent
> element. I think that this is sufficiently rare that it's not
> particularly worrisome; in the worst case, it could be an error to
> have a sequence contain parentless attribute nodes.
> 
> Note that if the suggestion for retaining backwards compatibility with
> variable-binding elements is used, then if XSLT 2.0 is used like XSLT
> 1.0 (i.e. without type attributes on variable-binding elements, and
> without user-defined functions) it is still true that every node that
> may be returned in the result of an expression will belong to a tree
> whose root is a document node.
> 
> 
> Impact on result tree generation
> --------------------------------
> 
> Document nodes are generated automatically in four places in XSLT 2.0
> as defined:
> 
>   - within variable-binding elements
>   - within xsl:message
>   - within xsl:result-document
>   - within the stylesheet as a whole
> 
> The sequence generated from the content constructor forms the children
> of the document node. With xsl:result-document, the href attribute
> gives instructions about where that document should go. The
> destination and format of the document node generated by the
> stylesheet as a whole can be indicated by xsl:destination, or left
> implicit. The other document nodes (from variable-binding elements and
> xsl:message) don't have an explicit destination - I'll call these
> anonymous documents.
> 
> If we generalise to sequence constructors, the role of
> xsl:result-document is similar to that of xsl:element - it creates a
> node and uses its sequence constructor to create the content of that
> node. If you view it like this, I think that xsl:document is the more
> appropriate name (because it ties in with xsl:element etc.). I also
> think that you should be able to explicitly create anonymous
> documents.
> 
> Assuming that anonymous documents could be created explicitly using
> xsl:document, the handling of an anonymous document created in this
> way depends on where the sequence containing the anonymous document is
> produced:
> 
>   - if it's produced from the content of a variable-binding element,
>     then the variable is bound to that document node (actually the
>     sequence that includes that document node, since feasibly other
>     document nodes could be generated as well)
> 
>   - if it's produced from the content of an xsl:message, then the
>     document is written to an implementation-defined destination for
>     error messages (e.g. stderr)
> 
>   - if it's produced from the stylesheet as a whole, then the document
>     is written to an implementation-defined destination for the result
>     of the transformation (e.g. stdout) or the destination indicated
>     by the xsl:destination element.
> 
> Note that it should always be a dynamic error if there's more than one
> anonymous document in a sequence.
> 
> For backwards compatibility with XSLT 1.0 (and for convenience), if
> the result sequence consists of documentless nodes, an anonymous
> document should be implicitly created in certain circumstances:
> 
>   - by variable-binding elements, if they don't have a type attribute
>     or have a type attribute with the value "document" (or whatever
>     DataType expression is used to indicate a document node)
> 
>   - by xsl:message
> 
>   - by the stylesheet as a whole
> 
> Allowing xsl:document would enable you to create sequences that
> contained several document nodes (with an error if any of those
> document nodes had the same destination). It would also potentially
> allow you to create sequences that mixed new document nodes and other
> items.
> 
> This could be an error, such that sequences should either consist
> entirely of document nodes (with different destinations), or consist
> entirely of documentless nodes. If it was an error, and you wanted to
> generate multiple documents, you'd need to use xsl:document to create
> the main document as well as the secondary ones.
> 
> Also, you wouldn't be able to construct a document node while you were
> in the middle of constructing another document. This is a very
> different model from the 'tree of documents' approach of the current
> XSLT 2.0 WD, the XSLT 1.1 WD and most extension elements. I'm not sure
> whether this restriction makes it impractical (or any more impractical
> than the current restriction that you can't create a secondary result
> document within a variable).
> 
> It could also mean additional processing because, for example, you
> couldn't run through a bunch of nodes, creating a secondary result
> document for each node at the same time as creating a link in the main
> result document. You'd have to run through the same set of nodes twice
> in order to create the two different bits of content.
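> 
> For example, under that restriction a stylesheet producing an index
> plus one document per chapter might look something like this (the
> chapter, index and link names are invented, and I'm assuming that
> xsl:document would keep the href attribute of xsl:result-document):
> 
>   <xsl:template match="/">
>     <!-- first pass: the main document, holding the links -->
>     <xsl:document>
>       <index>
>         <xsl:for-each select="//chapter">
>           <link href="chapter{position()}.xml" />
>         </xsl:for-each>
>       </index>
>     </xsl:document>
>     <!-- second pass: one secondary document per chapter -->
>     <xsl:for-each select="//chapter">
>       <xsl:document href="chapter{position()}.xml">
>         <xsl:copy-of select="." />
>       </xsl:document>
>     </xsl:for-each>
>   </xsl:template>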
> 
> On the other hand, that restriction (you can only do one thing at a
> time) is true elsewhere in XSLT, so why shouldn't it be true when it
> comes to creating documents?
> 
> Alternatively, it could be permitted to mix document nodes and other
> nodes (after all, it should be allowed to mix document nodes from the
> source documents with other nodes in node sequences). This would make
> node-tree construction (see below) a little more complex, but I think
> it could be handled.
> 
> 
> Creating node trees
> -------------------
> 
> This final issue is about how to create content to be added to other
> nodes from a sequence. This applies to the construction of the content
> of element nodes and document nodes (as described above). It also
> applies, slightly differently, to the construction of comment,
> attribute, processing instruction, text and namespace nodes (which
> I'll call simple nodes so that I don't have to repeat their names
> constantly).
> 
> Currently, content constructors construct a sequence of nodes, and
> this sequence of nodes can be made into a node tree by adding a parent
> node, or converted to a string to be used as the value of a simple
> node. Under certain circumstances, the presence of certain types of
> nodes in the node sequence is a recoverable dynamic error (e.g.
> attribute nodes when creating a document; element nodes when getting
> the string value for an attribute).
> 
> If we had the more general sequence constructors, result trees would
> need to be constructed from sequences containing any mixture of simple
> typed values and nodes (both newly created (rootless) and pre-existing
> (rooted)), rather than those containing just newly created nodes.
> 
> In fact, this is exactly the same issue as that faced by xsl:copy-of
> (which also has to cope with sequences containing a mixture of types
> of items in order to create a sequence of (new) nodes). The only
> difference is that under the proposals above, the sequence could
> contain documentless nodes and (potentially) document nodes.
> 
> In some cases, documentless nodes may be added to the node tree simply
> by giving them a parent. However, this cannot be done all the time
> since a variable may still hold a reference to the node; giving it a
> parent would change the result of counting its ancestors, for example.
> In addition, the documentless node might be added to two different
> parents, which would cause problems.
> 
> The options, I think, are:
> 
>   - copy documentless nodes (as you do with nodes that have documents)
> 
>   - make it an error for a variable to hold a sequence of documentless
>     nodes (in most cases such sequences will be automatically
>     converted to a document node whose content is that sequence)
> 
> Since I think that sequences of documentless nodes could be useful, I
> favour the first option.
> 
> Document nodes are more tricky. If they are allowed in these
> situations at all, then I think there needs to be some way of
> 'bubbling up' document nodes so that in the end you get a sequence of
> document nodes.  For example, the result of:
> 
>   <xsl:element name="foo">
>     <xsl:document><xsl:call-template name="bar" /></xsl:document>
>   </xsl:element>
> 
> would actually be a sequence containing the foo element node followed
> by the document node, the equivalent of:
> 
>   <xsl:element name="foo" />
>   <xsl:document><xsl:call-template name="bar" /></xsl:document>
> 
> 
> Conclusions
> -----------
> 
> If XPath were extended to be a usable method of generating sequences,
> it would end up replicating the variable assignment and flow control
> features that are already available within XSLT. While there is an
> argument for constructing a language that performs transformations
> without using XML syntax, that niche is already filled by XQuery. In
> addition, because XPaths are used within attributes in XSLT, XSLT with
> extended XPath will become a lot harder to read, write, and maintain
> than the equivalent XSLT instructions.
> 
> Extending the concept of 'content constructors' to more general
> 'sequence constructors' and introducing an xsl:item element to add
> simple typed values and pre-existing nodes to this sequence gives XSLT
> the power to construct sequences of all descriptions. Rather than
> learning one language for constructing sequences of nodes and a
> different language with similar constructs for constructing other
> sequences, users will only have to learn one, unified, language.
> 
> 
> References
> ----------
> 
> [1] http://lists.w3.org/Archives/Public/www-xpath-comments/2002JanMar/0026.html
> [2] http://www.w3.org/TR/xslt20/#dt-content-constructor
> [3] http://www.w3.org/TR/xslt20/#rootless-nodes

Received on Friday, 11 January 2002 05:31:01 UTC