Grounded consuming sequences, and the last() function

There has been a fair amount of recent discussion about using the last() function on input that is grounded but consuming. Because that discussion is spread across several bug reports (29120, 29153, 29161), I thought it would be useful to consolidate it here.

Our general principle has been that you can do anything you like with a grounded node; in particular, when you use copy-of() or snapshot() to copy a node, you can navigate freely within that copy. Equally, we generally allow you to do anything you like with a sequence of grounded nodes, even if the expression that created this sequence is consuming.

We make no distinction between 

copy-of(/a/b/c)

and

/a/b/c/copy-of()

even though one expression at first sight appears to create a copy of a sequence, and the other to create a sequence of copies. The two expressions produce the same result and have the same streamability rules.

We therefore allow you to do things with such a grounded sequence that might involve putting the whole sequence in memory. For example, you can put such a sequence in a variable:

<xsl:variable name="v" select="/a/b/c/copy-of()"/>

or you can use reverse():

for $x in reverse(/a/b/c/copy-of()) …

or you can sort, or you can use last().

In all these cases, if used carelessly, the effect of streaming may be defeated. This means:

(a) the fact that a stylesheet is guaranteed streamable under our rules is no guarantee that it will run in bounded memory

(b) achieving streamability in practice, particularly with “windowed streaming” coding patterns, puts a burden on the implementor to use pipelined evaluation of grounded sequences wherever possible. For example, there is a clear implication that a construct such as

<xsl:for-each select="/a/b/c/copy-of()">
  <xsl:value-of select="price + discount"/>
</xsl:for-each>

should not materialize the entire sequence of c elements in memory, and there is a reasonable expectation that this will also apply to

<xsl:for-each select="copy-of(/a/b/c)">
  <xsl:value-of select="price + discount"/>
</xsl:for-each>

But nothing we say in the spec enforces this; it is a quality-of-implementation issue.

All of this applies to last() as much as to any other construct that prevents pipelined processing of a sequence. The significant difference is that the cost of last() is less obvious than that of constructs such as reverse() and sorting. When you use xsl:for-each or xsl:apply-templates (without a contained xsl:sort), you appear to be processing the sequence one item at a time, but while you are processing an item, last() gives you the opportunity to reach out beyond that item and ask for information about the sequence as a whole. In the case of xsl:apply-templates the use of last() is particularly dangerous because it can’t be statically detected: the call appears in whichever template rule is selected at run time, not in the instruction itself.
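
As an illustration only (this sketch is not taken from the bug reports, and it has not been checked against any particular processor), a loop of the following shape looks pipelined but is not. It reuses the /a/b/c input and the price child from the examples above, and is assumed to sit inside a streamable template:

<xsl:for-each select="/a/b/c/copy-of()">
  <!-- last() is the total number of c elements, which is not known until the input has been read to the end -->
  <xsl:value-of select="position(), 'of', last(), ':', price"/>
</xsl:for-each>

To output “1 of N” for the first c element the processor has to know N before it has seen the rest of the input, which in practice generally means buffering the copies rather than processing them one at a time.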

My original observation, which started this discussion, was in bug 29120 in relation to the use of last() within xsl:merge. This again is rather insidious because the snapshot operation happens behind the scenes; it’s less obvious that the streamed input sequence has been grounded. Equally, the intent of streamed merging relies entirely on the sequence of grounded nodes (snapshots) not being materialized in memory, but being pipelined so it is processed one node (or one merge-group) at a time. Any use of last() clearly defeats this purpose, and since it can be detected statically (with caveats about dynamic function calls) there seems a strong case for disallowing it.

In bug 29153 comment #1 I proposed the following rule:

<quote>
fn:last() is roaming and free-ranging if its focus-setting container is consuming.
</quote>

On reflection, though, I suspect this doesn’t actually help much. The reason for this is that the rule only comes into play if the streamability analysis gets as far as considering the call on last(), and in many cases, once we are in “grounded territory”, we don’t actually analyse any further.

I think I’m therefore going to suggest closing these three bugs with no action, other than to add advisory text to the spec warning implementors and users of the consequences. This essentially means that last() is allowed in these “dangerous” contexts; implementations may of course produce a warning if they can detect that a call on last() will prevent effective streaming, but they must respect the semantics (running out of memory if necessary). The notes to this effect will appear largely under the sections on streamability of xsl:merge and of last(), perhaps referenced from copy-of() and snapshot().

Michael Kay
Saxonica

Received on Thursday, 1 October 2015 10:01:42 UTC