Grounded Consuming Expressions

Following discussion, I propose adding the following section before 19.8.

Grounded Consuming Constructs

A construct is grounded if the items it delivers do not include nodes from a streamed document; it is consuming if evaluation of the construct reads nodes from a streamed input in a way that requires the current position in the input to advance.

Grounded consuming constructs play an important role in streaming, and this section discusses some of their characteristics.

Examples of grounded consuming constructs (assuming the context item is a streamed node) include:

sum(.//transaction/@value)

copy-of(./account/history/event)

distinct-values(./account/@account-nr)

<xsl:for-each select="transaction">
  <t><xsl:value-of select="@value"/></t>
</xsl:for-each>

In general the result of a grounded consuming construct is a sequence. Depending on how this sequence is used, it may or may not be necessary for the processor to allocate sufficient memory to hold the entire sequence. The streamability rules in this specification place few constraints on how a grounded sequence is used. This is deliberate, because it gives users control: by creating a grounded sequence (for example, by use of the copy-of function) stylesheet authors make it possible to process data in arbitrary ways (for example, by sorting the sequence), and accept that this may consume memory.

Pipelined evaluation of a sequence is analogous to streamed processing of a source document. Pipelined evaluation occurs when the items in a sequence can be processed one by one, without materializing the entire sequence in memory. Pipelining is a common optimization technique in functional programming languages. Operations for which pipelined evaluation is commonly performed include filtering ($transactions[@value gt 1000]), mapping ($transactions!(@value - @processing-fee)), and aggregation (sum($transactions)). Operations that cannot be pipelined (because, for example, the first item in the result sequence cannot be computed without knowing the last item in the input sequence) include those that change the order of items (reverse(), sort()). Other operations such as distinct-values() allow the input to be processed one item at a time, but require memory that potentially increases as the sequence length increases. Saving a grounded sequence in a variable is also likely in many cases to require allocation of memory to hold the entire sequence.
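
As an illustration of this distinction (an untested sketch, not proposed spec text), the following stylesheet assumes a hypothetical source document whose outermost element is ledger, with transaction children carrying a numeric value attribute; the mode names total and sorted are likewise invented for the example. Both template rules apply a grounded consuming construct to the streamed input. The first can be evaluated by keeping a running total. The second cannot deliver its first result item until the last transaction has been read, so a processor would normally have to hold all the copies in memory.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Hypothetical mode names; one of them would be chosen as the initial mode. -->
  <xsl:mode name="total" streamable="yes"/>
  <xsl:mode name="sorted" streamable="yes"/>

  <!-- Aggregation: grounded and consuming, and can be pipelined,
       since only a running total needs to be retained. -->
  <xsl:template match="/ledger" mode="total">
    <total><xsl:value-of select="sum(transaction/@value)"/></total>
  </xsl:template>

  <!-- Sorting grounded copies: permitted because copy-of makes the
       sequence grounded, but the sort needs every copy before it can
       emit the first one, so memory use grows with the input. -->
  <xsl:template match="/ledger" mode="sorted">
    <sorted>
      <xsl:for-each select="transaction/copy-of(.)">
        <xsl:sort select="xs:decimal(@value)"/>
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </sorted>
  </xsl:template>

</xsl:stylesheet>
```

Whether the first rule is actually pipelined is a matter for the processor; the point of the sketch is only that nothing in its formulation prevents it, whereas the second rule inherently requires the whole sequence.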

When the input to an operation is a grounded consuming sequence (more accurately, a sequence resulting from the evaluation of a grounded consuming construct), this specification does not attempt to dictate whether the operation is pipelined or not. The goal of interoperable streaming in finite memory can therefore only be achieved if stylesheet authors take care to avoid constructing grounded sequences that occupy large amounts of memory. In practice, however, users can expect that many simple grounded consuming constructs (such as those listed above) will be pipelined in any well-engineered processor.

The use of the *last* function requires particular care because of its effect on pipelining. The streamability rules prevent the use of last() in conjunction with an expression that returns streamed nodes (because it would require look-ahead in the stream), but there is no similar constraint for grounded sequences. So, for example, it is not permitted (in a context that requires streaming) to write

<xsl:for-each select="transaction">
  <xsl:value-of select="position(), ' of ', last()"/>
</xsl:for-each>

but it is quite permissible to write

<xsl:for-each select="transaction/copy-of(.)">
  <xsl:value-of select="position(), ' of ', last()"/>
</xsl:for-each>

because the call on *copy-of* makes the sequence grounded. In this simple example the impact of the call on *last* is easily detected both by the human reader and by the XSLT processor, but there are other cases where the effect is less obvious. For example, if one template contains

<xsl:apply-templates select="transaction/copy-of(.)"/>

then the presence of a call on *last* in one of the template rules that gets invoked might not be easily spotted; yet the effect is exactly the same, in that it prevents the result from being computed by processing input items strictly one at a time. Avoiding such effects is entirely the responsibility of the stylesheet author.
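
To make the hidden dependency concrete, here is an untested sketch. It assumes a hypothetical source document with a ledger element containing transaction children, and a hypothetical non-streamable mode named detail that handles the grounded copies. The call on *last* sits in a different template rule from the one that reads the stream, which is what makes it easy to overlook.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- The unnamed mode processes the streamed input. -->
  <xsl:mode streamable="yes"/>

  <!-- Hypothetical ordinary (non-streamable) mode, applied only to
       the grounded copies made below. -->
  <xsl:mode name="detail"/>

  <!-- Reads the stream; the select expression is grounded and
       consuming because of the call on copy-of. -->
  <xsl:template match="/ledger">
    <report>
      <xsl:apply-templates select="transaction/copy-of(.)" mode="detail"/>
    </report>
  </xsl:template>

  <!-- Invoked once per copy. The call on last() needs the length of
       the whole sequence, so no copy can be fully processed until the
       final transaction has been read. -->
  <xsl:template match="transaction" mode="detail">
    <line><xsl:value-of select="position(), ' of ', last()"/></line>
  </xsl:template>

</xsl:stylesheet>
```

If the second rule used position() alone, nothing in it would depend on the sequence length, and a processor would be free (though not obliged) to handle each copied transaction as it arrives.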


Michael Kay
Saxonica

Received on Thursday, 8 October 2015 22:55:26 UTC