Sequences with mixed posture

In bug 29507 I consider the streamability of the two expressions

(A) (/books/book, $moreBooks) / title

versus

(B) (/books/book, $moreBooks) ! title

where /books/book is a striding expression selecting streamed nodes, and $moreBooks is grounded.

(The title of the bug talks about grouping, but that's because I discovered the problem with a grouping use case.)

I think that in the spec, both of these are guaranteed streamable. But streaming (A) causes considerable difficulty because the LH operand of "/" is not known to be in document order, and therefore the results need to be sorted.

One implementation strategy to achieve streaming of (A) is to replace the comma operator by a union operator:

(C) (/books/book | $moreBooks) / title

This is safe because the two operands of the comma operator are known to be disjoint sets (a striding node and a grounded node are necessarily in different documents). This enables an execution plan such as

merge(/books/book, sort($moreBooks)) ! title

where the sorting into document order has been pushed down the expression tree so it applies only to the grounded nodes, not to the streamed nodes. The "merge" operation here takes two node-sequences each of which is in document order and produces an output sequence which is also in document order - a streamable operation.

It's complicated by the fact that this rewrite only works if we know that the RH operand of "/" will deliver nodes. If it delivers atomic values, then we can't safely reorder the LHS; and we don't necessarily know statically whether the RH operand will deliver nodes or atomic values. We have to consider worst-case scenarios such as

(/books/book, $moreBooks) / ( if (@hasTitle) then title else ("A book by " || author))

which delivers a sequence of title elements in document order if every book has a title, a sequence of strings in LH-book order if no book has a title, or a dynamic error if some books have titles and others don't.

However, we can still apply the same strategy dynamically: rather than eliminating the sort operation from the expression tree, we can argue that sorting the result of a striding expression into document order is a streamable operation, in that it can be done by first sorting the grounded nodes into document order, then merging these into their proper place in the sorted sequence (which will either be before the streamed nodes, or after them).

If this is the way we want to go, then the fix is:

(a) we need to change the definition of "striding" in the spec: it asserts that the results of a striding expression are always in document order. We need to refine this: the result of a striding expression in general contains a mix of streamed and unstreamed nodes, and the streamed nodes in the result will be in document order. (The same probably applies to crawling).

(b) we should say something about our reasoning and the implications for this test case, perhaps in the section on streamability of path expressions.

(c) we need to consider whether this only solves this particular example, or whether there is a wider problem.

The problem essentially arises from rule 2(d)(iv) in the general streamability rules, which says that if exactly one operand of an expression is consuming, and its operand usage is transmission, then the posture is the posture of that operand. So the problem potentially affects any expression that has two or more operands where one has usage transmission. We're concerned with cases where the other operand might cause the result to be not in document order. I think that the only other constructs that accept an operand with usage transmission and that can deliver anything other than an ordered subset of the input nodes are sequence constructors and the insert-before() function. (Possibly also streamable stylesheeet functions - need to check).

I don't think sequence constructors are a problem because I don't think we ever get into a situation where we assume that the result of the sequence constructor is in document order.

That leaves insert-before - for example insert-before(/books/book, 2, $moreBooks) / title. If we adopt the "dynamic" approach - sorting a striding sequence into document order is a streamable operation - then exactly the same considerations apply as for the comma operator.

Are there any alternative solutions?

OPTION X: the result of any operation that combines striding and grounded nodes is free-ranging. This seems severe; it prevents many easily-streamable expressions such as (B) above, or <xsl:apply-templates select="$var, *"/>.

OPTION Y: the result of any operation that combines striding and grounded nodes is free-ranging if it used in a context where sorting into document order is required. The problem with this is that we need a lot new machinery to define the rules; we don't really have any explicit classification currently of expressions based on whether their results and/or their operands are in document order.

So I'm inclined to the approach which involves minimum change to the spec: mixed-posture expressions such as ((/books/book, $moreBooks) / title) are guaranteed streamable, and we just have to put the burden on implementors to make it happen.

Michael Kay
Saxonica
(posting as mhk@mhk.me.uk because of network problems)

Received on Sunday, 28 February 2016 11:35:52 UTC