RE: Grounded consuming sequences, and the last() function from Abel Braaksma on 2015-10-01 (public-xsl-wg@w3.org from October 2015)

From: Abel Braaksma <abel.braaksma@xs4all.nl>
Date: Thu, 1 Oct 2015 16:25:37 +0200
To: "'Public XSLWG'" <public-xsl-wg@w3.org>
Message-ID: <140501d0fc55$0b809420$2281bc60$@xs4all.nl>
> 
> We make no distinction between
> 
> copy-of(/a/b/c)
> 
> and
> 
> /a/b/c/copy-of()
> 

It is true that we don't make a distinction in the *result* of the streamability rules, but the two expressions are quite different. In the first, the consuming, striding expression /a/b/c is an argument to an operand role of absorption, which grounds it. In the second, only the last part, "child::c", is an argument to an operand role of absorption. The first can be considered equivalent to:

<xsl:template match="/" mode="streaming">
    <xsl:copy-of select="a/b/c" />
</xsl:template>

And the second to be equivalent to:

<xsl:template match="/" mode="streaming">
    <xsl:apply-templates select="a" mode="streaming" />
</xsl:template>

<xsl:template match="a" mode="streaming">
    <xsl:apply-templates select="b" mode="streaming"/>
</xsl:template>

<xsl:template match="b" mode="streaming">
    <xsl:apply-templates select="c" mode="streaming"/>
</xsl:template>

<xsl:template match="c" mode="streaming">
    <xsl:copy-of select="." />
</xsl:template>

In the latter case, the processor can stream for a long while, having only the start of the current node in memory and only a copy of the last node. The first case requires a processor to create a copy of the whole sequence at once.

As a result, using last() in the second case has a different effect than using last() in the first case.
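
As a minimal sketch of the two cases with last() added (the xsl:value-of body is only an illustrative assumption, it does not come from the bug reports):

<!-- first case: the whole sequence is copied, then iterated -->
<xsl:for-each select="copy-of(a/b/c)">
    <xsl:value-of select="position(), 'of', last()" />
</xsl:for-each>

<!-- second case: one copy per c, taken while streaming -->
<xsl:for-each select="a/b/c/copy-of()">
    <xsl:value-of select="position(), 'of', last()" />
</xsl:for-each>

Both return the same values. In the first, the copied sequence is already in memory when last() is evaluated. In the second, the first call to last() would need the number of c elements, which is only known at the end of the stream.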

> 
> In bug 29153 comment #1 I proposed the following rule:
> 
> <quote>
> fn:last() is roaming and free-ranging if its focus-setting container is
> consuming.
> </quote>
> 
> On reflection, though, I suspect this doesn’t actually help much. The reason
> for this is that the rule only comes into play if the streamability analysis gets
> as far as considering the call on last(), and in many cases, once we are in
> “grounded territory”, we don’t actually analyse any further.

I suspect so too.

> 
> I think I’m therefore going to suggest closing these three bugs with no action,
> other than to add advisory text to the spec warning implementors and users
> of the consequences. This essentially means that last() is allowed in these
> “dangerous” contexts; implementations may of course produce a warning if
> they can detect that a call on last() will prevent effective streaming, but they
> must respect the semantics (running out of memory if necessary). The notes
> to this effect will appear largely under the sections on streamability of
> xsl:merge and of last(), perhaps referenced from copy-of() and snapshot().

A rather unfortunate conclusion, but I don't have anything better to offer. The one rule we have always been able to rely on was the grounded-can-be-ignored-by-streaming rule. But we have also supported windowed streaming for a long while, and we have studied the differences between a/b/copy-of() and copy-of(a/b) before; we just never encountered this idiosyncrasy.

A few proposals in the bug reports can be made workable in a large set of cases, but the one case I don't see any proposal possibly covering is where we lose the ability to detect a context of grounded-but-still-streaming:

Works with proposals:

<xsl:for-each select="a/b/c/copy-of()">
    <!-- use last() -->
</xsl:for-each>

Does not work with any proposal:

<xsl:template match="/" mode="streamable">
    <xsl:apply-templates select="a/b/c/copy-of()" mode="non-streamable" />
</xsl:template>

<xsl:template match="c" mode="non-streamable">
    <!-- use last() -->
</xsl:template>

Unless we make it a dynamic error to use last() in such cases of windowed streaming, I don't see a way around requiring processors to consume the whole stream *and* to make sure that the whole stream stays in memory.

The problem that remains is the question of whether this can be detected, i.e., whether a processor can reliably start copying node-for-node, discarding any node it has visited, and at the same time support fn:last().

At the moment, I know we don't support it. We follow a path we call "optimistic streaming", which means that, provided that the streamability rules are solid, there will not be any more look-ahead than determining whether the next child is available (has-children), and after visiting a closing tag, we discard the streamed node. This has proven more solid than we expected (but there is still a lot to do). But this will break in the scenario where last() goes undetected: it consumes the stream, leaving other expressions that select anything before the end of the stream to raise an error (an internal error, something like "node has been discarded").

I don't think we should force absorbing the whole stream if the copy or snapshot is taken only one item at a time.

But I don't know if it is solvable.

Cheers,
Abel
Received on Thursday, 1 October 2015 14:26:16 UTC