Re: last() with grounded consuming sequences

Here’s a revised version of the proposal that attempts to deal with the issues raised by Abel:

<rule>
In any focus-changing construct C (§19.3), if the following conditions hold:

(a) the controlling operand of C is grounded and consuming, 

(b) there is no FunctionCall with the function name fn:last whose focus-setting container is C

then the focus that is established for the controlled operands has an absent context size. When last() is evaluated with such a focus, it throws XPDY0002.
</rule>

What this means is that you are allowed to call last() if your call on last() is statically visible. You’re not allowed to do it covertly. So it’s fine to do

<xsl:for-each select="emp/copy-of()">
  <xsl:if test="position() = last()">…

and it’s fine to do

select="(emp/copy-of())[ position() lt last() div 2 ]"

(though the system might give you a warning about having to buffer the sequence in memory)

but it’s not OK to do

<xsl:apply-templates select="emp/copy-of()"/>

and then call last() in the invoked template rule.
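
To make that last case concrete, here is a minimal sketch of a complete stylesheet that would fall foul of the rule as proposed above. The element name staff, the mode name "grounded" and the result element final are invented for illustration; only emp/copy-of() comes from the examples above.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- The unnamed mode streams the source document. -->
  <xsl:mode streamable="yes"/>
  <!-- Hypothetical non-streamable mode for the grounded copies. -->
  <xsl:mode name="grounded"/>

  <xsl:template match="/staff">
    <!-- The select expression is grounded and consuming, and no call on
         last() has this instruction as its focus-setting container. -->
    <xsl:apply-templates select="emp/copy-of()" mode="grounded"/>
  </xsl:template>

  <xsl:template match="emp" mode="grounded">
    <!-- Covert call: under the proposed rule the context size is absent
         here, so evaluating last() would raise XPDY0002. -->
    <xsl:if test="position() = last()">
      <final/>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Nothing in the second template rule is wrong in isolation; the failure depends on how the focus was established by the caller, which is why the error has to be dynamic.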

Dealing specifically with Abel’s concerns:

> 
> Obviously, this puts some strains on implementations that we previously didn't have:
> 
> 1) grounded seqtors or constructs were "free havens", where any expression was allowed, now not anymore

The only change is where you’re using some kind of iteration construct to iterate over the consuming sequence. I think it’s a fairly intuitive message to explain: when you process a streamed sequence item by item, the system doesn't know how long the sequence is going to be. You’re allowed to ask how long the sequence is, but you have to be up-front about it: the system needs to know that you need to know, so that it can anticipate the question, typically by putting the whole sequence in memory. The usual use cases for “last()” inside xsl:for-each and inside predicates still work.

> 2) variable bindings can, under certain circumstances, *not* be (aggressively) inlined anymore

I think that’s already true with streaming to some extent. Usually inlining a variable will turn something that was non-streamable into something streamable. But yes, we now have a different rule for 

<xsl:variable name="v" select="emp/copy-of()"/>
<xsl:apply-templates select="$v"/>

than the rule for

<xsl:apply-templates select="emp/copy-of()"/>

because in the first case the apply-templates/@select expression is not consuming.

> 3) this error may happen in the most peculiar depths of nested, recursive, repeating templates, which may be hard to explain to a user *why* he is not free in a non-streaming template / for-each / xsl:iterate / xpath expression etc

Firstly, I think the error will be rather rare, because out-of-band calls to last() are unusual. Secondly, I think one can do a reasonable job of the diagnostics, because at the point where you set up a focus with absent context size, you can also capture information in that context about the point at which this focus was created, enabling a message such as “The value of last() is not available because the context item is part of a sequence that results from item-by-item processing of streamed input, initiated using the xsl:apply-templates instruction at line NNN.”

> 
> Another, remaining question is: what happens in match="bar[last()]" in a non-streaming template, where the input is a result of a select expression using copy-of()? Current rules would dictate that, because it raises an error, it will never match. Is that an acceptable behavior?

last() here means that the bar element is the last child bar of its parent element; it has nothing to do with the position of the bar element in the “current sequence”. If bar is a streamed node this will fail (statically); if it is not a streamed node, it will succeed: it might not do what the author intended, but that’s not a new problem.
> 
> I could repeat my idea of [allowing] a single construct (xsl:on-last, or fn:is-last()), but I believe you were little enthusiastic about that in an earlier discussion (for obvious reasons, though a single call / construct is easier to track than a combination of function calls).

I think there are adequate workarounds. For example, a user who knows that the value of last() is needed in a called template can do

<xsl:for-each select="emp/copy-of()">
  <xsl:apply-templates select=".">
    <xsl:with-param name="last" select="last()"/>
  </xsl:apply-templates>
</xsl:for-each>

which works because the call on last() is “overt”.
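
A slightly fuller sketch of this workaround, including the receiving template rule, might look like the following. The element name staff, the mode name "grounded", the extra parameter pos and the result element final are assumptions made for the sake of a complete example. The position is passed alongside the size because, inside the called template, position() and last() refer to the singleton sequence selected by ".", not to the sequence being iterated by xsl:for-each.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:mode streamable="yes"/>
  <!-- Hypothetical non-streamable mode for the grounded copies. -->
  <xsl:mode name="grounded"/>

  <xsl:template match="/staff">
    <xsl:for-each select="emp/copy-of()">
      <!-- Both calls below are overt: their focus-setting container is
           this xsl:for-each, so the processor can see them statically. -->
      <xsl:apply-templates select="." mode="grounded">
        <xsl:with-param name="pos" select="position()"/>
        <xsl:with-param name="last" select="last()"/>
      </xsl:apply-templates>
    </xsl:for-each>
  </xsl:template>

  <xsl:template match="emp" mode="grounded">
    <xsl:param name="pos" as="xs:integer" required="yes"/>
    <xsl:param name="last" as="xs:integer" required="yes"/>
    <!-- Uses the values supplied by the caller instead of calling
         position() or last() in this template. -->
    <xsl:if test="$pos = $last">
      <final/>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

The cost is the one already mentioned for the overt cases: the processor will most likely have to buffer the copied emp elements in order to know the size.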

> 
> This seems a good suggestion all in all, but I would like to add that our previous conclusion on the same, using dynamic errors caused by streaming, should be avoided at all cost. I have, in principle, no objections to this renewed effort to try to find a more suitable, and kinder-to-the-user solution.
> 

Yes, this does break the principle that we don’t have dynamic errors caused by streaming. It’s not the first case, but yes, it’s significant. However, I think it’s clear to users why the restriction exists, and I think the immediate failure caused by the attempt to call last() is better than the “late” failure caused by running out of memory because the system tries to buffer the sequence in a vain attempt to evaluate last(). At least you’ll find the error when you test your code on small amounts of data rather than only finding it when you scale up.

Michael Kay
Saxonica

Received on Thursday, 22 October 2015 19:18:11 UTC