Re: about bug 29472 and turning streaming on and off from C. M. Sperberg-McQueen on 2016-06-16 (public-xsl-wg@w3.org from June 2016)

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Thu, 16 Jun 2016 10:01:57 -0600
To: Abel Braaksma <abel.braaksma@xs4all.nl>
Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, "Public XSLWG" <public-xsl-wg@w3.org>
Message-Id: <1A8010FD-FAAB-4DFD-BC92-72976DFE9007@blackmesatech.com>
On Jun 16, 2016, at 5:56 AM, Abel Braaksma wrote:

> See also my comments between the lines.
> 
> In summary, a lot of your argument revolves around the fact that you seem to think that a stylesheet is either streamable or not. But we do not have a concept of streamable stylesheets (or streamable modules or packages). We have a concept of streamable constructs.

You may be right, but I don't see how any of my remarks depend
on the concept of 'streamable stylesheet' being defined by the spec,
or even being a particularly important concept.


> 
> Another (big) part of your argument is control from the API or commandline. The irony is that it is precisely that point that initiated the xsl:stream/@streamable="yes|no" discussion. Without that attribute it is not possible to control it via the commandline. Consider:
> 
> <xsl:param name="stream" select=" 'no' " static="yes" />
> 
> <xsl:mode _streamable="{$stream}"/>
> 
> <xsl:template match="/">
>   <xsl:stream href="strm.xml">
>     <xsl:apply-templates />
>   </xsl:stream>
> </xsl:template>
> 
> As written, this is not streamable and MUST raise an error (though a processor MUST also provide a commandline option to try to stream it anyway; see my previous mail).

Why must it raise an error?  That's two questions:  

(1) what part of the spec requires that?  

And (2) if it is so, why did we write the spec that way?  

If a non-streaming processor is expected to implement the entire language (which 
I thought was an agreed design principle), then a non-streaming processor 
should implement xsl:stream, and its implementation should work (up to the 
limits of its resources) regardless of whether any construct, mode, or template 
is guaranteed streamable, streamable in fact, or declared streamable.

I think your argument here takes the form:  

  - the status quo is untenable
  - the only way to change the status quo to make it tenable in this case is
    an attribute to turn streaming on or off for a given stream instruction
  - therefore, we should adopt such an attribute on the stream instruction

Neither the first premise nor the second seems intuitively obvious to me.
We want our spec to behave well in realistic situations, and to be well
defined in all situations.  If you take this to be a realistic situation, you may
need to provide some more context so that I can share that view; offhand
I'm having trouble understanding how I could end up with this situation
in real life.  I don't believe we should ignore cases like this, but my primary
desire for cases I don't expect in real life is that the spec should clearly
specify what to do. On your account of the spec, it does:  this is a
stylesheet which will always raise an error.  I think it will always be possible
to write such stylesheets in any sufficiently powerful stylesheet language.

> 
> Currently, the only way to counter this is as follows:
> 
> <xsl:param name="stream" select=" 'no' " static="yes" />
> 
> <xsl:mode _streamable="{$stream}"/>
> 
> <xsl:template match="/">
>   <xsl:stream href="strm.xml" use-when="$stream = 'yes' or $stream = '1' or $stream = 'true' ">
>     <xsl:apply-templates />
>   </xsl:stream>
>   <xsl:apply-templates select="doc('strm.xml')" use-when="$stream = 'no' or $stream = '0' or $stream = 'false' " />
> </xsl:template>
> 
> This is not only counter-intuitive, it is simply very hard to do this everywhere in your stylesheet and clutters your code tremendously. If you could instead write this:
> 
> <xsl:param name="stream" select=" 'no' " static="yes" />
> 
> <xsl:mode _streamable="{$stream}"/>
> 
> <xsl:template match="/">
>   <xsl:stream href="strm.xml" _streamable="{$stream}">
>     <xsl:apply-templates />
>   </xsl:stream>
> </xsl:template>
> 
> You can now use the API or commandline interface of your processor to control streaming.

I don't see how that would work:  the proposal I see in the bug report is for
streamable="no" to request non-streaming processing.  Have I misunderstood?

If the xsl:stream instruction turns streaming on and off, then what effect can 
a run-time parameter have, other than to make the processor override the semantics 
of the language and become a non-conforming processor?  If we want to 
contemplate run-time options to behave in non-conforming ways, we don't 
need a streamable attribute on xsl:stream to allow the original stylesheet to 
be run without errors.

>> 
>> 
>> 1 Streamability is an optimization.
>> 
>> The goal of writing a stylesheet using the guaranteed-streamable
>> subset of XSLT and the goal of declaring things streamable in such a
>> stylesheet is to enable the stylesheet to be evaluated in a way that
>> limits one particular cost: storage consumption. The language of
>> guaranteed-streamable constructs is a subset, not an extension, of
>> XSLT.
> 
> There is a fundamental mistake in this reasoning: a stylesheet is not guaranteed streamable. Nor is a module, or a package. A construct is.

I don't see anything in the paragraph quoted which is inconsistent with the
observation that individual constructs are streamable.  Those constructs
do appear in a stylesheet, do they not? The stylesheet thus uses the
guaranteed-streamable subset of XSLT for those expressions and instructions,
does it not?

I did not say, and did not mean, "writing a stylesheet using only the
guaranteed-streamable subset of XSLT".


> 
> One stylesheet can contain both guaranteed streaming and non-streaming code.

Agreed.

> If the API is the sole way of distinguishing these parts (think of xsl:mode, xsl:accumulator, xsl:template, xsl:stream, xsl:attribute-set) then it becomes very, very complex to define what part is streamable and what is not.

I do not understand this sentence.  Why would the API be responsible for
identifying which parts of a stylesheet module are either guaranteed streamable
or streamable in fact? 


> Hence we decided at some point to *declaratively* state that a given construct SHOULD BE guaranteed streamable. 

I think you refer here to the 'streamable' attribute.  

This seems to me a very useful observation, because it identifies a potentially
important difference of view.  I do not believe that the WG ever decided that 
streamable="yes" should have the meaning you ascribe to it, and I would like
to ask if anyone can document a decision to the effect that it should.  If I 
participated in a decision with that effect without noticing it, then I must 
apologize to the WG for not doing my job properly:  I do not think such a 
meaning is a good idea.

I understand the meaning differently:  I believe streamable="yes" means
"Hint to processor:  if you are smart enough, you will find it possible to 
evaluate this in space sublinear in the size of the input trees."  The relation
to the guaranteed-streamable subset of XSLT is simply that any streaming
processor is required to be smart enough to evaluate a construct in sublinear
space, if the construct (and those called from it, recursively, etc., etc.) is 
guaranteed streamable.

I have always understood the 'streamable' attribute as a simple optimization, 
since checking part of a stylesheet to see if you can in fact stream it is likely to 
be quicker and less complicated than checking the entire stylesheet.


> 
> This also serves another purpose: compatibility between processors. If we do not have these declarations, how can you ever determine, as a user, that two constructs will be processed in a streamable way by two compatible processors?

I am not sure what to say here.  I don't know why I would ever want to establish such a 
fact, and if I did want to, I don't see any way to go about it other than measuring
memory usage of the two processors on test data and hoping I can interpret the
resulting data correctly.

I do agree that the overt declarations of streamability do serve an interoperability
purpose:  I think of it as a processor-independent hint.


> 
> And then there is the implementation problem: if you precompile a stylesheet, a processor MUST know beforehand that a construct is streamable, because the way such a construct is compiled is fundamentally different. In fact (but I don't know about Saxon), it changes the way the XDM is built. As such, this information must be declaratively present to allow a processor to make the right decisions.

That sounds plausible.  I am not sure I see how it bears on bug 29472, but 
perhaps I can be educated.

> 
>> 
>> 2 Optimizations do not affect the meaning of a stylesheet.
>> 
>> Like other optimizations, streamability affects characteristics of the
>> execution process which lie outside the explicitly defined meaning of
>> the stylesheet (which I take to be, formally, a mapping from inputs to
>> outputs).
> 
> Streamability is more than just an optimization. Some streams, like continuous streams, can never be processed by a non-streaming processor. And while they have a mapping from input to output at a given point in time, they do not have a stable mapping; that is, if inside one execution you call xsl:stream twice, you get different results with the same input.

Am I guaranteed to get different results?  Please say no!

Am I guaranteed to know how many times a processor will evaluate any
construct?  I hope not.

Does the fact that xsl:stream does not guarantee repeatability mean that
in the spec as now written it behaves differently for streaming processors
and non-streaming processors?

I think I may have been careless in my expression:  probably I should have
written not "streamability" but "streaming processing".  Apologies for the
ensuing confusion.


> A non-streaming processor cannot do that.

Why not?  If a non-streaming processor can interpret xsl:stream (and I think it
ought to be able to), does the spec say that it is required to treat xsl:stream as
if xsl:stream made a consistency guarantee?


> 
> In part this is also true for non-continuous streams. Streams are non-deterministic and therefore more than just an optimization.

This makes me very nervous and makes me fear that we tried to 
allow an optimization and ended up making fundamental changes
to the language without realizing it.  Please reassure me before I 
decide that the only safe thing to do is to abandon all our work on
streaming.


> 
>> 4 Streaming and non-streaming processors should never produce
>> different results for a given stylesheet on a given input, except in
>> areas documented as implementation-defined or -dependent.
>> 
>> N.B. I am using "result" here to mean "output produced by a stylesheet
>> evaluation that runs to completion without errors or exceptions"; we
>> expect non-streaming processors to fail on some inputs which streaming
>> processors can handle, but if neither processor fails, we expect the
>> same results.
> 
> Not quite true. Two streaming processors MUST provide the same output if run with the same variables (time, input stream bytes etc), but a non-streaming processor does not have to, because it cannot in some scenarios.

If it cannot run to completion, it does not produce results, as that term is
meant, and defined, in the paragraph above.  I think you are saying only
that processors may fail to complete an evaluation of a stylesheet if they
run out of resources.  Surely this applies to streaming processors as well as
non-streaming processors?

I am going to stop here, because the call is beginning.  I will continue
with the rest of the mail later.

Michael

-- 
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com 
* http://cmsmcq.com/mib                 
* http://balisage.net
****************************************************************
Received on Thursday, 16 June 2016 16:02:23 UTC