Re: about bug 29472 and turning streaming on and off from C. M. Sperberg-McQueen on 2016-06-16 (public-xsl-wg@w3.org from June 2016)

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Thu, 16 Jun 2016 13:31:11 -0600
To: Abel Braaksma <abel.braaksma@xs4all.nl>
Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, "Public XSLWG" <public-xsl-wg@w3.org>
Message-Id: <2E242D50-BB24-4A7E-9FB3-647B44EB83F7@blackmesatech.com>
In earlier mail [1], I responded to the first part of ABr's reply.   This mail is for
the next part.

I am coming to believe that the root of the issue lies in the distinction (or in 
some cases the failure to distinguish) between declarative meanings 
and imperative meanings in our discussion and perhaps in the spec.

 
On Jun 16, 2016, at 5:56 AM, Abel Braaksma wrote:
> 
> 
>> II. What am I looking for?  Where am I coming from?
>> 
>> Now to discharge my action by describing what I think makes sense for
>> mechanisms to control streaming.
>> 
>> a) First, I should say explicitly that I share the expectation that in
>> developing a complex stylesheet it will be very helpful to be able to
>> turn streaming off.
>> 
>> Just as people check the behavior of stylesheets (especially but not
>> only buggy stylesheets) by running them with different processors, I
>> would like to be able to check the behavior of a streaming stylesheet
>> I'm writing by running it with a non-streaming processor. This is one
>> reason I value points 3 and 4 above.
>> 
>> If a processor I'm using comes with both a streaming mode of operation
>> and a non-streaming mode, I expect I will want to test problematic
>> code in both modes.
> 
> Exactly. But on a per-construct basis. Not the stylesheet as a whole.

You can make the case that turning streaming on and off for 
individual constructs is important, if you like. I was not attempting
to address that question in the paragraphs just quoted.


>> b) Since it seems it may be a minority view, I should also say
>> explicitly that I expect my foreseeable user requirements in this area
>> to be met with a global streaming vs non-streaming switch. That is, I
>> don't foresee an urgent requirement on my part to turn off streaming
>> on one particular stream while keeping streaming turned on for all
>> other streams.  I may be wrong.
> 
> This in itself can be a processor-switch: to "switch" the whole processor into a non-streaming processor.

I'm not sure what the antecedent of "This" is.  I agree that a processor-level
switch which turns the processor into a non-streaming processor would
have the effect of turning off streaming for all constructs.  

> 
> But streaming is a property of a construct, not of a stylesheet or a package as a whole. And a precompiled package may be compiled in such a way that it can only be run with processors that support the streamability feature.

In that case, as a user I expect that either (a) the library package is well debugged
and my problem does not lie there (so my inability to turn streaming off for
that package won't hurt me) or (b) the bug is in fact in the library code, and
nothing I can do will fix it anyway.


> 
> Conversely, we can already switch it on/off for *all* constructs that support streaming,

I don't see how, unless "streamable=yes" means "please stream", which is consistent
neither with what I think it means, nor with the meaning you ascribe to it at the top
of your note.

> except one, xsl:stream. It is a matter of orthogonality and usability to allow users to switch it on/off on xsl:stream as well.


If streamable="yes" and streamable="no" meant "please stream" and "please do 
not stream", then I think you would have a point.  (I might be wrong; I haven't
thought about it very long.)  But if that is what they mean, then I have been
worse than useless in this WG.



> 
>> 
>> c) In other contexts (e.g. C compilation, Java execution, image
>> compression, database queries), I am accustomed to control the
>> aggressiveness of optimization and various aspects of resource usage
>> with command-line (invocation-time) switches.
> 
> I disagree with the notion that streaming is merely an optimization. It affects the code as a whole. The optimizations you speak of are of the kind of removing debugging information, tail-call recursion to loop optimization, function inlining etc. They are of a very different kind.
> 
> In C you cannot use an optimization switch to control whether or not the bodies of functions should be inlined.

Do you mean that no C compilers offer that? Or that in the nature of the case no
C compiler could possibly allow an --inline option which affects all functions,
or an --inline=X option which affects function X?

> In fact, for that they have the "inline" keyword.

You are quite right; it is annotations like the inlining request that 
led me to point g below.

But whether you share my views or not, the statement of fact in this point
remains true:  I am accustomed to controlling such things with command-line
options which have imperative semantics.


> In F#, you can force a declaration to be tail-called optimized, in the code.

I'll take your word for it; I confess that strikes me as more than a little
weird.  
 

> You can then have a debug build (without the TCO) and a non-debug build with the TCO. 
> 
> But in both cases, the programmer must write it in his code correctly and must inform the processor from his code that he wants the compiler to assess that a certain construct is inlineable or TCO'able. 
> 
>> 
>> My mental model of the 'right' way to control optimization, resource
>> usage, and other things which are orthogonal to the correctness of a
>> process is shaped by examples like these:
>> 
>>  - the -O0, -O1, -O2, and similar flags of gcc;
> 
> These optimizations do not control resource optimization,

?  Since when is time not a resource?


> 
>> 
>>  - the Java options for controlling memory and garbage collection
>>    (-Xmx, -Xms, etc., etc.);
> 
> Again, these do not control memory consumption of the code, they *limit* the possible memory consumption of the code. The programmer still has to write the code in such a way that it fits in the memory.
> 
> In XSLT this is not different: the programmer must program the code in such a way that it does not consume much memory. And then we added a way to allow the programmer to control on a per-construct basis whether it should consume all, or very limited memory.

I hope not.  I believe that what we added is a way for the programmer to make
declarative statements about how much memory is required by a sufficiently
intelligent processor.  If the streamable attribute is a way to control how
much memory *should* be used, then we have drifted much too far away from
our original design principles.

> ...
>> 
>>  - the run-time options on gzip and other compression software which
>>    control the compression method used and the tradeoff between
>>    compression time and compressed size;
> 
> I don't see the analogy.

Two compressed files have the same 'meaning' (in this analogy) if they expand
to the same uncompressed file.  The size of the compressed files is an 
important property for practical reasons but is orthogonal to the meaning of
the compressed files.  (And for the file compression utilities I use, it's not 
controlled by an in-line processing instruction in the data to be compressed.)
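
To make the compression analogy concrete, here is a minimal sketch in Python using the standard gzip module. The payload and variable names are hypothetical, and the choice of Python is only for illustration. It shows the compression level being chosen by the caller, while both outputs expand to the same bytes:

```python
import gzip

# Hypothetical payload standing in for "the file to be compressed".
data = b"<doc>" + b"<item>streaming</item>" * 1000 + b"</doc>"

# The compression level is chosen by the caller at invocation time,
# not by anything inside the data itself.
fast = gzip.compress(data, compresslevel=1)
small = gzip.compress(data, compresslevel=9)

# Both compressed forms have the same 'meaning' in the sense used above:
# they expand to the same uncompressed bytes.
same_meaning = gzip.decompress(fast) == data and gzip.decompress(small) == data
print(same_meaning, len(fast), len(small))
```

The two compressed sizes may differ, but that difference is invisible to anyone who only looks at the expanded result.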


> 
>> 
>>  - the index construction statements of SQL.
> 
> In this case, the language SQL has this as part of the language. Just like XSLT has it as part of the language to control whether or not you stream. Or am I missing your point here?

Constructing an index may affect the speed with which a SELECT statement,
an INSERT statement, or an UPDATE statement executes; it is likely to affect 
the amount of disk space used by a database.  It has no effect at all, however,
upon the meaning or syntax of a SELECT, an INSERT, or an UPDATE statement.
If I understand correctly, in some shops application programmers do not
run indexing statements; only database administrators do that.
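
A small sketch of the indexing point, assuming SQLite through Python's sqlite3 module (the table, column, and index names are made up for the example). The same SELECT is run before and after an index is created, and the results are compared:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(i, f"n{i % 7}") for i in range(100)],
)

# The query text is identical before and after the index exists.
query = "SELECT id FROM t WHERE name = 'n3' ORDER BY id"
before = conn.execute(query).fetchall()

# Building an index may change how the query is executed,
# but not what the query means or what rows it returns.
conn.execute("CREATE INDEX t_name ON t (name)")
after = conn.execute(query).fetchall()

results_unchanged = before == after
print(results_unchanged)
```

Whether the index actually speeds anything up on a table this small is beside the point; the sketch only illustrates that the statement and its result are untouched.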

> 
>> 
>> All of these affect things like speed (of the compiler, of the
>> resulting executable, of the compression processor, of INSERT and
>> UPDATE statements, of SELECT statements) or size (of executable, of
>> virtual machine, of compressed output, of database on disk), but none
>> of them change the meaning of the C program, Java program, image or
>> other file being compressed, or SQL INSERT / UPDATE / SELECT
>> statement.
> 
> Yes, but in all but a few examples above, the programmer controls this *through the language*, not with compiler switches.

We seem to disagree vigorously about the facts of the case, as well as about
our interpretation of the facts of the case.

> Sometimes, a compiler switch adds something extra (remove debugging information, automatic TCO), but we have that in XSLT as well, remove xsl:assert, or add the execution plan. 
> 
> But I think we are closer to SQL, where you can use the language itself to create indexes, intermediate tables, in-memory tables, temp tables etc. All by using standard SQL alone and not command-line switches on firing up the database. 

I see no control over intermediate, in-memory, or temp tables in 
standard SQL 92 (the most recent version of the standard I have 
studied in any depth, sorry) -- that made SQL radically different from
other relational and non-relational languages I had any acquaintance
with.

If your perception of the world is that in all good computing environments,
control over implementation details and properties like memory usage
and so on is in the language semantics at a fine grain and not carefully 
kept separate and orthogonal, then it seems likely that you and I will not
agree on this topic.  All the examples I cite seem to me clear examples
of separating operational properties like speed and storage usage from 
the declarative semantics of a language; if they seem to you to be examples
of mixing declarative and operational semantics, then I can only think you
see the world from a very funny angle.

> 
> Streaming is not a property of size and speed. It is a fundamentally different way of treating the input tree and certain operations do not apply to it (preceding-sibling). To be able to assess that and to allow users control over it, forces us to declare this as a property of a construct. 

The only difference I know of that we want streaming processing to make
is that the streaming processing should be possible in a smaller space.

If XSLT 3.0 exposes those differences to the XSLT programmer, then has
our design effort not failed?  We discussed at some length the possibility
of introducing a new kind of node, with different rules, and rejected it.  
If it has snuck in nevertheless, I am sorry to realize it.



> 
>> 
>> 
>> d) By analogy, my instinct is to say that the right way to handle a
>> switch to control whether streaming is undertaken or not is with an
>> invocation-time switch, on the command-line or as an option passed to
>> an API.  Or, more generally, something wholly outside the stylesheet
>> itself.
> 
> Again, this is not possible, I think. How would you devise such an API? Streamability is a property of a construct, not of a stylesheet.

Pointing at parts of XML documents is quite a general problem.  XPath,
a series of drafts of the XPointer spec, and the final XPointer recommendation
are all solutions to the problem.  How can it be impossible to point from
the command line at a particular construct and say "That one, please
handle it this way!"?

> 
>> 
>> This may mean that I disagree with the premise of bug 29472, for which
>> the initial description begins:
>> 
>>    ... we assessed that it was very desirable to have the possibility
>>    to switch OFF streaming for xsl:stream. The current means to do so
>>    are cumbersome to do implementation-independent way, or are at API
>>    level.
>> 
>> I expect command-line options to be implementation-dependent or to be
>> at the API level.  I no more expect an implementation-independent way
>> to control streaming than I expect an implementation-independent way
>> to ask a C compiler for aggressive optimization or for none at all.
> 
> It is not an optimization.

How is "please conserve storage while evaluating this" not an optimization?


> And xsl:stream is simply the only way to have an implementation-independent way of saying "I want to stream this".

Want to?  I don't want a way to say "I want to ...".

I want (a) a way to say "this can be handled in a streaming way" and (b) a 
way to say "please handle this in a streaming way".  I would like the first
to be part of the XSLT language, and the second to be part of my interface
to a processor. 

I don't want XSLT constructs for talking about my hopes and desires.


> 
>> 
>> If I were specifying command-line options for a processor that wanted
>> to allow individual xsl:stream instructions to be processed with
>> different values of streaming-mode (yes vs no), my first sketch would
>> be: (a) assign an xml:id to every stream you wish to control in this
>> way; (b) use the --streaming=ID or --nostreaming=ID options
>> (repeatable) to turn streaming mode on or off on individual xsl:stream
>> instructions.  I mention this not because I think any implementors
>> have anything to learn from me in this regard but because it may help
>> other WG members understand where I am coming from (i.e. just how
>> benighted I may be, from their point of view).
> 
> I think the wrong line of reasoning here is the assumption that xsl:stream is the only way to start streaming analysis. It is not. The streamable="yes|no" attribute is available on a myriad of instructions and declarations.

I do not see any assumption in the material quoted that xsl:stream is the only
way to start streaming analysis.  You and the bug have talked about turning
streaming processing on or off for some xsl:stream instructions but not others,
and I had the impression that there was some worry that a command-line
option could not provide that functionality.  I was trying to address that concern 
here.  
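
The command-line sketch quoted above could look roughly like the following, written in Python purely for illustration. Everything here is hypothetical: the stylesheet, the xml:id values, the function name streaming_plan, and the option handling are assumptions, and no existing processor is claimed to work this way. The sketch parses repeatable --streaming=ID and --nostreaming=ID options and checks each ID against the xsl:stream instructions in the stylesheet:

```python
import argparse
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical stylesheet with two xsl:stream instructions labeled by xml:id.
STYLESHEET = """\
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
  <xsl:template name="main">
    <xsl:stream xml:id="big-log" href="log.xml"/>
    <xsl:stream xml:id="small-config" href="config.xml"/>
  </xsl:template>
</xsl:stylesheet>
"""


def streaming_plan(stylesheet_text, argv):
    """Map xml:id values to True (stream) or False (do not stream)."""
    parser = argparse.ArgumentParser()
    # Both options are repeatable, as in the sketch quoted above.
    parser.add_argument("--streaming", action="append", default=[], metavar="ID")
    parser.add_argument("--nostreaming", action="append", default=[], metavar="ID")
    args = parser.parse_args(argv)

    # Collect the IDs that actually label xsl:stream instructions.
    root = ET.fromstring(stylesheet_text)
    known = {
        el.get(XML_ID)
        for el in root.iter(f"{{{XSL_NS}}}stream")
        if el.get(XML_ID)
    }

    plan = {}
    for ident in args.streaming:
        plan[ident] = True
    for ident in args.nostreaming:
        plan[ident] = False

    unknown = sorted(set(plan) - known)
    if unknown:
        raise ValueError(f"no xsl:stream with xml:id: {unknown}")
    return plan


plan = streaming_plan(
    STYLESHEET, ["--streaming=big-log", "--nostreaming=small-config"]
)
print(plan)
```

Nothing in the stylesheet itself changes between runs; only the invocation does.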

> 
>> 
>> 
>> e) In general, if streaming is orthogonal to the meaning of the
>> stylesheet (point 2 above), it seems to follow that nothing in the
>> stylesheet itself can affect the choice of streaming mode or
>> non-streaming mode.
> 
> No, it is not orthogonal to the meaning of a stylesheet. A stylesheet is not streamable. Some parts of it may be.

Well, we seem to have a flat disagreement here.

If you can provide an example of a stylesheet and a set of inputs
which have one prescribed set of outputs when processed in
streaming mode and a different prescribed set of outputs when
processed in non-streaming mode, then I will agree that streaming
processing is not orthogonal to the meaning of a stylesheet but 
changes that meaning.  
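
The kind of check described here can be sketched outside XSLT. The following Python analogy is an illustration only: the input document and function names are invented, and it is not a model of any XSLT processor. It computes the same value once from a fully built tree and once incrementally, and then compares the two results:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical input: 100 items with prices 1..100.
xml_text = (
    "<orders>"
    + "".join(f'<item price="{n}"/>' for n in range(1, 101))
    + "</orders>"
)
xml_bytes = xml_text.encode("utf-8")


def total_in_memory(data):
    # Build the whole tree first, then walk it.
    root = ET.fromstring(data)
    return sum(int(item.get("price")) for item in root.iter("item"))


def total_streaming(data):
    # Handle each element as its end tag is seen, then discard its content.
    # (This reduces memory use; it is not a strict bound on it.)
    total = 0
    for _event, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
        if elem.tag == "item":
            total += int(elem.get("price"))
        elem.clear()
    return total


print(total_in_memory(xml_bytes) == total_streaming(xml_bytes))
```

A counterexample of the kind requested would be an input for which the two functions are both required to succeed and yet required to return different values.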

> 
> 
>> f) If point 2 is accepted not just as describing a state of affairs
>> but as enunciating a design principle, then it seems to follow that
>> nothing in the stylesheet should be *allowed* to affect a processor's
>> choice of streaming mode or non-streaming mode.
> 
> If this were true, how could a processor ever detect which of the following parts should be processed using streaming?
> 
> A)
> <xsl:template match="foo[bar]">do something</xsl:template>
> 
> B)
> <xsl:template match="foo[@bar]">do something</xsl:template>

How does any processor ever decide whether to apply a particular
optimization?



> 
> Without the fine control by the programmer to say what should be processed by using streaming or not,

What constructs in our language specify that something "should be" streamed or not?

I don't see any:  the streamable attribute is described as declaring that something can
be processed in a streamable way (which, again, I take to mean "with storage costs
sublinear in the size of the input"). It does not, as I understand it, constitute a request
that something should be streamed.  Nor does streamable="no" constitute a request that
something not be streamed.  (Any more than the presence of a DOCTYPE declaration
in an XML document constitutes a request for validation, or the absence of such a 
declaration a request that no validation be attempted.)

> 
> 
>> 
>> This explains why the proposal in comment 3 troubles me.
>> 
>> 
>> g) There is, however, a counter-argument.
>> 
>> It is not a universal truth that no program in a well-designed
>> language ever contains anything that affects optimization or other
>> non-semantic properties of the program.  There is a long history of
>> using pragmas, sometimes in the form of magic comments, to control the
>> behavior of compilers (including sometimes controlling the level of
>> optimization to be attempted).  And our sister language XQuery has a
>> well-developed system of pragmas and function annotations which
>> appears to work well for its intended purposes.
>> 
>> In general, inserting pragmas to control streaming of individual
>> xsl:stream constructs seems to me a poor choice for things one wants
>> to change from one run to the next.  It will remind some people of the
>> barbed remark in Kernighan and Pike (or Kernighan and Plauger?) about
>> passing run-time parameters to the program by defining them as
>> constants in the program and passing them to the program by using the
>> compiler as an intermediary.
>> 
>> But if we do want to make it possible to control streaming processing
>> from inside the stylesheet, then making something that looks and feels
>> like a pragma, and not like declarative information about the
>> construct, would feel to me like a better design.  XSLT doesn't have a
>> lot of things that feel like pragmas, but XML does provide processing
>> instructions for precisely this purpose.  A PI immediately before (as
>> immediately preceding sibling of) an xsl:stream instruction, or as its
>> first non-whitespace child, would feel better to me than
>> streamable=yes|no with a meaning unlike that of any other attribute
>> named 'streamable' in the spec.
> 
> How do you mean "unlike any other attribute named streamable"? It has *exactly* the same meaning here as with other constructs. The only difference is that its default is "yes", because the construct was introduced to help streaming.

What do you take that meaning to be?

What meaning is assigned to that attribute by the spec?

> 
> In hindsight, I would have found it better to have xsl:doc, as a counterpart to fn:doc, which can then have an @streamable="yes|no", which would be more in line with existing constructs.
> 
>> 
>> 
>> III. Some other points


The comments in this section lead me to suspect that we have suffered a complete
breakdown of communication.  A point by point commentary seems unhelpful;
almost every point would consist of saying "huh?  What are you talking about?"


>> 
>> * What should they do if the code is streamable but not guaranteed
>> streamable?
>> 
>> If I ask for streaming, I hope they stream it.  If I ask for
>> non-streaming processing, I hope they don't.  Since streaming
>> processing is not tightly defined, non-streaming processing cannot be
>> tightly defined either; I do not expect to be able to do more than
>> hope, one way or the other.  I do not expect to be able to argue that
>> a processor is non-conforming because of the way its --stream=YES|NO
>> option behaves. I do expect to be able to argue that there is a bug if
>> that option causes different results in transforms that run to
>> completion without error (modulo implementation-defined or -dependent
>> differences).
> 
> Streaming processing *is* tightly defined.

I meant only that, as section 2.12 of the spec says:

    This specification does not attempt to legislate precisely which implementation 
    techniques fall under the definition of streaming, and which do not.

> 
>> 
>> B. Comments 2, 3, 5, 6 entertain variations on a proposal to add an
>> xsl:stream/@streamable attribute with 'yes' and 'no' (and possibly
>> other) values.
>> 
>> On other elements, attributes named 'streamable' indicate that the
>> construct declared is streamable (either guaranteed streamable or
>> streamable in fact).  Since the use of xsl:stream already has this
>> meaning, making it mean "please stream" doesn't lose information, but
>> it does seem to make the design less consistent.  Making it carry the
>> meaning "please stream" in all cases would, I think, be a mistake:
>> "this is streamable" and "please stream" are very different sentences,
>> with very different meanings.  The former has a clear declarative
>> meaning; the latter is an imperative which would feel out of place in
>> a declarative language like XSLT.
> 
> I think here you may have misunderstood the proposal. The meaning of xsl:stream/@streamable is proposed to have *precisely* the same meaning as with the same property in other constructs. Otherwise I don't see how it makes any sense.

I think part of the difficulty we have with this bug is that we do not seem
to have WG-wide agreement on what meaning the 'streamable' attribute
has in the places where it is currently defined.  The spec says in 6.6.4,
for example,

    Specifying streamable="yes" on an xsl:mode declaration declares an 
    intent that every template rule that includes that mode ... should be 
    streamable, either because it is guaranteed-streamable, or because it 
    takes advantage of streamability extensions offered by a particular processor.

We don't define "streamable" as a term, but I take it to mean "capable of
being processed in a streaming way" -- which in turn I take to mean "capable
of being processed with storage consumption sub-linear in the size of the
input".  

In the description of bug 29472, however, what is desired is the ability 

    to switch OFF streaming

for a particular construct.  And in your email, you write (with reference, I
think, to @streamable) that we provide 

    a way to allow the programmer to control on a per-construct basis 
    whether it should consume all, or very limited memory

You also write (speaking I think in general, but I believe you intend this to
apply to @streamable)

    Without the fine control by the programmer to say what should be processed 
    by using streaming or not, there is no way a processor can decide which of 
    the above should be processed using streaming.

These lead me to believe that you and others wish to treat @streamable
as having an imperative meaning, instructing the processor to stream, or to
refrain from streaming, a particular construct.

Nothing in the sentence "X is capable of being processed with sub-linear
storage consumption" seems to me to entail the request "please process
X with sub-linear storage consumption".  And conversely:  failing to claim
that X is capable of being processed with sub-linear storage consumption
does not seem to me to mean the same thing as "do not attempt to stream
X".

The two pairs of sentences do have related semantics, but this conversation
will continue to go nowhere if we cannot reliably distinguish between
declarative statements and imperative statements.


-- 
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com 
* http://cmsmcq.com/mib                 
* http://balisage.net
****************************************************************
Received on Thursday, 16 June 2016 19:31:53 UTC