Re: Static streamability, regarding bug 29984 from C. M. Sperberg-McQueen on 2016-12-08 (public-xsl-wg@w3.org from December 2016)

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Thu, 8 Dec 2016 15:58:39 -0700
To: Abel Braaksma <abel.braaksma@xs4all.nl>
Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, Public XSLWG <public-xsl-wg@w3.org>
Message-Id: <259CEF03-CB07-4187-9ADC-0739EA993010@blackmesatech.com>
> On Dec 8, 2016, at 9:51 AM, Abel Braaksma <abel.braaksma@xs4all.nl> wrote:
> 
> ...
> So, some parts of expression rewriting must happen prior to streamability analysis, some parts afterwards. I would like to find a way that rewriting is not relevant for the streamability analysis (or more generally, optimizations). It is possible that I am vague about this, I just don't have the right expressive power to make my point understood, it seems.

I see that my previous response to this paragraph rather missed the point; sorry.

I don’t think you are vague, though I think you are failing to
distinguish the different sets relevant for these discussions:

  - GS: the set of constructs which are guaranteed streamable

  - SIF: the set of constructs which are streamable in fact, given a
    sufficiently clever processor

  - SIP(P): the set of constructs recognized as streamable by a given
    processor P

(Note in passing: if our definition of GS is sound, which we believe
but have not proven, then GS is a subset of SIF.  If processor P’s
streaming code is sound, then SIP(P) is a subset of SIF.  If a
processor conforms to our spec, then GS is a subset of SIP(P).  So in
the expected case, GS is a subset of SIP(P), which is in turn a subset
of SIF.)
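
As an illustration only, the three sets and the expected subset chain
can be modelled with ordinary Python sets.  The construct labels c1
through c4 and the particular memberships are invented for the sketch;
they stand for stylesheet constructs and are not taken from the spec
or from any processor.

```python
# Hypothetical construct labels; they stand for stylesheet constructs.
SIF = {"c1", "c2", "c3", "c4"}   # streamable in fact
SIP_P = {"c1", "c2", "c3"}       # recognized as streamable by processor P
GS = {"c1", "c2"}                # guaranteed streamable under the spec's rules


def chain_holds(gs, sip, sif):
    """Return True when gs is a subset of sip and sip is a subset of sif."""
    return gs <= sip <= sif


# Constructs that P streams although the spec does not guarantee them.
only_in_sip = SIP_P - GS

print(chain_holds(GS, SIP_P, SIF))
print(sorted(only_in_sip))
```

In this toy model c3 is in SIP(P) but not in GS, so a bare "yes, P can
stream this" answer does not say whether a construct is guaranteed
streamable.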

Every streaming processor P must analyse code to decide whether it is
in set SIP(P) or not.  (This is not required by the spec but by the
reality that not all XSLT code can be streamed.)  Implicit in your
proposal is the desire to make the code which makes this decision ALSO
decide whether the code is in GS or not, and with no additional work,
or very little.

In the status quo, GS is a proper subset of SIP(P) for all known
processors P, and one of the reasons is that static analysis and
rewriting often reveal a construct to be streamable, by reducing it to
the same internal form as a GS construct.  But unless one either
suppresses certain rewrites, or duplicates certain static analyses, or
keeps around both the rewritten and the original form of an
expression, or otherwise does some work that has no effect on the
actual evaluation of the stylesheet in the presence of data, then
calculating whether a given construct C is in SIP(P) does not suffice
to calculate whether C is in GS.

You would like, if I understand you correctly, to extend set GS to
include not only all constructs defined as GS by our rules, but any
that turn out to be equivalent to GS by means of rewriting; this
essentially would make rewrites "irrelevant" for purposes of GS
analysis (not for SIP(P) analysis, of course -- I assume that by
"streamability analysis" you here mean GS analysis not SIP(P)
analysis, since SIP(P) analysis is not constrained by our spec).

You may now judge whether you have made yourself understood or not.

I don't think the problem is that you are vague or not being
understood.  I think the problem is that we disagree.

It is an unavoidable fact that if GS and SIP(P) are different sets,
calculating membership in SIP(P) does not suffice to tell whether a
construct is in GS.  (The converse does hold, however, because GS is a
subset of SIP(P), so GS(C) implies SIP(P)(C).)

You would like to modify the definition of GS so that deciding
membership in SIP(P) does suffice to decide membership in GS.  This
does not require that SIP(P) be identical to GS (though that would
suffice); it is enough that at some point in the calculation of
SIP(P), all members of GS and all non-members of GS be identified.
(MK appears to propose a slightly different approach: not a change to
the definition of GS, but a relaxation of the requirement that "C is
in GS" be explicitly decided and reported for every construct C. The
difference is not unimportant, but is not relevant for this mail.)

It seems clear to me that this change to GS means that membership in
GS will be different for different processors -- so strictly speaking,
we have not "C is in GS" but "C is in GS(P)" for some processor P.
That this is so follows from the premise that no one is proposing to
standardize expression rewriting rules or inference rules or other
rules of code analysis and optimization which are relevant to the
calculation of SIP(P).  Since rewrite and inference rules are not
standardized, it is possible for one processor P to realize that an
expression E will never consult a streaming node while another
processor Q does not realize it.  A construct C containing E will then
be in GS(P) but not in GS(Q).
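
A minimal sketch of that divergence, assuming the extended GS is
"whatever a processor's own rewrites reduce to a construct in the
spec-defined GS".  The labels A, B, C, the rule tables REWRITES_P and
REWRITES_Q, and the helper names are all hypothetical; real rewrite
engines are of course far richer than a lookup table.

```python
BASE_GS = {"A"}  # constructs guaranteed streamable under the spec's own rules


def normalize(construct, rewrites):
    """Apply rewrite rules until none applies; stop if a cycle is detected."""
    seen = set()
    while construct in rewrites and construct not in seen:
        seen.add(construct)
        construct = rewrites[construct]
    return construct


def in_extended_gs(construct, rewrites):
    """Membership in GS(P): the rewritten form lands in the spec-defined GS."""
    return normalize(construct, rewrites) in BASE_GS


# Processor P knows a rule (C -> B) that processor Q lacks.
REWRITES_P = {"C": "B", "B": "A"}
REWRITES_Q = {"B": "A"}

print(in_extended_gs("C", REWRITES_P))  # True
print(in_extended_gs("C", REWRITES_Q))  # False
```

Both toy processors agree on A and B; they differ only on C, which is
the kind of construct whose status could no longer be learned from a
single processor.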

From this variation among processors it follows that it will no longer
be possible in principle to learn, from a single conforming processor,
whether a stylesheet will be in GS for all processors.

Your view, as I understand it, is that the change in definition of GS
is advantageous for implementors, that the resulting improvements in
implementation will be advantageous for users, and that these
advantages outweigh any disadvantages associated with the loss of a
user's ability to learn, from a single conforming processor, whether a
stylesheet is in GS for all processors.  (I do not know whether this
is because you believe there are no disadvantages at all associated
with that loss, or because the disadvantages exist but are small [at
least smaller than the advantages], or because you expect the number
of actual cases involved to be very small, so the disadvantages will
be very rare.)

My view, on the contrary, is that the disadvantages outweigh the
advantages by a substantial margin.  I think the user's ability to
write portable streamable stylesheets (by which I mean stylesheets
which any conforming streaming processor will stream) is in practice
dependent on being able to determine whether the stylesheet is in GS
or not.  Losing the ability to learn that from a single conforming
processor reduces the idea of writing a stylesheet to be in GS to a
theoretical possibility; I regard that as a major disadvantage,
probably fatal to the utility of the 3.0 specification and certainly
much more important than the advantages to implementors of not having
to write separate code for calculating membership in GS and membership
in SIP(P).

best,

Michael


********************************************
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
cmsmcq@blackmesatech.com
http://www.blackmesatech.com
********************************************
Received on Thursday, 8 December 2016 22:59:11 UTC