Static streamability, regarding bug 29984

During today's meeting we discussed Bug 29984, which concerns a "feature at risk" in the latest public CR: the ability to enforce strict raising of XTSE3430 even in cases where a construct is obviously streamable but our rules say it isn't.

After the discussion, MSM wrote the following thoughts in IRC:

[...] But if we are going to remove the interoperability rule so thoroughly, why are we defining a concept of guaranteed-streamable in the first place?  If that definition has become pointless, as well as far more complex than I wish it were, then why not forget it entirely?

We didn't address this concern in depth, so I would like to respond to it here and, I hope, allay some of the concerns raised.

We currently have two sets:

(1) the set of guaranteed-streamable constructs
(2) the set of non-guaranteed-streamable constructs

The current rules are:

A) If any construct falls within set (1), it is *always* guaranteed-streamable, regardless of processor.
B) If any construct falls within set (2), some processors *may* be able to process it in a streamable way.

The issue in Bug 29984 defines a third set, by two definitions:

3a) the set of constructs that, by static rewriting, are guaranteed-streamable
3b) the set of constructs that, by static analysis, never access a streamed node

In the bug report Michael Kay showed an example of a construct that is not an expression but is still trivially streamable, yet not by our rules unless we allow (3a). Rule (3b) is already in our spec in some places (e.g. on axis steps). An (extreme) example is (foo, bar)[0], which always returns the empty sequence.

If we were to accept either (or both) of these rules, we can say:

C) If any construct falls within set (3a) or (3b), it is *always* guaranteed-streamable, but it is processor-dependent whether this is detected.

Technically, this still falls within the "may be able to stream" of (B). The only difference is that we don't require the processor to have a user option to raise an error in this case.
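To make rules (A), (B) and (C) concrete, here is a toy Python sketch of how I read them. It is not spec text: the names ConstructSet, outcome, detects_set3 and strict are made up for this illustration, and "strict" stands for the user option to enforce raising XTSE3430.

```python
from enum import Enum


class ConstructSet(Enum):
    GUARANTEED = 1      # set (1)
    NOT_GUARANTEED = 2  # set (2)
    PROVABLE = 3        # sets (3a)/(3b)


def outcome(construct_set, detects_set3=False, strict=False):
    """Return 'streams', 'XTSE3430', or 'processor-dependent' (toy model)."""
    # Rule (A): set (1) is streamable on every processor.
    if construct_set is ConstructSet.GUARANTEED:
        return "streams"
    # Rule (C): set (3a)/(3b) is streamable where the processor detects it;
    # no user option to raise an error is required for this case.
    if construct_set is ConstructSet.PROVABLE and detects_set3:
        return "streams"
    # Rule (B): set (2), or set (3) on a processor that does not detect it.
    if strict:
        return "XTSE3430"
    return "processor-dependent"


print(outcome(ConstructSet.PROVABLE, detects_set3=True, strict=True))
print(outcome(ConstructSet.PROVABLE, detects_set3=False, strict=True))
```

In this model a stylesheet that stays within set (1) gets the same answer everywhere, which is the interoperability guarantee; only set (3a)/(3b) depends on what the processor can prove.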

Extending our rules this way still allows the interoperability given by sets (1) and (2). And users that want their stylesheets to be guaranteed streamable should simply stick to set (1). Should they consciously use set (3a/3b)? I don't think so. But if they unconsciously happen to be in that area, we don't require a processor to interrupt processing and raise an error.

Allowing such rewrites is very similar to the way XPath allows rewrites that may or may not raise errors. That specification has gone to great lengths to allow optimizations and to give a processor a certain leeway in whether or not it raises an error. Does this mean that the concept of having errors is futile? I don't think so, even if it means that certain tests have no guaranteed outcome (e.g. (1, 2 div 0)[1] may or may not raise an error).
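The effect can be mimicked outside XPath. The following is a minimal Python sketch, not XPath semantics: the function names eager_first and lazy_first are invented here, and each operand is wrapped in a lambda so that its evaluation can be deferred. It shows two evaluation strategies for "first item of a two-item sequence whose second item divides by zero".

```python
def eager_first(thunks):
    """Evaluate every operand, then pick the first item."""
    values = [thunk() for thunk in thunks]
    return values[0]


def lazy_first(thunks):
    """Evaluate only the operand that is actually needed."""
    return thunks[0]()


# Second operand fails when evaluated, but the result only needs the first.
operands = [lambda: 1, lambda: 2 / 0]

print(lazy_first(operands))

try:
    eager_first(operands)
except ZeroDivisionError:
    print("eager evaluation raised a division-by-zero error")
```

Both strategies are reasonable implementations, yet only one of them raises the error, which is the kind of leeway described above.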

I think it is a good thing if we allow a similar kind of leeway with streamability. It is a far fuzzier subject than error raising, yet we are trying to be more strict about it. I don't think that is in the interest of end-users, optimizing implementations and future improvements to algorithms or type systems.

By allowing this leeway only for a very limited set of provable expressions and constructs, I think we gain in both usability and testability. We no longer need a test that proves that (A + A) is non-streamable: because we can show that (2 * A) is equivalent and is streamable, we can allow (A + A) to be either streamable or non-streamable, in the same way that we allow an expression to sometimes raise an error and sometimes not.
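The (A + A) case can be sketched as a static rewrite on a tiny expression tree. This is a toy Python model under loud assumptions: the classes Consuming, Literal and BinOp are hypothetical, and "at most one consuming operand" is only a simplified stand-in for the real streamability rules in the spec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Consuming:
    """Hypothetical operand that reads from the streamed input."""
    name: str


@dataclass(frozen=True)
class Literal:
    value: int


@dataclass(frozen=True)
class BinOp:
    op: str
    left: object
    right: object


def consuming_operands(expr):
    """Count the operands that read from the streamed input."""
    if isinstance(expr, Consuming):
        return 1
    if isinstance(expr, BinOp):
        return consuming_operands(expr.left) + consuming_operands(expr.right)
    return 0


def streamable_by_rules(expr):
    # Simplified assumption: at most one consuming operand is allowed.
    return consuming_operands(expr) <= 1


def rewrite(expr):
    """Rewrite X + X to 2 * X when both operands are structurally identical."""
    if isinstance(expr, BinOp):
        left, right = rewrite(expr.left), rewrite(expr.right)
        if expr.op == "+" and left == right:
            return BinOp("*", Literal(2), left)
        return BinOp(expr.op, left, right)
    return expr


a = Consuming("A")
a_plus_a = BinOp("+", a, a)

print(streamable_by_rules(a_plus_a))           # False
print(streamable_by_rules(rewrite(a_plus_a)))  # True
```

A processor that performs the rewrite sees a streamable expression, and one that does not sees a non-streamable one; under the proposal both outcomes would be acceptable.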

I can understand that there is some resistance to this proposal. In fact, for a long time I was against it myself. However, I think there is room for improvement here without loosening the strictness of guaranteed-streamability too much, if at all. One can wonder: if a + b == b + a, and we allow a + b but disallow b + a, we may have been writing the wrong kind of strictness into the spec. This bug is trying to address that.

Cheers,
Abel 

Received on Thursday, 1 December 2016 18:53:41 UTC