- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Thu, 16 Jun 2016 11:27:04 -0600
- To: Public XSLWG <public-xsl-wg@w3.org>
- Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
[This was sent to the member-only discussion list in error; now sending it again to the public list. Sorry for the snafu.]

> ACTION 2016-06-02-001 (bug 29472) on MSMcQ to put his views on this
> in writing, in email or in the bugzilla entry.

During the XSLT WG call of 2 June I was asked to try to put my thoughts on bug 29472 in writing, partly because I seem to be coming at things from a very different angle. The proposal to add a 'streamable' attribute to xsl:stream, with values 'yes' and 'no', feels to me like the wrong direction, and this is my attempt to explain why. I apologize for its length.

A quick summary:

In general, control over optimization, execution speed, and other properties orthogonal to the meaning of a program or object seems to me to belong outside the object (program, stylesheet), not inside it. Since streaming is an optimization orthogonal to the meaning (I/O mapping) of a stylesheet, the right place for controls that say "yes" or "no" to streaming processing for particular inputs seems to me to be command-line options or invocation-time options in an API, not the stylesheet itself.

This appears to mean I am out of sympathy with the desire at the heart of 29472, to be able to turn streaming processing on or off in implementation-independent ways.

If we do accede to the wish for such controls, I hope we can make them look and feel like the pragmas or processing instructions they are, and not like declarative information about the stylesheet. Conflating the proposition "this construct is in the guaranteed-streamable subset of XSLT" and the request "please process this construct in a streaming way" would blur two bits of information of quite different kinds.

The details:

I. Design issues

I believe the issue raised touches on several basic design principles which have governed our work on streamability; some of these principles have, I believe, been explicitly discussed as guiding principles in the working group, while others may not have been (and might not have commanded agreement if they had been).

1 Streamability is an optimization.

The goal of writing a stylesheet using the guaranteed-streamable subset of XSLT and the goal of declaring things streamable in such a stylesheet is to enable the stylesheet to be evaluated in a way that limits one particular cost: storage consumption. The language of guaranteed-streamable constructs is a subset, not an extension, of XSLT.

2 Optimizations do not affect the meaning of a stylesheet.

Like other optimizations, streamability affects characteristics of the execution process which lie outside the explicitly defined meaning of the stylesheet (which I take to be, formally, a mapping from inputs to outputs).

3 Non-streaming processors are expected to implement all the constructs introduced in XSLT 3.0 for the sake of streaming.

This was explicitly discussed when one expected implementor said he did not plan to implement some construct or other (accumulators? forks?) because his would be a non-streaming processor. My recollection is that we explicitly agreed that all conforming processors were required to handle all constructs.

4 Streaming and non-streaming processors should never produce different results for a given stylesheet on a given input, except in areas documented as implementation-defined or -dependent.

N.B. I am using "result" here to mean "output produced by a stylesheet evaluation that runs to completion without errors or exceptions"; we expect non-streaming processors to fail on some inputs which a streaming processor can handle, but if neither processor fails, we expect the same results.

My apologies if the foregoing is just a restatement of the obvious.

II. What am I looking for? Where am I coming from?

Now to discharge my action by describing what I think makes sense for mechanisms to control streaming.

a) First, I should say explicitly that I share the expectation that in developing a complex stylesheet it will be very helpful to be able to turn streaming off. Just as people check the behavior of stylesheets (especially but not only buggy stylesheets) by running them with different processors, I would like to be able to check the behavior of a streaming stylesheet I'm writing by running it with a non-streaming processor. This is one reason I value points 3 and 4 above. If a processor I'm using comes with both a streaming mode of operation and a non-streaming mode, I expect I will want to test problematic code in both modes.

b) Since it seems it may be a minority view, I should also say explicitly that I expect my foreseeable user requirements in this area to be met with a global streaming vs non-streaming switch. That is, I don't foresee an urgent requirement on my part to turn off streaming on one particular stream while keeping streaming turned on for all other streams. I may be wrong.

c) In other contexts (e.g. C compilation, Java execution, image compression, database queries), I am accustomed to controlling the aggressiveness of optimization and various aspects of resource usage with command-line (invocation-time) switches. My mental model of the 'right' way to control optimization, resource usage, and other things which are orthogonal to the correctness of a process is shaped by examples like these:

- the -O0, -O1, -O2, and similar flags of gcc;
- the Java options for controlling memory and garbage collection (-Xmx, -Xms, etc., etc.);
- the run-time options on gzip and other compression software which control the compression method used and the tradeoff between compression time and compressed size;
- the index construction statements of SQL.

All of these affect things like speed (of the compiler, of the resulting executable, of the compression processor, of INSERT and UPDATE statements, of SELECT statements) or size (of executable, of virtual machine, of compressed output, of database on disk), but none of them change the meaning of the C program, Java program, image or other file being compressed, or SQL INSERT / UPDATE / SELECT statement.

There are plenty of examples in computing history of cases where the meaning of a formal language is not separated in this way from properties like memory usage and speed: C itself is one, in many areas; database management systems which provide different syntaxes for search depending on whether a given field is indexed are another. There is at least one XQuery engine in which a given expression will have two very different meanings (the apparent meaning, or the empty sequence) depending on whether one has built a Lucene index of the database. My mental model says that those are mostly good examples of why I want XSLT to have a clean separation of meaning from properties like size and speed.

d) By analogy, my instinct is to say that the right way to handle a switch to control whether streaming is undertaken or not is with an invocation-time switch, on the command-line or as an option passed to an API. Or, more generally, something wholly outside the stylesheet itself.

This may mean that I disagree with the premise of bug 29472, for which the initial description begins:

    ... we assessed that it was very desirable to have the possibility to switch OFF streaming for xsl:stream. The current means to do so are cumbersome to do implementation-independent way, or are at API level.

I expect command-line options to be implementation-dependent or to be at the API level. I no more expect an implementation-independent way to control streaming than I expect an implementation-independent way to ask a C compiler for aggressive optimization or for none at all.

In some styles of command-line options, options like --stream vs --nostream or --streaming=yes|no would be natural; in others, they would look different.

If I were specifying command-line options for a processor that wanted to allow individual xsl:stream instructions to be processed with different values of streaming-mode (yes vs no), my first sketch would be:

  (a) assign an xml:id to every stream you wish to control in this way;
  (b) use the --streaming=ID or --nostreaming=ID options (repeatable) to turn streaming mode on or off on individual xsl:stream instructions.

I mention this not because I think any implementors have anything to learn from me in this regard but because it may help other WG members understand where I am coming from (i.e. just how benighted I may be, from their point of view).

e) In general, if streaming is orthogonal to the meaning of the stylesheet (point 2 above), it seems to follow that nothing in the stylesheet itself can affect the choice of streaming mode or non-streaming mode.

f) If point 2 is accepted not just as describing a state of affairs but as enunciating a design principle, then it seems to follow that nothing in the stylesheet should be *allowed* to affect a processor's choice of streaming mode or non-streaming mode. This explains why the proposal in comment 3 troubles me.

g) There is, however, a counter-argument. It is not a universal truth that no program in a well-designed language ever contains anything that affects optimization or other non-semantic properties of the program. There is a long history of using pragmas, sometimes in the form of magic comments, to control the behavior of compilers (including sometimes controlling the level of optimization to be attempted). And our sister language XQuery has a well-developed system of pragmas and function annotations which appears to work well for its intended purposes.

In general, inserting pragmas to control streaming of individual xsl:stream constructs seems to me a poor choice for things one wants to change from one run to the next. It will remind some people of the barbed remark in Kernighan and Pike (or Kernighan and Plauger?) about passing run-time parameters to the program by defining them as constants in the program and passing them to the program by using the compiler as an intermediary.

But if we do want to make it possible to control streaming processing from inside the stylesheet, then making something that looks and feels like a pragma, and not like declarative information about the construct, would feel to me like a better design. XSLT doesn't have a lot of things that feel like pragmas, but XML does provide processing instructions for precisely this purpose. A PI immediately before (as immediately preceding sibling of) an xsl:stream instruction, or as its first non-whitespace child, would feel better to me than streamable=yes|no with a meaning unlike that of any other attribute named 'streamable' in the spec.

III. Some other points

Some of the specific questions raised in the bug and subsequent discussion should probably be addressed.

A. In comment 1, MK asks

  * What should a streaming/non-streaming processor do?

I think: a non-streaming processor will handle all constructs in its non-streaming way. A streaming-only processor, if it could exist, would handle all constructs in its streaming way. (I am imagining a streaming processor with no alternative non-streaming implementation of things like xsl:stream -- but such a processor cannot exist, because a conforming stylesheet can contain code which is not guaranteed streamable, applied to a stream.) A streaming processor will by default stream everything, and may allow the user control over whether to stream everything, nothing, or selected bits of the stylesheet.

  * What should they do if the code is/is-not guaranteed-streamable?

I believe that whether the code is guaranteed streamable or not, all conforming processors must process it according to its semantics, provided that they have the resources to do so. As a user, of course, I hope that a good streaming processor with an aggressive optimizer will be able to stream even things that are not guaranteed streamable.

  * What should they do if the code is streamable but not guaranteed streamable?

If I ask for streaming, I hope they stream it. If I ask for non-streaming processing, I hope they don't. Since streaming processing is not tightly defined, non-streaming processing cannot be tightly defined either; I do not expect to be able to do more than hope, one way or the other.

I do not expect to be able to argue that a processor is non-conforming because of the way its --stream=YES|NO option behaves. I do expect to be able to argue that there is a bug if that option causes different results in transforms that run to completion without error (modulo implementation-defined or -dependent differences).

B. Comments 2, 3, 5, 6 entertain variations on a proposal to add an xsl:stream/@streamable attribute with 'yes' and 'no' (and possibly other) values.

On other elements, attributes named 'streamable' indicate that the construct declared is streamable (either guaranteed streamable or streamable in fact). Since the use of xsl:stream already has this meaning, making it mean "please stream" doesn't lose information, but it does seem to make the design less consistent.

Making it carry the meaning "please stream" in all cases would, I think, be a mistake: "this is streamable" and "please stream" are very different sentences, with very different meanings. The former has a clear declarative meaning; the latter is an imperative which would feel out of place in a declarative language like XSLT.

C. In an attachment [1] to message 5 of the June archive, ABr summarizes what the spec says about streaming or non-streaming processing for various situations.
I think the core content, for me, now (ignoring many details of importance to other people and to me at other times), is

- a streaming processor evaluating guaranteed-streamable code is expected to stream
- any processor facing code that's not guaranteed streamable may stream if it can
- a non-streaming processor should attempt to process everything

Given these principles, the simplest way to turn streaming off seems to me to be "tell the processor to be a non-streaming processor". Every streaming processor has the code necessary to be a non-streaming processor, I think, because it may encounter constructs it does not know how to stream but which it must (or wishes to?) attempt nevertheless.

So I continue to think that a blanket --stream=YES|NO is the simplest solution to turning streaming off or on.

[1] http://lists.w3.org/Archives/Public/public-xsl-wg/2016Jun/att-0005/Streamability_guarantees_and_invocation_rules_-_html.htm#

ABr says it's not completely clear whether a (streaming) processor can complain if it sees code declared streamable that's not guaranteed streamable. I think checking for guaranteed streamability is a service any processor can offer, but since conforming stylesheets are not required to limit themselves to guaranteed-streamable constructs in code declared streamable, I think any processor must (or should) attempt to evaluate the code, unless the user has specified a "die when you encounter non-guaranteed-streamable code" option. (Again -- an invocation option, not a declaration in the stylesheet.)

Again my apologies for the length of this mail.

-- 
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com
* http://cmsmcq.com/mib
* http://balisage.net
****************************************************************
Received on Thursday, 16 June 2016 17:27:31 UTC