[Bug 29889] [xslt30] Add clarifications on stylesheet invocation options

https://www.w3.org/Bugs/Public/show_bug.cgi?id=29889

--- Comment #1 from Michael Kay <mike@saxonica.com> ---
<div2 id="streaming-non-xml" diff="add" at="T-bug29889"> <head>Streaming of
non-XML data</head> 

<p>The facilities in this specification designed to enable large data sets to
be processed in a streaming manner are oriented almost entirely to XML data.
This does not mean that there is never a requirement to stream non-XML data, or
that the Working Group has ignored this requirement; rather, the Working Group
has concluded that for the most part, streaming of non-XML data can be achieved
by implementations without the need for specific language features in XSLT.</p> 

<p>To make streamed processing of unparsed text files easier, the function
<xfunction>unparsed-text-lines</xfunction> has been introduced. This is not
only more convenient for stylesheet authors than reading the entire input using
the <xfunction>unparsed-text</xfunction> function and then tokenizing the
result; it is also easier for implementations to optimize, allowing each line
of text to be discarded from memory after it has been processed.</p> 
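
<p>As an illustrative sketch only (the parameter name <code>log-uri</code>, the
file name <code>log.txt</code>, and the result element names are hypothetical),
a stylesheet might process a large text file line by line as follows. Whether
each line is in fact discarded after use depends on the implementation.</p>

<eg xml:space="preserve"><![CDATA[<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical parameter: URI of a large plain-text file -->
  <xsl:param name="log-uri" select="'log.txt'"/>

  <xsl:template name="xsl:initial-template">
    <errors>
      <!-- Each line is needed only while its own iteration is evaluated,
           so a processor is free to read the file incrementally -->
      <xsl:for-each
          select="unparsed-text-lines($log-uri)[starts-with(., 'ERROR')]">
        <error><xsl:value-of select="."/></error>
      </xsl:for-each>
    </errors>
  </xsl:template>

</xsl:stylesheet>]]></eg>

<p>The alternative of calling <xfunction>unparsed-text</xfunction> and passing
the result to <xfunction>tokenize</xfunction> gives the processor a single
string holding the whole file, which is harder to optimize in this way.</p>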

<p>For all functions that access external data, including
<function>document</function>, <xfunction>doc</xfunction>,
<xfunction>collection</xfunction>, <xfunction>unparsed-text</xfunction>,
<xfunction>unparsed-text-lines</xfunction>, and (in XPath 3.1) <xfunction
spec="FO31">json-doc</xfunction>, the requirements on determinism can now be
relaxed using <termref def="dt-implementation-defined"/> configuration options.
This is significant because, when a transformation reads the same external
resource more than once, it becomes legitimate for the contents of the resource
to differ from one read to the next, and this eliminates the need for the
processor to cache the contents of the resource in memory.</p> 

<p>In the XDM data model, every value is a sequence, and, as in most
functional programming languages, processing of sequences of items is
pervasive throughout the XSLT and XPath languages and their function library.
Good performance of a functional programming language often depends on
sequence-based operations being pipelined, and being evaluated in a lazy
fashion (that is, many operations process items in a sequence one at a time, in
order; and many operations can deliver a result without processing the entire
sequence). The semantics of XSLT and XPath permit pipelined and lazy evaluation
(for example, the error handling semantics are carefully written to ensure
this), but they do not require it: the details are left to implementations.
Pipelined processing of a sequence is not the same thing as streamed processing
of a tree, and where the XSLT specification talks of operations being
"guaranteed streamable", this is always referring to processing of trees, not
of sequences.</p> 
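
<p>As a hedged illustration of an operation that can deliver a result without
processing the entire sequence (the file name <code>log.txt</code> and the
element name <code>first-error</code> are hypothetical), consider the following
sketch. A processor that pipelines the evaluation may stop reading the file at
the first matching line; a processor that does not will read all of it. Both
behaviors are conformant.</p>

<eg xml:space="preserve"><![CDATA[<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template name="xsl:initial-template">
    <first-error>
      <!-- head() needs only the first item of the filtered sequence;
           the lines after the first match need not be examined -->
      <xsl:value-of
          select="head(unparsed-text-lines('log.txt')[starts-with(., 'ERROR')])"/>
    </first-error>
  </xsl:template>

</xsl:stylesheet>]]></eg>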

<p>The facilities for streaming of XML trees include operations such as
<xfunction>copy-of</xfunction> and <xfunction>snapshot</xfunction> which are
able to take a sequence of streamed nodes as input and produce a sequence of
in-memory (unstreamed) nodes as output. It is also possible to generate a
sequence of strings or other atomic values through the process of atomization.
The actual memory usage of a streamed XSLT application may depend significantly
on whether the processing of the resulting sequence of in-memory nodes or
atomic values is pipelined or not. The specification, however, has nothing to
say on this matter: it is considered an area where implementors can exercise
their discretion and ingenuity.</p> 
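
<p>The following sketch illustrates the point under stated assumptions: the
file name <code>transactions.xml</code> and all element names are hypothetical,
and the instruction name <code>xsl:source-document</code> is the one used in
the final XSLT 3.0 Recommendation (earlier drafts called it
<code>xsl:stream</code>). Each <code>transaction</code> element is copied into
memory in turn; whether only one copy is held at a time depends on whether the
processor pipelines the resulting sequence.</p>

<eg xml:space="preserve"><![CDATA[<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template name="xsl:initial-template">
    <high-value>
      <xsl:source-document streamable="yes" href="transactions.xml">
        <!-- copy-of(.) turns each streamed node into an in-memory tree -->
        <xsl:for-each select="transactions/transaction/copy-of(.)">
          <!-- The copy is an ordinary tree, so it can be navigated freely -->
          <xsl:if test="amount > 1000">
            <xsl:copy-of select="."/>
          </xsl:if>
        </xsl:for-each>
      </xsl:source-document>
    </high-value>
  </xsl:template>

</xsl:stylesheet>]]></eg>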

<p>Streaming of JSON input receives little attention in this specification. One
can envisage an implementation of the <function>json-to-xml</function> function
in which the XML delivered by the function consists of streamed nodes; but the
Working Group has not researched the feasibility of such an implementation in
any detail.</p> 

</div2>

Received on Friday, 30 September 2016 11:55:22 UTC