- From: <bugzilla@jessica.w3.org>
- Date: Fri, 30 Sep 2016 11:54:59 +0000
- To: public-qt-comments@w3.org
https://www.w3.org/Bugs/Public/show_bug.cgi?id=29889

--- Comment #1 from Michael Kay <mike@saxonica.com> ---
<div2 id="streaming-non-xml" diff="add" at="T-bug29889">
<head>Streaming of non-XML data</head>
<p>The facilities in this specification designed to enable large data sets to be processed in a streaming manner are oriented almost entirely to XML data. This does not mean that there is never a requirement to stream non-XML data, or that the Working Group has ignored this requirement; rather, the Working Group has concluded that for the most part, streaming of non-XML data can be achieved by implementations without the need for specific language features in XSLT.</p>
<p>To make streamed processing of unparsed text files easier, the function <xfunction>unparsed-text-lines</xfunction> has been introduced. This is not only more convenient for stylesheet authors than reading the entire input using the <xfunction>unparsed-text</xfunction> function and then tokenizing the result; it is also easier for implementations to optimize, allowing each line of text to be discarded from memory after it has been processed.</p>
<p>For all functions that access external data, including <function>document</function>, <xfunction>doc</xfunction>, <xfunction>collection</xfunction>, <xfunction>unparsed-text</xfunction>, <xfunction>unparsed-text-lines</xfunction>, and (in XPath 3.1) <xfunction spec="FO31">json-doc</xfunction>, the requirements on determinism can now be relaxed using <termref def="dt-implementation-defined"/> configuration options.
This is significant because it means that when a transformation reads the same external resource more than once, the contents of the resource may legitimately differ between invocations, which eliminates the need for the processor to cache the contents of the resource in memory.</p>
<p>In the XDM data model, every value is a sequence, and, as in most functional programming languages, processing of sequences of items is pervasive throughout the XSLT and XPath languages and their function library. Good performance of a functional programming language often depends on sequence-based operations being pipelined and evaluated lazily (that is, many operations process the items in a sequence one at a time, in order, and many operations can deliver a result without processing the entire sequence). The semantics of XSLT and XPath permit pipelined and lazy evaluation (for example, the error handling semantics are carefully written to ensure this), but they do not require it: the details are left to implementations. Pipelined processing of a sequence is not the same thing as streamed processing of a tree, and where the XSLT specification talks of operations being "guaranteed streamable", this always refers to the processing of trees, not of sequences.</p>
<p>The facilities for streaming of XML trees include operations such as <xfunction>copy-of</xfunction> and <xfunction>snapshot</xfunction>, which are able to take a sequence of streamed nodes as input and produce a sequence of in-memory (unstreamed) nodes as output. It is also possible to generate a sequence of strings or other atomic values through the process of atomization. The actual memory usage of a streamed XSLT application may depend significantly on whether the processing of the resulting sequence of in-memory nodes or atomic values is pipelined or not.
The specification, however, has nothing to say on this matter: it is considered an area where implementors can exercise their discretion and ingenuity.</p>
<p>Streaming of JSON input receives little attention in this specification. One can envisage an implementation of the <function>json-to-xml</function> function in which the XML delivered by the function consists of streamed nodes; but the Working Group has not researched the feasibility of such an implementation in any detail.</p>
</div2>

--
You are receiving this mail because:
You are the QA Contact for the bug.
Received on Friday, 30 September 2016 11:55:22 UTC