[Bug 29206] New: [xslt30] Streamed validation from bugzilla@jessica.w3.org on 2015-10-16 (public-qt-comments@w3.org from October 2015)

From: <bugzilla@jessica.w3.org>
Date: Fri, 16 Oct 2015 09:41:26 +0000
To: public-qt-comments@w3.org
Message-ID: <bug-29206-523@http.www.w3.org/Bugs/Public/>
https://www.w3.org/Bugs/Public/show_bug.cgi?id=29206

            Bug ID: 29206
           Summary: [xslt30] Streamed validation
           Product: XPath / XQuery / XSLT
           Version: Last Call drafts
          Hardware: PC
                OS: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: XSLT 3.0
          Assignee: mike@saxonica.com
          Reporter: mike@saxonica.com
        QA Contact: public-qt-comments@w3.org
  Target Milestone: ---

It has been pointed out that we ought to say something about streamed schema
validation. I proposed some text to add to section 2.10 at
https://lists.w3.org/Archives/Public/public-xsl-wg/2015Oct/0011.html; Michael
Sperberg-McQueen commented on this at
https://lists.w3.org/Archives/Public/public-xsl-wg/2015Oct/0012.html. Taking
these comments into account, I propose to add the following text:


Streaming can be combined with schema-aware processing: that is, the streamed
input to a transformation can be subjected to on-the-fly validation, a process
that typically accepts an input stream from the XML parser and delivers an
output stream of type-annotated nodes to the transformation processor. The
XSD specification is designed so that validation is, with one or two
exceptions, a streamable process. The exceptions include:

* Memory may need to be allocated to hold key values in order to enforce
uniqueness and referential integrity constraints (xs:unique, xs:key,
xs:keyref); the amount required grows with the number of key values in scope.

* In XSD 1.1, assertions can be defined by means of XPath expressions. These
are not constrained to be streamable; in the general case, any subtree of the
document that is validated using an assertion may need to be buffered in memory
while the assertion is processed.

Applications that need to run in bounded memory may therefore need to avoid
these XSD features, or to use them with care.
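The memory cost of the first exception can be illustrated with a toy model.
The following Python sketch is not taken from any XSD processor: the function
name check_key_and_keyref and the ("key", value) / ("ref", value) item format
are hypothetical, and only a single constraint scope is modelled. It consumes
values one at a time and never holds the document itself, yet it must retain
every distinct key value (plus any reference that arrives before its key)
until the end of the scope, so its memory grows with the number of keys.

```python
from typing import Iterable, Set, Tuple


class IdentityConstraintError(Exception):
    """Raised when the toy key/keyref check fails."""


def check_key_and_keyref(items: Iterable[Tuple[str, str]]) -> int:
    """Check ("key", v) / ("ref", v) items, given in document order.

    Hypothetical model of one xs:key plus one xs:keyref within a single scope:
    key values must be unique, and every reference must match some key, which
    may appear later in the stream. Returns how many key values had to be
    retained by the end of the scope.
    """
    keys: Set[str] = set()        # every distinct key value seen so far
    unresolved: Set[str] = set()  # references seen before their key
    for kind, value in items:
        if kind == "key":
            if value in keys:
                raise IdentityConstraintError(f"duplicate key {value!r}")
            keys.add(value)
            unresolved.discard(value)
        elif kind == "ref":
            if value not in keys:
                unresolved.add(value)
        else:
            raise ValueError(f"unknown item kind {kind!r}")
    if unresolved:
        raise IdentityConstraintError(f"unresolved references: {sorted(unresolved)}")
    return len(keys)


if __name__ == "__main__":
    # The reference to c1 precedes its key, so it must be remembered until resolved.
    print(check_key_and_keyref([("ref", "c1"), ("key", "c1"), ("key", "c2")]))  # 2
```

The set-based approach is the simplest correct one; a real processor may use
more compact structures, but it cannot in general avoid retaining the values.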

XSD is designed so that the intended type of an element (the "governing type")
can be determined as soon as the start tag of the element is encountered: the
process of validation checks whether the content of the element actually
conforms to this type, and by the time the end tag is encountered, the process
will have established either that the element is valid against the governing
type, or that it is invalid. 

By default, dynamic errors occurring during streamed processing are fatal: they
typically cause the transformation to fail immediately. XSLT 3.0 introduces the
ability to catch dynamic errors and recover from them (xsl:try and xsl:catch).
Schema invalidity, however, is treated as a dynamic error occurring in the
instruction that processes the entire input stream, so after a validation
failure, no further processing of that input stream is possible.

In consequence, a streamed validator that is running in tandem with a streamed
transformation can present the transformer with element nodes that carry a
provisional type annotation representing the type that the element will have if
it turns out to be valid. As soon as the validator encounters a node that
violates this assumption, it should stop the flow of data to the transformer, so
that the transformer never sees invalid data. This allows the stylesheet code
to be compiled on the assumption of type safety: at run time, all nodes seen
by the transformation will conform to the types declared in the stylesheet (for
example, a type declared implicitly using <code>match="schema-element(invoice)"</code>
on an xsl:template element).
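The validator-to-transformer pipeline described above can be sketched in
Python. This is a minimal toy under stated assumptions, not a real XSD
validator: the SCHEMA table, the ("start" | "end", name) event tuples, and the
functions validate and count_lines are all hypothetical, and the only rule
checked is which child elements are allowed. It shows two properties: the
governing type is looked up from the start tag alone and attached as a
provisional annotation, and the validator raises an error before passing on
the first offending node, so the consumer only ever sees a valid prefix.

```python
from typing import Dict, Iterable, Iterator, List, Set, Tuple

# Hypothetical toy schema: element name -> (governing type, allowed child names).
SCHEMA: Dict[str, Tuple[str, Set[str]]] = {
    "invoice": ("InvoiceType", {"line"}),
    "line": ("LineType", set()),
}
ALLOWED_ROOTS: Set[str] = {"invoice"}

Event = Tuple[str, str]           # ("start" or "end", element name)
Annotated = Tuple[str, str, str]  # (kind, element name, provisional type)


class ValidationError(Exception):
    """Raised by the toy validator at the first invalid node."""


def validate(events: Iterable[Event]) -> Iterator[Annotated]:
    """Pass events through, adding the governing type known at each start tag."""
    stack: List[str] = []
    for kind, name in events:
        if kind == "start":
            allowed = SCHEMA[stack[-1]][1] if stack else ALLOWED_ROOTS
            if name not in allowed:
                # Raise before yielding, so the consumer never sees this node.
                raise ValidationError(f"element {name!r} not allowed here")
            stack.append(name)
            # Provisional: the annotation holds only if the element proves valid.
            yield ("start", name, SCHEMA[name][0])
        else:
            # A real validator would also check required content at this point.
            stack.pop()
            yield ("end", name, SCHEMA[name][0])


def count_lines(annotated: Iterable[Annotated], seen: List[str]) -> int:
    """Toy 'transformer': counts LineType elements, trusting the annotations."""
    count = 0
    for kind, name, type_name in annotated:
        if kind == "start":
            seen.append(name)
            if type_name == "LineType":
                count += 1
    return count


if __name__ == "__main__":
    bad = [("start", "invoice"), ("start", "line"), ("start", "note")]
    seen: List[str] = []
    try:
        count_lines(validate(bad), seen)
    except ValidationError as err:
        print(err)   # element 'note' not allowed here
    print(seen)      # ['invoice', 'line']
```

Because the error is raised inside the generator, count_lines never has to
re-check the annotations it receives, which is the analogue of compiling the
stylesheet on the assumption of type safety.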

A streamed transformation that accesses only part of the input document (for
example, a header at the start of the document) is not required to read the
rest of the document once it has read the data it needs. This means that XML
well-formedness or validity errors occurring in the unread part of the input
stream may go undetected.
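The effect of stopping early can be shown with the incremental parser in
Python's standard library, xml.etree.ElementTree.XMLPullParser, which is fed
the input in chunks. The function read_header, the chunk size and the sample
document are illustrative assumptions. The function returns as soon as the
header element has ended, so the mismatched tag later in the document is never
read, whereas parsing the whole document with ET.fromstring reports the error.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Optional


def read_header(stream: BinaryIO, chunk_size: int = 16) -> Optional[str]:
    """Return the text of the first <header> element, reading no more than needed.

    Assumes (hypothetically) that the header precedes the rest of the document.
    """
    parser = ET.XMLPullParser(events=("end",))
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parser.feed(chunk)
        for _event, elem in parser.read_events():
            if elem.tag == "header":
                # Stop here: the remainder of the stream is never read or parsed.
                # The parser is abandoned; calling close() would raise, because
                # it has seen only an incomplete document.
                return elem.text
    parser.close()  # no header found: raises ET.ParseError if malformed
    return None


if __name__ == "__main__":
    doc = b"<doc><header>Q3 report</header><body><unclosed></body></doc>"
    stream = io.BytesIO(doc)
    print(read_header(stream))        # Q3 report
    print(stream.tell() < len(doc))   # True: the tail of the input was never read
    try:
        ET.fromstring(doc)
    except ET.ParseError as err:
        print("full parse fails:", err)
```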

Received on Friday, 16 October 2015 09:41:30 UTC