A brief writeup of my validate-twice idea

Further to the discussion which sparked this idea, at Mandelieu [1],
here's the abstract of my writeup of this as a late-breaking talk at
XML Europe 2004 next week:

Versioning made easy with W3C XML Schema and Pipelines


There is a great deal of interest at the moment in managing evolution
and versioning of XML document types, for Web Services and other
application areas.  Eduardo Gutentag and Arofan Gregory, in their work
on versioning for UBL, have developed a very powerful methodology for
managing evolution using W3C XML Schema [1].

David Orchard, in his draft TAG finding on versioning and
extensibility [2], has concentrated on usage scenarios for which the
UBL approach is inappropriate, because the UBL approach requires all
document consumers to use up-to-date schemas.  That is, the UBL
approach describes how an application designed to handle version 1
documents can handle version 2 documents, _but_ it requires the
application to use the version 2 schema to validate the version 2
documents.  Orchard's scenarios, on the other hand, assume that a version
1 application either cannot or will not use anything other than a version
1 schema.  He also would prefer a 'passive' approach to versioning,
that is, one in which the version 1 schema does not contain any
explicit provision for extensibility such as wildcards.  What he would
like is an approach which allows version 2 documents which differ
from version 1 documents only in that they contain _additional_
content (perhaps only at the end of content models) to be successfully
processed by version 1 applications nonetheless.  This kind of
scenario does indeed seem to be one likely to occur often as Web
Services are deployed and begin to evolve.

Taken together, Orchard's two requirements seem to render the problem
unsolvable without requiring special-purpose processing of the outcome
of validation -- processing which would have to interrogate the PSVI
(Post Schema-Validation Infoset) from version 2 documents (that is, in
practice, any document which failed validation with a version 1
schema) in detail to detect whether the failure was a real version 1
failure, or whether there was simply extraneous material which could
safely be ignored.  Such special processing would have to recapitulate
virtually all of schema content model validation, which seems a
particularly wasteful duplication of effort.

In this paper I present a solution to this problem which requires no
special processing, and demonstrate an implementation using Markup
Technology's implementation of the Sun XML Pipeline language [3].
This approach consists of a validation step, a step which strips out
all elements whose declarations were not found during schema
validation, and a further validation step.  Because the pipeline is
compiled and run as a whole by the pipeline engine, the double
validation is very efficient.
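
As a rough illustration of the three steps just described, here is a
minimal Python sketch.  It is not the pipeline implementation referred
to above and it does not use W3C XML Schema: the "schema" is a
hypothetical toy dictionary (V1_SCHEMA) in which every element is
declared globally and maps to the exact sequence of child names it
requires, and the function names (validate, strip_undeclared,
validate_twice) are invented for the example.  Namespaces, attributes
and text content are ignored.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy "version 1 schema" (an assumption for illustration):
# every element is declared globally, and each name maps to the exact
# sequence of child element names its content model requires.
V1_SCHEMA = {"order": ["id", "item"], "id": [], "item": []}


def validate(root, schema):
    """Return (valid, undeclared), where undeclared lists the elements
    for which no declaration was found."""
    undeclared = []
    valid = True

    def visit(elem):
        nonlocal valid
        if elem.tag not in schema:
            undeclared.append(elem)   # no declaration found
            valid = False
            return                    # its content cannot be assessed
        if [child.tag for child in elem] != schema[elem.tag]:
            valid = False             # content model violated
        for child in elem:
            visit(child)

    visit(root)
    return valid, undeclared


def strip_undeclared(root, undeclared):
    """Remove, in place, the elements flagged by the first validation."""
    flagged = {id(elem) for elem in undeclared}
    for parent in list(root.iter()):
        for child in list(parent):
            if id(child) in flagged:
                parent.remove(child)


def validate_twice(xml_text, schema):
    """Validate, strip elements with no declaration, validate again."""
    root = ET.fromstring(xml_text)
    _, undeclared = validate(root, schema)   # first validation step
    strip_undeclared(root, undeclared)       # intermediate surgery
    valid, _ = validate(root, schema)        # second validation step
    return valid


V1_DOC = "<order><id>1</id><item>x</item></order>"
V2_DOC = "<order><id>1</id><item>x</item><note>new</note></order>"
BAD_DOC = "<order><item>x</item></order>"

if __name__ == "__main__":
    for doc in (V1_DOC, V2_DOC, BAD_DOC):
        print(validate_twice(doc, V1_SCHEMA), doc)
```

In this toy, the version 2 document with the extra trailing note
element passes once that element has been stripped, while the document
missing its id element still fails the second validation, i.e. it is a
real version 1 failure.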

Not all schemas are suitable for use in this way -- I discuss the
design recommendations schema authors should follow to ensure this
will work properly, namely avoiding local element declarations, and
adding material only at the end (these trade off to a certain extent,
in fact).

Finally, I discuss the relevance of this approach to possible changes
to the interpretation of local element declarations in version 1.1 of
the W3C XML Schema specification.  Changing the interpretation of
local element declarations so that they are treated more as
declarations at the level of their containing type definition, which
can (indeed must) then be referenced in the same way global
declarations are referenced, would make the
validate-twice-with-intermediate-surgery approach cover a much wider
range of schemas.

[1] http://www.idealliance.org/papers/dx_xml03/papers/04-04-04/04-04-04.html
[2] http://www.w3.org/2001/tag/doc/versioning
[3] http://www.markup.co.uk/XML2003.html

[1] http://lists.w3.org/Archives/Public/www-ws-desc/2004Mar/0038.html
 Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
                     Half-time member of W3C Team
    2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
            Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk
                   URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]

Received on Friday, 9 April 2004 10:01:45 UTC