[Bug 5293] Subsumption from bugzilla@wiggum.w3.org on 2007-11-29 (www-xml-schema-comments@w3.org from October to December 2007)

From: <bugzilla@wiggum.w3.org>
Date: Thu, 29 Nov 2007 04:30:41 +0000
To: www-xml-schema-comments@w3.org
CC:
Message-Id: <E1Ixb2v-0000mK-CE@wiggum.w3.org>
http://www.w3.org/Bugs/Public/show_bug.cgi?id=5293





------- Comment #1 from cmsmcq@w3.org  2007-11-29 04:30 -------
I think the spec would be much simpler if the semantics of restriction
were simply that if type R restricts type B, then any element governed
by R must be valid against B, as well as satisfying whatever constraints
are expressed by the declaration of R.  Call this 'restriction as
intersection'.  (If a processor can prove that anything satisfying the 
content model of R must necessarily also satisfy the content model of B, 
or can generate a content model for internal use that captures just the
legal sequences of children, then that may be a useful optimization at 
validation time.)  Alas, the WG decided instead to require that R be
provably subsumed by B, and to make it simple to check by imposing a few
simple rules on the structure of R's content model.  Somewhere along the
way, the rules grew out of the description 'simple' and became the 
thicket of conditions and case statements and exceptions and special
cases familiar to readers of 1.0.

If I understand you correctly, you are proposing something similar to
restriction-as-intersection, in the sense that a processor can validate 
by checking against both B and R, instead of attempting to prove that B 
subsumes R, but visibly different from it, in that it's still an error 
for R not to be subsumed by B -- it just happens to be an error
which processors are not obliged to notice until confronted at validation
time by an instance which illustrates the discrepancy).  

To take a simple example, if B's content model is (a,b?) and R's content 
model is (a,c?), then restriction-as-intersection would call R legal and 
the effective content model of R would be (a).  Your proposal, if I 
understand it correctly, would say that in principle, R is not legal, 
although a conforming validator is not obliged to notice the fact unless 
the instance contains an element governed by R, with children "a c".
Have I got that right?

My heart still yearns for the simplicity of restriction-as-intersection,
but I for one think I'd be happy to accept your proposal as a
substitute, unless the WG sees the light and wants to move to r-as-i.

I note in passing that if we define failure of R to be subsumed by B as
an error, but an error which conforming processors are not obligated to
detect, then our story about errors and error detection has changed from
the story we had converged on in Seattle.  There, the minutes say we had
consensus on the proposition that conforming processors should be obligated
to detect and report errors.  The proposal here seems to entail a partition
of errors into two classes:  those a processor is required to detect and
report, and those which a processor is allowed to fail to detect.  This
distinction has always made sense to me (I learned it from ISO 8879),
but we should not fall into it without seeing that it's a change from both of
our earlier positions (namely [1] that on error all bets are off and 
behavior is unconstrained, and [2] that conforming processors are obligated
to detect and report all errors in schemas and schema documents and may
recover from them, albeit not silently).

Still, all this said, I have to say that I don't think subsumption checking
is actually as bleeding-edge as the bug description suggests.  In cases with
large exponents, it may involve the construction of automata with large 
numbers of states (or the calculation of derivatives with large numbers of
terms) and require a lot of tedious calculation.  The algorithms cited in 
the spec try to make it unnecessary to unroll all the exponents, or to 
unroll them all the way, and while it's true that the authors in question 
have all published algorithms in which others have found bugs, still 
in the end content models define regular languages, and in the very worst 
case we know that you can get correct results by translating into standard 
representations of regular languages and using textbook algorithms from there.
Received on Thursday, 29 November 2007 04:30:48 UTC