- From: <bugzilla@wiggum.w3.org>
- Date: Thu, 29 Nov 2007 04:30:41 +0000
- To: www-xml-schema-comments@w3.org
- CC:
http://www.w3.org/Bugs/Public/show_bug.cgi?id=5293 ------- Comment #1 from cmsmcq@w3.org 2007-11-29 04:30 ------- I think the spec would be much simpler if the semantics of restriction were simply that if type R restricts type B, then any element governed by R must be valid against B, as well as satisfying whatever constraints are expressed by the declaration of R. Call this 'restriction as intersection'. (If a processor can prove that anything satisfying the content model of R must necessarily also satisfy the content model of B, or can generate a content model for internal use that captures just the legal sequences of children, then that may be a useful optimization at validation time.) Alas, the WG decided instead to require that R be provably subsumed by B, and to make it simple to check by imposing a few simple rules on the structure of R's content model. Somewhere along the way, the rules grew out of the description 'simple' and became the thicket of conditions and case statements and exceptions and special cases familiar to readers of 1.0. If I understand you correctly, you are proposing something similar to restriction-as-intersection, in the sense that a processor can validate by checking against both B and R, instead of attempting to prove that B subsumes R, but visibly different from it, in that it's still an error for R not to be subsumed by B -- it just happens to be an error which processors are not obliged to notice until confronted at validation time by an instance which illustrates the discrepancy). To take a simple example, if B's content model is (a,b?) and R's content model is (a,c?), then restriction-as-intersection would call R legal and the effective content model of R would be (a). Your proposal, if I understand it correctly, would say that in principle, R is not legal, although a conforming validator is not obliged to notice the fact unless the instance contains an element governed by R, with children "a c". Have I got that right? My heart still yearns for the simplicity of restriction-as-intersection, but I for one think I'd be happy to accept your proposal as a substitute, unless the WG sees the light and wants to move to r-as-i. I note in passing that if we define failure of R to be subsumed by B as an error, but an error which conforming processors are not obligated to detect, then our story about errors and error detection has changed from the story we had converged on in Seattle. There, the minutes say we had consensus on the proposition that conforming processors should be obligated to detect and report errors. The proposal here seems to entail a partition of errors into two classes: those a processor is required to detect and report, and those which a processor is allowed to fail to detect. This distinction has always made sense to me (I learned it from ISO 8879), but we should not fall into it without seeing that it's a change from both of our earlier positions (namely [1] that on error all bets are off and behavior is unconstrained, and [2] that conforming processors are obligated to detect and report all errors in schemas and schema documents and may recover from them, albeit not silently). Still, all this said, I have to say that I don't think subsumption checking is actually as bleeding-edge as the bug description suggests. In cases with large exponents, it may involve the construction of automata with large numbers of states (or the calculation of derivatives with large numbers of terms) and require a lot of tedious calculation. The algorithms cited in the spec try to make it unnecessary to unroll all the exponents, or to unroll them all the way, and while it's true that the authors in question have all published algorithms in which others have found bugs, still in the end content models define regular languages, and in the very worst case we know that you can get correct results by translating into standard representations of regular languages and using textbook algorithms from there.
Received on Thursday, 29 November 2007 04:30:48 UTC