[Bug 5293] Subsumption from bugzilla@wiggum.w3.org on 2007-11-28 (www-xml-schema-comments@w3.org from October to December 2007)

From: <bugzilla@wiggum.w3.org>
Date: Wed, 28 Nov 2007 10:16:09 +0000
To: www-xml-schema-comments@w3.org
CC:
Message-Id: <E1IxJxh-0006Np-N0@wiggum.w3.org>

http://www.w3.org/Bugs/Public/show_bug.cgi?id=5293

           Summary: Subsumption
           Product: XML Schema
           Version: 1.1 only
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Structures: XSD Part 1
        AssignedTo: cmsmcq@w3.org
        ReportedBy: mike@saxonica.com
         QAContact: www-xml-schema-comments@w3.org


XSDL 1.1 defines subsumption by intent: D is a valid restriction of B if all
instances of D are valid instances of B.

On the whole I think this is a good thing; it encourages implementations to
devise the best possible algorithms for determining subsumption, in contrast to
1.0 which required implementations to reproduce an algorithm that was known to
be imperfect.

I wonder, however, whether we have set the bar too high. Consider a case like:

B: a(bc?)|b(ac?)|b(ca)|d|e|fc?

D: a&b

It is fairly easy to see by inspection that B subsumes D. But a glance at the 
references cited in the spec shows that the algorithms for doing this in the
general case are extremely complex. In fact, they seem to be right at the
bleeding edge of computer science; from reading the papers I would not be 100%
convinced that a complete and provably correct algorithm is known.

And even where an algorithm exists, then (a) it may have performance
characteristics that make it impractical, (b) it is extremely difficult to
test, and (c) the more complex cases that it has to deal with will never be
encountered in real life. 

At present a product is non-conformant it if fails to detect that B subsumes D
in the above case (and in similar but much more complex cases). I think this is
too demanding.

I think there is an alternative which is much more practical from an
engineering viewpoint. We should simply say that if the implementation is
unable to determine statically (one way or the other) whether or not B subsumes
D, then it must validate instances at run-time against both. It's then up to
implementations to decide how much effort to invest in static analysis, and
although some implementations may fail to detect some invalid schemas, they
will still agree on the set of valid instances.

Note that this is very similar to the solution we have adopted for handling
conditional type assignment, and it is also (in effect) the way we handle
assertions. So it's not something radically new.

Received on Wednesday, 28 November 2007 10:16:17 UTC