Re: Included <schema> from C. M. Sperberg-McQueen on 2013-02-14 (xmlschema-dev@w3.org from February 2013)

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Thu, 14 Feb 2013 11:41:23 -0700
To: Michael Kay <mike@saxonica.com>
Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, xmlschema-dev@w3.org
Message-Id: <2DEB2E89-9164-4A90-BEE2-296BC37B0BC7@blackmesatech.com>
On Feb 13, 2013, at 6:09 AM, Michael Kay wrote:

> 
> On 13/02/2013 08:10, George Cristian Bina wrote:
>> Thanks Henry,
>> 
>> Regarding the lazy component construction my understanding of that was that during the schema creation when a module is read it may contain references that cannot be resolved at that time but they should still be resolved when all the modules are read.
> The specification says that all references required for validation must be resolvable. It doesn't say which references are required for validation. I think the designers of the language had in mind that the rules should be similar to those for DTDs (or for SGML), where absent references are allowed in content models provided the element in question is not encountered.

For what it's worth, I think that that is true.

Not clear (particularly at this distance in time) is whether everyone in the
WG had the same understanding of just what it would mean concretely for
the rules to be "similar to those for DTDs".

In particular, I suspect there may be different views on what was intended
to happen if we have a declaration of element E with optional children A, B, C,
and D, and if we also have declarations for A, B, and C, but no declaration for 
element D.  I believe some members of the WG would have said that we
can happily validate elements like <E><A/></E> and so on, and the only
difficulty comes if we encounter a D element in the document instance.  I
suspect (but cannot say for certain) that other members of the WG would
say we cannot validate any instances of E, because we don't have a fully
checkable declaration for E.  
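
As a rough illustration of these two readings, here is a minimal Python
sketch.  It is a toy model, not XSD: the names DECLARED, E_CHILDREN,
validate_lenient and validate_strict are invented for this example, and
E's content model is reduced to a list of optional child names.

```python
# Toy model (hypothetical names): E allows optional children A, B, C, D,
# but no declaration for D is available.
DECLARED = {"E", "A", "B", "C"}      # element declarations we actually have
E_CHILDREN = ["A", "B", "C", "D"]    # children referenced by E's content model


def validate_lenient(children):
    """Reading 1: a missing declaration matters only if that element occurs."""
    for name in children:
        if name not in E_CHILDREN:
            return "invalid"         # not allowed by E's content model at all
        if name not in DECLARED:
            return "error: no declaration for " + name
    return "valid"


def validate_strict(children):
    """Reading 2: E is not fully checkable, so no instance of E is validated."""
    missing = [name for name in E_CHILDREN if name not in DECLARED]
    if missing:
        return "error: no declaration for " + ", ".join(missing)
    return validate_lenient(children)


print(validate_lenient(["A"]))   # <E><A/></E>  -> valid
print(validate_lenient(["D"]))   # <E><D/></E>  -> error: no declaration for D
print(validate_strict(["A"]))    # <E><A/></E>  -> error: no declaration for D
```

Under the first reading the instance <E><A/></E> is unaffected by the
missing declaration; under the second, the same instance cannot be
validated at all.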

> But the rules in XSD are much more complicated; for example you can't tell whether "element declarations consistent" is satisfied without following all the references, so if a reference is unresolved, you have to assume this constraint is violated.

This is an excellent example of the kind of complication that comes into
play.  My sense of the original idea was that if a reference is unresolved,
one should assume that all relevant constraints are satisfied, not
violated, since the aim was to make validation produce useful results even
in the presence of transitory network outages.  Others in the WG could
well argue otherwise -- but in that case, the decision to impose constraints
like that, and to expect a pessimistic, rather than an optimistic, behavior in
the face of unresolved references should not have been taken without considering
its impact on the (allegedly) settled design principle that network failures
should not render a document invalid nor validation impossible.  
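
The difference between the optimistic and the pessimistic behavior can be
sketched in a few lines of Python.  This is only an illustration under
simplifying assumptions: the function edc_satisfied and its data layout are
invented here, and the check is a much reduced stand-in for "element
declarations consistent" (same-named elements in one content model must
have the same type).  None stands for a reference that could not be
resolved.

```python
def edc_satisfied(particles, optimistic):
    """Simplified consistency check over (element name, type name) pairs.

    A particle of None models an unresolved reference whose contents
    are unknown (hypothetical representation for this sketch).
    """
    seen = {}
    for particle in particles:
        if particle is None:
            if not optimistic:
                return False     # pessimistic: unknown content might conflict
            continue             # optimistic: assume it would not conflict
        name, type_name = particle
        # Record the first type seen for this name; any later mismatch fails.
        if seen.setdefault(name, type_name) != type_name:
            return False
    return True


model = [("A", "xs:string"), None, ("B", "xs:int")]
print(edc_satisfied(model, optimistic=True))    # True
print(edc_satisfied(model, optimistic=False))   # False
```

A genuine conflict among resolved particles fails under either policy; the
two policies differ only in what they conclude from the unresolved
reference.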

Unfortunately, it was never possible to generate a consistent consensus in
the WG in favor of facing up to such difficulties squarely, so the day was 
carried by those in the WG who simultaneously maintained (a) that we had
satisfied the requirement that missing components not cause problems, 
(b) that missing components must cause an error in appropriate
circumstances, (c) that there is no need to specify more precisely exactly 
what those circumstances are, and (d) that XSD's interoperability record is 
satisfactory.

The result is the sad state of affairs now seen in the XSD 1.0 and 1.1 specs,
whose rules for schema construction are simultaneously underspecified and
contradictory.  

There was a brief period during the development of XSD 1.1 when it seemed it
might be possible to set the treatment of schema construction on a better
footing by attempting to find a set of clear and coherent rules that would be
consistent with whatever behavior we found to be implemented consistently by
existing processors and whatever assumptions we could find consistently
applied in existing schema documents.  But one vendor's representative
objected to basing further work on examination of behavior by existing
implementations and insisted that we must preserve compatibility with the
existing wording of 1.0 at all costs.  (By which, as events later showed, he
apparently meant not the text of 1.0 but his understanding of what he had 
wanted during the development of 1.0, quite independent of what the text
of 1.0 actually says.)  

We did as this vendor representative demanded, but the cost was unfortunately 
to leave a large number of contradictions in place in the text and to leave
users interested in interoperability without any serious help.  

Oh, well.  Live by consensus, die by consensus.

-- 
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com 
* http://cmsmcq.com/mib                 
* http://balisage.net
****************************************************************
Received on Thursday, 14 February 2013 18:41:56 UTC