comments on bug 6014 from C. M. Sperberg-McQueen on 2009-04-14 (www-xml-schema-comments@w3.org from April to June 2009)

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Tue, 14 Apr 2009 13:01:25 -0600
To: John Arwe <johnarwe@us.ibm.com>, www-xml-schema-comments@w3.org
Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
Message-Id: <E38F8740-E94E-4BEF-9120-2643C15677D6@blackmesatech.com>
On 2 September 2008, John Arwe filed bug 6014
(http://www.w3.org/Bugs/Public/show_bug.cgi?id=6014).

I apologize for the length of time it has taken to draft this
response.  I'm sending it as email to the originator and to the XML
Schema comments list, to avoid overburdening Bugzilla with it.
I hope this doesn't inconvenience anyone.

First, thank you again for your reading and comments.

 > 1.5 Documentation Conventions and Terminology - deprecated

 > from: although some processors may choose to issue
 > to : although some processors MAY choose to issue

Thanks.

 > 3.12.3 Constraints on XML Representations of Type Alternatives

 >     "No <alternative> element may have more than one of these, and
 >     each must have at least one of these. "

 > No...MAY seems destined to be mis-read.  Feels like it wants to be a
 > MUST (have at most one), prefer this, or MUST NOT (have >1)

Recast.

 > 4.2.1 Conditional inclusion

 >     "Where they appear, the attributes vc:minVersion and
 >     vc:maxVersion are treated ... then the element on which the
 >     attribute appears is to be ignored"

 > Does anyone really think describing things in terms of what it's not
 > (i.e.  negatively) is better than positively?

Not as a general principle, no.  But in this case, every positive
formulation I have come up with seems to me clunkier than the negative
formulation in the spec.  (That is, reading your comment I thought
"Oh, wow, that really needs to be fixed!" and turned to the spec to
recast the paragraph.  But faced with the text, my good intentions
dried up.  It may be to do with the fact that the normal action when
reading a document is to act on what it contains; *excluding* elements
is the marked case here, and so it's the one described.)

 > Realizing where you are in the process, since I think this IS
 > correct as stated (though it took several passes for me to catch the
 > not's and un-'s I missed the first time), I'd settle for a
 > non-normative summary stated in the "include if" positive sense.

OK, that I think I can do.  I've proposed that we add

     The effect is that portions of the schema document marked with
     vc:minVersion and/or vc:maxVersion are retained if
     vc:minVersion <= V < vc:maxVersion.
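
A minimal sketch of that "include if" reading, assuming version
numbers compare as decimals and that an absent attribute imposes no
bound.  The function name is_retained and its parameters are
hypothetical, not taken from the spec.

```python
from decimal import Decimal
from typing import Optional


def is_retained(v: str,
                min_version: Optional[str] = None,
                max_version: Optional[str] = None) -> bool:
    """Return True if an element carrying vc:minVersion and/or
    vc:maxVersion is kept by a processor supporting version v.

    Assumption: a missing attribute (None) places no constraint.
    """
    version = Decimal(v)
    # Lower bound is inclusive: vc:minVersion <= V
    if min_version is not None and version < Decimal(min_version):
        return False
    # Upper bound is exclusive: V < vc:maxVersion
    if max_version is not None and version >= Decimal(max_version):
        return False
    return True
```

Under these assumptions, is_retained("1.1", "1.1", "1.2") holds, while
is_retained("1.2", "1.1", "1.2") does not, because the upper bound is
exclusive.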

 > 4.2.2 Assembling a schema <include> clause 1

 >     "It is not an error for the ·actual value· of the schemaLocation
 >     [attribute] to fail to resolve at all, in which case the
 >     corresponding inclusion must not be performed."

 > why maintain this unpleasant dark corner, if redefine and override
 > etc all want to mandate resolution?

The original motivation was quite simple.  As Andrew Layman, then the
Microsoft rep, argued, you really don't want to say that your document
is valid now (at 11:39 a.m.) because the network is up, but becomes
invalid (without having changed) at 11:46 because the router goes down
and the schema for one of the imported namespaces can't be
dereferenced.

Redefine seems to be a different case because in order to work
correctly it can need a fairly detailed knowledge of the internals of
the schema being overridden; flagging failure to resolve as an error
seems like a better choice there.

The other operations present other mixes of reasons to want resolution
failure to be an error and reasons to want it not to be an error; the
1.0 spec made its best effort to decide them on a case by case basis.

XSD 1.1 seems to me (speaking solely for myself here) very unlikely to
change in this area, for four reasons:

   (1) Compatibility with 1.0 is recognized as a desideratum, to be
       overridden only in the interests of even more desirable goals.
       I don't see making resolution failure on import be an error
       as such an overriding goal here.  (Complete cleanup of schema
       composition and all its dark corners might be such a goal, so
       I don't rule it out completely, but this change by itself is
       not worth an incompatibility.)

   (2) If we did want to make the treatment of resolution failure more
       parallel, making existing legal schemas illegal is the more
       painful way to achieve it; it would be easier to sell if the
       only change was to eliminate some current error messages.

   (3) Quixotic though it may be, I still like the original rationale
       for allowing the resolution to fail.  If a user wants a
       different behavior from a processor, a processor can offer
       a 'fail on import resolution failure' mode without being
       non-conforming -- processors have wide latitude here.  So
       for people who want the other decision, it can be treated as
       a processor quality of service issue.

   (4) The idea of changing this behavior may well have some support
       in the WG.  But counting noses, I don't see the idea generating
       consensus.  And if there is no consensus to change the status
       quo, then the status quo remains.
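
To illustrate the "quality of service" point in (3), here is a rough
sketch of a processor that by default skips a schemaLocation it cannot
resolve, but offers an opt-in strict mode.  Everything in it is
hypothetical: the assemble function, the resolve callback, and the
fail_on_unresolved flag are illustrative names, not part of any spec
or real processor.

```python
from typing import Callable, Iterable, List, Optional


def assemble(locations: Iterable[str],
             resolve: Callable[[str], Optional[str]],
             fail_on_unresolved: bool = False) -> List[str]:
    """Collect the schema documents that could be resolved.

    resolve(loc) is assumed to return the document text, or None when
    the location cannot be dereferenced (e.g. the router is down).
    """
    included = []
    for loc in locations:
        doc = resolve(loc)
        if doc is None:
            if fail_on_unresolved:
                # Optional stricter mode a processor might offer.
                raise LookupError(f"cannot resolve schemaLocation {loc!r}")
            # Default: not an error; the inclusion is just not performed.
            continue
        included.append(doc)
    return included
```

With a resolver backed by a dictionary, an unknown location is
silently skipped by default and raises LookupError only when
fail_on_unresolved=True is passed.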


 > is this just FUD masquerading as backward compatibility, or are
 > there well-known concrete scenarios that depend on this behavior?

There is plenty of FUD to go around, and this may be another instance.
I'll let you decide whether you are persuaded by the argument that
validity should not depend on the reliability of the router.


 > 4.2.4 Overriding component definitions

 > Schema Representation Constraint: Override Constraints and Semantics
 > clause 4.1.1, 4.2.1

 > "Let D2' be a <schema> information item obtained by . Then ..." ???
 > something missing 'by .'

 > 4.2.4 Overriding component definitions Schema Representation
 > Constraint: Override Constraints and Semantics clause 4.1.2

 > "Note: One effect of the rule just given..." I cannot tell, due to
 > previous comment's missing text

I think this was a markup snafu in the draft of 20 June 2008, owing to
a clerical error (mea culpa).  The Last Call draft of 30 January has
the missing text:

     4.1.1 Let D2′ be a <schema> information item obtained by
     performing on D2 the transformation specified in Transformation
     for xs:override (§G.2). Then D2′ corresponds to a conforming
     schema (call it S2).

Thank you for catching this.

 > 4.2.4 Overriding component definitions

 > Schema Representation Constraint: Override Constraints and Semantics
 > clause 4.1.2

 > "Note: Another effect is..."  You handle A <override> B <override> C

 > It's not clear if there is deterministic behavior in the Y case:
 > both B and C <override> A with conflicting specifications.

The intent is to have deterministic behavior -- at least, to the
extent that the existing spec, without 'override', has it.  In this
case, if I have understood the example I think the result is clear.

Assume schema document A defines an element E with type xsd:anyURI,
and schema documents B and C redefine that element as having type
xsd:boolean and xsd:decimal respectively (no built-in type with an
initial 'C', sorry, 'deCimal' is the best I can do).

I am creating a document starting from schema document D, which does
nothing but include schema documents B and C.

(First question:  is this your scenario?)

Then the effect of the transformations prescribed by the spec is that
B corresponds to the same schema as B', where B' is just like B except
that instead of overriding A it includes A', a schema document just
like A except that the definition of E

   <element name="E" type="anyURI"/>

is replaced by the one specified in the override of A by B:

   <element name="E" type="boolean"/>

Analogously, C generates the same schema by overriding A as it would
by including A'', which contains

   <element name="E" type="decimal"/>

The effect on D is straightforward: by including B, it acquires a
top-level element E of type boolean, and by including C, it acquires a
top-level E of type decimal.  This pair violates the rule that there
can only be one top-level E.  Blammo.  Er, I mean, not a legal schema.

(Sorry, I'm saying top-level not global, because I haven't
internalized the rule that says 'top-level' just applies to the XML; I
think of it as a synonym for 'global'.)
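
The Y case above can be mimicked with a toy model, offered only as a
sketch: a schema document is reduced to a dictionary from top-level
element name to type name, override replaces matching entries, and
include merges dictionaries while rejecting conflicting duplicates.
The helper names override and include_all are hypothetical, and the
model ignores nearly everything a real processor does.

```python
def override(base, replacements):
    """Build A' from A: same element names, but any definition that
    has a replacement with the same name takes the replacement."""
    return {name: replacements.get(name, typ) for name, typ in base.items()}


def include_all(*schemas):
    """Merge the top-level elements of several schema documents.

    Assumption of this toy model: two different definitions for the
    same name are an error; identical ones are tolerated.
    """
    merged = {}
    for schema in schemas:
        for name, typ in schema.items():
            if name in merged and merged[name] != typ:
                raise ValueError(
                    f"duplicate top-level element {name!r}: "
                    f"{merged[name]} vs {typ}")
            merged[name] = typ
    return merged


A = {"E": "anyURI"}
B = override(A, {"E": "boolean"})   # B behaves like including A'
C = override(A, {"E": "decimal"})   # C behaves like including A''

try:
    include_all(B, C)               # D includes both B and C
    conflict = None
except ValueError as err:
    conflict = str(err)             # the two declarations of E clash
```

In this model the clash is detected regardless of whether B or C is
merged first, which is the sense in which the outcome is
deterministic.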

 > <include> clause 3.1.2 in particular, if 2.1 fires, seems to tell me
 > I cannot "build up" a ns with components from several
 > non-overlapping <schema> items...since I think this is possible
 > today, not convinced I'm reading it right.

It shouldn't be telling you anything of the kind.  The 'with the
possible exception of the schema component' is just to avoid a problem
with the previous wording.  The spec used to say that if A included B,
then the schema for A included all the components of the schema for B.
But in the simple case (where B does not also include A), the schema
for B includes not just elements and types and so on, but also a
schema-description (or schema-as-a-whole) component of the kind
described in section 3.17.  Conclusion: the schema for A contains two
schema-description components.  Which is not allowed.  Blammo, illegal
schema.  Clearly not what was intended by 1.0, clearly not what
implementors have implemented -- but very clearly what the spec said.
So 1.1 excludes the schema component.  (The reference to a 'possible'
exception covers the case where A includes B and B includes A, where a
sufficiently devious reader can argue that there must be a kind of
fix-point semantics going on here, and so the schema for A and the
schema for B are the same schema, with the same schema-description
component.  And in that case, you can't say the schema for A includes
all the components in the schema for B except for the
schema-description component, because it does include that one, too.)


 > It also sounds like it prescribes an order of processing, but I had
 > the impression that different orders were permissible (lazy
 > retrieval strategies) which seems to conflict with that impression.

It describes logical dependencies, but not strictly speaking an order
of processing, in the same way that the usual rules of arithmetic,
applied to (a + b) * (c + d), define the dependency of the
multiplication upon the sums which are its arguments.  The usual way
to handle those
dependencies is to perform the sums and then the multiplication, but
if you can't do that (because you're a compiler and you know the
values of c and d because they are constants, but not of a and b which
are variables) you may choose instead to transform the expression into
a * k + b * k where k = c + d.
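
A small sketch of that compiler analogy, with arbitrary illustrative
constants: both functions respect the same logical dependencies and
give the same result, but they perform the operations in a different
order.

```python
# c and d are known constants; a and b arrive only at run time.
C = 3
D = 4
K = C + D  # the sum of the constants, computed once ahead of time


def direct(a, b):
    """Evaluate both sums first, then the multiplication."""
    return (a + b) * (C + D)


def transformed(a, b):
    """Distribute the precomputed constant over the variable terms."""
    return a * K + b * K
```

For integer inputs the two agree exactly; with floating-point values
the results could differ in the last bits, which is a separate matter
from the dependency argument.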

In the case of override, if you have A include B, A include C, B
override D, C override E, E include F, you certainly have the choice
of taking B and C in either order, if and only if you currently have
that choice.

 > 5.2 Assessing Schema-Validity - strict

 > ..."if they do not identify any declaration or definition, then no
 > schema-validity assessment is performed. "

 > This appears to say that the result is implementation-dependent.  I
 > rather expected some prescribed output, either an error or values
 > for [validation attempted] etc.

Well, there is certainly a problem here.  The accompanying text says
the PSVI produced by lax wildcard validation and strict wildcard
validation is the same -- which means "no" should change to "lax" or
else the note should change.  (Hmm.  A one-word change or rewrite the
entire note. Wonder which I'll choose?)

But the analogy with strict wildcard handling is in fact exact.  If
the parent element E has a strict wildcard in its content model, and
the child element F matches that wildcard but has no declaration, then
N.B. F is not strictly validated (it can't be, we don't have a
declaration) and is not marked invalid (it can't be, we haven't
validated F, how can we mark F invalid?).  The invalidity is in the
parent E, which is supposed to have only properly declared children,
but has been caught red-handed trying to smuggle F in over the border.
Blammo.  Document is invalid, but it's E not F.

If F is the validation root of a validation in strict-wildcard mode,
then by analogy it is not F but the parent of F which needs to be
marked invalid.  F has no parent in the validation episode; the
closest we have to a parent is the calling application, which needs to
be told "you expected F to be declared -- it wasn't.  Here's the PSVI
I got, by the way".  Hence the expectation in the second paragraph of
the note that the invoking process will report an error to its
environment.

I hope this helps.  It would be nice if the spec were less dense here,
but since we wish NOT to constrain how processors face the world, we
don't have a lot of concrete information to go on.  Even the mention
of an invoking application strikes some WG members as a bit risqué.

On the plus side, it's a great topic for a blog entry or several.  And
maybe a conference poster.

The wording changes proposed in response to the comments above can be
seen in context at

     http://www.w3.org/XML/Group/2004/06/xmlschema-1/structures.b6014.html

There are four of them, and the proposal elides unchanged sections.  I
hope the changes, and the comments above for points on which no change
in wording is proposed, will resolve the issue(s) to your
satisfaction.

-- 
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com
* http://cmsmcq.com/mib
* http://balisage.net
****************************************************************
Received on Tuesday, 14 April 2009 19:02:07 UTC