Re: [Bug 12184] New: Circularity in xs:override from C. M. Sperberg-McQueen on 2011-02-26 (www-xml-schema-comments@w3.org from January to March 2011)

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Fri, 25 Feb 2011 18:26:20 -0700
To: Michael Kay <mike@saxonica.com>
Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, www-xml-schema-comments@w3.org
Message-Id: <0934840A-26EC-4AA8-9814-BFA57407B7C9@blackmesatech.com>
On Feb 25, 2011, at 11:59 AM, Michael Kay wrote:

> 
>> Should one take the filing of this bug report as a signal that you
>> do not find the analysis in
>> 
>>  http://lists.w3.org/Archives/Member/w3c-xml-schema-ig/2011Feb/0014.html
>> 
>> persuasive?
>> 
> 
> I think what it boils down to is how one reads this:
> 
> If a schema document P contains an <override> element E pointing to some schema document Q, then schema(P) contains not only the components in immed(P), but also the components in schema(override(E,Q))
> 
> My reading is that this definition is circular with no terminating condition.

That's certainly a plausible reading.  

What I do not see is how you can read this sentence as having no termination
condition and the corresponding sentence for cyclic inclusion as having a 
terminating condition.

    If a ·schema document· D1 contains one or more <include> elements, 
    then schema(D1) contains not only immed(D1) but also all the components 
    of schema(D2), for each ·schema document· D2 identified by an <include> 
    element child of D1.

When D2 in turn has an inclusion pointing to D1, this amounts to saying
that schema(D1) is defined in terms of schema(D2) and schema(D2) is
defined in terms of schema(D1). This is also clearly circular, and also has
no obvious terminating condition.

Some implementations have said "if schema(D1) includes schema(D2)
and schema(D2) includes schema(D1), then they must be the same schema",
which does seem to follow from basic set theory. (Is set theory applicable 
to components?  It must be, given that the spec says that schemas are 
sets of components, but it's hard to apply it in some cases, since set 
theory relies on identity being a primitive notion but the spec defines no 
identity criteria for components.)

In operational terms, that amounts to saying to a stack-based processor: 
if you see an inclusion for a document that's already been read, or that's
already on the open-and-being-read stack, then you don't need to add it to
the stack again.  Or saying to a queue-based processor:  if you see an
inclusion for a document that's already in the queue to be read, you don't
need to add it again.

(Note in passing that this is a safe decision only in cases where the order
in which schema documents are processed does not matter.  Neither the
1.0 nor the 1.1 spec addresses the question directly, and processors are
predictably not uniform in their treatment of it.  In cases where order does
matter, this does not seem to be at all a safe way to resolve the question.)

 
> But MSMcQ appears able to read it in a sense where it's somehow obvious that you stop when you reach closure. I'm not sure whether or not that latter reading depends on the paragraph "If applying the override transformation specified in Transformation for xs:override (§F.2) to D and E results in a schema document equivalent to D" to supply a terminating condition. Also, I'm not sure whether or not that paragraph is part of a Note, and therefore non-normative.

Bear in mind that I'm not arguing from the text of the spec.  As it happens,
I believe I remember the design issues we addressed when we decided to 
include override processing, and I'm arguing from the design we adopted as I
understand it.  I'm not really particularly interested in saving the current
wording or deciphering what it means; I'm interested in getting WG consensus
on a design -- either by reminding people of what we thought things meant
when we adopted the wording we've got (it's not as though we didn't
explicitly discuss cycles then!) or by getting a coherent design now.  Once
we have agreement on a design, then we should make sure the wording of
the spec reflects that design and conveys it to readers.

It's a long-established principle in the WG, however, that "the status quo"
is defined as the words of the spec.  So if I'm the only one in the WG who
has any sense of "the design we adopted" as something independent of the
words of the spec, or finds it helpful to review our earlier discussions of
circularity (and on this question it feels as if I were the only one), then 
perhaps I should just shut up about it.

For what it's worth, I believe from the layout (the paragraph is indented, 
and there is no textual construct around that would cause that indenting other
than the note) that the paragraph you quote is in a note.  (Looking at the
source, I find that the source is explicit that the paragraph is part of a
note.)  I think it's correctly in a note because I think the consequence it 
states follows without further ado from the normative statements in Schema 
Representation Constraint: Override Constraints and Semantics and
our general rules about multiple inclusions.

> 
> I think MSMcQ is reading it rather like one might read the statement "The income of a married person is to be taken as including the income of their spouse". It would take a moron or a computer (or a particularly evil taxman) to try to evaluate this as income(John) + income(wife(John)) + income(husband(wife(John))) + .... Trouble is, a computer is the only tool I have, and that's how it's interpreting it.

I don't deny that this is a potential issue.  I raised this issue myself, in email
to you and others, which is archived at 

   http://lists.w3.org/Archives/Member/w3c-archive/2009Jan/0053.html
   http://lists.w3.org/Archives/Member/w3c-archive/2009Jan/0053.html
   http://lists.w3.org/Archives/Member/w3c-archive/2009Jan/0058.html

and which was called to the WG's attention in 

   http://lists.w3.org/Archives/Member/w3c-xml-schema-ig/2009Jan/0004.html

See in particular section 3.8 Case 8 Cyclic Inclusions in 
http://lists.w3.org/Archives/Member/w3c-archive/2009Jan/0053.html
and cases ST-3 and ST-4 in 
http://lists.w3.org/Archives/Member/w3c-archive/2009Jan/0054.html

As I read the decision record, we decided not to outlaw cycles in
override, and to make explicit that cycles are legal both in override
and include, after we observed that the cycle can be detected, 
most simply in the case of cyclic include by keeping track of the URIs
being dereferenced and in the tax case by knowing that for all X,
husband(wife(X)) is either X or null, and similarly wife(husband(X)) 
is either X or null for all X.

In the case of overrides, keeping track of literal URIs is not sufficient, since
what is to be included is not the document at the URI but a preprocessed form
of that document.  In particular, for all URIs U, V and all elements E, F, if
U = V and E = F (I assume fairly simple deep equality as the meaning 
of '=' for elements, here), then override(E,document(U)) =
override(F,document(V)) and if you are about to include one of them
and detect that the other has already been included, or is on your 
read stack if you have one, or on your read queue if you have one,
then you don't need to do anything.  

I believe this is exactly analogous to detecting that document(U) is
already on your stack, in your queue, or in your cache, and therefore
that <xsd:include schemaLocation="V"/> is a nop.



> 
> So, where do I inject the intelligence that tells the computer when to stop? It isn't just a question of recognizing that husband(wife(John)) is John (the identity problem). It's also a question of recognizing that we are evaluating a union (and not, for example, a sum), and that therefore repeated evaluation of the same function has no effect on the outcome: the rule we are instinctively applying is union(income(S), income(S)) = income(S), not sum(income(S), income(S)) = 2*income(S).

If we take the words of XSD 1.0 and 1.1 seriously, then all schema composition
by means of include, import, or redefine involves union and pretty much nothing 
else.  

The spec says that a schema is a set of components; adding anything to a set
involves union, no?

> 
> With this understanding, I think I can now read the sentence above and see how I was supposed to read it. Next question: how can I rephrase it so we don't put all our readers through the same agony?

The best way to describe our schema composition story clearly
would be (1) to have a coherent story, and (2) to delete section
4 in its entirety and start over.  Unfortunately, the WG has declined
even to try to achieve (1), and has explicitly turned down proposals
to do (2).  The wording of section 4.2.5 on override is opaque,
badly worded, and unhelpful because it has been carefully 
constructed to be parallel to the opaque, badly worded, and
unhelpful words used to describe include, import, and redefine.

I think the consequence is that nothing we do can possibly avoid
at least a certain amount of pain for the reader.  

If we can find any way to reduce the pain, I will favor it.  But 
band-aids will not fix broken bones or stop arterial bleeding.

-- 
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com 
* http://cmsmcq.com/mib                 
* http://balisage.net
****************************************************************
Received on Saturday, 26 February 2011 01:26:56 UTC