- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Fri, 25 Feb 2011 18:26:20 -0700
- To: Michael Kay <mike@saxonica.com>
- Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, www-xml-schema-comments@w3.org
On Feb 25, 2011, at 11:59 AM, Michael Kay wrote: > >> Should one take the filing of this bug report as a signal that you >> do not find the analysis in >> >> http://lists.w3.org/Archives/Member/w3c-xml-schema-ig/2011Feb/0014.html >> >> persuasive? >> > > I think what it boils down to is how one reads this: > > If a schema document P contains an <override> element E pointing to some schema document Q, then schema(P) contains not only the components in immed(P), but also the components in schema(override(E,Q)) > > My reading is that this definition is circular with no terminating condition. That's certainly a plausible reading. What I do not see is how you can read this sentence as having no termination condition and the corresponding sentence for cyclic inclusion as having a terminating condition. If a ·schema document· D1 contains one or more <include> elements, then schema(D1) contains not only immed(D1) but also all the components of schema(D2), for each ·schema document· D2 identified by an <include> element child of D1. When D2 in turn has an inclusion pointing to D1, this amounts to saying that schema(D1) is defined in terms of schema(D2) and schema(D2) is defined in terms of schema(D1). This is also clearly circular, and also has no obvious terminating condition. Some implementations have said "if schema(D1) includes schema(D2) and schema(D2) includes schema(D1), then they must be the same schema", which does seem to follow from basic set theory. (Is set theory applicable to components? It must be, given that the spec says that schemas are sets of components, but it's hard to apply it in some cases, since set theory relies on identity being a primitive notion but the spec defines no identity criteria for components.) In operational terms, that amounts to saying to a stack-based processor: if you see an inclusion for a document that's already been read, or that's already on the open-and-being-read stack, then you don't need to add it to the stack again. Or saying to a queue-based processor: if you see an inclusion for a document that's already in the queue to be read, you don't need to add it again. (Note in passing that this is a safe decision only in cases where the order in which schema documents are processed does not matter. Neither the 1.0 nor the 1.1 spec addresses the question directly, and processors are predictably not uniform in their treatment of it. In cases where order does matter, this does not seem to be at all a safe way to resolve the question.) > But MSMcQ appears able to read it in a sense where it's somehow obvious that you stop when you reach closure. I'm not sure whether or not that latter reading depends on the paragraph "If applying the override transformation specified in Transformation for xs:override (§F.2) to D and E results in a schema document equivalent to D" to supply a terminating condition. Also, I'm not sure whether or not that paragraph is part of a Note, and therefore non-normative. Bear in mind that I'm not arguing from the text of the spec. As it happens, I believe I remember the design issues we addressed when we decided to include override processing, and I'm arguing from the design we adopted as I understand it. I'm not really particularly interested in saving the current wording or deciphering what it means; I'm interested in getting WG consensus on a design -- either by reminding people of what we thought things meant when we adopted the wording we've got (it's not as though we didn't explicitly discuss cycles then!) or by getting a coherent design now. Once we have agreement on a design, then we should make sure the wording of the spec reflects that design and conveys it to readers. It's a long-established principle in the WG, however, that "the status quo" is defined as the words of the spec. So if I'm the only one in the WG who has any sense of "the design we adopted" as something independent of the words of the spec, or finds it helpful to review our earlier discussions of circularity (and on this question it feels as if I were the only one), then perhaps I should just shut up about it. For what it's worth, I believe from the layout (the paragraph is indented, and there is no textual construct around that would cause that indenting other than the note) that the paragraph you quote is in a note. (Looking at the source, I find that the source is explicit that the paragraph is part of a note.) I think it's correctly in a note because I think the consequence it states follows without further ado from the normative statements in Schema Representation Constraint: Override Constraints and Semantics and our general rules about multiple inclusions. > > I think MSMcQ is reading it rather like one might read the statement "The income of a married person is to be taken as including the income of their spouse". It would take a moron or a computer (or a particularly evil taxman) to try to evaluate this as income(John) + income(wife(John)) + income(husband(wife(John))) + .... Trouble is, a computer is the only tool I have, and that's how it's interpreting it. I don't deny that this is a potential issue. I raised this issue myself, in email to you and others, which is archived at http://lists.w3.org/Archives/Member/w3c-archive/2009Jan/0053.html http://lists.w3.org/Archives/Member/w3c-archive/2009Jan/0053.html http://lists.w3.org/Archives/Member/w3c-archive/2009Jan/0058.html and which was called to the WG's attention in http://lists.w3.org/Archives/Member/w3c-xml-schema-ig/2009Jan/0004.html See in particular section 3.8 Case 8 Cyclic Inclusions in http://lists.w3.org/Archives/Member/w3c-archive/2009Jan/0053.html and cases ST-3 and ST-4 in http://lists.w3.org/Archives/Member/w3c-archive/2009Jan/0054.html As I read the decision record, we decided not to outlaw cycles in override, and to make explicit that cycles are legal both in override and include, after we observed that the cycle can be detected, most simply in the case of cyclic include by keeping track of the URIs being dereferenced and in the tax case by knowing that for all X, husband(wife(X)) is either X or null, and similarly wife(husband(X)) is either X or null for all X. In the case of overrides, keeping track of literal URIs is not sufficient, since what is to be included is not the document at the URI but a preprocessed form of that document. In particular, for all URIs U, V and all elements E, F, if U = V and E = F (I assume fairly simple deep equality as the meaning of '=' for elements, here), then override(E,document(U)) = override(F,document(V)) and if you are about to include one of them and detect that the other has already been included, or is on your read stack if you have one, or on your read queue if you have one, then you don't need to do anything. I believe this is exactly analogous to detecting that document(U) is already on your stack, in your queue, or in your cache, and therefore that <xsd:include schemaLocation="V"/> is a nop. > > So, where do I inject the intelligence that tells the computer when to stop? It isn't just a question of recognizing that husband(wife(John)) is John (the identity problem). It's also a question of recognizing that we are evaluating a union (and not, for example, a sum), and that therefore repeated evaluation of the same function has no effect on the outcome: the rule we are instinctively applying is union(income(S), income(S)) = income(S), not sum(income(S), income(S)) = 2*income(S). If we take the words of XSD 1.0 and 1.1 seriously, then all schema composition by means of include, import, or redefine involves union and pretty much nothing else. The spec says that a schema is a set of components; adding anything to a set involves union, no? > > With this understanding, I think I can now read the sentence above and see how I was supposed to read it. Next question: how can I rephrase it so we don't put all our readers through the same agony? The best way to describe our schema composition story clearly would be (1) to have a coherent story, and (2) to delete section 4 in its entirety and start over. Unfortunately, the WG has declined even to try to achieve (1), and has explicitly turned down proposals to do (2). The wording of section 4.2.5 on override is opaque, badly worded, and unhelpful because it has been carefully constructed to be parallel to the opaque, badly worded, and unhelpful words used to describe include, import, and redefine. I think the consequence is that nothing we do can possibly avoid at least a certain amount of pain for the reader. If we can find any way to reduce the pain, I will favor it. But band-aids will not fix broken bones or stop arterial bleeding. -- **************************************************************** * C. M. Sperberg-McQueen, Black Mesa Technologies LLC * http://www.blackmesatech.com * http://cmsmcq.com/mib * http://balisage.net ****************************************************************
Received on Saturday, 26 February 2011 01:26:56 UTC