- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Fri, 11 Mar 2011 14:22:11 -0700
- To: ht@inf.ed.ac.uk (Henry S. Thompson)
- Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, www-xml-schema-comments@w3.org
On Mar 11, 2011, at 10:47 AM, Henry S. Thompson wrote: >> ... > > In this connection, rereading the status quo section 4.2.5 [1], I am > struck by a apparent (?) contradiction between the newly added text, > beginning "If the above definition is naively translated. . ." (now > the first real Note after the override element tableau) and the _last_ > (third) Note after the definition of *target set*, which appears to > _also_ be intended to address to the problem of circular chains of > overrides. This is the third in a series of three replies to Henry's message. It takes up a tangential issue, namely whether the note Henry mentions is inappropriately narrow in scope. On re-reading the note carefully today, I was struck by the narrow formulation of the conditional in the second paragraph. The note says If applying the override transformation ... to D and E results in a schema document equivalent to D ..., then the effect is the same as for a cyclic set of <include> references, or as for multiple inclusions of the same document (as described in ... §4.2.3). But in the general case, if we have a document cycle D1, D2, ... Dn, D1 where D1 contains an override element E1, and we are trying to calculate schema(D1), it will often happen that the target set of E1 will include not override(E1, D1) but override(E', D1). This will happen if any of the override elements in D2 ... Dn have children distinct from every child of E1. E' will be the result of overlaying E2 with E1, overlaying E3 with that result, overlaying E4 with that, etc. until you have overlaid the override element of Dn with the result of all the previous overlays. Under some conditions, as illustrated in test 23 (starting at schema document A), I think this situation is harmless, should be allowed by our spec, and is in fact allowed by the status quo. In other cases (e.g. test 23 starting at schema document B) it won't be harmless, should not be allowed, and is not allowed. It might be nice if the note were a little clearer on the subject, but given the difficulty we are having explaining things to each other in the WG, that may prove hard to do. On the other hand, there may be a good reason for the note to call out the edge case where the target set of E includes override(D,E), instead of discussing the general case where it includes override(E',D): it is precisely the case which arises whenever an override element E points into a set of schema documents with one or more include cycles. To take a concrete example, assume we have schema documents D1, D2, D3, D4. D1 overrides D2 with override element E. D2 includes D3. D3 includes D4. D4 includes D3. The result of applying the override transform here will be a cycle of the form described in the note: D1 overrides D2 with override element E. D2 overrides D3 with override element E. D3 overrides D4 with override element E. D4 overrides D3 with override element E. So the target set of E in schema document D1 is: override(E, D2) override(E, D3) override(E, D4) Each override element in the cycle (i.e. the versions of E in override(E,D3) and override(E,D4) will have the same children (for some definition of sameness), and once a naive queuing or stacking implementation has been around the cycle once, there is no need to go around more times. At the risk of flogging a dead horse, here is a more procedural account of how a simple document-queuing implementation might handle this case. (To skip the example, look for the end-tag.) <flogging> Our implementation will keep track of what schema documents it has to handle, and what documents it has already handled, by means of sets called QUEUE and FINISHED, and it will keep track of what it's reading right now with a variable called CURRENT. For concreteness, we'll assume that these variables all use 'schema document designators', defined as follows. - A URI is a schema document designator. - If D is a schema document designator and E is an override element, then "override(E,D)" is a schema document designator. - If D is a schema document designator and N is a namespace name, then "chameleon(N,D)" is a schema document designator. 1 We start by trying to calculate schema(D1). So CURRENT = null QUEUE = { D1 } FINISHED = {} 2 We process D1 by taking the builtins, and the components defined by D1 itself, and queuing any other schema documents D1 refers to. D1 has just the override of D2 by element E, so as soon as we read that override element, we add "override(E,D2)" to the queue, so we have CURRENT = D1 QUEUE = { override(E,D2) } FINISHED = {} Algebraically: schema(D1) = b + immed(D1) + schema(override(E,D2)) 3 When we're done with D1, we can add D1 to the list of schema documents we've already processed. CURRENT = null QUEUE = { override(E,D2) } FINISHED = { D1 } 4 Now we turn to the next item in the queue, namely override(E,D2). Its corresponding schema includes the builtins, but we already have those. It includes the components described in the document, which are unproblematic as long as they don't conflict with anything we've already got. And since D2 has an include of D3, override(E,D2) has an override element with the same children as E, but pointing at D3. So when we start processing the schema document, we have CURRENT = override(E,D2) QUEUE = {} FINISHED = { D1 } When we see the override element, we add it to QUEUE and when we're done with the document, we have CURRENT = null QUEUE = { override(E,D3) } FINISHED = { D1, override(E,D2) } Algebraically: schema(D1) = b + immed(D1) + schema(override(E,D2)) = b + immed(D1) + b + immed(override(E,D2)) + schema(override(E,D3)) = b + immed(D1) + immed(override(E,D2)) + schema(override(E,D3)) 5 The next item in the queue (the override of D3 by E) is similar. When we have finished processing it, we have CURRENT = null QUEUE = { override(E,D4) } FINISHED = { D1, override(E,D2), override(E,D3) } Algebraically: schema(D1) = b + immed(D1) + schema(override(E,D2)) = b + immed(D1) + b + immed(override(E,D2)) + schema(override(E,D3)) = b + immed(D1) + immed(override(E,D2)) + b + immed(override(E,D3)) + schema(override(E,D4)) 6 Now we process the next item in the queue, the override of D4 by E. Recall that D4 includes D3, so override(E,D4) will include an override element pointing to D3 with the same children as E, i.e. override(E,D3). Instead of adding override(E, D3) to the QUEUE, however, the processor notices that we have already included it (it's in the FINISHED set). So we don't need to add it to the queue. When we finish processing the override of D4, therefore, the variables have this state: CURRENT = null QUEUE = {} FINISHED = { D1, override(E,D2), override(E,D3), override(E,D4) } The queue is empty, so we're done. Algebraically: schema(D1) = b + immed(D1) + schema(override(E,D2)) = b + immed(D1) + b + immed(override(E,D2)) + schema(override(E,D3)) = b + immed(D1) + immed(override(E,D2)) + b + immed(override(E,D3)) + schema(override(E,D4)) = b + immed(D1) + immed(override(E,D2)) + b + immed(override(E,D3)) + b + immed(override(E,D4)) + schema(override(E,D3)) = b + immed(D1) + immed(override(E,D2)) + immed(override(E,D3)) + immed(override(E,D4)) The last step (elimination of duplicate b and of the trailing item schema(override(E,D3))) is justified in the usual way: we know from step 2 in the development that schema(D1) includes schema(override(E,D3)), and therefore it's already taken care of. This is exactly the same logic required to eliminate the trailing term in the algebraic development of a cyclic include. See, for example, section 3.8 in the posting http://lists.w3.org/Archives/Member/w3c-archive/2009Jan/0053.html of 12 January 2009. If it makes sense as a legal move for cyclic includes, it must necessarily make sense as a legal move for cyclic overrides. And the WG agreed some time ago that the spec should make it explicit that it's a legal move for cyclic includes. The note Henry points us to simply observes that the move is also legal for cyclic redefines. </flogging> In our discussions of cyclic overrides, we have several times observed that it would simplify things (for us, if not for users) to forbid cycles of overrides. But we have shied away from that because most members of the working group do not wish to forbid cycles of includes. Override cycles and include cycles need to be treated in parallel, though, for two reasons: first, they present essentially the same logical problem (as illustrated by the algebraic treatments of parallel examples) and treating them differently would be a horrible design errors. And second our handling of 'scoped transformation' turns cycles of includes into cycles of overrides. ('Scoped transformation', it may be helpful to recall, is the term we invented to describe our rule that the user of an override should not have to point to the specific document containing the declaration to be overridden, but should be able to point to a top-level driver that includes it, possibly at a remove.) In our recent teleconferences, I have sometimes thought it might be helpful to clone the discussion about cyclic includes at the end of 4.2.3 and modify it slightly to cover the case of cyclic includes which are turned into cyclic overrides in a scoped transformation. What I discover, looking carefully at the note Henry points us to, is that we already have just such a note: the narrow case covered by this note is precisely the case which will be generated by an override pointing into a cluster of schema documents with a cyclic include. (As illustrated, the immediate target of the starting override [D2 in the example] doesn't have to be part of the cycle.) So perhaps the note's focus on the edge case is justified after all. As for the more general case, I think it is covered by the general rule specified in 4.2.3 concerning 'include' elements: If [two include elements] specify different schema locations, then they refer to different schema documents, unless the implementation is able to determine that the two URIs are references to the same resource. Here the spec explicitly gives permission to a processor to determine that two schema-document references are references to the same schema document. I think it follows automatically that a processor is allowed to detect a situation where (a) a schema document D contains an override element E, and the target set of E includes not override(E,D) but override(E',D), and (b) the resource denoted by override(E',D) is "the same resource" as D. (At this point, the ineffable mysteries of web architecture block us from further reasoning.) And once I have detected that D and override(E', D) are "the same resource", I am allowed to snip the cycle, by (a) the rules governing include and (b) the statement in the spec that the meaning of an override of a schema document SD is the same as that of an include for a different schema document override(E,SD). By the same token, a processor written to be less aggressive about detecting resource identity is allowed to fail to detect the identity. Unfortunately, there is as you know no agreement in the WG about what they are allowed or required to do then: some read the XML mapping rules and the spec's remarks about forestalling "the need to establish identity component by component" as making clear that the processor ought to detect, when processing override(E',D) that it is not seeing new components but only redundant descriptions of components already known. Others tell me that they read the spec as allowing processors to avoid "the need to establish identity component by component" just by closing their eyes and clicking their heels three times. I believe that processors built along those lines will often object to override(E',D) on the grounds that it provides duplicate declarations for top-level items. I conclude that the note Henry points to is not in fact unduly narrow in scope: it deals with an important special case, and the general case is already covered by the rules for include. It may be possible to make the note clearer; having a better understanding of where people are having trouble understanding it might help. I hope this helps. -- **************************************************************** * C. M. Sperberg-McQueen, Black Mesa Technologies LLC * http://www.blackmesatech.com * http://cmsmcq.com/mib * http://balisage.net ****************************************************************
Received on Friday, 11 March 2011 21:22:43 UTC