- From: C. M. Sperberg-McQueen <cmsmcq@acm.org>
- Date: Tue, 1 Jul 2008 16:17:16 -0600
- To: "Pete Cordell" <petexmldev@codalogic.com>
- Cc: "C. M. Sperberg-McQueen" <cmsmcq@acm.org>, <xmlschema-dev@w3.org>
On 1 Jul 2008, at 15:11, Pete Cordell wrote:

> Original Message From: "C. M. Sperberg-McQueen"
>
>> I think explaining things to users would be somewhat simpler if
>> we lost the Unique Particle Attribution rule entirely, but I have
>> not succeeded in persuading the rest of the working group.
>
> Hi Michael,
>
> You've mentioned removing the UPA rule before. Presumably there
> must be some sort of Particle Attribution process, so what would
> you suggest would happen if UPA was removed (as opposed to, say,
> the UPA rules being changed)?

Well, my mental model of things is based on the way non-deterministic
finite state automata are described in textbooks: if there is a path
through the FSA that matches the input and ends in an accept state,
then the FSA accepts the input.

A simple way to apply that to content-model-based validation would be
to say that checking input against a content model traces a certain
path through the model; if the model is non-deterministic, the same
input may have multiple paths through the model. If one of them leads
to an accept state, then the content is valid against the content
model.

How this interacts with type assignment, etc., is an interesting
design question. I think a variety of answers are possible, but off
the top of my head only the following come to mind:

- We could say that each successful path through the content model
  produces appropriate PSVI properties. If more than one path produces
  a valid result, then each child has more than one type assignment
  and validity property (plus whatever others are path-dependent).
  This would make it harder to do type assignment in a single-pass
  pre-order traversal of the document; systems that require the
  ability to know a single possible type as soon as the element is
  encountered would need to support not all schemas, but only schemas
  that provided the required property.
  (In XSD 1.0 and 1.1, what has happened is that those who require
  determinism have effectively prohibited non-determinism because they
  would rather deny the expressive power to users and other developers
  than have to say that they support only some schemas.)

- We could say that if there is more than one path through a content
  model, the implementation must pick at least one, but need not pick
  more than one. (If you care, you'll need to avoid such
  non-determinism.)

- We could say that while the content model may be non-deterministic,
  the relevant PSVI properties must nevertheless be deterministic. For
  type assignment, Element Declarations Consistent already achieves
  this, though it does not enforce consistency on other properties of
  element declarations; it would need to be strengthened so that its
  content agreed with its name.

To take an example: consider the content model

  ((a as T1)*, (a as T2)*)*

which effectively accepts any sequence of 'a' elements, typing each
as either T1 or T2.

If presented with a sequence of 'a' elements, each valid against
either T1 or T2, but not both, then the first rule would accept the
content model and type each child element in the way that works. If
some children are valid against both T1 and T2, then there would be
more than one annotation of the input that would work, just as there
is more than one parse for the regular expression a*a*. (In the case
of a child valid against neither T1 nor T2, my instinct is to provide
both invalid parses, but it's only an instinct.)

The second rule would accept the content model, but an implementation
that chose randomly might not have a good chance of recognizing good
input.

The third rule would reject the content model because the type
assignment is non-deterministic, but would accept a similar model
with two (a as T1) particles.

I hope this goes a little way to answering your question.

Michael
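
For readers who want something concrete, here is a rough Python sketch of the three rules applied to the example model ((a as T1)*, (a as T2)*)*. It is only an illustration under stated assumptions: the automaton is written out by hand, the names FOLLOW, PARTICLE_TYPE, accepting_paths, pick_one_path and type_assignment_is_deterministic are all invented here, and the validity of each child against T1 and T2 is supplied as input rather than computed. Nothing in it is taken from XSD or from any validator.

```python
# Hand-built automaton for ((a as T1)*, (a as T2)*)*  (illustrative only).
# "p1" stands for the (a as T1) particle, "p2" for the (a as T2) particle.
PARTICLE_TYPE = {"p1": "T1", "p2": "T2"}

# Because the outer group repeats, either particle may follow any state.
FOLLOW = {
    "start": ["p1", "p2"],
    "p1": ["p1", "p2"],
    "p2": ["p1", "p2"],
}

# Every state is accepting: the model also allows empty content.
ACCEPT = {"start", "p1", "p2"}


def accepting_paths(children):
    """Rule 1: return the type assignment of every successful path.

    `children` holds one set per 'a' child, naming the types that child
    is (assumed to be) valid against, e.g. {"T1"} or {"T1", "T2"}.
    """
    paths = [("start", [])]
    for valid_types in children:
        next_paths = []
        for state, assigned in paths:
            for particle in FOLLOW[state]:
                child_type = PARTICLE_TYPE[particle]
                if child_type in valid_types:
                    next_paths.append((particle, assigned + [child_type]))
        paths = next_paths
    return [assigned for state, assigned in paths if state in ACCEPT]


def pick_one_path(children):
    """Rule 2: report a single successful path, or None if there is none."""
    found = accepting_paths(children)
    return found[0] if found else None


def type_assignment_is_deterministic():
    """Rule 3: from each state, the particles that can match the next 'a'
    must all assign the same type (every particle here is named 'a')."""
    return all(
        len({PARTICLE_TYPE[p] for p in FOLLOW[state]}) <= 1
        for state in FOLLOW
    )


if __name__ == "__main__":
    # Third child is valid against both types, so two annotations work.
    children = [{"T1"}, {"T2"}, {"T1", "T2"}]
    print(accepting_paths(children))
    print(pick_one_path(children))
    print(type_assignment_is_deterministic())
```

Note that pick_one_path still explores every path before choosing; an implementation that committed to one branch at each step without backtracking would be the "chose randomly" case described above. Rule 3 here inspects only the model, never the input, which is why it rejects this content model outright.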
Received on Tuesday, 1 July 2008 22:17:52 UTC