Re: UPA example from C. M. Sperberg-McQueen on 2008-07-01 (xmlschema-dev@w3.org from July 2008)

From: C. M. Sperberg-McQueen <cmsmcq@acm.org>
Date: Tue, 1 Jul 2008 16:17:16 -0600
To: "Pete Cordell" <petexmldev@codalogic.com>
Cc: "C. M. Sperberg-McQueen" <cmsmcq@acm.org>, <xmlschema-dev@w3.org>
Message-Id: <4AA456E4-5178-425A-A9E9-3E6B459C672B@acm.org>
On 1 Jul 2008, at 15:11 , Pete Cordell wrote:

>
> Original Message From: "C. M. Sperberg-McQueen"
>
>
>> I think explaining things to users would be somewhat simpler if
>> we lost the Unique Particle Attribution rule entirely, but I have
>> not succeeded in persuading the rest of the working group.
>
>
> Hi Michael,
>
> You've mentioned removing the UPA rule before.  Presumably there  
> must be some sort of Particle Attribution process, so what would  
> you suggest would happen if UPA was removed (as opposed to, say,  
> the UPA rules being changed)?

Well, my mental model of things is based on the way non-deterministic
finite state automata are described in textbooks:  if there is a path
through the FSA that matches the input and ends in an accept state,
then the FSA accepts the input.  A simple way to apply that to
content-model-based validation would be to say that checking input
against a content model traces a certain path through the model;
if the model is non-deterministic, the same input may have multiple
paths through the model.  If one of them leads to an accept state,
then the content is valid against the content model.
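
As a rough illustration of that acceptance rule, here is a minimal Python
sketch. The function names and the transition table are made up for this
example and come from no schema processor; the automaton is a hand-built
one for the regular expression a*a*, where each 'a' may be attributed to
either particle.

```python
def nfa_accepts(transitions, start, accepting, symbols):
    """True if at least one path consumes every symbol and ends in an accept state."""
    current = {start}
    for sym in symbols:
        # Follow every transition available from every state we might be in.
        current = {nxt for state in current
                   for nxt in transitions.get((state, sym), ())}
        if not current:
            return False  # no path survives this symbol
    return bool(current & accepting)


def count_paths(transitions, start, accepting, symbols):
    """Number of distinct accepting paths for the input (0 if it is rejected)."""
    counts = {start: 1}
    for sym in symbols:
        next_counts = {}
        for state, n in counts.items():
            for nxt in transitions.get((state, sym), ()):
                next_counts[nxt] = next_counts.get(nxt, 0) + n
        counts = next_counts
    return sum(n for state, n in counts.items() if state in accepting)


# Hypothetical automaton for a*a*: state 0 is "still in the first a*",
# state 1 is "moved on to the second a*". Both are accept states.
TRANSITIONS = {
    (0, "a"): {0, 1},  # attribute this 'a' to the first or the second particle
    (1, "a"): {1},
}
ACCEPTING = {0, 1}

print(nfa_accepts(TRANSITIONS, 0, ACCEPTING, ["a", "a", "a"]))
print(count_paths(TRANSITIONS, 0, ACCEPTING, ["a", "a", "a"]))
```

The input is accepted as soon as one path works; the path count shows how
the same input can be matched in more than one way.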

How this interacts with type assignment, etc., is an interesting
design question.  I think a variety of answers are possible, but
off the top of my head only the following come to mind:

   - We could say that each successful path through the content
     model produces appropriate PSVI properties.  If more than
     one path produces a valid result, then each child has more
     than one type assignment and validity property (plus whatever
     others are path-dependent).

     This would make it harder to do type assignment in a single-pass
     pre-order traversal of the document; systems that require
     the ability to know a single possible type as soon as the element
     is encountered would need to support not all schemas, but only
     schemas that provided the required property.  (In XSD 1.0 and
     1.1, what has happened is that those who require determinism
     have effectively prohibited non-determinism because they would
     rather deny the expressive power to users and other developers
     than have to say that they support only some schemas.)

   - We could say that if there is more than one path through a
     content model, the implementation must pick at least one, but
     need not pick more than one.  (If you care, you'll need to
     avoid such non-determinism.)

   - We could say that while the content model may be non-deterministic,
     the relevant PSVI properties must nevertheless be deterministic.
     For type assignment, Element Declarations Consistent already
     achieves this, though it does not enforce consistency on other
     properties of element declarations; it would need to be
     strengthened so that its content agreed with its name.
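
The kind of check the third option relies on can be sketched roughly as
follows. This is a simplification for illustration only: it reduces each
particle to a hypothetical (element name, type name) pair and ignores
namespaces, wildcards, and every property other than the type.

```python
def declarations_consistent(particles):
    """particles: (element_name, type_name) pairs taken from one content model.

    True if every element name is bound to a single type throughout the model.
    """
    seen = {}
    for name, type_name in particles:
        # Remember the first type seen for this name; any later mismatch fails.
        if seen.setdefault(name, type_name) != type_name:
            return False
    return True


print(declarations_consistent([("a", "T1"), ("a", "T2")]))  # same name, two types
print(declarations_consistent([("a", "T1"), ("a", "T1")]))  # same name, one type
```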

To take an example: consider the content model ((a as T1)*, (a as T2)*)*,
which effectively accepts any sequence of 'a' elements, typing each as
either T1 or T2.  If presented with a sequence of 'a' elements, each
valid against either T1 or T2, but not both, then the first rule
would accept the content model and type each child element in the
way that works.  If some children are valid against both T1 and T2,
then there would be more than one annotation of the input that would
work, just as there is more than one parse for the regular expression
a*a*.  (In the case of a child valid against neither T1 nor T2, my
instinct is to provide both invalid parses, but it's only an instinct.)
The second rule would accept the content model, but an implementation
that chose randomly might not have a good chance of recognizing good
input.  The third rule would reject the content model because the
type assignment is non-deterministic, but would accept a similar
model with two (a as T1) particles.
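
To make the first rule concrete for this content model, here is a small
Python sketch. Everything in it is an assumption made for illustration:
T1 and T2 are replaced by two toy validators (T1 accepts digit strings,
T2 accepts strings of at most three characters), and each child is
represented only by its text. Because the model accepts any sequence of
'a' elements, the distinct annotations are just the combinations of the
types each child is valid against.

```python
from itertools import product

# Hypothetical stand-ins for the types T1 and T2.
VALIDATORS = {
    "T1": lambda text: text.isdigit(),
    "T2": lambda text: len(text) <= 3,
}


def candidate_types(text):
    """Names of the types this child's content is valid against."""
    return [name for name, is_valid in VALIDATORS.items() if is_valid(text)]


def all_annotations(children):
    """First rule: every way of typing each child as T1 or T2 that validates."""
    options = [candidate_types(text) for text in children]
    return [list(combo) for combo in product(*options)]


# Each child valid against exactly one type: a single annotation.
print(all_annotations(["12345", "abc"]))
# "12" is valid against both types: two annotations.
print(all_annotations(["12", "abc"]))
# "abcdef" is valid against neither type: no valid annotation.
print(all_annotations(["abcdef"]))
```

Under the second rule an implementation would only need to report one
element of that list; the sketch does not model the invalid parses
mentioned above for a child that matches neither type.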

I hope this goes a little way to answering your question.

Michael
Received on Tuesday, 1 July 2008 22:17:52 UTC