Re: UPA example from Pete Cordell on 2008-07-03 (xmlschema-dev@w3.org from July 2008)

From: Pete Cordell <petexmldev@codalogic.com>
Date: Thu, 3 Jul 2008 17:49:53 +0100
To: "C. M. Sperberg-McQueen" <cmsmcq@acm.org>
Cc: <xmlschema-dev@w3.org>
Message-ID: <03aa01c8dd2c$d4c5ee20$ea00a8c0@Codalogic>
----- Original Message -----
From: "C. M. Sperberg-McQueen"

Thanks Michael.  Observations below...

>   - We could say that each successful path through the content
>     model produces appropriate PSVI properties.  If more than
>     one path produces a valid result, then each child has more
>     than one type assignment and validity property (plus whatever
>     others are path-dependent).
>
>     This would make it harder to do type assignment in a single-pass
>     pre-order traversal of the document; systems that require
>     the ability to know a single possible type as soon as the element
>     is encountered would need to support not all schemas, but only
>     schemas that provided the required property.  (In XSD 1.0 and
>     1.1, what has happened is that those who require determinism
>     have effectively prohibited non-determinism because they would
>     rather deny the expressive power to users and other developers
>     than have to say that they support only some schemas.)

Presumably that would require some form of RE-like backtracking.  Personally 
I would like to avoid that.  Admittedly that's because currently XSD doesn't 
require it and consequently our architecture doesn't support it.  (In effect 
our FSM is implemented in the generated code and backtracking would require 
a completely new architecture.)

Another reason is that I think backtracking opens the way to 
non-deterministic execution times (or rather potentially unbounded execution 
times).  Admittedly this could be a case of "buyer beware!"
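
To illustrate the execution-time worry, here is a minimal Python sketch (an illustration only, not how any real XSD processor is implemented). It matches a list of child element names against a hypothetical, UPA-violating content model (a | (a, a))*, b using naive backtracking, and counts the calls made. The model and the name count_attempts are assumptions made up for this example.

```python
def count_attempts(children):
    """Match children against (a | (a, a))*, b by naive backtracking.

    Returns (matched, attempts), where attempts is the number of
    recursive calls made while searching for a successful path.
    """
    attempts = 0

    def match(i):
        nonlocal attempts
        attempts += 1
        n = len(children)
        # Accept when exactly one child remains and it is the closing 'b'.
        if i == n - 1 and children[i] == "b":
            return True
        # Branch 1: the (a) alternative consumes one child.
        if i < n and children[i] == "a" and match(i + 1):
            return True
        # Branch 2: the (a, a) alternative consumes two children.
        if (i + 1 < n and children[i] == "a" and children[i + 1] == "a"
                and match(i + 2)):
            return True
        return False

    return match(0), attempts


# Invalid input (no closing 'b') forces every split of the run to be tried.
for n in (5, 10, 15, 20):
    matched, attempts = count_attempts(["a"] * n)
    print(n, matched, attempts)
```

On a run of 'a' children with no closing 'b', every way of splitting the run into ones and twos is tried before failure is reported, so the call count grows roughly like the Fibonacci numbers.  A matcher that tracks the set of possible states instead of backtracking would avoid this, but that is a different architecture again.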

I also don't know how often this feature would be useful.  Maybe I live too 
sheltered a life!  But it would seem to me that if somebody really wanted 
this feature they could use RELAX NG, which would be very good at this sort 
of thing.

>   - We could say that if there is more than one path through a
>     content model, the implementation must pick at least one, but
>     need not pick more than one.  (If you care, you'll need to
>     avoid such non-determinism.)

It does make me feel uncomfortable that different implementations could 
choose different paths.  Some would validate OK and others would not.  At 
first glance that seems undesirable.

>   - We could say that while the content model may be non-deterministic,
>     the relevant PSVI properties must nevertheless be deterministic.
>     For type assignment, Element Declarations Consistent already
>     achieves this, though it does not enforce consistency on other
>     properties of element declarations; it would need to be  strengthened
>     so that its content agreed with its name.
>
> To take an example: consider the content model ((a as T1)*, (a as T2)*)*,
> which effectively accepts any sequence of 'a' elements, typing each as
> either T1 or T2.  If presented with a sequence of 'a' elements, each
> valid against either T1 or T2, but not both, then the first rule
> would accept the content model and type each child element in the
> way that works.  If some children are valid against both T1 and T2,
> then there would be more than one annotation of the input that would
> work, just as there is more than one parse for the regular expression
> a*a*.  (In the case of a child valid against neither T1 nor T2, my
> instinct is to provide both invalid parses, but it's only an instinct.)
> The second rule would accept the content model but an implementation
> that chose randomly might not have a good chance of recognizing good
> input.  The third rule would reject the content model because the
> type assignment is non-deterministic, but would accept a similar
> model with two (a as T1) particles.
>
> I hope this goes a little way to answering your question.

Yes, thanks.
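
For what it's worth, the first rule applied to the example model can be sketched in a few lines of Python.  This assumes we already know, for each 'a' child, which of T1 and T2 it is valid against (the input format and the name type_assignments are hypothetical).  Because ((a as T1)*, (a as T2)*)* lets any child take either type, the possible annotations are just the Cartesian product of the per-child choices.

```python
from itertools import product


def type_assignments(valid_types_per_child):
    """Return every way of typing the children, one type per child.

    valid_types_per_child is a list with one collection per 'a' child,
    naming the types that child is valid against, e.g. {"T1", "T2"}.
    A child valid against neither type yields no assignment at all here;
    reporting "both invalid parses" would need extra handling.
    """
    # Sort each child's options so the output order is stable.
    options = [sorted(types) for types in valid_types_per_child]
    return [list(choice) for choice in product(*options)]


# Second child is valid against both types, so two annotations result.
children = [{"T1"}, {"T1", "T2"}, {"T2"}]
for assignment in type_assignments(children):
    print(assignment)
```

With the sample input, the one child that is valid against both types is what produces more than one annotation, which is the a*a* situation described above.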

Pete Cordell
Codalogic
For XML C++ data binding visit http://www.codalogic.com/lmx/
Received on Thursday, 3 July 2008 16:50:42 UTC