Re: UPA example from C. M. Sperberg-McQueen on 2008-07-01 (xmlschema-dev@w3.org from July 2008)

From: C. M. Sperberg-McQueen <cmsmcq@acm.org>
Date: Tue, 1 Jul 2008 14:08:58 -0600
To: Pete Cordell <petexmldev@codalogic.com>
Cc: "C. M. Sperberg-McQueen" <cmsmcq@acm.org>, "Michael Kay" <mike@saxonica.com>, "'Boris Kolpackov'" <boris@codesynthesis.com>, <xmlschema-dev@w3.org>
Message-Id: <B77EE31F-1845-418F-B716-F2D58EECCBE2@acm.org>

On 25 Jun 2008, at 04:49 , Pete Cordell wrote:
> ...
> So I take it that under the XSD 1.1 rules, the instance would be  
> valid and have particle assignment corresponding to:
>
> <apple/> validated by element
> <apple/> validated by any
> <apple/> validated by any
>
> Rather than:
>
> <apple/> validated by element
> <apple/> validated by any
> <apple/> validated by element

No, I don't think so.

The matching up of elements in the input with particles in the
content model does not rely on lookahead.  When the third
input element is encountered, the automaton has a choice between
matching the element and matching the wildcard.  The element
declaration has a higher priority, so it wins, and the second
sequence of attributions results.  If there are only three elements
in the input, the content model is not satisfied, and the parent
element will be invalid.
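Here is a minimal Python sketch of attribution without lookahead. This message does not quote the schema from the thread, so the content model below is hypothetical. It was chosen only because it reproduces the attributions discussed above. The `Edge` class, the `AUTOMATON` table and the `attribute` function are illustrative names, not part of any schema processor's API. The matcher consumes one child at a time and never looks ahead. When an element declaration and the wildcard both accept a child, it picks the declaration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# HYPOTHETICAL content model (an assumption; the thread's schema is not quoted):
#   <xs:sequence>
#     <xs:element name="apple"/>
#     <xs:any maxOccurs="unbounded" processContents="lax"/>  <!-- minOccurs=1 -->
#     <xs:sequence minOccurs="0">
#       <xs:element name="apple"/>
#       <xs:element name="pear"/>
#     </xs:sequence>
#   </xs:sequence>

@dataclass(frozen=True)
class Edge:
    kind: str             # "element" or "any", used in the attribution report
    name: Optional[str]   # element name for declarations; None for the wildcard
    target: int           # automaton state reached after consuming the child

    def accepts(self, tag: str) -> bool:
        return self.kind == "any" or self.name == tag

# Hand-built automaton for the model above (state 0 is the start state).
AUTOMATON = {
    0: [Edge("element", "apple", 1)],
    1: [Edge("any", None, 2)],
    2: [Edge("any", None, 2), Edge("element", "apple", 3)],
    3: [Edge("element", "pear", 4)],
    4: [],
}
FINAL = {2, 4}

def attribute(children: List[str]) -> Tuple[List[Tuple[str, str]], bool]:
    """Attribute each child to a particle, one at a time, with no lookahead.
    If a declaration and the wildcard both accept a child, the declaration wins."""
    state, trace = 0, []
    for tag in children:
        candidates = [e for e in AUTOMATON[state] if e.accepts(tag)]
        if not candidates:
            return trace, False  # no particle can take this child
        # False sorts before True, so element declarations are preferred.
        chosen = min(candidates, key=lambda e: e.kind == "any")
        trace.append((tag, chosen.kind))
        state = chosen.target
    return trace, state in FINAL
```

With three `<apple/>` children, `attribute(["apple", "apple", "apple"])` returns the attributions element, any, element together with `False`, because the model still requires a `<pear/>` at that point. Adding a fourth child, `pear`, lets the same attributions end in an accepting state.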

> Personally I think that, subject to occurrence constraints, the  
> particle that is currently gobbling up input, should have priority  
> (i.e. they're greedy).

In a sense, I think this matches the SGML rule that specifies explicitly
that when either an inner or an outer occurrence indicator could fire,
it's the inner one that fires.  Ironically, in SGML having the rule
can never make a difference in determining whether an element is valid
or not -- it only makes a difference if different semantic rules are
attached to different occurrence indicators (which is such a bad idea
that it hardly merits contemplation).  But in XSD 1.0, where it does
make a difference, the spec fails to make the interpretation of the
regular expression deterministic.  The Unique Particle Attribution
rule determinizes some situations (like the one here). But when a content
model is (in the terminology of Brüggemann-Klein and Wood) weakly
deterministic, but not strongly deterministic, then the UPA doesn't
help at all.
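The following sketch, built on stated assumptions, illustrates a model that UPA accepts but leaves ambiguous. The hypothetical XSD 1.0 model below corresponds to the expression `(a+)+`. As we understand Brüggemann-Klein and Wood's definitions, this is a weakly but not strongly deterministic expression. The model has only one element particle, so every `<a/>` is attributed to the same declaration and UPA is satisfied. Even so, the input does not fix how many times the outer sequence iterates. The sketch enumerates the possible groupings. `outer_iterations` and `sgml_style` are illustrative names. `sgml_style` models the SGML rule described above, under which the inner occurrence indicator fires whenever it can.

```python
from typing import Iterator, List

# HYPOTHETICAL XSD 1.0 content model, analogous to the expression (a+)+:
#   <xs:sequence maxOccurs="unbounded">
#     <xs:element name="a" maxOccurs="unbounded"/>
#   </xs:sequence>
# There is only one element particle, so UPA is trivially satisfied.

def outer_iterations(n: int) -> Iterator[List[int]]:
    """Yield every way n consecutive <a/> children can be grouped into
    iterations of the outer sequence (each iteration takes at least one <a/>)."""
    if n == 0:
        yield []  # base case: nothing left to group
        return
    for first in range(n, 0, -1):  # try the longest first group first
        for rest in outer_iterations(n - first):
            yield [first] + rest

def sgml_style(n: int) -> List[int]:
    # Inner indicator fires first: all n children fall into a single outer
    # iteration, which is the first grouping that outer_iterations yields.
    return next(outer_iterations(n))
```

For three children there are four groupings: `[3]`, `[2, 1]`, `[1, 2]` and `[1, 1, 1]`. In general there are 2^(n-1) groupings for n >= 1, and UPA does not choose among them. The SGML rule would pick `[3]`.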

> That seems a lot easier to implement, it's a lot easier for schema  
> authors/users to understand and has similarities to how regular  
> expressions behave.  I think it might give the wrong result in some  
> situations, but I don't think it will be wrong in any more  
> situations than the current 1.1 rules.  I also think that in  
> situations where behaviour similar to 1.1 rules is preferred the  
> xs:any notQName attribute can come to the rescue.
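For comparison, here is a sketch of the greedy policy from the quoted proposal, run against a hypothetical model `apple, any+, (apple, pear)?`. Under this policy, a child goes to the particle that matched the previous child whenever that particle can accept it. The proposal does not say what happens when that particle cannot accept the child. Falling back to element-over-wildcard priority is an assumption made here, and all names are illustrative.

```python
from typing import List, Optional, Tuple

# HYPOTHETICAL model: apple, any+ (lax wildcard), then optionally apple, pear.
# Each edge is (particle_id, element_name or None for the wildcard, target_state).
AUTOMATON = {
    0: [("apple#1", "apple", 1)],
    1: [("any", None, 2)],
    2: [("any", None, 2), ("apple#2", "apple", 3)],
    3: [("pear", "pear", 4)],
    4: [],
}
FINAL = {2, 4}

def accepts(name: Optional[str], tag: str) -> bool:
    return name is None or name == tag

def attribute_greedy(children: List[str]) -> Tuple[List[Tuple[str, str]], bool]:
    """Greedy policy: keep using the particle that matched the previous child
    if it can take this one; otherwise prefer element declarations over the
    wildcard (that fallback is an assumption, not part of the proposal)."""
    state, previous, trace = 0, None, []
    for tag in children:
        candidates = [e for e in AUTOMATON[state] if accepts(e[1], tag)]
        if not candidates:
            return trace, False
        same = [e for e in candidates if e[0] == previous]
        chosen = same[0] if same else min(candidates, key=lambda e: e[1] is None)
        previous, state = chosen[0], chosen[2]
        trace.append((tag, previous))
    return trace, state in FINAL
```

Here `attribute_greedy(["apple", "apple", "apple"])` yields apple#1, any, any and reports the content as valid, which is the outcome the proposal expects. The same policy also attributes a trailing `<pear/>` to the wildcard. In this model, then, the second `apple` declaration and the `pear` declaration can never be reached once the wildcard has started matching. Whether that is a right or a wrong result is the kind of question the reply goes on to ask.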

You may be right.  But to judge whether a given rule gives the right or
the wrong result in a given situation, how does one determine which
result is right and which is wrong?

Michael Sperberg-McQueen

Received on Tuesday, 1 July 2008 20:09:39 UTC