Re: the UPA-constraint and Danish word division from Xan Gregg on 2006-09-18 (xmlschema-dev@w3.org from September 2006)

From: Xan Gregg <xan.gregg@jmp.com>
Date: Mon, 18 Sep 2006 16:56:14 -0400
To: C. M. Sperberg-McQueen <cmsmcq@acm.org>
Cc: Marie Bilde Rasmussen <mariebilderas@gmail.com>, xmlschema-dev@w3.org
Message-Id: <7B4ACF26-9492-4BD0-B722-F825E88DE201@jmp.com>

I don't quite have a UPA-friendly version of this grammar, but the  
optionality of hyphens and blanks between wordparts makes it a little  
different from the chess example. Imagine if black moves were  
optional in chess; then the model could be (W, B?)*, which obeys UPA.
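A rough way to see why (W, B?)* obeys UPA: for a model without counters or wildcards, UPA amounts to the model being deterministic, so that each incoming element can be attributed to exactly one particle without lookahead. The Python sketch below is a hypothetical illustration (the `Sym`/`Seq`/`Alt`/`Opt`/`Star` classes and `obeys_upa` are not any schema library's API). It builds the Glushkov first and follow sets and reports a violation when two particles with the same name compete in one of those sets. It ignores minOccurs/maxOccurs counters and wildcards, so it only approximates the real constraint. The contrasting model (W, B)*, W? is a made-up example, not one taken from this thread.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical mini-AST for content models; each Sym occurrence is one particle.
@dataclass(frozen=True)
class Sym:
    name: str

@dataclass(frozen=True)
class Seq:
    left: "Node"
    right: "Node"

@dataclass(frozen=True)
class Alt:
    left: "Node"
    right: "Node"

@dataclass(frozen=True)
class Opt:
    inner: "Node"

@dataclass(frozen=True)
class Star:
    inner: "Node"

Node = Union[Sym, Seq, Alt, Opt, Star]


def analyze(node: Node, names: list[str], follow: dict[int, set[int]]) -> tuple[bool, set[int], set[int]]:
    """Return (nullable, first, last) for node, numbering particles as positions."""
    if isinstance(node, Sym):
        pos = len(names)
        names.append(node.name)
        follow[pos] = set()
        return False, {pos}, {pos}
    if isinstance(node, Seq):
        n1, f1, l1 = analyze(node.left, names, follow)
        n2, f2, l2 = analyze(node.right, names, follow)
        for p in l1:  # anything that can end the left part may be followed by the right part's first
            follow[p] |= f2
        return n1 and n2, (f1 | f2) if n1 else f1, (l1 | l2) if n2 else l2
    if isinstance(node, Alt):
        n1, f1, l1 = analyze(node.left, names, follow)
        n2, f2, l2 = analyze(node.right, names, follow)
        return n1 or n2, f1 | f2, l1 | l2
    if isinstance(node, Opt):
        _, f, l = analyze(node.inner, names, follow)
        return True, f, l
    if isinstance(node, Star):
        _, f, l = analyze(node.inner, names, follow)
        for p in l:  # the end of one repetition may be followed by the start of the next
            follow[p] |= f
        return True, f, l
    raise TypeError(f"unknown node: {node!r}")


def has_clash(positions: set[int], names: list[str]) -> bool:
    """True if two different particles in `positions` carry the same element name."""
    seen: set[str] = set()
    for p in positions:
        if names[p] in seen:
            return True
        seen.add(names[p])
    return False


def obeys_upa(model: Node) -> bool:
    names: list[str] = []
    follow: dict[int, set[int]] = {}
    _, first, _ = analyze(model, names, follow)
    return not has_clash(first, names) and not any(has_clash(f, names) for f in follow.values())
```

With (W, B)*, W?, a W arriving after a B could belong either to the repeated group or to the trailing W?, which is the kind of competition between particles that UPA forbids; (W, B?)* never has two W particles to choose between.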

In this case, though, there is a twist: the final wordpart can be
followed by a hyphen but not by a blank, unlike the gaps between the
other wordparts. If we ignore this difference, we get a model that is
a little stricter than the rule in Michael's approach #2 but in the
same spirit (come as close as possible and rely on a subsequent
processor to catch the missed cases). Here's a model that allows a
trailing blank and/or hyphen:

     ((h, (w, (((h, b?) | (b, h?))?, w)+ ))
     |    (w,  ((h, b?) | (b, h?))){2, unbounded}
     )

where "{2, unbounded}" is the quantifier "minOccurs = 2, maxOccurs =  
unbounded".
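One way to experiment with the model above is to spell each child element as a single letter and translate the content model into a regular expression. The sketch below assumes the letters w (wordpart), h (hyphen) and b (blank); `XAN_MODEL` and `matches` are illustrative names only. A regex like this only shows which sequences are accepted; it says nothing about UPA, since Python's `re` backtracks freely.

```python
import re

# Assumed encoding: w = wordpart, h = hyphen, b = blank (one letter per child element).
SEP = r"(?:hb?|bh?)"  # ((h, b?) | (b, h?))

XAN_MODEL = re.compile(
    rf"h(?:w(?:{SEP}?w)+)"   # (h, (w, (sep?, w)+))
    rf"|(?:w{SEP}){{2,}}"    # (w, sep){2, unbounded}
)


def matches(tokens: str) -> bool:
    """True if the whole token string is accepted by the model."""
    return XAN_MODEL.fullmatch(tokens) is not None
```

Under this encoding the second branch needs a hyphen or blank after every wordpart, so `whwb` is accepted while `ww` is rejected.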

xan

On Sep 15, 2006, at 11:34 PM, C. M. Sperberg-McQueen wrote:

>
> I can see three approaches to your problem in practice:
>
>
> (2) define an XSD rule that comes as close as you can to
> restricting the data without violating UPA, and use
> Schematron to supply the additional check.  Your rule
> might be:
>
>     ((h, (w, (((h, b?) | (b, h?))?, w)+ ))
>     |    (w,  ((h, b?) | (b, h?))?, w, (h | b | w)*)
>     )
>
> and Schematron rules can check that
>
>   each b is followed on the right either by a wordpart
>     or by a hyphen and then a wordpart
>   each hyphen is followed on the right by
>     (a) a wordpart, or (b) a blank and then a
>     wordpart, or (c) nothing
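Approach (2) can be sketched the same way: a regular expression stands in for the UPA-safe XSD rule, and a post-check enforces the two Schematron conditions. This is plain Python substituting for Schematron, not Schematron itself; the single-letter encoding (w, h, b) and the names `XSD_MODEL`, `passes_extra_checks` and `is_valid` are assumptions for illustration.

```python
import re

# Assumed encoding: w = wordpart, h = hyphen, b = blank.
SEP = r"(?:hb?|bh?)"  # ((h, b?) | (b, h?))

# The approximate rule from approach (2): it obeys UPA but over-accepts.
XSD_MODEL = re.compile(
    rf"hw(?:{SEP}?w)+"     # (h, (w, (sep?, w)+))
    rf"|w{SEP}?w[hbw]*"    # (w, sep?, w, (h | b | w)*)
)


def passes_extra_checks(tokens: str) -> bool:
    """Stand-in for the Schematron rules on what may follow b and h."""
    for i, tok in enumerate(tokens):
        rest = tokens[i + 1:]
        # each b is followed by a wordpart, or by a hyphen and then a wordpart
        if tok == "b" and not rest.startswith(("w", "hw")):
            return False
        # each h is followed by a wordpart, a blank and then a wordpart, or nothing
        if tok == "h" and not (rest == "" or rest.startswith(("w", "bw"))):
            return False
    return True


def is_valid(tokens: str) -> bool:
    return XSD_MODEL.fullmatch(tokens) is not None and passes_extra_checks(tokens)
```

For example, `wwb` satisfies the XSD-style rule but fails the blank check because nothing follows the final b, while `wwh` passes both, since a trailing hyphen is allowed.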

Received on Monday, 18 September 2006 20:56:25 UTC