Re: the UPA-constraint and danish word division from Marie Bilde Rasmussen on 2006-09-19 (xmlschema-dev@w3.org from September 2006)

From: Marie Bilde Rasmussen <mariebilderas@gmail.com>
Date: Tue, 19 Sep 2006 23:43:17 +0200
To: "Xan Gregg" <xan.gregg@jmp.com>
Cc: "C. M. Sperberg-McQueen" <cmsmcq@acm.org>, xmlschema-dev@w3.org
Message-ID: <c36097090609191443p7728d36bpe2bb5baa55743d10@mail.gmail.com>

First of all, I was happy to read that my grammar actually could NOT be
implemented without violating the UPA-constraint :o)

As far as I can judge, the workarounds proposed by Michael and Xan are very
good solutions: data can be validated - maybe corrected - and sent further
on in the production pipeline.

But when the schema is designed to support the authoring process,
these workarounds may not be very helpful. This leads to a more
fundamental discussion of the UPA-restriction. So I followed Michael's advice
and raised the issue with the W3C Schema WG. I cite a few paragraphs here:

"For document authoring purposes, it is of the greatest importance that
authors feel confident that the underlying schema tells them
exactly what they are allowed to do, or what possibilities they have. Running a
post-editing process only to find out that an insertion you made of some
element is actually invalid (and you made it because the schema-aware
software actually proposed this operation to you!) would possibly weaken
your confidence in the schema as a precise and
trustworthy implementation of the editorial principles that govern the type
of text you work with.

Furthermore, the renaming strategy might seem neat to the designer and the
data consumer (e.g. a processing engineer). But on the other hand, calling
the same thing by two different names will blur an otherwise precise
terminology of a grammar. In other words: why claim that a rose is not a
rose is not a rose?"

I would like to know whether others are actually using W3C XML Schema for human
document authoring purposes. Who? How many are we? Should it be used for
this purpose?

:o)Marie

2006/9/18, Xan Gregg <xan.gregg@jmp.com>:

> I don't quite have a UPA friendly version of this grammar, but the
> optionality of hyphens and blanks between wordparts makes it a little
> different from the chess example. Imagine if black moves were
> optional in chess; then the model could be (W, B?)*, which obeys UPA.
>
> In this case, though, there is a twist in that the final wordpart can
> be followed by a hyphen but not a blank, which is different from the
> other wordpart gaps. If we ignore this difference, we get a slightly
> stricter model in the spirit of Michael's approach #2 (come as close
> as possible and rely on a subsequent processor to catch the missed
> cases). Here's a model that allows a trailing blank and/or hyphen:
>
>     ((h, (w, (((h, b?) | (b, h?))?, w)+ ))
>     |    (w,  ((h, b?) | (b, h?))){2, unbounded}
>     )
>
> where "{2, unbounded}" is the quantifier "minOccurs = 2, maxOccurs =
> unbounded".
>
> xan
>
> On Sep 15, 2006, at 11:34 PM, C. M. Sperberg-McQueen wrote:
>
> >
> > I can see three approaches to your problem in practice:
> >
> >
> > (2) define an XSD rule that comes as close as you can to
> > restricting the data without violating UPA, and use
> > Schematron to supply the additional check.  Your rule
> > might be:
> >
> >     ((h, (w, (((h, b?) | (b, h?))?, w)+ ))
> >     |    (w,  ((h, b?) | (b, h?))?, w, (h | b | w)*)
> >     )
> >
> > and Schematron rules can check that
> >
> >   each b is followed on the right either by a wordpart
> >     or by a hyphen and then a wordpart
> >   each hyphen is followed on the right by
> >     (a) a wordpart, or (b) a blank and then a
> >     wordpart, or (c) nothing
>
>
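
To make Michael's approach (2) concrete, here is a rough sketch of the two-pass idea in Python. It is only an illustration: a regular expression stands in for the XSD content model and a small function stands in for the Schematron rules, which is not how either technology works in practice. The one-letter encoding (w = wordpart, h = hyphen, b = blank) and the names MODEL, matches_model, follows_rules and is_valid are assumptions made up for this example.

```python
import re

# Assumed encoding, one letter per child element (not part of XSD itself):
#   w = wordpart, h = hyphen, b = blank
GAP = r"(?:hb?|bh?)"  # (h, b?) | (b, h?)

# Michael's approximate content model, transcribed as a regular expression:
#   (h, (w, (GAP?, w)+))  |  (w, GAP?, w, (h | b | w)*)
MODEL = re.compile(rf"hw(?:{GAP}?w)+|w{GAP}?w[hbw]*")


def matches_model(children: str) -> bool:
    """First pass: does the child sequence fit the loose content model?"""
    return MODEL.fullmatch(children) is not None


def follows_rules(children: str) -> bool:
    """Second pass: the two extra checks described for Schematron."""
    for i, token in enumerate(children):
        rest = children[i + 1 : i + 3]  # at most the next two tokens
        if token == "b":
            # a blank needs a wordpart, or a hyphen and then a wordpart
            if not (rest.startswith("w") or rest == "hw"):
                return False
        elif token == "h":
            # a hyphen needs a wordpart, a blank and then a wordpart, or nothing
            if not (rest == "" or rest.startswith("w") or rest == "bw"):
                return False
    return True


def is_valid(children: str) -> bool:
    """Valid only if both passes accept the sequence."""
    return matches_model(children) and follows_rules(children)
```

For instance, the sequence "wwb" (two wordparts and a trailing blank) is accepted by the loose model but rejected by the second pass, which is exactly the kind of case the extra rules are meant to catch.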

Received on Tuesday, 19 September 2006 21:43:28 UTC