
Re: the UPA-constraint and danish word division

From: Marie Bilde Rasmussen <mariebilderas@gmail.com>
Date: Tue, 19 Sep 2006 23:43:17 +0200
Message-ID: <c36097090609191443p7728d36bpe2bb5baa55743d10@mail.gmail.com>
To: "Xan Gregg" <xan.gregg@jmp.com>
Cc: "C. M. Sperberg-McQueen" <cmsmcq@acm.org>, xmlschema-dev@w3.org
First of all, I was happy to read that my grammar actually could NOT be
implemented without violating the UPA-constraint :o)

As far as I can judge, the workarounds proposed by Michael and Xan are very
good solutions: data can be validated - maybe corrected - and sent further
on in the production pipeline.

But when the schema is designed to support the authoring process,
these workarounds are perhaps not very helpful. This leads to a more
fundamental discussion of the UPA-restriction. So I followed Michael's advice
and raised the issue with the W3C Schema WG. I cite a few paragraphs here:

"For document authoring purposes, it is of the greatest importance that
authors feel confident that the underlying schema actually tells them
exactly what they are allowed to do and what possibilities they have. Running a
post-editing process to find out that an insertion you made of some
element is actually invalid (and you made it because the schema-aware
software actually proposed this operation to you!) would possibly weaken
your confidence in the schema as a precise and
trustworthy implementation of the editorial principles that govern the type
of text you work with.

Furthermore, the renaming strategy might seem neat to the designer and the
data consumer (e.g. a processing engineer). But on the other hand, calling
the same thing by two different names will blur an otherwise precise
terminology of a grammar. In other words: why claim that a rose is not a
rose is not a rose?"

I would like to know if others are actually using W3C XML Schema for human
document authoring purposes. Who? How many are we? Should it be used for
this purpose?


2006/9/18, Xan Gregg <xan.gregg@jmp.com>:

> I don't quite have a UPA friendly version of this grammar, but the
> optionality of hyphens and blanks between wordparts makes it a little
> different from the chess example. Imagine if black moves were
> optional in chess; then the model could be (W, B?)*, which obeys UPA.
> In this case, though, there is a twist in that the final wordpart can
> be followed by a hyphen but not a blank, which is different from the
> other wordpart gaps. If we ignore this difference, we get a little
> stricter model in the spirit of Michael's approach #2 (come as close
> as possible and rely on a subsequent processor to catch the missed
> cases). Here's a model that allows a trailing blank and/or hyphen:
>     ((h, (w, (((h, b?) | (b, h?))?, w)+ ))
>     |    (w,  ((h, b?) | (b, h?))){2, unbounded}
>     )
> where "{2, unbounded}" is the quantifier "minOccurs = 2, maxOccurs =
> unbounded".
> xan
> On Sep 15, 2006, at 11:34 PM, C. M. Sperberg-McQueen wrote:
> >
> > I can see three approaches to your problem in practice:
> >
> >
> > (2) define an XSD rule that comes as close as you can to
> > restricting the data without violating UPA, and use
> > Schematron to supply the additional check.  Your rule
> > might be:
> >
> >     ((h, (w, (((h, b?) | (b, h?))?, w)+ ))
> >     |    (w,  ((h, b?) | (b, h?))?, w, (h | b | w)*)
> >     )
> >
> > and Schematron rules can check that
> >
> >   each b is followed on the right either by a wordpart
> >     or by a hyphen and then a wordpart
> >   each hyphen is followed on the right by
> >     (a) a wordpart, or (b) a blank and then a
> >     wordpart, or (c) nothing
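
The two-stage idea in approach (2) quoted above can be illustrated with a
small Python sketch. It is not XSD or Schematron, only an approximation under
stated assumptions: each child element is encoded as a single letter (h =
hyphen, b = blank, w = wordpart), the loose content model is transcribed into
a regular expression, and the follow-up rules are written as plain Python
checks. The names LOOSE_MODEL, matches_loose_model and passes_followup_checks
are hypothetical and do not come from the thread.

```python
import re

# Assumption: the loose content model from approach (2), transcribed into a
# regular expression over one-letter tokens (h = hyphen, b = blank, w = wordpart).
LOOSE_MODEL = re.compile(
    r"h(?:w(?:(?:hb?|bh?)?w)+)"   # branch 1: h, then w, then (separator?, w)+
    r"|w(?:hb?|bh?)?w[hbw]*"      # branch 2: w, separator?, w, then (h | b | w)*
)


def matches_loose_model(tokens: str) -> bool:
    """Stage 1: stands in for validation against the loose content model."""
    return LOOSE_MODEL.fullmatch(tokens) is not None


def passes_followup_checks(tokens: str) -> bool:
    """Stage 2: stands in for the follow-up rules on blanks and hyphens."""
    for i, tok in enumerate(tokens):
        rest = tokens[i + 1:]
        if tok == "b":
            # A blank must be followed by a wordpart,
            # or by a hyphen and then a wordpart.
            if not (rest.startswith("w") or rest.startswith("hw")):
                return False
        elif tok == "h":
            # A hyphen must be followed by a wordpart, by a blank and then
            # a wordpart, or by nothing at all.
            if not (rest == "" or rest.startswith("w") or rest.startswith("bw")):
                return False
    return True


def is_valid(tokens: str) -> bool:
    """A sequence is accepted only if both stages accept it."""
    return matches_loose_model(tokens) and passes_followup_checks(tokens)
```

For example, the sequence "wwhb" is accepted by the loose model but rejected
by the follow-up checks because of its trailing blank, which is exactly the
kind of case the second stage is meant to catch.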
Received on Tuesday, 19 September 2006 21:43:28 UTC
