- From: Marie Bilde Rasmussen <mariebilderas@gmail.com>
- Date: Tue, 19 Sep 2006 23:43:17 +0200
- To: "Xan Gregg" <xan.gregg@jmp.com>
- Cc: "C. M. Sperberg-McQueen" <cmsmcq@acm.org>, xmlschema-dev@w3.org
- Message-ID: <c36097090609191443p7728d36bpe2bb5baa55743d10@mail.gmail.com>
First of all, I was happy to read that my grammar actually could NOT be implemented without violating the UPA constraint :o)

As far as I can judge, the workarounds proposed by Michael and Xan are very good solutions: data can be validated, maybe corrected, and sent further on in the production pipeline. But when the schema is designed to support the authoring process, these workarounds may not be very helpful. This leads to a more fundamental discussion of the UPA restriction.

So I followed Michael's advice and raised the issue with the W3C schema WG. I cite a few paragraphs here:

"For document authoring purposes, it is of the greatest importance that authors feel confident that the underlying schema tells them exactly what they are allowed to do, or what possibilities they have. Running a post-editing process to find out that the insertion you made of some element is actually invalid (and you made it because the schema-aware software actually proposed this operation to you!) would possibly weaken your confidence in the schema as a precise and trustworthy implementation of the editorial principles that rule the type of text you work with.

Furthermore, the renaming strategy might seem neat to the designer and the data consumer (e.g. a processing engineer). But on the other hand, calling the same thing by two different names will blur an otherwise precise terminology of a grammar. In other words: why claim that a rose is not a rose is not a rose?"

I would like to know whether others are actually using W3C XML Schema for human document authoring purposes. Who? How many are we? Should it be used for this purpose?

:o)
Marie

2006/9/18, Xan Gregg <xan.gregg@jmp.com>:
> I don't quite have a UPA friendly version of this grammar, but the
> optionality of hyphens and blanks between wordparts makes it a little
> different from the chess example. Imagine if black moves were
> optional in chess; then the model could be (W, B?)*, which obeys UPA.
>
> In this case, though, there is a twist in that the final wordpart can
> be followed by a hyphen but not a blank, which is different from the
> other wordpart gaps. If we ignore this difference, we get a little
> stricter model in the spirit of Michael's approach #2 (come as close
> as possible and rely on a subsequent processor to catch the missed
> cases). Here's a model that allows a trailing blank and/or hyphen:
>
>   ((h, (w, (((h, b?) | (b, h?))?, w)+ ))
>   | (w, ((h, b?) | (b, h?))){2, unbounded}
>   )
>
> where "{2, unbounded}" is the quantifier "minOccurs = 2, maxOccurs =
> unbounded".
>
> xan
>
> On Sep 15, 2006, at 11:34 PM, C. M. Sperberg-McQueen wrote:
>
> > I can see three approaches to your problem in practice:
> >
> > (2) define an XSD rule that comes as close as you can to
> >     restricting the data without violating UPA, and use
> >     Schematron to supply the additional check. Your rule
> >     might be:
> >
> >       ((h, (w, (((h, b?) | (b, h?))?, w)+ ))
> >       | (w, ((h, b?) | (b, h?))?, w, (h | b | w)*)
> >       )
> >
> >     and Schematron rules can check that
> >
> >       each b is followed on the right either by a wordpart
> >       or by a hyphen and then a wordpart
> >
> >       each hyphen is followed on the right by
> >       (a) a wordpart, or (b) a blank and then a
> >       wordpart, or (c) nothing
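
The two-step check in approach (2) quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions: each child element is encoded as a single character ('h' for hyphen, 'b' for blank, 'w' for wordpart), a Python regular expression stands in for the XSD content model, and a plain function stands in for the Schematron rules. The names MODEL, matches_model, passes_rules and is_valid are hypothetical and do not come from any schema tool.

```python
import re

# Assumed encoding: 'h' = hyphen, 'b' = blank, 'w' = wordpart.
# SEP is the separator group ((h, b?) | (b, h?)) from the quoted rule.
SEP = r"(?:hb?|bh?)"

# Loose content model from approach (2):
#   (h, (w, (SEP?, w)+)) | (w, SEP?, w, (h | b | w)*)
MODEL = re.compile(rf"(?:hw(?:{SEP}?w)+|w{SEP}?w[hbw]*)")


def matches_model(children: str) -> bool:
    """Step 1: does the child sequence fit the loose content model?"""
    return MODEL.fullmatch(children) is not None


def passes_rules(children: str) -> bool:
    """Step 2: the extra checks that the quoted message leaves to Schematron."""
    for i, child in enumerate(children):
        rest = children[i + 1:]
        if child == "b":
            # a blank must be followed by a wordpart,
            # or by a hyphen and then a wordpart
            if not (rest.startswith("w") or rest.startswith("hw")):
                return False
        elif child == "h":
            # a hyphen must be followed by a wordpart, by a blank and
            # then a wordpart, or by nothing at all
            if not (rest == "" or rest.startswith("w") or rest.startswith("bw")):
                return False
    return True


def is_valid(children: str) -> bool:
    """Both steps together: loose model first, then the extra rules."""
    return matches_model(children) and passes_rules(children)
```

With this encoding, a sequence such as "wbwb" (ending in a blank) fits the loose model but fails the extra rules. That is exactly the kind of case an author would only learn about after the second validation pass.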
Received on Tuesday, 19 September 2006 21:43:28 UTC