the UPA-constraint and Danish word division from Marie Bilde Rasmussen on 2006-09-15 (xmlschema-dev@w3.org from September 2006)

From: Marie Bilde Rasmussen <mariebilderas@gmail.com>
Date: Fri, 15 Sep 2006 22:54:21 +0200
To: xmlschema-dev@w3.org
Message-ID: <c36097090609151354r5d9fef8dieefea05582529108@mail.gmail.com>

Hello everybody.

I can't represent the grammar I need in a W3C schema without violating
the UPA (Unique Particle Attribution) constraint.
My task is to represent the hyphenation (acceptable word division) of Danish
words.
Here is my grammar, expressed in EBNF:

  ( hyphen, ( wordpart, ( ( ( hyphen, blank? ) | ( blank, hyphen? ) )?, wordpart )+ ) )
| ( ( wordpart, ( ( ( hyphen, blank? ) | ( blank, hyphen? ) )?, wordpart )+ ), hyphen? )

In (my somewhat broken) English, this could be formulated as:
- each represented word consists of at least two word parts;
- between two word parts there may occur at most one hyphen and at most one
  blank; their order is not significant, and neither is obligatory;
- a word can have an initial OR a trailing hyphen (suffixes and prefixes,
  respectively), but not both; most words have neither.
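To make the grammar easy to test, here is a minimal Python sketch that encodes each child element as one letter (a hypothetical encoding: w = wordpart, h = hyphen, b = blank) and checks a sequence against the EBNF with a regular expression. Python's `re` engine backtracks, so it accepts this pattern without any UPA-style determinism requirement; it only confirms which sequences the grammar allows, not whether a schema can express them.

```python
import re

# Hypothetical one-letter encoding of the child elements:
#   w = wordpart, h = hyphen, b = blank
SEP = r"(?:hb?|bh?)?"                        # optional separator: hyphen[, blank] or blank[, hyphen]
BODY = rf"w(?:{SEP}w)+"                      # at least two word parts
WORD = re.compile(rf"(?:h{BODY}|{BODY}h?)")  # initial hyphen, or optional trailing hyphen

def is_valid(tokens: str) -> bool:
    """True if the token string is a sequence the EBNF allows."""
    return WORD.fullmatch(tokens) is not None

for seq in ["ww", "hwbw", "wbhwh", "hwwh", "whhw"]:
    print(seq, is_valid(seq))
# ww True
# hwbw True
# wbhwh True
# hwwh False   (both an initial and a trailing hyphen)
# whhw False   (two hyphens between word parts)
```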

The hyphens represented as elements are NOT a representation of word
division points; they are part of the orthography of the word.

I can see that my EBNF representation violates the UPA constraint: when a
hyphen immediately follows a wordpart in the input data, it is ambiguous which
branch of the grammar tree should be used, since the hyphen could either start
the separator before another wordpart or be the trailing hyphen.
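The ambiguity can be located mechanically. A common way to check UPA is as determinism (one-unambiguity) of the content model, essentially the same rule XML 1.0 applies to DTD content models: number every particle occurrence, compute which occurrences may follow which (the Glushkov position automaton), and report any point where two different occurrences with the same element name could come next. The sketch below is a minimal illustration under that assumption; the classes `Sym`, `Seq`, `Alt`, `Opt`, `Plus` and the functions `analyse` and `upa_conflicts` are hypothetical, and it ignores XSD specifics such as wildcards, substitution groups and general minOccurs/maxOccurs values, handling only sequence, choice, `?` and `+`.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical content-model AST: every Sym is one particle occurrence
# and gets a unique position number when it is created.
_ids = count()

@dataclass
class Sym:
    label: str
    pos: int = field(default_factory=lambda: next(_ids), init=False)

@dataclass
class Seq:
    parts: tuple

@dataclass
class Alt:
    parts: tuple

@dataclass
class Opt:   # minOccurs=0, maxOccurs=1
    body: object

@dataclass
class Plus:  # minOccurs=1, maxOccurs=unbounded
    body: object

def analyse(node, follow, labels):
    """Return (nullable, first, last) position sets; record follow edges."""
    if isinstance(node, Sym):
        follow.setdefault(node.pos, set())
        labels[node.pos] = node.label
        return False, {node.pos}, {node.pos}
    if isinstance(node, Opt):
        _, first, last = analyse(node.body, follow, labels)
        return True, first, last
    if isinstance(node, Plus):
        nullable, first, last = analyse(node.body, follow, labels)
        for p in last:               # repetition: the end of the body loops to its start
            follow[p] |= first
        return nullable, first, last
    if isinstance(node, Alt):
        nullable, first, last = False, set(), set()
        for part in node.parts:
            n, f, l = analyse(part, follow, labels)
            nullable, first, last = nullable or n, first | f, last | l
        return nullable, first, last
    if isinstance(node, Seq):
        nullable, first, last = True, set(), set()
        for part in node.parts:
            n, f, l = analyse(part, follow, labels)
            for p in last:           # whatever can end the prefix can precede this part
                follow[p] |= f
            if nullable:             # prefix may be empty, so this part can start the sequence
                first = first | f
            last = (last | l) if n else l
            nullable = nullable and n
        return nullable, first, last
    raise TypeError(node)

def upa_conflicts(model):
    """List (where, label, pos_a, pos_b): two same-named particles competing."""
    follow, labels = {}, {}
    _, first, _ = analyse(model, follow, labels)
    conflicts = []
    for where, candidates in [("start", first)] + sorted(follow.items()):
        seen = {}
        for q in sorted(candidates):
            if labels[q] in seen:
                conflicts.append((where, labels[q], seen[labels[q]], q))
            else:
                seen[labels[q]] = q
    return conflicts

def W(): return Sym("wordpart")
def H(): return Sym("hyphen")
def B(): return Sym("blank")

def word():
    # wordpart, ( ( (hyphen, blank?) | (blank, hyphen?) )?, wordpart )+
    head = W()
    sep = Opt(Alt((Seq((H(), Opt(B()))), Seq((B(), Opt(H()))))))
    return Seq((head, Plus(Seq((sep, W())))))

# ( hyphen, word ) | ( word, hyphen? )
grammar = Alt((Seq((H(), word())), Seq((word(), Opt(H())))))

for where, label, a, b in upa_conflicts(grammar):
    print(f"after position {where}: '{label}' could match position {a} or {b}")
# after position 12: 'hyphen' could match position 8 or 13
```

Run on the grammar above, it reports exactly one conflict: after the wordpart inside the repeated group of the second branch (position 12), a hyphen could be either the separator hyphen (position 8) or the trailing hyphen (position 13). That matches the diagnosis in the message; the first branch on its own passes the check.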

Can anybody help me reformulate this rule, or tell me why this isn't possible
without violating the UPA constraint? If so, I would be very grateful :o)

Marie Bilde Rasmussen
Gyldendal Publishers,
Copenhagen (Denmark)

Received on Saturday, 16 September 2006 00:36:11 UTC