- From: C. M. Sperberg-McQueen <cmsmcq@acm.org>
- Date: Fri, 15 Sep 2006 21:34:11 -0600
- To: Marie Bilde Rasmussen <mariebilderas@gmail.com>
- Cc: "C. M. Sperberg-McQueen" <cmsmcq@acm.org>, xmlschema-dev@w3.org
On 15 Sep 2006, at 14:54, Marie Bilde Rasmussen wrote:

> Hello everybody.
> I can't represent the grammar that I need in a W3C schema
> without violating the UPA-constraint. ...
> This is my grammar expressed as an EBNF:

[names reduced to initials, for brevity -MSM]

>   ( h, (w, (((h, b?) | (b, h?))? w)+ ))
> | ((w, (((h, b?) | (b, h?))? w)+ ), h?)
> ...
> I can see, that my EBNF-representation violates the
> UPA-constraint in the sense that it is not unambiguous which
> branch in the grammar tree is to be used, when a hyphen is
> encountered immediately following a wordpart in the input data.

From a first visual examination I think the problem is solely with
the second branch of the outer 'or'. The first branch looks fine.
(Software I've consulted confirms this.) But within the second
branch, you are exactly right: once the sub-expression

    (((h, b?) | (b, h?))? w)+

has been satisfied once, the next hyphen could match either the one
at the beginning of the expression, or the one after the expression.

> Can anybody help me reformulating this rule or tell me why this
> isn't possible without violating the UPA-constraint. If so, I
> would be very grateful :o)

I thought for a moment that I could solve this by putting hyphen
first in the repetition, and writing something like

    ((h, (b? w)?) | ((b, h?)?, w))+

but that, of course, also violates UPA: the w following the hyphen
needs to be optional, in case the hyphen is the final hyphen of the
word, and that means a w following an h can match either of the two
w tokens in the content model.

Working with the grammar a bit has made me believe that your content
model is, for purposes of this discussion, analogous to the
chess-game problem: using b and w for black moves and white moves,
write a content model for a chess game. One obvious solution is
((w, b)*, w?), but it violates UPA, and if I am correctly informed
so does every regular expression for this language.
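
The UPA diagnoses above can be checked mechanically. What follows is a minimal sketch in Python, not the software mentioned above. It assumes the usual position-automaton (Glushkov) reading of determinism: a content model is flagged when, at the start or after some token, two different positions carrying the same element name compete for the next token. The class name `Glushkov`, the function `upa_conflicts`, and the nested-tuple encoding of content models are all illustrative inventions. Real XSD checking also has to handle wildcards and numeric occurrence ranges, which this ignores.

```python
from itertools import count


class Glushkov:
    """Position automaton for a content model written as nested tuples.

    A model is an element name (str) or a tuple (op, child, ...), where op
    is "seq", "alt", "opt" (?), "star" (*) or "plus" (+).
    """

    def __init__(self, model):
        self.symbol = {}   # position -> element name
        self.follow = {}   # position -> positions that may match the next token
        self._ids = count()
        self.nullable, self.first, self.last = self._build(model)

    def _build(self, node):
        if isinstance(node, str):                 # a single element name
            p = next(self._ids)
            self.symbol[p] = node
            self.follow[p] = set()
            return False, {p}, {p}
        op, *args = node
        if op == "seq":
            nullable, first, last = True, set(), set()
            for child in args:
                c_null, c_first, c_last = self._build(child)
                for p in last:                    # link what precedes to child
                    self.follow[p] |= c_first
                if nullable:                      # everything so far can be empty
                    first |= c_first
                last = c_last | (last if c_null else set())
                nullable = nullable and c_null
            return nullable, first, last
        if op == "alt":
            parts = [self._build(child) for child in args]
            return (any(n for n, _, _ in parts),
                    set().union(*(f for _, f, _ in parts)),
                    set().union(*(l for _, _, l in parts)))
        if op not in ("opt", "star", "plus"):
            raise ValueError(f"unknown operator: {op!r}")
        c_null, c_first, c_last = self._build(args[0])
        if op in ("star", "plus"):                # repetition loops back
            for p in c_last:
                self.follow[p] |= c_first
        return (c_null or op != "plus"), c_first, c_last


def upa_conflicts(model):
    """Return sorted (after, name) pairs where two positions compete.

    `after` is the element name just matched, or the label "start".
    """
    g = Glushkov(model)
    choices = [("start", g.first)]
    choices += [(g.symbol[p], g.follow[p]) for p in g.follow]
    conflicts = set()
    for after, positions in choices:
        names = [g.symbol[q] for q in positions]
        conflicts |= {(after, n) for n in names if names.count(n) > 1}
    return sorted(conflicts)


# ((h, b?) | (b, h?))?  and  ( ... w)+  from the grammar quoted above
sep = ("opt", ("alt", ("seq", "h", ("opt", "b")), ("seq", "b", ("opt", "h"))))
tail = ("plus", ("seq", sep, "w"))

first_branch = ("seq", "h", "w", tail)            # (h, (w, (...)+))
second_branch = ("seq", "w", tail, ("opt", "h"))  # ((w, (...)+), h?)
chess = ("seq", ("star", ("seq", "w", "b")), ("opt", "w"))  # ((w, b)*, w?)

print(upa_conflicts(first_branch))    # []
print(upa_conflicts(second_branch))   # [('w', 'h')]
print(upa_conflicts(chess))           # [('b', 'w'), ('start', 'w')]
```

Under these assumptions the first branch is clean, the second branch has exactly the h-after-w conflict described above, and the obvious chess-game model is ambiguous both at the start and after each black move. The check only says whether a particular expression is deterministic; it says nothing about whether some other expression for the same language might be.
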
(At least, the chess game problem is often cited as a well-known case
of a regular language without a deterministic regular expression.)
In your problem, the interaction between hyphens and blanks
complicates things a bit, and the fact that hyphens are optional
between word parts also complicates things, but when hyphens are
used, the pattern is an alternation of wordpart and separator which
can end after either part of the alternation.

I can see three approaches to your problem in practice:

(1) Decide that a word-final hyphen is a special kind of hyphen, and
give it a different element name. Then your grammar rule becomes

      ((h, (w, (((h, b?) | (b, h?))?, w)+ ))
    | ((w, (((h, b?) | (b, h?))?, w)+ ), hf?))

and there is no UPA violation.

(2) Define an XSD rule that comes as close as you can to restricting
the data without violating UPA, and use Schematron to supply the
additional check. Your rule might be:

      ((h, (w, (((h, b?) | (b, h?))?, w)+ ))
    | (w, ((h, b?) | (b, h?))?, w, (h | b | w)*) )

and Schematron rules can check that

  - each b is followed on the right either by a wordpart or by a
    hyphen and then a wordpart
  - each hyphen is followed on the right by
    (a) a wordpart, or
    (b) a blank and then a wordpart, or
    (c) nothing

(3) (It kills me to say this) Use Relax NG, which does not have the
UPA rule. Or speak to your schema vendor about providing a mode of
operation which does not check the UPA rule -- I am reliably informed
that at least one widely deployed schema validator has such a mode,
which is turned on using a switch the vendor tells you about only
when you ask.

And in any case, raise an issue with the XML Schema Working Group
making sure they know that the UPA rule is causing problems for you.
(Some of my colleagues on the WG are tired of hearing me tell them
this, and will be glad to hear it in a different voice.
I suspect also that some of them don't really believe me, but they
may be more apt to believe a user who is actually paying someone for
schema-aware software. Paying customers are always worth listening
to.)

Ideally, issues are best raised by entering a bug report into the
Bugzilla bug-tracking system -- instructions are at

    http://www.w3.org/XML/2006/01/public-bugzilla

(let me know if anything in them is unclear -- they haven't been well
debugged). Or if that is too cumbersome, send email to
www-xml-schema-comments@w3.org

Thank you!

--C. M. Sperberg-McQueen
  Staff contact, W3C XML Schema Working Group
Received on Saturday, 16 September 2006 03:38:24 UTC