- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Mon, 27 Jun 2022 08:46:33 -0600
- To: John Lumley <john@saxonica.com>
- Cc: public-ixml@w3.org
John Lumley writes:

> In working through the ixml grammar for XPath I of course have
> potential ambiguity between 'element()' as a node kind test and
> 'element()' as a function call. In the XPath spec 'element' (and the
> like) is a reserved name for function calls. I /think/ we cannot
> express such a reservation in iXML, but am not entirely sure, and I'll
> have to live with the ambiguity (and its concomitant ambiguity
> complexity).

I believe that that is true.

The same issue arises in the Oberon grammar we just added to the
samples directory, for the keywords NIL, TRUE, and FALSE. (It seems
not to arise for other keywords since they cannot appear in places
where a variable might appear.)

And for that matter it also arises in the vCard grammar Dave Pawson
and I just wrote, for which see Dave's inquiry on the xsl-list [1] and
the ensuing thread; to cut to the chase and see the grammar, go to [2].

[1] https://www.biglist.com/lists/lists.mulberrytech.com/xsl-list/archives/202206/msg00098.html
[2] https://www.biglist.com/lists/lists.mulberrytech.com/xsl-list/archives/202206/msg00115.html

One point of potential interest in that grammar is that it writes
around the ambiguity by defining a 'name' as a sequence of letters,
digits, or hyphens which (a) begins with a letter, (b) does not begin
with 'X-' or 'x-', (c) does not begin with the string "BEGIN", and
(d) does not begin with the string "END".

    { In principle name could be very simple. But we want to
      distinguish normal names from x-names, and we want to ensure
      that BEGIN and END are not recognized as names but as keywords.
      So we have a more complicated definition. }

    @name = not-an-x-name
          | not-begin
          | not-end
          | normal-name
          | x-name .

    { not-an-x-name, though it begins with X }
    -not-an-x-name = ["Xx"], (~["-"], (ALPHA | DIGIT | "-")*)?.

    { not-begin, though it begins with B... }
    -not-begin = "BEGI", (~["nN"], (ALPHA | DIGIT | "-")*)?
               | "BEG", (~["iI"], (ALPHA | DIGIT | "-")*)?
               | "BE", (~["gG"], (ALPHA | DIGIT | "-")*)?
               | "B", (~["eE"], (ALPHA | DIGIT | "-")*)? .

    { not-end, though it begins with E or EN }
    -not-end = ["Ee"], ["Nn"], (~["Dd"], (ALPHA | DIGIT | "-")*)?
             | ["Ee"], (~["Nn"], (ALPHA | DIGIT | "-")*)? .

    { normal-name: does not look like x-name, begin, or end at any point }
    -normal-name = ~["XxBbEe"], (ALPHA | DIGIT | "-")*.

> Any enlightenment would be appreciated

First, contemplate the sound of five regular expressions clapping.

Since regular languages are closed under set difference, it must be
theoretically possible to define a nonterminal that recognizes
anything in a particular regular set with the exception of reserved
words. In the case of the vCard grammar, the task was simple enough to
do by hand, and the pattern is regular enough that I suppose it might
be automatable. Even if it's automatable, however, very few people are
going to be willing to contemplate either the task or the result.

So I think it would be nice to find a way to handle grammars with
reserved words or ambiguities of the element() / element() sort.

An obvious approach that comes to mind would be to allow a grammar
writer to assign a priority to the different top-level alts of a
rule, with the meaning "If there is a choice between a parse using
alt 1 and a parse using alt 2, for the same string, choose alt 1."
So instead of the definitions above, 'name' could be defined using
priorities to say which alternative is preferred:

    @name = {10} reserved-word
          | {5} extension
          | {1} ALPHA, (ALPHA|DIGIT|"-")+.
    extension = ["xX"], "-", (ALPHA|DIGIT|"-")+.
    reserved-word = "BEGIN"; "END".

But note that this does not completely solve the problem: "BEGIN" and
"END" are still accepted as names; they are just marked specially. So
a simple priority scheme is not going to do the trick.

Rats. I had hopes for that.

(What was that noise? It sounded like five regular expressions
falling on the floor in a heap.)

Michael

--
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com
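The closure argument in the message is constructive: given one automaton for a regular set of names and another for the reserved words, the standard product construction yields an automaton for their difference. Below is a minimal Python sketch of that construction. It assumes a simple identifier syntax (a letter followed by letters or digits) and uses Oberon's NIL, TRUE, and FALSE as the reserved words. All function and variable names are hypothetical, and nothing here is part of ixml or of any ixml processor.

```python
import string

# Assumptions for illustration only: identifiers are an ASCII letter followed
# by letters or digits, and the reserved words are Oberon's NIL, TRUE, FALSE.
ALPHABET = string.ascii_letters + string.digits
KEYWORDS = {"NIL", "TRUE", "FALSE"}


def ident_step(q, ch):
    """DFA A: a letter followed by any number of letters or digits."""
    if q == "start":
        return "ident" if ch.isalpha() else None
    if q == "ident":
        return "ident" if ch.isalnum() else None
    return None


def make_keyword_step(words):
    """DFA B: a trie whose states are the prefixes of the reserved words.

    None means "no longer a prefix of any reserved word" (B's dead state).
    """
    prefixes = {w[:i] for w in words for i in range(len(w) + 1)}

    def step(q, ch):
        if q is None:
            return None
        nxt = q + ch
        return nxt if nxt in prefixes else None

    return step


def difference_dfa(step_a, start_a, accept_a, step_b, start_b, accept_b, alphabet):
    """Product construction for L(A) - L(B): run A and B in lockstep and
    accept exactly when A accepts and B does not."""
    start = (start_a, start_b)
    delta, accepting = {}, set()
    todo, seen = [start], {start}
    while todo:
        state = todo.pop()
        qa, qb = state
        if qa in accept_a and qb not in accept_b:
            accepting.add(state)
        for ch in alphabet:
            na = step_a(qa, ch)
            if na is None:
                continue  # A is dead, so no extension can be in L(A) - L(B)
            nxt = (na, step_b(qb, ch))
            delta[state, ch] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return start, delta, accepting


def accepts(dfa, text):
    """Run the product DFA; a missing transition means rejection."""
    start, delta, accepting = dfa
    q = start
    for ch in text:
        q = delta.get((q, ch))
        if q is None:
            return False
    return q in accepting


NAME_DFA = difference_dfa(
    ident_step, "start", {"ident"},
    make_keyword_step(KEYWORDS), "", KEYWORDS,
    ALPHABET,
)

for word in ["NIL", "NILS", "Nil", "TRU", "FALSE", "x1", "1x"]:
    print(word, accepts(NAME_DFA, word))
```

Converting the resulting automaton back into an ixml rule (for example by state elimination) is the mechanical but unreadable step the message warns about. For the vCard-style restriction, which excludes every name that merely begins with BEGIN or END, the keyword automaton would additionally need an absorbing state that stays accepting once a full keyword has been read.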
Received on Monday, 27 June 2022 14:46:58 UTC