- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Mon, 27 Jun 2022 09:37:22 -0600
- To: John Lumley <john@saxonica.com>
- Cc: public-ixml@w3.org
C. M. Sperberg-McQueen writes: > ... > > An obvious approach that comes to mind would be to allow a grammar > writer to assign a priority to the different top-level alts of a rule, > with the meaning "If there is a choice between a parse using alt 1 and a > parse using alt 2, for the same string, choose alt 1." > > ... > > But note that this does not completely solve the problem: "BEGIN" and > "END" are still accepted as names; they are just marked specially. > > So a simple priority scheme is not going to do the trick. Rats. I had > hopes for that. From the point of view of notation, it occurs to me that the language-subtraction notation used by the XML spec (namely F - G for two expressions F and G) expresses very neatly what we want: { a name is a standard name which is not a keyword or an x-name, or else it is an x-name. } name = -standard-name - (keyword; x-name) ; x-name . { standard-name, x-name, and keyword are all straightforward. } standard-name = ALPHA, (ALPHA; DIGIT; "-")*. x-name = ["Xx"], "-", standard-name. keyword = "BEGIN"; "END". If E = F - G, then if I remember my lesssons correctly: - If F and G are regular, E is regular (since regular languages are closed under subtraction). - If F is context-free and G is regular, E is context-free. - If F and G are context-free, E is not guaranteed context-free. That last one is a killer: it would move ixml out of the context-free languages business and into a broader field where some very cold winds may blow. And I suspect that ANY construct with the meaning "anything that matches A but does not match B" is going to be problematic for the same reason. Either you place restrictions on what B can represent, or you are not guaranteed to be describing a context-free language. Operationally, of course, it's simple to describe how a processor could handle expressions like F - G: 1 Try to recognize an F starting at the current position. 2 If step 1 succeeds, try to recognize a G starting and ending at the same positions as the F just recognized. Otherwise (step 1 failed) nothing matches (F - G) at the current location. 3 If step 2 succeeds, then we have both an F and a G covering the same section of the input. That section of the input does not match (F - G). Otherwise (step 2 fails), we have an F but not a G for that section of the input. That section matches the expression (F - G). I wonder whether a PEG parser could just rewrite the expression as (G, fail; F) and be happy. Michael -- C. M. Sperberg-McQueen Black Mesa Technologies LLC http://blackmesatech.com
Received on Monday, 27 June 2022 15:37:39 UTC