- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Sun, 13 Feb 2022 16:03:04 -0700
- To: Norm Tovey-Walsh <norm@saxonica.com>
- Cc: public-ixml@w3.org
Norm Tovey-Walsh writes:

> Consider this grammar:
>
> list: word + -',' .
> word: c, v, c ; c, v, v .
> -c: ["bcdfghjklmnpqrstvwxyz"] .
> -v: ["aeiouy" ].
>
> and this input: “hey,bee”.
>
> By one reckoning, that’s ambiguous: “hey” can be either cvc or cvv
> because I’ve identified “y” as both a consonant and a vowel.
>
> But “c” and “v” are both elided from the output, so the generated XML
> is identical for both parses. By that reckoning, it isn’t ambiguous.

Nice example.

> I expect our intent is that it *isn’t* ambiguous…but I thought I’d
> check.

For what it's worth ... If we as a group have an intent here, I don't know what it is.

Issue 26 asks how we wish to define "ambiguity", but the upshot of the discussion starting from your message of 5 January [1] seems to me to have been only that some members of the CG do not wish to define it; without a definition of what counts as ambiguity, I don't think we can have a coherent intent.

If I had to predict what the CG will end up doing, my money would be on leaving the effective definition of ambiguity implementation-defined and specifying that processors MAY report ambiguity, rather than MUST, or possibly specifying that IF processors detect ambiguity they MUST report it (modulo user option to suppress the ambiguity flag in the output), but not requiring that they detect ambiguity whenever it exists.

(I predict there will be CG members who would like to require the detection of ambiguity, but without a crisp definition of ambiguity that requirement lacks teeth.)

One difficulty is that we have already seen that different implementations of ixml use different underlying parsing methods, even when the implementors all say they are using Earley parsing. But:

- Implementations that parse using the ixml grammar directly and those which translate the ixml grammar to BNF for parsing are working with different grammars; their raw parse trees will differ and ambiguity in one doesn't always mean ambiguity in another.
- Implementations which translate the ixml grammar to BNF will not necessarily use the same translations -- the obvious requirement is that the grammar be equivalent and allow the construction of the XML abstract syntax tree, and there is more than one BNF that meets those criteria.

- I think that some ways of recording parsing results may make it easy to see whether there is more than one XML AST for a given sentence, but I'm not sure that's true for every possible approach.

We don't want to constrain the internals of any implementation, and we want interoperability, and we want ambiguity to be flagged. I don't think all three can be combined in their pure form; we are going to have to weaken one or more of them.

My two cents.

Michael

--
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com

[1] https://lists.w3.org/Archives/Public/public-ixml/2022Jan/0030.html
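Norm's example can be checked mechanically. The following is a minimal sketch, assuming a hand-written enumerator for this one toy grammar only: it is not an Earley parser and not how any ixml processor works, and the names `parse_word`, `parse_list` and `serialize` are hypothetical. It keeps the hidden nonterminals `c` and `v` as ordinary nodes in the raw parse tree and drops their tags (keeping their text) only when serializing, which is what the `-` mark does in ixml. For "hey,bee" it finds two distinct raw trees but a single serialized XML result, i.e. the two reckonings in the quoted message.

```python
from itertools import product

# Character classes from the example grammar; note that "y" is in both.
CLASSES = {
    "c": set("bcdfghjklmnpqrstvwxyz"),
    "v": set("aeiouy"),
}
# The two alternatives of the rule  word: c, v, c ; c, v, v .
WORD_ALTERNATIVES = [("c", "v", "c"), ("c", "v", "v")]
# Nonterminals marked with "-" in the grammar: not serialized as elements.
HIDDEN = {"c", "v"}


def parse_word(text):
    """Return every raw parse tree of `text` as a `word`."""
    trees = []
    for alt in WORD_ALTERNATIVES:
        if len(alt) == len(text) and all(
            ch in CLASSES[sym] for sym, ch in zip(alt, text)
        ):
            trees.append(("word", [(sym, ch) for sym, ch in zip(alt, text)]))
    return trees


def parse_list(text):
    """Return every raw parse tree of `text` as a `list`.

    Splitting on "," is a shortcut that works only because neither `c`
    nor `v` matches a comma; the hidden separator leaves no node.
    """
    per_word = [parse_word(w) for w in text.split(",")]
    # One list tree for each combination of alternative word parses.
    return [("list", list(combo)) for combo in product(*per_word)]


def serialize(node):
    """Serialize a raw tree to XML, omitting tags of hidden nonterminals."""
    name, content = node
    if isinstance(content, str):
        inner = content
    else:
        inner = "".join(serialize(child) for child in content)
    return inner if name in HIDDEN else f"<{name}>{inner}</{name}>"


trees = parse_list("hey,bee")
print(len(trees))                     # 2 raw parse trees
print({serialize(t) for t in trees})  # only one distinct XML result
```

Whether this input counts as ambiguous thus depends on which comparison a processor makes: counting raw trees, whose shape also depends on how the grammar was translated internally, or counting distinct serializations.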
Received on Sunday, 13 February 2022 23:03:23 UTC