- From: Bethan Tovey-Walsh <accounts@bethan.wales>
- Date: Mon, 14 Feb 2022 00:24:34 +0000
- To: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
- Cc: Norm Tovey-Walsh <norm@saxonica.com>, public-ixml@w3.org
I would tentatively suggest we consider saying that implementations must report that there is more than one possible valid vxml output. That wouldn’t require committing to a single definition of ambiguity, and would (I suspect) be the most useful information for users. I don’t imagine a user being particularly interested in the intermediate (and potentially ambiguous) states their grammar goes through in order for my implementation to parse an input. I do imagine their being interested in knowing that the vxml output they’ve received is one amongst a larger number of possible outputs. But there’s a good chance I’m overlooking something obvious which will make this suggestion unworkable.

> On 13 Feb 2022, at 23:03, C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com> wrote:
>
> Norm Tovey-Walsh writes:
>
>> Consider this grammar:
>
>>   list: word + -',' .
>>   word: c, v, c ; c, v, v .
>>   -c: ["bcdfghjklmnpqrstvwxyz"] .
>>   -v: ["aeiouy" ].
>
>> and this input: “hey,bee”.
>
>> By one reckoning, that’s ambiguous: “hey” can be either cvc or cvv
>> because I’ve identified “y” as both a consonant and a vowel.
>
>> But “c” and “v” are both elided from the output, so the generated XML
>> is identical for both parses. By that reckoning, it isn’t ambiguous.
>
> Nice example.
>
>> I expect our intent is that it *isn’t* ambiguous…but I thought I’d
>> check.
>
> For what it's worth ...
>
> If we as a group have an intent here, I don't know what it is. Issue 26
> asks how we wish to define "ambiguity", but the upshot of the discussion
> starting from your message of 5 January [1] seems to me to have been
> only that some members of the CG do not wish to define it; without a
> definition of what counts as ambiguity, I don't think we can have a
> coherent intent.
>
> If I had to predict what the CG will end up doing, my money would be on
> leaving the effective definition of ambiguity implementation-defined and
> specifying that processors MAY report ambiguity, rather than MUST, or
> possibly specifying that IF processors detect ambiguity they MUST report
> it (modulo user option to suppress the ambiguity flag in the output),
> but not requiring that they detect ambiguity whenever it exists. (I
> predict there will be CG members who would like to require the detection
> of ambiguity, but without a crisp definition of ambiguity that
> requirement lacks teeth.)
>
> One difficulty is that we have already seen that different
> implementations of ixml use different underlying parsing methods, even
> when the implementors all say they are using Earley parsing. But:
>
> - Implementations that parse using the ixml grammar directly and those
>   which translate the ixml grammar to BNF for parsing are working with
>   different grammars; their raw parse trees will differ and ambiguity in
>   one doesn't always mean ambiguity in another.
>
> - Implementations which translate the ixml grammar to BNF will not
>   necessarily use the same translations -- the obvious requirement is
>   that the grammar be equivalent and allow the construction of the XML
>   abstract syntax tree, and there is more than one BNF that meets those
>   criteria.
>
> - I think that some ways of recording parsing results may make it easy
>   to see whether there is more than one XML AST for a given sentence,
>   but I'm not sure that's true for every possible approach.
>
> We don't want to constrain the internals of any implementation, and we
> want interoperability, and we want ambiguity to be flagged. I don't
> think all of those three can be combined in their pure form; we are
> going to have to weaken one or more of them.
>
> My two cents.
>
> Michael
>
> --
> C. M. Sperberg-McQueen
> Black Mesa Technologies LLC
> http://blackmesatech.com
>
> [1] https://lists.w3.org/Archives/Public/public-ixml/2022Jan/0030.html
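To make the two reckonings in the quoted example concrete, here is a minimal Python sketch. It is not an ixml processor and does not reflect how any implementation works: it hard-codes the two alternatives of `word` from the grammar quoted above, assumes words are separated by single commas, and uses hypothetical helper names (`word_parses`, `raw_parses`, `serialize`). It counts the raw parses of “hey,bee” and then counts the distinct serialized outputs once the elided `c` and `v` nonterminals are dropped.

```python
from itertools import product

# Character classes from the quoted grammar; "y" is deliberately in both.
CONSONANTS = set("bcdfghjklmnpqrstvwxyz")
VOWELS = set("aeiouy")

# word: c, v, c ; c, v, v .   (hard-coded for this sketch)
WORD_ALTERNATIVES = ["cvc", "cvv"]


def matches(ch, kind):
    """True if character `ch` belongs to class `kind` ('c' or 'v')."""
    return ch in (CONSONANTS if kind == "c" else VOWELS)


def word_parses(word):
    """Return every alternative of `word` that derives this string."""
    return [
        alt
        for alt in WORD_ALTERNATIVES
        if len(alt) == len(word)
        and all(matches(ch, kind) for ch, kind in zip(word, alt))
    ]


def raw_parses(text):
    """All raw parses of a comma-separated list: one alternative per word."""
    words = text.split(",")
    per_word = [word_parses(w) for w in words]
    return [list(zip(words, combo)) for combo in product(*per_word)]


def serialize(parse):
    """Serialize a parse; c and v are marked '-', so only characters remain."""
    return "<list>" + "".join(f"<word>{w}</word>" for w, _alt in parse) + "</list>"


parses = raw_parses("hey,bee")
outputs = {serialize(p) for p in parses}

print("raw parses:      ", len(parses))
print("distinct outputs:", len(outputs))
```

Under these assumptions the input has two raw parses (“hey” as cvc or as cvv) but only one distinct serialization, so a rule phrased in terms of “more than one possible output” would not flag it, while a rule phrased in terms of raw parse trees would.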
Received on Monday, 14 February 2022 00:24:52 UTC