- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Wed, 5 Jan 2022 13:08:43 -0700
- To: Norm Tovey-Walsh <norm@saxonica.com>
- Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, ixml <public-ixml@w3.org>
> On 5,Jan2022, at 12:43 PM, C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com> wrote:
>
> Approach C: EBNF (derivation by expressions)
>
> We could say there are obviously two ways to derive the sentence in
> the EBNF:
>
>   1 S
>   2 'a'*
>   3 ''
>
> and another is
>
>   1 S
>   2 'b'*
>   3 ''

These should probably be different:

  1 S
  2 'a'* | 'b'*
  3 'a'*
  4 ''

and similarly for the other one.

The derivations above appear to rely on a rule that we can replace an expression E in a sentential form with another expression which recognizes some non-empty subset of L(E). But given that rule, since L('a'* | 'b'*) includes the empty string, we could simply write:

  1 S
  2 'a'* | 'b'*
  3 ''

Rewriting them in this way seems to show that derivations in this style really don’t resemble parse trees the way that derivations using a BNF grammar do.

One possible solution that seems relatively clean would be to say that for purposes of ixml, ambiguity is production of two or more different XML outputs (different after canonicalization, if we have to specify). Detecting that will not necessarily be easy or cheap, since multiple raw parse trees may turn into the same XML. And since it’s not easy or cheap, detecting ambiguity maybe needs to be downgraded to a SHOULD or MAY.

It also ignores the fact that an ambiguous grammar in which all the ambiguities involve whitespace I really don’t care about will still cause unnecessary work for a parser, so I probably do want to hear about those ambiguities.

So another possible solution would be to say that if the processor detects more than one way to parse the input using the grammar, the parser may report ambiguity. Since parsers are allowed to rewrite the grammar in any way that preserves the form of the XML output, the result is that in any given case, some conforming parsers may detect ambiguity where others do not.

I continue to dislike both of these solutions, and every other solution I have thought of.

Michael
Received on Wednesday, 5 January 2022 20:14:52 UTC