- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Tue, 19 Mar 2024 07:56:38 -0600
- To: "Liam R. E. Quin" <liam@fromoldbooks.org>
- Cc: LdBeth <andpuke@foxmail.com>, public-ixml@w3.org
Thank you. I think you lay out some of the reasons it can be challenging to create in ixml (or in any purely grammatical notation) a grammar that captures exactly the same set of sentences, and assigns the same (or: structurally similar) parse trees to them, as existing tools for existing languages of any complexity. I think there is a general question possibly worth enunciating and possibly of interest from a computer-science point of view, as well as a practical specific instance of that question: (1) Given an ambiguous context-free grammar G and a rule R which chooses, for any ambiguous sentence in L(G), one of the possible parse trees, under what circumstances is it possible to create an unambiguous context-free grammar G' such that (a) L(G') = L(G) and (b) the parse trees produced by G' are compatible with those produced by G? (There's a little hand-waving here, because I don't have a clear definition of 'compatible' -- operationally, what I mean is that without extraordinary efforts we can derive the same abstract syntax tree from them that we would derive from the tree selected by R. But I don't know of any general formal definition for the mapping from parse trees to ASTs.) It's clear that for some (G, R) pairs the task is impossible. Any inherently ambiguous language will defeat the attempt to create a suitable G'. There may be ways of formulating the rule R that will always defeat any attempt to replicate the results grammatically in G'. It's also clear that for some (G, R) pairs the task is straightforward. (2) A more concrete formulation of the question: given a yacc grammar G with some operator-precedence and -associativity declarations, can we construct an unambiguous grammar G' that accepts the same sentences and produces compatible parse trees? If the only operator declarations are for left-associative binary operators, then we can clearly construct G' mechanically from G and the precedence declarations. If we add unary operators which all bind more tightly than any of the binary operators, the task is still straightforward. When some unary operators bind less tightly than some binary operators, then can the task be performed or not? I don't know the answer; I only know that my attempts so far have failed, either by rejecting some inputs accepted by yacc and yacc-style parsers or by rendering some sentences ambiguous. .... In the case of the comment "--- hi mom", the difficulty is precisely that of finding a way to say, with purely grammatical means (ie in ixml without extensions or pragmas), that it must be parsed as a comment and not as "-" followed by a comment. (Similar ambiguities could be constructed with slash, for comments beginning // or bounded by /* ... */.) Michael "Liam R. E. Quin" <liam@fromoldbooks.org> writes: > On Mon, 2024-03-18 at 22:09 -0600, C. M. Sperberg-McQueen wrote: >> Thank you. I'm n > > Some systems (e.g. lex) resolve ambiguities like - vs. -- with a > longest matching token rule. But this relies on a separate tokenisation > pass. This is used e.g. for a +++ b and ---c in the C programming > language (both legal, and equivalent to a + (++b) and - (--c), where ++ > and -- are unary autoincrement/decrement operators). > > On the other hand sock ** shoe; relies on knowing wither "sock" has > been declared earlier in the input as a variable or as a type, so there > is also feedback from the parser to the lexer. > > Both CSS and XPath have the problem that identifiers can contain > hyphens, so that space is needed in a subtraction a - b to distinguish > it from an identifier called "a-b". > > Hmm, not sure i’m helping. -- C. M. Sperberg-McQueen Black Mesa Technologies LLC http://blackmesatech.com
Received on Tuesday, 19 March 2024 14:16:09 UTC