- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Tue, 17 Oct 2023 13:14:46 -0600
- To: Steven Pemberton <steven.pemberton@cwi.nl>
- Cc: public-ixml@w3.org
Steven Pemberton <steven.pemberton@cwi.nl> writes: > With almost all grammars it is easy to identify the root. It's the > only unused rule. "Many" I might buy. But 'almost all' seems implausible to me. It's an empirical statement, but difficult to test. My subjective impression is that interesting grammars quite often have a recursive root symbol. > In a tiny number of extreme cases where the root is used in the body > of the grammar, you can add: > > -root: program. > > (where 'program' is the real root') and all is well. That's what I had in mind as a form of interfering in a big way with the design of the grammar. It also feels very odd to me in a language one of whose big selling points is that a processor can handle any context-free grammar, not just a subset of them. "Any context free grammar, as long as the start symbol is not recursive" just doesn't have the same ring for me. I suppose we might ease the usability issue by a more complicate rule for implementors, along the lines of: - If the processor supports an invocation-time option to give the start symbol, use that symbol. - Otherwise, if the start symbol is explicitly declared in the prolog, use that symbol. - Otherwise, if there is one and only one nonterminal not referred to from any other nonterminal, use that nonterminal as the start symbol. - Otherwise, use the first nonterminal in the grammar. If the user with the declare-before-use instinct had a non-recursive start symbol, and no other unreferenced symbols, then this would have worked for them. Michael > Steven > > On Tuesday 17 October 2023 15:39:22 (+02:00), C. M. Sperberg-McQueen wrote: > >> > Steven Pemberton <steven.pemberton@cwi.nl> writes: >> > > I noted a user of my implementation having lots of trouble this > week, >> > which they were unable to resolve. >> > >> > My original implementation identified the top-level rule by analysing >> > the grammar, but we later resolved that the top-level rule had to be >> > the first rule in the grammar. >> > Is there a reliable way to determine the start symbol by > analysing the >> productions? >> > I think there is not. So either there must be a convention like > the one >> we use (parallel, if memory serves, to conventions in some other parsing >> systems), or there must be additional syntax for identifying the start >> symbol. >> > > This user apparently comes from a define-before-use background, > and so >> > consistently had the root rule as the last in the file. As a result, >> > they didn't manage to get a single successful result. >> > >> > I'm not sure what to make of this. On the one hand, the spec > clearly says: >> > >> > "The root symbol of the grammar is the name of the first rule >> > in the grammar." >> > >> > >> > On the other hand, I feel bad for the user; I think notations should >> > try to serve users, and not the other way round: usability >> > first. >> > I feel bad for them, too. If I remember correctly, the Algol 60 > report >> also works bottom up, with a definition-before-use organizing principle. >> > > Which is why I did my original implementation that way. >> > I wonder whether you had any user who defined a grammar like >> > A = B; 'a'. >> B = A; 'b'. >> > If our plan is to identify the start symbol by selecting a > terminal not >> referred to, we are doomed to disappointment in this or in any grammar >> where the start symbol is recursive. If we try to avoid that >> disappointment by requiring that the start symbol not be recursive, we >> interfere with the design of the grammar in a really big way. >> > > Anyway, it's a potential discussion point. >> > Yes, agreed. >> -- C. M. Sperberg-McQueen Black Mesa Technologies LLC http://blackmesatech.com
Received on Tuesday, 17 October 2023 19:26:24 UTC