Re: Identification of root rule

Steven Pemberton <steven.pemberton@cwi.nl> writes:

> With almost all grammars it is easy to identify the root. It's the
> only unused rule.

"Many" I might buy.  But 'almost all' seems implausible to me.  It's an
empirical statement, but difficult to test.  My subjective impression is
that interesting grammars quite often have a recursive root symbol.

> In a tiny number of extreme cases where the root is used in the body
> of the grammar, you can add:
>
> 	-root: program.
>
> (where 'program' is the real root') and all is well.

That's what I had in mind as a form of interfering in a big way with the
design of the grammar.  It also feels very odd to me in a language one
of whose big selling points is that a processor can handle any
context-free grammar, not just a subset of them.  "Any context free
grammar, as long as the start symbol is not recursive" just doesn't have
the same ring for me.

I suppose we might ease the usability issue by a more complicate rule
for implementors, along the lines of:

  - If the processor supports an invocation-time option to give the
    start symbol, use that symbol.
    
  - Otherwise, if the start symbol is explicitly declared in the prolog,
    use that symbol.

  - Otherwise, if there is one and only one nonterminal not referred to
    from any other nonterminal, use that nonterminal as the start
    symbol.

  - Otherwise, use the first nonterminal in the grammar.

If the user with the declare-before-use instinct had a non-recursive
start symbol, and no other unreferenced symbols, then this would have
worked for them.

Michael


> Steven
>
> On Tuesday 17 October 2023 15:39:22 (+02:00), C. M. Sperberg-McQueen wrote:
>
>>  > Steven Pemberton <steven.pemberton@cwi.nl> writes:
>>  > > I noted a user of my implementation having lots of trouble this
>       week,
>> > which they were unable to resolve.
>> >
>> > My original implementation identified the top-level rule by analysing
>> > the grammar, but we later resolved that the top-level rule had to be
>> > the first rule in the grammar.
>>  > Is there a reliable way to determine the start symbol by
>     analysing the
>> productions?
>>  > I think there is not.  So either there must be a convention like
>     the one
>> we use (parallel, if memory serves, to conventions in some other parsing
>> systems), or there must be additional syntax for identifying the start
>> symbol.
>>  > > This user apparently comes from a define-before-use background,
>       and so
>> > consistently had the root rule as the last in the file. As a result,
>> > they didn't manage to get a single successful result.
>> >
>> > I'm not sure what to make of this. On the one hand, the spec
>     clearly says:
>> >
>> > 	"The root symbol of the grammar is the name of the first rule
>> > 	in the grammar."
>> >
>> >
>> > On the other hand, I feel bad for the user; I think notations should
>> > try to serve users, and not the other way round: usability
>> > first.
>>  > I feel bad for them, too.  If I remember correctly, the Algol 60
>     report
>> also works bottom up, with a definition-before-use organizing principle.
>>  > > Which is why I did my original implementation that way.
>>  > I wonder whether you had any user who defined a grammar like
>>  >     A = B; 'a'.
>>     B = A; 'b'.
>>  > If our plan is to identify the start symbol by selecting a
>     terminal not
>> referred to, we are doomed to disappointment in this or in any grammar
>> where the start symbol is recursive.  If we try to avoid that
>> disappointment by requiring that the start symbol not be recursive, we
>> interfere with the design of the grammar in a really big way.
>>  > > Anyway, it's a potential discussion point.
>>  > Yes, agreed.
>> 


-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com

Received on Tuesday, 17 October 2023 19:26:24 UTC