- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Tue, 17 Oct 2023 13:14:46 -0600
- To: Steven Pemberton <steven.pemberton@cwi.nl>
- Cc: public-ixml@w3.org
Steven Pemberton <steven.pemberton@cwi.nl> writes:
> With almost all grammars it is easy to identify the root. It's the
> only unused rule.
"Many" I might buy. But 'almost all' seems implausible to me. It's an
empirical statement, but difficult to test. My subjective impression is
that interesting grammars quite often have a recursive root symbol.
> In a tiny number of extreme cases where the root is used in the body
> of the grammar, you can add:
>
> -root: program.
>
> (where 'program' is the real root) and all is well.
That's what I had in mind as a form of interfering in a big way with the
design of the grammar. It also feels very odd to me in a language one
of whose big selling points is that a processor can handle any
context-free grammar, not just a subset of them. "Any context-free
grammar, as long as the start symbol is not recursive" just doesn't have
the same ring for me.
I suppose we might ease the usability issue by a more complicated rule
for implementors, along the lines of:

  - If the processor supports an invocation-time option to give the
    start symbol, use that symbol.
  - Otherwise, if the start symbol is explicitly declared in the prolog,
    use that symbol.
  - Otherwise, if there is one and only one nonterminal not referred to
    from any other nonterminal, use that nonterminal as the start
    symbol.
  - Otherwise, use the first nonterminal in the grammar.
If the user with the declare-before-use instinct had a non-recursive
start symbol, and no other unreferenced symbols, then this would have
worked for them.
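
To make the fallback order concrete, here is a rough sketch in Python.
It is only an illustration, not a proposal for spec text: the grammar
representation (a dict from rule name to the nonterminals its
right-hand side mentions, in rule order) and the names
choose_start_symbol, invocation_start and declared_start are all
assumptions made up for the example.

```python
from typing import Dict, List, Optional

# Hypothetical representation: rule name -> nonterminals referenced in its
# right-hand side. Dict insertion order stands in for rule order in the file.
Grammar = Dict[str, List[str]]


def choose_start_symbol(
    grammar: Grammar,
    invocation_start: Optional[str] = None,  # assumed invocation-time option
    declared_start: Optional[str] = None,    # assumed prolog declaration
) -> str:
    """Pick a start symbol using the four-step fallback order."""
    if not grammar:
        raise ValueError("grammar has no rules")

    # 1. An invocation-time option wins over everything else.
    if invocation_start is not None:
        return invocation_start

    # 2. Next, an explicit declaration in the prolog.
    if declared_start is not None:
        return declared_start

    # 3. Next, the one and only nonterminal not referred to from any
    #    *other* nonterminal (self-references are ignored).
    referenced = set()
    for name, refs in grammar.items():
        referenced.update(ref for ref in refs if ref != name)
    unreferenced = [name for name in grammar if name not in referenced]
    if len(unreferenced) == 1:
        return unreferenced[0]

    # 4. Otherwise, fall back to the first rule in the grammar.
    return next(iter(grammar))


# Bottom-up grammar with the root written last: step 3 finds it.
bottom_up = {"digit": [], "number": ["digit"], "program": ["number"]}
print(choose_start_symbol(bottom_up))

# Mutually recursive grammar (A = B; 'a'.  B = A; 'b'.): nothing is
# unreferenced, so step 4 applies and the first rule is used.
recursive = {"A": ["B"], "B": ["A"]}
print(choose_start_symbol(recursive))
```

With the mutually recursive grammar, the analysis in step 3 finds no
candidate at all, so only the first-rule convention or an explicit
declaration can settle the question.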
Michael
> Steven
>
> On Tuesday 17 October 2023 15:39:22 (+02:00), C. M. Sperberg-McQueen wrote:
>
>> Steven Pemberton <steven.pemberton@cwi.nl> writes:
>>
>> > I noted a user of my implementation having lots of trouble this
>> > week, which they were unable to resolve.
>> >
>> > My original implementation identified the top-level rule by analysing
>> > the grammar, but we later resolved that the top-level rule had to be
>> > the first rule in the grammar.
>>
>> Is there a reliable way to determine the start symbol by analysing the
>> productions?
>>
>> I think there is not. So either there must be a convention like the one
>> we use (parallel, if memory serves, to conventions in some other parsing
>> systems), or there must be additional syntax for identifying the start
>> symbol.
>>
>> > This user apparently comes from a define-before-use background, and so
>> > consistently had the root rule as the last in the file. As a result,
>> > they didn't manage to get a single successful result.
>> >
>> > I'm not sure what to make of this. On the one hand, the spec
>> > clearly says:
>> >
>> > "The root symbol of the grammar is the name of the first rule
>> > in the grammar."
>> >
>> > On the other hand, I feel bad for the user; I think notations should
>> > try to serve users, and not the other way round: usability
>> > first.
>>
>> I feel bad for them, too. If I remember correctly, the Algol 60 report
>> also works bottom up, with a definition-before-use organizing principle.
>>
>> > Which is why I did my original implementation that way.
>>
>> I wonder whether you had any user who defined a grammar like
>>
>> A = B; 'a'.
>> B = A; 'b'.
>>
>> If our plan is to identify the start symbol by selecting a terminal not
>> referred to, we are doomed to disappointment in this or in any grammar
>> where the start symbol is recursive. If we try to avoid that
>> disappointment by requiring that the start symbol not be recursive, we
>> interfere with the design of the grammar in a really big way.
>>
>> > Anyway, it's a potential discussion point.
>>
>> Yes, agreed.
>>
--
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com
Received on Tuesday, 17 October 2023 19:26:24 UTC