Combining grammars

Hello,

In the interest of having at least a straw man to discuss on Tuesday,
here’s a sketch of how combining grammars could work. John was
interested in working on this too, but between the holidays and a bunch
of work in my day job, I haven’t had time to coordinate with him.

My initial thoughts for the bare minimum needed to declare victory are:

1. There’s only one set of nonterminal names.
2. Importing a grammar exposes the names of all the nonterminals
   in the imported grammar.
3. If the *importing* grammar declares nonterminals with the same
   names, its definitions replace the imported ones.
4. This is recursive: if A imports B, which imports C, then what
   B exposes to A is the result of importing C and then adding or
   replacing the names that B itself defines.
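To make rules 1 to 4 concrete, here is a minimal Python sketch. The representation (a dict from file name to a pair of imports and rules) and the function name exposed_rules are hypothetical. Right-hand sides are opaque strings, import cycles aren't detected, and when two imports define the same name the later import wins, which the rules above don't actually decide.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: file name -> (imported file names,
# nonterminal name -> right-hand side). Right-hand sides stay opaque
# strings because only the names matter for combining.
Grammar = Tuple[List[str], Dict[str, str]]


def exposed_rules(name: str, files: Dict[str, Grammar]) -> Dict[str, str]:
    """Return the single set of nonterminals that grammar `name` exposes."""
    imports, own_rules = files[name]
    rules: Dict[str, str] = {}
    # Rules 2 and 4: start from everything each import exposes, recursively.
    # Assumption: if two imports define the same name, the later one wins.
    for imported in imports:
        rules.update(exposed_rules(imported, files))
    # Rule 3: the importing grammar's own definitions replace imported ones.
    rules.update(own_rules)
    return rules


# A imports B, which imports C; B redefines y.
files = {
    "c.ixml": ([], {"x": "'c'", "y": "'c'"}),
    "b.ixml": (["c.ixml"], {"y": "'b'"}),
    "a.ixml": (["b.ixml"], {"s": "x, y"}),
}
print(exposed_rules("a.ixml", files))
# {'x': "'c'", 'y': "'b'", 's': 'x, y'}
```

A sees C’s x, B’s replacement for y, and its own s.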

For example, consider the grammar, t.ixml:

  T = a | b .
  a = 'a' .
  b = 'b' .

Imagine that I import it like so:

  %%import "t.ixml"  { imaginary and bad syntax }

  S = T | c .
  b = 'B' .
  c = 'c' .

The effective grammar is:

  S = T | c .
  T = a | b .
  a = 'a' .
  b = 'B' .
  c = 'c' .

It’s easy to imagine more complicated semantics in which symbol names
are always kept distinct and, if there are two “versions” of a symbol,
they become a choice, but I’m not sure that’s necessary.
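For comparison, here is one possible reading of that alternative, sketched in Python under the assumption that each rule is a list of alternatives: instead of the importing grammar’s definition replacing the imported one, both versions survive as a choice. The function name merge_as_choice is hypothetical, and this simplification collapses the two versions into one rule rather than keeping their names distinct internally.

```python
from typing import Dict, List

# Hypothetical representation: nonterminal name -> list of alternatives.
Rules = Dict[str, List[str]]


def merge_as_choice(imported: Rules, importing: Rules) -> Rules:
    """Keep every version of a nonterminal by turning duplicates into a choice."""
    # Copy the imported alternatives so the input is not mutated.
    merged: Rules = {name: list(alts) for name, alts in imported.items()}
    for name, alts in importing.items():
        existing = merged.setdefault(name, [])
        # Add the importing grammar's alternatives after the imported ones,
        # skipping exact duplicates.
        for alt in alts:
            if alt not in existing:
                existing.append(alt)
    return merged


t = {"T": ["a", "b"], "a": ["'a'"], "b": ["'b'"]}
own = {"S": ["T", "c"], "b": ["'B'"], "c": ["'c'"]}
print(merge_as_choice(t, own)["b"])  # ["'b'", "'B'"]
```

Applied to the example above, b would become b = 'b' | 'B' instead of b = 'B'.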

One obvious problem is *accidental* name collisions. Suppose I didn’t
intend to replace 'b' in the example above. To avoid that, we could have
an explicit renaming mechanism (akin to Python’s “import module as name”):

  %%import "t.ixml" as "X."

  S = X.T | c .
  b = 'B' .
  c = 'c' .

The effective grammar is:

  S = X.T | c .
  X.T = X.a | X.b .
  X.a = 'a' .
  X.b = 'b' .
  b = 'B' .
  c = 'c' .
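Here’s a minimal Python sketch of the prefixing step, assuming each rule is a list of alternatives made of tokens, where quoted tokens are terminals and everything else is a nonterminal reference. The names prefix_rules and import_as are hypothetical, and the sketch assumes the imported grammar has already been resolved (its own imports applied) and defines every nonterminal it refers to; otherwise references to outside names would be prefixed too.

```python
from typing import Dict, List

# Hypothetical representation: name -> list of alternatives, each a list of
# tokens. Tokens that start with a quote are terminals; anything else is a
# reference to a nonterminal.
Rules = Dict[str, List[List[str]]]


def is_terminal(token: str) -> bool:
    return token.startswith(("'", '"'))


def prefix_rules(rules: Rules, prefix: str) -> Rules:
    """Rename every nonterminal, definitions and references, with `prefix`."""
    def rename(token: str) -> str:
        return token if is_terminal(token) else prefix + token

    return {
        prefix + name: [[rename(tok) for tok in alt] for alt in alts]
        for name, alts in rules.items()
    }


def import_as(imported: Rules, prefix: str, own: Rules) -> Rules:
    """Sketch of an import with a prefix: prefixed imports plus local rules."""
    merged = prefix_rules(imported, prefix)
    # Local definitions still replace names, but prefixed names rarely collide.
    merged.update(own)
    return merged


t = {"T": [["a"], ["b"]], "a": [["'a'"]], "b": [["'b'"]]}
own = {"S": [["X.T"], ["c"]], "b": [["'B'"]], "c": [["'c'"]]}
merged = import_as(t, "X.", own)
print(merged["X.T"])  # [['X.a'], ['X.b']]
```

Because every imported name now starts with X., the local b no longer replaces the imported one; X.b keeps matching 'b'.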

This requires the implementation to remember that, when serialized,
these symbols use their “unprefixed” names (X.T is still serialized
as T). An option to serialize them with their fully qualified names
might also be useful.
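One way to remember the unprefixed names is to record them while prefixing. In this hypothetical Python sketch, local_names maps each prefixed name back to its original name, and the fully_qualified flag stands in for the suggested serialization option; neither is an existing API.

```python
from typing import Dict, Iterable


def record_prefixes(names: Iterable[str], prefix: str) -> Dict[str, str]:
    """Map each prefixed nonterminal name back to its unprefixed name."""
    return {prefix + name: name for name in names}


def serialized_name(name: str, local_names: Dict[str, str],
                    fully_qualified: bool = False) -> str:
    """Choose the element name used when serializing nonterminal `name`."""
    if fully_qualified:
        return name
    # Names that were never prefixed (defined locally) serialize as-is.
    return local_names.get(name, name)


local_names = record_prefixes(["T", "a", "b"], "X.")
print(serialized_name("X.T", local_names))                        # T
print(serialized_name("X.T", local_names, fully_qualified=True))  # X.T
print(serialized_name("c", local_names))                          # c
```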

                                        Be seeing you,
                                          norm

--
Norm Tovey-Walsh
Saxonica

Received on Sunday, 8 January 2023 16:01:32 UTC