*which* alternative that matches nothing? (was Re: repetition)

> On 16,Dec2021, at 4:07 AM, Norm Tovey-Walsh <norm@saxonica.com> wrote:
> 
> 
> I find “the alternative that matches nothing” in the grammar quite hard
> to read. I almost wish we had a special symbol for it, like ∅. Then we’d
> have:
> 
> X: Y .
> -Y: Z ; ∅ .
> -Z: a, A .
> -A: "#", a, A ; ∅ .
> 
> Unless I’ve got something wrong along the way, of course.

If we do want a special symbol for “the alternative that matches nothing”,
we need to be careful about the two meanings of that phrase.

- An alternative that matches the empty sequence (a sequence 
consisting of nothing, the sequence of length 0) is one thing.  I usually 
write a comment in that alternative to make it easier to see, and also 
usually write it first, so

  X: Y.
  -Y: {nil}; Z.

It can also be written in ixml as empty parens or as []?, but since
the latter re-introduces a question mark, it’s not a good candidate
for a rewriting system, which will promptly rewrite it as ([]; ).  In
computer science books, I believe it’s usually written as an epsilon.

- An alternative that matches no sequences at all, which nothing
matches.  This is an expression which denotes the language with
no sentences, i.e. the empty set.  And this is the meaning most
naturally associated with the symbol “∅”.

This can be written in ixml as [].  An inclusion is matched if the
next input symbol matches at least one of its members; since []
has no members, it cannot be matched.  
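
To make the difference between the two meanings concrete, here is a
minimal sketch in Python that models a language as a finite set of
strings.  It is not part of any ixml implementation: the names
EPSILON, EMPTY_SET, alt, seq, opt and inclusion_matches are
hypothetical, and Z is only a small finite stand-in for the
nonterminal Z above.

```python
from itertools import product

# Illustrative model only: a "language" is a finite set of strings.
EPSILON = frozenset({""})   # matches exactly the empty sequence
EMPTY_SET = frozenset()     # matches no sequence at all


def alt(*langs):
    """Alternatives (ixml ';'): the union of the languages."""
    return frozenset().union(*langs)


def seq(*langs):
    """Sequence (ixml ','): one sentence from each language, concatenated."""
    return frozenset("".join(parts) for parts in product(*langs))


def opt(lang):
    """Option (ixml '?'): the language itself, or the empty sequence."""
    return alt(lang, EPSILON)


def inclusion_matches(members, symbol):
    """An inclusion matches if the symbol equals at least one member."""
    return any(symbol == m for m in members)


# Hypothetical finite stand-in for the nonterminal Z.
Z = frozenset({"a", "a#a"})

# "Z ; epsilon" makes Z optional, but "Z ; empty set" is just Z.
assert alt(Z, EPSILON) == Z | {""}
assert alt(Z, EMPTY_SET) == Z

# In a sequence, epsilon changes nothing; the empty set kills the sequence.
assert seq(Z, EPSILON) == Z
assert seq(Z, EMPTY_SET) == EMPTY_SET

# [] has no members, so it never matches; []? matches only the empty sequence.
assert not inclusion_matches("", "a")
assert opt(EMPTY_SET) == EPSILON
```

Under this model, an alternative meaning the empty set can always be
dropped from a rule without changing the language, while an
alternative meaning the empty sequence cannot.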

When this was discussed earlier, I think the prevailing opinion was that
if we introduce special symbols like ε and ∅ we have to explain
what they mean to readers who aren’t familiar with them, some of
whom at least will see them and start writing off ixml as too 
complicated and mathy.  If we don’t have special symbols for them,
those who know they want to write expressions with those meanings
will find expressions like () and [] and ~[] and {empty} and be
perfectly happy.

Michael

Received on Thursday, 16 December 2021 15:21:08 UTC