Re: *which* alternative that matches nothing? (was Re: repetition) from C. M. Sperberg-McQueen on 2021-12-16 (public-ixml@w3.org from December 2021)

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Thu, 16 Dec 2021 09:46:47 -0700
To: Dave Pawson <dave.pawson@gmail.com>
Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, Norm Tovey-Walsh <norm@saxonica.com>, ixml <public-ixml@w3.org>
Message-Id: <2D81C454-B9E0-4828-882E-2470F846CB98@blackmesatech.com>

> On 16,Dec2021, at 8:30 AM, Dave Pawson <dave.pawson@gmail.com> wrote:
> 
> On Thu, 16 Dec 2021 at 15:20, C. M. Sperberg-McQueen
> <cmsmcq@blackmesatech.com> wrote:
> 
>> If we do want a special symbol for “the alternative that matches nothing”,
>> we need to be careful about the two meanings of that phrase.
>> 
>> - An alternative that matches the empty sequence (a sequence
>> consisting of nothing, the sequence of length 0) is one thing.  I usually
>> write a comment in that alternative to make it easier to see, and also
>> usually write it first, so
>> 
>>  X: Y.
>>  -Y: {nil}; Z;.
>> 
>> It can also be written in ixml as empty parens or as []?, but since
>> the latter re-introduces a question mark, it’s not a good candidate
>> for a rewriting system, which will promptly rewrite is as ([]; ).
> 
> Empty parens, makes sense. Empty parens  with question mark?
> I don't see where that comes from? Either empty or not?
> 

I can try to explain.  Please bear with me.

The expression () seems not to be a problem for you.  OK.
I won’t try to explain it then, on the principle of least said,
soonest mended.

Why does the expression []? denote the language consisting
of the empty sequence?

Empty brackets are a character inclusion with no members.
The meaning of a character inclusion is that at least one of the
members of the inclusion must match the current input
character (I hope people know what I mean by ’the current
input character', because I am not in a position to offer a simple 
clear definition).  An empty inclusion cannot satisfy that condition,
so no sequence of input symbols can match an empty inclusion.
So the language recognized by [] is the empty set.  In the
set notation I learned in school:  {} or ∅.

Adding a question mark to [] so as to form []? gives us an
expression recognizing a different and larger language.  In general,
for any expression E, the expression E? recognizes a language 
which contains (a) all the sentences of the language of E,
plus (b) the empty sequence of symbols.  As an operation on
sets of sentences, the operation ‘?’ adds the empty sequence
(often written ε) to the set.  

Since [] denotes the set ∅, the expression []? denotes the
union of {} and {ε}.  Now, it’s a theorem of set theory that 
the union of any set S with {} is S itself.  The union of {} and 
{ε} is {ε}, the set containing only the empty sequence.  

That is why I say that []? is a way of writing the empty sequence.
It recognizes a set of input sequences that contains the empty
sequence and no other sequences.

> 
> In
>> computer science books, I believe it’s usually written as an epsilon.
>> 
> 
> No thanks. Please keep it to an ASCII keyboard character.
> 
> 
>> - An alternative that matches no sequences at all, which no thing
>> matches.  This is an expression which denotes the language with
>> no sentences, i.e. the empty set.  And this is the meaning most
>> naturally associated with the symbol “∅”.
> 
> Any reason why [] should not be given this definition?

It does have that definition.  Not as an ad hoc rule, but as a 
consequence of the way character inclusions are defined.

Michael

Received on Thursday, 16 December 2021 16:47:18 UTC