Re: Grammar analysis from C. M. Sperberg-McQueen on 2023-08-27 (public-ixml@w3.org from August 2023)

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Sun, 27 Aug 2023 11:44:57 -0600
To: John Lumley <john@saxonica.com>
Cc: Norm Tovey-Walsh <norm@saxonica.com>, "Liam R. E. Quin" <liam@fromoldbooks.org>, graydonish@gmail.com, public-ixml@w3.org
Message-ID: <87wmxgwdrg.fsf@blackmesatech.com>

John Lumley <john@saxonica.com> writes:

>> On 27 Aug 2023, at 10:20, Norm Tovey-Walsh <norm@saxonica.com> wrote:
>> 
>> Just using () helps:
>> 
>>  rule: name, "=", value; () .
>> 
>> And you can certainly define your own terminal “empty”, but that won’t
>> help with examples like the one that started this thread where the
>> author chose not to do that.
>
> I think this is the simplest and most obvious. We could even modify
> the syntax (non-backwards-compatible) to mandate the use of an empty
> bracket pair in place of a whitespace-only alt…

While I'm still a little worried about bells, whistles, and slippery
slopes, John has pushed my puzzle-solving button and my "How would you
do that?" instinct has kicked in.  What would we need to make this
possible?  I think it's just:

1 Change

    alt: term**(-",", s).

to: 

    alt: term++(-",", s).

2 Change

    -term: factor; option; repeat0; repeat1.

to

    -term: factor; option; repeat0; repeat1; empty-sequence.
    -empty-sequence: '()'.

3 Add a prose specification that empty-sequence, i.e. '()', matches the
empty sequence in the input string.

I dislike having to define () as magic in step 3.

................

A magic-free alternative would be

1 Change

    alt: term**(-",", s).

to: 

    alt: term++(-",", s).

2 Observe in the prose that the empty string can be matched by writing

    []?

Something in the simplicity of this appeals to me, but I don't think
anyone particularly wants to write []? to denote the empty string.

If it is not obvious why the term []? has the required meaning, I
recommend it as a puzzle for a lazy Sunday afternoon or evening.
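
For readers who would rather check the answer than solve the puzzle, here is a minimal sketch in Python (not ixml). It models a character set and the `?` operator as tiny matchers. The names `charset`, `option`, and `accepts` are illustrative assumptions, not part of any ixml implementation. The idea it illustrates: `[]` has no members, so it can never match a character, and making it optional leaves only the empty match.

```python
from typing import Callable, Set

# A matcher maps (text, start) to the set of positions where a match can end.
Matcher = Callable[[str, int], Set[int]]


def charset(members: str) -> Matcher:
    """Match exactly one character drawn from `members` (models [...])."""
    def match(text: str, pos: int) -> Set[int]:
        if pos < len(text) and text[pos] in members:
            return {pos + 1}
        return set()  # no character left, or the character is not a member
    return match


def option(inner: Matcher) -> Matcher:
    """Match `inner` or nothing at all (models the `?` operator)."""
    def match(text: str, pos: int) -> Set[int]:
        return {pos} | inner(text, pos)
    return match


def accepts(matcher: Matcher, text: str) -> bool:
    """True if `matcher` can consume the whole of `text`."""
    return len(text) in matcher(text, 0)


empty_set = charset("")            # models []   : matches nothing at all
empty_string = option(empty_set)   # models []?  : matches only the empty string
```

Under these assumptions, `accepts(empty_set, "")` is false because a set must consume one character, while `accepts(empty_string, "")` is true and `accepts(empty_string, "a")` is false.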

................

Oh, wait!  Another magic-free alternative has just occurred to me:

1 Change

    alt: term**(-",", s).

to: 

    alt: term++(-",", s).

2 Change

    -factor: terminal;
             nonterminal;
             insertion;
             -"(", s, alts, -")", s.

to

    -factor: terminal;
             nonterminal;
             insertion;
             -"(", s, alts?, -")", s.

or (if we wanted to forbid whitespace and comments between the
parentheses) to

    -factor: terminal;
             nonterminal;
             insertion;
             -"(", (s, alts)?, -")", s.

3 Note in the prose that as a consequence of the grammar rules (and in a
change from the 1.0 spec), a term matching the empty string cannot be
written using the empty string but must be written explicitly in the
grammar, for example using ().
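
The practical effect of changing `**` to `++` in the `alt` rule is that a whitespace-only alternative stops being legal. A rough sketch in Python of a check for that condition follows. It is deliberately naive and is not a real ixml parser: it looks only at top-level alternatives, ignores comments, and takes the right-hand side of a rule without the closing full stop. The function names are hypothetical.

```python
from typing import List, Optional


def alternatives(rhs: str) -> List[str]:
    """Split a rule's right-hand side on top-level ';' or '|'.

    Naive: tracks string quotes and bracket depth, ignores ixml comments.
    """
    parts: List[str] = []
    current: List[str] = []
    depth = 0
    quote: Optional[str] = None
    for ch in rhs:
        if quote:
            if ch == quote:
                quote = None          # closing quote (a doubled quote reopens)
        elif ch in "\"'":
            quote = ch                # opening quote
        elif ch in "([":
            depth += 1
        elif ch in ")]":
            depth -= 1
        elif ch in ";|" and depth == 0:
            parts.append("".join(current))   # separator ends the alternative
            current = []
            continue
        current.append(ch)
    parts.append("".join(current))
    return parts


def empty_alternatives(rhs: str) -> List[int]:
    """Return 0-based indexes of top-level alternatives that are whitespace-only."""
    return [i for i, alt in enumerate(alternatives(rhs)) if not alt.strip()]
```

Given the right-hand side `name, "=", value; ` this reports alternative 1 as empty, which is what the revised grammar would reject. Given `name, "=", value; ()` it reports nothing, since the empty case is now written out.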

................

We should bear in mind that we can define explicit syntax like 'ε' or
'EMPTY-STRING' or '$empty_string' to mean the language consisting of the
empty string, but we cannot require people to use it, since there is no
way to prevent people from finding other ways to express the same idea.

But I think we can successfully define the grammar so that the empty
string in an ixml grammar cannot be used to match the empty string in
the input string.

Michael

-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com

Received on Sunday, 27 August 2023 18:18:06 UTC