Re: [invisibleXML/ixml] Dynamic naming / name from the input data (Issue #168) from C. M. Sperberg-McQueen on 2023-01-11 (public-ixml@w3.org from January 2023)

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Wed, 11 Jan 2023 10:21:42 -0700
To: ixml <public-ixml@w3.org>, invisibleXML/ixml <reply+ABFB6WLFR54U6SVDU5JREHOBZQGPVEVBNHHFSFVXAI@reply.github.com>
Cc: "C. M. Sperberg-McQueen" <cmsmcq+github@blackmesatech.com>, Steven Pemberton <notifications@github.com>
Message-ID: <874jswzwa2.fsf@blackmesatech.com>
I'm finding it a little hard to follow Steven's comments here, given
that either his mail user agent seems not to distinguish in any visible
way between quoted material and new material, or some process in the
middle (maybe Github?) is stripping the distinction out.  Github's
refusal to allow post-hoc editing of email comments on an issue also
doesn't help.  In case Github is the culprit, I am sending this mail to
public-ixml as well as to the reply address on Steven's mail.

Steven Pemberton <notifications@github.com> writes:

> On Tuesday 13 December 2022 17:39:02 (+01:00), C. M. Sperberg-McQueen wrote:
>
>> Issue 13 suggests allowing different nonterminals to be serialized
>> with the same element or attribute name, in a way that allows the
>> expected name to be determined by inspection of the grammar.
>
> Yes, issue 13 addresses the problem of naming in the serialisation
> being bound to the input syntax.

I'm not quite sure what "bound to the input syntax" means.  Or rather, I
guess it must mean that the names used in the serialization can be found
by inspecting the ixml grammar without reference to the input (although
my internal parser for English doesn't quite see how that meaning
emerges from those words).  So, so far we seem to be in agreement on the
subject of issue #13.

> Issue 13 is about recognising that the input syntax is different, but
> the output serialisation is the same. It is only about static
> renaming.

>> Experience with ixml grammars for parsing XML suggests it may be
>> helpful to contemplate allowing elements and attributes to carry
>> names given in (or more generally derived from) the input stream.

> Dynamic renaming is a whole other kettle of fish,

Good.  We are in agreement again:  issue #168 and issue #13 are usefully
distinct.

> and once you add variables, you open a whole can of worms.

I'm not sure anyone has suggested variables, but I agree that adding
them has great potential for worminess.

> Just suggesting it puts us on the slippery slope already, and should
> be approached with care. The end of the slope is Turing-completeness,
> and is reached very quickly.

I'm glad to see you agree.

> But for people interested, take a look at Affix Grammars, which
> address the issue.
>
> https://en.wikipedia.org/wiki/Affix_grammar in particular the section
> headed Types.

Thank you for the pointer.

Since it may not be obvious to all readers how VW grammars would be used
to implement dynamic naming, perhaps it would be helpful to have a
worked example showing how two-level grammars could make this work.  I
append one to this mail.

Steven is right, I think, to suggest that VW grammars show very
convincingly that Turing completeness may be achieved with great economy
of mechanism (which in turn means that any mechanism we invent might end
us at the bottom of that fabled slippery slope).  But I do not suggest
VW grammars as a solution to the use cases described here; I think these
use cases can be supported by mechanisms which are weaker and easier to
work with than VW grammars (easier to work with both for grammar writers
and for processor developers).


Example

Consider an ixml grammar for a simple approximation of XML, similar in
spirit to but simpler than the one given in the paper on pragmas by
Hillman et al. in the proceedings of last year's Balisage [1].  Unlike
that one, the grammar I have in mind omits attributes, comments, and
processing instructions.  It would recognize input like the following:

    <haiku>
      <author>Basho</author>
      <date>1686</date>
      
      <l>When the old pond</l>
      <l>gets a new frog</l>
      <l>it's a new pond.</l>
    </haiku>

But also

    <haiku>
      <author>Basho</author>
      <date>1686</date>
      
      <line>When the old pond</line>
      <line>gets a new frog</line>
      <line>it's a new pond.</line>
    </uhuru>

And of course it does not generate XML that looks like its input.
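
What separates well-formed input from input like the second example is
only whether each end-tag repeats the name of its start-tag, and a
grammar with a single generic element rule never compares the two.  As
a rough illustration of the missing check, here is a minimal Python
sketch; the function name tags_match and the regular expression are my
own illustrative assumptions, not part of ixml, and the sketch ignores
everything except tags.

```python
import re

# Matches <name>, </name>, and <name/> for names made of lowercase
# letters, digits, and underscores (an assumption for this sketch).
TAG = re.compile(r"<(/?)([a-z][a-z0-9_]*)\s*(/?)>")

def tags_match(text: str) -> bool:
    """Return True if every end-tag names the most recently opened element."""
    stack = []
    for m in TAG.finditer(text):
        closing, name, sole = m.group(1), m.group(2), m.group(3)
        if closing:
            # An end-tag must repeat the name of the open start-tag.
            if not stack or stack.pop() != name:
                return False
        elif not sole:
            # A start-tag opens an element; a sole tag opens and closes it.
            stack.append(name)
    return not stack

print(tags_match("<haiku><l>When the old pond</l></haiku>"))
print(tags_match("<haiku><l>When the old pond</l></uhuru>"))
```

The first call reports a match and the second does not; the name
comparison on the stack is exactly the part that lies outside a plain
context-free rule for elements.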

[1] https://balisage.net/Proceedings/vol27/html/Sperberg-McQueen01/BalisageVol27-Sperberg-McQueen01.html#d9306e986

If we imagine a parser for VW grammars which works like an ixml
processor in serializing a parse tree (specifically the parse tree
against the first-level context-free grammar generated by the two-level
input grammar), then I believe Steven must have some mechanism roughly
similar to the following in mind.

First, the VW grammar has hyperrules, which by convention use :: to
separate left- and right-hand sides, and no commas between terms.

    { NAME will be used for nonterminals }
    NAME :: LETTER; LETTER NAMECHARS.
    NAMECHARS :: NAMECHAR; NAMECHAR NAMECHARS.
    LETTER :: a; b; c; d; ... ; z.
    DIGIT :: 0; 1; 2; ... ; 9.
    NAMECHAR :: LETTER; DIGIT; _.

Note that NAME defines an infinite set of strings like a, ab, abc,
aaa994, l, haiku, and so on.  As does NAMECHARS (which includes
additional strings like 994 and _994).
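
To make the two sets concrete, here is a small Python sketch of
membership tests for them.  It assumes NAMECHARS is read as "one or
more NAMECHAR" and LETTER as the lowercase letters a to z; the function
names is_name and is_namechars are hypothetical, chosen only for this
illustration.

```python
import string

LETTER = set(string.ascii_lowercase)   # LETTER :: a; b; c; ... ; z.
DIGIT = set(string.digits)             # DIGIT :: 0; 1; 2; ... ; 9.
NAMECHAR = LETTER | DIGIT | {"_"}      # NAMECHAR :: LETTER; DIGIT; _.

def is_namechars(s: str) -> bool:
    """NAMECHARS, read here as one or more NAMECHAR (an assumption)."""
    return len(s) > 0 and all(c in NAMECHAR for c in s)

def is_name(s: str) -> bool:
    """NAME :: LETTER; LETTER NAMECHARS."""
    if not s or s[0] not in LETTER:
        return False
    return len(s) == 1 or is_namechars(s[1:])

for candidate in ["a", "aaa994", "haiku", "994", "_994"]:
    print(candidate, is_name(candidate), is_namechars(candidate))
```

Under these assumptions every NAME is also in NAMECHARS, while 994 and
_994 are in NAMECHARS only, because they do not start with a LETTER.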

Second, the VW grammar has metarules, which can be thought of as
patterns for rules in a context-free grammar, made up of fragments of a
conventional context-free grammar and hypernotions (the things defined
by hyperrules).  I'm going to use ixml syntax for the meta-rules, more
or less, and to simplify my own life I'm not going to try to rewrite our
quoted literals and other terminals using the 'letter x' convention.

    document: ws?, element, ws? .
    -element: NAME.
    NAME: starttag.NAME, content, endtag.NAME; soletag.NAME .
    -starttag.NAME:  -"<", gi.NAME, ws?, -">".
    -endtag.NAME:  -"</", gi.NAME, ws?, -">".
    -soletag.NAME:  -"<", gi.NAME, ws?, -"/>".
    -content: pcdata?, (element**pcdata, pcdata?)?.
    -pcdata:  (~["<>&"]; "&amp;"; "&lt;"; "&gt;"; "&apos;"; "&quot;")+.
    -ws:  -(#20; #A; #D; #9)+.
    
    -gi.NAMECHAR = letter NAMECHAR.
    -gi.NAMECHAR NAMECHARS = letter NAMECHAR, gi.NAMECHARS.

Note that in this imaginary VW-flavored ixml, whitespace in nonterminals
is ignored.  So the last meta-rule is syntactically OK, not an error.

Note also that by convention (or fiat), a symbol of the form 'letter' +
anything appearing in the generated context-free grammar denotes a
terminal symbol, just as in ixml a character enclosed in quotation marks
denotes a terminal symbol.

Here, the second metarule defines an infinite number of rules in the
first-level context-free grammar, including

    -element: a.
    -element: ab.
    -element: abc.
    -element: aaa994.
    -element: l.
    -element: haiku.

The third rule similarly generates an infinite number of rules,
including:

    a: starttag.a, content, endtag.a; soletag.a .
    ab: starttag.ab, content, endtag.ab; soletag.ab .
    abc: starttag.abc, content, endtag.abc; soletag.abc .
    aaa994: starttag.aaa994, content, endtag.aaa994; soletag.aaa994 .
    l: starttag.l, content, endtag.l; soletag.l .
    haiku: starttag.haiku, content, endtag.haiku; soletag.haiku .

Note that when a first-level rule is generated from the third meta-rule,
NAME is replaced by the same string in all occurrences.  So the third
meta-rule does NOT generate anything like 

    {NOT} haiku: starttag.a, content, endtag.ab; soletag.aaa994 . {NOT}
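
The consistent-substitution requirement can be sketched in a few lines
of Python.  This is only an illustration of the idea, not how any VW
grammar processor is implemented; the function name instantiate is
hypothetical, and the sketch relies on the string NAME not occurring
inside any other symbol of this particular meta-rule.

```python
# The third meta-rule from the example, as plain text.
META_RULE = "NAME: starttag.NAME, content, endtag.NAME; soletag.NAME ."

def instantiate(meta_rule: str, name: str) -> str:
    """Replace every occurrence of NAME with the same string."""
    return meta_rule.replace("NAME", name)

for name in ["l", "haiku"]:
    print(instantiate(META_RULE, name))
```

Because one value is substituted for all four occurrences at once, a
rule mixing haiku, a, ab, and aaa994 cannot be produced.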

The fourth metarule, meanwhile, generates rules like these:

    -starttag.ab:  -"<", gi.ab, ws?, -">".
    -starttag.aaa994:  -"<", gi.aaa994, ws?, -">".
    -starttag.haiku:  -"<", gi.haiku, ws?, -">".
    -starttag.l:  -"<", gi.l, ws?, -">".

The nonterminals gi.ab, gi.aaa994, gi.haiku, and gi.l rely on
first-level rules which are generated by the last two meta-rules.  Each
of these meta-rules generates an infinite number of first-level rules,
including the following, which are important for the derivation of the
well-formed example above:

    -gi.haiku = letter h, gi.aiku.
    -gi.aiku = letter a, gi.iku.
    -gi.iku = letter i, gi.ku.
    -gi.ku = letter k, gi.u.
    -gi.u = letter u.

    -gi.l = letter l.
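
For a given name, the finite set of gi rules needed for a derivation
can be listed mechanically, one rule per suffix of the name.  A minimal
Python sketch follows; the function name gi_rules is hypothetical and
the output format simply mimics the rules shown above.

```python
from typing import List

def gi_rules(name: str) -> List[str]:
    """List the first-level gi rules needed to spell out one name."""
    rules = []
    for i in range(len(name)):
        suffix = name[i:]
        head, rest = suffix[0], suffix[1:]
        if rest:
            # From: -gi.NAMECHAR NAMECHARS = letter NAMECHAR, gi.NAMECHARS.
            rules.append(f"-gi.{suffix} = letter {head}, gi.{rest}.")
        else:
            # From: -gi.NAMECHAR = letter NAMECHAR.
            rules.append(f"-gi.{suffix} = letter {head}.")
    return rules

for rule in gi_rules("haiku") + gi_rules("l"):
    print(rule)
```

The full first-level grammar still contains infinitely many such
rules; the sketch lists only those that a single name actually uses.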

End of example.

-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com
Received on Wednesday, 11 January 2023 19:49:59 UTC