Re: [invisibleXML/ixml] Dynamic naming / name from the input data (Issue #168) from C. M. Sperberg-McQueen on 2023-01-11 (public-ixml@w3.org from January 2023)

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Wed, 11 Jan 2023 15:57:39 -0700
To: ixml <public-ixml@w3.org>, invisibleXML/ixml <reply+ABFB6WLFR54U6SVDU5JREHOBZQGPVEVBNHHFSFVXAI@reply.github.com>
Cc: "C. M. Sperberg-McQueen" <cmsmcq+github@blackmesatech.com>, Steven Pemberton <notifications@github.com>
Message-ID: <87v8lcy6j9.fsf@blackmesatech.com>

"C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com> writes:

I see that towards the end of my preceding comment I lost focus enough
that I failed to say explicitly some things that probably should be made
explicit.

First: the VW grammar given works, on the given input, to produce XML of
the same form as the input because the 'haiku' element is recognized by
a nonterminal named 'haiku', the 'l' or 'line' elements by a nonterminal
named 'l' or 'line', and so on.

The crucial idea is that the VW grammar given as input grammar generates
an (infinite) ixml grammar which is used to parse the input string.  In
practice, parsers for VW grammars generate a finite subset of the
infinite ixml grammar sufficiently large to handle the input.

To produce an element with a given name N, the requirement is to
generate a grammar in which a nonterminal named N generates the desired
element, and similarly also for attributes.  When the same name may be
used for both elements and attributes, some indirection and possibly
some cleverness in writing the grammar will be required.  Since VW
grammars are Turing complete, there is guaranteed to be a way, but it is
not guaranteed to be pretty.

Second:  the specific finite subset needed to parse the input will vary
with the input.  Consider the following sample input:

    <haiku>
      <author>Basho</author>
      <date>1686</date>
      
      <l>When the old pond</l>
      <l>gets a new frog</l>
      <l>it's a new pond.</l>
    </haiku>

One of the infinite grammar's sufficiently large subsets is given below.

    -document: ws?, element, ws? .
    -element: haiku.
    -element: author.
    -element: date.
    -element: l.
    haiku: starttag.haiku, content, endtag.haiku; soletag.haiku .
    author: starttag.author, content, endtag.author; soletag.author .
    date: starttag.date, content, endtag.date; soletag.date .
    l: starttag.l, content, endtag.l; soletag.l .
    -starttag.haiku:  -"<", gi.haiku, ws?, -">".
    -starttag.author:  -"<", gi.author, ws?, -">".
    -starttag.date:  -"<", gi.date, ws?, -">".
    -starttag.l:  -"<", gi.l, ws?, -">".
    -endtag.haiku:  -"</", gi.haiku, ws?, -">".
    -endtag.author:  -"</", gi.author, ws?, -">".
    -endtag.date:  -"</", gi.date, ws?, -">".
    -endtag.l:  -"</", gi.l, ws?, -">".

    -content: pcdata?, (element**pcdata, pcdata?)?.
    -pcdata:  (~["<>&"]; "&amp;"; "&lt;"; "&gt;"; "&apos;"; "&quot;")+.
    -ws:  -(#20; #A; #C; #9)+.
    
    -gi.haiku = letter h, gi.aiku.
    -gi.aiku = letter a, gi.iku.
    -gi.iku = letter i, gi.ku.
    -gi.ku = letter k, gi.u.
    -gi.u = letter u.
    -gi.author = letter a, gi.uthor.
    -gi.uthor = letter u, gi.thor.
    -gi.thor = letter t, gi.hor.
    -gi.hor = letter h, gi.or.
    -gi.or = letter o, gi.r.
    -gi.r = letter r.
    -gi.date = letter d, gi.ate.
    -gi.ate = letter a, gi.te.
    -gi.te = letter t, gi.e.
    -gi.e = letter e.
    -gi.l = letter l.

In writing it, I have used a modified ixml syntax.  As given, the
grammar violates the ixml spec's rule against multiple definitions of
the same nonterminal symbol; here, when a nonterminal has more than one
definition (as 'element' does), each definition is to be read as an
alternative.  So 'element' could also be defined thus:

    -element: haiku; author; date; l.
 
The grammar just given also uses a mixture of ixml and VW conventions
for terminal symbols.  Each occurrence of 'letter X' for any X could be
written as a quoted string literal, so the final rule would be:

    -gi.l = 'l'.

I leave reformulation of the grammar in pure conformant ixml as an
exercise for the reader.
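
The way such a finite subset depends on the input can be sketched in a
few lines of Python.  This is only an illustration under assumptions of
my own: the function names (names_in, gi_rules, grammar_subset) are
hypothetical, element names are assumed to consist of ASCII letters and
digits, and only the name-dependent rule families shown above are
generated (the rules for content, pcdata, and ws do not depend on the
names and are left out).  It is not meant as a description of how any
actual parser for VW grammars works.

```python
import re


def names_in(xml_text):
    """Collect the distinct element names, in order of first appearance."""
    seen = []
    # Match only start tags: '<' followed directly by a letter (so '</' is skipped).
    for name in re.findall(r"<([A-Za-z][A-Za-z0-9]*)", xml_text):
        if name not in seen:
            seen.append(name)
    return seen


def gi_rules(name):
    """Spell out one name letter by letter, in the style of the gi.* rules."""
    rules = []
    for i, letter in enumerate(name):
        suffix, rest = name[i:], name[i + 1:]
        tail = f", gi.{rest}" if rest else ""
        rules.append(f"-gi.{suffix} = letter {letter}{tail}.")
    return rules


def grammar_subset(xml_text):
    """Instantiate the name-dependent rules for the names found in the input."""
    names = names_in(xml_text)
    lines = ["-document: ws?, element, ws? ."]
    lines += [f"-element: {n}." for n in names]
    lines += [f"{n}: starttag.{n}, content, endtag.{n}; soletag.{n} ." for n in names]
    lines += [f'-starttag.{n}:  -"<", gi.{n}, ws?, -">".' for n in names]
    lines += [f'-endtag.{n}:  -"</", gi.{n}, ws?, -">".' for n in names]
    for n in names:
        for rule in gi_rules(n):
            # Two names may share a suffix (e.g. 'date' and 'line' both need gi.e).
            if rule not in lines:
                lines.append(rule)
    return lines
```

Called on the sample input, grammar_subset returns the rules for haiku,
author, date, and l; an input with other element names would yield a
different subset of the same infinite grammar.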

Third: to make the behavior of an affix grammar reliably predictable,
some grammar writers take care to place metanotions in hyperrules next
to characters which won't occur in the values of those metanotions.  In
the following hyperrule, the VW grammar given earlier uses '.' as a sort
of delimiter between the metanotion NAME and the rest of the nonterminal
of which it forms a part.

    -starttag.NAME:  -"<", gi.NAME, ws?, -">".
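
Why the delimiter helps can be seen in a small Python sketch.  The
assumptions here are mine and are made only for illustration: NAME is
taken to range over non-empty strings of lowercase letters, the
function name 'matches' is hypothetical, and so are the undelimited
templates 'startNAME' and 'starttagNAME', which do not occur in the
grammar above.

```python
import re


def matches(templates, nonterminal):
    """Map each template that can spell the nonterminal to its value of NAME.

    Assumption: NAME stands for one or more lowercase letters, so a '.'
    can never be part of a NAME value.
    """
    found = {}
    for template in templates:
        prefix, _, suffix = template.partition("NAME")
        pattern = re.escape(prefix) + "([a-z]+)" + re.escape(suffix)
        m = re.fullmatch(pattern, nonterminal)
        if m:
            found[template] = m.group(1)
    return found


# Without a delimiter, one nonterminal can be read in two ways.
ambiguous = matches(["startNAME", "starttagNAME"], "starttagl")

# With '.' as delimiter, only one reading is left.
unambiguous = matches(["start.NAME", "starttag.NAME"], "starttag.l")
```

In the first call the nonterminal 'starttagl' fits both templates (with
NAME as 'tagl' or as 'l'); in the second, 'starttag.l' fits only
'starttag.NAME', because the '.' cannot be absorbed into NAME.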

In attribute grammars, a similar simplification is achieved by making
inherited and synthesized attributes be syntactically distinct from the
nonterminals they decorate.  My limited experience with attribute and
affix grammars is that attribute grammars are much easier to write,
read, understand, and reason about than unrestricted affix grammars.  I
suspect (although I cannot offer any argument) that attribute grammars
are easier to constrain in ways that limit their expressive power than
affix grammars, and that we have a better hope of avoiding the slippery
slope towards a Turing-complete grammar formalism if we think about
mechanisms for this use case in terms of highly restricted attribute
grammars than if we think about them in terms of affix grammars.

Michael

-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com

Received on Wednesday, 11 January 2023 23:51:24 UTC