- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Wed, 11 Jan 2023 15:57:39 -0700
- To: ixml <public-ixml@w3.org>, invisibleXML/ixml <reply+ABFB6WLFR54U6SVDU5JREHOBZQGPVEVBNHHFSFVXAI@reply.github.com>
- Cc: "C. M. Sperberg-McQueen" <cmsmcq+github@blackmesatech.com>, Steven Pemberton <notifications@github.com>
"C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com> writes:

I see that towards the end of my preceding comment I lost focus enough that I failed to say explicitly some things that probably should be made explicit.

First: the VW grammar given works, on the given input, to produce XML of the same form as the input because the 'haiku' element is recognized by a nonterminal named 'haiku', the 'l' or 'line' elements by a nonterminal named 'l' or 'line', and so on. The crucial idea is that the VW grammar given as input grammar generates an (infinite) ixml grammar, which is used to parse the input string. In practice, parsers for VW grammars generate a finite subset of the infinite ixml grammar large enough to handle the input. To produce an element with a given name N, the requirement is to generate a grammar in which a nonterminal named N generates the desired element, and similarly for attributes. When the same name may be used for both elements and attributes, some indirection, and possibly some cleverness in writing the grammar, will be required. Since VW grammars are Turing complete, there is guaranteed to be a way, but it is not guaranteed to be pretty.

Second: the specific finite subset needed to parse the input will vary with the input. Consider the following sample input:

    <haiku>
      <author>Basho</author>
      <date>1686</date>
      <l>When the old pond</l>
      <l>gets a new frog</l>
      <l>it's a new pond.</l>
    </haiku>

One of the infinite grammar's sufficiently large subsets is given below.

    -document: ws?, element, ws? .
    -element: haiku.
    -element: author.
    -element: date.
    -element: l.
    haiku: starttag.haiku, content, endtag.haiku; soletag.haiku .
    author: starttag.author, content, endtag.author; soletag.author .
    date: starttag.date, content, endtag.date; soletag.date .
    l: starttag.l, content, endtag.l; soletag.l .
    -starttag.haiku: -"<", gi.haiku, ws?, -">".
    -starttag.author: -"<", gi.author, ws?, -">".
    -starttag.date: -"<", gi.date, ws?, -">".
    -starttag.l: -"<", gi.l, ws?, -">".
    -endtag.haiku: -"</", gi.haiku, ws?, -">".
    -endtag.author: -"</", gi.author, ws?, -">".
    -endtag.date: -"</", gi.date, ws?, -">".
    -endtag.l: -"</", gi.l, ws?, -">".
    -content: pcdata?, (element**pcdata, pcdata?)?.
    -pcdata: (~["<>&"]; "&amp;"; "&lt;"; "&gt;"; "&apos;"; "&quot;")+.
    -ws: -(#20; #A; #D; #9)+.
    -gi.haiku = letter h, gi.aiku.
    -gi.aiku = letter a, gi.iku.
    -gi.iku = letter i, gi.ku.
    -gi.ku = letter k, gi.u.
    -gi.u = letter u.
    -gi.author = letter a, gi.uthor.
    -gi.uthor = letter u, gi.thor.
    -gi.thor = letter t, gi.hor.
    -gi.hor = letter h, gi.or.
    -gi.or = letter o, gi.r.
    -gi.r = letter r.
    -gi.date = letter d, gi.ate.
    -gi.ate = letter a, gi.te.
    -gi.te = letter t, gi.e.
    -gi.e = letter e.
    -gi.l = letter l.

In writing it, I have used a modified ixml syntax. As given, the grammar violates the ixml spec's rule against multiple definitions of the same nonterminal symbol; when the same nonterminal is defined multiple times (as 'element' is), each definition is an alternative. So 'element' could also be defined thus:

    -element: haiku; author; date; l.

The grammar just given also uses a mixture of ixml and VW conventions for terminal symbols. Each occurrence of 'letter X' for any X could be written as a quoted string literal, so the final rule would be:

    -gi.l = 'l'.

I leave reformulation of the grammar in pure conformant ixml as an exercise for the reader.

Third: to make the behavior of an affix grammar reliably predictable, some grammar writers take care to place hypernotions in hyperrules next to characters that will not occur in the hypernotions. In the following hyperrule, the VW grammar given earlier uses '.' as a sort of delimiter between the hypernotion NAME and the rest of the nonterminal of which it forms a part.

    -starttag.NAME: -"<", gi.NAME, ws?, -">".

In attribute grammars, a similar simplification is achieved by making inherited and synthesized attributes syntactically distinct from the nonterminals they decorate.
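As a rough illustration of the second point, the Python sketch below builds the finite subset from the element names that actually occur in an input, emitting rules in the same notation as the hand-written subset above. It is a sketch under stated assumptions, not how any real VW parser works: the helper names (`gi_rules`, `element_rules`, `element_names`, `subset_grammar`) are hypothetical, the regular expression is a crude stand-in for a parser discovering which rule instances it needs, and the `soletag` rule is an assumed form, since the message uses `soletag.NAME` without defining it.

```python
import re
from typing import List

# Sketch only: helper names are hypothetical, and the regular expression
# below merely stands in for a VW parser discovering which instances of
# its hyperrules the input needs.

# Name-independent rules, adapted from the subset grammar in the message.
FIXED_RULES = [
    "-document: ws?, element, ws? .",
    "-content: pcdata?, (element**pcdata, pcdata?)?.",
    '-pcdata: (~["<>&"]; "&amp;"; "&lt;"; "&gt;"; "&apos;"; "&quot;")+.',
    "-ws: -(#20; #A; #D; #9)+.",
]


def gi_rules(name: str) -> List[str]:
    """Rules recognizing `name` one letter at a time (gi.date -> gi.ate -> ...)."""
    rules = []
    for i, letter in enumerate(name):
        rest = name[i + 1:]
        tail = f", gi.{rest}" if rest else ""
        rules.append(f"-gi.{name[i:]} = letter {letter}{tail}.")
    return rules


def element_rules(name: str) -> List[str]:
    """Per-name rules: element alternative, element, start tag, end tag."""
    return [
        f"-element: {name}.",
        f"{name}: starttag.{name}, content, endtag.{name}; soletag.{name} .",
        f'-starttag.{name}: -"<", gi.{name}, ws?, -">".',
        f'-endtag.{name}: -"</", gi.{name}, ws?, -">".',
        # Assumed form: the message uses soletag.NAME but does not define it.
        f'-soletag.{name}: -"<", gi.{name}, ws?, -"/>".',
    ]


def element_names(text: str) -> List[str]:
    """Start-tag names in `text`, in order of first appearance."""
    names: List[str] = []
    for match in re.finditer(r"<([A-Za-z]+)", text):
        if match.group(1) not in names:
            names.append(match.group(1))
    return names


def subset_grammar(text: str) -> List[str]:
    """A finite slice of the infinite grammar, just large enough for `text`."""
    rules = list(FIXED_RULES)
    for name in element_names(text):
        for rule in element_rules(name) + gi_rules(name):
            if rule not in rules:  # shared suffixes yield identical gi rules
                rules.append(rule)
    return rules


sample = (
    "<haiku> <author>Basho</author> <date>1686</date> "
    "<l>When the old pond</l> <l>gets a new frog</l> <l>it's a new pond.</l> </haiku>"
)
for rule in subset_grammar(sample):
    print(rule)
```

For the haiku sample this yields the same 16 `gi` rules as the hand-written subset; an input containing only `l` elements yields a much smaller grammar, which is the sense in which the needed subset varies with the input.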
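The effect of a delimiter such as '.' can be checked with a small Python sketch of consistent substitution: given a rule's left-hand side written as a list of literal pieces and metanotions, it finds every assignment of metanotion values that spells out a given nonterminal. The conventions here (an all-uppercase token is a metanotion; a metanotion stands for a non-empty run of lowercase letters) are assumptions made for the example, not VW or ixml rules, and the function name `bindings` is hypothetical.

```python
import string
from typing import Dict, List

# Assumption for this sketch: a metanotion is an all-uppercase token and
# stands for a non-empty string of lowercase ASCII letters.
LETTERS = set(string.ascii_lowercase)


def is_metanotion(token: str) -> bool:
    return token.isupper()


def bindings(pattern: List[str], text: str) -> List[Dict[str, str]]:
    """Every consistent assignment of metanotion values under which the
    tokens of `pattern`, concatenated, spell out `text`."""
    results: List[Dict[str, str]] = []

    def walk(i: int, pos: int, env: Dict[str, str]) -> None:
        if i == len(pattern):
            if pos == len(text):
                results.append(dict(env))
            return
        token = pattern[i]
        if not is_metanotion(token):
            # Literal text must match exactly.
            if text.startswith(token, pos):
                walk(i + 1, pos + len(token), env)
            return
        if token in env:
            # Consistent substitution: a repeated metanotion reuses its value.
            value = env[token]
            if text.startswith(value, pos):
                walk(i + 1, pos + len(value), env)
            return
        # Try every non-empty run of letters as this metanotion's value.
        end = pos
        while end < len(text) and text[end] in LETTERS:
            end += 1
            env[token] = text[pos:end]
            walk(i + 1, end, env)
            del env[token]

    walk(0, 0, {})
    return results


print(bindings(["NAME1", "NAME2"], "abc"))
print(bindings(["NAME1", ".", "NAME2"], "ab.c"))
print(bindings(["starttag", ".", "NAME"], "starttag.haiku"))
```

Two adjacent metanotions can split "abc" in two ways; putting '.', which cannot occur in a value under these assumptions, between them leaves exactly one split, which is what makes instantiating a rule such as -starttag.NAME predictable.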
My limited experience with attribute and affix grammars is that attribute grammars are much easier to write, read, understand, and reason about than unrestricted affix grammars. I suspect (although I cannot offer any argument) that attribute grammars are easier than affix grammars to constrain in ways that limit their expressive power, and that we have a better hope of avoiding the slippery slope towards a Turing-complete grammar formalism if we think about mechanisms for this use case in terms of highly restricted attribute grammars than if we think about them in terms of affix grammars.

Michael

-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com
Received on Wednesday, 11 January 2023 23:51:24 UTC