- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Wed, 11 Jan 2023 10:21:42 -0700
- To: ixml <public-ixml@w3.org>, invisibleXML/ixml <reply+ABFB6WLFR54U6SVDU5JREHOBZQGPVEVBNHHFSFVXAI@reply.github.com>
- Cc: "C. M. Sperberg-McQueen" <cmsmcq+github@blackmesatech.com>, Steven Pemberton <notifications@github.com>
I'm finding it a little hard to follow Steven's comments here, given that either his mail user agent seems not to distinguish in any visible way between quoted material and new material, or some process in the middle (maybe Github?) is stripping the distinction out. Github's refusal to allow post-hoc editing of email comments on an issue also doesn't help.

In case Github is the culprit, I am sending this mail to public-ixml as well as to the reply address on Steven's mail.

Steven Pemberton <notifications@github.com> writes:

> On Tuesday 13 December 2022 17:39:02 (+01:00), C. M. Sperberg-McQueen wrote:
>
>> Issue 13 suggests allowing different nonterminals to be serialized
>> with the same element or attribute name, in a way that allows the
>> expected name to be determined by inspection of the grammar.
>
> Yes, issue 13 addresses the problem of naming in the serialisation
> being bound to the input syntax.

I'm not quite sure what "bound to the input syntax" means. Or rather, I guess it must mean that the names used in the serialization can be found by inspecting the ixml grammar without reference to the input (although my internal parser for English doesn't quite see how that meaning emerges from those words). So, so far we seem to be in agreement on the subject of issue #13.

> Issue 13 is about recognising that the input syntax is different, but
> the output serialisation is the same. It is only about static
> renaming.
>
>> Experience with ixml grammars for parsing XML suggests it may be
>> helpful to contemplate allowing elements and attributes to carry
>> names given in (or more generally derived from) the input stream.
>
> Dynamic renaming is a whole other kettle of fish,

Good. We are in agreement again: issue #168 and issue #13 are usefully distinct.

> and once you add variables, you open a whole can of worms.

I'm not sure anyone has suggested variables, but I agree that adding them has great potential for worminess.
> Just suggesting it puts us on the slippery slope already, and should
> be approached with care. The end of the slope is Turing-completeness,
> and is reached very quickly.

I'm glad to see you agree.

> But for people interested, take a look at Affix Grammars, which
> address the issue.
>
> https://en.wikipedia.org/wiki/Affix_grammar in particular the section
> headed Types.

Thank you for the pointer. Since it may not be obvious to all readers how VW grammars would be used to implement dynamic naming, perhaps it would be helpful to have a worked example showing how two-level grammars could make this work. I append one to this mail.

Steven is right, I think, to suggest that VW grammars show very convincingly that Turing completeness may be achieved with great economy of mechanism (which in turn means that any mechanism we invent might land us at the bottom of that fabled slippery slope). But I do not suggest VW grammars as a solution to the use cases described here; I think these use cases can be supported by mechanisms which are weaker than VW grammars and easier to work with (easier both for grammar writers and for processor developers).

Example

Consider an ixml grammar for a simple approximation of XML, similar in spirit to, but simpler than, the one given in the paper on pragmas by Hillman et al. in the proceedings of last year's Balisage [1]. Unlike that one, the grammar I have in mind omits attributes, comments, and processing instructions. It would recognize input like the following:

  <haiku>
  <author>Basho</author>
  <date>1686</date>
  <l>When the old pond</l>
  <l>gets a new frog</l>
  <l>it's a new pond.</l>
  </haiku>

But also

  <haiku>
  <author>Basho</author>
  <date>1686</date>
  <line>When the old pond</line>
  <line>gets a new frog</line>
  <line>it's a new pond.</line>
  </uhuru>

And of course it does not generate XML that looks like its input.
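The difference between the two inputs can be made concrete with a small Python sketch. It is not an ixml processor and is not taken from the Balisage paper; the names `recognize`, `check_names`, `HAIKU_OK`, and `HAIKU_MISMATCHED` are hypothetical, and it uses an explicit stack where a grammar would use nonterminals. With `check_names=False` it checks only that tags nest, which is roughly all a context-free grammar with a single generic name nonterminal can enforce; with `check_names=True` it also requires each end tag to repeat the name of its start tag, which is the constraint a two-level grammar can express.

```python
import re

# Hypothetical tag pattern for the simplified XML of the example: start tags,
# end tags, and sole (empty-element) tags whose names are a lowercase letter
# followed by letters, digits, or underscores. Attributes, comments,
# processing instructions, and entity references are not handled.
TAG = re.compile(r"<(/?)([a-z][a-z0-9_]*)\s*(/?)>")


def recognize(text: str, check_names: bool) -> bool:
    """Return True if `text` is one element whose tags nest properly.

    check_names=False: only nesting is checked.
    check_names=True: each end tag must also repeat its start tag's name.
    """
    stack: list[str] = []  # names of currently open elements
    roots = 0  # number of top-level elements seen so far
    pos = 0
    for m in TAG.finditer(text):
        data = text[pos:m.start()]
        pos = m.end()
        if "<" in data or ">" in data:
            return False  # markup that the tag pattern did not accept
        if not stack and data.strip():
            return False  # character data outside the document element
        is_end, name, is_sole = m.groups()
        if is_end and is_sole:
            return False  # "</x/>" is neither an end tag nor a sole tag
        if is_end:
            if not stack:
                return False  # end tag with nothing open
            opened = stack.pop()
            if check_names and opened != name:
                return False  # e.g. <haiku> ... </uhuru>
        else:
            if not stack:
                roots += 1
            if not is_sole:
                stack.append(name)
    # Everything must be closed, with exactly one document element and
    # only whitespace after it.
    return not stack and roots == 1 and not text[pos:].strip()


HAIKU_OK = """<haiku>
<author>Basho</author>
<date>1686</date>
<l>When the old pond</l>
<l>gets a new frog</l>
<l>it's a new pond.</l>
</haiku>"""

HAIKU_MISMATCHED = """<haiku>
<author>Basho</author>
<date>1686</date>
<line>When the old pond</line>
<line>gets a new frog</line>
<line>it's a new pond.</line>
</uhuru>"""

print(recognize(HAIKU_OK, False), recognize(HAIKU_OK, True))  # True True
print(recognize(HAIKU_MISMATCHED, False), recognize(HAIKU_MISMATCHED, True))  # True False
```

Both inputs pass the nesting-only check; only the first passes the name check.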
[1] https://balisage.net/Proceedings/vol27/html/Sperberg-McQueen01/BalisageVol27-Sperberg-McQueen01.html#d9306e986

If we imagine a parser for VW grammars which works like an ixml processor in serializing a parse tree (specifically the parse tree against the first-level context-free grammar generated by the two-level input grammar), then I believe Steven must have some mechanism roughly similar to the following in mind.

First, the VW grammar has hyperrules, which by convention use :: to separate left- and right-hand sides, and no commas between terms.

  { NAME will be used for nonterminals }
  NAME :: LETTER; LETTER NAMECHARS.
  NAMECHARS :: NAMECHAR; NAMECHAR NAMECHARS.
  LETTER :: a; b; c; d; ... ; z.
  DIGIT :: 0; 1; 2; ... ; 9.
  NAMECHAR :: LETTER; DIGIT; _.

Note that NAME defines an infinite set of strings like a, ab, abc, aaa994, l, haiku, and so on. So does NAMECHARS (which includes additional strings like 994 and _994).

Second, the VW grammar has metarules, which can be thought of as patterns for rules in a context-free grammar, made up of fragments of a conventional context-free grammar and hypernotions (the things defined by hyperrules). I'm going to use ixml syntax for the meta-rules, more or less, and to simplify my own life I'm not going to try to rewrite our quoted literals and other terminals using the 'letter x' convention.

  document: ws?, element, ws? .
  -element: NAME.
  NAME: starttag.NAME, content, endtag.NAME; soletag.NAME .
  -starttag.NAME: -"<", gi.NAME, ws?, -">".
  -endtag.NAME: -"</", gi.NAME, ws?, -">".
  -soletag.NAME: -"<", gi.NAME, ws?, -"/>".
  -content: pcdata?, (element**pcdata, pcdata?)?.
  -pcdata: (~["<>&"]; "&amp;"; "&lt;"; "&gt;"; "&apos;"; "&quot;")+.
  -ws: -(#20; #A; #D; #9)+.
  -gi.NAMECHAR = letter NAMECHAR.
  -gi.NAMECHAR NAMECHARS = letter NAMECHAR, gi.NAMECHARS.

Note that in this imaginary VW-flavored ixml, whitespace in nonterminals is ignored, so the last meta-rule is syntactically OK, not an error.
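The way metarules turn into first-level rules can be imitated in a few lines of Python, as a sketch under strong simplifying assumptions: the NAME hyperrules are approximated by a regular expression, the metarules are treated as string templates, and rules are instantiated for one chosen name at a time rather than as the infinite set a two-level grammar describes. `NAME`, `META_RULES`, `gi_rules`, and `instantiate` are hypothetical names; this is not a VW-grammar processor.

```python
import re

# The hyperrules NAME :: LETTER; LETTER NAMECHARS. (LETTER a..z, DIGIT 0..9,
# NAMECHAR :: LETTER; DIGIT; _.) approximated as a regular expression.
NAME = re.compile(r"[a-z][a-z0-9_]*")

# Hypothetical string templates for the element, NAME, starttag, endtag, and
# soletag metarules; "{N}" marks each occurrence of the hypernotion NAME.
META_RULES = [
    "-element: {N}.",
    "{N}: starttag.{N}, content, endtag.{N}; soletag.{N} .",
    '-starttag.{N}: -"<", gi.{N}, ws?, -">".',
    '-endtag.{N}: -"</", gi.{N}, ws?, -">".',
    '-soletag.{N}: -"<", gi.{N}, ws?, -"/>".',
]


def gi_rules(chars: str) -> list[str]:
    """First-level rules from the two gi.* metarules, one character at a time."""
    rules = []
    for i, head in enumerate(chars):
        rest = chars[i + 1:]
        if rest:  # -gi.NAMECHAR NAMECHARS = letter NAMECHAR, gi.NAMECHARS.
            rules.append(f"-gi.{chars[i:]} = letter {head}, gi.{rest}.")
        else:  # -gi.NAMECHAR = letter NAMECHAR.
            rules.append(f"-gi.{head} = letter {head}.")
    return rules


def instantiate(name: str) -> list[str]:
    """Instantiate the NAME-bearing metarules for one NAME."""
    if not NAME.fullmatch(name):
        raise ValueError(f"{name!r} is not a NAME")
    # str.format replaces every {N} with the same string, so a rule such as
    # "haiku: starttag.a, content, endtag.ab; ..." can never be produced.
    rules = [t.format(N=name) for t in META_RULES]
    return rules + gi_rules(name)


for rule in instantiate("haiku"):
    print(rule)
```

For `haiku`, the output includes `haiku: starttag.haiku, content, endtag.haiku; soletag.haiku .` and the chain of `gi.*` rules from `-gi.haiku = letter h, gi.aiku.` down to `-gi.u = letter u.`; a string like `994` is rejected because it is a NAMECHARS but not a NAME.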
Note also that by convention (or fiat), a symbol of the form 'letter' + anything appearing in the generated context-free grammar denotes a terminal symbol, just as in ixml a character enclosed in quotation marks denotes a terminal symbol.

Here, the second metarule defines an infinite number of rules in the first-level context-free grammar, including

  element: a.
  element: ab.
  element: abc.
  element: aaa994.
  element: l.
  element: haiku.

The third metarule similarly generates an infinite number of rules, including:

  a: starttag.a, content, endtag.a; soletag.a .
  ab: starttag.ab, content, endtag.ab; soletag.ab .
  abc: starttag.abc, content, endtag.abc; soletag.abc .
  aaa994: starttag.aaa994, content, endtag.aaa994; soletag.aaa994 .
  l: starttag.l, content, endtag.l; soletag.l .
  haiku: starttag.haiku, content, endtag.haiku; soletag.haiku .

Note that when a first-level rule is generated from the third meta-rule, NAME is replaced by the same string in all occurrences. So the third meta-rule does NOT generate anything like

  {NOT} haiku: starttag.a, content, endtag.ab; soletag.aaa994 . {NOT}

The fourth metarule, meanwhile, generates rules like these:

  -starttag.ab: -"<", gi.ab, ws?, -">".
  -starttag.aaa994: -"<", gi.aaa994, ws?, -">".
  -starttag.haiku: -"<", gi.haiku, ws?, -">".
  -starttag.l: -"<", gi.l, ws?, -">".

The nonterminals gi.ab, gi.aaa994, gi.haiku, and gi.l rely on first-level rules which are generated by the last two meta-rules. Each of these meta-rules generates an infinite number of first-level rules, including the following, which are important for the derivation of the well-formed example above:

  -gi.haiku = letter h, gi.aiku.
  -gi.aiku = letter a, gi.iku.
  -gi.iku = letter i, gi.ku.
  -gi.ku = letter k, gi.u.
  -gi.u = letter u.
  -gi.l = letter l.

End of example.

-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com
Received on Wednesday, 11 January 2023 19:49:59 UTC