
Re: design question? or puzzle?

From: Steven Pemberton <steven.pemberton@cwi.nl>
Date: Tue, 13 Apr 2021 13:02:46 +0000
Message-Id: <1618317745926.3322051882.2335600179@cwi.nl>
To: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, public-ixml@w3.org
Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
> This may be a design question for ixml, or it may be just a puzzle about how to use ixml to achieve the goal. I’m not sure.

I think the latter. People are pressurising me to do something on ixml for Declarative Amsterdam (advert: 4 and 5 November 2021), and maybe I should do a tutorial. Which means I have to finish a releasable implementation. Oh well, at least I have something to do over the summer.


I was sure I had already published an example of doing the sort of thing you are asking for here, but I don't seem to be able to find it. Did I dream it? Or do I have to search harder? I'll do the latter.

> First, consider a simple grammar for arithmetic expressions:
>
> expression: sum.
> sum: product+addop.
> product: factor+mulop.
> factor: number; identifier; '(', expression, ')'.
> addop: '+'; '-'.
> mulop: '/'; '*'; '×'; '÷'.
> …
>
> An XML representation of this, as an ixml parser would produce it, works nicely to exhibit the structure of an expression like 2 x^3 + 17 x^2 -5x + 7:

The above grammar neither recognises this expression nor produces the following serialisation, since it neither accepts an implicit multiplication sign nor converts x^2 to x x.
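
To make that concrete, here is a minimal Python sketch of a recogniser for the quoted grammar. The token rules are assumptions, since the grammar elides them with "…": number is taken to be one or more ASCII digits, identifier one or more ASCII letters, and no whitespace is allowed anywhere. The function names are illustrative only, and this is a hand-written recogniser, not an ixml implementation.

```python
import re

# Assumed token rules: the quoted grammar elides them, so these are guesses.
NUMBER = re.compile(r"[0-9]+")
IDENTIFIER = re.compile(r"[A-Za-z]+")
ADDOPS = "+-"
MULOPS = "/*×÷"


def parse_factor(s, i):
    """factor: number; identifier; '(', expression, ')'. Return end index or None."""
    for pattern in (NUMBER, IDENTIFIER):
        m = pattern.match(s, i)
        if m:
            return m.end()
    if s.startswith("(", i):
        j = parse_sum(s, i + 1)  # expression is just sum
        if j is not None and s.startswith(")", j):
            return j + 1
    return None


def parse_separated(s, i, item, ops):
    """item+op: one or more items, separated by one operator character each."""
    j = item(s, i)
    if j is None:
        return None
    while j < len(s) and s[j] in ops:
        k = item(s, j + 1)
        if k is None:
            break  # leave the operator unconsumed
        j = k
    return j


def parse_product(s, i):
    return parse_separated(s, i, parse_factor, MULOPS)


def parse_sum(s, i):
    return parse_separated(s, i, parse_product, ADDOPS)


def recognises(s):
    """True if the whole string is an expression under the assumed token rules."""
    return parse_sum(s, 0) == len(s)


print(recognises("2*x*x*x+17*x*x-5*x+7"))     # explicit operators, no spaces
print(recognises("2 x^3 + 17 x^2 -5x + 7"))   # the expression from the quoted message
```

Under these assumptions the first call succeeds and the second fails: after the leading 2 the recogniser finds neither an operator nor the end of the input.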



> But this XML representation for the number 3 might feel ... a bit heavy in some contexts:
>
> <expression>
> <sum>
> <product>
> <factor>
> <number>3</number>
> </factor>
> </product>
> </sum>
> </expression>

True, but you can elide all of the unnecessary bits: you never need product or factor, since they are there for syntactic reasons, not semantic:
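
As a rough illustration of the effect (not of how an ixml processor does it), the following Python sketch takes the raw serialisation quoted above and splices out every product and factor element, which is roughly what marking those rules with - in the grammar would give. The function name splice_hidden is hypothetical, and the sketch assumes hidden elements have only element children, as product and factor do here; any text directly inside a hidden element is dropped.

```python
import xml.etree.ElementTree as ET


def splice_hidden(element, hidden):
    """Return the nodes that replace `element`: a copy of itself,
    or its (already processed) children if its tag is in `hidden`."""
    new_children = []
    for child in element:
        new_children.extend(splice_hidden(child, hidden))
    if element.tag in hidden:
        # Assumption: hidden elements carry no text of their own.
        return new_children
    copy = ET.Element(element.tag, element.attrib)
    copy.text = element.text
    copy.extend(new_children)
    return [copy]


raw = ("<expression><sum><product><factor><number>3</number>"
       "</factor></product></sum></expression>")
(result,) = splice_hidden(ET.fromstring(raw), {"product", "factor"})
print(ET.tostring(result, encoding="unicode"))
# <expression><sum><number>3</number></sum></expression>
```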


>
> Over time, I have come to believe that there are two approaches to the design of an XML vocabulary for situations like this. One is conscious design, which requires careful thought about which nodes in the raw parse tree to make visible and which to make invisible, and which when it works produces nice results. A nice mechanical fallback I have sometimes suggested in the past is to say that by default, the XML representation only includes leaf elements wrapping tokens, and elements with at least two children. Any element with a single element child is omitted. (This is similar, in its way, to the rule in some normal forms for grammars, such as Chomsky Normal Form, that there be no unit rules, of the form L = R.)
>
> The result is a somewhat more slender representation for the cubic expression:
>
> <sum>
> <product>
> <number>2</number>
> <identifier>x</identifier>
> <identifier>x</identifier>
> <identifier>x</identifier>
> </product>
>
> <addop>+</addop>
>
> <product>
> <number>17</number>
> <identifier>x</identifier>
> <identifier>x</identifier>
> </product>
>
> <addop>-</addop>
>
> <product>
> <number>5</number>
> <identifier>x</identifier>
> </product>
>
> <addop>+</addop>
>
> <number>7</number>
>
> </sum>
>
> And of course the ’number’ element containing the integer 3 is not now burdened with four ancestors which tell us that this is a sum, and a product, and a factor — all of which are true only as edge cases.
>
> <number>3</number>
>
> Now, of course ixml has annotations like - and @ and ^ to allow us to control the XML serialization.
>
> However: ixml annotations control the appearance or non-appearance of an element by its context, not by its content.
>
> Is there a way to write the ixml grammar so as to achieve the goal of serializing a nonterminal as an element if and only if it has two element children, and not to serialize it when it has only one element child?
>
> Does this constitute a case for re-thinking the control of serialization?
>
> Both of these are real questions, by the way, although I am beginning to have inklings of possible answers.
>
>
> Michael
>
>
>
>
> ********************************************
> C. M. Sperberg-McQueen
> Black Mesa Technologies LLC
> cmsmcq@blackmesatech.com
> http://www.blackmesatech.com
> ********************************************
>
>
>
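
The mechanical fallback described in the quoted message (omit any element that has a single element child) can at least be sketched as a post-processing step on the serialised XML. This is a minimal Python sketch, not something ixml itself provides; the name collapse is hypothetical, and it assumes an element with one element child has no text of its own. The sample input uses explicit mulop elements because the quoted grammar requires them.

```python
import xml.etree.ElementTree as ET


def collapse(element):
    """Return a copy of `element` in which every element that has exactly
    one element child is replaced by that (already collapsed) child."""
    children = [collapse(child) for child in element]
    if len(children) == 1:
        # Assumption: a single-child element carries no text of its own.
        return children[0]
    copy = ET.Element(element.tag, element.attrib)
    copy.text = element.text
    copy.extend(children)
    return copy


raw = ("<expression><sum>"
       "<product><factor><number>5</number></factor><mulop>*</mulop>"
       "<factor><identifier>x</identifier></factor></product>"
       "<addop>+</addop>"
       "<product><factor><number>7</number></factor></product>"
       "</sum></expression>")
print(ET.tostring(collapse(ET.fromstring(raw)), encoding="unicode"))
# <sum><product><number>5</number><mulop>*</mulop><identifier>x</identifier></product><addop>+</addop><number>7</number></sum>
```

Here the decision depends on each element's content rather than on its context, which is exactly the distinction the question turns on.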
Received on Tuesday, 13 April 2021 13:03:07 UTC
