design question? or puzzle?

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Sun, 11 Apr 2021 18:25:16 -0600
Message-Id: <6173DE4E-1FB0-4D8B-87F7-AAFE5B750ED3@blackmesatech.com>
Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
To: public-ixml@w3.org
Greetings, friends.

This may be a design question for ixml, or it may be just a puzzle about how to use ixml to achieve the goal.  I’m not sure.

First, consider a simple grammar for arithmetic expressions:

expression:  sum.
sum:  product+addop.
product:  factor+mulop.
factor:  number; identifier; '(', expression, ')'.
addop:  '+'; '-'.
mulop: '/'; '*'; '×'; '÷'.
…

An XML representation of this, as an ixml parser would produce it, works nicely to exhibit the structure of an expression like 2x^3 + 17x^2 - 5x + 7 (writing each power out as repeated factors):

 <expression>
   <sum>
     <product>
       <factor>
         <number>2</number>
       </factor>
       <factor>
         <identifier>x</identifier>
       </factor>
       <factor>
         <identifier>x</identifier>
       </factor>
       <factor>
         <identifier>x</identifier>
       </factor>
     </product>

     <addop>+</addop>

     <product>
       <factor>
         <number>17</number>
       </factor>
       <factor>
         <identifier>x</identifier>
       </factor>
       <factor>
         <identifier>x</identifier>
       </factor>
     </product>

     <addop>-</addop>
     <product>
       <factor>
         <number>5</number>
       </factor>
       <factor>
         <identifier>x</identifier>
       </factor>
     </product>

     <addop>+</addop>

     <product>
       <factor>
         <number>7</number>
       </factor>
     </product>        
   </sum>
 </expression>


But this XML representation for the number 3 might feel ... a bit heavy in some contexts:

 <expression>
   <sum>
     <product>
       <factor>
         <number>3</number>
       </factor>
     </product>        
   </sum>
 </expression>


Over time, I have come to believe that there are two approaches to the design of an XML vocabulary for situations like this.  One is conscious design, which requires careful thought about which nodes in the raw parse tree to make visible and which to make invisible; when it works, it produces nice results.  The other is a mechanical fallback I have sometimes suggested in the past: by default, the XML representation includes only leaf elements wrapping tokens and elements with at least two children.  Any element with a single element child is omitted.  (This is similar, in its way, to the rule in some normal forms for grammars, such as Chomsky Normal Form, that there be no unit rules, i.e. rules of the form L = R where R is a single nonterminal.)

The result is a somewhat more slender representation for the cubic expression: 

 <sum>
   <product>
     <number>2</number>
     <identifier>x</identifier>
     <identifier>x</identifier>
     <identifier>x</identifier>
   </product>

   <addop>+</addop>

   <product>
     <number>17</number>
     <identifier>x</identifier>
     <identifier>x</identifier>
   </product>

   <addop>-</addop>

   <product>
     <number>5</number>
     <identifier>x</identifier>
   </product>

   <addop>+</addop>

   <number>7</number>

 </sum>

And of course the 'number' element containing the integer 3 is no longer burdened with four ancestors telling us that this is an expression, and a sum, and a product, and a factor, all of which are true only as edge cases.

 <number>3</number>
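
The mechanical fallback described above can be sketched as a post-processing step over the XML a parser produces.  What follows is only a minimal sketch in Python, not anything ixml itself provides: the function name collapse_unit_elements is hypothetical, and it assumes that an element with attributes or character data of its own should be kept even when it has just one element child.

```python
import xml.etree.ElementTree as ET


def collapse_unit_elements(elem):
    """Return a copy of elem in which every element whose only content
    is a single element child has been replaced by that child."""
    kids = list(elem)
    # Character data directly inside elem: its text plus its children's tails.
    own_text = (elem.text or "") + "".join(k.tail or "" for k in kids)

    new_kids = []
    for k in kids:
        nk = collapse_unit_elements(k)
        # Keep meaningful text between children; drop indentation whitespace.
        nk.tail = k.tail if (k.tail or "").strip() else None
        new_kids.append(nk)

    # Assumption: wrappers with attributes or text of their own are kept.
    if len(new_kids) == 1 and not own_text.strip() and not elem.attrib:
        return new_kids[0]

    new = ET.Element(elem.tag, dict(elem.attrib))
    new.text = elem.text if (elem.text or "").strip() else None
    new.extend(new_kids)
    return new


heavy = ET.fromstring(
    "<expression><sum><product><factor>"
    "<number>3</number>"
    "</factor></product></sum></expression>"
)
print(ET.tostring(collapse_unit_elements(heavy), encoding="unicode"))
# prints <number>3</number>
```

Note that this decides what to keep by looking at each element's content after parsing, which is exactly the kind of decision the grammar-level marks discussed next do not express.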

Now, of course ixml has annotations like - and @ and ^ to allow us to control the XML serialization.

However:  ixml annotations control the appearance or non-appearance of an element by its context, not by its content.

Is there a way to write the ixml grammar so as to achieve the goal of serializing a nonterminal as an element if and only if it has two or more element children, and not serializing it when it has only one element child?

Does this constitute a case for re-thinking the control of serialization?  

Both of these are real questions, by the way, although I am beginning to have inklings of possible answers.


Michael




********************************************
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
cmsmcq@blackmesatech.com
http://www.blackmesatech.com
********************************************
Received on Monday, 12 April 2021 00:25:37 UTC
