ACTION 2023-05-09-d: Steven to produce a discussion document for renaming (issue #13)

REQUIREMENT

To rename rules on serialisation




BACKGROUND

By definition, a rule accepts a single syntax on input. Thus different 
syntaxes have different element serialisations.




However there are use-cases where you would like to rename a rule on 
serialisation. The simplest example is with different date formats:




 31 December 1999

 1999-12-31




In this case the year and day have the same syntax, but the two forms have 
different month syntaxes, so on serialisation, only one can be called 
'month'. For instance: 




 dates: s?, date**s, s?.
 date: day,  -" ", month,  -" ", year;
       year, -"-", nmonth, -"-", day.
 day: d, d?.
 year: d, d, d, d.
 month: -"January", +"01"; -"February", +"02"; -"March", +"03";
        -"April", +"04"; -"May", +"05"; -"June", +"06";
        -"July", +"07"; -"August", +"08"; -"September", +"09";
        -"October", +"10"; -"November", +"11"; -"December", +"12".
 nmonth: "0"?, d; "1", ["0"-"2"].
 -d: ["0"-"9"].
 -s: -[" "; #9; #a; #d]+.




The input

 31 December 1999

 1999-12-31




gives:

 <dates>
    <date>
       <day>31</day>
       <month>12</month>
       <year>1999</year>
    </date>
    <date>
       <year>1999</year>
       <nmonth>12</nmonth>
       <day>31</day>
    </date>
 </dates>




The requirement is for a notation that says "The rule name is X, but on 
serialisation should be called Y". This applies to nonterminals, both 
elements and attributes, and should be usable on both definition and use.
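
To make the intended effect concrete, here is a minimal Python sketch of 
"the rule name is X, but on serialisation it is called Y". It is not how any 
ixml processor is implemented: the tuple-based parse tree, the RENAMES table 
and the serialise function are all hypothetical, chosen only for illustration.

```python
from xml.sax.saxutils import escape

# Hypothetical rename table: rule name -> name used on serialisation.
RENAMES = {"nmonth": "month"}

def serialise(node, renames=RENAMES):
    """Serialise a (rule_name, children) parse-tree node as XML.

    Children are either nested nodes or plain strings.
    """
    name, children = node
    tag = renames.get(name, name)  # rule name X is emitted as Y
    body = "".join(
        escape(child) if isinstance(child, str) else serialise(child, renames)
        for child in children
    )
    return f"<{tag}>{body}</{tag}>"

# The second date form from the example above (an assumed tree shape).
date = ("date", [("year", ["1999"]), ("nmonth", ["12"]), ("day", ["31"])])
print(serialise(date))
# <date><year>1999</year><month>12</month><day>31</day></date>
```

With an empty rename table the same tree serialises with <nmonth>, as in the 
output shown earlier.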




POSSIBLE SYNTAXES




Note that for 'renaming' of terminals, as with the months above, you already 
have the pattern 

 -"May", +"05"




One possibility might be:




 -nmonth+month: "0"?, d; "1", ["0"-"2"].




but that leading "-" is misleading, because the rule *will* be serialised, 
and it doesn't generalise to attributes.




The following visually suggests a renaming:




 nmonth>month: "0"?, d; "1", ["0"-"2"].




Or

 nmonth^month: "0"?, d; "1", ["0"-"2"].




but the latter doesn't work well for attributes, unless we use a 
different renaming operator for those (which I think is overkill).




So my current preference is for ">" to represent a renaming:




          rule: (mark, s)?, naming, -["=:"], s, -alts, -".".
   nonterminal: (mark, s)?, naming.
       -naming: name, s, (">", s, rename, s)?.
         @name: namestart, namefollower*.
       @rename: name.




Or simplifying by factoring the mark into the naming rule:




          rule: naming, -["=:"], s, -alts, -".".
   nonterminal: naming.
       -naming: (mark, s)?, name, s, (">", s, rename, s)?.
         @name: namestart, namefollower*.
       @rename: name.







This means that the serialised version of the grammar remains the same, 
except that <rule> and <nonterminal> can now also carry a @rename.




   <rule name="nmonth" rename="month">
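
A consumer of the serialised grammar could pick up the extra attribute 
roughly as follows. This is only a sketch under assumptions: the rule bodies 
are left out of the sample document for brevity, and the function name 
serialised_names is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed sample: a serialised grammar with rule bodies omitted.
GRAMMAR_XML = """
<ixml>
  <rule name="nmonth" rename="month"/>
  <rule name="day"/>
</ixml>
"""

def serialised_names(grammar_xml):
    """Map each rule name to the name it should have on serialisation."""
    root = ET.fromstring(grammar_xml.strip())
    return {
        rule.get("name"): rule.get("rename", rule.get("name"))
        for rule in root.iter("rule")
    }

print(serialised_names(GRAMMAR_XML))
# {'nmonth': 'month', 'day': 'day'}
```

A rule without a rename attribute simply keeps its own name, so existing 
serialised grammars would be read unchanged.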




ROUNDTRIPPING




In passing, it is worth noting that although we haven't yet addressed 
roundtripping, this requirement does potentially introduce ambiguity in 
returning from the XML serialisation to the original form. In the dates 
example above the two can be distinguished by the order of day, month, 
year, but in the general case there can be two parses:




 <month>05</month> 




could be roundtripped as 

 05

or as

 May
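
The ambiguity can be sketched in Python as follows. Everything here is 
hypothetical and only illustrates the point: SERIALISED_AS records which 
rules end up with which element name, and UNPARSERS gives, for each rule, an 
assumed way of getting from the serialised value back to an input form.

```python
MONTH_NAMES = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# Rule name -> element name on serialisation (nmonth is renamed to month).
SERIALISED_AS = {"month": "month", "nmonth": "month"}

# Rule name -> assumed function from serialised value back to input text.
UNPARSERS = {
    "month": lambda value: MONTH_NAMES[int(value) - 1],  # undo +"05"
    "nmonth": lambda value: value,                       # digits as written
}

def roundtrips(element_name, value):
    """Return every input form that could have produced this element."""
    return [
        UNPARSERS[rule](value)
        for rule, out in SERIALISED_AS.items()
        if out == element_name
    ]

print(roundtrips("month", "05"))
# ['May', '05']
```

Once two rules share a serialised name, the element name alone no longer 
identifies the rule, so some other context (such as the order of siblings in 
the dates example) is needed to choose between the candidates.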




Steven

Received on Saturday, 24 June 2023 13:38:00 UTC