- From: Steven Pemberton <steven.pemberton@cwi.nl>
- Date: Sat, 24 Jun 2023 13:37:53 +0000
- To: ixml <public-ixml@w3.org>
- Message-Id: <1687613524840.1400920388.3110888934@cwi.nl>
REQUIREMENT
To rename rules on serialisation
BACKGROUND
By definition, a rule accepts a single syntax on input. Thus different
syntaxes must be defined by different rules, and so are serialised as
differently named elements.
However, there are use cases where you would like to rename a rule on
serialisation. The simplest example is with different date formats:
31 December 1999
1999-12-31
In this case the year and day have the same syntax in both forms, but the
month syntaxes differ, so on serialisation only one of them can be called
'month'. For instance:
dates: s?, date**s, s?.
date: day, -" ", month, -" ", year;
      year, -"-", nmonth, -"-", day.
day: d, d?.
year: d, d, d, d.
month: -"January", +"01"; -"February", +"02"; -"March", +"03";
       -"April", +"04"; -"May", +"05"; -"June", +"06";
       -"July", +"07"; -"August", +"08"; -"September", +"09";
       -"October", +"10"; -"November", +"11"; -"December", +"12".
nmonth: "0"?, d; "1", ["0"-"2"].
-d: ["0"-"9"].
-s: -[" "; #9; #a; #d]+.
With the input
31 December 1999
1999-12-31
the grammar gives:
<dates>
   <date>
      <day>31</day>
      <month>12</month>
      <year>1999</year>
   </date>
   <date>
      <year>1999</year>
      <nmonth>12</nmonth>
      <day>31</day>
   </date>
</dates>
The requirement is for a notation that says "The rule name is X, but on
serialisation it should be called Y". This applies to nonterminals serialised
both as elements and as attributes, and the notation should be usable both
where a rule is defined and where it is used.
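To make the intended effect concrete, here is a minimal Python sketch of the serialisation side only (not of any ixml processor). It assumes a parse-tree node records the rule that matched plus an optional serialisation name; the `Node` class, its `rename` field and the `serialise` function are hypothetical illustrations.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Node:
    rule: str                          # name of the rule that matched the input
    content: Union[str, List["Node"]]  # leaf text, or a list of child nodes
    rename: Optional[str] = None       # hypothetical: name to use on serialisation


def serialise(node: Node) -> ET.Element:
    # The element is named after the rename if there is one, else after the rule.
    elem = ET.Element(node.rename or node.rule)
    if isinstance(node.content, str):
        elem.text = node.content
    else:
        for child in node.content:
            elem.append(serialise(child))
    return elem


# The ISO-style date: nmonth matched the input, but is serialised as <month>.
iso = Node("date", [
    Node("year", "1999"),
    Node("nmonth", "12", rename="month"),
    Node("day", "31"),
])
print(ET.tostring(serialise(iso), encoding="unicode"))
# <date><year>1999</year><month>12</month><day>31</day></date>
```

Parsing still distinguishes `nmonth` from `month`. Only the element name chosen at output time changes, which is what the requirement asks for.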
POSSIBLE SYNTAXES
Note that for 'renaming' terminals, as with the months above, we already
have the pattern
-"May", +"05"
By analogy, one possibility might be:
-nmonth+month: "0"?, d; "1", ["0"-"2"].
but that leading "-" is misleading, because in ixml "-" marks a hidden
nonterminal, whereas this rule *will* be serialised. Nor does it generalise
to attributes, whose "@" mark occupies the same position.
The following visually suggests a renaming:
nmonth>month: "0"?, d; "1", ["0"-"2"].
Or
nmonth^month: "0"?, d; "1", ["0"-"2"].
but the latter reads badly for attributes, since "^" is already the ixml
mark meaning "serialise as an element". We would need a different renaming
operator for attributes, which I think is overkill.
So my current preference is for ">" to represent a renaming, with the
relevant rules of the ixml grammar becoming:
rule: (mark, s)?, naming, -["=:"], s, -alts, -".".
nonterminal: (mark, s)?, naming.
-naming: name, s, (-">", s, rename, s)?.
@name: namestart, namefollower*.
@rename: name.
Or, more simply, with the mark factored into the naming rule:
rule: naming, -["=:"], s, -alts, -".".
nonterminal: naming.
-naming: (mark, s)?, name, s, (-">", s, rename, s)?.
@name: namestart, namefollower*.
@rename: name.
This means that the serialised version of a grammar remains the same,
except that <rule> and <nonterminal> elements can now also carry a rename
attribute, for example:
<rule name="nmonth" rename="month">
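As a rough check that the proposed naming syntax yields exactly these attributes, here is a minimal Python sketch. It matches the head of a rule written as `(mark, s)?, name, s, (">", s, rename, s)?` followed by ":" or "=", and builds the corresponding `<rule>` element without its body. The regular expression only approximates ixml's `namestart`/`namefollower` classes (an assumption), and `rule_element` is a hypothetical helper, not part of any ixml implementation.

```python
import re
import xml.etree.ElementTree as ET

# Approximation (assumption) of ixml names: a letter or "_", then letters,
# digits, "_", ".", "-", or a few connector characters.
NAME = r"[^\W\d][\w.\-\u00B7\u203F\u2040]*"

# (mark, s)?, name, s, (">", s, rename, s)?, then ":" or "=" starts the body.
RULE_HEAD = re.compile(
    rf"\s*(?:(?P<mark>[@^\-])\s*)?(?P<name>{NAME})\s*"
    rf"(?:>\s*(?P<rename>{NAME})\s*)?[=:]"
)


def rule_element(rule_text: str) -> ET.Element:
    """Return a <rule> element with mark/name/rename attributes (body omitted)."""
    m = RULE_HEAD.match(rule_text)
    if m is None:
        raise ValueError(f"not a rule head: {rule_text!r}")
    elem = ET.Element("rule")
    for attr in ("mark", "name", "rename"):
        if m.group(attr) is not None:  # only emit attributes that were written
            elem.set(attr, m.group(attr))
    return elem


print(ET.tostring(rule_element('nmonth>month: "0"?, d; "1", ["0"-"2"].'),
                  encoding="unicode"))
# <rule name="nmonth" rename="month" />
```

A rule written without ">" produces no rename attribute, so existing grammars serialise exactly as before. The use side (`<nonterminal>`) would follow the same pattern, minus the trailing ":" or "=".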
ROUNDTRIPPING
In passing, it is worth noting that although we haven't yet addressed
roundtripping, this requirement potentially introduces ambiguity when
returning from the XML serialisation to the original form. In the dates
example above, the two forms can be distinguished by the order of day, month
and year, but in the general case there can be two parses:
<month>05</month>
could be roundtripped as
05
or as
May
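The ambiguity can be made concrete with a small brute-force sketch in Python. It assumes `nmonth` is renamed to `month` on serialisation and enumerates the source texts accepted by the two month rules of the dates grammar. `serialise` and `roundtrip_candidates` are hypothetical helpers, and the (element name, text) pair stands in for the XML element.

```python
# month: -"January", +"01"; ...  (the name is deleted, the digits inserted)
MONTH_NAMES = ["January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December"]

# nmonth: "0"?, d; "1", ["0"-"2"].  enumerated: "0".."9", "00".."09", "10".."12"
NMONTH_TEXTS = ([str(d) for d in range(10)]
                + [f"0{d}" for d in range(10)]
                + ["10", "11", "12"])


def serialise(source):
    """(element name, text) for a month source, assuming nmonth>month."""
    if source in MONTH_NAMES:
        return ("month", f"{MONTH_NAMES.index(source) + 1:02d}")
    if source in NMONTH_TEXTS:
        return ("month", source)  # renamed: nmonth's text is kept verbatim
    raise ValueError(f"not a month: {source!r}")


def roundtrip_candidates(element):
    """Every source text that would serialise to the given element."""
    return [s for s in MONTH_NAMES + NMONTH_TEXTS if serialise(s) == element]


print(roundtrip_candidates(("month", "05")))  # ['May', '05']
print(roundtrip_candidates(("month", "5")))   # ['5']
```

Because the renamed element carries no trace of which rule produced it, the reverse mapping is one-to-many for values like "05", which is the ambiguity described above.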
Steven
Received on Saturday, 24 June 2023 13:38:00 UTC