Renaming and ambiguity

Good morning,

I spent a few minutes this morning exploring how the ISO8601:2004
grammar[1] might be extended to support ISO8601:2019. I don’t actually
*have* a copy of ISO8601:2019, of course, but (at least some of) the
extensions are described in Extended Date/Time Format (EDTF)[2].

This exercise really drove home that we need to sort out renaming. It’s
usually possible to fake it; here’s a fragment of the grammar that deals
with uncertain and approximate dates:

-certain-reduced-accuracy-calendar-date = year, -'-', month; year; century .
-uncertain-reduced-accuracy-calendar-date = certain-reduced-accuracy-calendar-date, @certainty .
certainty = uncertain ; approximate ; uncertain-and-approximate .
uncertain = +"uncertain", -'?' .
approximate = +"approximate", -'~' .
uncertain-and-approximate = +"uncertain-and-approximate", -'%' .

That parses “1967-06%” into:

<date certainty="uncertain-and-approximate">
   <year>1967</year>
   <month>06</month>
</date>
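To make the mapping from suffix character to attribute value concrete, here is a rough Python sketch that builds the same XML shape for the year-month case without ixml at all. It is only an illustration under stated assumptions: the regular expression, the CERTAINTY table and parse_year_month are hypothetical names, and only YYYY-MM with an optional ?, ~ or % suffix is handled (no bare years or centuries).

```python
import re
import xml.etree.ElementTree as ET

# Suffix characters from the grammar fragment, mapped to the values the
# ixml insertions (+"...") put into the certainty attribute.
CERTAINTY = {
    "?": "uncertain",
    "~": "approximate",
    "%": "uncertain-and-approximate",
}

# Hypothetical pattern: year-month only, with an optional certainty suffix.
PATTERN = re.compile(r"^(\d{4})-(\d{2})([?~%])?$")


def parse_year_month(text: str) -> ET.Element:
    """Build the <date> shape the grammar fragment yields for YYYY-MM[?~%]."""
    m = PATTERN.match(text)
    if m is None:
        raise ValueError(f"not a year-month date: {text!r}")
    year, month, suffix = m.groups()
    date = ET.Element("date")
    if suffix:  # certain dates get no attribute at all
        date.set("certainty", CERTAINTY[suffix])
    ET.SubElement(date, "year").text = year
    ET.SubElement(date, "month").text = month
    return date


print(ET.tostring(parse_year_month("1967-06%"), encoding="unicode"))
# prints <date certainty="uncertain-and-approximate"><year>1967</year><month>06</month></date>
```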

But if I needed to express a slightly different measure of certainty
somewhere else, I’d be unable to spell it “certainty”, because the
attribute name comes from the nonterminal name and that one is already
taken. Also, for seasonal dates, what I initially wanted to do was this:

season-date = year, -'-', @spring, -'21'
            ; year, -'-', @summer, -'22'
            ; year, -'-', @autumn, -'23'
            ; year, -'-', @winter, -'24' .
-spring = +"spring" .
-summer = +"summer" .
-autumn = +"autumn" .
-winter = +"winter" .

But I wanted the attribute to be named ‘season’ in each case. The
simplest thing would have been a mechanism for renaming the attribute.
In this case, it was possible to reorganize the grammar to achieve what
I wanted, along the same lines as the uncertainty case:

season-date = year, -'-', @season .
season = spring ; summer ; autumn ; winter .
-spring = -"21", +"spring" .
-summer = -"22", +"summer" .
-autumn = -"23", +"autumn" .
-winter = -"24", +"winter" .

But again that only works as long as I have no other need for
nonterminals with those names. It also, I think, depends on the fact
that I’m suppressing the numbers. And I’m not convinced the resulting
grammar is as clear (though that’s partly a matter of taste and opinion,
I expect).
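Without a renaming mechanism in the grammar, one workaround is to keep the first version and rename attributes after parsing. A minimal Python sketch, assuming the first grammar yields something like <season-date summer="summer"><year>1967</year></season-date> for “1967-22”; RENAME and rename_attributes are hypothetical names, and this is post-processing outside ixml, not a feature of it:

```python
import xml.etree.ElementTree as ET

# Hypothetical renaming table: attribute names produced by @spring, @summer,
# etc. mapped to the single attribute name actually wanted.
RENAME = {"spring": "season", "summer": "season",
          "autumn": "season", "winter": "season"}


def rename_attributes(root: ET.Element, renames: dict[str, str]) -> ET.Element:
    """Rename attributes in place throughout the tree and return the root."""
    for el in root.iter():
        for old, new in renames.items():
            if old in el.attrib:
                # Refuse to silently overwrite an attribute that already exists.
                if new in el.attrib:
                    raise ValueError(f"<{el.tag}> already has a {new!r} attribute")
                el.set(new, el.attrib.pop(old))
    return root


doc = ET.fromstring('<season-date summer="summer"><year>1967</year></season-date>')
print(ET.tostring(rename_attributes(doc, RENAME), encoding="unicode"))
# prints <season-date season="summer"><year>1967</year></season-date>
```

The catch is that the renaming now lives outside the grammar, so anyone reading the grammar alone won’t see the name that finally appears in the XML.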

The next thing I noticed is that the grammar is necessarily ambiguous.
For better or worse, “1934-12” is either the time 7:34pm (19:34 in
basic format) with a UTC offset of -12 hours, or December 1934. It
isn’t possible to change the grammar so that “1934-12” isn’t ambiguous,
so I find myself wanting a way of expressing that one reading should be
preferred over the other. But maybe that’s an implementation/API issue,
not a specification issue. I’m not sure.
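If preference is treated as an API issue, a caller could ask for every reading and choose among them. A sketch in Python, with regular expressions standing in for the two competing grammar paths; the labels “calendar-month” and “time-with-offset”, the READINGS table and both functions are hypothetical, not part of the ISO 8601 grammar or any ixml processor:

```python
import re

# Hypothetical stand-ins for two grammar paths that can both match "1934-12".
# Each pattern checks its fields a little (month 01-12, hour 00-23, minute 00-59).
READINGS = [
    ("calendar-month",
     re.compile(r"^(?P<year>\d{4})-(?P<month>0[1-9]|1[0-2])$")),
    ("time-with-offset",
     re.compile(r"^(?P<hour>[01]\d|2[0-3])(?P<minute>[0-5]\d)"
                r"(?P<offset>[+-](?:[01]\d|2[0-3]))$")),
]


def all_parses(text: str) -> list[tuple[str, dict[str, str]]]:
    """Return every reading whose pattern matches the whole string."""
    results = []
    for label, pattern in READINGS:
        m = pattern.match(text)
        if m:
            results.append((label, m.groupdict()))
    return results


def preferred_parse(text: str, preference: list[str]) -> tuple[str, dict[str, str]]:
    """Pick the first matching reading in the caller's preference order."""
    parses = dict(all_parses(text))
    for label in preference:
        if label in parses:
            return label, parses[label]
    raise ValueError(f"no preferred reading for {text!r}")


print(preferred_parse("1934-12", ["calendar-month", "time-with-offset"]))
# prints ('calendar-month', {'year': '1934', 'month': '12'})
```

Here the preference order is a parameter supplied by the caller rather than something written into the grammar, which is one way of keeping it out of the specification.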

                                        Be seeing you,
                                          norm

[1] https://github.com/invisibleXML/ixml/tree/master/samples/ISO-8601-2004
[2] https://www.loc.gov/standards/datetime/

--
Norm Tovey-Walsh
Saxonica

Received on Thursday, 20 April 2023 09:04:13 UTC