
Re: Round-tripping ixml?

From: Steven Pemberton <steven.pemberton@cwi.nl>
Date: Tue, 18 Oct 2022 18:52:00 +0000
Message-Id: <1666119036943.4149247338.607397384@cwi.nl>
To: Michal Měchura <michmech@lexiconista.com>, public-ixml@w3.org
This is exactly what I suggest in my paper: unparsing is just parsing in reverse, and in a case like this you would get an ambiguous parse, from which one serialisation would be chosen.


On Tuesday 18 October 2022 18:45:48 (+02:00), Michal Měchura wrote:

In the fully general case, the problem is intractable.

Grammars can lose information. Consider:

S = 'a', -'.'
  | 'a', -'?'
  | 'a', -'!' .

Given <S>a</S>, it’s impossible to know what the input was.
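To make the information loss concrete, here is a minimal Python sketch. It is an illustration only, not an ixml processor. The rule S is hand-encoded as a list of alternatives, with a hidden flag standing in for the "-" mark. parse serialises whatever alternative matches the input. unparse runs the rule in reverse by collecting every input whose non-hidden literals equal the element content. The encoding and the function names are hypothetical.

```python
# Hypothetical hand-encoding of the ixml rule
#   S = 'a', -'.' | 'a', -'?' | 'a', -'!' .
# Each alternative is a list of (literal, hidden) pairs; hidden=True models
# the "-" mark: the literal is consumed when parsing but not serialised.
S_ALTERNATIVES = [
    [("a", False), (".", True)],
    [("a", False), ("?", True)],
    [("a", False), ("!", True)],
]


def parse(text):
    """Return the XML serialisation for every alternative that matches text exactly."""
    results = []
    for alt in S_ALTERNATIVES:
        if "".join(lit for lit, _ in alt) == text:
            visible = "".join(lit for lit, hidden in alt if not hidden)
            results.append(f"<S>{visible}</S>")
    return results


def unparse(content):
    """Run the rule 'in reverse': every input whose visible literals equal content."""
    return [
        "".join(lit for lit, _ in alt)
        for alt in S_ALTERNATIVES
        if "".join(lit for lit, hidden in alt if not hidden) == content
    ]


for source in ["a.", "a?", "a!"]:
    print(source, "->", parse(source))  # each prints ['<S>a</S>']

print(unparse("a"))  # ['a.', 'a?', 'a!']: three equally valid inputs
```

Because all three inputs produce the same element, the reverse run is ambiguous. An unparser has to pick one of the candidates.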

I think this would not be a problem. For the use cases I have in mind, it would be OK to round-trip into any one linearization, even if it isn’t exactly the one from which the XML had been parsed. For example, let’s say we have an iXML grammar which parses any one of these:

7 November
7 Nov
07 Nov

into this:

<date day="7" month="11"/>

and then linearizes it back into this:

7 November

That would be acceptable. We could say that this linearization is the “canonical” one while the others are “tolerated” for parsing but never output in linearization. There could be some heuristics to choose which linearization is canonical, let’s say always the shortest one (= smallest number of terminals) and/or always the first one listed in the rule.
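Here is a minimal Python sketch of the two suggested heuristics. It assumes a hypothetical grammar in which the month can be written "November" or "Nov", listed in that order, and the day can be written with or without a leading zero. The helper names (parse, to_xml, linearizations, canonical) are illustrative only. String length in characters stands in for "number of terminals".

```python
import re

# Hypothetical month alternatives, in the order a grammar might list them,
# e.g.  month = "November" | "Nov" .  Only November is modelled here.
MONTH_ALTERNATIVES = {11: ["November", "Nov"]}


def day_alternatives(day):
    """Hypothetical day alternatives: plain digits first, then zero-padded (deduplicated)."""
    return list(dict.fromkeys([str(day), f"{day:02d}"]))


def parse(text):
    """Map '7 November', '7 Nov' or '07 Nov' to the attributes of the date element."""
    m = re.fullmatch(r"(\d{1,2}) (\w+)", text)
    if m is None:
        raise ValueError(f"not a date: {text!r}")
    for number, names in MONTH_ALTERNATIVES.items():
        if m.group(2) in names:
            return {"day": str(int(m.group(1))), "month": str(number)}
    raise ValueError(f"unknown month: {m.group(2)!r}")


def to_xml(attrs):
    return f'<date day="{attrs["day"]}" month="{attrs["month"]}"/>'


def linearizations(attrs):
    """Every string this sketch grammar would parse into the given attributes."""
    day, month = int(attrs["day"]), int(attrs["month"])
    return [f"{d} {name}" for name in MONTH_ALTERNATIVES[month] for d in day_alternatives(day)]


def canonical(attrs, heuristic="first"):
    """Pick one linearization: the first listed in the rules, or the shortest."""
    candidates = linearizations(attrs)
    if heuristic == "first":
        return candidates[0]
    return min(candidates, key=len)  # min() keeps the earliest of equally short ones


attrs = parse("07 Nov")
print(to_xml(attrs))                # <date day="7" month="11"/>
print(canonical(attrs, "first"))    # 7 November
print(canonical(attrs, "shortest")) # 7 Nov
```

In this example the two heuristics disagree. "First listed" gives back 7 November, as in the example above, while "shortest" picks 7 Nov. So a processor would have to settle on one rule, or on an order for applying both.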

Well, these are just suggestions from an outsider and a potential iXML user. Take it or leave it. :-)


Received on Tuesday, 18 October 2022 18:52:18 UTC
