
Re: Round-tripping ixml?

From: Bethan Tovey-Walsh <accounts@bethan.wales>
Date: Wed, 19 Oct 2022 15:25:11 +0100
Message-Id: <B6F7E9C4-A0BE-4972-B975-EC1595ED21F4@bethan.wales>
Cc: Michal Měchura <michmech@lexiconista.com>, public-ixml@w3.org
To: Steven Pemberton <steven.pemberton@cwi.nl>
Yes - the only thing in Michal’s description which isn’t possible in iXML v1.0 is the use of heuristics to choose the canonical form, I think? 
Dr. Bethan Tovey-Walsh 
Myfyrwraig PhD | PhD Student CorCenCC 
Prifysgol Abertawe | Swansea University 
You are welcome to write to me in Welsh.

> On 18 Oct 2022, at 19:52, Steven Pemberton <steven.pemberton@cwi.nl> wrote:
> This is exactly what I suggest in my paper: unparsing is just parsing in reverse, and in a case like this you would get an ambiguous parse from which one serialisation would get chosen.
> Steven
> On Tuesday 18 October 2022 18:45:48 (+02:00), Michal Měchura wrote:
> In the fully general case, the problem is intractable.
> Grammars can lose information. Consider:
> S = 'a', -'.'
>   | 'a', -'?'
>   | 'a', -'!' .
> Given <S>a</S>, it’s impossible to know what the input was.
> I think this would not be a problem. For the use cases I have in mind, it would be OK to round-trip into any one linearization, even if it isn’t exactly the one from which the XML had been parsed. For example, let’s say we have an iXML grammar which parses any one of these:
> 7 November
> 7 Nov
> 07 Nov
> into this:
> <date day="7" month="11"/>
> and then linearizes it back into this:
> 7 November
> That would be acceptable. We could say that this linearization is the “canonical” one while the others are “tolerated” for parsing but never output in linearization. There could be some heuristics to choose which linearization is canonical, let’s say always the shortest one (= smallest number of terminals) and/or always the first one listed in the rule.
> Well, these are just suggestions from an outsider and a potential iXML user. Take it or leave it. :-)
> M.
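
The two heuristics suggested in the quoted message (pick the first alternative listed in the rule, or pick the one with the fewest terminals) can be sketched roughly as below. This is only an illustration in Python, not part of iXML v1.0 or of any iXML processor. It assumes that a hypothetical unparser has already worked out which alternatives could have produced the XML, and that each alternative is modelled as a plain list of terminal strings. The names first_listed, fewest_terminals and linearize are invented for this sketch.

```python
from typing import Callable, Sequence

# Hypothetical model: one alternative = the sequence of terminals it would emit,
# including terminals that the grammar deletes from the XML (such as -'.').
Alternative = Sequence[str]
Chooser = Callable[[Sequence[Alternative]], Alternative]


def first_listed(alternatives: Sequence[Alternative]) -> Alternative:
    """Treat the first alternative in the rule as the canonical one."""
    return alternatives[0]


def fewest_terminals(alternatives: Sequence[Alternative]) -> Alternative:
    """Treat the alternative with the fewest terminals as canonical.

    min() returns the first minimal element, so ties fall back to rule order.
    """
    return min(alternatives, key=len)


def linearize(alternatives: Sequence[Alternative],
              choose: Chooser = first_listed) -> str:
    """Serialise the chosen alternative by concatenating its terminals."""
    return "".join(choose(alternatives))


# The three alternatives of S that all serialise to <S>a</S>.
S_ALTERNATIVES = [["a", "."], ["a", "?"], ["a", "!"]]

# The three spellings that all parse to day 7, month 11.
DATE_ALTERNATIVES = [
    ["7", " ", "November"],
    ["7", " ", "Nov"],
    ["07", " ", "Nov"],
]

if __name__ == "__main__":
    print(linearize(S_ALTERNATIVES))
    print(linearize(DATE_ALTERNATIVES))
    print(linearize(DATE_ALTERNATIVES, fewest_terminals))
```

In both examples every alternative has the same number of terminals under this model, so the two heuristics agree and fall back to rule order. They would only differ for a rule whose alternatives vary in length.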

Received on Wednesday, 19 October 2022 14:25:30 UTC
