Re: Round-tripping ixml? from Michal Měchura on 2022-10-18 (public-ixml@w3.org from October 2022)

From: Michal Měchura <michmech@lexiconista.com>
Date: Tue, 18 Oct 2022 18:45:48 +0200
To: public-ixml@w3.org
Message-ID: <d9b538cc-87cd-32a1-e159-f2acc86a8c38@lexiconista.com>

    In the fully general case, the problem is intractable.

    Grammars can lose information. Consider:

    |S = 'a', -'.' | 'a', -'?' | 'a', -'!' . |

    Given |<S>a</S>|, it’s impossible to know what the input was.

I think this would not be a problem. For the use cases I have in mind, 
it would be OK to round-trip into /any one/ linearization, even if it 
isn’t exactly the one from which the XML had been parsed. For example, 
let’s say we have an iXML grammar which parses any one of these:

|7 November|
|7 Nov|
|07 Nov|

into this:

|<date day="7" month="11"/>|

and then linearizes it back into this:

|7 November |

That would be acceptable. We could say that this linearization is the 
“canonical” one while the others are “tolerated” for parsing but never 
output in linearization. There could be some heuristics to choose which 
linearization is canonical, for example always the shortest one (the one 
with the fewest terminals) and/or always the first one listed in the 
rule.
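
A minimal sketch of these two heuristics, in Python and under assumptions that are not part of iXML: a rule is modelled as a list of alternatives in the order they are listed, each alternative is a flat list of terminal strings, and the names `canonical`, `linearize` and `S_ALTERNATIVES` are hypothetical. A real grammar with nonterminals, repetitions and character sets would need considerably more than this.

```python
from typing import List, Sequence

# One alternative of a rule, reduced to its terminals (an assumption made
# for illustration; deleted terminals such as -'.' are still listed here,
# because they must be written out again when linearizing).
Alternative = Sequence[str]

# The rule quoted above:  S = 'a', -'.' | 'a', -'?' | 'a', -'!' .
S_ALTERNATIVES: List[Alternative] = [
    ["a", "."],
    ["a", "?"],
    ["a", "!"],
]


def canonical(alternatives: Sequence[Alternative],
              prefer: str = "shortest") -> Alternative:
    """Choose the alternative that will be used for output.

    prefer="first"    -> the first alternative listed in the rule
    prefer="shortest" -> the alternative with the fewest terminals;
                         ties go to the one listed earliest
    """
    if not alternatives:
        raise ValueError("rule has no alternatives")
    if prefer == "first":
        return alternatives[0]
    if prefer == "shortest":
        # min() returns the first of several equally small items,
        # so listing order breaks ties.
        return min(alternatives, key=len)
    raise ValueError(f"unknown heuristic: {prefer!r}")


def linearize(alternative: Alternative) -> str:
    """Write the chosen alternative back out as text."""
    return "".join(alternative)


if __name__ == "__main__":
    # <S>a</S> does not record which punctuation mark was parsed, so one
    # linearization is picked by rule rather than recovered from the XML.
    print(linearize(canonical(S_ALTERNATIVES)))
```

With the quoted rule, all three alternatives have two terminals, so both heuristics fall back to listing order and `<S>a</S>` is written out as `a.`. Which alternative counts as "first" or "shortest" in the date example depends on how the grammar is written, so that case is left out of the sketch.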

Well, these are just suggestions from an outsider and a potential iXML 
user. Take it or leave it. :-)

M.

Received on Tuesday, 18 October 2022 16:46:07 UTC