- From: Michal Měchura <michmech@lexiconista.com>
- Date: Tue, 18 Oct 2022 18:45:48 +0200
- To: public-ixml@w3.org
- Message-ID: <d9b538cc-87cd-32a1-e159-f2acc86a8c38@lexiconista.com>
> In the fully general case, the problem is intractable. Grammars can lose
> information. Consider:
>
>     S = 'a', -'.' | 'a', -'?' | 'a', -'!' .
>
> Given <S>a</S>, it’s impossible to know what the input was.

I think this would not be a problem. For the use cases I have in mind, it would be OK to round-trip into /any one/ linearization, even if it isn’t exactly the one from which the XML had been parsed.

For example, let’s say we have an iXML grammar which parses any one of these:

    7 November
    7 Nov
    07 Nov

into this:

    <date day="7" month="11"/>

and then linearizes it back into this:

    7 November

That would be acceptable. We could say that this linearization is the “canonical” one, while the others are “tolerated” for parsing but never output in linearization. There could be some heuristics to choose which linearization is canonical, let’s say always the shortest one (= smallest number of terminals) and/or always the first one listed in the rule.

Well, these are just suggestions from an outsider and a potential iXML user. Take it or leave it. :-)

M.
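To make the two suggested heuristics concrete, here is a minimal Python sketch. It is not part of iXML or of any existing processor. It assumes, purely for illustration, that each alternative of a rule has already been reduced to the ordered list of terminal strings it would emit, including terminals marked as deleted with `-` in the grammar. The names `first_listed` and `fewest_terminals` are hypothetical.

```python
from typing import Sequence

# One alternative = the terminals it emits, in order. Terminals that the
# grammar marks as deleted (-'...') are included, because linearization
# has to write them back out.
Alternative = Sequence[str]


def first_listed(alternatives: Sequence[Alternative]) -> str:
    """Heuristic 1: the canonical form is the first alternative in the rule."""
    return "".join(alternatives[0])


def fewest_terminals(alternatives: Sequence[Alternative]) -> str:
    """Heuristic 2: the canonical form has the smallest number of terminals.

    min() returns the first minimal element, so ties fall back to rule order.
    """
    return "".join(min(alternatives, key=len))


# The lossy rule from the quoted message:
#     S = 'a', -'.' | 'a', -'?' | 'a', -'!' .
S_ALTERNATIVES = [["a", "."], ["a", "?"], ["a", "!"]]

# A made-up rule whose alternatives differ in terminal count.
T_ALTERNATIVES = [["x", "y", "z"], ["x"], ["x", "y"]]

if __name__ == "__main__":
    print(first_listed(S_ALTERNATIVES))      # all three tie, rule order decides
    print(fewest_terminals(S_ALTERNATIVES))
    print(fewest_terminals(T_ALTERNATIVES))
```

For the rule `S`, both heuristics pick the same alternative, because every alternative has two terminals and the tie is broken by rule order. They only diverge when a later alternative is shorter than the first, as in the made-up rule `T_ALTERNATIVES`.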
Received on Tuesday, 18 October 2022 16:46:07 UTC