- From: Peintner, Daniel (ext) <daniel.peintner.ext@siemens.com>
- Date: Fri, 4 Mar 2016 11:07:38 +0000
- To: Takuki Kamiya <tkamiya@us.fujitsu.com>, "public-exi@w3.org" <public-exi@w3.org>
Hi Taki,

My proposal is much simpler. I would simply warn readers of our spec document about xml:space="preserve" and its consequences (maybe suggesting the use of the Preserve.LexicalValues option).

Technically, I propose the following:

- When lexicalPreserve is true, CH[typed] can and SHOULD be used in all cases, given that the restricted character sets can represent any string value.
- When lexicalPreserve is not true, first try the [typed] production and use the [untyped] production only if this fails.

I think this matches what OpenEXI, EXIficient and other implementations would naturally do...

Hope this clarifies my proposal,

-- Daniel

________________________________
From: Takuki Kamiya [tkamiya@us.fujitsu.com]
Sent: Friday, 4 March 2016 00:05
To: Peintner, Daniel (ext); public-exi@w3.org
Subject: RE: Whitespace preservation mode

Hi Daniel,

In both plan A and B, when xml:space="preserve" is in effect, are implementations supposed to pick different productions depending on whether lexicalPreserve is used or not? When lexicalPreserve is true, CH [typed] is used, and otherwise CH [untyped] is used.

Regarding the differences between plan A and B, B allows the use of CH [typed] in strict mode. Is this a correct understanding?

If so, I think this makes the processing more complex because it essentially reverses the natural production selection order. OpenEXI, for example, first tries to use CH [typed], and only if that fails does it fall back to CH [untyped]. However, plan B requires it to do this the other way around when xml:space="preserve" is in effect (i.e. try CH [untyped] first, then fall back to CH [typed] if CH [untyped] is not available).
Thank you,

Takuki Kamiya
Fujitsu Laboratories of America

-----Original Message-----
From: Peintner, Daniel (ext) [mailto:daniel.peintner.ext@siemens.com]
Sent: Thursday, March 03, 2016 1:34 AM
To: Takuki Kamiya; public-exi@w3.org
Subject: AW: Whitespace preservation mode

Hi Taki, all,

I do see the issues you raised and I am not sure if I have the "one" answer. I think we can follow different paths, all of which have pros and cons.

A) If xml:space="preserve" is in effect, the only thing that can really guarantee the requested preservation is either lexicalPreserve set to true OR using the AT[untyped] or CH[untyped] productions. This also means that in strict mode only the lexicalPreserve feature remains, and if it is not set, encoding fails.

B) A less accurate approach is to "warn" users in the document about xml:space="preserve" and inform them that lexicalPreserve should be used. If that is not the case, we still allow (and in Canonical EXI require) the use of the CH/AT[typed] production as long as possible. This means that all your list examples are mapped to the same canonical EXI representation.

List-1. "A⬚B⬚C"
List-2. "⬚A⬚B⬚C⬚"
List-3. "A⬚⬚B⬚⬚C"

CH/AT[untyped] would be used if the type does not match at all (e.g., value "X12" for xsd:int).

C) Yet another possibility is to require implementations to follow a given behavior, like collapsing whitespace and doing all other checks. Practically, I think this is not feasible.

Having said that, I tend to be in favor of the simplest approach, B). I believe this is also the approach followed by most implementations so far.

Any thoughts/opinions?

Thanks,

-- Daniel

________________________________
From: Takuki Kamiya [tkamiya@us.fujitsu.com]
Sent: Tuesday, 1 March 2016 22:27
To: Peintner, Daniel (ext); public-exi@w3.org
Subject: RE: Whitespace preservation mode

Hi,

Assume we are encoding schema-informed EXI with the following settings.

- Non-strict mode.
- xml:space="preserve" is in effect.

When the associated type is xsd:int, the following are examples of valid instances.

Int-1. <A>123</A>
Int-2. <A>⬚123⬚⬚</A>

CH [typed] can be used for Int-1, while you have to use CH [untyped] for Int-2. What distinguishes the two cases? One can say Int-2 has whitespace surrounding (i.e. leading and trailing) the number.

Let's next take a look at another example using a list datatype.

List-1. "A⬚B⬚C"
List-2. "⬚A⬚B⬚C⬚"
List-3. "A⬚⬚B⬚⬚C"

CH [typed] can be used for List-1. One has to use CH [untyped] for List-2 because it has surrounding whitespace. But what about List-3? It does not have surrounding whitespace. It contains collapsible whitespace between list items. In order to preserve that whitespace, you also need to use CH [untyped] for List-3.

The criterion should then be rephrased as "having any collapsible whitespace". Is my thinking correct?

Takuki Kamiya
Fujitsu Laboratories of America

-----Original Message-----
From: Takuki Kamiya [mailto:tkamiya@us.fujitsu.com]
Sent: Monday, February 29, 2016 3:03 PM
To: Peintner, Daniel (ext); public-exi@w3.org
Subject: RE: Whitespace preservation mode

Hi Daniel,

Let's use the following XML snippets as an example, and assume in both cases the value is typed as xsd:int.

1. <A> 123 </A>
2. <A>123</A>

In case #1, the data "123" is surrounded by whitespace. When xml:space="preserve" is in effect, and the EXI grammar in use is *not* strict, case #1 will be encoded using the CH [untyped] production. On the other hand, case #2 will be encoded using the CH [typed] production because it does not contain whitespace around the number.

When the EXI grammar in use *is* strict, the encoding of #1 will fail, as you mentioned in the document.

Do you share the same understanding?
Thank you,

Takuki Kamiya
Fujitsu Laboratories of America

-----Original Message-----
From: Peintner, Daniel (ext) [mailto:daniel.peintner.ext@siemens.com]
Sent: Monday, January 11, 2016 9:27 AM
To: Takuki Kamiya; public-exi@w3.org
Subject: AW: Whitespace preservation mode

All,

I started to define whitespace handling rules in the spirit of the current TTFMS rules [1]. Please find a first draft here [2].

I think we could add advice for users

* to use preserve.LexicalValue if encoding fails
* to use xml:space="preserve" if canonicalization is expected to preserve as much whitespace as possible

Do you have any comments and/or feedback?

Thanks,

-- Daniel

[1] https://lists.w3.org/Archives/Public/public-exi/2015Oct/0008.html
[2] https://www.w3.org/XML/EXI/docs/canonical/canonical-exi.html#whitespaceHandling

________________________________
From: Takuki Kamiya [tkamiya@us.fujitsu.com]
Sent: Tuesday, 1 December 2015 03:51
To: public-exi@w3.org
Subject: Whitespace preservation mode

Hi,

When there is a type associated with an element, the content type information gives you an idea of what to do with whitespace during encoding. However, in schema-less situations, the best you can do is guess what is expected, unless xml:space is specified.
I am not sure this heuristic is always correct. I think we may need to provide a canonicalization mode in which canonicalization is expected to preserve as much whitespace as possible.

Thank you,

Takuki Kamiya
Fujitsu Laboratories of America
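
The production selection order discussed in this thread, together with the "having any collapsible whitespace" criterion, can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not code taken from OpenEXI or EXIficient: the function names (collapse, has_collapsible_whitespace, is_int, select_production) are hypothetical, the xsd:int check is simplified (no range check), and the ⬚ marker used in the messages is written here as an ordinary space.

```python
import re


def collapse(value: str) -> str:
    """Approximate XML Schema whiteSpace="collapse":
    merge runs of space/tab/LF/CR into one space and trim both ends."""
    return re.sub(r"[ \t\n\r]+", " ", value).strip(" ")


def has_collapsible_whitespace(value: str) -> bool:
    """True if collapsing would change the value (Int-2, List-2, List-3)."""
    return collapse(value) != value


def is_int(value: str) -> bool:
    """Simplified xsd:int check on the collapsed value (assumption:
    the 32-bit range check is omitted for brevity)."""
    return re.fullmatch(r"[+-]?[0-9]+", collapse(value)) is not None


def select_production(value, matches_type, lexical_preserve=False, strict=False):
    """Selection order from the 4 March 2016 proposal:
    - lexicalPreserve true: always CH[typed]
    - otherwise: try CH[typed] first, fall back to CH[untyped]
    - strict mode has no CH[untyped], so the fallback fails."""
    if lexical_preserve:
        return "CH[typed]"
    if matches_type(value):
        return "CH[typed]"
    if strict:
        raise ValueError("value does not match the type and strict mode has no CH[untyped]")
    return "CH[untyped]"


if __name__ == "__main__":
    for v in ["123", " 123  ", "X12"]:
        print(repr(v), has_collapsible_whitespace(v), select_production(v, is_int))
```

Under this selection order, " 123  " and "123" both take the CH[typed] production, so the surrounding whitespace is not preserved unless lexicalPreserve is set. That is the trade-off of plan B described above; has_collapsible_whitespace only shows which values would be affected.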
Received on Friday, 4 March 2016 11:08:19 UTC