- From: Peintner, Daniel (ext) <daniel.peintner.ext@siemens.com>
- Date: Wed, 23 Mar 2016 13:57:45 +0000
- To: Takuki Kamiya <tkamiya@us.fujitsu.com>, "public-exi@w3.org" <public-exi@w3.org>
Hi Taki,

the proposed change makes perfect sense to me. I have already implemented your updates (see [1]).

Thanks,

-- Daniel

[1] https://www.w3.org/XML/EXI/docs/canonical/canonical-exi.html#whitespaceSimpleData

________________________________
From: Takuki Kamiya [tkamiya@us.fujitsu.com]
Sent: Wednesday, 23 March 2016 04:36
To: Peintner, Daniel (ext); public-exi@w3.org
Subject: RE: Whitespace preservation mode

Hi Daniel,

Section "4.3.3.1 Simple Whitespace Data" in Canonical EXI [1] says:

"When the grammar in effect is a schema-informed grammar apply lexical rule and use whiteSpace facet."

I think we should slightly change this to something like:

"When the grammar in effect is a schema-informed grammar apply lexical rule and use whiteSpace facet if any to normalize whitespaces."

This change is to cover anySimpleType and union datatypes, which don't have an associated whiteSpace facet.

[1] https://www.w3.org/XML/EXI/docs/canonical/canonical-exi.html#whitespaceSimpleData

Thank you,

Takuki Kamiya
Fujitsu Laboratories of America

-----Original Message-----
From: Peintner, Daniel (ext) [mailto:daniel.peintner.ext@siemens.com]
Sent: Wednesday, March 16, 2016 8:51 AM
To: Takuki Kamiya; public-exi@w3.org
Subject: AW: Whitespace preservation mode

Hi Taki, all,

I took another look at the issue and modified some of the prose in the editor draft [1].

> On the other hand, for structural whitespaces, an encoding process in strict
> mode does not need to fail simply because it did not find CH event that would
> have allowed it to encode structural whitespaces. We should let it continue to
> process. We just need to make sure to encode structural whitespaces when CH
> events are available.

The updates no longer require the process to fail because it did not find a CH event.
Moreover, the draft states that, according to the rules, an encoder SHOULD preserve as many whitespaces as possible. It also notes that in certain cases preserving all whitespaces is not possible. I hope that also resolves the issue for structural whitespaces.

An alternative I see for "Complex Whitespace Data in mixed content" is to add a rule like this:

"When the grammar in effect is a schema-informed grammar and the content model is with mixed content (we would need to define how mixed content w.r.t. the EXI grammars is detected) all whitespaces MUST be preserved."

That said, I think this is overcomplicated and I would rather not see it.

Thanks,

-- Daniel

[1] https://www.w3.org/XML/EXI/docs/canonical/canonical-exi.html#whitespaceHandling

________________________________
From: Takuki Kamiya [tkamiya@us.fujitsu.com]
Sent: Monday, 7 March 2016 22:24
To: Peintner, Daniel (ext); public-exi@w3.org
Subject: RE: Whitespace preservation mode

Hi Daniel,

I think we had better approach this issue more consistently for both structural whitespaces and simple data whitespaces.

We can think of xml:space="preserve" as a desire to keep as many whitespaces as possible.

For simple data whitespaces, I think your proposed approach works well enough.

On the other hand, for structural whitespaces, an encoding process in strict mode does not need to fail simply because it did not find a CH event that would have allowed it to encode structural whitespaces. We should let it continue processing. We just need to make sure to encode structural whitespaces when CH events are available.

Therefore, I propose to revisit the issue [1] and make it more consistent with the approach we are taking for simple data whitespaces.
[1] https://lists.w3.org/Archives/Public/public-exi/2015Nov/0033.html

Takuki Kamiya
Fujitsu Laboratories of America

-----Original Message-----
From: Peintner, Daniel (ext) [mailto:daniel.peintner.ext@siemens.com]
Sent: Friday, March 04, 2016 3:08 AM
To: Takuki Kamiya; public-exi@w3.org
Subject: AW: Whitespace preservation mode

Hi Taki,

my proposal is much simpler. I would purely warn the readers of our spec document about xml:space="preserve" and its consequences (maybe suggesting the use of the Preserve.LexicalValues option).

Technically I propose doing the following:

When lexicalPreserve is true, CH[typed] can and SHOULD be used in all cases, given that the restricted character sets can represent any string value.

When lexicalPreserve is not true, first try the [typed] production and use the [untyped] production only if this fails.

I think this matches what OpenEXI, EXIficient and other implementations would naturally do...

Hope this clarifies my proposal,

-- Daniel

________________________________
From: Takuki Kamiya [tkamiya@us.fujitsu.com]
Sent: Friday, 4 March 2016 00:05
To: Peintner, Daniel (ext); public-exi@w3.org
Subject: RE: Whitespace preservation mode

Hi Daniel,

In both plan A and B, when xml:space="preserve" is in effect, are implementations supposed to pick different productions depending on whether lexicalPreserve is used or not? When lexicalPreserve is true, CH [typed] is used, and otherwise CH [untyped] is used. Regarding the differences between plan A and B, B allows the use of CH [typed] in strict mode.
Is this a correct understanding?

If so, I think this makes the processing more complex because it essentially reverses the natural production selection order. OpenEXI, for example, first tries to use CH [typed] and, only if that fails, falls back to CH [untyped]. However, plan B requires it to do this the other way around when xml:space="preserve" is in effect (i.e. try CH [untyped] first, then fall back to CH [typed] if CH [untyped] was not available).

Thank you,

Takuki Kamiya
Fujitsu Laboratories of America

-----Original Message-----
From: Peintner, Daniel (ext) [mailto:daniel.peintner.ext@siemens.com]
Sent: Thursday, March 03, 2016 1:34 AM
To: Takuki Kamiya; public-exi@w3.org
Subject: AW: Whitespace preservation mode

Hi Taki, all,

I do see the issues you raised and I am not sure I have the "one" answer. I think we can follow different paths, each with its pros and cons.

A) If xml:space="preserve" is in effect, the only things that can really guarantee the requested preservation are either lexicalPreserve set to true OR the AT[untyped] or CH[untyped] productions. This also means that in strict mode only the lexicalPreserve feature remains, and if it is not set, encoding fails.

B) A less accurate approach is to "warn" users in the document about xml:space="preserve" and inform them that lexicalPreserve should be used. If it is not used, we still allow (and in Canonical EXI require) the CH/AT[typed] production to be used as long as possible. This means that all your list examples are mapped to the same canonical EXI representation.

List-1. "A⬚B⬚C"
List-2. "⬚A⬚B⬚C⬚"
List-3. "A⬚⬚B⬚⬚C"

CH/AT[untyped] would be used if the type does not match at all (e.g., value "X12" for xsd:int).

C) Yet another possibility is to require implementations to follow a given behavior, like collapsing whitespaces and doing all other checks. Practically, I think this is not feasible.

Having said that, I tend to favor the simplest approach, B). I believe this is also the approach followed by most implementations so far.

Any thoughts/opinions?

Thanks,

-- Daniel

________________________________
From: Takuki Kamiya [tkamiya@us.fujitsu.com]
Sent: Tuesday, 1 March 2016 22:27
To: Peintner, Daniel (ext); public-exi@w3.org
Subject: RE: Whitespace preservation mode

Hi,

Assume we are encoding schema-informed EXI with the following settings.

- Non-strict mode.
- xml:space="preserve" is in effect.

When the associated type is xsd:int, the following are examples of valid instances.

Int-1. <A>123</A>
Int-2. <A>⬚123⬚⬚</A>

CH [typed] can be used for Int-1, while you have to use CH [untyped] for Int-2.

What distinguishes the two cases? One can say Int-2 has whitespaces surrounding (i.e. leading and trailing) the number.

Let's next take a look at another example using a list datatype.

List-1. "A⬚B⬚C"
List-2. "⬚A⬚B⬚C⬚"
List-3. "A⬚⬚B⬚⬚C"

CH [typed] can be used for List-1. One has to use CH [untyped] for List-2 because it has surrounding whitespaces.

But what about List-3? It does not have surrounding whitespaces. It contains collapsible whitespaces between list items. In order to preserve those whitespaces, you also need to use CH [untyped] for List-3.

The criterion should then be rephrased as "having any collapsible whitespaces". Is my thinking correct?
Takuki Kamiya
Fujitsu Laboratories of America

-----Original Message-----
From: Takuki Kamiya [mailto:tkamiya@us.fujitsu.com]
Sent: Monday, February 29, 2016 3:03 PM
To: Peintner, Daniel (ext); public-exi@w3.org
Subject: RE: Whitespace preservation mode

Hi Daniel,

Let's use the following XML snippets as an example, and assume that in both cases the value is typed as xsd:int.

1. <A> 123 </A>
2. <A>123</A>

In case #1, the data "123" is surrounded by whitespaces. When xml:space="preserve" is in effect and the EXI grammar in use is *not* strict, case #1 will be encoded using the CH [untyped] production. Case #2, on the other hand, will be encoded using the CH [typed] production because it does not contain whitespaces around the number.

When the EXI grammar in use *is* strict, the encoding of case #1 will fail, as you mentioned in the document.

Do you share the same understanding?
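The two cases above can be sketched in Python as follows. This is only a minimal illustration of the understanding stated in this message, not code from OpenEXI, EXIficient or the Canonical EXI draft. The names `select_ch_production` and `EncodingError` are hypothetical. The sketch assumes xml:space="preserve" is in effect and that the trimmed text is a valid xsd:int, so only whitespace is examined.

```python
XML_WHITESPACE = " \t\n\r"  # space, tab, line feed, carriage return


class EncodingError(Exception):
    """Hypothetical error: no available production can keep the whitespace."""


def select_ch_production(text: str, strict: bool) -> str:
    """Choose a CH production for an xsd:int value under xml:space="preserve".

    Assumption: the trimmed text is a valid xsd:int, so only leading and
    trailing whitespace decides the outcome.
    """
    has_surrounding_ws = text != text.strip(XML_WHITESPACE)
    if not has_surrounding_ws:
        # Case #2: the typed representation loses no whitespace.
        return "CH [typed]"
    if strict:
        # Case #1 in strict mode: there is no CH [untyped] fallback.
        raise EncodingError("whitespace cannot be preserved in strict mode")
    # Case #1 in non-strict mode: keep the characters as they are.
    return "CH [untyped]"


print(select_ch_production("123", strict=False))    # CH [typed]
print(select_ch_production(" 123 ", strict=False))  # CH [untyped]
```

For a list datatype the check would have to cover any collapsible whitespace, not only leading and trailing whitespace.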
Thank you,

Takuki Kamiya
Fujitsu Laboratories of America

-----Original Message-----
From: Peintner, Daniel (ext) [mailto:daniel.peintner.ext@siemens.com]
Sent: Monday, January 11, 2016 9:27 AM
To: Takuki Kamiya; public-exi@w3.org
Subject: AW: Whitespace preservation mode

All,

I started to define whitespace handling rules in the spirit of the current TTFMS rules [1]. Please find a first draft here [2].

I think we could add advice for users

* to use preserve.LexicalValue if encoding fails
* to use xml:space="preserve" if canonicalization is expected to preserve as many whitespaces as possible

Do you have any comments and/or feedback?
Thanks,

-- Daniel

[1] https://lists.w3.org/Archives/Public/public-exi/2015Oct/0008.html
[2] https://www.w3.org/XML/EXI/docs/canonical/canonical-exi.html#whitespaceHandling

________________________________
From: Takuki Kamiya [tkamiya@us.fujitsu.com]
Sent: Tuesday, 1 December 2015 03:51
To: public-exi@w3.org
Subject: Whitespace preservation mode

Hi,

When there is a type associated with an element, content type information gives you an idea of what to do with whitespaces during encoding. However, in schema-less situations, the best you can do is guess what is expected, unless xml:space is specified.
I am not sure this heuristic is always correct. I think we may need to provide a canonicalization mode in which canonicalization is expected to preserve as many whitespaces as possible.

Thank you,

Takuki Kamiya
Fujitsu Laboratories of America
Received on Wednesday, 23 March 2016 13:58:23 UTC