RE: Whitespace preservation mode

From: Takuki Kamiya <tkamiya@us.fujitsu.com>
Date: Mon, 7 Mar 2016 13:24:30 -0800
To: "Peintner, Daniel (ext)" <daniel.peintner.ext@siemens.com>, "public-exi@w3.org" <public-exi@w3.org>
Message-ID: <23204FACB677D84EBD57175AB7B5A71C038269CF2A07@FMSAMAIL.fmsa.local>

Hi Daniel,

I think we should approach this issue consistently for both structural
whitespace and simple data whitespace.

We can think of xml:space="preserve" as a desire to keep as much whitespace
as possible.

For simple data whitespace, I think your proposed approach works well enough.

On the other hand, for structural whitespace, an encoding process in strict
mode does not need to fail simply because it did not find a CH event that would
have allowed it to encode the structural whitespace. We should let it continue
processing. We just need to make sure structural whitespace is encoded whenever
CH events are available.
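
A minimal sketch of this behavior in Python, assuming a hypothetical encoder that knows which event names the current grammar state accepts. The function name and the set-of-strings representation are illustrative only and are not taken from any EXI implementation:

```python
from typing import Optional, Set, Tuple


def encode_structural_whitespace(
    whitespace: str, available_events: Set[str]
) -> Optional[Tuple[str, str]]:
    """Emit a CH event for structural whitespace only if the grammar allows one."""
    if "CH" in available_events:
        # A CH production exists in this state, so the whitespace is kept.
        return ("CH", whitespace)
    # No CH production (e.g. strict mode with element-only content):
    # skip the whitespace and let encoding continue instead of failing.
    return None
```

With this shape, the caller never raises on structural whitespace; it either gets an event to write or gets nothing and moves on.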

Therefore, I propose that we revisit issue [1] and make it more
consistent with the approach we are taking for simple data whitespace.

[1] https://lists.w3.org/Archives/Public/public-exi/2015Nov/0033.html


Takuki Kamiya
Fujitsu Laboratories of America


-----Original Message-----
From: Peintner, Daniel (ext) [mailto:daniel.peintner.ext@siemens.com] 
Sent: Friday, March 04, 2016 3:08 AM
To: Takuki Kamiya; public-exi@w3.org
Subject: RE: Whitespace preservation mode

Hi Taki,

My proposal is much simpler.

I would simply warn readers of our spec document about xml:space="preserve" and its consequences (perhaps suggesting the use of the Preserve.lexicalValues option).

Technically, I propose the following:

When lexicalPreserve is true, CH[typed] can and SHOULD be used in all cases, given that the restricted character sets can represent any string value.

When lexicalPreserve is not true, first try the [typed] production, and only if this fails use the [untyped] production.

I think this matches what OpenEXI, EXIficient and other implementations would naturally do...
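
As a rough sketch of the selection rule above (not code from OpenEXI or EXIficient): the function names are hypothetical, and the xsd:int check is an assumed simplification that trims XML whitespace and matches digits, with no range check.

```python
import re
from typing import Callable

XML_WHITESPACE = " \t\r\n"


def int_accepts(value: str) -> bool:
    """Simplified stand-in for a typed xsd:int check (no 32-bit range check)."""
    trimmed = value.strip(XML_WHITESPACE)
    return re.fullmatch(r"[+-]?[0-9]+", trimmed) is not None


def select_ch_production(
    value: str, lexical_preserve: bool, typed_accepts: Callable[[str], bool]
) -> str:
    """Pick the CH production following the proposed rule."""
    if lexical_preserve:
        # Restricted character sets can represent any string value,
        # so the typed production is always usable.
        return "CH[typed]"
    # Otherwise try typed first and fall back to untyped only on failure.
    return "CH[typed]" if typed_accepts(value) else "CH[untyped]"
```

For example, `select_ch_production("X12", False, int_accepts)` falls back to the untyped production, while the same value with `lexical_preserve=True` stays typed.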



Hope this clarifies my proposal,

-- Daniel

________________________________
From: Takuki Kamiya [tkamiya@us.fujitsu.com]
Sent: Friday, March 4, 2016 00:05
To: Peintner, Daniel (ext); public-exi@w3.org
Subject: RE: Whitespace preservation mode

Hi Daniel,

In both plan A and B, when xml:space="preserve" is in effect, are implementations
supposed to pick different productions depending on whether lexicalPreserve
is used or not? That is, when lexicalPreserve is true, CH [typed] is used, and
otherwise CH [untyped] is used.

Regarding the differences between plan A and B, B allows the use of CH [typed]
in strict mode. Is this a correct understanding? If so, I think this makes
the processing more complex because it essentially reverses the natural production
selection order. OpenEXI, for example, first tries to use CH [typed], and only if
that fails falls back to CH [untyped]. However, plan B requires it to do it the other
way around when xml:space="preserve" is in effect (i.e. try CH [untyped] first, then
fall back to CH [typed] if CH [untyped] is not available).

Thank you,

Takuki Kamiya
Fujitsu Laboratories of America


-----Original Message-----
From: Peintner, Daniel (ext) [mailto:daniel.peintner.ext@siemens.com]
Sent: Thursday, March 03, 2016 1:34 AM
To: Takuki Kamiya; public-exi@w3.org
Subject: RE: Whitespace preservation mode

Hi Taki, all,

I do see the issues you raised and I am not sure if I have the "one" answer.

I think we can follow different paths, all of which have pros and cons.

A)
If xml:space="preserve" is in effect, the only things that can really guarantee the requested preservation are setting lexicalPreserve to true OR using the AT[untyped] or CH[untyped] productions.
This also means that in strict mode only the lexicalPreserve feature remains, and if it is not set, encoding fails.

B)
A less accurate approach is to "warn" users in the document about xml:space="preserve" and inform them that lexicalPreserve should be used. If it is not used, we still allow (and in Canonical EXI require) the use of the CH/AT[typed] production as long as possible.
This means that all your list examples are mapped to the same canonical EXI representation.
 List-1. "A⬚B⬚C"
 List-2. "⬚A⬚B⬚C⬚"
 List-3. "A⬚⬚B⬚⬚C"
CH/AT[untyped] would be used if the type does not match at all (e.g., value "X12" for xsd:int)
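
To illustrate why the three list examples coincide under B), here is a small Python sketch. It assumes the typed list encoder just splits the lexical value on XML whitespace, and that ⬚ in the examples stands for one space character; `list_items` is a hypothetical helper, not an API of any implementation.

```python
import re
from typing import List


def list_items(lexical: str) -> List[str]:
    """Split an xsd:list lexical value into items, discarding all whitespace."""
    return [item for item in re.split(r"[ \t\r\n]+", lexical) if item]


# List-1, List-2 and List-3 with the placeholder written as a plain space.
examples = ["A B C", " A B C ", "A  B  C"]
item_sequences = [list_items(example) for example in examples]
```

All three entries of `item_sequences` hold the same items, so a typed encoding cannot tell the original strings apart.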



C)
Yet another possibility is to require implementations to follow a given behavior, such as collapsing whitespace and doing all other checks.
Practically, I think this is not feasible.

Having said that, I tend to favor the simplest approach, B). I believe this is also the approach followed by most implementations so far.

Any thoughts/opinions?

Thanks,

-- Daniel


________________________________
From: Takuki Kamiya [tkamiya@us.fujitsu.com]
Sent: Tuesday, March 1, 2016 22:27
To: Peintner, Daniel (ext); public-exi@w3.org
Subject: RE: Whitespace preservation mode

Hi,

Assume we are encoding schema-informed EXI with the following settings.

- Non-strict mode.
- xml:space="preserve" is in effect.

When the associated type is xsd:int, the following are examples
of valid instances.

Int-1. <A>123</A>
Int-2. <A>⬚123⬚⬚</A>

CH [typed] can be used for Int-1, while you have to use CH [untyped] for Int-2.

What distinguishes the two cases?

One can say Int-2 has whitespace surrounding (i.e. leading and trailing) the number.

Let's next take a look at another example using a list datatype.

List-1. "A⬚B⬚C"
List-2. "⬚A⬚B⬚C⬚"
List-3. "A⬚⬚B⬚⬚C"

CH [typed] can be used for List-1.
One has to use CH [untyped] for List-2 because it has surrounding whitespace.

But what about List-3?
It does not have surrounding whitespace. It contains collapsible whitespace
between list items. In order to preserve that whitespace, you also need to
use CH [untyped] for List-3.

The criterion should then be rephrased as "having any collapsible whitespace".
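
Sketched in Python, assuming "collapsible" means anything the XML Schema whiteSpace="collapse" facet would change. The helper name is hypothetical, and ⬚ is written as a plain space:

```python
import re


def has_collapsible_whitespace(lexical: str) -> bool:
    """True if whiteSpace="collapse" processing would alter the value."""
    # Collapse: map runs of XML whitespace to one space, then trim both ends.
    collapsed = re.sub(r"[ \t\r\n]+", " ", lexical).strip(" ")
    return collapsed != lexical
```

Under this test, Int-1 and List-1 have nothing to collapse, while Int-2, List-2 and List-3 all do.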

Is my thinking correct?

Takuki Kamiya
Fujitsu Laboratories of America


-----Original Message-----
From: Takuki Kamiya [mailto:tkamiya@us.fujitsu.com]
Sent: Monday, February 29, 2016 3:03 PM
To: Peintner, Daniel (ext); public-exi@w3.org
Subject: RE: Whitespace preservation mode

Hi Daniel,

Let's use as an example the following XML snippets, and assume in both cases
the value is typed as xsd:int.

1. <A>  123   </A>
2. <A>123</A>

In case #1, the data "123" is surrounded by whitespace.

When xml:space="preserve" is in effect, and the EXI grammar in use
is *not* strict, case #1 will be encoded using the CH [untyped] production.

On the other hand, case #2 will be encoded using the CH [typed] production
because it does not contain whitespace around the number.

When the EXI grammar in use *is* strict, encoding case #1 will fail, as
you mentioned in the document.
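
The three outcomes can be sketched in Python as follows, assuming xml:space="preserve" is in effect and lexicalPreserve is not set. `EncodingError` and `production_for_int` are hypothetical names, and the xsd:int check is simplified to a digits-only pattern with no range check:

```python
import re


class EncodingError(Exception):
    """Raised when no available production can represent the value."""


def production_for_int(text: str, strict: bool) -> str:
    """Choose the CH production for xsd:int content under xml:space="preserve"."""
    if re.fullmatch(r"[+-]?[0-9]+", text):
        # No surrounding whitespace to lose, so the typed production is safe.
        return "CH[typed]"
    if strict:
        # Strict grammars offer no CH[untyped] production to fall back on.
        raise EncodingError("value needs CH[untyped], unavailable in strict mode")
    return "CH[untyped]"
```

Case #2 takes the first branch in either mode; case #1 reaches the fallback in non-strict mode and raises in strict mode.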

Do you share the same understanding?

Thank you,

Takuki Kamiya
Fujitsu Laboratories of America


-----Original Message-----
From: Peintner, Daniel (ext) [mailto:daniel.peintner.ext@siemens.com]
Sent: Monday, January 11, 2016 9:27 AM
To: Takuki Kamiya; public-exi@w3.org
Subject: RE: Whitespace preservation mode

All,

I started to define whitespace handling rules in the spirit of the current TTFMS rules [1].

Please find a first draft here [2].

I think we could add advice for users
* to use Preserve.lexicalValues if encoding fails
* to use xml:space="preserve" if canonicalization is
  expected to preserve as much whitespace as possible

Do you have any comments and/or feedback?

Thanks,

-- Daniel

[1] https://lists.w3.org/Archives/Public/public-exi/2015Oct/0008.html
[2] https://www.w3.org/XML/EXI/docs/canonical/canonical-exi.html#whitespaceHandling





________________________________
From: Takuki Kamiya [tkamiya@us.fujitsu.com]
Sent: Tuesday, December 1, 2015 03:51
To: public-exi@w3.org
Subject: Whitespace preservation mode

Hi,

When there is a type associated with an element, content type information
gives you an idea as to what to do with whitespace during encoding.

However, in schema-less situations, the best you can do is to guess what
is expected, unless xml:space is specified. I am not sure this
heuristic is always correct.

I think we may need to provide a canonicalization mode in which canonicalization
is expected to preserve as much whitespace as possible.

Thank you,

Takuki Kamiya
Fujitsu Laboratories of America



Received on Monday, 7 March 2016 21:25:18 UTC
