AW: Whitespace preservation mode

Hi Taki, all,

I took another look at the issue and modified some of the prose in the editor's draft [1].

> On the other hand, for structural whitespaces, an encoding process in strict
> mode does not need to fail simply because it did not find a CH event that would
> have allowed it to encode structural whitespaces. We should let it continue to
> process. We just need to make sure to encode structural whitespaces when CH
> events are available.

The updated text no longer requires the process to fail because it did not find a CH event.
Moreover, it states that, according to the rules, the process SHOULD preserve as much whitespace as possible.
It also notes that in certain cases preserving all whitespace is not possible.

I hope that also resolves the issue for structural whitespaces.

An alternative I see for "Complex Whitespace Data in mixed content" is to add a rule like this:

"When the grammar in effect is a schema-informed grammar and the content model has mixed content (we would need to define how mixed content is detected w.r.t. the EXI grammars), all whitespaces MUST be preserved."

That said, I think this is overcomplicated and I would rather not see it.

Thanks,

-- Daniel

[1] https://www.w3.org/XML/EXI/docs/canonical/canonical-exi.html#whitespaceHandling



________________________________
From: Takuki Kamiya [tkamiya@us.fujitsu.com]
Sent: Monday, 7 March 2016 22:24
To: Peintner, Daniel (ext); public-exi@w3.org
Subject: RE: Whitespace preservation mode

Hi Daniel,

I think we'd better approach this issue more consistently both for cases
of structural whitespaces and simple data whitespaces.

We can think of xml:space="preserve" as a desire to keep as much whitespace
as possible.

For simple data whitespaces, I think your proposed approach works well enough.

On the other hand, for structural whitespaces, an encoding process in strict
mode does not need to fail simply because it did not find a CH event that would
have allowed it to encode structural whitespaces. We should let it continue to
process. We just need to make sure to encode structural whitespaces when CH
events are available.

Therefore, I propose to revisit the issue [1] again, and make it more
consistent with the approach we are taking for simple data whitespaces.

[1] https://lists.w3.org/Archives/Public/public-exi/2015Nov/0033.html

Takuki Kamiya
Fujitsu Laboratories of America


-----Original Message-----
From: Peintner, Daniel (ext) [mailto:daniel.peintner.ext@siemens.com]
Sent: Friday, March 04, 2016 3:08 AM
To: Takuki Kamiya; public-exi@w3.org
Subject: AW: Whitespace preservation mode

Hi Taki,

My proposal is much simpler.

I would simply warn the readers of our spec document about xml:space="preserve" and its consequences (maybe suggesting the use of the Preserve.lexicalValues option).

Technically I propose doing the following:

When lexicalPreserve is true, CH[typed] can and SHOULD be used in all cases, given that the restricted character sets can represent any string value.

When lexicalPreserve is not true, first try the [typed] production and only if this fails use the [untyped] production.



I think this matches what OpenEXI, EXIficient and other implementations would naturally do...
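
The selection order proposed above can be sketched roughly as follows. This is only an illustration, not code from OpenEXI or EXIficient: the function name select_production, the fits_type callback and the whitespace-tolerant fits_int check are all assumptions made for this example.

```python
import re
from typing import Callable

# Hypothetical check: does the value fit xsd:int once surrounding
# whitespace is ignored? Range checking is omitted for brevity.
_INT = re.compile(r"[ \t\n\r]*[+-]?[0-9]+[ \t\n\r]*")


def fits_int(value: str) -> bool:
    return _INT.fullmatch(value) is not None


def select_production(value: str, lexical_preserve: bool,
                      fits_type: Callable[[str], bool]) -> str:
    """Pick the CH production in the order described in the proposal."""
    if lexical_preserve:
        # Restricted character sets can represent any string value,
        # so the typed production is always usable.
        return "CH [typed]"
    if fits_type(value):
        # Natural order: try the typed production first.
        return "CH [typed]"
    # Only if the typed production fails, use the untyped one.
    return "CH [untyped]"
```

With these assumptions, select_production("X12", False, fits_int) selects CH [untyped], while the same value with lexical_preserve set to True stays on CH [typed].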

Hope this clarifies my proposal,

-- Daniel

________________________________
From: Takuki Kamiya [tkamiya@us.fujitsu.com]
Sent: Friday, 4 March 2016 00:05
To: Peintner, Daniel (ext); public-exi@w3.org
Subject: RE: Whitespace preservation mode

Hi Daniel,

In both plan A and B, when xml:space="preserve" is in effect, are implementations
supposed to pick different productions depending on whether lexicalPreserve
is used or not? That is, when lexicalPreserve is true, CH [typed] is used, and
otherwise CH [untyped] is used.

Regarding the differences between plan A and B, B allows the use of CH [typed]
in strict mode. Is this understanding correct? If so, I think this makes
the processing more complex because it essentially reverses the natural production
selection order. OpenEXI, for example, first tries to use CH [typed], and only if that
fails falls back to CH [untyped]. However, plan B requires it to do it the other way around
when xml:space="preserve" is in effect (i.e. try CH [untyped] first, then fall back to
CH [typed] if CH [untyped] is not available).

Thank you,

Takuki Kamiya
Fujitsu Laboratories of America


-----Original Message-----
From: Peintner, Daniel (ext) [mailto:daniel.peintner.ext@siemens.com]
Sent: Thursday, March 03, 2016 1:34 AM
To: Takuki Kamiya; public-exi@w3.org
Subject: AW: Whitespace preservation mode

Hi Taki, all,

I do see the issues you raised and I am not sure I have the "one" answer.

I think we can follow different paths, each with its own pros and cons.

A)
If xml:space="preserve" is in effect, the only things that can really guarantee the requested preservation are lexicalPreserve set to true or the use of the AT[untyped] or CH[untyped] productions.
This also means that in strict mode only the lexicalPreserve feature remains, and if it is not set, encoding fails.



B)
A less accurate approach is to "warn" users in the document about xml:space="preserve" and inform them that lexicalPreserve should be used. If it is not used, we still allow (and in Canonical EXI require) the use of the CH/AT[typed] productions as long as possible.
This means that all your list examples are mapped to the same canonical EXI representation.
 List-1. "A⬚B⬚C"
 List-2. "⬚A⬚B⬚C⬚"
 List-3. "A⬚⬚B⬚⬚C"
CH/AT[untyped] would be used if the type does not match at all (e.g., value "X12" for xsd:int)
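
To illustrate why the three list examples end up with the same typed representation under B), here is a small sketch. It assumes that ⬚ stands for a single space and that a typed list encoding keeps only the sequence of items; the helper name list_items is made up for this example.

```python
import re

# XML whitespace: space, tab, line feed, carriage return.
_WS_RUN = re.compile(r"[ \t\n\r]+")


def list_items(value: str) -> list[str]:
    """Split a list-typed lexical value into its items, dropping
    the empty strings produced by leading/trailing whitespace."""
    return [item for item in _WS_RUN.split(value) if item]


list_1 = "A B C"
list_2 = " A B C "
list_3 = "A  B  C"

# All three values carry the same items, so a typed encoding
# cannot tell them apart.
same = list_items(list_1) == list_items(list_2) == list_items(list_3)
```

Under these assumptions the whitespace differences between List-1, List-2 and List-3 are simply not visible to the typed production.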



C)
Yet another possibility is to require implementations to follow a given behavior like collapsing whitespaces and doing all other checks.
Practically I think this is not feasible.

Having said that, I tend to be in favor of the simplest approach, B). I believe this is also the approach followed by most implementations so far.

Any thoughts/opinions?



Thanks,

-- Daniel


________________________________
From: Takuki Kamiya [tkamiya@us.fujitsu.com]
Sent: Tuesday, 1 March 2016 22:27
To: Peintner, Daniel (ext); public-exi@w3.org
Subject: RE: Whitespace preservation mode

Hi,

Assume we are encoding schema-informed EXI with the following settings.

- Non-strict mode.
- xml:space="preserve" is in effect.

When the associated type is xsd:int, the following are examples
of valid instances.

Int-1. <A>123</A>
Int-2. <A>⬚123⬚⬚</A>

CH [typed] can be used for Int-1, while you have to use CH [untyped] for Int-2.

What distinguishes the two cases?

One can say Int-2 has whitespaces surrounding (i.e. leading and trailing) the number.

Let's next take a look at another example using a list datatype.

List-1. "A⬚B⬚C"
List-2. "⬚A⬚B⬚C⬚"
List-3. "A⬚⬚B⬚⬚C"

CH [typed] can be used for List-1.
One has to use CH [untyped] for List-2 because it has surrounding whitespaces.

But what about List-3?
It does not have surrounding whitespaces. It contains collapsible whitespaces
between list items. In order to preserve those whitespaces, you also need to
use CH [untyped] for List-3.

Then the criterion should now be rephrased as "having any collapsible whitespaces".

Am I thinking correctly?
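
The rephrased criterion can be written down as a small check. This is a sketch under the assumption that "collapsible" means what XML Schema's whiteSpace="collapse" would change, and that ⬚ stands for a single space; the function names are chosen for this example only.

```python
import re

# XML whitespace: space, tab, line feed, carriage return.
_WS_RUN = re.compile(r"[ \t\n\r]+")


def collapse(value: str) -> str:
    """Apply XML Schema whitespace collapsing: every whitespace run
    becomes one space, leading and trailing spaces are removed."""
    return _WS_RUN.sub(" ", value).strip(" ")


def has_collapsible_whitespace(value: str) -> bool:
    """True if collapsing would change the value, i.e. a typed
    encoding could not reproduce the original whitespace."""
    return collapse(value) != value
```

For the examples above, the check is false for List-1 and Int-1 and true for List-2, List-3 and Int-2.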

Takuki Kamiya
Fujitsu Laboratories of America


-----Original Message-----
From: Takuki Kamiya [mailto:tkamiya@us.fujitsu.com]
Sent: Monday, February 29, 2016 3:03 PM
To: Peintner, Daniel (ext); public-exi@w3.org
Subject: RE: Whitespace preservation mode

Hi Daniel,

Let's use as an example the following XML snippets, and assume in both cases
the value is typed as xsd:int.

1. <A>  123   </A>
2. <A>123</A>

In case #1, the data "123" is surrounded by whitespaces.

When xml:space="preserve" is in effect, and the EXI grammar in use
is *not* strict, case #1 will be encoded using the CH [untyped] production.

On the other hand, case #2 will be encoded using the CH [typed] production
because it does not contain whitespaces around the number.

When the EXI grammar in use *is* strict, the encoding of case #1 will fail as
you mentioned in the document.
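
The understanding described above, for an xsd:int value with xml:space="preserve" in effect, can be summarized in a short sketch. The function name and the use of an exception to model a failed encoding are assumptions for illustration; this reflects the draft as read at this point in the thread, not a final rule.

```python
import re

# Lexical form of an integer without any surrounding whitespace.
# Range checking for xsd:int is omitted to keep the sketch short.
_INT_LEXICAL = re.compile(r"[+-]?[0-9]+")


def choose_production_for_int(text: str, strict: bool) -> str:
    """Return the CH production used when whitespace must be preserved."""
    if _INT_LEXICAL.fullmatch(text):
        # Case #2: nothing would be lost by the typed production.
        return "CH [typed]"
    if strict:
        # Case #1 in strict mode: no CH [untyped] production exists,
        # so the value cannot be encoded with its whitespace intact.
        raise ValueError("cannot preserve whitespace in strict mode")
    # Case #1 in non-strict mode: fall back to the untyped production.
    return "CH [untyped]"
```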

Do you share the same understanding?

Thank you,

Takuki Kamiya
Fujitsu Laboratories of America


-----Original Message-----
From: Peintner, Daniel (ext) [mailto:daniel.peintner.ext@siemens.com]
Sent: Monday, January 11, 2016 9:27 AM
To: Takuki Kamiya; public-exi@w3.org
Subject: AW: Whitespace preservation mode

All,

I started to define whitespace handling rules in the spirit of the current TTFMS rules [1].

Please find a first draft here [2].

I think we could add advice for users
* to use Preserve.lexicalValues if encoding fails
* to use xml:space="preserve" if canonicalization is
  expected to preserve as much whitespace as possible

Do you have any comments and/or feedback?

Thanks,

-- Daniel

[1] https://lists.w3.org/Archives/Public/public-exi/2015Oct/0008.html
[2] https://www.w3.org/XML/EXI/docs/canonical/canonical-exi.html#whitespaceHandling





________________________________
From: Takuki Kamiya [tkamiya@us.fujitsu.com]
Sent: Tuesday, 1 December 2015 03:51
To: public-exi@w3.org
Subject: Whitespace preservation mode

Hi,

When there is a type associated with an element, content type information
gives you an idea as to what to do with whitespaces during encoding.

However, in schema-less situations, the best you can do is to guess what
is expected, unless xml:space is specified. I am not sure that
this heuristic is always correct.

I think we may need to provide a canonicalization mode where canonicalization
is expected to preserve as much whitespace as possible.

Thank you,

Takuki Kamiya
Fujitsu Laboratories of America

Received on Wednesday, 16 March 2016 15:51:55 UTC