AW: Whitespace preservation mode

Hi Taki, all,



I do see the issues you raised and I am not sure if I have the "one" answer.

I think we can follow different paths, each of which has pros and cons.

A)
If xml:space="preserve" is in effect, the only things that can really guarantee the requested preservation are either lexicalPreserve set to true or the AT[untyped] / CH[untyped] productions.
This also means that in strict mode only the lexicalPreserve feature remains, and if it is not set, encoding fails.
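
A minimal sketch of the decision described in A), assuming the value itself is valid for its schema type. The function name and the returned labels are hypothetical and only illustrate the rule; they are not taken from any EXI implementation.

```python
def choose_production_a(preserve_in_effect: bool,
                        lexical_preserve: bool,
                        strict: bool) -> str:
    """Illustrative decision for approach A (hypothetical labels)."""
    if not preserve_in_effect:
        # No preservation requested: the typed production can be used.
        return "typed"
    if lexical_preserve:
        # The lexical form is kept as written, so whitespace survives.
        return "typed+lexical"
    if strict:
        # Strict grammars have no untyped productions to fall back on.
        raise ValueError("xml:space='preserve' cannot be honored in strict mode")
    # Non-strict grammar: fall back to AT[untyped] / CH[untyped].
    return "untyped"
```

With these assumptions, a non-strict encoder without lexicalPreserve always ends up on the untyped productions whenever xml:space="preserve" is in effect.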



B)
A less accurate approach is to "warn" users in the document about xml:space="preserve" and inform them that lexicalPreserve should be used. If it is not set, we still allow (and in Canonical EXI require) the use of CH/AT[typed] productions as long as possible.
This means that all your list examples are mapped to the same canonical EXI representation.
 List-1. "A⬚B⬚C"
 List-2. "⬚A⬚B⬚C⬚"
 List-3. "A⬚⬚B⬚⬚C"
CH/AT[untyped] would be used if the type does not match at all (e.g., value "X12" for xsd:int)
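
As a rough sketch of B), assuming whitespace is collapsed as in XML Schema (whiteSpace="collapse") before the type check, and using xsd:int plus a simple list type as examples. All function names are hypothetical, and regular spaces stand in for the ⬚ marker.

```python
import re

XML_WS = " \t\n\r"  # the four XML whitespace characters
INT_RE = re.compile(r"[+-]?[0-9]+")


def collapse(value: str) -> str:
    """Trim XML whitespace and reduce inner runs to one space."""
    return " ".join(p for p in re.split(r"[ \t\n\r]+", value.strip(XML_WS)) if p)


def is_xsd_int(lexical: str) -> bool:
    """True if the lexical form is a 32-bit xsd:int."""
    if INT_RE.fullmatch(lexical) is None:
        return False
    return -2**31 <= int(lexical) <= 2**31 - 1


def production_b_int(value: str) -> str:
    """Approach B: stay on the typed production as long as the value fits."""
    return "CH[typed]" if is_xsd_int(collapse(value)) else "CH[untyped]"


def list_items(value: str) -> list:
    """Items of a whitespace-separated list after collapsing."""
    collapsed = collapse(value)
    return collapsed.split(" ") if collapsed else []
```

Under these assumptions `list_items` yields the same three items for List-1, List-2 and List-3, while `production_b_int("X12")` falls back to the untyped production.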



C)
Yet another possibility is to require implementations to follow a given behavior like collapsing whitespaces and doing all other checks.
Practically I think this is not feasible.

Having said that, I tend to favor the simplest approach, B). I believe this is also the approach followed by most implementations so far.

Any thoughts/opinions?



Thanks,

-- Daniel


________________________________
From: Takuki Kamiya [tkamiya@us.fujitsu.com]
Sent: Tuesday, 1 March 2016 22:27
To: Peintner, Daniel (ext); public-exi@w3.org
Subject: RE: Whitespace preservation mode

Hi,

Assume we are encoding schema-informed EXI with the following settings.

- Non-strict mode.
- xml:space="preserve" is in effect.

When the associated type is xsd:int, the following are examples
of valid instances.

Int-1. <A>123</A>
Int-2. <A>⬚123⬚⬚</A>

CH [typed] can be used for Int-1, while you have to use CH [untyped] for Int-2.

What distinguishes the two cases?

One can say Int-2 has whitespaces surrounding (i.e. leading and trailing) the number.

Let's next take a look at another example using a list datatype.

List-1. "A⬚B⬚C"
List-2. "⬚A⬚B⬚C⬚"
List-3. "A⬚⬚B⬚⬚C"

CH [typed] can be used for List-1.
One has to use CH [untyped] for List-2 because it has surrounding whitespaces.

But what about List-3?
It does not have surrounding whitespaces. It contains collapsible whitespaces
between list items. In order to preserve those whitespaces, you also need to
use CH [untyped] for List-3.

Then the criterion should now be rephrased as "having any collapsible whitespaces".
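
A small sketch of that criterion, assuming "collapsible" means the value changes when XML Schema whitespace collapsing is applied. The function names are hypothetical, and regular spaces stand in for the ⬚ marker.

```python
import re

XML_WS = " \t\n\r"  # the four XML whitespace characters


def collapse(value: str) -> str:
    """Trim XML whitespace and reduce inner runs to one space."""
    return " ".join(p for p in re.split(r"[ \t\n\r]+", value.strip(XML_WS)) if p)


def has_collapsible_whitespace(value: str) -> bool:
    """True if collapsing would alter the value, i.e. whitespace would be lost."""
    return value != collapse(value)


def production_when_preserving(value: str) -> str:
    """Non-strict, xml:space="preserve": typed only if nothing would be lost."""
    return "CH[untyped]" if has_collapsible_whitespace(value) else "CH[typed]"


for sample in ["123", " 123  ", "A B C", " A B C ", "A  B  C"]:
    print(repr(sample), production_when_preserving(sample))
```

Here only Int-1 and List-1 stay on CH [typed]; Int-2, List-2 and List-3 all need CH [untyped].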

Is my thinking correct?

Takuki Kamiya
Fujitsu Laboratories of America


-----Original Message-----
From: Takuki Kamiya [mailto:tkamiya@us.fujitsu.com]
Sent: Monday, February 29, 2016 3:03 PM
To: Peintner, Daniel (ext); public-exi@w3.org
Subject: RE: Whitespace preservation mode

Hi Daniel,

Let's use as an example the following XML snippets, and assume in both cases
the value is typed as xsd:int.

1. <A>  123   </A>
2. <A>123</A>

In case #1, the data "123" is surrounded by whitespaces.

When xml:space="preserve" is in effect, and the EXI grammar in use
is *not* strict, the case #1 will be encoded using CH [untyped] production.

On the other hand, case #2 will be encoded using CH [typed] production
because it does not contain whitespaces around the number.

When the EXI grammar in use *is* strict, then the encoding of #1 will fail, as
you mentioned in the document.

Do you share the same understanding?

Thank you,

Takuki Kamiya
Fujitsu Laboratories of America


-----Original Message-----
From: Peintner, Daniel (ext) [mailto:daniel.peintner.ext@siemens.com]
Sent: Monday, January 11, 2016 9:27 AM
To: Takuki Kamiya; public-exi@w3.org
Subject: AW: Whitespace preservation mode

All,

I started to define whitespace handling rules in the spirit of the current TTFMS rules [1].

Please find a first draft here [2].

I think we could add advice for users
* to use preserve.LexicalValue if encoding fails
* to use xml:space="preserve" if canonicalization is
  expected to preserve as much whitespace as possible

Do you have any comments and/or feedback?

Thanks,

-- Daniel

[1] https://lists.w3.org/Archives/Public/public-exi/2015Oct/0008.html
[2] https://www.w3.org/XML/EXI/docs/canonical/canonical-exi.html#whitespaceHandling





________________________________
From: Takuki Kamiya [tkamiya@us.fujitsu.com]
Sent: Tuesday, 1 December 2015 03:51
To: public-exi@w3.org
Subject: Whitespace preservation mode

Hi,

When there is a type associated with an element, content type information
gives you an idea as to what to do with whitespaces during encoding.

However, in schema-less situations, the best you can do is to guess what
you are expected to do, unless xml:space is specified. I am not very sure
that this heuristic is always correct.

I think we may need to provide a canonicalization mode where canonicalization
is expected to preserve as much whitespace as possible.

Thank you,

Takuki Kamiya
Fujitsu Laboratories of America

Received on Thursday, 3 March 2016 09:34:17 UTC