RE: Whitespace preservation mode

From: Takuki Kamiya <tkamiya@us.fujitsu.com>
Date: Tue, 1 Mar 2016 13:27:49 -0800
To: "Peintner, Daniel (ext)" <daniel.peintner.ext@siemens.com>, "public-exi@w3.org" <public-exi@w3.org>
Message-ID: <23204FACB677D84EBD57175AB7B5A71C03749CE27406@FMSAMAIL.fmsa.local>
Hi,

Assume we are encoding schema-informed EXI with the following settings.

- Non-strict mode.
- xml:space="preserve" is in effect.

When the associated type is xsd:int, the following are examples
of valid instances.

Int-1. <A>123</A>
Int-2. <A>⬚123⬚⬚</A>

CH [typed] can be used for Int-1, while you have to use CH [untyped] for Int-2.

What distinguishes the two cases?

One can say that Int-2 has whitespace surrounding (i.e. leading and trailing) the number.

Let's next take a look at another example using a list datatype.

List-1. "A⬚B⬚C"
List-2. "⬚A⬚B⬚C⬚"
List-3. "A⬚⬚B⬚⬚C"

CH [typed] can be used for List-1.
One has to use CH [untyped] for List-2 because it has surrounding whitespace.

But what about List-3?
It does not have surrounding whitespace, but it contains collapsible whitespace
between list items. In order to preserve that whitespace, you also need to
use CH [untyped] for List-3.

The criterion should then be rephrased as "having any collapsible whitespace".

Is my understanding correct?
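
The rephrased criterion can be sketched in Python as follows. This is only
an illustration, not part of any EXI specification or implementation: the
helper names collapse and has_collapsible_whitespace are hypothetical, and
"collapsible" is assumed to mean that XML Schema whiteSpace="collapse"
normalization would change the value.

```python
import re


def collapse(value: str) -> str:
    """Apply XML Schema whiteSpace="collapse" normalization (assumed definition).

    Runs of space, tab, LF and CR become a single space, and
    leading/trailing spaces are removed.
    """
    return re.sub(r"[ \t\n\r]+", " ", value).strip(" ")


def has_collapsible_whitespace(value: str) -> bool:
    """Hypothetical check: True if collapsing would alter the value."""
    return value != collapse(value)


# The list examples from this message, written with plain spaces.
for text in ["A B C", " A B C ", "A  B  C"]:
    production = "CH [untyped]" if has_collapsible_whitespace(text) else "CH [typed]"
    print(repr(text), "->", production)
```

Under this assumption, List-1 is the only one of the three values that
collapsing leaves unchanged, so it is the only one that maps to CH [typed].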

Takuki Kamiya
Fujitsu Laboratories of America


-----Original Message-----
From: Takuki Kamiya [mailto:tkamiya@us.fujitsu.com] 
Sent: Monday, February 29, 2016 3:03 PM
To: Peintner, Daniel (ext); public-exi@w3.org
Subject: RE: Whitespace preservation mode

Hi Daniel,

Let's use as an example the following XML snippets, and assume in both cases 
the value is typed as xsd:int.

1. <A>  123   </A>
2. <A>123</A>

In case #1, the data "123" is surrounded by whitespace.

When xml:space="preserve" is in effect and the EXI grammar in use
is *not* strict, case #1 will be encoded using the CH [untyped] production.

On the other hand, case #2 will be encoded using the CH [typed] production
because it does not contain whitespace around the number.

When the EXI grammar in use *is* strict, the encoding of case #1 will fail, as
you mentioned in the document.
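
The decision described above can be sketched in Python. It is only an
illustration under stated assumptions: choose_production is a hypothetical
helper, not an API of any EXI processor; xml:space="preserve" is assumed to
be in effect; and the xsd:int check is simplified (it ignores the value range).

```python
import re

# Simplified xsd:int lexical form: optional sign followed by digits.
# Assumption: range limits and other lexical details are ignored here.
INT_LEXICAL = re.compile(r"[+-]?[0-9]+")


def choose_production(text: str, strict: bool) -> str:
    """Hypothetical helper: pick the CH production for an xsd:int element
    when xml:space="preserve" is in effect."""
    if INT_LEXICAL.fullmatch(text):
        # No surrounding whitespace: the typed production can represent it.
        return "CH [typed]"
    if strict:
        # Strict grammars have no untyped fallback, so encoding fails.
        raise ValueError(f"cannot encode {text!r} in strict mode")
    return "CH [untyped]"


print(choose_production("123", strict=False))       # case #2
print(choose_production("  123   ", strict=False))  # case #1
```

Calling choose_production("  123   ", strict=True) raises ValueError in this
sketch, which corresponds to the encoding failure in strict mode.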

Do you share the same understanding?

Thank you,

Takuki Kamiya
Fujitsu Laboratories of America


-----Original Message-----
From: Peintner, Daniel (ext) [mailto:daniel.peintner.ext@siemens.com] 
Sent: Monday, January 11, 2016 9:27 AM
To: Takuki Kamiya; public-exi@w3.org
Subject: RE: Whitespace preservation mode

All,

I started to define whitespace handling rules in the spirit of the current TTFMS rules [1].

Please find a first draft here [2].

I think we could add advice for users
* to use preserve.LexicalValue if encoding fails
* to use xml:space="preserve" if canonicalization is
  expected to preserve as much whitespace as possible

Do you have any comments or feedback?

Thanks,

-- Daniel

[1] https://lists.w3.org/Archives/Public/public-exi/2015Oct/0008.html

[2] https://www.w3.org/XML/EXI/docs/canonical/canonical-exi.html#whitespaceHandling






________________________________
From: Takuki Kamiya [tkamiya@us.fujitsu.com]
Sent: Tuesday, December 1, 2015 03:51
To: public-exi@w3.org
Subject: Whitespace preservation mode

Hi,

When there is a type associated with an element, content type information
gives you an idea as to what to do with whitespace during encoding.

However, in schema-less situations, the best you can do is to guess what
is expected, unless xml:space is specified. I am not sure that
this heuristic is always correct.

I think we may need to provide a canonicalization mode where canonicalization
is expected to preserve as much whitespace as possible.

Thank you,

Takuki Kamiya
Fujitsu Laboratories of America



Received on Tuesday, 1 March 2016 21:28:35 UTC
