RE: type in strict mode from Taki Kamiya on 2010-03-31 (public-exi-comments@w3.org from March 2010)

From: Taki Kamiya <tkamiya@us.fujitsu.com>
Date: Wed, 31 Mar 2010 00:21:24 -0700
To: "'FABLET Youenn'" <Youenn.Fablet@crf.canon.fr>, <public-exi-comments@w3.org>
Message-ID: <A62E0F8E9D9243869063A2F5AFCE83A7@homunculus>
Hi Youenn,

The strict schema-informed mode has been formulated to be competitive with some
existing highly optimized formats. Indeed, it should not be an overstatement
to say that this is the primary purpose of the strict schema-informed mode --
to be competitive in situations where every bit counts.

Please also note that such occurrences of xsi:type are recoverable at the
receiver's end. Since a decoder knows which elements in an EXI stream use their
natural global types, some decoder implementations may want to decorate such
elements with xsi:type information when it serves applications accessing the
data.

Hope this helps,

-taki


  _____

From: public-exi-comments-request@w3.org [mailto:public-exi-comments-request@w3.org] On Behalf Of FABLET Youenn
Sent: Friday, January 29, 2010 6:14 AM
To: public-exi-comments@w3.org
Subject: xsi:type in strict mode



Dear all,



This is feedback related to the EXI specification.



Currently, according to http://www.w3.org/TR/exi/#addingProductionsStrict, a @xsi:type production is added only when named subtypes
are known to the EXI processor.
The intention, AIUI, is to add as few @xsi:type productions as possible to the grammars so as to get some compression gain.



I see a practical drawback with this current approach.

Some XML documents, valid according to the XML schema components used to generate the grammars, will not be encodable by EXI encoders.

Given the following schema and instance:

<xs:schema>
    <xs:element name="test" type="xs:base64Binary"/>
</xs:schema>

<test xsi:type="xs:base64Binary" .../>

The instance is valid as per the schema but, according to our current interpretation, is not encodable in strict mode.
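
The interpretation above can be sketched in Python as follows. This is a deliberately simplified model, not the specification's algorithm: the named_subtypes table and both function names are hypothetical, and only the "named subtypes" condition mentioned in this message is modelled.

```python
def has_xsi_type_production(type_name, named_subtypes):
    """Current rule as read in this message: the production exists only
    when the processor knows at least one named subtype of the type."""
    return bool(named_subtypes.get(type_name))

def encodable_in_strict(element_type, instance_has_xsi_type, named_subtypes):
    """An xsi:type attribute can be represented only if the element
    grammar contains a production for it."""
    if not instance_has_xsi_type:
        return True
    return has_xsi_type_production(element_type, named_subtypes)

# Hypothetical schema knowledge: nothing derives from xs:base64Binary.
named_subtypes = {"xs:base64Binary": []}

with_xsi_type = encodable_in_strict("xs:base64Binary", True, named_subtypes)
without_xsi_type = encodable_in_strict("xs:base64Binary", False, named_subtypes)
```

Under this reading, the same element is encodable without xsi:type and not encodable with it, although both instances are schema-valid.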

If that is not correct, could you clarify the specification?

If that is correct, this is clearly not practical since one of the strict mode design goals was to encode at least all XML schema
valid documents.

This case does not happen often currently. However, applications may start adding @xsi:type information more often to ensure
value typing, even in schemaless mode.



The additional issue is that, in some environments, it may be tempting to use a global schema at the application level and a subset
for the EXI transmission (to fit the lightest devices).

In those cases, this issue may cause trouble since perfectly application-schema-valid documents will not be encodable in strict
mode, depending of course on the exact EXI schema subset. It could be expected that documents valid according to a particular schema
could be encoded in strict mode with a subset of that schema.

Also, adding new grammars to a given grammar set could require modifying the existing grammars themselves.



Getting back to the actual compression gain of the current approach, the benefit is not high.

At most, it gains 1 bit per element (in bit-packed mode only, no difference for the compression mode).

In practice, I even doubt that the compression gain is that high:

                @xsi:type production is often added to simple typed elements (e.g. "xs:string" typed elements)

                @xsi:type production has minor impact on the code length as soon as element content has optional attribute/child
items
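
To make the 1-bit estimate concrete, here is a small Python sketch. It assumes bit-packed alignment, where an event code part with n distinct values takes ceil(log2(n)) bits, and it assumes the @xsi:type production is added at the first level of the event code. The function names are mine.

```python
def event_code_bits(n_productions):
    """Bits for one event-code part with n distinct values: ceil(log2(n))."""
    if n_productions < 1:
        raise ValueError("a grammar needs at least one production")
    # (n - 1).bit_length() equals ceil(log2(n)) for every n >= 1.
    return (n_productions - 1).bit_length()

def xsi_type_cost(n_productions):
    """Extra bits per event when one more production is added."""
    return event_code_bits(n_productions + 1) - event_code_bits(n_productions)

# Cost of the extra production for grammars with 1 to 8 existing productions.
costs = [xsi_type_cost(n) for n in range(1, 9)]
```

The cost is 1 bit only when the existing number of productions is a power of two (1, 2, 4, 8, ...), and 0 otherwise, which is why optional attributes or children tend to absorb it.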



At the very least, the rule could be changed so that a @xsi:type production is added for all elements whose content is defined by a
global type definition.

This seems more in line with the XML Schema specification. In addition, schema writers who want to squeeze out as many bits as
possible could inline their schema type definitions.
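
A minimal Python sketch of the proposed rule, under the assumption that an element declaration either refers to a global (named) type or carries an anonymous inlined one. ElementDecl and gets_xsi_type_production are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ElementDecl:
    name: str
    type_name: Optional[str]  # None models an anonymous (inlined) type definition

def gets_xsi_type_production(decl: ElementDecl) -> bool:
    """Proposed rule: add the production whenever the content is defined
    by a global type definition, regardless of known subtypes."""
    return decl.type_name is not None

global_typed = ElementDecl("test", "xs:base64Binary")
inlined = ElementDecl("compact", None)
```

With this rule the "test" element above accepts xsi:type, while a schema writer who inlines the type keeps the shorter event codes.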



Regards,

                youenn
Received on Wednesday, 31 March 2010 07:22:11 UTC