xsi:type feature simplification from FABLET Youenn on 2010-01-29 (public-exi-comments@w3.org from January 2010)

From: FABLET Youenn <Youenn.Fablet@crf.canon.fr>
Date: Fri, 29 Jan 2010 15:19:51 +0100
To: "public-exi-comments@w3.org" <public-exi-comments@w3.org>
Message-ID: <C1797CB6A125334AB23C5A0A160944AD3615666F69@cressida.crf.canon.fr>

Dear all,

This is a mail about the @xsi:type feature in the EXI specification.
These thoughts are based on @xsi:type implementation feedback, which is what the CR period is all about.

Currently, @xsi:type is used as a dynamic typing mechanism that can improve compression at the EXI level.
This is a good idea. I would also point out that other dynamic typing mechanisms, e.g. typed APIs, could also be very useful for compressing XML content.

To illustrate the issue we have with the current @xsi:type behavior, I will take the XML signature schema as an example.
Currently we have the following XML signature definition:
<simpleType name="CryptoBinary">
  <restriction base="base64Binary"/>
</simpleType>
<element name="DSAKeyValue" type="ds:DSAKeyValueType"/>
<complexType name="DSAKeyValueType">
  <sequence>
    <sequence minOccurs="0">
      <element name="P" type="ds:CryptoBinary"/>
      <element name="Q" type="ds:CryptoBinary"/>
    </sequence>
    ...
  </sequence>
</complexType>
Basically this means that elements P, Q and some others have base64Binary content (through the CryptoBinary simple type definition).
For some applications, it will be especially important that these element contents be encoded using the base64Binary EXI encoding, since this is the bulk of the XML data.
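
To give a rough sense of why the codec matters for P, Q and similar elements, the following Python sketch compares the number of characters in a base64 lexical value with the number of octets it decodes to. The 128-octet value is an arbitrary stand-in (an assumption, not real key material), and the count ignores EXI framing such as length prefixes, so it is an illustration rather than a measurement of any EXI processor.

```python
import base64

# Hypothetical 128-octet value standing in for a DSA parameter (not real key material).
raw = bytes(range(128))

# Lexical form as it would appear in the XML text.
lexical = base64.b64encode(raw).decode("ascii")

# A string-based encoding has to carry every base64 character,
# while a binary encoding carries only the decoded octets.
chars_as_string = len(lexical)
octets_as_binary = len(base64.b64decode(lexical))

print(chars_as_string, octets_as_binary)
```

The 4:3 ratio between the two counts is inherent to base64 and applies to every such element value.
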
An existing application that wants to ensure that these elements are correctly compressed using the base64Binary EXI encoding has some options:

1) Verify the EXI setup

a. Ensure that schema mode is in use and that at least the aforementioned elements (P, Q...) have a schema-informed grammar associated with them

i. This fine-grained check is not practical; in practice the question typically becomes: does the EXI processor have the full XML signature schema or not?

b. But the application may not have the choice and/or the knowledge of the schema

2) Try to put @xsi:type within the produced XML documents so that EXI encoders will always encode data using the base64 codec

a. The application can set @xsi:type to xs:base64Binary
<DSAKeyValue>
<P xsi:type="xs:base64Binary">15188048...</P>
...
</DSAKeyValue>
This actually works well at the EXI level, since base64Binary is available in all EXI processors.
Unfortunately, this document is not schema-valid, since the CryptoBinary type derives from base64Binary and not the reverse.
This may cause issues within applications. This simple solution is therefore unavailable :(

b. The only valid solution is the following:
<DSAKeyValue>
<P xsi:type="dsig:CryptoBinary">15188048...</P>
...
</DSAKeyValue>
But this requires the use/sharing of the CryptoBinary grammar, and we are back to case 1.
Even worse, the @xsi:type may be useless in these cases:

- full schema is in use: this will be already encoded efficiently without @xsi:type

- no schema is in use: no grammar is retrieved from @xsi:type.
Note that this issue would not happen if we could modify the schema.
The definition of CryptoBinary has its own benefits (editorial, DTR usefulness, reuse, semantics...) and, in the end, it is the flexibility of the EXI technology that we should count on.
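
The cases above can be summarised in a small Python sketch. It is a simplified model of the argument in this mail, not the behaviour of any particular EXI processor: type names are plain strings, only the "full schema" and "no schema" situations are modelled, and the function names `derives_from` and `outcome` are made up for the illustration.

```python
# Derivation chain taken from the schema fragment quoted in this mail.
BASE_OF = {
    "ds:CryptoBinary": "xs:base64Binary",
    "xs:base64Binary": "xs:anySimpleType",
}

def derives_from(candidate, declared):
    """True if candidate is the declared type itself or is derived from it."""
    t = candidate
    while t is not None:
        if t == declared:
            return True
        t = BASE_OF.get(t)
    return False

def outcome(full_schema, xsi_type, declared="ds:CryptoBinary"):
    """Return (schema_valid, value_codec) for element P, as argued in the mail."""
    valid = xsi_type is None or derives_from(xsi_type, declared)
    if xsi_type == "xs:base64Binary":
        # Built-in type: available to every EXI processor.
        codec = "binary"
    elif full_schema:
        # The grammar for P is already known; @xsi:type adds nothing.
        codec = "binary"
    else:
        # No schema: no grammar can be retrieved for ds:CryptoBinary.
        codec = "string"
    return valid, codec

for full_schema in (True, False):
    for xsi_type in (None, "xs:base64Binary", "ds:CryptoBinary"):
        print(full_schema, xsi_type, outcome(full_schema, xsi_type))
```

In this model, the only combination that reaches the binary codec without a shared schema is the one that makes the document invalid.
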

From this example, the current link between @xsi:type and the grammar selection seems too tight.
One possibility would be to update the @xsi:type production behavior to be slightly more generic:

- This updated @xsi:type production enables the grammar selection

o Using the existing QName mechanism defined by the specification

- This updated @xsi:type production does not automatically carry any infoset implication

o Somehow similar to SC productions which do not have any infoset implication

- This updated @xsi:type production may modify the infoset

o for instance through the encoding of a boolean stating whether @xsi:type is included in the infoset or not
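
As a minimal sketch of what this could look like on the decoder side (in Python, with every name hypothetical: `TypeCast`, `GRAMMARS` and `decode_element` are not taken from the EXI specification or from any implementation), the production would carry the type QName plus one boolean, and the decoder would use the QName for grammar selection while exposing the attribute only when the boolean is set:

```python
from dataclasses import dataclass

@dataclass
class TypeCast:
    """Hypothetical event for the updated @xsi:type production."""
    type_qname: str   # QName used to select the grammar
    in_infoset: bool  # proposed boolean: is @xsi:type part of the infoset?

# Hypothetical registry of the value codecs known to the decoder.
GRAMMARS = {"xs:base64Binary": "binary", "xs:string": "string"}

def decode_element(name, event, payload):
    """Select the codec from the event; expose @xsi:type only if flagged."""
    codec = GRAMMARS[event.type_qname]
    attributes = {}
    if event.in_infoset:
        attributes["xsi:type"] = event.type_qname
    return {"name": name, "attributes": attributes, "codec": codec, "value": payload}

# Same grammar selection in both cases; only the reported infoset differs.
hint_only = decode_element("P", TypeCast("xs:base64Binary", False), b"\x01\x02")
explicit = decode_element("P", TypeCast("xs:base64Binary", True), b"\x01\x02")
print(hint_only["attributes"], explicit["attributes"])
```

With the boolean unset, the DSAKeyValue example would be decoded with the binary codec while the reconstructed document carries no @xsi:type, so its validity would not be affected.
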

This modification has several benefits:

- This is a very small modification, so current EXI implementations and the current spec could be upgraded very easily

o Added complexity on EXI decoders is minimal and added complexity on EXI encoders can also be minimal

- Dynamic typing becomes even more usable

o The previous issue is solved

o EXI encoders could actually use value typing information given by XML typed APIs

o EXI encoders can have various ways to select the grammar that gives the best compression

§ EXI decoders will just follow the decision

- This gives additional compression and efficiency improvements for some common cases
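
To illustrate the typed-API point in the list above, here is a hedged Python sketch of how an encoder might derive a grammar hint from the type of a value handed over by the application. The mapping and the function names `builtin_type_for` and `grammar_hint` are assumptions made for this example; a real typed API would define its own mapping.

```python
from decimal import Decimal

def builtin_type_for(value):
    """Hypothetical mapping from a typed-API value to an XML Schema built-in type."""
    # bool must be tested before int, since bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "xs:boolean"
    if isinstance(value, int):
        return "xs:integer"
    if isinstance(value, float):
        return "xs:double"
    if isinstance(value, Decimal):
        return "xs:decimal"
    if isinstance(value, (bytes, bytearray)):
        return "xs:base64Binary"
    return "xs:string"

def grammar_hint(value):
    """Return (type QName, include @xsi:type in the infoset?) for the encoder."""
    # The type comes from the API rather than from the document,
    # so the proposed infoset boolean would be False here.
    return builtin_type_for(value), False

print(grammar_hint(b"\x00\x01"))
print(grammar_hint(True))
print(grammar_hint("plain text"))
```
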
I do not see any real drawback in terms of complexity or compression for general use cases, since a boolean does not cost a lot compared to the actual encoding cost of an @xsi:type.
In addition, applications that want to achieve the best compression in their environments should probably, as a first step, remove any unnecessary @xsi:type.

Regards,
Youenn

Received on Friday, 29 January 2010 14:20:27 UTC