RE: [LC-2363] xsi:type feature simplification

Hi Youenn,
 
Certain decisions made in the XBC WG guided the formulation of the EXI
format. Among the many format properties suggested by XBC WG members, a
number did not survive into the set of properties required of an
alternative format representing the XML infoset, should such a format ever
be standardized by W3C. The EXI WG was later launched with a charter to
take on the task of formulating such a format.
 
A property called "Explicit Typing" was one of the requirements that were
removed. This property was not only deemed unnecessary, but also considered
unsuitable as an intrinsic feature of the format, because XML Schema already
provides such a mechanism via the xsi:type attribute semantics. Having a
parallel mechanism in EXI would raise issues such as how the two mechanisms
interact with each other, and concerns about its affinity to the XML stack
(i.e. the XML stack not only would be unable to tap into it, but also would
lose it when writing the document back into EXI). This is why EXI does not
support a native mechanism to indicate the representation of values
in-stream.
 
Therefore, we strongly suggest that you further pursue your option 1 (i.e.
"verify the EXI setup") by facilitating the sharing of schema information in
a manner that best suits your system architecture.
 
Schema information is most commonly shared, either statically or dynamically,
by actually exchanging schemas. However, there can be other ways to share
schema information. Sometimes the sender and the receiver can agree in
advance on a way to infer the schema information; we describe one example
below.
 
Let's say the sender knows that the document being encoded uses certain
user-defined simple types for which it wants to use specific datatype
representations. Let these types be "tA", "tB" and "tC", and let the desired
datatype representation for each be "exi:base64Binary". You can then have a
datatype representation map in the header options as follows.
 
<exi:datatypeRepresentationMap>
  <tA/><exi:base64Binary/>
  <tB/><exi:base64Binary/>
  <tC/><exi:base64Binary/>
</exi:datatypeRepresentationMap>
 
The sender and the receiver can both construct a schema snippet as shown
below from the above datatype representation map. 
 
<xsd:simpleType name="tA"/>
<xsd:simpleType name="tB"/>
<xsd:simpleType name="tC"/>
 
In order to generate the exact grammar corresponding to the above, you do
not necessarily have to have a separate grammar for each type. You just need
to have a single parameterized grammar as a common template that can be
used for all the simple types.
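
As a rough illustration of the "single parameterized grammar" idea, here is
a small Python sketch. Everything in it is an assumption made for this
example: the class name SimpleTypeGrammar, the function grammars_from_map,
and the two-state layout (a characters event followed by end-element), which
is a simplification chosen for readability rather than a transcription of
the specification's grammar rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleTypeGrammar:
    """Hypothetical grammar template, parameterized by type and codec."""
    type_name: str        # e.g. "tA"
    representation: str   # e.g. "exi:base64Binary"

    def productions(self):
        # Simplified shape: one CH event whose value uses the chosen
        # representation, then EE. Only the representation varies.
        return {
            "Type_0": [("CH", self.representation, "Type_1")],
            "Type_1": [("EE", None, None)],
        }


def grammars_from_map(type_to_representation):
    """Instantiate the same template once per entry of the map."""
    return {
        name: SimpleTypeGrammar(name, rep)
        for name, rep in type_to_representation.items()
    }


grammars = grammars_from_map({
    "tA": "exi:base64Binary",
    "tB": "exi:base64Binary",
    "tC": "exi:base64Binary",
})
```

The point is only that the three grammars differ in nothing but their
parameters, so one template is enough.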
 
Since both the sender and the receiver can get to the same schema information
from the datatype representation map, xsi:type can be used to indicate the 
type (e.g. xsi:type="tA") that would cause the value of the element to be 
encoded according to the type specified in the datatype representation map
(e.g. exi:base64Binary). In essence, you can infer a partial schema from a
datatype representation map in the absence of the schema itself, and the
sender and the receiver just need to share the way to get to the same partial
schema information out of it. 
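
The inference described above can be sketched in a few lines of Python.
This is only an illustration under assumptions: the function names parse_map
and infer_schema_snippet are made up for this example, a namespace
declaration for the exi prefix is added so that the fragment parses, and the
map is read as consecutive (type, representation) pairs, as in the example
earlier in this mail.

```python
import xml.etree.ElementTree as ET

EXI_NS = "http://www.w3.org/2009/exi"

# The map from the example, with the exi prefix declared so it parses.
MAP_XML = f"""
<exi:datatypeRepresentationMap xmlns:exi="{EXI_NS}">
  <tA/><exi:base64Binary/>
  <tB/><exi:base64Binary/>
  <tC/><exi:base64Binary/>
</exi:datatypeRepresentationMap>
"""


def parse_map(xml_text):
    """Return [(type_tag, representation_tag), ...] from child pairs."""
    children = list(ET.fromstring(xml_text))
    if len(children) % 2:
        raise ValueError("expected (type, representation) pairs")
    return [
        (children[i].tag, children[i + 1].tag)
        for i in range(0, len(children), 2)
    ]


def infer_schema_snippet(pairs):
    """Build one empty simpleType declaration per mapped type name."""
    lines = []
    for type_tag, _representation in pairs:
        local_name = type_tag.rsplit("}", 1)[-1]  # drop "{namespace}" if any
        lines.append(f'<xsd:simpleType name="{local_name}"/>')
    return lines


pairs = parse_map(MAP_XML)
snippet = infer_schema_snippet(pairs)
representation_of = dict(pairs)  # lookup used when xsi:type names a type
```

If the sender and the receiver both run the same procedure on the same
header, they arrive at the same partial schema without exchanging it.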
 
Please note that this is just one possible way to share schema information;
it is neither required by the specification nor a recommended practice. It
is shown here merely as an example, in the hope that it will spur further
inventions of the kind to best suit particular use cases.
 
Hope it helps,
 
-taki


  _____  

From: public-exi-comments-request@w3.org [mailto:public-exi-comments-request@w3.org] On Behalf Of FABLET Youenn
Sent: Friday, January 29, 2010 6:20 AM
To: public-exi-comments@w3.org
Subject: [LC-2363] xsi:type feature simplification



Dear all,

 

This is a mail about the @xsi:type feature in the EXI specification.
These thoughts are based on @xsi:type implementation feedback, which is what the CR period is all about.

 

Currently, @xsi:type is used as a dynamic typing mechanism that can improve compression at the EXI level.
This is a good idea. I would also point out that other dynamic typing mechanisms, e.g. typed APIs, could also be very useful for
compressing XML content.



To illustrate the issue we have with the current @xsi:type behavior, I will take the XML signature schema as an example.

Currently we have the following XML signature definition:

<simpleType name="CryptoBinary">
  <restriction base="base64Binary"/>
</simpleType>

<element name="DSAKeyValue" type="ds:DSAKeyValueType"/>

<complexType name="DSAKeyValueType">
  <sequence>
    <sequence minOccurs="0">
      <element name="P" type="ds:CryptoBinary"/>
      <element name="Q" type="ds:CryptoBinary"/>
    </sequence>
    …
  </sequence>
</complexType>

Basically this means that elements P, Q and some others have base64Binary content (through the CryptoBinary simple type definition).

For some applications, it will be especially important that the contents of these elements be encoded using the base64Binary EXI
encoding, since this is the bulk of the XML data.

An existing application that wants to ensure that these elements are correctly compressed using the base64Binary EXI encoding has
some options:

1) Verify the EXI setup

   a. Ensure that schema mode is in action and that at least the aforementioned elements (P, Q…) have a schema-informed
      grammar associated with them

      i. This fine-grained check is not practical; typically the question becomes: does the EXI processor have the full
         XML signature schema or not?

   b. But the application may not have the choice and/or the knowledge of the schema

2) Try to put @xsi:type within the produced XML documents so that EXI encoders will always encode data using the base64 codec

   a. The application can set @xsi:type to xs:base64Binary

      <DSAKeyValue>
        <P xsi:type="xs:base64Binary">15188048…</P>
        …
      </DSAKeyValue>

      This actually works well at the EXI level, since base64Binary is available with all EXI processors.

      Unfortunately, this document is not valid, since the CryptoBinary type derives from base64Binary and not the reverse.

      This may cause issues within applications. This simple solution is therefore unavailable :(

   b. The only valid solution is the following:

      <DSAKeyValue>
        <P xsi:type="dsig:CryptoBinary">15188048…</P>
        …
      </DSAKeyValue>

      But this requires the use/sharing of the CryptoBinary grammar and we are back to case 1

Even worse, the xsi:type may be useless in these cases:

-          full schema is in use: this will be already encoded efficiently without @xsi:type

-          no schema is in use: no grammar is retrieved from @xsi:type.

Note that this issue would not happen if we could modify the schema.

The definition of CryptoBinary has its own benefits (editorial, DTR usefulness, reuse, semantics…) and in the end, it is the
flexibility of the EXI technology that we should count on.

 

From this example, the current link between @xsi:type and the grammar selection seems too tight.

One possibility would be to update the @xsi:type production behavior to be slightly more generic:

-  This updated @xsi:type production enables the grammar selection

   o  Using the existing QName mechanism defined by the specification

-  This updated @xsi:type production does not automatically carry any infoset implication

   o  Somewhat similar to SC productions, which do not have any infoset implication

-  This updated @xsi:type production may modify the infoset

   o  For instance through the encoding of a boolean to state whether @xsi:type is included or not in the infoset
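
To make the proposed behavior concrete, here is a minimal Python sketch of
how a decoder might treat such a production. It is purely hypothetical: the
names XsiTypeEvent, in_infoset and decode_xsi_type are invented for this
illustration, grammars are stood in for by plain strings, and nothing here
is part of the EXI specification.

```python
from dataclasses import dataclass

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"


@dataclass
class XsiTypeEvent:
    """Hypothetical decoded form of the updated @xsi:type production."""
    type_qname: str    # QName carried by the production
    in_infoset: bool   # proposed boolean: is @xsi:type part of the infoset?


def decode_xsi_type(event, grammars, attributes):
    """Always select the grammar; report the attribute only if flagged."""
    grammar = grammars[event.type_qname]
    if event.in_infoset:
        attributes[XSI_TYPE] = event.type_qname
    return grammar


# Placeholder grammar table; a real decoder would hold grammar objects.
grammars = {"xs:base64Binary": "base64Binary-grammar"}

hidden_attrs = {}
hidden = decode_xsi_type(
    XsiTypeEvent("xs:base64Binary", in_infoset=False), grammars, hidden_attrs
)

visible_attrs = {}
visible = decode_xsi_type(
    XsiTypeEvent("xs:base64Binary", in_infoset=True), grammars, visible_attrs
)
```

In both calls the same grammar is selected; only the second one surfaces an
xsi:type attribute to the application.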

 

This modification has several benefits:

-  This is a very small modification so current EXI implementations and current spec would be upgradable very easily

   o  Added complexity on EXI decoders is minimal and added complexity on EXI encoders can also be minimal

-  Dynamic typing becomes even more usable

   o  The previous issue is solved

   o  EXI encoders could actually use value typing information given by XML typed APIs

   o  EXI encoders can have various ways to select the best compression-wise grammar

      *  EXI decoders will just follow the decision

-  This gives additional compression and efficiency improvements for some common cases

I do not see any real drawback in terms of complexity or compression for general use cases since a boolean does not cost a lot
compared to the actual encoding cost of a @xsi:type.

In addition, applications that want to achieve the best compression in their environments should probably, as a first step, remove
any unnecessary @xsi:type.

 

Regards,

                Youenn

Received on Friday, 27 August 2010 22:16:39 UTC