RE: [LC-2367] General EXI attributes handling

Hi Youenn,

The occurrence order of xsi:type and xsi:nil attributes relative to other
attributes in schema-informed EXI streams (but not in schema-less EXI streams)
is inherent and significant in the design of EXI. The semantics EXI associates
with these two attributes in schema-informed streams are consistent with their
definitions in the XML Schema specification. They let the EXI grammar system
switch from the grammar in use by default to the one the attributes signal as
likely better suited, which in turn helps make the EXI stream more compact.
Giving up these semantics in schema-informed EXI streams would be costly. Not
only would it forfeit that improved compactness, which could make EXI less
competitive in some applications, it would also cause many schema-valid
instances that use xsi:type or xsi:nil to fail to encode in strict
schema-informed mode.

When you are transcribing XML (such as SAX events) into EXI, this generally
means (with the exceptions described in the next paragraph) that xsi:type and
xsi:nil need to be identified in the attribute list and processed before the
other attributes in schema-informed EXI streams. One way to mitigate the
associated cost is to look for them only when a namespace declaration for
"http://www.w3.org/2001/XMLSchema-instance" is in scope, because under no other
circumstances can those attributes legitimately appear in XML. When you are
instead starting from a data structure that is serialized directly into EXI,
bypassing any XML representation or XML API (such as SAX), you should generally
have full control over the order in which attributes are processed, and it only
makes sense to declare the actual type with xsi:type, along with xsi:nil, up
front before other attributes are processed.
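
To make the transcription case concrete, here is a minimal sketch in Python of
the reordering step. It is only an illustration, not code from any EXI
implementation: the function name order_attributes, the (namespace, local-name)
tuple layout and the xsi_declared flag are all hypothetical. Putting xsi:type
ahead of xsi:nil when both occur is also an assumption of this sketch; the point
above is only that both precede the other attributes.

```python
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def order_attributes(attrs, schema_informed, xsi_declared=True):
    """Return attrs with xsi:type and xsi:nil moved to the front.

    attrs is a list of ((namespace_uri, local_name), value) tuples in the
    order the application supplied them (hypothetical layout).
    """
    # Schema-less stream, or no xsi namespace declaration in scope:
    # skip the scan entirely and keep the caller's order.
    if not schema_informed or not xsi_declared:
        return list(attrs)

    # Assumed relative order: xsi:type first, then xsi:nil, then the rest.
    rank = {(XSI_NS, "type"): 0, (XSI_NS, "nil"): 1}

    # sorted() is stable, so all other attributes keep their arrival order.
    return sorted(attrs, key=lambda attr: rank.get(attr[0], 2))


if __name__ == "__main__":
    incoming = [
        (("", "id"), "a1"),
        ((XSI_NS, "nil"), "true"),
        (("", "lang"), "en"),
    ]
    for name, value in order_attributes(incoming, schema_informed=True):
        print(name, value)
```

The xsi_declared check corresponds to the mitigation above: when no declaration
for the xsi namespace is in scope, the list is passed through without being
inspected.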

In schema-less streams, xsi:type and xsi:nil do not affect the grammars. The
working group is currently discussing whether it might be appropriate to relax
the ordering constraint in these circumstances and possibly other circumstances
where xsi:type and xsi:nil have no impact on the grammars.

Hope this helps,

-taki


  _____

From: public-exi-comments-request@w3.org [mailto:public-exi-comments-request@w3.org] On Behalf Of FABLET Youenn
Sent: Friday, January 29, 2010 6:58 AM
To: public-exi-comments@w3.org
Subject: [LC-2367] General EXI attributes handling



Dear all,



Based on internal feedback, I would like to make the following observations on the current EXI specification.
Note that this is not a request for change of the EXI specification.

I think however that this may be of some use/interest for the community.



Currently, all attributes of an XML element are stored by EXI encoders before being actually written.

This is a different behavior from text XML writers that can write attributes as soon as applications provide them.

In some environments, this attribute storage behavior has a real processing cost that does not appear with traditional StAX-like
text XML writers.



There are two main technical reasons for storing attributes:

- In schema mode, it is better to impose a specific attribute order so as to get good compression

- @xsi:type and/or @xsi:nil must appear first in schema and schemaless modes

The first reason is a strong reason. I also note that EXI enables some flexibility in the attribute ordering if that better suits
the application needs.  This seems very reasonable as this added flexibility does not impact performance or interoperability.



The second reason seems weaker since, at least in our scenarios, @xsi:type/@xsi:nil do not appear very often anyway.

It would have been good to have the flexibility to put @xsi:type and @xsi:nil in the order desired by applications.

Of course, this would need changing the way these attributes actually impact on the EXI grammars.

I did not do the full exercise, but I am confident that there are some reasonably simple workarounds that would get us back to a
similar functionality level anyway.



The advantages would have been:

- No more special attribute behavior handling at the codec runtimes level

  o General spec simplification

  o Smaller and potentially faster EXI codec runtimes

- Performance improvement by enabling streamed encoding of attributes

  o At least in the case of built-in grammars but also in schema-deviation mode

  o More consistent with some text XML writer behavior



Regards,

                Youenn

Received on Wednesday, 23 June 2010 21:40:58 UTC