RE: EXI in XMPP: schema negotiations from Peter Waher on 2013-04-18 (public-exi@w3.org from April 2013)

From: Peter Waher <Peter.Waher@clayster.com>
Date: Thu, 18 Apr 2013 01:00:28 +0000
To: Rumen Kyusakov <rumen.kyusakov@ltu.se>
CC: "standards@xmpp.org" <standards@xmpp.org>, "public-exi@w3.org" <public-exi@w3.org>
Message-ID: <1693EFE1FD641C42A0D542FCBC732DE698E763F7@EX3.YODA.UTOPIA.LOCAL>
Hello Rumen.

Sorry for the late reply. Thanks for the input. I'll respond to each item in turn.

> Hello Peter, all,
>
> Thank you for your comments on the EXI options exchange and adding the EXI compressed schema use case in the specification!

Thank you for taking the time to read it and provide input.

> I think the text in section §2.5 captures the case and can be easily extended to incorporate new standard EXI schema formats when and if such become available.
> Please consider the following comments:
> - Predefining the EXI options to be used is a very good approach, as it simplifies the process and outweighs the small performance benefits that could be gained by, e.g., allowing the schema to be communicated in EXI schema-mode.
> - I agree with the choice of the default values of the EXI options except for preserve - most schemas use qualified names in attribute values (e.g. ref and type attribute when specifying a type definition). According to the EXI specification (section §6.3): "When qualified names are used in the values of AT or CH events in an EXI Stream, the Preserve.prefixes fidelity option SHOULD be turned on to enable the preservation of the NS prefix declarations used by these values". So Preserve.prefixes should be on or a large class of schemas will not be able to be presented by the EXI compressed schema mechanism.

OK. Now I understand the comment made by Daniel Peintner. I'll change this right away; the fix will appear in the next revision, which is coming soon.

> - In my opinion the use of ExiBody and ExiDocument as separate options of the compressed schema negotiations is unnecessary. When the EXI options are communicated in an out-of-band mechanism, which is the case of predefined options, the Options Presence Bit in the header can be set to 0 and the EXI Options omitted. Thus the difference in size of ExiBody and ExiDocument is 1 byte when EXI Cookie is excluded and 5 bytes otherwise. My suggestion is to merge the ExiBody and ExiDocument into a single option "EXI" and explicitly specify that the options should not be included in the header as they are predefined.

The thought here was not to save space, but to be able to use a standard file format for EXI-compressed schema files without having to modify them. However, I have not come across any such generally accepted format; I have only seen examples of libraries storing EXI-compressed schema files in proprietary formats. There will be time to think about this during the experimentation phase. If no generally accepted format emerges, ExiDocument might be changed to a specific file format instead.

> I am also in favor of implementing the same approach for the transmission of EXI-compressed stanzas. Instead of presenting them as a sequence of EXI bodies it is EXI standard compliant to transmit them as standard EXI documents with header and out-of-band options without including the optional EXI Cookie. This will make each stanza fully compliant EXI document with 1 byte header.

There's a layer between the EXI engine and what is actually transmitted and received on the transport layer. Whether the implementation chooses to add a leading byte before decompression, or remove it after compression, when using an EXI engine is an implementation-specific issue. I see no reason to constantly add an unnecessary byte to communication packets. It might seem tiny, but in WSN cases 1 extra byte for every packet is not insignificant.
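
As an illustration of the layering described above, here is a minimal sketch of an adapter that sits between an EXI engine and the transport. It assumes the engine works with complete EXI documents whose header carries no in-band options and uses the final EXI version 1, in which case the header is the single byte 0x80 (distinguishing bits 10, options presence bit 0, preview bit 0, version field 0000). The names to_wire and from_wire are hypothetical and do not come from the specification.

```python
# Minimal EXI header: distinguishing bits "10", options presence bit 0,
# preview bit 0, 4-bit version field 0000 (final version 1) -> 1000 0000.
MINIMAL_EXI_HEADER = b"\x80"
# Optional EXI cookie: the four ASCII characters "$EXI".
EXI_COOKIE = b"$EXI"


def to_wire(exi_document: bytes) -> bytes:
    """Drop the optional cookie and the one-byte header before sending.

    Hypothetical helper: assumes the options were agreed out of band,
    so the header carries nothing the receiver does not already know.
    """
    if exi_document.startswith(EXI_COOKIE):
        exi_document = exi_document[len(EXI_COOKIE):]
    if not exi_document.startswith(MINIMAL_EXI_HEADER):
        raise ValueError("expected a one-byte header without in-band options")
    return exi_document[len(MINIMAL_EXI_HEADER):]


def from_wire(exi_body: bytes) -> bytes:
    """Restore the one-byte header so a standard EXI decoder accepts the data."""
    return MINIMAL_EXI_HEADER + exi_body
```

With such an adapter the engine on each side still sees a standard EXI document, while only the body travels in each packet.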

> - I think the information on how the client and the server use the negotiated schema information is not enough and allows for ambiguous behavior by the implementations. The issue is that it is not clear which of the negotiated schemas is the main one for a particular stanza - all others should be linked through the main one with <xs:import> statements. This is a standard way of defining constraints and validation checks for XML and is not unique to EXI - each XML document can be validated against one XML schema at a time. This XML schema can have multiple namespaces added through <xs:import> and be spread over multiple files, but there is a single root XML schema document from which all these definitions are linked. The unique identification of the root schema for each stanza is needed to correctly build the EXI body document grammar (http://www.w3.org/TR/2011/REC-exi-20110310/#informedDocGrammars), for which a list of all global definitions is needed. It might be useful to consider using the schemaID header option + the schema md5 hash for each stanza if no other mechanism is better suited.

This item will be expanded in the next revision, which we're currently working on. First, in XMPP, the root element of a stanza determines the schema to use through its namespace and local name. So, if namespaces are uniquely defined, a sequence of namespaces is sufficient to define compression and decompression. And, as described previously, the schema ID is not a good identifier, since everything is already identified by namespace. Introducing an arbitrary third ID parameter (apart from namespace and location) makes no sense, since it provides no additional information.
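
To make the namespace-based identification concrete, the following is a small sketch of how the namespace and local name of a stanza's root element could be extracted and checked against the set of negotiated namespaces. The function name root_qname and the example set of namespaces are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET


def root_qname(stanza_xml: str) -> tuple[str, str]:
    """Return (namespace, local name) of the root element of a stanza."""
    root = ET.fromstring(stanza_xml)
    # ElementTree renders qualified names as "{namespace}local".
    if root.tag.startswith("{"):
        namespace, local = root.tag[1:].split("}", 1)
        return namespace, local
    return "", root.tag


# Hypothetical set of namespaces agreed on during schema negotiation.
negotiated = {"jabber:client", "http://jabber.org/protocol/disco#info"}

namespace, local = root_qname('<message xmlns="jabber:client" to="a@example.org"/>')
# The namespace alone tells us whether a negotiated schema covers this stanza.
covered = namespace in negotiated
```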

What does need to be added, however, is an implementation note that shows how the XMPP server creates the so-called main schema in a canonical fashion from the list of schemas defined during negotiation. Such a main schema would consist of a sequence of xs:import elements importing the negotiated schemas. This internally generated schema can then be passed to the EXI engine.

Generating such main schemas automatically, in a well-defined and canonical manner, is desirable because it avoids the need to publish them and avoids problems caused by badly written main schemas. The generated schema can also adapt to partial results during schema negotiation.
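
A minimal sketch of how such a main schema might be generated from the negotiated namespaces is shown below. The canonical ordering used here (duplicates removed, namespaces sorted by code point) is purely an assumption for illustration; the actual canonical form would be whatever the specification defines. The function name build_main_schema is likewise hypothetical.

```python
from xml.sax.saxutils import quoteattr

XSD_NS = "http://www.w3.org/2001/XMLSchema"


def build_main_schema(namespaces):
    """Build a main schema that only imports the negotiated namespaces.

    Assumed canonical form: duplicates removed and namespaces sorted,
    so both sides produce identical text from the same negotiation result.
    """
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        f'<xs:schema xmlns:xs="{XSD_NS}">',
    ]
    for ns in sorted(set(namespaces)):
        # quoteattr escapes the value and adds the surrounding quotes.
        lines.append(f"  <xs:import namespace={quoteattr(ns)}/>")
    lines.append("</xs:schema>")
    return "\n".join(lines) + "\n"


main_schema = build_main_schema(["jabber:client", "http://etherx.jabber.org/streams"])
```

Because the output depends only on the set of namespaces, the same function can be rerun whenever negotiation yields a partial or updated list.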

> -- 
> Best Regards,
> Rumen Kyusakov
> PhD student 
> EISLAB, Luleå University of Technology

Thanks for your time and good input,
Sincerely,
Peter Waher
Received on Thursday, 18 April 2013 01:00:44 UTC