Comments: Canonical EXI -- Last Call Working Draft

Dear EXI Friends and Colleagues,

Thank you for the opportunity to review the Last Call Working Draft of the Canonical EXI specification dated 21 May 2015. It is rewarding to see the work we started together long ago nearing completion. We have completed a comprehensive review of the specification and provide our comments below. We have also implemented canonical EXI in selected Efficient XML products, deployed it with a set of users and incorporated their feedback and experience into these comments.

Our comments are enumerated to facilitate discussion. I hope they are helpful and will contribute to the creation of a high-quality standard that will address critical needs in the EXI and XML Security domains. 

Please let me know if you have any questions or if I can help to clarify any of the comments.

 All the best,

 John

AgileDelta, Inc.
john.schneider@agiledelta.com
http://www.agiledelta.com

——— Specific Comments ——— 

1. Architecture & Design: The specification defines canonical EXI with respect to an input EXI stream. This limits one’s ability to use canonical EXI with traditional XML or other XML Infoset representations and creates a poor architectural fit with the rest of the XML stack of technologies that are defined with respect to the XML Infoset. The strict dependency on an EXI input stream, the EXI options document and the EXI schemaId creates intrinsic incompatibilities with XML, which does not support these EXI-specific artifacts. This leads to practical implementation problems, such as the inability for canonical EXI to support digital signatures through XML intermediary nodes, which you identified at the end of section A.1.  

To be useful in all XML contexts and with all XML technologies, EXI canonicalization must be defined with respect to the XML Infoset. We recommend you update the specification to define canonical EXI with respect to a given XML Infoset, a given XML Schema and a given set of EXI options. The schema and EXI options may be provided in any number of ways, as you describe well in section C.2. As with EXI, the user should be allowed to embed these in the EXI header when it is advantageous, but should not be required to do so when it is not. Mandating the inclusion of the EXI options and a schemaId in every message is at odds with EXI’s efficiency objectives and makes it onerous to use canonical EXI as a transmission format. As you point out in section C.1, using canonical EXI as a transmission format can eliminate the need to perform [redundant] canonicalization at the receiver, further increasing efficiency. We have users that currently employ canonical EXI this way and it is very advantageous to them. However, requiring the EXI options and schemaId in every message would quickly overwhelm the benefits of using canonical EXI as a transmission format.

2. Section 1, last sentence: Change “… whether two documents are identical …” to “… whether two documents are equivalent …”

3. Section 1.2: We agree EXI canonicalization is important for EXI environments that cannot afford to revert to traditional XML canonicalization methods. In addition, we recommend you mention some of the ways EXI canonicalization is useful for traditional XML users. For example, EXI canonicalization provides the first type-aware canonicalization scheme that can discern that +1, 1, 1.0, 1e0 and 1E0 are equivalent representations of the same floating-point value. This allows intermediaries to use binding-models and/or type-aware processing without breaking signatures. In addition, with a fast EXI processor, EXI canonicalization can be much faster than traditional XML canonicalization and can help cure some of the well-known XML security bottlenecks.
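To illustrate the equivalence claim in comment 3, here is a minimal sketch in Python. It uses Python's built-in `float` parser as a stand-in for a type-aware canonicalizer; the names `lexical_forms` and `typed_key` are illustrative only and do not come from the specification. A real canonical EXI processor would follow the EXI Float datatype representation rather than Python's parsing rules.

```python
# Five lexical forms that a text-based canonicalizer treats as different strings.
lexical_forms = ["+1", "1", "1.0", "1e0", "1E0"]


def typed_key(lexical: str) -> float:
    """Parse a lexical form into its numeric value (stand-in for a typed canonical form)."""
    return float(lexical)


# Compared as text, every form is distinct.
distinct_text = set(lexical_forms)

# Compared as typed values, all forms collapse to a single value.
distinct_values = {typed_key(s) for s in lexical_forms}

print(len(distinct_text), len(distinct_values))
```

A signature computed over the typed value would therefore survive an intermediary that rewrites `1.0` as `1`, whereas a signature over the text would not.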

4. Section 3: As mentioned above, making the EXI options document and the EXI schemaId mandatory in every canonical EXI document is at odds with the efficiency objectives of EXI. In many or perhaps even most use cases that require efficiency, these can be (and are) provided out of band or specified by a higher-level protocol. As such, including them in every canonical EXI message introduces unnecessary overhead and provides no value since all cooperating nodes already have this information. 

Furthermore, forcing the inclusion of a schemaId in every message does not actually solve the problem of ensuring the sender and receiver use the same schemas. The EXI schemaId is not guaranteed to be unique, and it would be easy for a sender and receiver to end up using the same schemaId for two different versions of the same schema or even two completely different schemas (breaking any signature that depends on schemaId). There are more reliable ways to ensure senders and receivers are using the same schemas for encoding/decoding EXI documents. This problem is not unique to EXI canonicalization and the EXI canonicalization specification should not force a specific, sub-optimal solution on EXI users. As with EXI, users should be allowed to use the EXI options document and schemaId to address this issue, but they should not be forced to do so if they have a better, more efficient solution that is already working.

5. Section 4: As stated above, to be useful in all XML contexts and with all XML technologies, EXI canonicalization must be defined with respect to a given XML Infoset rather than a given EXI stream. The semantics of the specification should be specified with respect to a given XML Infoset, a given XML Schema and a given set of EXI options (independent of how these are acquired).

6. Section 4.2.1: Change “Prune productions” to “Select productions” in heading. Pruning productions will remove them from the grammars, changing the event codes of the following events and causing incompatibility with the EXI 1.0 specification. I expect the specification intends to specify which productions must be selected rather than removing productions from the grammars. 
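The difference between pruning and selecting in comment 6 can be sketched as follows. This is a deliberately simplified model, assuming one-part event codes assigned by position in a hypothetical production list; real EXI event codes have up to three parts, and their bit widths depend on the number of productions. The names `grammar`, `event_codes` and the production labels are illustrative only.

```python
def event_codes(productions):
    """Assign each production an event code equal to its position in the list."""
    return {name: code for code, name in enumerate(productions)}


# Hypothetical grammar with five productions.
grammar = ["SE(a)", "SE(b)", "SE(*)", "EE", "CH"]
original_codes = event_codes(grammar)

# Pruning: remove SE(*) from the grammar, so the later productions are renumbered.
pruned_codes = event_codes([p for p in grammar if p != "SE(*)"])

# Selecting: leave the grammar intact and only restrict which productions
# the canonical encoder is allowed to choose.
selected_codes = {p: c for p, c in original_codes.items() if p != "SE(*)"}

print(original_codes["EE"], pruned_codes["EE"], selected_codes["EE"])
```

Under pruning, `EE` moves from code 3 to code 2, so a standard EXI 1.0 decoder using the unpruned grammar would misread the stream. Under selection, the codes are unchanged.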

7. Section 4.2.2: Change “Prune productions” to “Select productions” in heading. The word “prune” should also be replaced in the body of this section. See above rationale.

8. Section 4.2.2: The meaning of this section is not entirely clear. Presumably, it is not possible with the current EXI specification to use a production that is not capable of representing the content value (by definition). Are there circumstances that this section is attempting to prohibit that are currently allowed by the EXI 1.0 specification?

9. Section 4.2.3: Change heading “Use the event with the most accurate event” to “Use the event that matches most precisely” or something similar. Current wording is unclear. 

10. Section 4.4: The last sentence of this section indicates that Canonical EXI processors SHOULD be able to convert an untyped value to each datatype representation defined in EXI 1.0. This special language would not be required if EXI canonicalization were defined more generally with respect to the XML Infoset rather than an input EXI stream.

11. Section 4.4.1: The last sentence of this section specifies that all canonical EXI processors MUST support arbitrarily large integer values. This means there will be some canonical EXI documents that devices without support for arbitrarily large integers cannot process. We recommend you consider updating this definition so that every EXI document has a canonical representation that any device meeting the minimum EXI processing requirements can handle. In particular, we recommend you consider changing this definition such that canonical EXI processors MUST represent all Unsigned Integer values using the Unsigned Integer datatype representation when strict is true. However, when strict is false, canonical EXI processors must represent Unsigned Integer values greater than 2147483647 using the String datatype representation. This would enable devices with limited capabilities to at least read, display and retransmit arbitrarily large values, even if they don’t have the capability to process them.
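The rule proposed in comment 11 can be sketched as a small decision function. This is only an illustration of the proposal, not text from the specification; the function name `choose_representation` and the representation labels are hypothetical.

```python
from typing import Tuple, Union

# Largest value a 32-bit signed integer can hold, the threshold named in the comment.
MAX_INT32 = 2147483647


def choose_representation(value: int, strict: bool) -> Tuple[str, Union[int, str]]:
    """Pick the canonical representation for an Unsigned Integer under the proposed rule."""
    if value < 0:
        raise ValueError("Unsigned Integer values cannot be negative")
    if strict or value <= MAX_INT32:
        # Strict mode, or a value small enough for every conforming device.
        return ("unsigned-integer", value)
    # Non-strict mode with a large value: fall back to the String representation
    # so limited devices can still read, display and retransmit it.
    return ("string", str(value))


print(choose_representation(2147483648, strict=False))
```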

12. Section 4.4.5: This section states that EXI Date-Time values MUST be canonicalized according to the XML Schema dateTime canonical representation. While this definition might be convenient, it is not entirely appropriate for canonicalization and will lead to surprising results for some users. The canonical form for XML Schema dateTime values is defined to make it easy to determine whether two Date-Time values refer to the same instant, regardless of the timezone used. However, for many applications, the Date-Time timezone is an important piece of information that should be preserved. As such, it will be surprising if the digital signature is not able to detect changes to this information. In addition, those using canonical EXI as a transmission format will be surprised if the canonical EXI format loses all their timezone information and changes all Date-Time values to GMT. We recommend this section be updated to exclude canonicalization of timezones in Date-Time values.
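The timezone concern in comment 12 can be demonstrated with Python's standard `datetime` module. The sketch below normalizes two timestamps to UTC in the way the XML Schema dateTime canonical representation does for timezoned values; the helper `to_utc_lexical` and the sample timestamps are illustrative assumptions, not part of any EXI API.

```python
from datetime import datetime, timezone

# Two timestamps for the same instant, recorded in different timezones.
a = datetime.fromisoformat("2015-07-16T09:40:20+05:00")
b = datetime.fromisoformat("2015-07-16T00:40:20-04:00")


def to_utc_lexical(dt: datetime) -> str:
    """Convert a timezoned value to UTC and format it with a trailing 'Z'."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


canonical_a = to_utc_lexical(a)
canonical_b = to_utc_lexical(b)

# The original offsets differ, but the canonical forms are identical.
print(a.utcoffset() != b.utcoffset(), canonical_a == canonical_b)
```

After normalization the two values are indistinguishable, so a signature over the canonical form cannot detect that the original timezone was changed.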

13. Section 4.4.6: The W3C is standardizing on Unicode Normalization Form C and recommending all web data be stored and transmitted in this form. It may be useful to state this and reference the relevant W3C specification here: http://www.w3.org/TR/charmod-norm/.
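For readers unfamiliar with Normalization Form C, mentioned in comment 13, the following sketch uses Python's standard `unicodedata` module to show two different code point sequences for the same visible character converging on one form. The variable names are illustrative only.

```python
import unicodedata

# "é" as a single precomposed code point, and as "e" plus a combining acute accent.
composed = "\u00e9"
decomposed = "e\u0301"

# The raw strings differ, so a byte-level comparison or signature would differ too.
raw_equal = composed == decomposed

# After NFC normalization both become the single precomposed code point.
nfc_a = unicodedata.normalize("NFC", composed)
nfc_b = unicodedata.normalize("NFC", decomposed)

print(raw_equal, nfc_a == nfc_b)
```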

14. Section 4.4.6: The last sentence in the second paragraph states that EXI processors must first try to represent the string value as a local hit and, when this is not successful, as a global hit. It might be useful to clarify that one reason the attempt to represent the string value as a local hit may fail is that the string has already been used as a local hit previously. EXI supports only one local table hit per value.

15. Section A.1: The second paragraph states that Canonical EXI deals with EXI documents. As alluded to in the third paragraph of this section, this is not strictly true. Canonical EXI should be usable with and provide benefits to XML, EXI or any other XML Infoset representation. However, as stated earlier in these comments, canonical EXI must be defined with respect to the XML Infoset rather than an EXI input document to achieve this. Defining EXI canonicalization with respect to only EXI is limiting and fails to realize the full potential of the technology.

The last sentence in this section also states that it is not possible to use XML on intermediary nodes when Canonical EXI has been used for signing. This is a limitation of the current specification and not of canonical EXI in general. If you define canonical EXI with regard to a given XML Infoset, XML Schema and given set of EXI options and ensure all EXI nodes use the same XML Schema and EXI options, this limitation goes away. As stated earlier, there are more reliable and efficient ways to ensure cooperating nodes use the same XML Schemas and EXI options than including the EXI options document and schemaId in every message. And these methods do not fail when transcoding to XML because they do not depend on the XML/EXI message for the schema and EXI options. The reason the current specification fails in this regard is because it depends strictly on the EXI document to carry the options and schemaId and transcoding to XML loses this information. As discussed earlier, this is a design flaw that should be fixed.

16. Section C.2: It is interesting and encouraging to see a good description of best practices for sharing EXI options without the EXI options document. This is the flexibility the specification should allow rather than mandating that the EXI options and schemaId be specified inside every canonical EXI stream. 

Received on Thursday, 16 July 2015 04:40:20 UTC