"RE: Support of IEEE float; Canonical XML"

Thank you all very much for posting the summary of the difficult deliberations surrounding IEEE floating point. It makes it much easier for those of us who feel it deserves more public consideration to directly address all the proper issues instead of just whining in the dark. (Also, thank you for the response to my comment on number representation. Given the upcoming XML Schema feature you pointed out that fully covers this case, I see no reason to pursue it further, but that response also highlights the problem I wish to address in this message.)

In the tension between EXI being XML enough versus binary enough, a most important decision was to keep EXI compatible with Canonical XML so that the existing XML security mechanisms would work. This is an extreme constraint that I feel will ultimately cripple the intended extensibility of EXI to the extent that EXI will fail to reach anything like its real potential. It also creates an environment in which non-conforming implementations are likely. But more important is that it is poor practice.

In security, good practice is to "compress first, then encrypt". Documents should be hashed, signed or encrypted in their most compact form, which (except in the case of encryption) should also be the form in which they are published.

With XML, different forms are allowed mostly for readability. An infoset can be represented in many forms, some more readable than others. Canonical XML defines a single representation for an infoset that is quite compact (and not very readable) so that it's possible to generate a repeatable cryptographic hash for signing. Good security practice would be to transform an infoset into Canonical XML, sign it, and publish or transmit it in that form. If it's desirable to make it more readable down the line, that can be done without losing the ability to convert it back to canonical form to check the signature. But publishing the document in canonical form is the best way to help recipients check the signature.
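To make the point about repeatable hashes concrete, here is a minimal sketch in Python. It assumes Python 3.8 or later, whose standard library function xml.etree.ElementTree.canonicalize implements Canonical XML 2.0 (not necessarily the canonicalization version a given XML Security deployment uses). The sample document and the helper name canonical_digest are made up for illustration.

```python
import hashlib
import xml.etree.ElementTree as ET


def canonical_digest(xml_text: str) -> str:
    """Hash the canonical form of a document rather than its raw bytes."""
    # ET.canonicalize returns the C14N 2.0 serialization as a string.
    canonical = ET.canonicalize(xml_text)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two serializations of the same (hypothetical) infoset: attribute order,
# quoting style and whitespace inside the start tags all differ.
readable = """<order id="7" currency="USD">
  <item   qty='2'>widget</item>
</order>"""
compact = '<order currency="USD" id="7">\n  <item qty="2">widget</item>\n</order>'

# A document whose content really is different.
tampered = compact.replace('qty="2"', 'qty="3"')

same = canonical_digest(readable) == canonical_digest(compact)
changed = canonical_digest(compact) != canonical_digest(tampered)
print(same, changed)
```

The raw byte hashes of the two serializations differ, while the digests of their canonical forms agree, which is what lets a recipient re-canonicalize and check a signature.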

This good practice is not possible with EXI as it's now defined. A signed document cannot be published in EXI in the same form as was hashed. A signature cannot be checked without (temporarily) expanding the document to many times its original size, using conversions to characters that may be meaningless to the data carried within.

There is a good alternative path for EXI. It's perhaps more difficult to define but should be much easier to implement reliably. If you step back from Canonical XML and look at how it is used in the XML Security framework, I think you will find that EXI has all the necessary features, such as fragment support. The challenge then is to find the EXI replacement for Canonical XML.

Since EXI does not have any need for readability, it is not necessary to allow multiple forms except to accommodate generation options and make use of available schemas. It is tempting, given an XML infoset, a set of EXI options and user-defined data types, and the set of schemas referenced by the infoset, to declare that exactly one valid EXI representation is possible. That bit stream can be signed, and as long as the options, etc., are retained, the infoset can be used to regenerate the same bit stream for signature verification.
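As a rough illustration of "one infoset plus one set of options gives exactly one bit stream", here is a toy sketch in Python. It is emphatically not the EXI format: the encode function, its byte layout, and the float_tags option (standing in for schema and datatype knowledge) are all invented for this example. Floats are written as IEEE 754 binary64, so the hash is taken over the value rather than over its characters.

```python
import hashlib
import struct
import xml.etree.ElementTree as ET


def _put_str(out: bytearray, s: str) -> None:
    """Append a length-prefixed UTF-8 string."""
    data = s.encode("utf-8")
    out += struct.pack(">I", len(data)) + data


def encode(elem: ET.Element, float_tags: frozenset) -> bytes:
    """Toy deterministic binary encoding of an element tree (NOT EXI).

    float_tags is a hypothetical option: elements with these names carry
    their content as an IEEE 754 binary64 value instead of as characters.
    Mixed content (text after child elements) is ignored for brevity.
    """
    out = bytearray()
    _put_str(out, elem.tag)
    attrs = sorted(elem.attrib.items())  # fixed attribute order
    out += struct.pack(">I", len(attrs))
    for name, value in attrs:
        _put_str(out, name)
        _put_str(out, value)
    text = (elem.text or "").strip()
    if elem.tag in float_tags:
        out += b"F" + struct.pack(">d", float(text))  # typed value
    else:
        out += b"S"
        _put_str(out, text)  # untyped: keep the characters
    children = list(elem)
    out += struct.pack(">I", len(children))
    for child in children:
        out += encode(child, float_tags)
    return bytes(out)


OPTIONS = frozenset({"reading"})  # must be retained alongside the signature
a = ET.fromstring('<sample unit="V" id="1"><reading>1.50</reading></sample>')
b = ET.fromstring("<sample id='1' unit='V'><reading>1.5</reading></sample>")

signed_digest = hashlib.sha256(encode(a, OPTIONS)).hexdigest()
regenerated = hashlib.sha256(encode(b, OPTIONS)).hexdigest()
print(signed_digest == regenerated)
```

With the same options, the two textual variants regenerate the same stream and so the same digest. With different options (for example, an empty float_tags) the streams differ, which is why the options have to travel with the signature.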

There is a confounding issue in that the formal XML Infoset defines all values in terms of characters, so that it, rather than Canonical XML, is the real problem for those of us who would wish EXI to be more binary. This problem must also be addressed for any future binary API for XML. But if EXI steps back from Canonical XML to the XML Infoset, then the resolution of this issue for any binary API or whatever could apply immediately to EXI. Otherwise EXI will be left in the dust.

In any case, it's vital to align EXI better with good practice in security. The current situation amounts to paying lip service to compatibility with XML Security and will certainly hurt EXI in the long run.

Paul Pierce

Received on Tuesday, 12 May 2009 23:37:24 UTC