- From: <dee3@us.ibm.com>
- Date: Fri, 21 May 1999 16:50:46 -0400
- To: www-xml-schema-comments@w3.org, "XML-DSig Workshop" <w3c-xml-sig-ws@w3.org>
Having a canonical form of an entity is very important for comparison and digital signature purposes. XML is sufficiently rich that canonicalization needs to be considered at several levels. For example, the character sets used in two XML documents need to be converted to a standard one if the documents are to be usefully compared for many purposes. There are also canonicalization considerations related to white space, namespace prefixes, etc., which are being considered by the XML Syntax WG. Similarly, I believe that canonicalization of datatype representation must be considered, and the Schema WG seems like the place to do it.

I think the need for datatypes to have a designated canonical lexical form should be fairly clear for comparison purposes. It relieves the comparator of the burden of having to parse every form of every datatype and convert it to a canonical form the comparator has selected.

The need may not be as immediately obvious in the digital signature arena, depending on your mental picture of the "typical" digital signature application. If your picture is very document/object oriented, you might wonder what all the fuss is about, since any lump of bits can be signed and, if faithfully transmitted, the signature can be verified later on the same lump of bits. On the other hand, if you have a transactional/protocol point of view, where pieces of messages are being signed, data is processed and forwarded by intermediate parties, and the signature is verified by later recipients, canonicalization is essential.

I have been involved with too many systems where people thought that all they were doing was verifying signatures on unchanged data being sent through multi-party but faithful transmission channels, only to find that there was some circumstance where a signed object had to be partly or fully re-constituted, or some transmission channel was not as faithful as they thought.
As a result, some incredibly stupid thing like capitalization, padding, or line ending character sequences at least temporarily derailed their entire effort as, on a crash basis, they designed and painfully retrofitted canonicalization into their system. Also witness the diddly little lack of canonicalization in the original ASN.1 time and date format: as soon as there was substantial real world use of it, a new, almost identical fundamental data type had to be added to ASN.1, with significant disruption and confusion, just to squeeze out the last case of alternative representations of the same date and time.

There is no problem with the Schema Datatypes document providing multiple lexical representations as long as exactly one form is designated as the canonical form. I believe that the XML Schema Datatypes document should be changed to do this, and perhaps this should be added to the XML Schema requirements document.

Thanks,
Donald

Donald E. Eastlake, 3rd
17 Skyline Drive, Hawthorne, NY 10532 USA
dee3@us.ibm.com   tel: 1-914-784-7913, fax: 1-914-784-3833

home: 65 Shindegan Hill Road, RR#1, Carmel, NY 10512 USA
dee3@torque.pothole.com   tel: 1-914-276-2668
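
As a rough illustration of the comparison and signature points in the message above, the following minimal Python sketch shows how a designated canonical lexical form lets two parties compare or hash values without agreeing on anything else. The function names, the two datatypes chosen, the canonicalization rules, and the use of SHA-256 as a stand-in for the digest step of a signature are all assumptions made for this example; none of them comes from the message or from any Schema draft.

```python
import hashlib


def canonical_boolean(lexical: str) -> str:
    """Hypothetical rule: map accepted boolean spellings to 'true'/'false'."""
    value = lexical.strip().lower()
    if value in ("true", "1"):
        return "true"
    if value in ("false", "0"):
        return "false"
    raise ValueError(f"not a boolean: {lexical!r}")


def canonical_integer(lexical: str) -> str:
    """Hypothetical rule: no white space, no '+' sign, no leading zeros."""
    # int() tolerates surrounding white space, an optional sign,
    # and leading zeros; str() then emits a single fixed spelling.
    return str(int(lexical.strip(), 10))


def digest(text: str) -> str:
    """Stand-in for the hashing step of a signature (assumed SHA-256)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Two lexical forms of the same integer value.
sent = "+0042"
received = "42"

# Hashing the raw lexical forms gives different digests ...
raw_match = digest(sent) == digest(received)

# ... while hashing the canonical forms gives the same digest.
canonical_match = (
    digest(canonical_integer(sent)) == digest(canonical_integer(received))
)
```

Under these assumed rules, an intermediary that rewrites "+0042" as "42" would break a signature computed over the raw text, but not one computed over the canonical form.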
Received on Friday, 21 May 1999 17:10:19 UTC